Internet-Exchange-LevelAdversaries
StevenJ.MurdochandPiotrZieli´nski
UniversityofCambridge,ComputerLaboratoryhttp://www.cl.cam.ac.uk/users/{sjm217,pz215}
Abstract.Existinglow-latencyanonymitynetworksarevulnerabletotrafficanalysis,solocationdiversityofnodesisessentialtodefendagainstattacks.Previousworkhasshownthatsimplyensuringgeographicaldi-versityofnodesdoesnotresist,andinsomecasesexacerbates,theriskoftrafficanalysisbyISPs.Ensuringhighautonomous-system(AS)diver-sitycanresistthisweakness.However,ISPscommonlyconnecttomanyotherISPsinasinglelocation,knownasanInterneteXchange(IX).ThispapershowsthatIXesareasinglepointwheretrafficanalysiscanbeperformed.Weexaminetowhatextentthisistrue,throughacasestudyofTornodesintheUK.Also,someIXessamplepacketsflowingthroughthemforperformanceanalysisreasons,andthisdatacouldbeexploitedtode-anonymizetraffic.WethendevelopandevaluateBayesiantrafficanalysistechniquescapableofprocessingthissampleddata.
1Introduction
Anonymitynetworksmaybesplitintotwocategories:highlatency(e.g.Mixmin-ion[1]andMixmaster[2])andlowlatency(e.g.Tor[3],JAP[4]andFree-dom[5]).Highlatencynetworksmaydelaymessagesforseveraldays[6]butaredesignedtoresistverypowerfulattackerswhichareassumedtobecapa-bleofmonitoringallcommunicationlinks,socalledglobalpassiveadversaries.However,thelongpotentialdelaymakesthesesystemsinappropriateforpop-ularactivitiessuchasweb-browsing,wherelow-latencyisrequired.Although,inlow-latencyanonymitynetworks,communicationsareencryptedtomaintainbitwise-unlinkability,timingpatternsarehardlydistorted,allowinganattackertodeploytrafficanalysistode-anonymizeusers[7,8,9].Whiletechniquestoresisttrafficanalysishavebeenproposed,suchaslinkpadding[10],theircostishighandtheyhavenotbeenincorporatedintodeployednetworks.
Instead,thesesystemshavereliedontheassumptionthattheglobalpassiveadversaryisunrealistic,oratleastthosewhoarethetargetofsuchadversarieshavelargerproblemsthananonymousInternetaccess.Butevenexcludingtheglobalpassiveadversary,thepossibilityofpartialadversariesremainsreason-able.TheseattackershavetheabilitytomonitoraportionofInternettrafficbutnottheentirety.Distributedlow-latencyanonymitysystems,suchasTor,aimtoresistthistypeofadversarybydistributingnodes,inthehopethatconnec-tionsthroughthenetworkwillpassthroughenoughadministrativedomainstopreventasingleentityfromtrackingusers.
Thisraisesthequestionofhowtoselectpathsthroughtheanonymitynet-worktomaximizetrafficanalysisresistance.Section2discussesdifferenttopol-ogymodelsoftheInternetandtheirimpactonpathselection.Wesuggestthatexistingmodels,basedonAutonomousSystem(AS)diversity,donotproperlytakeaccountofthefactthatwhile,attheASlevelabstraction,apathmayhavegoodadministrativedomaindiversity,physicallyitcouldrepeatedlypassthroughthesameInterneteXchange(IX).Section3establishes,basedonIn-ternettopologymeasurements,towhatextenttheToranonymitynetworkisvulnerabletotrafficanalysisatIXes.
Section4describeshowIXesareparticularlyrelevantsince,toassistloadmanagement,theyrecordtrafficdatafromthepacketsbeingsentthroughthem.Asaggregatestatisticsarerequiredandthecostofrecordingfulltrafficwouldbeprohibitive,onlysampleddataisstored.Hence,thequalityofdataissubstan-tiallypoorerthanwasenvisagedduringthedesignandevaluationofprevioustrafficanalysistechniques.Section5showsthat,despitelowsamplingrates,thisdataisadequateforde-anonymizingusersoflow-latencyanonymitynetworks.Finally,Section6discussesfurtheravenuesofresearchunderinvestigation.
2LocationDiversityinAnonymityNetworks
Torhasbeenlongsuspected,andlaterconfirmed[11,12],tobevulnerabletoanattackerwhocouldobserveboththeentryandexitpointofaconnectionthroughananonymitynetwork.Asnointentionallatencyisintroduced,timingpatternspropagatethroughthenetworkandmaybeusedtocorrelateinputandoutputtraffic,allowinganattackertotrackconnectionendpoints.
Delayingmessages,asdonewithemailanonymitysystems,wouldimproveresistancetotheseattacks,atleastforasmallnumberofmessages.However,theadditionallatencyhere(hourstodays)would,ifappliedtowebbrowsing,determostusersandsodecreaseanonymityfortheremainder[13].Inadditiontothescarcebandwidthinavolunteernetwork,fulllink-paddingwouldalsointroducecatastrophicdenialofservicevulnerabilities,becauseallpartieswouldneedtostopcommunicatingandre-negotiateflowlevelswhenonepartyleft.Hence,theonlyremainingdefenseagainsttrafficanalysisistoensurethattheadversaryconsideredinthesystemthreatmodelisnotcapableofsimultaneouslymonitoringenoughpointsinthenetworktobreakusers’anonymity.
Whilethisapproachwouldbeofnohelpagainstaglobalpassiveadversary,morerealisticattackers’trafficmonitoringcapabilitiesarelikelytobelimitedtoparticularjurisdiction(s),whethertheyderivefromlegalorextra-legalpowers.Thisintuitivelyleadstotheideathatpathsthroughanonymitynetworksshouldbeselectedtogothroughasmanydifferentcountriesaspossible.Thehopehereisthatanattackerattemptingtotrackconnectionsmighthavetheabilitytomonitortrafficinsomecountries,butnotallthoseonthepath.
Unfortunately,FeamsterandDingledine[14]showedthisapproachcouldactuallyhurtanonymitybecauseinternationalconnectionswerelikelytogothroughoneofaverysmallnumberoftier-1InternetServiceProviders(ISP)–
.usAS1.seIXAS2.cn.au.brFig.1.Multiple-countrypaththroughahypotheticalanonymitynetworkatgeograph-icalandASlevelabstractions.Here,despitethepathtravelingthrough3countriesbetweenBrazil(.br)andtheUS(.us),therearetwotier-1ISPswhichseealllinks.Forexample,thehopthroughChina(.cn)isvulnerablesincetheincomingandoutgoinglinksareobservedbyAS2.Atfirstglance,theSwedish(.se)hopseemssecure,astheincominglinkisseenbyAS2andtheoutgoingbyAS1.However,theSwedishISPconnectstoAS1andAS2atLINX(IX),openinguptheriskofobservationthere.
thosewhooffertransittothefullInternet.Thus,connectionstoandfromafar-flungTornodearelikelytobothpassthroughasingletier-1ISP,negatingtheanonymitybenefitagainstanISPleveladversary.So,while–attheabstractionlevelofdirectconnections–amulti-countrypathmayappeartohavelocationdiversity,bytakingintoaccounttheISPsthatthedatapassesthroughbetweenTornodes,weakpointsbecomeclear,asshowninFig.1.
Instead,FeamsterandDingledinepropose,whenselectingpaths,therela-tionshipbetweenISPscarryingdatabetweenpairsofTornodesistakenintoaccount.TheydidthisbycollectingBorderGatewayProtocol(BGP)data,whichcontrolshowpacketsareroutedbetweenentitiesontheInternet,knownasAutonomousSystems(AS)androughlycorrespondtoISPs.Fromthisdata,assumptionsaboutcommercialrelationshipsbetweenISPs,andheuristicsaboutroutingpatterns,itispossibletoestimatetheASeswhichwillbeoneachpath.
OptimizingpathselectiontomaximizeASdiversityreducesthelikelihoodthattherewillbeoneISPwhocanobservetheconnectionthoughtheanonymitynetworkatenoughpointstode-anonymizetheuser.However,althoughthislevelofabstractionisasubstantialimprovementoverthena¨ıvemodelofdirectnodeconnection,itdoesnotfullytakeinaccountallpotentialmonitoringpoints.Thiswillbeillustratedinthefollowingsection.
3ImpactofInternetExchangesonPhysicalTopology
Intheprevioussection,wediscussedtheadvantagesofselectingpathsthroughanonymitynetworkssuchthattherewasnosingleASwhichcouldmonitorallhopsbetweenanonymitynetworknodes.ThismaybeachievedbyselectingnodesonASeswithhigh-degreei.e.thosewhichareconnectedtomultipleotherASes.ISPsowningsuchASesmightpurchasecableconnectionstomanyotherISPs,
butdoingsowouldbeextremelyexpensive.Instead,ISPsmayconnecttheirnetworktoanIX,whichwillprovideconnectivitytoallotherISPswithapres-enceatthatIX.ThisapproachismoreprevalentinEuropethanintheUS,duetodifferingcommercialstructuresandhistoricaldevelopment;alsobecauseoflanguagedifferences,intra-countrytrafficissubstantial.
Thus,whileattheASlevelitappearsthatthepathmakesmultipletran-sitionsbetweendistinctASes,physically,eachoftheseconnectionsmightpassthroughthesameIX.Hence,despitethepathattaininghighASdiversity,thereremainsoneentitywhoisabletode-anonymizethetraffic.Inordertoestablishhowmuchofaproblemthisisfordeployedanonymitynetworks,wesetouttodeterminehowsuccessfulanIXleveladversarywouldbe,comparedtoanASlevelone,inde-anonymizingTorusers.
ThetechniquesofFeamsterandDingledine[14]relyonbuildingamapofASpathsfromBGPdata,butthisisnothelpfulforourpurposesastheIXesdonotappearatthislevel.FromtheperspectiveofarouterinanIX,packetstraveldirectlytothedestinationAS.Furthermore,theirapproachdependsoninforma-tionaboutISPrelationshipsandroutingpolicieswhichareacarefullyguardedsecretandsomustbeguessed.However,itiscommonpracticetoallocateeachrouterinanIXanIPaddressfromasinglesubnet.
Hence,whiletheASpathofaconnectionwillnotrevealwhetheritisgoingthroughanIX,atraceroute[15]islikelyto.UnlikefindingASpaths,collectingtraceroutedatarequiresaccesstothesystematbothendsofthepath.AsTordoesnotcurrentlyimplementamechanismforperformingtraceroutes,theoperatorofthenodemustdosomanually.Tolimittheefforttoafeasiblelevel,herewetaketheUKasacasestudy.3.1
ExperimentalResults
Basedongeo-locationdatabasesandmanualinvestigation,weidentifiedTornodeshostedintheUKandcontactedtheoperatorstorequestthattheyrunascripttocollectdatatovalidateourhypothesis.Oneofourconstraintswasthatnocustombinaryapplicationscouldbeused,astherecipientcouldnoteasilyconfirmtheywerebenign.Instead,wesimplyinvokedtheOSprovidedtraceroute(oronWindows,tracert).Thesearenotdesignedwithspeedorparallelisminmind,sotokeeptheruntimereasonable(2–24hours,dependingontimeouts)ontheslowerWindowstestmachinesweonlytraced140destinations,andon*nixmachines,tested595destinations.Thesedestinationsconsistedofthesame15websitesand11USconsumerISPstestedin[14]andtheremainderwererandomlyselectedTornodes.
Wereceived19(14*nix,5Windows)responsesfromthe33operatorswewereabletocontact.Thistotaled9025pathswithanaveragepathlengthof14hops(excludingfailedresponses).ForeachhopweestablishedwhetheritwasinoneofthesubnetsofLINX(LondonInterNeteXchange),AMS-IX(AMSterdamInterneteXchange)orDE-CIX(theGermanInternetexchange,inFrankfurt).Also,usingtheTeamCymruDatabase[16],weestablishedtheBGPoriginASforeachIPaddress.NotethatalthoughwearearrangingdatabyAS,thispath
Table1.NumberofpathspassingthroughASesandIXes.
ASname(ASN)Level3(3356)NTL(5089)Zen(13037)JANET(786)Datahop(6908)Tiscali(3257)Sprint(1239)Cogent(174)Telewest(5462)Telia(1299)
Paths1961144512581224996953935871698697
%22%16%14%14%11%11%10%10%8%8%
IXname(subnet)
Paths
%
LINX(195.66.224.0/22)239227%DE-CIX(80.81.192.0/22)2313%AMS-IX(195.69.144.0/22)2022%
isnotthesameastheBGPpathdiscussedin[14].Importantly,whileIXesmayhaveanAS,theydonotbroadcastroutes,andsodonotappearinBGPpaths,whereastracerouteestablishestheIPaddressoftheborderrouters,fromwhichtheIXcanbeinferred.
TheresultsaresummarizedinTable1.Ascanbeseen,Level3,alargetier-1ISPappearsatleastonceon22%ofpathsandothertier-1ISPs,suchasTiscali,Sprint,CogentandTeliaalsoappear.SinceourtestswereallfromUKTornodes,mainlyrunbyvolunteers,consumerISPsalsofeature,suchasNTL,ZenandTelewest,asdoestheUKacademicnetworkoperator,JANET.Finally,Datahop,whoprovideconnectivitybetween10data-centersinLondon,arepresenton11%ofpaths.Thisbroadlymatchestheresultsof[14],inthatasmallnumberofISPsarepresentmanypaths.
However,ifwenowexaminewhetheranIXisonthepath,wefindanewclassofobservationpoints.DespitebeinginvisibleattheBGPlevel,LINXispresenton27%ofpaths.Thereare22distinctASesintheprevioushoptoLINXand109followingtheLINXhop,soAS-diversepathswillnotsubstantiallyimpactLINXcrossings.Hence,exploitingtheIXasanobservationpointisaneffectiveattackagainstbothexistingandproposedanonymitynetworkroutingschemes.TheconnectivitygraphofselectedASes,basedonourdata,isshowninFig.2.
4TrafficAnalysisfromSampledData
TheprevioussectionhasshownhowanadversarypositionedatanIXwouldbecapableofmonitoringasubstantialquantityoftrafficthroughtheTornetwork.Apowerfuladversarywouldbeinapositiontoinstallexpensivehardwaretomountconventionaltrafficanalysisattacksbutsuchanadversarywouldlikelybeabletodeployother,moreeffective,attacks.However,thenetworkinfrastructureprovidedbyanIXmayalreadyhavethetrafficanalysiscapabilitiesthatamoremodestattackercoulduse.
DestinationSprintDatahopZenLevel 3TiscaliCogentTelewestNTLJANETTeliaSourceFig.2.ASconnectivityviaIXgraph.OnlyASesinTable1areshownandallsourcesanddestinationsarecollapsedtosinglenodes.LinksbetweenASeswhichpassthroughLINXareshownassolidlines,AMS-IXisshownbydottedlinesandDE-CIXbydashedlines.PathswhichgothroughnoneoftheseIXesareomitted.Fromthiswecanseethat,inourdata,connectionsthroughSprintandDatahopgofromsourcetodestinationwithoutpassingthroughanyoftheIXeswehaveselected.
Toaidnetworkmanagement,high-endswitchesandroutershavemonitoringfeatureswhich,althoughnotdesignedforthispurpose,maystillbeeffectiveintracingusersofanonymitynetworks.Thissectionwillevaluatethesuitabilityofnetworkmonitoringdatafortrafficanalysis.4.1
TrafficMonitoringinHigh-SpeedNetworks
Onlow-bandwidthsmall-officeorbusinessnetworks,fullpacketanalysistoolssuchastcpdump[17]areadequatetomonitortrafficfordebuggingortomeasureload.However,onlinksfoundonhigh-speednetworks,thecapacityrequiredtostoreallpacketsrapidlybecomesinfeasible.Forexample,attimeofwriting,bothLINXandAMS-IXcarryapproximately150Gb/s,whichexceedsthetheoreticalmaximumcapacityofthehigh-speedPCIebus,64Gb/s(32lanesat2Gb/seach).Despitethesedifficulties,thereishighdemandformonitoringofsuchhigh-speedlinks,todetectproblemssuchasroutingloops,balanceloadacrossnetworkinfrastructureandanticipatefuturedemandsoncapacity.
Theseapplicationsdonotrelyonpacketcontent,andforprivacyreasonsitmaybedesirablenottorecordthisatall.Thus,mediumtohigh-endnetworkingequipmentiscommonlyequippedwiththeabilitytorecordaggregatedataonthetrafficpassingthroughit.OnesuchmechanismisNetFlow[18],developedbyCiscoSystemsbutsupportedbyotherequipmentmanufacturers.NetFlowequippedinfrastructurerecordsunidirectionalflowsasdefinedbyatuple(sourceIP,destinationIP,sourceport,destinationport,IPprotocol,interface,classofservice).Foreachofthese,thedevicewillrecordinformationsuchasthenumberofpackets,totalbytecountandbitwise-orofTCPflags.
Adisadvantageofthisapproachisthatitrequiresthenetworkhardwaretoinspecteverypacketflowingthroughit.Thiscanincursubstantialloadathighernetworkspeeds,sotocounterthisdifficultysampledNetFlowonlyinspectsaproportionqofpackets.WhilesamplingreducesCPUload,thenetworkhard-waremuststillstorestateforeveryflowitconsiderstobelive,whichcouldpotentiallybeverylarge.Analternative,asadoptedbysFlow[19],istomovetheaggregationoutofthenetworkdevicebyimmediatelyexportingsampledpacketheaders.Thisapproachalsogivesaccesstoadditionalfieldsinpacketheaders,suchasthesequencenumber,whichcouldbeusefulfortrafficanaly-sis.However,toensuregenerality,wewillconcentrateoninformationavailableinsampledNetFlowstyledata,whichcouldbeconstructedfromsFlowlogsifneeded(theconverseisnottrue).
Notonlyishigh-speedtrafficmonitoringpossiblewithstandardnetworkingequipment,butitiscommonpracticetodoso.Twoexampleswhichareparticu-larlyrelevanttothispaperarethatAMS-IXrecorddatafortrafficmanagementmonitoring[20]andLINX(whorecord1in2048packets[21])additionallyareconsideringusingsFlowdatafordetectingemailspam[22].ThesamedatacouldalsoassisttrackingusersofananonymitynetworkbecauseSection3.1showedthatasignificantnumberofTorflowspassthroughanIX.Inthefollowingsectionwewillexaminehowsuccessfulthistypeoftrafficanalysiswouldbe.4.2
TrafficAnalysisAssumptions
Therearetwobasictypesoftrafficanalysis.Thefirsttreatstheanonymitynetworkasa“black-box”andonlyinspectstrafficenteringandleavingthenet-work.Thesecondapproachadditionallyexaminesflowswithinthenetwork,andsoimprovestheaccuracyoftheattack.Inthispaper,wewillconcentrateontheformercategory.Asthisdoesnotmakeanyassumptionsaboutthestructureofthenetwork,itisthemoregeneralapproach.However,thetechniqueswepresentherecouldalsobeappliedtothelattercategoryofattacks,asintra-networkTortrafficwillalsooftencrossasmallnumberofInternetexchanges.
WeassumethattheattackedflowpassesthroughanattackercontrolledIXonbothitspathintoandoutoftheanonymitynetwork.Thiswouldbethecaseif,forexample,boththecustomerandsitearehostedonISPswhoseback-boneconnectionwasthroughanIXundersurveillance.Also,weassumethatpacketsamplingisindependentlyandidenticallydistributedovertheflow.Al-thoughsomemodelsofnetworkhardwareimplementperiodicsampling,ratherthanrandom,thisassumptionwillremaintruebecauseTortrafficmakesupaninsignificantproportionofoveralltraffic.
Theattackerobservesasingleflowgoingintothenetworkandwishestoestablishwhichofseveraloutgoingflowsitcorrespondsto.Thiscouldbe,forexample,findingwhichwebsiteaknowncriminalisuploadingstolendatato.Alternatively,theattackermightwishtodiscoverwhohasuploadedaparticularvideotoanewswebsite–nowthereisoneoutgoingflowandmanyincomingcandidates.Inbothcases,theattackerwillhaveanumberofcandidatesinmindwhoarealsogeneratingtrafficatthesametime,andforoursimulationwe
assumethattheseproducearound1000flowsperhour.WealsoassumethattheadversarycandistinguishTortrafficfromothertraffic,whichmaytriviallydonebyIPaddressandportnumber,basedoninformationintheTordirectory.
5
5.1
MathematicalAnalysis
Model
Ourmodelconsistsofnclient-serverflows.Eachflowp=p1,...,pmisacol-lectionofpacketssentattimest1,...,tm.WemodelthetimesasaPoissonprocesswithastarttimes,durationl,andrater(averagepacketspersecond).Thesethreeparametersarechosenindependentlyatrandomforeachflow.
Neithers,l,rnortheflowparedirectlyobservable.Theattackerseesadown-sampledversionofp,inwhicheachpacketisretainedindependentlywithafixedprobabilityq,calledasamplingrate(typicallyabout1/2000).Eachflowissampledattheinputandattheoutput,resultingintwovectorsoftimes:xandy.Givenaflowp,thesamplingprocessesp→xandp→yareindependent:
s,l,r
Poisson
−→px
sampling
←−p
sampling
−→y(1)
Inann-flowsystem,theattackerseesallnoutputvectorsy1,...,yn,andoneinputvectorx,whichcorrespondstosomeyk.ThetaskoftheattackeristocomputetheprobabilityP(Tk)thatxcorrespondstoyk,foreachk.
Tosimplifythemodel,weassumethatnopacketfrompappearssimultane-ouslyinbothxandy.Sincexandyareindependentlysampledfromp,agivenpacketfrompappearsinbothxandywiththeprobabilityofq2=2.5·10−7,thatis,onceevery1/q2=4·106packets(≈2GB).Seeingthesamepacketontheinputandtheoutputisthusveryunlikely,whichpreventspacket-matchingat-tacks[9]andmakesindependentrandomdelaysofindividualpacketspracticallyunobservableinthesampleddata.Forsimplicity,wethereforeassumeinstanta-neouspackettransmission.Section5.5showsthatintroducingamoderatedelaytothesystemdoesnotchangetheeffectivenessofourattack.
Theassumptionofnocommonpacketsinxandyallowsustosimplify(1)byobservingthatxandyarenowindependentPoissonprocesseswithraterq.
x
Poisson
←−s,l,rq
Poisson
−→y(2)
Thissimplificationeliminatestheoriginal(unobservable)flowpfromthemodel.5.2
BasicSolution
LetTkdenotetheeventinwhichinputxandoutputykbelongtothesameflow.Inourmodel,theexactprobabilitiesP(Tk)canbeuniquelydeterminedfromBayes’formula:
P(y1..n|Tk,x)P(Tk|x)
.P(Tk|y1..n,x)=P(y|T,x)P(T|x)1..niii
(3)
ProbabilitiesP(Tk|x)expressourpriorinformationaboutthetarget,possiblybasedonthesampledinputflowx(butnotoutputflowy).Forexample,wemightknowthataparticularserverkisjustmorepopularthanothers,orthatitistheonlyonetoregularlyreceivehigh-volumetrafficandxlookstobehigh-volume.Forsimplicity,intherestoftheanalysis,wetreatallserversequally;anypriorinformationcanbeeasilytakenintoaccountusing(3).
TheprobabilitiesP(y1..n|x,Tk)in(3)canbecomputedasfollows:
P(y1..n|x,Tk)=P(yk|x,Tk)
i=k
P(yk|x,Tk)
P(yi)=P(yi).
P(yk)i
(4)
Here,weusedthefactthatoutputflowsyiareindependent,andthatP(yi|Tk)=
P(yi):theinformationaboutinput-outputconnectionTkisonlyrelevantforstatementsthatinvolvebothinputsandoutputs(suchasP(yk,x|Tk)).Sinceweareonlyinterestedinrelativeprobabilitiesfordifferentk’s,wecan
ignoreallfactorsindependentofk,suchasP(x|Tk)=P(x)oriP(yi),astheywouldcanceloutin(3)anyway:
P(yk|x,Tk)P(yk,x|Tk)P(x,yk|Tk)
=∼.
P(yk)P(x|Tk)P(yk)P(yk)
(5)
WethereforeneedtocomputeP(yk)andP(x,yk|Tk).Wearedealingwithasingleflowx←p→yk,so–toavoidnotationalclutter–wewilldroptheexplicitindexkandassumptionTkfromourformulae.Inthenewnotation,wehaveP(y)andP(x,y),whichcanbecomputedfromappropriateconditionalprobabilitiesbyintegratingouttheunknownparameterss,l,r:
P(y)=P(y|s,l,r)P(s,l,r).(6)
s,l,r
P(x,y)=P(x,y|s,l,r)P(s,l,r)=P(x|s,l,r)P(y|s,l,r)P(s,l,r).(7)P(Tk|y1..n,x)∼P(y1..n|x,Tk)∼
(3)
(4)
s,l,r
s,l,r
Thelastequalityholdsbecausexandy,generatedbymodel(2),areindependentgivens,l,r.ThedistributionP(s,l,r)expressesourpriorknowledgeaboutflowstartingtimes,durations,andrates.
Wedividetheinterval[s,s+l]intoinfinitesimallysmallwindowsofsizedt.SinceyisaPoissonprocess(2),theprobabilityofobservingasinglepacketinonesuchwindowisrqdt.Theprobabilityofnopacketsin[s,s+l]ise−rql.Thus,
e−rql(rqdt)nyifalltimesiny∈[s,s+l],
P(y|s,l,r)=(8)
0otherwise.Here,nyisthenumberofpacketsiny.Thesameformula(withnx)holdsfor
P(x|s,l,r).SinceP(x,y|s,l,r)=P(x|s,l,r)P(y|s,l,r),wealsohave
e−2rql(rqdt)nx+nyifalltimesinx,y∈[s,s+l],
P(x,y|s,l,r)=(9)
0otherwise.
5.3Long-LivedFlows
Wefirstconsiderasimplifiedmodel,inwhichallflowsstartatthesameknowntimesandhavethesameknowndurationl(basically,[s,s+l]isourobservationwindow).Theonlyfactordistinguishingtheflowsistheir(unknown)rater.From(8),weget:
P(y)=
r
P(y|r)P(r)=
r
e−rql(rqdt)nyP(r).
(10)
whereP(r)isourpriorinformationabouttherater.Sincerisapositiveparam-eter,weexpressourcompletelackofpriorknowledgebyusingthescale-invariantJeffrey’signorancepriorP(r)∼r−1dr[23].Thisbasicallysaysthatlogrisdis-tributeduniformly:theprobabilityofr∈[a,b]isproportionaltolog(b/a).Forexample,r∈[1,10]andr∈[10,100]havethesameprobability.
dtny
P(y)=(rqdt)eP(r)=(qdt)redr=nyΓ(ny).
lrr=0
(11)∞a−1−bz
a
Weused0zedz=Γ(a)/b;forintegernwehaveΓ(n)=(n−1)!.
Similarly,from(9),
(10)
ny−rql
ny
ny−1−rql
∞
P(x,y)=
r
(rqdt)
nx+ny−2rql
e
dtnx+ny
Γ(nx+ny).P(r)=
(2l)nx+ny
(12)
Wecannowuse(5)tocomputethefinalprobability:
dtnxΓ(nx+nyk)Γ(nx+nyk)P(x,yk|Tk)=·∼.P(Tk|y1..n,x)∼
P(yk)(2l)nx2nykΓ(nyk)2nykΓ(nyk)
(13)
Interpretation.Fig.3(a)showsanormalizedplotof(13)fornx=5asa
functionofny.Themaximumprobabilityisassignedtony≈nx,whenthenumbersofobservedpacketsontheinputandontheoutputaresimilar.Thisconfirmsourintuitionandalsoyieldsquantitativeprobabilitiesfordifferentny’s,whichcanbeusedforcombiningevidencefrommultipleobservations.
Theexactmaximumoccursforny>nxbecausethepriorP(r)∼r−1dr
6
causesP(r∈[4,5])>P(r∈[5,6])(because54>5).Thismakessmallny’smoreprobabletobeproducedbychancethanlargerones,decreasingtheirmatchprobability.UsingStirling’sapproximationofn!,weget(seeappendix):
P(Tk|y1..n,x)∼
(nx+ny−1)nx+ny−22ny(ny−1)ny−211
,(14)
whichverycloselymatchestheoriginal,asshowninFig.3(a).Themaximumof(14),obtainedbycomparingitsderivativetozero,isny≈nx+12.
P( Tk | x, y1..n )maxy051015202530real (13)approx (14)
350−5(18)
−25−20−15−10
−5miny
0
5
10
15
02468ny
101214
(a)P(Tk|x,y1..n)givenby(13)forfixed(b)logP(Tk|x,y1..n)givenby(18)fornx=5andnyrangingfrom0to15.nx=ny=5,minx=0,maxx=10,
andvariableminyandmaxy.Fig.3.Relativeprobabilitiesbasedon(a)observedpacketcountsand(b)lengths.
5.4GeneralFlows
Now,weconsiderthegeneralcase,inwhichflowshavedifferent(unknown)du-rationslandstartingtimess.From(8),wecancomputeP(y|l,r)byintegratingsout.Foragivendurationl,thepossiblestartingtimessbelongtotheinterval[maxy−l,miny].Ifly=maxy−minyistheobservedlengthofy,thenthisintervalofpossiblevaluesofshasthelength(l−ly)0=max{l−ly,0}.Assuminglackofpriorknowledgeabouts(uniformpriorP(s)∼ds),wehave
(8)
P(y|l,r)=P(y|s,l,r)P(s)∼(l−ly)0e−rql(rqdt)ny.(15)
s
UsingJeffrey’spriorsP(l)∼l−1dlandP(r)∼r−1dr,weget:P(y)=
P(y|l,r)P(l,r)=(l−ly)0e−rql(rqdt)nyl−1r−1drdl=l,rl,r
(qdt)ny(l−ly)0l−1e−rqlrny−1drdl=
lr
(qdt)ny(l−ly)0l−1Γ(ny)(ql)−nydl=dt
ny
Γ(ny)
l∞
(l−ly)l
−ny−1
dl=dt
ny
l=ly
lyy
Γ(ny).(16)
ny(ny−1)
−n+1
WecancomputeP(x,y)inasimilarway.Letnxy=nx+nybethetotalnumberofpacketsinxandy,andlxy=max{maxx,maxy}−min{minx,miny}
theobservedlengthofsuperimposedsequencesxandy.Ingeneral,lxy=lx+ly.P(x,y)=
l,r
(l−lxy)0e−2rql(rqdt)nxyl−1r−1drdl=
Γ(nxy)dtnxy
xy
2nxy(nxy)(nxy−1)lxy
n−1.
(17)
Ignoringallfactorsindependentofk,(5)givesusthefinalprobabilitynyk(nyk−1)Γ(nxyk)lykkP(x,yk|Tk)
·∼ny·nxy−1.(18)P(Tk|x,y1..n)=
kP(yk)2xkΓ(nyk)nxyk(nxyk−1)lxy
k
ny−1
Interpretation.Formula(18)consistsofthreefactors:(i)therateformula
(13),(ii)arate-dependentcorrectionny(ny−1)/(nxy(nxy−1)),and(iii)the
n−1nxy−1
length-dependentfactorlyy/lxy,whichisofthemostinteresttoushere.
Considermatchinganinputflowwiththeobservedstartingtimeminx=0,endingtimemaxx=10,andnx=5observedpackets,againstoutputflowsywiththesamenumberofobservedpacketsny=5.Forvariousstartingandend-ingtimesminyandmaxy,Fig.3(b)presentsthematchinglikelihoodassignedby(18)(sincenxandnyareconstant,soarethefirsttwofactors).
Asexpected,themaximumisattainedwhentheobservedstartingandendingtimesofbothflowscoincide:minx=miny=0andmaxx=maxy=10.Eachcontourlineconsistsoftwoparallelstraightlinesjoinedbytwocurves.Thetwostraightlinescorrespondtotheobservedinputflowperiodcompletelycontainingtheobservedoutputflowperiod,andviceversa.
Optimality.Thederivationof(18)isstrictlyBayesian,so–giventhemodelassumptions–theresultisexactandusesallrelevantinformation.Notethat,despitethetimingsofallpacketsbeingavailablethroughxandy,formula(18)usesonlythetotalpacketcounts(ny,nxy)andtheobservedlengths(ly,lxy).Thisshowsthattheexacttimingsofindividualpackets(usedbytiming-basedattacks)areirrelevantfortheinferenceinourmodel.5.5
Evaluation
ToevaluatetheeffectivenessofourmethodinattackinganindividualTornode,wefirstcollectedrealtrafficdistributionsofobservedflowratesanddurations(Fig.4).Then,weperformedanumberofsimulationsofa120minexecutionofanode.Flowdurations(1–30min)andrates(0.1–50packets/s)weredrawnfromthelog-uniform(P(z)∼z−1dz)prior,consistentwithFig.4.Startingtimeswereselecteduniformlyfromtheinterval[0,120min−l].
Ourscoringmethodwas“1”ifthehighestprobabilitywasassignedtothecorrecttarget,and“0”otherwise(ifi>1targetssharedthetopprobability,
Rate (packets/s)0.0250
0.100.502.0010.0050.0010020050010002000
Flow duration (s)
thenthescorewas1/iinsteadof1).Foreachsimulation,weappliedtheattackindependentlytoeachinput,andthenaveragedtheresults.
Wevariedthefollowingparameters:thenumberofflowsperhour(50–1000),thesamplingrateq(1/100–1/2000),themeannetworklatency(0–10min),andtheattackmethod.Ourparameterrangesareconsistentwiththeirrealvalues:ourTornodetransmitted479flows/honaverage,theaverageTornetworkla-tencywas0.5s,andthecurrenttypicalsamplingrateis1/2048,butmayincreaseinthefuture.TheresultsofoursimulationsaresummarizedinFig.5.Averagenumberofflows.Fig.5(a)confirmsthatmoreflowsprovidemorepro-tection.Foratypicalnumberof500flows/h,theattackhada50%chanceofsuccesswhenthetargetsends≈20000packets,thatis≈10MBofdata.With50flows/h,thesamesuccessraterequiredonly7000packets(3.5MB).Samplingrate.Fig.5(b)suggeststhattheeffectivenessoftheattackdependsonlyonthenumberofsampledpackets,sodoublingthesamplingrateisequiva-lenttodoublingthenumberoftransmittedpackets.Forthetechnicallyfeasiblesamplingrateof1/100,asuccessrateof50%requiredonly1000transmittedpackets(500kB).
Attackmethods.Wecomparedthefollowingattacks:(i)rateattack,whichap-plies(13),takingintoaccounttheobservednumberofpacketsandignoring
(a) variable flows/hour
1.050 flows/h100 flows/h250 flows/h500 flows/h1000 flows/h
1.0(b) variable sampling rate
1 per 100 1 per 2001 per 5001 per 10001 per 2000
0.8P(correct target)P(correct target)1,000
10,000
100,000
0.60.40.20.010100
0.010
0.20.40.60.81001,00010,000100,000
transmitted packetstransmitted packets
(c) variable attack method
1.0full attackdurationsrate+overlaprate only
1.0(d) variable delay
no delay30 s1 min2 min5 min
0.8P(correct target)P(correct target)1,000
10,000
100,000
0.60.40.20.010100
0.010
0.20.40.60.81001,00010,000100,000
transmitted packetstransmitted packets
Fig.5.Simulationresults:theprobabilityofchoosingthecorrecttarget,asafunctionofthenumberoftransmittedpackets,forvaryingnumbersofflows/hour(default1000),samplingrate(1/2000,except(c)),attackmethod(fullattack),anddelay(0).
packettimes;(ii)rate+overlapattack,whichadditionallyignoresoutputswithobservedpackettimingsdisjointwiththeinput;(iii)lengthattack,whichselectstheoutputywiththehighestratioly/lxy;(iv)fullattack,whichuses(18).
Fig.5(c)showstheeffectivenessofthesefourmethodsinasystemwithasamplingrateof1/100.Thecombinedrateandlengthinformation(18)resultedina50%successratefor≈1000packets(10sampled).Incomparison,takingonlyonefactor(rateorlength)intoaccount,required100timesmorepacketstoachievethesameaccuracy.
Delays.Fig.5(d)showstheeffectsofintroducinganexponentiallydistributedrandomdelaytothesystem.Theeffectivenessofourattackstayedapproximatelythesamefordelaysupto30s,andthenstartedtodeteriorate,reachingthe0%levelfora5mindelay.Note,however,thatourattackexplicitlyassumesnodelaywhatsoever,thereforethisresultdoesnotmeanthata5-minuterandomdelaysafeguardsagainstallsamplingattacks.
6FutureWork
Forsimplicity,weignoredseveralphenomenathatoccurinpractice,suchasdifferentsamplingratesandhowTorcellsaresplitoverIPpackets.Generalizingouranalysistosupportdifferentknownsamplingratesatinputandoutputseemsstraightforward(butanattackbyasingleadversarywithafixedsamplingrateismostlikely).Similarly,theeffectofpacketsplittingbyTornodesseemstobestatisticallyequivalenttodifferentsamplingrates.OuranalysiscouldalsobemodifiedtotakeTCPsequencenumbers,availablefromsFlowrecords,intoaccount,togivemoreaccurateratecalculation.
Asreasonablerandomdelaysdonotprotectagainstourattack,weplantoexamineotherdefenses,suchasamoderateamountofdummytraffic.Wewouldalsoliketomeasuretheeffectivenessofourattackagainstrealsystems,usinganempiricallydeterminedpriordistributionondurationsandrates,forboththeanalysis(numericalintegrationrequired)andtheevaluation.Ideally,suchanevaluationshouldbeperformedfortheentireTorsystem,withitsaverage1millionflowsperhour.
Furthermore,weareconsideringhowintra-networktrafficanalysiscouldbeperformed.Similartechniquescouldbeused,andarelikelytoworkbetterthanwhole-networkanalysissincethenumberofflowswillbesmaller.However,therearecomplicationswhichmustbeconsidered,inparticularthatmultipleflowsbetweenthesamepairofTornodesmaybemultiplexedwithinoneencryptedTLStunnel.Animprovedanalysiswouldtakethispossibilityintoaccountandempiricalstudieswouldshowtowhatextentthisinterfereswithanalysis.
7Conclusion
WehavedemonstratedthatInternetexchangesareaviable,andpreviouslyunexamined,monitoringpointfortrafficanalysispurposes.TheyarepresentonmanypathsthroughoursampleoftheTornetwork,evenwhereBGPdatawouldnotdetectanycommonpointsoffailure.Furthermore,Internetexchangesareparticularlyrelevantasinsomecasestheymayrecord,andpotentiallyretaindataadequatetoperformtrafficanalysis.
Tovalidatetowhatextentthiswastrue,wedevelopedtrafficanalysistech-niqueswhichworkonthesampleddatawhichisbeingcollectedinpracticebyInternetexchanges.UsingaBayesianapproach,weobtainedthebestpossibleinference,whichmeansthatwecannotonlyattackvulnerablesystems,butalsodeclareothersassafeunderourthreatmodel.Ourprobabilityformulaisdifficulttoobtainbytrial-and-error,and–asweshow–cangiveordersofmagnitudebetterresultsthansimpleintuitiveschemes.
Wealsoshowthatexact“internal”packettimingsareirrelevantforoptimuminference,sotiming-basedattackscannotworkwithsparselysampleddata.Forthesamereason,deliberaterandompacketdelaysdonotprotectlow-latencyanonymitysystemsagainstourattack,astheminimumsensiblelatency(1min)isunacceptableforwebbrowsingandsimilaractivities.
AcknowledgmentsWethankRichardClayton,ChrisHall,MarkusKuhn,AndreiSerjantovandtheanonymousreviewersforproductivecomments,andalsotheTornodeoperatorswhocollectedthedatausedinthispaper.
References
1.Danezis,G.,Dingledine,R.,Mathewson,N.:Mixminion:DesignofaTypeIIIAnonymousRemailerProtocol.In:Proceedingsofthe2003IEEESymposiumonSecurityandPrivacy.(2003)2.M¨oller,U.,Cottrell,L.,Palfrader,P.,Sassaman,L.:MixmasterProtocol–Version2.Draft(2003)
3.Dingledine,R.,Mathewson,N.,Syverson,P.:Tor:Thesecond-generationonionrouter.In:Proceedingsofthe13thUSENIXSecuritySymposium.(2004)4.Berthold,O.,Federrath,H.,K¨opsell,S.:WebMIXes:AsystemforanonymousandunobservableInternetaccess.InFederrath,H.,ed.:ProceedingsofDesigningPrivacyEnhancingTechnologies:WorkshoponDesignIssuesinAnonymityandUnobservability,Springer-Verlag,LNCS2009(2000)115–129
5.Boucher,P.,Shostack,A.,Goldberg,I.:Freedomsystems2.0architecture.Whitepaper,ZeroKnowledgeSystems,Inc.(2000)
6.Serjantov,A.,Murdoch,S.J.:Messagesplittingagainstthepartialadversary.In:ProceedingsofPrivacyEnhancingTechnologiesworkshop(PET2005),Springer-Verlag,LNCS3856(2005)
7.Serjantov,A.,Sewell,P.:Passiveattackanalysisforconnection-basedanonymitysystems.In:ProceedingsofESORICS2003.(2003)
8.Levine,B.N.,Reiter,M.K.,Wang,C.,Wright,M.K.:Timingattacksinlow-latencymix-basedsystems.InJuels,A.,ed.:ProceedingsofFinancialCryptography(FC’04),Springer-Verlag,LNCS3110(2004)
9.Danezis,G.:Thetrafficanalysisofcontinuous-timemixes.In:ProceedingsofPrivacyEnhancingTechnologiesworkshop(PET2004).Volume3424ofLNCS.(2004)
10.Dai,W.:Pipenet1.1.PosttoCypherpunksmailinglist(1998)http://www.
eskimo.com/~weidai/pipenet.txt.
11.Øverlier,L.,Syverson,P.:Locatinghiddenservers.In:Proceedingsofthe2006
IEEESymposiumonSecurityandPrivacy,IEEECS(2006)
12.Bauer,K.,McCoy,D.,Grunwald,D.,Kohno,T.,Sicker,D.:Low-resourcerouting
attacksagainstanonymoussystems.TechnicalReportCU-CS-1025-07,UniversityofColoradoatBoulder(2007)
13.Acquisti,A.,Dingledine,R.,Syverson,P.:OntheEconomicsofAnonymity.
InWright,R.N.,ed.:ProceedingsofFinancialCryptography(FC’03),Springer-Verlag,LNCS2742(2003)
14.Feamster,N.,Dingledine,R.:Locationdiversityinanonymitynetworks.In:Pro-ceedingsoftheWorkshoponPrivacyintheElectronicSociety(WPES2004),Washington,DC,USA(2004)
15.Jacobson,V.:traceroute(1)(1987)ftp://ftp.ee.lbl.gov/traceroute.tar.gz.16.TeamCymru:IPtoASNlookup(v1.0)http://asn.cymru.com/.
17.Jacobson,V.,Leres,C.,McCanne,S.:tcpdump(1)(1989)http://www.tcpdump.
org/.
18.Claise,B.:CiscosystemsNetFlowservicesexportversion9.RFC3954,IETF
(2004)
19.Phaal,P.,Panchen,S.,McKee,N.:InMoncorporation’ssFlow:Amethodfor
monitoringtrafficinswitchedandroutednetworks.RFC3176,IETF(2001)
20.Jasinska,E.:sFlow–Icanfeelyourtraffic.In:23C3:23rdChaosCom-municationCongress.(2006)http://events.ccc.de/congress/2006/Fahrplan/attachments/1137-sFlowPaper.pdf.
21.Hughes,M.:LINXnews.http://www.uknof.org.uk/uknof4/Hughes-LINX.pdf
(2006)
22.Clayton,R.:spamHINTSproject(2006)http://www.spamhints.org/.
23.Jaynes,E.T.:ProbabilityTheory:TheLogicofScience.CambridgeUniversity
Press(2003)
AAppendix
Theorem1.Formula(13)attainsmaximumforny≈nx+12.Proof.Stirling’sfactorialapproximationgivesus
nn√n!≈2πn.
e
Denotinga=nx,b=ny,andc=a+b,wehave:P(Tk|y1..n,x)∼
2π(c−1)(c−1)!Γ(a+b)e=≈∼b−1b−12bΓ(b)2b(b−1)!b22π(b−1)e(c−1)2b(b
c−12b−12c−1c−1
−1)
=X.(19)
InsteadoffindingthemaximumofX,itiseasiertofindthemaximumof
logX:
1
logX=(c−12)log(c−1)−blog2−(b−2)log(b−1).
(20)
WecanfindthemaximumoflogXbydifferentiatingitw.r.t.b,andremem-beringthatc=(a+b)=1:
1
c−2b−12(logX)=log(c−1)+−log2−log(b−1)−
c−1b−1
11
−log2−log(b−1)−=log(c−1)+
2(c−1)2(b−1)
1c−12≈log(c−12)−log2−log(b−2)=log2b−1.
(21)
Now,(logX)=0impliesc−
ny=nx+12.
1
2=2b−1,whichimpliesb=a+12,thatis
因篇幅问题不能全部显示,请点此查看更多更全内容