您的当前位置:首页正文

Sampled traffic analysis by Internetexchange-level adversaries

2022-02-20 来源:客趣旅游网
SampledTrafficAnalysisby

Internet-Exchange-LevelAdversaries

StevenJ.MurdochandPiotrZieli´nski

UniversityofCambridge,ComputerLaboratoryhttp://www.cl.cam.ac.uk/users/{sjm217,pz215}

Abstract.Existinglow-latencyanonymitynetworksarevulnerabletotrafficanalysis,solocationdiversityofnodesisessentialtodefendagainstattacks.Previousworkhasshownthatsimplyensuringgeographicaldi-versityofnodesdoesnotresist,andinsomecasesexacerbates,theriskoftrafficanalysisbyISPs.Ensuringhighautonomous-system(AS)diver-sitycanresistthisweakness.However,ISPscommonlyconnecttomanyotherISPsinasinglelocation,knownasanInterneteXchange(IX).ThispapershowsthatIXesareasinglepointwheretrafficanalysiscanbeperformed.Weexaminetowhatextentthisistrue,throughacasestudyofTornodesintheUK.Also,someIXessamplepacketsflowingthroughthemforperformanceanalysisreasons,andthisdatacouldbeexploitedtode-anonymizetraffic.WethendevelopandevaluateBayesiantrafficanalysistechniquescapableofprocessingthissampleddata.

1Introduction

Anonymitynetworksmaybesplitintotwocategories:highlatency(e.g.Mixmin-ion[1]andMixmaster[2])andlowlatency(e.g.Tor[3],JAP[4]andFree-dom[5]).Highlatencynetworksmaydelaymessagesforseveraldays[6]butaredesignedtoresistverypowerfulattackerswhichareassumedtobecapa-bleofmonitoringallcommunicationlinks,socalledglobalpassiveadversaries.However,thelongpotentialdelaymakesthesesystemsinappropriateforpop-ularactivitiessuchasweb-browsing,wherelow-latencyisrequired.Although,inlow-latencyanonymitynetworks,communicationsareencryptedtomaintainbitwise-unlinkability,timingpatternsarehardlydistorted,allowinganattackertodeploytrafficanalysistode-anonymizeusers[7,8,9].Whiletechniquestoresisttrafficanalysishavebeenproposed,suchaslinkpadding[10],theircostishighandtheyhavenotbeenincorporatedintodeployednetworks.

Instead,thesesystemshavereliedontheassumptionthattheglobalpassiveadversaryisunrealistic,oratleastthosewhoarethetargetofsuchadversarieshavelargerproblemsthananonymousInternetaccess.Butevenexcludingtheglobalpassiveadversary,thepossibilityofpartialadversariesremainsreason-able.TheseattackershavetheabilitytomonitoraportionofInternettrafficbutnottheentirety.Distributedlow-latencyanonymitysystems,suchasTor,aimtoresistthistypeofadversarybydistributingnodes,inthehopethatconnec-tionsthroughthenetworkwillpassthroughenoughadministrativedomainstopreventasingleentityfromtrackingusers.

Thisraisesthequestionofhowtoselectpathsthroughtheanonymitynet-worktomaximizetrafficanalysisresistance.Section2discussesdifferenttopol-ogymodelsoftheInternetandtheirimpactonpathselection.Wesuggestthatexistingmodels,basedonAutonomousSystem(AS)diversity,donotproperlytakeaccountofthefactthatwhile,attheASlevelabstraction,apathmayhavegoodadministrativedomaindiversity,physicallyitcouldrepeatedlypassthroughthesameInterneteXchange(IX).Section3establishes,basedonIn-ternettopologymeasurements,towhatextenttheToranonymitynetworkisvulnerabletotrafficanalysisatIXes.

Section4describeshowIXesareparticularlyrelevantsince,toassistloadmanagement,theyrecordtrafficdatafromthepacketsbeingsentthroughthem.Asaggregatestatisticsarerequiredandthecostofrecordingfulltrafficwouldbeprohibitive,onlysampleddataisstored.Hence,thequalityofdataissubstan-tiallypoorerthanwasenvisagedduringthedesignandevaluationofprevioustrafficanalysistechniques.Section5showsthat,despitelowsamplingrates,thisdataisadequateforde-anonymizingusersoflow-latencyanonymitynetworks.Finally,Section6discussesfurtheravenuesofresearchunderinvestigation.

2LocationDiversityinAnonymityNetworks

Torhasbeenlongsuspected,andlaterconfirmed[11,12],tobevulnerabletoanattackerwhocouldobserveboththeentryandexitpointofaconnectionthroughananonymitynetwork.Asnointentionallatencyisintroduced,timingpatternspropagatethroughthenetworkandmaybeusedtocorrelateinputandoutputtraffic,allowinganattackertotrackconnectionendpoints.

Delayingmessages,asdonewithemailanonymitysystems,wouldimproveresistancetotheseattacks,atleastforasmallnumberofmessages.However,theadditionallatencyhere(hourstodays)would,ifappliedtowebbrowsing,determostusersandsodecreaseanonymityfortheremainder[13].Inadditiontothescarcebandwidthinavolunteernetwork,fulllink-paddingwouldalsointroducecatastrophicdenialofservicevulnerabilities,becauseallpartieswouldneedtostopcommunicatingandre-negotiateflowlevelswhenonepartyleft.Hence,theonlyremainingdefenseagainsttrafficanalysisistoensurethattheadversaryconsideredinthesystemthreatmodelisnotcapableofsimultaneouslymonitoringenoughpointsinthenetworktobreakusers’anonymity.

Whilethisapproachwouldbeofnohelpagainstaglobalpassiveadversary,morerealisticattackers’trafficmonitoringcapabilitiesarelikelytobelimitedtoparticularjurisdiction(s),whethertheyderivefromlegalorextra-legalpowers.Thisintuitivelyleadstotheideathatpathsthroughanonymitynetworksshouldbeselectedtogothroughasmanydifferentcountriesaspossible.Thehopehereisthatanattackerattemptingtotrackconnectionsmighthavetheabilitytomonitortrafficinsomecountries,butnotallthoseonthepath.

Unfortunately,FeamsterandDingledine[14]showedthisapproachcouldactuallyhurtanonymitybecauseinternationalconnectionswerelikelytogothroughoneofaverysmallnumberoftier-1InternetServiceProviders(ISP)–

.usAS1.seIXAS2.cn.au.brFig.1.Multiple-countrypaththroughahypotheticalanonymitynetworkatgeograph-icalandASlevelabstractions.Here,despitethepathtravelingthrough3countriesbetweenBrazil(.br)andtheUS(.us),therearetwotier-1ISPswhichseealllinks.Forexample,thehopthroughChina(.cn)isvulnerablesincetheincomingandoutgoinglinksareobservedbyAS2.Atfirstglance,theSwedish(.se)hopseemssecure,astheincominglinkisseenbyAS2andtheoutgoingbyAS1.However,theSwedishISPconnectstoAS1andAS2atLINX(IX),openinguptheriskofobservationthere.

thosewhooffertransittothefullInternet.Thus,connectionstoandfromafar-flungTornodearelikelytobothpassthroughasingletier-1ISP,negatingtheanonymitybenefitagainstanISPleveladversary.So,while–attheabstractionlevelofdirectconnections–amulti-countrypathmayappeartohavelocationdiversity,bytakingintoaccounttheISPsthatthedatapassesthroughbetweenTornodes,weakpointsbecomeclear,asshowninFig.1.

Instead,FeamsterandDingledinepropose,whenselectingpaths,therela-tionshipbetweenISPscarryingdatabetweenpairsofTornodesistakenintoaccount.TheydidthisbycollectingBorderGatewayProtocol(BGP)data,whichcontrolshowpacketsareroutedbetweenentitiesontheInternet,knownasAutonomousSystems(AS)androughlycorrespondtoISPs.Fromthisdata,assumptionsaboutcommercialrelationshipsbetweenISPs,andheuristicsaboutroutingpatterns,itispossibletoestimatetheASeswhichwillbeoneachpath.

OptimizingpathselectiontomaximizeASdiversityreducesthelikelihoodthattherewillbeoneISPwhocanobservetheconnectionthoughtheanonymitynetworkatenoughpointstode-anonymizetheuser.However,althoughthislevelofabstractionisasubstantialimprovementoverthena¨ıvemodelofdirectnodeconnection,itdoesnotfullytakeinaccountallpotentialmonitoringpoints.Thiswillbeillustratedinthefollowingsection.

3ImpactofInternetExchangesonPhysicalTopology

Intheprevioussection,wediscussedtheadvantagesofselectingpathsthroughanonymitynetworkssuchthattherewasnosingleASwhichcouldmonitorallhopsbetweenanonymitynetworknodes.ThismaybeachievedbyselectingnodesonASeswithhigh-degreei.e.thosewhichareconnectedtomultipleotherASes.ISPsowningsuchASesmightpurchasecableconnectionstomanyotherISPs,

butdoingsowouldbeextremelyexpensive.Instead,ISPsmayconnecttheirnetworktoanIX,whichwillprovideconnectivitytoallotherISPswithapres-enceatthatIX.ThisapproachismoreprevalentinEuropethanintheUS,duetodifferingcommercialstructuresandhistoricaldevelopment;alsobecauseoflanguagedifferences,intra-countrytrafficissubstantial.

Thus,whileattheASlevelitappearsthatthepathmakesmultipletran-sitionsbetweendistinctASes,physically,eachoftheseconnectionsmightpassthroughthesameIX.Hence,despitethepathattaininghighASdiversity,thereremainsoneentitywhoisabletode-anonymizethetraffic.Inordertoestablishhowmuchofaproblemthisisfordeployedanonymitynetworks,wesetouttodeterminehowsuccessfulanIXleveladversarywouldbe,comparedtoanASlevelone,inde-anonymizingTorusers.

ThetechniquesofFeamsterandDingledine[14]relyonbuildingamapofASpathsfromBGPdata,butthisisnothelpfulforourpurposesastheIXesdonotappearatthislevel.FromtheperspectiveofarouterinanIX,packetstraveldirectlytothedestinationAS.Furthermore,theirapproachdependsoninforma-tionaboutISPrelationshipsandroutingpolicieswhichareacarefullyguardedsecretandsomustbeguessed.However,itiscommonpracticetoallocateeachrouterinanIXanIPaddressfromasinglesubnet.

Hence,whiletheASpathofaconnectionwillnotrevealwhetheritisgoingthroughanIX,atraceroute[15]islikelyto.UnlikefindingASpaths,collectingtraceroutedatarequiresaccesstothesystematbothendsofthepath.AsTordoesnotcurrentlyimplementamechanismforperformingtraceroutes,theoperatorofthenodemustdosomanually.Tolimittheefforttoafeasiblelevel,herewetaketheUKasacasestudy.3.1

ExperimentalResults

Basedongeo-locationdatabasesandmanualinvestigation,weidentifiedTornodeshostedintheUKandcontactedtheoperatorstorequestthattheyrunascripttocollectdatatovalidateourhypothesis.Oneofourconstraintswasthatnocustombinaryapplicationscouldbeused,astherecipientcouldnoteasilyconfirmtheywerebenign.Instead,wesimplyinvokedtheOSprovidedtraceroute(oronWindows,tracert).Thesearenotdesignedwithspeedorparallelisminmind,sotokeeptheruntimereasonable(2–24hours,dependingontimeouts)ontheslowerWindowstestmachinesweonlytraced140destinations,andon*nixmachines,tested595destinations.Thesedestinationsconsistedofthesame15websitesand11USconsumerISPstestedin[14]andtheremainderwererandomlyselectedTornodes.

Wereceived19(14*nix,5Windows)responsesfromthe33operatorswewereabletocontact.Thistotaled9025pathswithanaveragepathlengthof14hops(excludingfailedresponses).ForeachhopweestablishedwhetheritwasinoneofthesubnetsofLINX(LondonInterNeteXchange),AMS-IX(AMSterdamInterneteXchange)orDE-CIX(theGermanInternetexchange,inFrankfurt).Also,usingtheTeamCymruDatabase[16],weestablishedtheBGPoriginASforeachIPaddress.NotethatalthoughwearearrangingdatabyAS,thispath

Table1.NumberofpathspassingthroughASesandIXes.

ASname(ASN)Level3(3356)NTL(5089)Zen(13037)JANET(786)Datahop(6908)Tiscali(3257)Sprint(1239)Cogent(174)Telewest(5462)Telia(1299)

Paths1961144512581224996953935871698697

%22%16%14%14%11%11%10%10%8%8%

IXname(subnet)

Paths

%

LINX(195.66.224.0/22)239227%DE-CIX(80.81.192.0/22)2313%AMS-IX(195.69.144.0/22)2022%

isnotthesameastheBGPpathdiscussedin[14].Importantly,whileIXesmayhaveanAS,theydonotbroadcastroutes,andsodonotappearinBGPpaths,whereastracerouteestablishestheIPaddressoftheborderrouters,fromwhichtheIXcanbeinferred.

TheresultsaresummarizedinTable1.Ascanbeseen,Level3,alargetier-1ISPappearsatleastonceon22%ofpathsandothertier-1ISPs,suchasTiscali,Sprint,CogentandTeliaalsoappear.SinceourtestswereallfromUKTornodes,mainlyrunbyvolunteers,consumerISPsalsofeature,suchasNTL,ZenandTelewest,asdoestheUKacademicnetworkoperator,JANET.Finally,Datahop,whoprovideconnectivitybetween10data-centersinLondon,arepresenton11%ofpaths.Thisbroadlymatchestheresultsof[14],inthatasmallnumberofISPsarepresentmanypaths.

However,ifwenowexaminewhetheranIXisonthepath,wefindanewclassofobservationpoints.DespitebeinginvisibleattheBGPlevel,LINXispresenton27%ofpaths.Thereare22distinctASesintheprevioushoptoLINXand109followingtheLINXhop,soAS-diversepathswillnotsubstantiallyimpactLINXcrossings.Hence,exploitingtheIXasanobservationpointisaneffectiveattackagainstbothexistingandproposedanonymitynetworkroutingschemes.TheconnectivitygraphofselectedASes,basedonourdata,isshowninFig.2.

4TrafficAnalysisfromSampledData

TheprevioussectionhasshownhowanadversarypositionedatanIXwouldbecapableofmonitoringasubstantialquantityoftrafficthroughtheTornetwork.Apowerfuladversarywouldbeinapositiontoinstallexpensivehardwaretomountconventionaltrafficanalysisattacksbutsuchanadversarywouldlikelybeabletodeployother,moreeffective,attacks.However,thenetworkinfrastructureprovidedbyanIXmayalreadyhavethetrafficanalysiscapabilitiesthatamoremodestattackercoulduse.

DestinationSprintDatahopZenLevel 3TiscaliCogentTelewestNTLJANETTeliaSourceFig.2.ASconnectivityviaIXgraph.OnlyASesinTable1areshownandallsourcesanddestinationsarecollapsedtosinglenodes.LinksbetweenASeswhichpassthroughLINXareshownassolidlines,AMS-IXisshownbydottedlinesandDE-CIXbydashedlines.PathswhichgothroughnoneoftheseIXesareomitted.Fromthiswecanseethat,inourdata,connectionsthroughSprintandDatahopgofromsourcetodestinationwithoutpassingthroughanyoftheIXeswehaveselected.

Toaidnetworkmanagement,high-endswitchesandroutershavemonitoringfeatureswhich,althoughnotdesignedforthispurpose,maystillbeeffectiveintracingusersofanonymitynetworks.Thissectionwillevaluatethesuitabilityofnetworkmonitoringdatafortrafficanalysis.4.1

TrafficMonitoringinHigh-SpeedNetworks

Onlow-bandwidthsmall-officeorbusinessnetworks,fullpacketanalysistoolssuchastcpdump[17]areadequatetomonitortrafficfordebuggingortomeasureload.However,onlinksfoundonhigh-speednetworks,thecapacityrequiredtostoreallpacketsrapidlybecomesinfeasible.Forexample,attimeofwriting,bothLINXandAMS-IXcarryapproximately150Gb/s,whichexceedsthetheoreticalmaximumcapacityofthehigh-speedPCIebus,64Gb/s(32lanesat2Gb/seach).Despitethesedifficulties,thereishighdemandformonitoringofsuchhigh-speedlinks,todetectproblemssuchasroutingloops,balanceloadacrossnetworkinfrastructureandanticipatefuturedemandsoncapacity.

Theseapplicationsdonotrelyonpacketcontent,andforprivacyreasonsitmaybedesirablenottorecordthisatall.Thus,mediumtohigh-endnetworkingequipmentiscommonlyequippedwiththeabilitytorecordaggregatedataonthetrafficpassingthroughit.OnesuchmechanismisNetFlow[18],developedbyCiscoSystemsbutsupportedbyotherequipmentmanufacturers.NetFlowequippedinfrastructurerecordsunidirectionalflowsasdefinedbyatuple(sourceIP,destinationIP,sourceport,destinationport,IPprotocol,interface,classofservice).Foreachofthese,thedevicewillrecordinformationsuchasthenumberofpackets,totalbytecountandbitwise-orofTCPflags.

Adisadvantageofthisapproachisthatitrequiresthenetworkhardwaretoinspecteverypacketflowingthroughit.Thiscanincursubstantialloadathighernetworkspeeds,sotocounterthisdifficultysampledNetFlowonlyinspectsaproportionqofpackets.WhilesamplingreducesCPUload,thenetworkhard-waremuststillstorestateforeveryflowitconsiderstobelive,whichcouldpotentiallybeverylarge.Analternative,asadoptedbysFlow[19],istomovetheaggregationoutofthenetworkdevicebyimmediatelyexportingsampledpacketheaders.Thisapproachalsogivesaccesstoadditionalfieldsinpacketheaders,suchasthesequencenumber,whichcouldbeusefulfortrafficanaly-sis.However,toensuregenerality,wewillconcentrateoninformationavailableinsampledNetFlowstyledata,whichcouldbeconstructedfromsFlowlogsifneeded(theconverseisnottrue).

Notonlyishigh-speedtrafficmonitoringpossiblewithstandardnetworkingequipment,butitiscommonpracticetodoso.Twoexampleswhichareparticu-larlyrelevanttothispaperarethatAMS-IXrecorddatafortrafficmanagementmonitoring[20]andLINX(whorecord1in2048packets[21])additionallyareconsideringusingsFlowdatafordetectingemailspam[22].ThesamedatacouldalsoassisttrackingusersofananonymitynetworkbecauseSection3.1showedthatasignificantnumberofTorflowspassthroughanIX.Inthefollowingsectionwewillexaminehowsuccessfulthistypeoftrafficanalysiswouldbe.4.2

TrafficAnalysisAssumptions

Therearetwobasictypesoftrafficanalysis.Thefirsttreatstheanonymitynetworkasa“black-box”andonlyinspectstrafficenteringandleavingthenet-work.Thesecondapproachadditionallyexaminesflowswithinthenetwork,andsoimprovestheaccuracyoftheattack.Inthispaper,wewillconcentrateontheformercategory.Asthisdoesnotmakeanyassumptionsaboutthestructureofthenetwork,itisthemoregeneralapproach.However,thetechniqueswepresentherecouldalsobeappliedtothelattercategoryofattacks,asintra-networkTortrafficwillalsooftencrossasmallnumberofInternetexchanges.

WeassumethattheattackedflowpassesthroughanattackercontrolledIXonbothitspathintoandoutoftheanonymitynetwork.Thiswouldbethecaseif,forexample,boththecustomerandsitearehostedonISPswhoseback-boneconnectionwasthroughanIXundersurveillance.Also,weassumethatpacketsamplingisindependentlyandidenticallydistributedovertheflow.Al-thoughsomemodelsofnetworkhardwareimplementperiodicsampling,ratherthanrandom,thisassumptionwillremaintruebecauseTortrafficmakesupaninsignificantproportionofoveralltraffic.

Theattackerobservesasingleflowgoingintothenetworkandwishestoestablishwhichofseveraloutgoingflowsitcorrespondsto.Thiscouldbe,forexample,findingwhichwebsiteaknowncriminalisuploadingstolendatato.Alternatively,theattackermightwishtodiscoverwhohasuploadedaparticularvideotoanewswebsite–nowthereisoneoutgoingflowandmanyincomingcandidates.Inbothcases,theattackerwillhaveanumberofcandidatesinmindwhoarealsogeneratingtrafficatthesametime,andforoursimulationwe

assumethattheseproducearound1000flowsperhour.WealsoassumethattheadversarycandistinguishTortrafficfromothertraffic,whichmaytriviallydonebyIPaddressandportnumber,basedoninformationintheTordirectory.

5

5.1

MathematicalAnalysis

Model

Ourmodelconsistsofnclient-serverflows.Eachflowp=p1,...,pmisacol-lectionofpacketssentattimest1,...,tm.WemodelthetimesasaPoissonprocesswithastarttimes,durationl,andrater(averagepacketspersecond).Thesethreeparametersarechosenindependentlyatrandomforeachflow.

Neithers,l,rnortheflowparedirectlyobservable.Theattackerseesadown-sampledversionofp,inwhicheachpacketisretainedindependentlywithafixedprobabilityq,calledasamplingrate(typicallyabout1/2000).Eachflowissampledattheinputandattheoutput,resultingintwovectorsoftimes:xandy.Givenaflowp,thesamplingprocessesp→xandp→yareindependent:

s,l,r

Poisson

−→px

sampling

←−p

sampling

−→y(1)

Inann-flowsystem,theattackerseesallnoutputvectorsy1,...,yn,andoneinputvectorx,whichcorrespondstosomeyk.ThetaskoftheattackeristocomputetheprobabilityP(Tk)thatxcorrespondstoyk,foreachk.

Tosimplifythemodel,weassumethatnopacketfrompappearssimultane-ouslyinbothxandy.Sincexandyareindependentlysampledfromp,agivenpacketfrompappearsinbothxandywiththeprobabilityofq2=2.5·10−7,thatis,onceevery1/q2=4·106packets(≈2GB).Seeingthesamepacketontheinputandtheoutputisthusveryunlikely,whichpreventspacket-matchingat-tacks[9]andmakesindependentrandomdelaysofindividualpacketspracticallyunobservableinthesampleddata.Forsimplicity,wethereforeassumeinstanta-neouspackettransmission.Section5.5showsthatintroducingamoderatedelaytothesystemdoesnotchangetheeffectivenessofourattack.

Theassumptionofnocommonpacketsinxandyallowsustosimplify(1)byobservingthatxandyarenowindependentPoissonprocesseswithraterq.

x

Poisson

←−s,l,rq

Poisson

−→y(2)

Thissimplificationeliminatestheoriginal(unobservable)flowpfromthemodel.5.2

BasicSolution

LetTkdenotetheeventinwhichinputxandoutputykbelongtothesameflow.Inourmodel,theexactprobabilitiesP(Tk)canbeuniquelydeterminedfromBayes’formula:

P(y1..n|Tk,x)P(Tk|x)

.P(Tk|y1..n,x)=󰀇P(y|T,x)P(T|x)1..niii

(3)

ProbabilitiesP(Tk|x)expressourpriorinformationaboutthetarget,possiblybasedonthesampledinputflowx(butnotoutputflowy).Forexample,wemightknowthataparticularserverkisjustmorepopularthanothers,orthatitistheonlyonetoregularlyreceivehigh-volumetrafficandxlookstobehigh-volume.Forsimplicity,intherestoftheanalysis,wetreatallserversequally;anypriorinformationcanbeeasilytakenintoaccountusing(3).

TheprobabilitiesP(y1..n|x,Tk)in(3)canbecomputedasfollows:

P(y1..n|x,Tk)=P(yk|x,Tk)

󰀊

i=k

P(yk|x,Tk)󰀊

P(yi)=P(yi).

P(yk)i

(4)

Here,weusedthefactthatoutputflowsyiareindependent,andthatP(yi|Tk)=

P(yi):theinformationaboutinput-outputconnectionTkisonlyrelevantforstatementsthatinvolvebothinputsandoutputs(suchasP(yk,x|Tk)).Sinceweareonlyinterestedinrelativeprobabilitiesfordifferentk’s,wecan󰀈

ignoreallfactorsindependentofk,suchasP(x|Tk)=P(x)oriP(yi),astheywouldcanceloutin(3)anyway:

P(yk|x,Tk)P(yk,x|Tk)P(x,yk|Tk)

=∼.

P(yk)P(x|Tk)P(yk)P(yk)

(5)

WethereforeneedtocomputeP(yk)andP(x,yk|Tk).Wearedealingwithasingleflowx←p→yk,so–toavoidnotationalclutter–wewilldroptheexplicitindexkandassumptionTkfromourformulae.Inthenewnotation,wehaveP(y)andP(x,y),whichcanbecomputedfromappropriateconditionalprobabilitiesbyintegratingouttheunknownparameterss,l,r:

󰀋

P(y)=P(y|s,l,r)P(s,l,r).(6)

s,l,r󰀋󰀋

P(x,y)=P(x,y|s,l,r)P(s,l,r)=P(x|s,l,r)P(y|s,l,r)P(s,l,r).(7)P(Tk|y1..n,x)∼P(y1..n|x,Tk)∼

(3)

(4)

s,l,r

s,l,r

Thelastequalityholdsbecausexandy,generatedbymodel(2),areindependentgivens,l,r.ThedistributionP(s,l,r)expressesourpriorknowledgeaboutflowstartingtimes,durations,andrates.

Wedividetheinterval[s,s+l]intoinfinitesimallysmallwindowsofsizedt.SinceyisaPoissonprocess(2),theprobabilityofobservingasinglepacketinonesuchwindowisrqdt.Theprobabilityofnopacketsin[s,s+l]ise−rql.Thus,

󰀆

e−rql(rqdt)nyifalltimesiny∈[s,s+l],

P(y|s,l,r)=(8)

0otherwise.Here,nyisthenumberofpacketsiny.Thesameformula(withnx)holdsfor

P(x|s,l,r).SinceP(x,y|s,l,r)=P(x|s,l,r)P(y|s,l,r),wealsohave

󰀆

e−2rql(rqdt)nx+nyifalltimesinx,y∈[s,s+l],

P(x,y|s,l,r)=(9)

0otherwise.

5.3Long-LivedFlows

Wefirstconsiderasimplifiedmodel,inwhichallflowsstartatthesameknowntimesandhavethesameknowndurationl(basically,[s,s+l]isourobservationwindow).Theonlyfactordistinguishingtheflowsistheir(unknown)rater.From(8),weget:

P(y)=

󰀋

r

P(y|r)P(r)=

󰀋

r

e−rql(rqdt)nyP(r).

(10)

whereP(r)isourpriorinformationabouttherater.Sincerisapositiveparam-eter,weexpressourcompletelackofpriorknowledgebyusingthescale-invariantJeffrey’signorancepriorP(r)∼r−1dr[23].Thisbasicallysaysthatlogrisdis-tributeduniformly:theprobabilityofr∈[a,b]isproportionaltolog(b/a).Forexample,r∈[1,10]andr∈[10,100]havethesameprobability.

dtny

P(y)=(rqdt)eP(r)=(qdt)redr=nyΓ(ny).

lrr=0

(11)󰀉∞a−1−bz

a

Weused0zedz=Γ(a)/b;forintegernwehaveΓ(n)=(n−1)!.

Similarly,from(9),

(10)

ny−rql

ny

ny−1−rql

󰀋󰀋

P(x,y)=

󰀋

r

(rqdt)

nx+ny−2rql

e

dtnx+ny

Γ(nx+ny).P(r)=

(2l)nx+ny

(12)

Wecannowuse(5)tocomputethefinalprobability:

dtnxΓ(nx+nyk)Γ(nx+nyk)P(x,yk|Tk)=·∼.P(Tk|y1..n,x)∼

P(yk)(2l)nx2nykΓ(nyk)2nykΓ(nyk)

(13)

Interpretation.Fig.3(a)showsanormalizedplotof(13)fornx=5asa

functionofny.Themaximumprobabilityisassignedtony≈nx,whenthenumbersofobservedpacketsontheinputandontheoutputaresimilar.Thisconfirmsourintuitionandalsoyieldsquantitativeprobabilitiesfordifferentny’s,whichcanbeusedforcombiningevidencefrommultipleobservations.

Theexactmaximumoccursforny>nxbecausethepriorP(r)∼r−1dr

6

causesP(r∈[4,5])>P(r∈[5,6])(because54>5).Thismakessmallny’smoreprobabletobeproducedbychancethanlargerones,decreasingtheirmatchprobability.UsingStirling’sapproximationofn!,weget(seeappendix):

P(Tk|y1..n,x)∼

(nx+ny−1)nx+ny−22ny(ny−1)ny−211

,(14)

whichverycloselymatchestheoriginal,asshowninFig.3(a).Themaximumof(14),obtainedbycomparingitsderivativetozero,isny≈nx+12.

P( Tk | x, y1..n )maxy051015202530real (13)approx (14)

350−5(18)

−25−20−15−10

−5miny

0

5

10

15

02468ny

101214

(a)P(Tk|x,y1..n)givenby(13)forfixed(b)logP(Tk|x,y1..n)givenby(18)fornx=5andnyrangingfrom0to15.nx=ny=5,minx=0,maxx=10,

andvariableminyandmaxy.Fig.3.Relativeprobabilitiesbasedon(a)observedpacketcountsand(b)lengths.

5.4GeneralFlows

Now,weconsiderthegeneralcase,inwhichflowshavedifferent(unknown)du-rationslandstartingtimess.From(8),wecancomputeP(y|l,r)byintegratingsout.Foragivendurationl,thepossiblestartingtimessbelongtotheinterval[maxy−l,miny].Ifly=maxy−minyistheobservedlengthofy,thenthisintervalofpossiblevaluesofshasthelength(l−ly)0=max{l−ly,0}.Assuminglackofpriorknowledgeabouts(uniformpriorP(s)∼ds),wehave

󰀋

(8)

P(y|l,r)=P(y|s,l,r)P(s)∼(l−ly)0e−rql(rqdt)ny.(15)

s

UsingJeffrey’spriorsP(l)∼l−1dlandP(r)∼r−1dr,weget:P(y)=

󰀋

P(y|l,r)P(l,r)=(l−ly)0e−rql(rqdt)nyl−1r−1drdl=l,rl,r

󰀋󰀋

(qdt)ny(l−ly)0l−1e−rqlrny−1drdl=

lr󰀋

(qdt)ny(l−ly)0l−1Γ(ny)(ql)−nydl=dt

ny

󰀋

Γ(ny)

󰀋

l∞

(l−ly)l

−ny−1

dl=dt

ny

l=ly

lyy

Γ(ny).(16)

ny(ny−1)

−n+1

WecancomputeP(x,y)inasimilarway.Letnxy=nx+nybethetotalnumberofpacketsinxandy,andlxy=max{maxx,maxy}−min{minx,miny}

theobservedlengthofsuperimposedsequencesxandy.Ingeneral,lxy=lx+ly.P(x,y)=

󰀋

l,r

(l−lxy)0e−2rql(rqdt)nxyl−1r−1drdl=

Γ(nxy)dtnxy

xy

2nxy(nxy)(nxy−1)lxy

n−1.

(17)

Ignoringallfactorsindependentofk,(5)givesusthefinalprobabilitynyk(nyk−1)Γ(nxyk)lykkP(x,yk|Tk)

·∼ny·nxy−1.(18)P(Tk|x,y1..n)=

kP(yk)2xkΓ(nyk)nxyk(nxyk−1)lxy

k

ny−1

Interpretation.Formula(18)consistsofthreefactors:(i)therateformula

(13),(ii)arate-dependentcorrectionny(ny−1)/(nxy(nxy−1)),and(iii)the

n−1nxy−1

length-dependentfactorlyy/lxy,whichisofthemostinteresttoushere.

Considermatchinganinputflowwiththeobservedstartingtimeminx=0,endingtimemaxx=10,andnx=5observedpackets,againstoutputflowsywiththesamenumberofobservedpacketsny=5.Forvariousstartingandend-ingtimesminyandmaxy,Fig.3(b)presentsthematchinglikelihoodassignedby(18)(sincenxandnyareconstant,soarethefirsttwofactors).

Asexpected,themaximumisattainedwhentheobservedstartingandendingtimesofbothflowscoincide:minx=miny=0andmaxx=maxy=10.Eachcontourlineconsistsoftwoparallelstraightlinesjoinedbytwocurves.Thetwostraightlinescorrespondtotheobservedinputflowperiodcompletelycontainingtheobservedoutputflowperiod,andviceversa.

Optimality.Thederivationof(18)isstrictlyBayesian,so–giventhemodelassumptions–theresultisexactandusesallrelevantinformation.Notethat,despitethetimingsofallpacketsbeingavailablethroughxandy,formula(18)usesonlythetotalpacketcounts(ny,nxy)andtheobservedlengths(ly,lxy).Thisshowsthattheexacttimingsofindividualpackets(usedbytiming-basedattacks)areirrelevantfortheinferenceinourmodel.5.5

Evaluation

ToevaluatetheeffectivenessofourmethodinattackinganindividualTornode,wefirstcollectedrealtrafficdistributionsofobservedflowratesanddurations(Fig.4).Then,weperformedanumberofsimulationsofa120minexecutionofanode.Flowdurations(1–30min)andrates(0.1–50packets/s)weredrawnfromthelog-uniform(P(z)∼z−1dz)prior,consistentwithFig.4.Startingtimeswereselecteduniformlyfromtheinterval[0,120min−l].

Ourscoringmethodwas“1”ifthehighestprobabilitywasassignedtothecorrecttarget,and“0”otherwise(ifi>1targetssharedthetopprobability,

Rate (packets/s)0.0250

0.100.502.0010.0050.0010020050010002000

Flow duration (s)

thenthescorewas1/iinsteadof1).Foreachsimulation,weappliedtheattackindependentlytoeachinput,andthenaveragedtheresults.

Wevariedthefollowingparameters:thenumberofflowsperhour(50–1000),thesamplingrateq(1/100–1/2000),themeannetworklatency(0–10min),andtheattackmethod.Ourparameterrangesareconsistentwiththeirrealvalues:ourTornodetransmitted479flows/honaverage,theaverageTornetworkla-tencywas0.5s,andthecurrenttypicalsamplingrateis1/2048,butmayincreaseinthefuture.TheresultsofoursimulationsaresummarizedinFig.5.Averagenumberofflows.Fig.5(a)confirmsthatmoreflowsprovidemorepro-tection.Foratypicalnumberof500flows/h,theattackhada50%chanceofsuccesswhenthetargetsends≈20000packets,thatis≈10MBofdata.With50flows/h,thesamesuccessraterequiredonly7000packets(3.5MB).Samplingrate.Fig.5(b)suggeststhattheeffectivenessoftheattackdependsonlyonthenumberofsampledpackets,sodoublingthesamplingrateisequiva-lenttodoublingthenumberoftransmittedpackets.Forthetechnicallyfeasiblesamplingrateof1/100,asuccessrateof50%requiredonly1000transmittedpackets(500kB).

Attackmethods.Wecomparedthefollowingattacks:(i)rateattack,whichap-plies(13),takingintoaccounttheobservednumberofpacketsandignoring

(a) variable flows/hour

1.050 flows/h100 flows/h250 flows/h500 flows/h1000 flows/h

1.0(b) variable sampling rate

1 per 100 1 per 2001 per 5001 per 10001 per 2000

0.8P(correct target)P(correct target)1,000

10,000

100,000

0.60.40.20.010100

0.010

0.20.40.60.81001,00010,000100,000

transmitted packetstransmitted packets

(c) variable attack method

1.0full attackdurationsrate+overlaprate only

1.0(d) variable delay

no delay30 s1 min2 min5 min

0.8P(correct target)P(correct target)1,000

10,000

100,000

0.60.40.20.010100

0.010

0.20.40.60.81001,00010,000100,000

transmitted packetstransmitted packets

Fig.5.Simulationresults:theprobabilityofchoosingthecorrecttarget,asafunctionofthenumberoftransmittedpackets,forvaryingnumbersofflows/hour(default1000),samplingrate(1/2000,except(c)),attackmethod(fullattack),anddelay(0).

packettimes;(ii)rate+overlapattack,whichadditionallyignoresoutputswithobservedpackettimingsdisjointwiththeinput;(iii)lengthattack,whichselectstheoutputywiththehighestratioly/lxy;(iv)fullattack,whichuses(18).

Fig.5(c)showstheeffectivenessofthesefourmethodsinasystemwithasamplingrateof1/100.Thecombinedrateandlengthinformation(18)resultedina50%successratefor≈1000packets(10sampled).Incomparison,takingonlyonefactor(rateorlength)intoaccount,required100timesmorepacketstoachievethesameaccuracy.

Delays.Fig.5(d)showstheeffectsofintroducinganexponentiallydistributedrandomdelaytothesystem.Theeffectivenessofourattackstayedapproximatelythesamefordelaysupto30s,andthenstartedtodeteriorate,reachingthe0%levelfora5mindelay.Note,however,thatourattackexplicitlyassumesnodelaywhatsoever,thereforethisresultdoesnotmeanthata5-minuterandomdelaysafeguardsagainstallsamplingattacks.

6FutureWork

Forsimplicity,weignoredseveralphenomenathatoccurinpractice,suchasdifferentsamplingratesandhowTorcellsaresplitoverIPpackets.Generalizingouranalysistosupportdifferentknownsamplingratesatinputandoutputseemsstraightforward(butanattackbyasingleadversarywithafixedsamplingrateismostlikely).Similarly,theeffectofpacketsplittingbyTornodesseemstobestatisticallyequivalenttodifferentsamplingrates.OuranalysiscouldalsobemodifiedtotakeTCPsequencenumbers,availablefromsFlowrecords,intoaccount,togivemoreaccurateratecalculation.

Asreasonablerandomdelaysdonotprotectagainstourattack,weplantoexamineotherdefenses,suchasamoderateamountofdummytraffic.Wewouldalsoliketomeasuretheeffectivenessofourattackagainstrealsystems,usinganempiricallydeterminedpriordistributionondurationsandrates,forboththeanalysis(numericalintegrationrequired)andtheevaluation.Ideally,suchanevaluationshouldbeperformedfortheentireTorsystem,withitsaverage1millionflowsperhour.

Furthermore,weareconsideringhowintra-networktrafficanalysiscouldbeperformed.Similartechniquescouldbeused,andarelikelytoworkbetterthanwhole-networkanalysissincethenumberofflowswillbesmaller.However,therearecomplicationswhichmustbeconsidered,inparticularthatmultipleflowsbetweenthesamepairofTornodesmaybemultiplexedwithinoneencryptedTLStunnel.Animprovedanalysiswouldtakethispossibilityintoaccountandempiricalstudieswouldshowtowhatextentthisinterfereswithanalysis.

7Conclusion

WehavedemonstratedthatInternetexchangesareaviable,andpreviouslyunexamined,monitoringpointfortrafficanalysispurposes.TheyarepresentonmanypathsthroughoursampleoftheTornetwork,evenwhereBGPdatawouldnotdetectanycommonpointsoffailure.Furthermore,Internetexchangesareparticularlyrelevantasinsomecasestheymayrecord,andpotentiallyretaindataadequatetoperformtrafficanalysis.

Tovalidatetowhatextentthiswastrue,wedevelopedtrafficanalysistech-niqueswhichworkonthesampleddatawhichisbeingcollectedinpracticebyInternetexchanges.UsingaBayesianapproach,weobtainedthebestpossibleinference,whichmeansthatwecannotonlyattackvulnerablesystems,butalsodeclareothersassafeunderourthreatmodel.Ourprobabilityformulaisdifficulttoobtainbytrial-and-error,and–asweshow–cangiveordersofmagnitudebetterresultsthansimpleintuitiveschemes.

Wealsoshowthatexact“internal”packettimingsareirrelevantforoptimuminference,sotiming-basedattackscannotworkwithsparselysampleddata.Forthesamereason,deliberaterandompacketdelaysdonotprotectlow-latencyanonymitysystemsagainstourattack,astheminimumsensiblelatency(1min)isunacceptableforwebbrowsingandsimilaractivities.

AcknowledgmentsWethankRichardClayton,ChrisHall,MarkusKuhn,AndreiSerjantovandtheanonymousreviewersforproductivecomments,andalsotheTornodeoperatorswhocollectedthedatausedinthispaper.

References

1.Danezis,G.,Dingledine,R.,Mathewson,N.:Mixminion:DesignofaTypeIIIAnonymousRemailerProtocol.In:Proceedingsofthe2003IEEESymposiumonSecurityandPrivacy.(2003)2.M¨oller,U.,Cottrell,L.,Palfrader,P.,Sassaman,L.:MixmasterProtocol–Version2.Draft(2003)

3.Dingledine,R.,Mathewson,N.,Syverson,P.:Tor:Thesecond-generationonionrouter.In:Proceedingsofthe13thUSENIXSecuritySymposium.(2004)4.Berthold,O.,Federrath,H.,K¨opsell,S.:WebMIXes:AsystemforanonymousandunobservableInternetaccess.InFederrath,H.,ed.:ProceedingsofDesigningPrivacyEnhancingTechnologies:WorkshoponDesignIssuesinAnonymityandUnobservability,Springer-Verlag,LNCS2009(2000)115–129

5.Boucher,P.,Shostack,A.,Goldberg,I.:Freedomsystems2.0architecture.Whitepaper,ZeroKnowledgeSystems,Inc.(2000)

6.Serjantov,A.,Murdoch,S.J.:Messagesplittingagainstthepartialadversary.In:ProceedingsofPrivacyEnhancingTechnologiesworkshop(PET2005),Springer-Verlag,LNCS3856(2005)

7.Serjantov,A.,Sewell,P.:Passiveattackanalysisforconnection-basedanonymitysystems.In:ProceedingsofESORICS2003.(2003)

8.Levine,B.N.,Reiter,M.K.,Wang,C.,Wright,M.K.:Timingattacksinlow-latencymix-basedsystems.InJuels,A.,ed.:ProceedingsofFinancialCryptography(FC’04),Springer-Verlag,LNCS3110(2004)

9.Danezis,G.:Thetrafficanalysisofcontinuous-timemixes.In:ProceedingsofPrivacyEnhancingTechnologiesworkshop(PET2004).Volume3424ofLNCS.(2004)

10.Dai,W.:Pipenet1.1.PosttoCypherpunksmailinglist(1998)http://www.

eskimo.com/~weidai/pipenet.txt.

11.Øverlier,L.,Syverson,P.:Locatinghiddenservers.In:Proceedingsofthe2006

IEEESymposiumonSecurityandPrivacy,IEEECS(2006)

12.Bauer,K.,McCoy,D.,Grunwald,D.,Kohno,T.,Sicker,D.:Low-resourcerouting

attacksagainstanonymoussystems.TechnicalReportCU-CS-1025-07,UniversityofColoradoatBoulder(2007)

13.Acquisti,A.,Dingledine,R.,Syverson,P.:OntheEconomicsofAnonymity.

InWright,R.N.,ed.:ProceedingsofFinancialCryptography(FC’03),Springer-Verlag,LNCS2742(2003)

14.Feamster,N.,Dingledine,R.:Locationdiversityinanonymitynetworks.In:Pro-ceedingsoftheWorkshoponPrivacyintheElectronicSociety(WPES2004),Washington,DC,USA(2004)

15.Jacobson,V.:traceroute(1)(1987)ftp://ftp.ee.lbl.gov/traceroute.tar.gz.16.TeamCymru:IPtoASNlookup(v1.0)http://asn.cymru.com/.

17.Jacobson,V.,Leres,C.,McCanne,S.:tcpdump(1)(1989)http://www.tcpdump.

org/.

18.Claise,B.:CiscosystemsNetFlowservicesexportversion9.RFC3954,IETF

(2004)

19.Phaal,P.,Panchen,S.,McKee,N.:InMoncorporation’ssFlow:Amethodfor

monitoringtrafficinswitchedandroutednetworks.RFC3176,IETF(2001)

20.Jasinska,E.:sFlow–Icanfeelyourtraffic.In:23C3:23rdChaosCom-municationCongress.(2006)http://events.ccc.de/congress/2006/Fahrplan/attachments/1137-sFlowPaper.pdf.

21.Hughes,M.:LINXnews.http://www.uknof.org.uk/uknof4/Hughes-LINX.pdf

(2006)

22.Clayton,R.:spamHINTSproject(2006)http://www.spamhints.org/.

23.Jaynes,E.T.:ProbabilityTheory:TheLogicofScience.CambridgeUniversity

Press(2003)

AAppendix

Theorem1.Formula(13)attainsmaximumforny≈nx+12.Proof.Stirling’sfactorialapproximationgivesus

󰀂n󰀃n√n!≈2πn.

e

Denotinga=nx,b=ny,andc=a+b,wehave:P(Tk|y1..n,x)∼

2π(c−1)(c−1)!Γ(a+b)e=≈∼󰀍b−1󰀁b−1󰀌2bΓ(b)2b(b−1)!b22π(b−1)e(c−1)2b(b

c−12b−12󰀍c−1󰀁c−1󰀌

−1)

=X.(19)

InsteadoffindingthemaximumofX,itiseasiertofindthemaximumof

logX:

1

logX=(c−12)log(c−1)−blog2−(b−2)log(b−1).

(20)

WecanfindthemaximumoflogXbydifferentiatingitw.r.t.b,andremem-beringthatc󰀁=(a+b)󰀁=1:

1

c−2b−12(logX)=log(c−1)+−log2−log(b−1)−

c−1b−1

11

−log2−log(b−1)−=log(c−1)+

2(c−1)2(b−1)

󰀄1󰀅c−12≈log(c−12)−log2−log(b−2)=log2b−1.

󰀁

(21)

Now,(logX)󰀁=0impliesc−

ny=nx+12.

1

2=2b−1,whichimpliesb=a+12,thatis

因篇幅问题不能全部显示,请点此查看更多更全内容