Mohit Kumar
Carnegie Mellon University
Pittsburgh, USA
mohitkum@cs.cmu.edu
Nikesh Garera
Johns Hopkins University
Baltimore, USA
ngarera@cs.jhu.edu
Alexander I. Rudnicky∗
Carnegie Mellon University
Pittsburgh, USA
air@cs.cmu.edu
Abstract
We describe a briefing system that learns to predict the contents of reports generated by users who create periodic (weekly) reports as part of their normal activity. We address the question of whether data derived from the implicit supervision provided by end-users is robust enough to support not only model parameter tuning but also a form of feature discovery. The system was evaluated under realistic conditions, by collecting data in a project-based university course where student group leaders were tasked with preparing weekly reports for the benefit of the instructors, using the material from individual student reports.
1 Introduction
In this paper we describe a personalized learning-based approach to summarization that minimizes the need for learning-expert time and eliminates the need for expert-generated evaluation materials such as a “gold standard” summary, since each user provides their own standard. Of course this comes at a cost, which is the end-user time needed to teach the system how to produce satisfactory summaries. We would however argue that end-user involvement is more likely to generate quality products that reflect both functional needs and user preferences and is indeed worth the effort.
The current paper describes the application of this approach in the context of an engineering project class in which students were expected to produce weekly summaries of their work. While each student produces their own logs, their team leader was additionally tasked with
query-relevant text summary system based on interactive learning. Learning is in the form of query expansion and sentence scoring by classification. [6] have explored interactive multi-document summarization, where the interaction with the user consisted of giving the user control over summary parameters, supporting rapid browsing of the document set, and offering alternative forms of organizing and displaying summaries. Their approach of ‘content selection’ to identify key concepts in unigrams, bigrams and trigrams based on the likelihood ratio [4] is different from our statistical analysis and is of some interest. [13] have proposed a personalized summarization system based on the user’s annotation. They have presented a good case for the usefulness of the user’s annotations in obtaining personalized summaries. However, their system differs from the current one in several respects. Their scenario is a single-document newswire summary and is very different from a briefing. Also, their system is purely statistical and does not include the concept of a human-in-the-loop that improves performance.
[5] describe a summarization system for a recurring weekly report-writing task taking place in a research project. They found that knowledge-engineered features lead to the best performance, although this performance is close to that based on n-gram features. Given this, it would be desirable to have a procedure that leverages human knowledge to identify high-performance features but does not require the participation of experts in the process. We describe an approach to this problem below.
3 Target Domain
We identified a domain that, while similar in its reporting structure to the one studied by [5], differed in some significant respects. Specifically, the report-writers were not experienced researchers who were asked to generate weekly reports but were students taking a course that already had a requirement for weekly report generation. The course was project-based and taught in a university engineering school. The students, who were divided into groups working on different projects, were required to produce a weekly report of their activities, to be submitted to the instructors. Each group had well-defined roles, including that of a leader. Students in the class logged the time spent on the different activities related to the course. Each time-log entry included the following fields: date, category of activity, time spent and details of the activity. The category was selected from a predefined set that included Coding, Group Meeting, Research and others (all previously set by the instructors). The task of the team leader was to prepare a weekly report for the benefit of the instructor, using the time-log entries of the individual team members as raw material. As the students were already using an on-line system to create their logs, it was relatively straightforward to augment this application to allow the creation of leader summaries. The augmented application provided an interface that allowed the leader to more easily prepare a report and was also instrumented to collect data about their behavior. Instrumentation included mouse and keyboard level events (we do not report any analysis of these data in this paper).
Data Collection Process: Following [5], the leader selected items from a display of all items from the student reports. The leader was instructed to go through the items and select a subset for inclusion in the report. Selection was done by highlighting the “important” words/phrases in the items (described to the participant as being those words or phrases that led them to select that particular item for the report). The items with highlighted text automatically became the candidate briefing items.¹ The highlighted words and phrases were subsequently designated as custom user features and were used to train a model of the user’s selection behavior.
Data Collected: We were able to collect a total of 61 complete group-weeks of data. One group-week includes the time logs written by the members of a particular group and the associated extractive summaries. The class consisted of two stages, design and implementation, lasting about 6 and 9 weeks respectively. To provide consistent data for development, testing and analysis of our system, we selected 3 groups from the later stage of the class that produced reports most consistently (these are described further in the Evaluation section below).
4 Learning System
We modeled our learning process on the one described by [5]; that is, models were rebuilt on a weekly basis, using all training data available to that point (i.e., from the previous weeks). This model was then used to predict the user’s selections in the current week. For example, a model built on weeks 1 and 2 was tested on week 3. Then a model built on weeks 1, 2 and 3 was tested on week 4, and so on.
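The weekly rebuild schedule described above can be sketched as an expanding-window split. This is only an illustration: the function name `expanding_window_splits` and the `min_train` parameter are hypothetical, and starting with two training weeks is an assumption taken from the example in the text.

```python
from typing import Iterator, List, Tuple


def expanding_window_splits(
    weeks: List[int], min_train: int = 2
) -> Iterator[Tuple[List[int], int]]:
    """Yield (training_weeks, test_week) pairs.

    Each test week is predicted by a model trained on all earlier weeks.
    min_train=2 is an assumption based on the example in the text
    (weeks 1 and 2 are used to predict week 3).
    """
    ordered = sorted(weeks)
    for i in range(min_train, len(ordered)):
        yield ordered[:i], ordered[i]


# Example: five weeks of data give three train/test rounds.
splits = list(expanding_window_splits([1, 2, 3, 4, 5]))
for train_weeks, test_week in splits:
    print(f"train on weeks {train_weeks}, test on week {test_week}")
```

Because the training set only grows, later weeks are always evaluated with more data than earlier ones.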
Because the vocabulary varied significantly from week to week, we trained models using only those words (features) that were to be found in the raw data for the target week, since it would not be meaningful to train models on non-observed features.
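One possible reading of this vocabulary restriction is sketched below, assuming each item is already tokenized into a list of words. The function name and the toy data are hypothetical, not taken from the system itself.

```python
from typing import List


def restrict_to_target_vocabulary(
    train_items: List[List[str]], target_items: List[List[str]]
) -> List[List[str]]:
    """Keep only training words that also occur in the target week's raw data."""
    target_vocab = {word for item in target_items for word in item}
    return [[w for w in item if w in target_vocab] for item in train_items]


# Toy example (hypothetical data): "lunch" never occurs in the target week,
# so it is dropped from the training items.
train = [["fixed", "parser", "bug"], ["team", "lunch"]]
target = [["parser", "tests"], ["team", "meeting", "bug"]]
filtered = restrict_to_target_vocabulary(train, target)
print(filtered)
```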
The resulting model is used to classify each candidate item as belonging in the summary or not. The confidence assigned to this classification was used to rank order the raw items and the top 5 items were designated as the (predicted) summary for that week. The following sections explain the learning system in more detail.
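The ranking step can be illustrated with a short sketch. It assumes the classifier has already produced one confidence value per item; the name `predict_summary` and the example scores are hypothetical.

```python
from typing import List


def predict_summary(items: List[str], confidences: List[float], k: int = 5) -> List[str]:
    """Return the k items with the highest classifier confidence.

    Ties keep their original order because Python's sort is stable.
    """
    if len(items) != len(confidences):
        raise ValueError("items and confidences must have the same length")
    ranked = sorted(zip(items, confidences), key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in ranked[:k]]


# Hypothetical confidences for seven candidate items.
items = ["a", "b", "c", "d", "e", "f", "g"]
scores = [0.10, 0.90, 0.40, 0.80, 0.30, 0.70, 0.20]
summary = predict_summary(items, scores)
print(summary)
```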
[Table 1. Data statistics for Groups 1, 2 and 3, with row labels TNI, NIS, ANW and ANS. The cell values are not recoverable.]
weighting method: TF (term frequency), TF.IDF, Salton-Buckley (SB) [12]. b) Corpus for measuring IDF: For any word, the inverse document frequency can be obtained by considering either the documents in the training set or the test set or both. Therefore we have three different ways of calculating IDF. c) Normalization scheme for the various scoring functions: no normalization, L1 and L2.
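A minimal sketch of two of these scoring variants (TF and TF.IDF) together with the three normalization options follows. The IDF formula log(N/df), the function names and the toy corpus are assumptions, since the text does not give the exact formulas; SB weighting is not shown.

```python
import math
from collections import Counter
from typing import Dict, List


def inverse_document_frequency(corpus: List[List[str]]) -> Dict[str, float]:
    """IDF over a chosen corpus (training set, test set, or both).

    Assumes the common form log(N / df); the paper does not state its formula.
    """
    n_docs = len(corpus)
    doc_freq = Counter(word for doc in corpus for word in set(doc))
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}


def tf_idf(tokens: List[str], idf: Dict[str, float]) -> Dict[str, float]:
    """Raw term frequency multiplied by IDF; unseen words get weight 0."""
    return {word: tf * idf.get(word, 0.0) for word, tf in Counter(tokens).items()}


def normalize(scores: Dict[str, float], scheme: str = "none") -> Dict[str, float]:
    """Apply no normalization, L1 or L2 normalization to a score vector."""
    if scheme == "none":
        return dict(scores)
    if scheme == "l1":
        norm = sum(abs(v) for v in scores.values())
    elif scheme == "l2":
        norm = math.sqrt(sum(v * v for v in scores.values()))
    else:
        raise ValueError(f"unknown normalization scheme: {scheme}")
    if norm == 0:
        return dict(scores)
    return {word: v / norm for word, v in scores.items()}


# Toy corpora (hypothetical); the IDF corpus here is training plus test items.
train_docs = [["fixed", "parser", "bug"], ["group", "meeting"]]
test_docs = [["parser", "meeting", "notes"]]
idf = inverse_document_frequency(train_docs + test_docs)
weights = normalize(tf_idf(test_docs[0], idf), scheme="l2")
print(weights)
```

Passing a different corpus to `inverse_document_frequency` gives the other two IDF variants listed in b).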
Feature scoring in the first setting of extracting unigram features $F_{Raw}$ is straightforward using the above mentioned IR parameters (TF, TF.IDF or SB). For combining the scores under the second setting with the ‘user-specific’ features we used the following equation:
$$S_f = (1 + \alpha) \cdot S_{f_{base}} \qquad (1)$$

where $\alpha$ is the weight contribution for the user-specific features and $S_{f_{base}}$ is the base score (TF or TF.IDF or SB). We empirically fixed $\alpha$ to 1 for the current study.
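As a small worked illustration of Equation (1), the sketch below boosts the base score of a feature by the factor (1 + α). It assumes that the boost applies only to features the user highlighted and that all other features keep their base score; the text does not state this explicitly, and the function name is hypothetical.

```python
def combined_score(base_score: float, is_user_feature: bool, alpha: float = 1.0) -> float:
    """Equation (1): S_f = (1 + alpha) * S_f_base.

    Assumption: only user-highlighted features are boosted; other features
    keep their base score. alpha = 1 follows the value fixed in the study.
    """
    if is_user_feature:
        return (1.0 + alpha) * base_score
    return base_score


# With alpha = 1, a highlighted feature's base score is doubled.
print(combined_score(0.5, is_user_feature=True))   # boosted score
print(combined_score(0.5, is_user_feature=False))  # unchanged base score
```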
We tested the above mentioned variations of feature description, feature extraction and feature scoring using four learning schemes: Naive Bayes, Voted Perceptron, Support Vector and Logistic Regression. In the event, preliminary testing indicated that Support Vector and Logistic Regression were not suited for the problem at hand and so these were eliminated from further consideration. We used the Weka [11] package for developing the system.
4.2 Evaluation
The base performance metric is Recall, defined in terms of the items recommended by the system compared to the items ultimately selected by the user.⁴ We justify this by noting that Recall can be directly linked to the expected time
² The idea being to capture the user’s preference with respect to particular classes of named entities (NEs), i.e., the user prefers to select an item where a person and an organization are mentioned together.
³ We also experimented with using just the user-specific features in isolation but found these less useful than a combination of all features.
Figure 1. Figure showing the various experiments (Recall plotted against Phase 1 to 3 in each panel). (a) Recall values for the final model for individual users, comparing the Raw N-gram features with User-Combined features (series: User-Combined, Raw N-grams, Baseline). (b) Recall comparison between Information Gain selected features and User selected features (series: IG-Combined, User-Combined, Raw N-grams, Baseline). (c) Recall comparison between individual user training and cross-group training (series: Cross-Group, IG-Combined, User-Combined, Raw N-grams, Baseline).
savings for the eventual users of a prospective summarization system based on the ideas developed in this study. The objective functions that we used for selecting the system model (built on the basis of Recall) are:
1. Weighted mean recall (WMR) of the system across all weeks. The weeks are given linearly increasing weights (normalized), which captures the intuition that the performance in the later weeks is increasingly more important as they have consecutively more training data.
2. Slope of the phase-wise performance curve (Slope): We first calculate the three phase-wise recall performance values (normal average of the recall values) and then compute the slope of the curve for these three points.
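The two objective functions above can be sketched as follows. The weights 1, 2, ..., n divided by their sum are one natural reading of "linearly increasing weights (normalized)", and a least-squares fit is an assumed way to compute the slope of three points. The function names and sample values are hypothetical.

```python
from typing import List


def weighted_mean_recall(weekly_recall: List[float]) -> float:
    """WMR with weights 1, 2, ..., n normalized to sum to 1 (assumed scheme)."""
    n = len(weekly_recall)
    if n == 0:
        raise ValueError("need at least one week of recall values")
    weights = range(1, n + 1)
    return sum(w * r for w, r in zip(weights, weekly_recall)) / sum(weights)


def least_squares_slope(phase_recall: List[float]) -> float:
    """Slope of the best-fit line through (1, r1), (2, r2), ... (assumed method)."""
    n = len(phase_recall)
    if n < 2:
        raise ValueError("need at least two points to compute a slope")
    xs = range(1, n + 1)
    mean_x = sum(xs) / n
    mean_y = sum(phase_recall) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, phase_recall))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


# Hypothetical recall values, not results from the study.
wmr = weighted_mean_recall([0.2, 0.4, 0.6])
slope = least_squares_slope([0.3, 0.4, 0.5])
print(wmr, slope)
```

A positive slope indicates that recall improves from phase to phase, which is the behavior a learning system should show as training data accumulates.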
Note that these metrics are used as a selection criterion only. Results in Figure 1 are stated in terms of the original Recall values averaged over the phase and across the three users. We compare these with the results for the random baseline. The random baseline is calculated by randomly selecting items over a large number (10,000) of runs of the system and determining the mean performance value.⁵
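A minimal sketch of Recall and of a random baseline of this kind is given below. It assumes the baseline draws 5 items per run, matching the size of the predicted summary; the function names, the fixed seed and the toy data are hypothetical.

```python
import random
from typing import List, Set


def recall(recommended: Set[str], selected: Set[str]) -> float:
    """Fraction of the user's selected items that the system recommended."""
    if not selected:
        return 0.0
    return len(recommended & selected) / len(selected)


def random_baseline(
    items: List[str], selected: Set[str], k: int = 5, runs: int = 10_000, seed: int = 0
) -> float:
    """Mean recall of k randomly chosen items over many runs.

    k=5 mirrors the size of the predicted summary (an assumption here);
    the seed only makes the sketch reproducible.
    """
    rng = random.Random(seed)
    size = min(k, len(items))
    total = 0.0
    for _ in range(runs):
        total += recall(set(rng.sample(items, size)), selected)
    return total / runs


# Toy week (hypothetical): 20 candidate items, 4 of which the leader selected.
candidates = [f"item{i}" for i in range(20)]
chosen = set(candidates[:4])
baseline = random_baseline(candidates, chosen)
print(baseline)
```

For uniform random selection the expected recall is k divided by the number of candidate items, so the simulated mean should settle near that ratio.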
5 Experiment and Results
We selected for experimentation the three groups that most consistently generated complete weekly data sets. These groups had 9, 8 and 8 complete weeks of data. Detailed statistics of the data are shown in Table 1. Since the
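[Table: Selection Mechanism by phase, with column labels Information Gain (IG), Single User (SU) and Overlap IG/SU. Recoverable row: Phase 3: 1.6, 17.6, 0.4. The remaining cell values are not recoverable.]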