1 Adaptor: A Compilation System for Data Parallel Fortran Programs
1.2 The Adaptor Compilation System
1.3 Results of Benchmark Codes
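Table 1.1 Problems of the Purdue Set
Column heading: short description of the problem
Problem numbers: 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 15
Sizes: 1048576, 1024x1024, 1024x1024, 524288, 128x4096, 65536, 10x32768, 65536x8, 512x512, 262144, 1023x511, 262144, 262144, 16384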
Three different versions of the programs have been considered:
the Fortran 77 version can be translated for one node and is used to measure real speedups,
the CMF version gives results for the Connection Machine and is used for Adaptor with slight changes to get automatically generated message passing programs for different parallel machines,
the parallel version, Fortran 77 with explicit message passing based on the message passing library PICL, is used to compare the results of Adaptor with a hand-coded message passing program.
1.3.2 Comparison of Sequential and Parallel Version
Table 1.2 shows the results of the sequential Fortran 77 version running on one node of the iPSC/860 compared with the generated message passing program of the hostless programming model. The parallel program also runs on only one node.
The results show that the Adaptor version is faster than its sequential Fortran 77 counterpart for problems 2, 3, 4, 12, 14 and 15. The following reasons are responsible for this effect:
Table 1.2 (row labels: sequential version, ratio)
Problem numbers: 2, 4, 6, 8, 11, 13, 15
Run times: 284.7s, 47.0s, 175.9s, 131.1s, 46.2s, 182.6s, 35.5s, 92.0s, 555.9s, 289.1s, 186.5s, 1008.9s, 185.5s, 33.8s
Ratios: 1.41, 8.34, 1.00, 0.91, 0.84, 4.56, 0.98
Speedup (efficiency) by number of nodes
Problem numbers: 2, 4, 6, 8, 11, 13, 15
8 nodes: 7.96 (99.5%), 6.44 (80.5%), 7.12 (89.0%), 7.97 (99.6%), 7.97 (99.6%), 6.84 (85.5%), 7.20 (90.0%)
16 nodes: 14.84 (92.7%), 15.80 (98.7%), 14.90 (93.1%), 13.92 (87.0%), 8.75 (54.7%), 13.16 (82.2%), 14.08 (88.0%)
32 nodes: 25.20 (78.8%), 16.85 (52.6%), 28.88 (90.2%), 31.17 (97.4%), 27.91 (87.2%), 25.75 (80.5%), 31.25 (97.6%)
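The percentages in parentheses are consistent with the usual definition of parallel efficiency, speedup divided by the number of nodes. The following minimal sketch recomputes a few of them from entries given above; the function name `efficiency` is illustrative and not part of Adaptor. Recomputed values can differ from the printed ones in the last digit, presumably because the printed speedups are themselves rounded.

```python
def efficiency(speedup: float, nodes: int) -> float:
    """Parallel efficiency in percent: speedup divided by the node count."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return 100.0 * speedup / nodes


# (speedup, number of nodes) pairs taken from the entries above
samples = [(7.96, 8), (6.44, 8), (31.17, 32)]

for speedup, nodes in samples:
    print(f"speedup {speedup:.2f} on {nodes} nodes: "
          f"efficiency {efficiency(speedup, nodes):.1f}%")
```

An efficiency close to 100% means that the generated program scales almost linearly with the number of nodes for that problem.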
Table 1.4 Comparison with hand-coded message-passing programs
Problem:           1      2     3     4      7     9     14    15
PICL: hand-coded   18.4s  2.4s  0.8s  14.1s  9.1s  3.0s  1.8s  6.2s
The 14 CMF applications have been compiled with the CM Fortran compiler (SIMD model). The runtimes have been compared with the runtimes of the message passing programs generated automatically by Adaptor (MIMD model). The vector units of the CM-5 nodes have not been utilized.
Table 1.5 shows the results on a CM-5 with 64 nodes. These results are preliminary, as an early version of the CMF compiler has been used and the programs have been slightly modified for the translation with Adaptor. But they show that the execution of data parallel programs in MIMD mode is more favorable than the execution in SIMD mode, which requires more synchronization.
Table 1.5 (column headings: no, MIMD execution)
Run times: 16.8s, 6.2s, 14.6s, 25.8s, 43.4s, 98.7s, 24.8s, 22.8s, 8.4s, 34.2s, 28.1s, 56.0s, 4.9s, 14.1s
Table 1.6 Comparison of hand-written and automatically generated MP programs
Versions compared: hand-written message passing, Adaptor generated version
1 node: 24.9s, 34.0s
16 nodes: 4.8s, 2.6s
The same distribution and parallelization strategy has been used as proposed in a parallelization by hand [8]. Though the data structures of the code must be rearranged for Adaptor and the runtime is higher, the effort for parallelization was much less. Table 1.6 shows the runtimes on a CM-5 for a problem with 63 wavenumbers; the vector units are not used.
The experiences with the IFS code have shown that Adaptor can also deal with coarse-grained parallelism from parallel loops and not only with fine-grained parallelism from array operations. The potential of fine-grained parallelism was very low in the given code.
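Coarse-grained parallelism from a parallel loop can be pictured as each node executing a contiguous block of the loop iterations. The sketch below simulates this in plain Python. It is only an illustration under stated assumptions: the block distribution, the choice of distributing over the 63 wavenumbers, and the helper names `block_bounds` and `run_parallel_loop` are hypothetical and do not describe the code that Adaptor actually generates.

```python
def block_bounds(n, nodes, rank):
    """Half-open range [lo, hi) of loop iterations owned by `rank`
    when n iterations are split into contiguous blocks over `nodes`."""
    base, extra = divmod(n, nodes)
    lo = rank * base + min(rank, extra)      # the first `extra` ranks get one more
    hi = lo + base + (1 if rank < extra else 0)
    return lo, hi


def run_parallel_loop(data, nodes, body):
    """Simulate a parallel loop: every node applies `body` to its own block.
    On a real machine the per-rank work would run concurrently."""
    blocks = []
    for rank in range(nodes):
        lo, hi = block_bounds(len(data), nodes, rank)
        blocks.append([body(x) for x in data[lo:hi]])
    # joining the blocks in rank order restores the global result
    return [y for block in blocks for y in block]


# Assumed example: 63 loop iterations (one per wavenumber) on 16 nodes
sizes = [hi - lo for lo, hi in (block_bounds(63, 16, r) for r in range(16))]
print(sizes)
result = run_parallel_loop(list(range(63)), 16, lambda k: k * k)
```

Because every iteration is independent, no communication is needed inside the loop body in this simplified picture; only the distribution of iterations determines the load balance.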
1.5 Summary
Adaptor is a prototype of a compilation system for High Performance Fortran that has given very useful insights into how far it is possible to translate efficient data parallel programs to efficient message passing programs for MIMD architectures and in which situations hand-written message passing programs can be faster.
With these insights Adaptor itself can and will be used to implement different optimization strategies for High Performance Fortran compilers. Adaptor will also be used to define and test language extensions of interest that could become part of one of the next versions of High Performance Fortran. Features that deal with sparse matrices and parallel I/O are of especially great interest.
Acknowledgements
I thank the Central Institute for Applied Mathematics at the research center in Jülich for providing the iPSC/860 and Renate Knecht for her user support.
The following people have influenced this work by valuable discussions: James Cownie (Meiko, Bristol), Clemens-August Thole (GMD), Rolf Hänisch (GMD), Michael Gerndt (ZAM, Research Center Jülich), John Merlin (University of Southampton), and Dave Watson (NA Software, Liverpool). Many improvements of the current release have been proposed by the Adaptor users.
Many thanks are also due to Falk Zimmermann for his implementation work and for the discussions about proving correctness of the transformations realized within Adaptor.