
1 Adaptor: A Compilation System for Data Parallel Fortran Programs


1.2 The Adaptor Compilation System


1.3 Results of Benchmark Codes


Table 1.1 Problems of the Purdue Set

Problem   Problem size
1         1048576
2         1024 x 1024
3         1024 x 1024
4         524288
5         128 x 4096
6         65536
7         10 x 32768
8         65536 x 8
9         512 x 512
11        262144
12        1023 x 511
13        262144
14        262144
15        16384

Three different versions of the programs have been considered:

- the Fortran 77 version, which can be translated for one node and is used to measure real speedups;

- the CMF version, which gives results for the Connection Machine and is used, with slight changes, as input to Adaptor to generate message passing programs automatically for different parallel machines;

- the parallel version, Fortran 77 with explicit message passing based on the message passing library PICL, which is used to compare the results of Adaptor with a hand-coded message passing program.
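The difference between measuring against the sequential program and measuring against the parallel program can be made explicit. Writing T_F77 for the runtime of the sequential Fortran 77 version on one node, T_1 for the generated message passing program on one node, and T_p for the same program on p nodes (symbols introduced here for illustration, not taken from the paper), a real speedup uses the sequential program as its baseline, while a relative speedup uses the parallel program itself:

```latex
S_p^{\mathrm{real}} = \frac{T_{\mathrm{F77}}}{T_p}, \qquad
S_p^{\mathrm{rel}}  = \frac{T_1}{T_p}, \qquad
S_p^{\mathrm{real}} = S_p^{\mathrm{rel}} \cdot \frac{T_{\mathrm{F77}}}{T_1}
```

The last identity shows why the one-node comparison matters: any single-node overhead of the generated code (T_1 > T_F77) lowers the real speedup by exactly that factor, while a generated program that is faster than the sequential one on a single node raises it.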

1.3.2 Comparison of Sequential and Parallel Versions

Table 1.2 shows the results of the sequential Fortran 77 version running on one node of the iPSC/860, compared with the message passing program generated for the hostless programming model. The parallel program also runs on only one node.

The results show that the Adaptor version is faster than its sequential Fortran 77 counterpart for problems 2, 3, 4, 12, 14 and 15. The following reasons are responsible for this effect:


sequential version
284.7 s  2  47.0 s  4  175.9 s  6  131.1 s  8  46.2 s  11  182.6 s  13  35.5 s  15  92.0 s
555.9 s  1.41  289.1 s  8.34  186.5 s  1.00  1008.9 s  0.91  185.5 s  0.84  33.8 s  4.56
ratio  0.98


problems 2 4 6 8 11 13 15

8 nodes:  7.96 (99.5%)  6.44 (80.5%)  7.12 (89.0%)  7.97 (99.6%)  7.97 (99.6%)  6.84 (85.5%)  7.20 (90.0%)
16 nodes: 14.84 (92.7%)  15.80 (98.7%)  14.90 (93.1%)  13.92 (87.0%)  8.75 (54.7%)  13.16 (82.2%)  14.08 (88.0%)
32 nodes: 25.20 (78.8%)  16.85 (52.6%)  28.88 (90.2%)  31.17 (97.4%)  27.91 (87.2%)  25.75 (80.5%)  31.25 (97.6%)
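The percentages in parentheses are consistent with parallel efficiency, E_p = S_p / p, the speedup divided by the number of nodes; for example, 7.96 on 8 nodes gives 99.5%. The short Python sketch below checks a few of the entries under that assumption. The helper name parallel_efficiency is hypothetical, and agreement is only to within 0.1 percentage points, since the printed values are not all rounded the same way.

```python
def parallel_efficiency(speedup: float, nodes: int) -> float:
    """Parallel efficiency in percent, E_p = S_p / p * 100 (assumed definition)."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return speedup / nodes * 100.0


# (speedup, nodes, efficiency printed in the table) for a few entries above
TABLE_ENTRIES = [
    (7.96, 8, 99.5),
    (14.90, 16, 93.1),
    (8.75, 16, 54.7),
    (31.17, 32, 97.4),
    (16.85, 32, 52.6),
]

for speedup, nodes, printed in TABLE_ENTRIES:
    computed = parallel_efficiency(speedup, nodes)
    # The printed percentage should match speedup / nodes within 0.1 points.
    assert abs(computed - printed) < 0.1
    print(f"{nodes:2d} nodes: speedup {speedup:5.2f} -> {computed:.2f}% (table: {printed}%)")
```

An efficiency well below 100%, such as 8.75 on 16 nodes, signals that communication or load imbalance dominates for that configuration, even when the same program scales well on 8 nodes.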


Table 1.4 Comparison with hand-coded message-passing programs

Problem   PICL: hand-coded
1         18.4 s
2         2.4 s
3         0.8 s
4         14.1 s
7         9.1 s
9         3.0 s
14        1.8 s
15        6.2 s

The 14 CMF applications have been compiled with the CM Fortran compiler (SIMD model). Their runtimes have been compared with those of the message passing programs generated automatically by Adaptor (MIMD model). The vector units of the CM-5 nodes have not been used.

Table 1.5 shows the results on a CM-5 with 64 nodes. These results are preliminary, as an early version of the CMF compiler has been used and the programs have been slightly modified for the translation with Adaptor. Nevertheless, they show that executing data parallel programs in MIMD mode is more favorable than executing them in SIMD mode, which requires more synchronization.

no
16.8 s  6.2 s  14.6 s  25.8 s  43.4 s  98.7 s  24.8 s  22.8 s  8.4 s  34.2 s  28.1 s  56.0 s  4.9 s  14.1 s
MIMD execution


Table 1.6 Comparison of hand-written and automatically generated MP programs

hand-written message passing / Adaptor generated version
1 node:    24.9 s  34.0 s
16 nodes:  4.8 s   2.6 s

The same distribution and parallelization strategy has been used as in a parallelization by hand [8]. Although the data structures of the code must be rearranged for Adaptor and the runtime is higher, the parallelization effort was much lower. Table 1.6 shows the runtimes on a CM-5 for a problem with 63 wave numbers; the vector units are not used.

The experience with the IFS code has shown that Adaptor can deal not only with fine-grained parallelism from array operations but also with coarse-grained parallelism from parallel loops. The potential for fine-grained parallelism in the given code was very low.
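One common way to exploit coarse-grained parallelism in an independent loop is a block distribution: each node executes one contiguous chunk of the iteration space, and the partial results are combined once at the end instead of after every array operation. The Python sketch below simulates this sequentially. It is a generic illustration under that assumption, not Adaptor's actual scheme, and block_range and coarse_grained_sum are hypothetical names.

```python
def block_range(n: int, nprocs: int, rank: int) -> range:
    """Iterations [lo, hi) assigned to `rank` under a block distribution of
    n iterations over nprocs nodes; the first n % nprocs nodes get one extra."""
    if not 0 <= rank < nprocs:
        raise ValueError("rank out of range")
    base, extra = divmod(n, nprocs)
    lo = rank * base + min(rank, extra)
    hi = lo + base + (1 if rank < extra else 0)
    return range(lo, hi)


def coarse_grained_sum(values, nprocs: int):
    """Simulate an independent loop split into one chunk per node: each node
    reduces its own block locally, then the partial results are combined once
    (the single step that would need communication on a real machine)."""
    partial = [sum(values[i] for i in block_range(len(values), nprocs, r))
               for r in range(nprocs)]
    return sum(partial)


print([list(block_range(10, 4, r)) for r in range(4)])
# [[0, 1, 2], [3, 4, 5], [6, 7], [8, 9]]
print(coarse_grained_sum(list(range(1, 101)), 4))  # 5050
```

For 10 iterations on 4 nodes the chunk sizes differ by at most one iteration, which keeps the load balanced while needing only one combining step per loop.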

1.5 Summary

Adaptor is a prototype of a compilation system for High Performance Fortran that has given very useful insights into how far it is possible to translate efficient data parallel programs into efficient message passing programs for MIMD architectures, and in which situations hand-written message passing programs can be faster.

With these insights, Adaptor itself can and will be used to implement different optimization strategies for High Performance Fortran compilers. Adaptor will also be used to define and test language extensions of interest that could become part of one of the next versions of High Performance Fortran. Features that deal with sparse matrices and parallel I/O are of particular interest.

Acknowledgements

I thank the Central Institute for Applied Mathematics at the research center in Jülich for providing the iPSC/860, and Renate Knecht for her user support.

The following people have influenced this work through valuable discussions: James Cownie (Meiko, Bristol), Clemens-August Thole (GMD), Rolf Hänisch (GMD), Michael Gerndt (ZAM, Research Center Jülich), John Merlin (University of Southampton), and Dave Watson (NA Software, Liverpool). Many improvements in the current release have been proposed by Adaptor users.

Many thanks are also due to Falk Zimmermann for his implementation work and for the discussions about proving the correctness of the transformations realized within Adaptor.
