Profiling S3D on Cray XT3

  • Slides: 26

Profiling S3D on Cray XT3 using TAU
Sameer Shende
tau-team@cs.uoregon.edu

Acknowledgements
  • Alan Morris [UO]
  • Kevin Huck [UO]
  • Allen D. Malony [UO]
  • Kenneth Roche [ORNL]
  • Bronis R. de Supinski [LLNL]
Profiling S3D Harness TAU Performance System

TAU Parallel Performance System
  • http://www.cs.uoregon.edu/research/tau/
  • Multi-level performance instrumentation
      - Multi-language automatic source instrumentation
  • Flexible and configurable performance measurement
  • Widely-ported parallel performance profiling system
      - Computer system architectures and operating systems
      - Different programming languages and compilers
  • Support for multiple parallel programming paradigms
      - Multi-threading, message passing, mixed-mode, hybrid

TAU Performance System Architecture
[Figure: architecture diagram; legible label: event selection]

TAU Performance System Architecture

Program Database Toolkit (PDT)
[Figure: PDT data-flow diagram; recoverable labels listed below]
  • Application / Library
      - C / C++ parser -> IL -> C / C++ IL analyzer
      - Fortran parser (F77/90/95) -> IL -> Fortran IL analyzer
  • Program Database Files
  • DUCTAPE
      - PDBhtml: program documentation
      - SILOON: application component glue
      - CHASM: C++ / F90/95 interoperability
      - TAU_instr: automatic source instrumentation

PAPI
  • Performance Application Programming Interface
      - The purpose of the PAPI project is to design, standardize and implement a portable and efficient API to access the hardware performance monitor counters found on most modern microprocessors.
  • Parallel Tools Consortium project
  • Developed by University of Tennessee, Knoxville
  • http://icl.cs.utk.edu/papi/

S3D - Building with TAU
  • Change name of compiler in build/make.XT3
      - ftn => tau_f90.sh
      - cc => tau_cc.sh
  • Set compile time environment variables
      - setenv TAU_MAKEFILE /spin/proj/perc/TOOLS/tau_latest/xt3/lib/Makefile.tau-callpath-multiplecounters-mpi-papi-pdt-pgi
          * Choose callpath, PAPI counters, MPI profiling, PDT for source instrumentation
      - setenv TAU_OPTIONS '-optTauSelectFile=select.tau -optPreProcess'
          * Selective instrumentation file eliminates instrumentation in lightweight routines
          * Pre-process Fortran source code using cpp before compiling
  • Set runtime environment variables for instrumentation control and PAPI event counter selection in job submission script:
      - export TAU_THROTTLE=1
      - export COUNTER1=GET_TIME_OF_DAY
      - export COUNTER2=PAPI_FP_INS
      - export COUNTER3=PAPI_L1_DCM
      - export COUNTER4=PAPI_RES_STL
      - export COUNTER5=PAPI_L2_DCM
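The runtime counter variables on this slide follow a simple numbering pattern, so the export lines for a job script can be derived from an ordered list of metric names. The following is a minimal sketch under that assumption only; `counter_env` and `export_lines` are hypothetical helper names, not part of TAU.

```python
# Hypothetical helpers (not part of TAU): derive the COUNTER<n> variables
# shown on the slide from an ordered list of metric names.

# Metric names taken from the slide, in slide order.
METRICS = [
    "GET_TIME_OF_DAY",
    "PAPI_FP_INS",
    "PAPI_L1_DCM",
    "PAPI_RES_STL",
    "PAPI_L2_DCM",
]


def counter_env(metrics, throttle=True):
    """Return a dict of environment variables for a job submission script."""
    env = {}
    if throttle:
        # The slide sets TAU_THROTTLE=1 for instrumentation control.
        env["TAU_THROTTLE"] = "1"
    # Counters are numbered from 1 in the order given.
    for index, name in enumerate(metrics, start=1):
        env[f"COUNTER{index}"] = name
    return env


def export_lines(env):
    """Render the variables as sh-style export statements."""
    return [f"export {key}={value}" for key, value in env.items()]


if __name__ == "__main__":
    print("\n".join(export_lines(counter_env(METRICS))))
```

Keeping the metric list in one place makes it harder to skip or duplicate a counter number when the selection changes between runs.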

Selective Instrumentation in TAU
    % cat select.tau
    BEGIN_EXCLUDE_LIST
    MCADIF
    GETRATES
    TRANSPORT_M::MCAVIS_NEW
    MCEDIF
    MCACON
    CKYTCP
    THERMCHEM_M::MIXENTH
    THERMCHEM_M::GIBBSENRG_ALL_DIMT
    CKRHOY
    MCEVAL4
    THERMCHEM_M::HIS
    THERMCHEM_M::CPS
    THERMCHEM_M::ENTROPY
    END_EXCLUDE_LIST
    BEGIN_INSTRUMENT_SECTION
    loops routine="#"
    END_INSTRUMENT_SECTION
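The file layout on this slide is regular enough to generate from a list of routine names, for example after a first profiling run has identified the lightweight routines. The sketch below reproduces only the two sections shown on the slide; `selective_instrumentation` is a hypothetical helper, and the `"#"` default simply mirrors the slide's `loops routine="#"` line.

```python
# Hypothetical helper (not part of TAU): build the text of a selective
# instrumentation file with the two sections shown on the slide.

def selective_instrumentation(exclude, loop_routines=("#",)):
    """Return file text: an exclude list plus loop-level instrumentation."""
    lines = ["BEGIN_EXCLUDE_LIST"]
    lines.extend(exclude)  # one routine name per line
    lines.append("END_EXCLUDE_LIST")
    lines.append("BEGIN_INSTRUMENT_SECTION")
    for pattern in loop_routines:
        # Same directive form as on the slide: loops routine="<pattern>"
        lines.append(f'loops routine="{pattern}"')
    lines.append("END_INSTRUMENT_SECTION")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # A few of the routine names from the slide, used as sample input.
    text = selective_instrumentation(
        ["MCADIF", "GETRATES", "TRANSPORT_M::MCAVIS_NEW"]
    )
    with open("select.tau", "w") as handle:
        handle.write(text)
```

The resulting file name matches the one passed through `-optTauSelectFile=select.tau` in the build settings.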

TAU's ParaProf Profile Browser - Manager
Derived Metrics
Flops = PAPI_FP_INS/wallclock time
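The derived metric on this slide is a plain ratio of two measured values. As a worked sketch, the functions below compute it outside the browser; they assume the wallclock time is given in microseconds (an assumption about the input units, stated here rather than taken from the slide), and the function names are hypothetical.

```python
# Hypothetical helpers: compute Flops = PAPI_FP_INS / wallclock time.
# Assumption: wallclock time is supplied in microseconds.

def mflops(fp_ins, wallclock_usec):
    """Millions of floating-point instructions per second.

    Instructions per microsecond is numerically equal to MFLOPS.
    """
    if wallclock_usec <= 0:
        raise ValueError("wallclock time must be positive")
    return fp_ins / wallclock_usec


def flops(fp_ins, wallclock_usec):
    """Floating-point instructions per second."""
    return mflops(fp_ins, wallclock_usec) * 1e6


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    print(mflops(3_000_000, 2_000_000))
```

Rejecting non-positive times avoids reporting a meaningless rate for routines that were never timed.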

Main Window - 8 cpus (MPI Ranks 0-7)
Some routines execute on different sets of processors

Mean Profile Over 8 cpus -- Exclusive Time

Mean Percentage -- Exclusive Time
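The mean and mean-percentage views can be illustrated with a small calculation over per-rank exclusive times. This is a sketch of one plausible definition, not a description of how ParaProf computes its mean: it assumes a routine that does not run on a rank contributes zero on that rank, and that percentages are taken relative to the sum of the mean exclusive times. The function names and sample routine names are hypothetical.

```python
# Hypothetical sketch: mean exclusive time per routine across MPI ranks,
# and each routine's share of the total.
# Assumption: a routine absent on a rank counts as 0 time on that rank.

def mean_exclusive(per_rank):
    """per_rank: list of dicts mapping routine name -> exclusive time."""
    routines = set().union(*per_rank)  # every routine seen on any rank
    nranks = len(per_rank)
    return {
        name: sum(profile.get(name, 0.0) for profile in per_rank) / nranks
        for name in routines
    }


def mean_percentage(mean):
    """Each routine's mean exclusive time as a percentage of the total."""
    total = sum(mean.values())
    return {name: 100.0 * value / total for name, value in mean.items()}


if __name__ == "__main__":
    # Made-up sample data for two ranks; routine_b runs only on rank 0.
    ranks = [
        {"routine_a": 10.0, "routine_b": 30.0},
        {"routine_a": 30.0},
    ]
    mean = mean_exclusive(ranks)
    print(mean, mean_percentage(mean))
```

Because some routines execute only on a subset of ranks, the choice of denominator matters: dividing by the number of ranks that actually ran the routine would give a different mean.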

Loop Level Profile With PAPI Counter Data

ParaProf's Source Browser

Exclusive MFLOPS

FP Instructions per L1 Data Cache Miss (rank 0)

Level 1 Data Cache Misses

Callpath Profiles

Callpath Profiles: Flops, Resource Stalls

Callpath Thread Relations Window
[Figure labels: parent, routine, children]

Flat Profile

TAU's ParaProf Profile Browser - Manager
Different sections of code within the same routine execute on odd and even processors!

3D Window: Rank, Routine, Time, Instructions

3D Window: Variations in FP/L1 DCM ratios

Getting Access to TAU on Jaguar
  • set path=(/spin/proj/perc/TOOLS/tau_latest/x86_64/bin $path)
  • Choose Stub Makefiles (TAU_MAKEFILE env. var.) from /spin/proj/perc/TOOLS/tau_latest/xt3/lib/Makefile.*
      - Makefile.tau-mpi-pdt-pgi (flat profile)
      - Makefile.tau-mpi-pdt-pgi-trace (event trace, for use with Vampir)
      - Makefile.tau-callpath-mpi-pdt-pgi (single metric, callpath profile)
  • Binaries of S3D can be found in:
      - ~sameer/scratch/S3D-BINARIES
          * withtau
              + papi, multiplecounters, mpi, pdt, pgi options
          * without_tau