Profiling S3D on Cray XT3
- Slides: 26
Profiling S3D on Cray XT3 using TAU
Sameer Shende
tau-team@cs.uoregon.edu
Acknowledgements
- Alan Morris [UO]
- Kevin Huck [UO]
- Allen D. Malony [UO]
- Kenneth Roche [ORNL]
- Bronis R. de Supinski [LLNL]
Profiling S3D Harness TAU Performance System
TAU Parallel Performance System
- http://www.cs.uoregon.edu/research/tau/
- Multi-level performance instrumentation
  - Multi-language automatic source instrumentation
- Flexible and configurable performance measurement
- Widely-ported parallel performance profiling system
  - Computer system architectures and operating systems
  - Different programming languages and compilers
- Support for multiple parallel programming paradigms
  - Multi-threading, message passing, mixed-mode, hybrid
TAU Performance System Architecture (figure; label: event selection)
TAU Performance System Architecture (figure)
Program Database Toolkit (PDT) (figure; components recovered from the diagram)
- Application / Library
- C / C++ parser -> IL -> C / C++ IL analyzer -> Program Database Files
- Fortran parser (F77/90/95) -> IL -> Fortran IL analyzer -> Program Database Files
- DUCTAPE
  - PDBhtml: Program documentation
  - SILOON: Application component glue
  - CHASM: C++ / F90/95 interoperability
  - TAU_instr: Automatic source instrumentation
PAPI
- Performance Application Programming Interface
  - The purpose of the PAPI project is to design, standardize and implement a portable and efficient API to access the hardware performance monitor counters found on most modern microprocessors.
- Parallel Tools Consortium project
- Developed by University of Tennessee, Knoxville
- http://icl.cs.utk.edu/papi/
S3D - Building with TAU
- Change the name of the compiler in build/make.XT3
  - ftn => tau_f90.sh
  - cc => tau_cc.sh
- Set compile-time environment variables
  - setenv TAU_MAKEFILE /spin/proj/perc/TOOLS/tau_latest/xt3/lib/Makefile.tau-callpath-multiplecounters-mpi-papi-pdt-pgi
    - Chooses callpath profiling, PAPI counters, MPI profiling, and PDT for source instrumentation
  - setenv TAU_OPTIONS '-optTauSelectFile=select.tau -optPreProcess'
    - The selective instrumentation file eliminates instrumentation in lightweight routines
    - Pre-processes Fortran source code using cpp before compiling
- Set runtime environment variables for instrumentation control and PAPI counter selection in the job submission script:
  - export TAU_THROTTLE=1
  - export COUNTER1=GET_TIME_OF_DAY
  - export COUNTER2=PAPI_FP_INS
  - export COUNTER3=PAPI_L1_DCM
  - export COUNTER4=PAPI_RES_STL
  - export COUNTER5=PAPI_L2_DCM
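The runtime settings above can be collected into one fragment of a job script. This is a minimal sketch in Bourne-shell syntax, assuming the batch script is run by sh or bash; the counter names are the ones listed on the slide, and the launch line is left as a comment because the slide does not show it.

```shell
#!/bin/sh
# Sketch of the runtime section of a job submission script (assumed sh/bash).

# Enable TAU's runtime throttling of high-overhead instrumentation.
export TAU_THROTTLE=1

# One metric per COUNTER<n> variable (multiplecounters configuration).
export COUNTER1=GET_TIME_OF_DAY   # wallclock time
export COUNTER2=PAPI_FP_INS       # floating-point instructions
export COUNTER3=PAPI_L1_DCM       # level 1 data cache misses
export COUNTER4=PAPI_RES_STL      # cycles stalled on any resource
export COUNTER5=PAPI_L2_DCM       # level 2 data cache misses

# Launch the instrumented S3D binary here with the site's job launcher.
```

The compile-time variables on the slide use csh `setenv NAME value`; the equivalent here would be `export NAME=value`.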
Selective Instrumentation in TAU
% cat select.tau
BEGIN_EXCLUDE_LIST
MCADIF
GETRATES
TRANSPORT_M::MCAVIS_NEW
MCEDIF
MCACON
CKYTCP
THERMCHEM_M::MIXENTH
THERMCHEM_M::GIBBSENRG_ALL_DIMT
CKRHOY
MCEVAL4
THERMCHEM_M::HIS
THERMCHEM_M::CPS
THERMCHEM_M::ENTROPY
END_EXCLUDE_LIST
BEGIN_INSTRUMENT_SECTION
loops routine="#"
END_INSTRUMENT_SECTION
TAU's ParaProf Profile Browser - Manager
- Derived metrics: Flops = PAPI_FP_INS / wallclock time
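As a worked instance of the derived metric above: if wallclock time is recorded in microseconds (an assumption here, matching how TAU commonly reports GET_TIME_OF_DAY), dividing the PAPI_FP_INS count by the time gives the rate in millions per second directly. The helper name `mflops` and the sample numbers are hypothetical, not measurements from S3D.

```shell
#!/bin/sh
# mflops FP_INS USEC
# Derived metric: floating-point instruction count divided by wallclock
# microseconds, i.e. millions of FP instructions per second.
# Hypothetical helper for illustration only.
mflops() {
    awk -v fp="$1" -v usec="$2" 'BEGIN {
        if (usec <= 0) { print "undefined"; exit }
        printf "%.2f\n", fp / usec
    }'
}

# Made-up sample values: 4.5 million FP instructions in 1.5 seconds.
mflops 4500000 1500000
```

The guard on a non-positive time avoids a division by zero for routines whose measured time rounds to nothing.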
Main Window - 8 CPUs (MPI Ranks 0-7): Some routines execute on different sets of processors
Mean Profile Over 8 CPUs -- Exclusive Time
Mean Percentage -- Exclusive Time
Loop Level Profile With PAPI Counter Data
ParaProf's Source Browser
Exclusive MFLOPS
FP Instructions per L1 Data Cache Miss (rank 0)
Level 1 Data Cache Misses
Callpath Profiles
Callpath Profiles: Flops, Resource Stalls
Callpath Thread Relations Window (figure; labels: parent, routine, children)
Flat Profile
TAU's ParaProf Profile Browser - Manager: Different sections of code within the same routine execute on odd and even processors!
3D Window: Rank, Routine, Time, Instructions
3D Window: Variations in FP/L1 DCM ratios
Getting Access to TAU on Jaguar
- set path=(/spin/proj/perc/TOOLS/tau_latest/x86_64/bin $path)
- Choose stub makefiles (TAU_MAKEFILE env. var.) from /spin/proj/perc/TOOLS/tau_latest/xt3/lib/Makefile.*
  - Makefile.tau-mpi-pdt-pgi (flat profile)
  - Makefile.tau-mpi-pdt-pgi-trace (event trace, for use with Vampir)
  - Makefile.tau-callpath-mpi-pdt-pgi (single metric, callpath profile)
- Binaries of S3D can be found in:
  - ~sameer/scratch/S3D-BINARIES
    - withtau: papi, multiplecounters, mpi, pdt, pgi options
    - without_tau
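The setup steps above use csh syntax. For users of sh or bash, a rough equivalent might look like the sketch below. The paths and makefile names are taken from the slide; the variable `TAU_ROOT` and the helper `tau_makefile_for` are hypothetical names introduced only for this illustration.

```shell
#!/bin/sh
# Bourne-shell counterpart of the csh `set path=(...)` line.
# TAU_ROOT is a hypothetical convenience variable, not something TAU reads.
TAU_ROOT=/spin/proj/perc/TOOLS/tau_latest
PATH="$TAU_ROOT/x86_64/bin:$PATH"
export PATH

# tau_makefile_for MODE
# Hypothetical helper: map a measurement mode to one of the stub
# makefiles listed on the slide.
tau_makefile_for() {
    case "$1" in
        flat)     echo "$TAU_ROOT/xt3/lib/Makefile.tau-mpi-pdt-pgi" ;;
        trace)    echo "$TAU_ROOT/xt3/lib/Makefile.tau-mpi-pdt-pgi-trace" ;;
        callpath) echo "$TAU_ROOT/xt3/lib/Makefile.tau-callpath-mpi-pdt-pgi" ;;
        *)        echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# Select the single-metric callpath configuration for the next build.
TAU_MAKEFILE=$(tau_makefile_for callpath)
export TAU_MAKEFILE
```

Keeping the mode-to-makefile mapping in one place makes it easy to rebuild the same source for a flat profile, a trace, or a callpath profile by changing a single argument.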