Advances in the TAU Performance System

Sameer Shende and Alan Morris
{sameer, amorris}@cs.uoregon.edu
Department of Computer and Information Science
NeuroInformatics Center
University of Oregon

Acknowledgement
• Jaideep Ray, SNL
• Nick Trebon, U. Oregon
• Allen D. Malony, U. Oregon
• Manish Parashar, Rutgers
• Maria Liu, Rutgers

Outline
• Overview of new features
• Instrumentation
• Measurement
• Analysis tools
• CCA proxy generators

TAU Performance System Framework
• Tuning and Analysis Utilities
• Performance system framework for scalable parallel and distributed high-performance computing
• Targets a general complex system computation model
  - nodes / contexts / threads
  - Multi-level: system / software / parallelism
  - Measurement and analysis abstraction
• Integrated toolkit for performance instrumentation, measurement, analysis, and visualization
  - Portable, configurable performance profiling/tracing facility
  - Open software approach
• University of Oregon, LANL, FZJ Germany
• http://www.cs.uoregon.edu/research/paracomp/tau

TAU Performance System Architecture
[Figure: architecture diagram; labels recovered from the image: Paraver, EPILOG]

Enhancements in TAU
• Instrumentation
  - Automatic generation of proxy components for SIDL & Classic CCA
  - Malloc/free wrapper interposition library
  - Support for MPI-2, SHMEM in wrapper interposition library
  - TAU_COMPILER: improves TAU’s integration into Makefiles
• Profile Measurement
  - Phase based profiling
  - Callpath profiling featuring user defined callpath depth
  - Support for memory profiling
  - Compensation of measurement overhead (-COMPENSATE)
• Trace Measurement
  - Online trace analysis, automatic merging and conversion of traces
  - Support for hierarchical trace merging
  - Support for binary VTF3 format (-vtf=<dir> configuration)
  - Support for hardware performance counters in traces (Vampir)
  - Trace to profile converter (vtf2profile)
  - Trace input library

Enhancements in TAU (contd.)
• Analysis
  - PerfDMF (Performance Data Management Framework)
    - Oracle, PostgreSQL, MySQL supported
  - Paraprof profile browser
    - Normalized/non-normalized views
    - Callpath profile view (immediate parents, routine, immediate children)
    - Scalable histogram display
    - PerfDMF integration: load, update performance data
    - Support for gprof, mpiP, Dynaprof, hpmtoolkit, psrun (besides TAU)
    - Callgraph display with clickable callpaths
  - VNG (Vampir Next Generation, TU Dresden)
    - Online/offline trace visualization
    - Support for binary TAU format in VNG
  - CUBE (UTK, FZJ) calltree visualizer

TAU Performance Measurement
• TAU supports profiling and tracing measurement
• TAU supports tracking application memory utilization
• Robust timing and hardware performance support using PAPI
• Support for online performance monitoring
  - Profile and trace performance data export to file system
  - Selective exporting
• Extension of TAU measurement for multiple counters
  - Creation of user-defined TAU counters
  - Access to system-level metrics
• Support for callpath and phase measurement
• Integration with system-level performance data

Memory Profiling in TAU
• Configuration option -PROFILEMEMORY
  - Records global heap memory utilization for each function
  - Takes one sample at the beginning of each function and associates the sample with the function name
  - Independent of instrumentation/measurement options selected
  - No need to insert macros/calls in the source code
  - User defined atomic events appear in profiles/traces
  - For traces, see Vampir’s Global Displays -> Counter Timeline to view memory samples

Memory Profiling in TAU (contd.)
• Instrumentation based observation of global heap memory (not per function)
  - call TAU_TRACK_MEMORY()
    - Triggers one sample every 10 secs
  - call TAU_TRACK_MEMORY_HERE()
    - Triggers sample at a specific location in source code
  - call TAU_SET_INTERRUPT_INTERVAL(seconds)
    - To set inter-interrupt interval for sampling
  - call TAU_DISABLE_TRACKING_MEMORY()
    - To turn off recording memory utilization
  - call TAU_ENABLE_TRACKING_MEMORY()
    - To re-enable tracking memory utilization
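The calls on this slide could be combined roughly as in the sketch below. The macro names are the ones listed on the slide; the `USE_TAU` guard, the no-op fallback definitions, and the helper `fill_buffer` are assumptions added here so that the fragment also builds on a machine without TAU. It is an illustration, not code from the TAU distribution.

```cpp
#include <cstddef>
#include <vector>

#ifdef USE_TAU   // hypothetical build flag for this sketch, not a TAU option
#include <TAU.h>
#else
// No-op stand-ins so the sketch compiles without a TAU installation.
#define TAU_TRACK_MEMORY()
#define TAU_TRACK_MEMORY_HERE()
#define TAU_SET_INTERRUPT_INTERVAL(seconds)
#define TAU_DISABLE_TRACKING_MEMORY()
#define TAU_ENABLE_TRACKING_MEMORY()
#endif

// Hypothetical workload: allocates n ints and returns their sum.
long fill_buffer(std::size_t n) {
    std::vector<int> buf(n, 1);
    TAU_TRACK_MEMORY_HERE();          // one sample at this exact source location
    long sum = 0;
    for (int v : buf) sum += v;
    return sum;
}

long run_with_memory_tracking() {
    TAU_SET_INTERRUPT_INTERVAL(2);    // sample every 2 s instead of every 10 s
    TAU_TRACK_MEMORY();               // start interrupt-driven sampling
    long a = fill_buffer(4096);

    TAU_DISABLE_TRACKING_MEMORY();    // region that should not be recorded
    long b = fill_buffer(16);
    TAU_ENABLE_TRACKING_MEMORY();     // resume recording

    return a + b;
}
```

Without `USE_TAU` the macros expand to nothing, so the same source serves as an uninstrumented build.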

TAU’s malloc/free wrapper for C/C++
    #include <TAU.h>
    #include <malloc.h>

    int main(int argc, char **argv)
    {
        TAU_PROFILE("int main(int, char **)", " ", TAU_DEFAULT);
        int *ary = (int *) malloc(sizeof(int) * 4096);
        // TAU’s malloc wrapper library replaces this call automatically
        // when $(TAU_MEMORY_INCLUDE) is used in the Makefile.
        …
        free(ary);
        // other statements in foo …
    }

Using TAU’s Malloc Wrapper Library for C/C++
[figure]

Using TAU’s Malloc Wrapper Library for C/C++ (contd.)
    include /usr/common/acts/TAU/tau-2.14.1/rs6000/lib/Makefile.tau-pdt
    CC = $(TAU_CC)
    CFLAGS = $(TAU_DEFS) $(TAU_INCLUDE) $(TAU_MEMORY_INCLUDE)
    LIBS = $(TAU_LIBS)
    OBJS = f1.o f2.o ...
    TARGET = a.out
    $(TARGET): $(OBJS)
            $(CC) $(LDFLAGS) $(OBJS) -o $@ $(LIBS)
    .c.o:
            $(CC) $(CFLAGS) -c $< -o $@

Profile Measurement – Three Flavors
• Flat profiles
  - Time (or counts) spent in each routine (nodes in callgraph)
  - Exclusive/inclusive time, no. of calls, child calls
  - E.g.: MPI_Send, foo, …
• Callpath profiles
  - Flat profiles, plus
  - Sequence of actions that led to poor performance
  - Time spent along a calling path (edges in callgraph)
  - E.g., “main => f1 => f2 => MPI_Send” shows the time spent in MPI_Send when called by f2, when f2 is called by f1, when it is called by main. Depth of this callpath = 4 (TAU_CALLPATH_DEPTH environment variable)
• Phase based profiles
  - Flat profiles, plus
  - Flat profiles under a phase (nested phases are allowed)
  - Default “main” phase has all phases and routines invoked outside phases
  - Supports static or dynamic (per-iteration) phases
  - E.g., “IO => MPI_Send” is time spent in MPI_Send in IO phase
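To make the callpath depth concrete, the sketch below builds a callpath name from a call stack and a depth limit. It assumes that the limit keeps the innermost frames, so a depth of 2 reduces the slide's example to "f2 => MPI_Send". The function `callpath_key` is a hypothetical helper written for this illustration and is not part of TAU.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Builds a callpath name from the innermost `depth` frames of a call stack.
// `stack` is ordered outermost first, e.g. {"main", "f1", "f2", "MPI_Send"}.
std::string callpath_key(const std::vector<std::string>& stack, std::size_t depth) {
    assert(depth >= 1);  // a callpath contains at least the routine itself
    if (stack.empty()) return "";
    const std::size_t first = stack.size() > depth ? stack.size() - depth : 0;
    std::string key;
    for (std::size_t i = first; i < stack.size(); ++i) {
        if (i > first) key += " => ";
        key += stack[i];
    }
    return key;
}
```

A profiler that accumulates time under such keys reports one entry per distinct path, so a larger depth gives more context at the cost of more entries.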

Flat Profile – Pprof Profile Browser
• Intel Linux cluster
• F90 + MPICH
• Profile - Node - Context - Thread
• Events - code - MPI

Flat Profile
[figure]

Callpath Profile
[figure]

Callpath Profile - parent/node/child view
[figure]

Callpath Profiling
[figure]

Phase Profile – Dynamic Phases
[figure]

TAU’s CCA Performance Component
• Measurement port and interfaces
  - Timer
    - set name/type/group
    - start/stop
  - Phase
    - set name/type/group
    - start/stop
  - Control
    - enable/disable groups
  - Query
    - get timer names, get metric names, get user-defined event names
    - get timer data, get user-defined event data, dump data to disk
  - Event
    - set name, trigger event
  - MemoryTracker
    - enable interrupt tracking, track memory here, set interrupt interval
    - enable/disable tracking memory

TAU’s CCA Interfaces
• Performance evaluation using Performance component
  - Uses underlying TAU library for measurement
  - Timer, Phase, Event, Control, Query, MemoryTracker interfaces
  - Lightweight instrumentation option
• Performance modeling using Mastermind component
  - Tracks per-invocation performance data
  - Associates performance data with application data
  - Method arguments logged with performance data
  - Callpath information
  - Helps us build performance models [IPDPS’04]

Phase Interface
    interface Timer {
        /* Start/stop the Timer */
        void start();
        void stop();
        /* Set/get the Timer name */
        void setName(in string name);
        string getName();
        /* Set/get Timer type information (e.g., signature of the routine) */
        void setType(in string name);
        string getType();
        /* Set/get the group name associated with the Timer */
        void setGroupName(in string name);
        string getGroupName();
        /* Set/get the group id associated with the Timer */
        void setGroupId(in long group);
        long getGroupId();
    }

    interface Phase {
        /* Start/stop the Phase */
        void start();
        void stop();
        /* Set/get the Phase name */
        void setName(in string name);
        string getName();
        /* Set/get Phase type information (e.g., signature of the routine) */
        void setType(in string name);
        string getType();
        /* Set/get the group name associated with the Phase */
        void setGroupName(in string name);
        string getGroupName();
        /* Set/get the group id associated with the Phase */
        void setGroupId(in long group);
        long getGroupId();
    }

    interface Measurement extends gov.cca.Port {
        /* Create a Timer */
        Timer createTimer();
        Timer createTimerWithName(in string name);
        Timer createTimerWithNameType(in string name, in string type);
        Timer createTimerWithNameTypeGroup(in string name, in string type, in string group);
        /* Create a Phase */
        Phase createPhase();
        Phase createPhaseWithName(in string name);
        Phase createPhaseWithNameType(in string name, in string type);
        Phase createPhaseWithNameTypeGroup(in string name, in string type, in string group);
        ...
    }

Measurement Proxy Component
• Interpose a proxy component for each port
• Inside the proxy
  - Make calls to Performance component for each invocation
[Figure: component wiring diagram; labels recovered from the image: Go, Driver, IntegratorPort (provides/uses), MidpointIntegrator, IntegratorProxy, MeasurementPort, Performance component]
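The interposition idea on this slide can be sketched in plain C++ without any CCA framework. Everything below is hypothetical: `IntegratorPort`, `MidpointIntegrator` and `IntegratorProxy` only borrow their names from the diagram, the integrand is fixed to x*x, and `std::chrono` stands in for the Timer start/stop calls that a generated proxy would make on the Performance component.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical port interface, standing in for IntegratorPort.
struct IntegratorPort {
    virtual ~IntegratorPort() = default;
    virtual double integrate(double lo, double hi, int n) = 0;
};

// Hypothetical provider: midpoint rule applied to f(x) = x * x.
struct MidpointIntegrator : IntegratorPort {
    double integrate(double lo, double hi, int n) override {
        assert(n > 0);
        const double h = (hi - lo) / n;
        double sum = 0.0;
        for (int i = 0; i < n; ++i) {
            const double x = lo + (i + 0.5) * h;
            sum += x * x * h;
        }
        return sum;
    }
};

// Proxy: provides the same port it uses and measures every invocation.
class IntegratorProxy : public IntegratorPort {
public:
    explicit IntegratorProxy(IntegratorPort& target) : target_(target) {}

    double integrate(double lo, double hi, int n) override {
        const auto start = std::chrono::steady_clock::now();  // stands in for timer.start()
        const double result = target_.integrate(lo, hi, n);
        const auto stop = std::chrono::steady_clock::now();   // stands in for timer.stop()
        seconds_ += std::chrono::duration<double>(stop - start).count();
        ++calls_;
        return result;
    }

    long calls() const { return calls_; }
    double seconds() const { return seconds_; }

private:
    IntegratorPort& target_;
    long calls_ = 0;
    double seconds_ = 0.0;
};
```

Because the proxy exposes the same port as the component it wraps, the driver connects to it without any source change, which is the point of the wiring in the diagram.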

MasterMind Component
• Idea: Create a performance model for the component by tracking performance per invocation
• Uses Monitor Port
• Outputs:
  - Times per invocation, e.g.
        # integ_proxy::integrate(double, double, int)
        # MPI_TIME  Time   count  lowBound  upBound
          72420     336    10000  0         1
          407       449    1000   0         1
          364       540    100    0         1
          64838     844    10000  0         1
          381       945    1000   0         1
          332       1027   100    0         1
  - Component call path
  - Regular performance data (uses performance component)

Monitor Proxy Component
• Same idea (from the user’s point of view)
[Figure: component wiring diagram; labels recovered from the image: Go, Driver, IntegratorPort (provides/uses), MidpointIntegrator, Integrator Monitor Proxy, MonitorPort, MeasurementPort, MasterMind, Performance]

Tools Included with MasterMind Component
• Tree pruner
  - Input:
    - Callgraph generated by Mastermind component
    - User specified rules
  - Output:
    - Pruned callgraph with insignificant nodes removed
• Performance modeling library: brute force
  - Tries all possible permutations of component instances
  - Input: performance model of each component
  - Selects optimal component assembly for the ensemble
• Optimizer
  - Swaps one component instance with another
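The brute-force selection described on this slide can be illustrated with a short sketch. It is not the MasterMind modeling library: the function `best_assembly`, the idea of numbering candidate implementations per slot, and the cost callback are all assumptions made for the example. The callback plays the role of the combined performance model and returns a predicted cost for one complete assembly.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

// candidates_per_slot[s] = number of interchangeable implementations for slot s.
// model(pick) = predicted cost of the assembly that uses implementation pick[s]
// in slot s (hypothetical stand-in for the per-component performance models).
// Returns the pick with the lowest predicted cost after trying every assembly.
std::vector<std::size_t> best_assembly(
    const std::vector<std::size_t>& candidates_per_slot,
    const std::function<double(const std::vector<std::size_t>&)>& model) {
    for (std::size_t n : candidates_per_slot) {
        assert(n > 0);  // every slot needs at least one implementation
        (void)n;
    }
    const std::size_t slots = candidates_per_slot.size();
    std::vector<std::size_t> pick(slots, 0), best(slots, 0);
    double best_cost = std::numeric_limits<double>::infinity();
    while (true) {
        const double cost = model(pick);
        if (cost < best_cost) {
            best_cost = cost;
            best = pick;
        }
        // Advance `pick` like an odometer; stop once every digit has wrapped.
        std::size_t s = 0;
        while (s < slots && ++pick[s] == candidates_per_slot[s]) pick[s++] = 0;
        if (s == slots) break;
    }
    return best;
}
```

The number of assemblies is the product of the per-slot candidate counts, which is why exhaustive search is only practical for small ensembles.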

TAU’s Proxy Generator for SIDL/Classic CCA
• Generate regular measurement proxy or monitor (MasterMind) proxy
• Arguments:
    -c <component name>   Full name of the component
    -t <type name>        Type of component
    -p <port name>        Name of port to generate proxy for
    -d <pdbfile name>     Name of pdb file created from cxxparse
    -h <header file>      Header file for this port
• Options:
    -n <proxy name>       Name of the proxy component (default: base of component name + Proxy)
    -o <output filename>  Name of output file (default: proxy.cc)
    -f <selective instrumentation file>
                          Use pre-generated selective instrumentation file
    -x <tag>              Namespace tag
    -m                    Generate MasterMind component proxy

TAU’s Proxy Generator for Classic C++ Interface
• Creating PDB files:
    cxxparse <file.cpp> -I<dir> -D<flags>
• Merging PDB files:
    pdbmerge -o merged.pdb file1.pdb file2.pdb …
• Invoking tau_pg (example)
    tau_pg -c integrators::ccaports::Integrator -t integrators.ccaports.Integrator -p IntegratorPort -d ParallelIntegrator_CCA.pdb -o Proxy.cc -h ports/Integrator_CCA.h -f select.dat

What’s Going On Here?
[Figure labels recovered from the image: Application Component; Performance Component; TAU API; other API; …; alternative implementations of performance component; runtime; TAU performance data]

Using TAU
• Install TAU
    % configure ; make clean install
• Instrument application
  - TAU Profiling API
• Typically modify application makefile
  - include TAU’s stub makefile, modify variables
• Set environment variables
  - directory where profiles/traces are to be stored
  - name of merged trace file, retain intermediate trace files, etc.
• Execute application
    % mpirun -np <procs> a.out;
• Analyze performance data
  - paraprof, vampir/traceanalyzer, pprof, paraver …

AutoInstrumentation using TAU_COMPILER
• $(TAU_COMPILER) stub Makefile variable in 2.14+ release
• Invokes PDT parser, TAU instrumentor, compiler through tau_compiler.sh shell script
• Requires minimal changes to application Makefile
  - Compilation rules are not changed
  - User adds $(TAU_COMPILER) before compiler name
    - F90=mpxlf90 changes to F90= $(TAU_COMPILER) mpxlf90
• Passes options from TAU stub Makefile to the four compilation stages
• Uses original compilation command if an error occurs

TAU_COMPILER – Improving Integration in Makefiles

OLD
    include /usr/tau-2.14/include/Makefile
    CXX = mpCC
    F90 = mpxlf90_r
    PDTPARSE = $(PDTDIR)/$(PDTARCHDIR)/bin/cxxparse
    TAUINSTR = $(TAUROOT)/$(CONFIG_ARCH)/bin/tau_instrumentor
    CFLAGS = $(TAU_DEFS) $(TAU_INCLUDE)
    LIBS = $(TAU_MPI_LIBS) $(TAU_LIBS) -lm
    OBJS = f1.o f2.o f3.o … fn.o
    app: $(OBJS)
            $(CXX) $(LDFLAGS) $(OBJS) -o $@ $(LIBS)
    .cpp.o:
            $(PDTPARSE) $<
            $(TAUINSTR) $*.pdb $< -o $*.i.cpp -f select.dat
            $(CC) $(CFLAGS) -c $*.i.cpp

NEW
    include /usr/tau-2.14/include/Makefile
    CXX = $(TAU_COMPILER) mpCC
    F90 = $(TAU_COMPILER) mpxlf90_r
    CFLAGS =
    LIBS = -lm
    OBJS = f1.o f2.o f3.o … fn.o
    app: $(OBJS)
            $(CXX) $(LDFLAGS) $(OBJS) -o $@ $(LIBS)
    .cpp.o:
            $(CC) $(CFLAGS) -c $<

TAU_COMPILER Options
• Optional parameters for $(TAU_COMPILER):
    -optVerbose            Turn on verbose debugging messages
    -optPdtDir=""          PDT architecture directory. Typically $(PDTDIR)/$(PDTARCHDIR)
    -optPdtF95Opts=""      Options for Fortran parser in PDT (f95parse)
    -optPdtCOpts=""        Options for C parser in PDT (cparse). Typically $(TAU_MPI_INCLUDE) $(TAU_DEFS)
    -optPdtCxxOpts=""      Options for C++ parser in PDT (cxxparse). Typically $(TAU_MPI_INCLUDE) $(TAU_DEFS)
    -optPdtF90Parser=""    Specify a different Fortran parser, e.g., f90parse instead of f95parse
    -optPdtUser=""         Optional arguments for parsing source code
    -optPDBFile=""         Specify [merged] PDB file. Skips parsing phase.
    -optTauInstr=""        Specify location of tau_instrumentor. Typically $(TAUROOT)/$(CONFIG_ARCH)/bin/tau_instrumentor
    -optTauSelectFile=""   Specify selective instrumentation file for tau_instrumentor
    -optTau=""             Specify options for tau_instrumentor
    -optCompile=""         Options passed to the compiler. Typically $(TAU_MPI_INCLUDE) $(TAU_DEFS)
    -optLinking=""         Options passed to the linker. Typically $(TAU_MPI_FLIBS) $(TAU_CXXLIBS)
    -optNoMpi              Removes -l*mpi* libraries during linking (default)
    -optKeepFiles          Does not remove intermediate .pdb and .inst.* files
• E.g.,
    OPT=-optTauSelectFile=select.tau -optPDBFile=merged.pdb
    F90 = $(TAU_COMPILER) $(OPT) mpxlf90_r

Program Database Toolkit
[Figure: PDT data flow. Component source/library -> C/C++ parser -> IL -> C/C++ IL analyzer -> Program Database Files; Fortran 77/90/95 parser -> IL -> Fortran 77/90/95 IL analyzer -> Program Database Files; DUCTAPE -> tau_pg (proxy component), SILOON (application component glue), CHASM (C++/F90 interoperability), TAU_instr (automatic source instrumentation)]

TAU Tracing Enhancements
• Configure TAU with -TRACE -vtf=dir option
    % configure -TRACE -vtf=<dir> -MULTIPLECOUNTERS -papi=<dir> -mpi -pdt=dir …
• Set environment variables
    % setenv TAU_TRACEFILE foo.vpt.gz
    % setenv COUNTER1 GET_TIME_OF_DAY (required)
    % setenv COUNTER2 PAPI_FP_INS…
  - For using native events: COUNTER2 PAPI_NATIVE_<event>
    - For IBM, see /usr/pmapi/lib/POWER4.evs, e.g., PAPI_NATIVE_PM_FPU0_FDIV for "FPU0 executed FDIV instruction"
• Execute application (automatic merge/convert)
    % poe a.out -procs 4
    % traceanalyzer foo.vpt.gz
• NOTE: COUNTER1 must be GET_TIME_OF_DAY

Intel® Traceanalyzer (Vampir) Global Timeline
[figure]

Visualizing TAU Traces with Counters/Samples
[figure]

Visualizing TAU Traces with Counters/Samples (contd.)
[figure]

TAU Performance Data Management Framework
[Figure labels recovered from the image: performance analysis programs (ParaProf, …); raw performance data (Hpmtoolkit, Psrun, Dynaprof, mpiP, Gprof, …); PerfDMF with C API and Java API; profile meta-data; JDBC; database (PostgreSQL, Oracle, MySQL, …)]

Paraprof Manager – Performance Database
[figure]

Paraprof Scalable Histogram View
[figure]

Paraprof – Stack Bars Separately View
[figure]

Paraprof – Full Callgraph View
[figure]

Paraprof – Callgraph View (Zoom In +/Out -)
[figure]

KOJAK’s CUBE (UTK, FZJ) Browser
[figure]

Current Status (Jan 2005)
• Released TAU v2.14.1 and PDT v3.3.1
  - PerfDMF (Performance Data Management Framework)
  - http://www.cs.uoregon.edu/research/paracomp/tau
• Released Performance Component v1.5
  - MasterMind Component
    - Tree Pruner
    - Performance Modeling Library
    - Optimizer
  - Supports SIDL, Classic C++, Classic Neo interfaces
    - Previous versions of CCAFE, BABEL supported (1.0-1.5)
  - http://www.cs.uoregon.edu/research/paracomp/tau/cca

Support Acknowledgements
• Department of Energy (DOE)
  - Office of Science contracts
  - University of Utah DOE ASCI Level 1 sub-contract
  - DOE ASC/NNSA Level 3 contract
• NSF Software and Tools for High-End Computing Grant
• Research Centre Juelich
  - John von Neumann Institute for Computing
  - Dr. Bernd Mohr
• Los Alamos National Laboratory

SIDL Performance Interface
    package Performance version 1.5.0 {

      interface Timer {
        /* Start/stop the Timer */
        void start();
        void stop();
        /* Set/get the Timer name */
        void setName(in string name);
        string getName();
        /* Set/get Timer type information (e.g., signature of the routine) */
        void setType(in string name);
        string getType();
        /* Set/get the group name associated with the Timer */
        void setGroupName(in string name);
        string getGroupName();
        /* Set/get the group id associated with the Timer */
        void setGroupId(in long group);
        long getGroupId();
      }

      interface Phase {
        /* Start/stop the Phase */
        void start();
        void stop();
        /* Set/get the Phase name */
        void setName(in string name);
        string getName();
        /* Set/get Phase type information (e.g., signature of the routine) */
        void setType(in string name);
        string getType();
        /* Set/get the group name associated with the Phase */
        void setGroupName(in string name);
        string getGroupName();
        /* Set/get the group id associated with the Phase */
        void setGroupId(in long group);
        long getGroupId();
      }

SIDL Performance Interface (contd.)
      interface Query {
        /* Get the list of Timer and Counter names */
        array<string> getTimerNames();
        array<string> getCounterNames();

        /* Get the timer data */
        void getTimerData(in array<string> timerList,
                          out array<double, 2> counterExclusive,
                          out array<double, 2> counterInclusive,
                          out array<int> numCalls,
                          out array<int> numChildCalls,
                          out array<string> counterNames,
                          out int numCounters);

        /* User Event query interface */
        array<string> getEventNames();
        void getEventData(in array<string> eventList,
                          out array<int> numSamples,
                          out array<double> max,
                          out array<double> min,
                          out array<double> mean,
                          out array<double> sumSqr);

        /* Writes instantaneous profile to disk in a dump file. */
        void dumpProfileData();

        /* Writes instantaneous profile to disk in a dump file with a specified prefix. */
        void dumpProfileDataPrefix(in string prefix);

        /* Writes the instantaneous profile to disk in a dump file whose name
         * contains the current timestamp. */
        void dumpProfileDataIncremental();

        /* Writes the list of timer names to a dump file on the disk */
        void dumpTimerNames();

        /* Writes the profile of the given set of timers to the disk. */
        void dumpTimerData(in array<string> timerList);

        /* Writes the profile of the given set of timers to the disk. The dump
         * file name contains the current timestamp when the data was dumped. */
        void dumpTimerDataIncremental(in array<string> timerList);
      }

SIDL Performance Interface (contd.)
      /* Memory Tracker interface */
      interface MemoryTracker {
        /* track heap memory at a given place */
        void trackHere();
        /* enable interrupt driven memory tracking */
        void enableInterruptTracking();
        /* set the interrupt interval, default is 10 seconds */
        void setInterruptInterval(in int value);
        /* enable tracking (both interrupt driven and manual) */
        void enable();
        /* disable tracking (both interrupt driven and manual) */
        void disable();
      }

      /* User defined event profiles for application specific events */
      interface Event {
        /* Set the name of the event */
        void setName(in string name);
        /* Trigger the event */
        void trigger(in double data);
      }

      /* Interface for runtime instrumentation control based on groups */
      interface Control {
        /* Enable/disable group id */
        void enableGroupId(in long id);
        void disableGroupId(in long id);
        /* Enable/disable group name */
        void enableGroupName(in string name);
        void disableGroupName(in string name);
        /* Enable/disable all groups */
        void enableAllGroups();
        void disableAllGroups();
      }

SIDL Performance Interface (contd.)
      /* Interface to create performance component instances */
      interface Measurement extends gov.cca.Port {
        /* Create a Timer */
        Timer createTimer();
        Timer createTimerWithName(in string name);
        Timer createTimerWithNameType(in string name, in string type);
        Timer createTimerWithNameTypeGroup(in string name, in string type, in string group);

        /* Create a Phase */
        Phase createPhase();
        Phase createPhaseWithName(in string name);
        Phase createPhaseWithNameType(in string name, in string type);
        Phase createPhaseWithNameTypeGroup(in string name, in string type, in string group);

        /* Create a Query interface */
        Query createQuery();

        /* Create a MemoryTracker interface */
        MemoryTracker createMemoryTracker();

        /* Create a User Defined Event interface */
        Event createEvent();
        Event createEventWithName(in string name);

        /* Create a Control interface for selectively enabling and disabling
         * the instrumentation based on groups */
        Control createControl();
      }

      /* Monitor Port for MasterMind component */
      interface Monitor extends gov.cca.Port {
        void startMonitoring(in string rname);
        void stopMonitoring(in string rname, in array<string> paramNames, in array<double> paramValues);
        void setFileName(in string rname, in string fname);
        void dumpData(in string rname);
        void dumpDataFileName(in string rname, in string fname);
        void destroyRecord(in string rname);
      }

      interface PerfParam extends gov.cca.Port {
        int getPerformanceData(in string rname, out array<double, 2> data, in bool reset);
        int getCompMethNames(out array<string> cm_names);
      }
    }

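The Event interface (`setName`, `trigger`) and the fields returned by `getEventData` in the SIDL listing suggest the bookkeeping sketched below. `EventStats` is a hypothetical class written for this illustration; it is not the Performance component's implementation, and it assumes that `sumSqr` is the sum of the squared sample values.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical stand-in for the object behind the Event interface. It keeps
// the running statistics that getEventData reports for each user event.
class EventStats {
public:
    void setName(const std::string& name) { name_ = name; }
    const std::string& name() const { return name_; }

    // Record one sample, e.g. a message size or an allocation size.
    void trigger(double data) {
        if (num_samples_ == 0) {
            min_ = data;
            max_ = data;
        } else {
            min_ = std::min(min_, data);
            max_ = std::max(max_, data);
        }
        ++num_samples_;
        sum_ += data;
        sum_sqr_ += data * data;
    }

    int numSamples() const { return num_samples_; }
    double max() const { return max_; }
    double min() const { return min_; }
    double mean() const {
        assert(num_samples_ > 0);  // the mean is undefined before the first sample
        return sum_ / num_samples_;
    }
    double sumSqr() const { return sum_sqr_; }

private:
    std::string name_;
    int num_samples_ = 0;
    double min_ = 0.0;
    double max_ = 0.0;
    double sum_ = 0.0;
    double sum_sqr_ = 0.0;
};
```

Keeping the sample count, the sum and the sum of squares is enough to derive a standard deviation later without storing individual samples.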

Sample Driver
    void sample::Driver_impl::setServices(
        /*in*/ ::gov::cca::Services services)
    throw (::gov::cca::CCAException)
    {
      // DO-NOT-DELETE splicer.begin(sample.Driver.setServices)
      frameworkServices = services;
      gov::cca::TypeMap tm = frameworkServices.createTypeMap();
      gov::cca::Port p = self;
      frameworkServices.addProvidesPort(p, "Go", "gov.cca.ports.GoPort", tm);
      frameworkServices.registerUsesPort("MeasurementPort", "Performance.Measurement", tm);
      // DO-NOT-DELETE splicer.end(sample.Driver.setServices)
    }

Sample Driver (contd.)
    int32_t sample::Driver_impl::go()
    throw ()
    {
      // DO-NOT-DELETE splicer.begin(sample.Driver.go)
      ::gov::cca::Port port;
      port = frameworkServices.getPort("MeasurementPort");
      if (port._is_nil()) {
        std::cerr << "MeasurementPort is not connected" << std::endl;
        return -1;
      }
      Performance::Measurement measurement = port;

      for (int i = 0; i < 4; i++) {
        std::ostringstream os;
        os << "Iteration " << i;
        std::string phaseName = os.str();

        // Create and start a phase
        ::Performance::Phase phase = measurement.createPhaseWithName(phaseName);
        phase.start();

        // Create and start a timer
        static ::Performance::Timer tautimer =
            measurement.createTimerWithNameTypeGroup("go", "int32_t ()", "TAU_GROUP_CCA");
        tautimer.start();

        // Create a memory tracker and start interrupt driven memory tracking
        ::Performance::MemoryTracker tracker = measurement.createMemoryTracker();
        tracker.enableInterruptTracking();

        sleep(i);

        // Manually track memory here
        tracker.trackHere();

        tautimer.stop();
        phase.stop();
      }

      // Create a query interface
      ::Performance::Query query = measurement.createQuery();

      // Get the event names
      ::sidl::array< ::std::string> eventNames = query.getEventNames();
      ::sidl::array<int32_t> numSamples;
      ::sidl::array<double> max, min, mean, sumSqr;

      // Get the event data
      query.getEventData(eventNames, numSamples, max, min, mean, sumSqr);

      int numEvents = eventNames.upper(0) - eventNames.lower(0) + 1;
      for (int i = 0; i < numEvents; i++) {
        std::cout << "User Event: " << eventNames.get(i) << std::endl;
        std::cout << "Number of Samples: " << numSamples.get(i) << std::endl;
        std::cout << "Maximum Value: " << max.get(i) << std::endl;
        std::cout << "Minimum Value: " << min.get(i) << std::endl;
        std::cout << "Mean Value: " << mean.get(i) << std::endl;
        std::cout << "Sum Squared: " << sumSqr.get(i) << std::endl;
      }

      frameworkServices.releasePort("MeasurementPort");
      return 0;
      // DO-NOT-DELETE splicer.end(sample.Driver.go)
    }

CCA Classic C++ Performance interface #include <string> using std: : string; namespace performance {

CCA Classic C++ Performance interface #include <string> using std: : string; namespace performance { class Timer { public: /** * The destructor should be declared virtual in an interface class. */ virtual ~Timer() { } /** * Start the Timer. * Implement this function in * a derived class to provide required functionality. */ virtual void start(void) = 0; /** * Stop the Timer. */ virtual void stop(void) = 0; /** * Set the name of the Timer. */ virtual void set. Name(string name) = 0; /** * Get the name of the Timer. */ virtual string get. Name(void) = 0; /** * Set the type information of the Timer * (e. g. , signature of the routine) */ virtual void set. Type(string name) = 0; /** * Get the type information of the Timer * (e. g. , signature of the routine) */ virtual string get. Type(void) = 0; 57

CCA Classic C++ Performance interface /** * Set the group name associated with the

CCA Classic C++ Performance interface /** * Set the group name associated with the Timer * (e. g. , All MPI calls can be grouped into an "MPI" group) */ virtual void set. Group. Name(string name) = 0; /** * Get the group name associated with the Timer */ virtual string get. Group. Name(void) = 0; /** * Set the group id associated with the Timer */ virtual void set. Group. Id(unsigned long group ) = 0; /** * Get the group id associated with the Timer */ virtual unsigned long get. Group. Id(void) = 0; }; class Phase { public: /** * The destructor should be declared virtual in an interface class. */ virtual ~Phase() { } /** * Start the Phase. * Implement this function in * a derived class to provide required functionality. */ virtual void start(void) = 0; /** * Stop the Phase. */ virtual void stop(void) = 0; /** 58

CCA Classic C++ Performance interface

  virtual void setName(string name) = 0;

  /**
   * Get the name of the Phase.
   */
  virtual string getName(void) = 0;

  /**
   * Set the type information of the Phase
   * (e.g., signature of the routine)
   */
  virtual void setType(string name) = 0;

  /**
   * Get the type information of the Phase
   * (e.g., signature of the routine)
   */
  virtual string getType(void) = 0;

  /**
   * Set the group name associated with the Phase
   * (e.g., all MPI calls can be grouped into an "MPI" group)
   */
  virtual void setGroupName(string name) = 0;

  /**
   * Get the group name associated with the Phase
   */
  virtual string getGroupName(void) = 0;

  /**
   * Set the group id associated with the Phase
   */
  virtual void setGroupId(unsigned long group) = 0;

  /**
   * Get the group id associated with the Phase
   */
  virtual unsigned long getGroupId(void) = 0;
};

/**
 * Query the timing information
 */
class Query {
public:

CCA Classic C++ Performance interface

  virtual ~Query() { }

  /**
   * Get the list of Timer names
   */
  virtual void getTimerNames(const char **& functionList, int& numFuncs) = 0;

  /**
   * Get the list of Counter names
   */
  virtual void getCounterNames(const char **& counterList,
                               int& numCounters) = 0;

  /**
   * getTimerData. Returns lists of metrics.
   */
  virtual void getTimerData(const char **& inTimerList, int numTimers,
                            double **& counterExclusive,
                            double **& counterInclusive,
                            int*& numCalls, int*& numChildCalls,
                            const char **& counterNames,
                            int& numCounters) = 0;

  /*
   * Get the list of User Event names
   */
  virtual void getEventNames(const char **& eventList, int& numEvents) = 0;

  /*
   * Get User Event data
   */
  virtual void getEventData(const char **& inEventList, int numEvents,
                            int*& numSamples, double*& max, double*& min,
                            double*& mean, double*& sumSqr) = 0;

  /**
   * dumpProfileData. Writes the entire profile to disk in a dump file.
   * It maintains a consistent state and represents the instantaneous
   * profile data had the application terminated at the instance this call
   * is invoked.
   */
  virtual void dumpProfileData(void) = 0;

CCA Classic C++ Performance interface

  /**
   * dumpProfileDataPrefix. Writes the entire profile to disk in a dump
   * file prefixed by 'prefix'. It maintains a consistent state and
   * represents the instantaneous profile data had the application
   * terminated at the instance this call is invoked.
   */
  virtual void dumpProfileDataPrefix(const char *prefix) = 0;

  /**
   * dumpProfileDataIncremental. Writes the entire profile to disk in a
   * dump file whose name contains the current timestamp.
   * It maintains a consistent state and represents the instantaneous
   * profile data had the application terminated at the instance this call
   * is invoked. This call allows us to build a set of timestamped profile
   * files.
   */
  virtual void dumpProfileDataIncremental(void) = 0;

  /**
   * dumpTimerNames. Writes the list of timer names to a dump file on the
   * disk.
   */
  virtual void dumpTimerNames(void) = 0;

  /**
   * dumpTimerData. Writes the profile of the given set of timers to the
   * disk. This allows the user to select the set of routines to dump and
   * periodically write the performance data of a subset of timers to disk
   * for monitoring purposes.
   */
  virtual void dumpTimerData(const char **& inTimerList, int numTimers) = 0;

  /**
   * dumpTimerDataIncremental. Writes the profile of the given set of
   * timers to the disk. The dump file name contains the current timestamp
   * when the data was dumped. This allows the user to select the set of
   * routines to dump and periodically write the performance data of a
   * subset of timers to the disk and maintain a timestamped set of values
   * for post-mortem analysis of how the performance data varied for a
   * given set of routines with time.
   */
  virtual void dumpTimerDataIncremental(const char **& inTimerList,
                                        int numTimers) = 0;
};
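The interface classes above are pure abstract, so a measurement component has to supply the concrete behavior. The following is a rough sketch of what a derived timer might look like, not TAU's actual implementation. It uses a reduced stand-in for the Timer interface (start, stop and name only) placed in a hypothetical namespace sketch, and the classes ChronoTimer and ScopedTimer are illustrative names that do not appear in the slides.

```cpp
#include <cassert>
#include <chrono>
#include <string>
#include <utility>

namespace sketch {  // hypothetical namespace, not the real "performance" one

// Reduced stand-in for the Timer interface shown on the slides.
class Timer {
public:
    virtual ~Timer() { }
    virtual void start() = 0;
    virtual void stop() = 0;
    virtual void setName(std::string name) = 0;
    virtual std::string getName() = 0;
};

// Illustrative implementation that accumulates wall-clock time
// with std::chrono::steady_clock.
class ChronoTimer : public Timer {
public:
    void start() override {
        if (running_) return;  // ignore nested starts in this sketch
        begin_ = std::chrono::steady_clock::now();
        running_ = true;
    }
    void stop() override {
        if (!running_) return;  // stop without start is a no-op
        total_ += std::chrono::steady_clock::now() - begin_;
        running_ = false;
        ++calls_;
    }
    void setName(std::string name) override { name_ = std::move(name); }
    std::string getName() override { return name_; }

    // Accumulated time in seconds over all completed start/stop pairs.
    double seconds() const {
        return std::chrono::duration<double>(total_).count();
    }
    int calls() const { return calls_; }

private:
    std::string name_;
    std::chrono::steady_clock::time_point begin_;
    std::chrono::steady_clock::duration total_{};
    bool running_ = false;
    int calls_ = 0;
};

// RAII guard: starts the timer on construction, stops it on scope exit.
class ScopedTimer {
public:
    explicit ScopedTimer(Timer& t) : timer_(t) { timer_.start(); }
    ~ScopedTimer() { timer_.stop(); }
    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    Timer& timer_;
};

}  // namespace sketch
```

Because callers only see the abstract Timer, a guard such as ScopedTimer works unchanged with any implementation, which is the point of defining the interface with pure virtual functions.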