OpenMP Performance Visualization with Paraver
Jesús Labarta, Jordi Caubet, Judit Gimenez, Sergi Girona, Francesc Escale
CEPBA-UPC

PARAVER (1992 - )
- Flexible performance visualization tool
  - Functions of time
  - Precedence relationships
- Quantitative, comparative
- Powerful / not trivial
  - You drive the analysis
- MPI + OpenMP, System activity, performance counters, …
- Distributed by CEPBA

Jesús Labarta, SPSciComp 2000

Process model
Multithreaded + message passing + multiprogramming
Objects:
- Thread
- Task
- Ptask (application)

Tracefile
- Instrumented codes: MPI + OpenMP, Java, Pthreads, shmem
- Monitoring tools: System activity (SCPUs), InfoPerfex
- Filters: par2Paraver, UTE2Paraver
- Simulators: Dimemas, Simplescalar
Records
- State: (Object, time_start, time_end, state)
- Events:
  - Flag: (Object, time, type, value)
  - Precedence: (Object_src, Object_dst, time_src, time_dst, tag, size)
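The record tuples on this slide can be modeled as plain data structures. The following is a minimal sketch in Python, assuming an object is identified by a (ptask, task, thread) triple as in the process model. The class names, field types, and the `duration` and `latency` helpers are illustrative assumptions and do not reproduce Paraver's actual trace file syntax.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical object identifier: (ptask, task, thread).
ObjectId = Tuple[int, int, int]


@dataclass(frozen=True)
class StateRecord:
    """(Object, time_start, time_end, state)"""
    obj: ObjectId
    time_start: int
    time_end: int
    state: int

    @property
    def duration(self) -> int:
        # Length of the interval spent in this state.
        return self.time_end - self.time_start


@dataclass(frozen=True)
class FlagEvent:
    """(Object, time, type, value)"""
    obj: ObjectId
    time: int
    type: int
    value: int


@dataclass(frozen=True)
class Precedence:
    """(Object_src, Object_dst, time_src, time_dst, tag, size)"""
    obj_src: ObjectId
    obj_dst: ObjectId
    time_src: int
    time_dst: int
    tag: int
    size: int

    @property
    def latency(self) -> int:
        # Time between the source and destination ends of the relationship.
        return self.time_dst - self.time_src
```

For example, `StateRecord((1, 1, 1), 0, 100, 1).duration` gives the length of one state interval for thread 1 of task 1.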

Structure
- Tracefile -> Filter -> Reduced tracefile
- Reduced tracefile -> Semantics -> Function of time (semantic value), Events
- Representation: Visualization, Textual, Analysis
- Demand-driven evaluation

Filter module
- Events
  - by type
  - by value
- Communications
  - by tag
  - by size
  - by source / destination
  - logical / physical
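The filter criteria on this slide amount to predicates over event and communication records. A minimal sketch follows, assuming records are plain dictionaries with hypothetical keys (`type`, `value`, `tag`, `size`, `src`, `dst`) and that "by size" means a minimum size threshold. It is not Paraver's filter implementation, and the logical / physical distinction is not modeled.

```python
from typing import Dict, List, Optional, Set

Record = Dict[str, int]  # hypothetical record layout, for illustration only


def filter_events(events: List[Record],
                  types: Optional[Set[int]] = None,
                  values: Optional[Set[int]] = None) -> List[Record]:
    """Keep events whose type and value are in the allowed sets (None keeps all)."""
    return [e for e in events
            if (types is None or e["type"] in types)
            and (values is None or e["value"] in values)]


def filter_comms(comms: List[Record],
                 tags: Optional[Set[int]] = None,
                 min_size: int = 0,
                 src: Optional[int] = None,
                 dst: Optional[int] = None) -> List[Record]:
    """Keep communications matching tag, minimum size, source and destination."""
    return [c for c in comms
            if (tags is None or c["tag"] in tags)
            and c["size"] >= min_size
            and (src is None or c["src"] == src)
            and (dst is None or c["dst"] == dst)]


events = [
    {"type": 1, "value": 0, "time": 5},
    {"type": 1, "value": 3, "time": 9},
    {"type": 2, "value": 3, "time": 12},
]
comms = [
    {"tag": 7, "size": 64, "src": 1, "dst": 2},
    {"tag": 7, "size": 4096, "src": 1, "dst": 3},
    {"tag": 9, "size": 8192, "src": 2, "dst": 1},
]

type1_events = filter_events(events, types={1})
large_from_1 = filter_comms(comms, min_size=1024, src=1)
```

Filtering before the semantic stage reduces the tracefile, which is what the "Reduced Tracefile" box in the structure slide refers to.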

Semantic module
- Semantic value: $f(t)$
- $f = f_{comp2} \circ f_{comp1} \circ f_{Ptask} \circ f_{task} \circ f_{thread}$
- Semantic functions
  - $f_{comp2}$, $f_{comp1}$: sign, mod, div, in range
  - $f_{Ptask}$: add, average, max, select
  - $f_{task}$: add, average, max, select
  - $f_{thread}$: in state, useful, given state, last event value, next event value, average next event value
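The semantic value is built by composing one function per level. A minimal sketch of that composition, assuming per-thread state intervals stored as `(time_start, time_end, state)` tuples and a hypothetical state code `RUNNING = 1`: the thread level uses "in state", the task level uses "add", and the compose level uses "sign". All names here are illustrative, not Paraver's API.

```python
from typing import Callable, List, Tuple

# Hypothetical per-thread state intervals: (time_start, time_end, state).
Intervals = List[Tuple[int, int, int]]
TimeFn = Callable[[int], int]

RUNNING = 1  # assumed state code, for illustration only


def f_thread_in_state(intervals: Intervals, state: int) -> TimeFn:
    """Thread level 'in state': 1 while the thread is in `state`, else 0."""
    def f(t: int) -> int:
        return int(any(s <= t < e and st == state for s, e, st in intervals))
    return f


def f_task_add(thread_fs: List[TimeFn]) -> TimeFn:
    """Task level 'add': sum of the thread-level functions at time t."""
    return lambda t: sum(f(t) for f in thread_fs)


def f_comp_sign(f: TimeFn) -> TimeFn:
    """Compose level 'sign': -1, 0 or 1 depending on the sign of f(t)."""
    return lambda t: (f(t) > 0) - (f(t) < 0)


threads = [
    [(0, 10, RUNNING), (10, 20, 0)],  # thread 0: running, then not running
    [(0, 5, 0), (5, 20, RUNNING)],    # thread 1: not running, then running
]

thread_fs = [f_thread_in_state(iv, RUNNING) for iv in threads]
parallelism = f_task_add(thread_fs)      # number of running threads at time t
any_running = f_comp_sign(parallelism)   # 1 if at least one thread is running
```

Here `parallelism(7)` counts the threads running at time 7. Because each level is a function of time, values are only computed for the times a window actually asks for.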

Visualization
- Type of window
  - Ptask / Task / thread: one row per object of selected type
  - Object selection (scalability)
- Representation
  - Color encoded / Gradient / Function of time
- Multiple windows
  - Synchronised
- Forward/backward animation
- Precise time measurement
  - Within/between windows

Textual
- Textual detail of area around point within window
- Semantic value and duration / flag / communication
- Numeric / translated text (.pcf file)

Analysis
- Time and object range selected pointing on window
- Analysis function applied to output of semantic module
  - Average semantic value
  - Average duration/variance/number of bursts (if within range)
  - Number of events
  - Number of communications
  - ...
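Two of the analysis functions listed on this slide can be sketched over a piecewise-constant semantic function. This is a minimal illustration, assuming the semantic output for one object is a list of `(time_start, time_end, value)` segments, that a "burst" is a segment with a non-zero value, and that "if within range" means the burst lies entirely inside the selected interval. The function names are hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[float, float, float]  # (time_start, time_end, semantic value)


def average_value(segments: List[Segment], t0: float, t1: float) -> float:
    """Time-weighted average of the semantic value over [t0, t1)."""
    total = 0.0
    for start, end, value in segments:
        overlap = min(end, t1) - max(start, t0)
        if overlap > 0:
            total += value * overlap
    return total / (t1 - t0)


def bursts_within(segments: List[Segment], t0: float, t1: float) -> List[float]:
    """Durations of non-zero segments lying entirely within [t0, t1]."""
    return [end - start for start, end, value in segments
            if value != 0 and start >= t0 and end <= t1]


segments = [(0, 4, 1), (4, 6, 0), (6, 10, 1), (10, 12, 0)]

avg = average_value(segments, 2, 10)        # (2*1 + 2*0 + 4*1) / 8
durations = bursts_within(segments, 2, 10)  # the first burst starts before t0
```

The number of bursts is `len(durations)`, and the average duration and variance follow from the same list.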

OpenMP instrumentation
- Compiler instrumentation
  - NANOS compiler
- Dynamic interception
  - SGI native OpenMP (MP library)
- Tracing of thread status
  - running
  - idle (busy wait)
  - scheduling
  - blocked

OpenMP analysis
- Application structure
  - Stamping code

OpenMP analysis
- Loop scheduling
  - Antenna design

OpenMP analysis

OpenMP analysis
How do bees see flowers?

OpenMP analysis

OpenMP analysis

OpenMP analysis
What bees don’t see

Function            A       B       C       D
Av. L2 misses/ms    62      52      163     14
FLOPS/ms            41 K    21 K    8 K     1 K
Loads/ms            57 K    52 K    18 K    100 K
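The per-millisecond counter rates in this table can be combined into ratios that are easier to compare across functions. The sketch below assumes "K" means a factor of 1000. The two derived metrics (FLOPs per load and L2 misses per thousand loads) are illustrative choices that do not appear on the slide.

```python
# Counter rates per millisecond for functions A-D, copied from the table.
# Assumption: "K" denotes a factor of 1000.
l2_misses_per_ms = {"A": 62, "B": 52, "C": 163, "D": 14}
flops_per_ms = {"A": 41_000, "B": 21_000, "C": 8_000, "D": 1_000}
loads_per_ms = {"A": 57_000, "B": 52_000, "C": 18_000, "D": 100_000}

# Derived ratios (not on the slide): arithmetic work and L2 misses per load.
flops_per_load = {
    fn: flops_per_ms[fn] / loads_per_ms[fn] for fn in flops_per_ms
}
misses_per_kload = {
    fn: 1000 * l2_misses_per_ms[fn] / loads_per_ms[fn] for fn in loads_per_ms
}

for fn in sorted(flops_per_load):
    print(f"{fn}: {flops_per_load[fn]:.2f} FLOPs/load, "
          f"{misses_per_kload[fn]:.2f} L2 misses per 1000 loads")
```

Under these assumptions, function C has the highest miss ratio per load and function D performs the fewest floating-point operations per load.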

Static vs. Dynamic Parallelism

More on hardware counters
Fewer misses, more time

More on hardware counters
- More memory accesses per second
- Fewer coherence state changes

MPI + OpenMP
NAS FT
Quantitative data:
- % MPI collective comm: 18%
- % OMP fork/join: 5%
- % non parallelized: 32%
- Avg. || loop: 50 ms
- # || loops: 38
- # || loops < 5 ms: 6

Other uses
- System activity
- InfoPerfex
  - Average: 33 MFLOPS
  - Peak: 60 MFLOPS
- Pthreads

Paraver on IBM
- DPCL + PAPI:
  - Sequential programs
  - OpenMP
- UTE
  - MPI + OpenMP

UTE Paraver
- Filter
  - Thread states
    - Executing application code
    - Executing MPI Receive
    - Executing MPI Send
    - Descheduled
- Statistics

UTE Analysis
- Communication pattern
  - Exchanges 1 <-> 2; 3 <-> 4
- Load balance
  - More load on thread 1
- MPI implementation
  - Busy wait on receives
- Scheduling
  - Threads 2 and 3 time sharing one CPU
  - Thread 4 time sharing one CPU with other processes
  - OS quantum: 10 ms

More information
- http://www.cepba.upc.es/paraver
- cepbatools@cepba.upc.es