CSCE 713 Advanced Computer Architecture Lecture 4 Timing

  • Slides: 38

CSCE 713 Advanced Computer Architecture
Lecture 4: Timing
Linux Processes & Threads
Topics
  • …
Readings
January 19, 2012

Overview
Last Time
  • POSIX Pthreads: create, join, exit, mutexes
  • /class/csce713-006 Code and Data
Readings for today
  • http://csapp.cs.cmu.edu/public/1e/public/ch9-preview.pdf
New
  • Website alive and kicking; dropbox too!
  • From last time: Gauss-Seidel Method, Barriers, Threads Assignment
Next time: performance evaluation, barriers and MPI intro
– 2 – CSCE 713 Spring 2012

Threads Programming Assignment
1. Matrix addition (embarrassingly parallel)
2. Versions
   a. Sequential with blocking factor
   b. Sequential
   c. Read without conversions
   d. Multithreaded, passing the number of threads as a command-line argument (args.c code should be distributed as an example)
3. Plot of several runs
4. Next time
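The multithreaded version can be sketched roughly as follows. This is only an illustration, not the course's args.c: the names add_job, add_rows, parse_thread_count and matrix_add_threaded are hypothetical, matrices are assumed to be row-major arrays of double, and rows are split into contiguous bands, one band per thread.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical work descriptor: one thread adds rows [row_begin, row_end). */
struct add_job {
    const double *a, *b;
    double *c;
    size_t cols;
    size_t row_begin, row_end;
};

static void *add_rows(void *arg)
{
    struct add_job *job = arg;
    for (size_t i = job->row_begin; i < job->row_end; i++)
        for (size_t j = 0; j < job->cols; j++) {
            size_t k = i * job->cols + j;
            job->c[k] = job->a[k] + job->b[k];
        }
    return NULL;
}

/* Read the thread count from argv[1]; fall back to 1 on missing or bad input.
   The upper limit of 1024 is an arbitrary assumption. */
int parse_thread_count(int argc, char *argv[])
{
    if (argc < 2) return 1;
    char *end;
    long n = strtol(argv[1], &end, 10);
    if (end == argv[1] || *end != '\0' || n < 1 || n > 1024) return 1;
    return (int)n;
}

/* c = a + b for row-major rows x cols matrices, split over nthreads threads.
   Returns 0 on success, -1 on failure. */
int matrix_add_threaded(const double *a, const double *b, double *c,
                        size_t rows, size_t cols, int nthreads)
{
    if (nthreads < 1) return -1;
    pthread_t *tid = malloc(nthreads * sizeof *tid);
    struct add_job *job = malloc(nthreads * sizeof *job);
    if (!tid || !job) { free(tid); free(job); return -1; }

    int started = 0, rc = 0;
    for (int t = 0; t < nthreads; t++) {
        /* Thread t gets the t-th contiguous band of rows. */
        job[t] = (struct add_job){ a, b, c, cols,
                                   rows * t / nthreads,
                                   rows * (t + 1) / nthreads };
        if (pthread_create(&tid[t], NULL, add_rows, &job[t]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }
    for (int t = 0; t < started; t++)
        pthread_join(tid[t], NULL);   /* wait for every thread that started */
    free(tid);
    free(job);
    return rc;
}
```

A main() would call parse_thread_count(argc, argv) and pass the result to matrix_add_threaded. No mutex is needed because each thread writes a disjoint band of c.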

Time in the Computer World
Clock cycle time = ?
A 1 GHz processor has a clock cycle time of 10^-9 s = 1 ns.
(Source: Computer Systems: A Programmer's Perspective, Bryant and O'Hallaron)

Times in the Unix World
  • Command
  • Real Time
  • User Time
  • System Time
  • Waiting Time – time waiting on I/O operations and while other processes execute
  • Clock Time
Which time is most important?
Latency vs throughput

The Time Command
TIME(1)
NAME
    time - run programs and summarize system resource usage

Example of a sleepy program
    real 3m10.006s
    user 0m0.004s
    sys  0m0.000s

MIPS, MFLOPS, Benchmarks
SPEC 2006 – relevance to this course

Multitasking and Interval Interrupts
Recall multitasking from OS class:
  • Multiple processes are ready to go
  • The scheduler decides who goes next
  • Each process is given a small time slice (1-10 ms)
  • A process stops running either because of an I/O operation, a page fault, even a cache miss, or because it uses up its time slice
  • The system has an "interval timer" that interrupts at the end of the time slice

CSAPP Fig 9.2.
(Source: Computer Systems: A Programmer's Perspective, Bryant and O'Hallaron)

Measuring Time by Interval Counting.
(Source: Computer Systems: A Programmer's Perspective, Bryant and O'Hallaron)

Practice Problem 9.4
On a system with a timer interval of 10 ms, some segment of process A is recorded as requiring 70 ms, combining both system and user time. What are the minimum and maximum actual times used by this segment?
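One way to reason about the bounds, assuming the segment runs without being preempted and that each timer interrupt charges a full 10 ms to whichever process is running at that instant: a recorded 70 ms means exactly 7 interrupts landed inside the segment. The shortest such segment starts just before the first of them and ends just after the seventh. The longest starts just after an interrupt and ends just before the eighth one that follows.

```latex
\[
  \text{recorded} = 7 \times 10\,\text{ms} = 70\,\text{ms}
  \quad\Longrightarrow\quad
  (7-1) \times 10\,\text{ms} \;<\; t_{\text{actual}} \;<\; (7+1) \times 10\,\text{ms},
  \qquad\text{i.e.}\qquad
  60\,\text{ms} < t_{\text{actual}} < 80\,\text{ms}.
\]
```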

Programmer Access to Interval Timers
    #include <sys/times.h>

    struct tms {
        clock_t tms_utime;   /* user time */
        clock_t tms_stime;   /* system time */
        clock_t tms_cutime;  /* user time of reaped children */
        clock_t tms_cstime;  /* system time of reaped children */
    };

    clock_t times(struct tms *buf);

Returns: number of clock ticks elapsed since system started
(Source: Computer Systems: A Programmer's Perspective, Bryant and O'Hallaron)
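A minimal sketch of how times() might be used to charge CPU time to a piece of code. The helpers ticks_to_seconds, cpu_seconds_used and busy_work are hypothetical names introduced here; the tick rate is obtained from sysconf(_SC_CLK_TCK), which is the POSIX way to convert clock_t ticks from times() into seconds.

```c
#include <sys/times.h>
#include <unistd.h>

/* Convert clock ticks to seconds using the system tick rate. */
double ticks_to_seconds(clock_t ticks)
{
    long hz = sysconf(_SC_CLK_TCK);   /* ticks per second, commonly 100 */
    return hz > 0 ? (double)ticks / (double)hz : 0.0;
}

/* Hypothetical workload, only here so there is something to measure. */
void busy_work(void)
{
    volatile unsigned long sink = 0;
    for (unsigned long i = 0; i < 5000000UL; i++)
        sink += i;
}

/* User + system CPU seconds consumed by this process while fn ran. */
double cpu_seconds_used(void (*fn)(void))
{
    struct tms before, after;
    times(&before);
    fn();
    times(&after);
    clock_t used = (after.tms_utime - before.tms_utime)
                 + (after.tms_stime - before.tms_stime);
    return ticks_to_seconds(used);
}
```

Because the result is a whole number of ticks, a short workload can easily report 0.0 seconds; that coarse granularity is what interval counting implies.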

Measuring interval counting accuracy. Fig 9.8

Cycle Counters on the IA32
  • Cycle counters increment every clock cycle
  • A 32-bit counter with a 1 GHz clock wraps after 2^32/10^9 ≈ 4.3 seconds
  • A 64-bit counter takes a lot longer: 2^64/10^9 s ≈ 1.8 × 10^10 s, about 585 years
  • The IA32 counter is accessed with the "rdtsc" (read time stamp counter) instruction
  • It places the high-order 32 bits in %edx and the low 32 bits in %eax
It would be nice if we had a C interface:
    void access_counter(unsigned *hi, unsigned *lo);
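The wrap-around arithmetic, and the step of gluing the two 32-bit halves into one value, can be checked with a few lines of C. This is a sketch with hypothetical helper names (wrap_seconds, combine_counter). With a 365.25-day year, the 64-bit counter at 1 GHz comes out at roughly 585 years.

```c
#include <stdint.h>

/* Seconds until a counter of the given width wraps at clock rate hz. */
double wrap_seconds(unsigned bits, double hz)
{
    double counts = 1.0;
    for (unsigned i = 0; i < bits; i++)
        counts *= 2.0;            /* builds 2^bits; powers of two are exact in double */
    return counts / hz;
}

/* Combine the high (%edx) and low (%eax) halves into one 64-bit count. */
uint64_t combine_counter(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}
```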

Including Assembly Code in C
    void access_counter(unsigned *hi, unsigned *lo)
    {
        asm("rdtsc; movl %%edx,%0; movl %%eax,%1"  /* Read cycle counter */
            : "=r" (*hi), "=r" (*lo)               /* and move results to the two outputs */
            : /* No input */
            : "%edx", "%eax");
    }

Closer Look at Extended ASM
    asm("Instruction String"
        : Output List
        : Input List
        : Clobbers List);

    void access_counter(unsigned *hi, unsigned *lo)
    {
        /* Get cycle counter */
        asm("rdtsc; movl %%edx,%0; movl %%eax,%1"
            : "=r" (*hi), "=r" (*lo)
            : /* No input */
            : "%edx", "%eax");
    }

Instruction String
  • Series of assembly commands
      - Separated by ";" or "\n"
      - Use "%%" where normally would use "%"

Closer Look at Extended ASM: Output List
  • Expressions indicating destinations for values %0, %1, …, %j
      - Enclosed in parentheses
      - Must be an lvalue (a value that can appear on the LHS of an assignment)
  • Tag "=r" indicates that the symbolic value (%0, etc.) should be replaced by a register

Closer Look at Extended ASM: Input List
  • Series of expressions indicating sources for values %j+1, %j+2, …
      - Enclosed in parentheses
      - Any expression returning a value
  • Tag "r" indicates that the symbolic value (%0, etc.) will come from a register

Closer Look at Extended ASM: Clobbers List
  • List of register names that get altered by the assembly instructions
  • The compiler will make sure it doesn't store something in one of these registers that must be preserved across the asm
      - That is, a value set before the asm and used after it

Using access_counter
    #include "clock.h"

    void start_counter();
    double get_counter();
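One possible shape for the two functions declared in clock.h is sketched below. It is not the CSAPP implementation. It assumes GCC or Clang, binds the rdtsc results directly with the "=a" and "=d" register constraints instead of copying them out with movl, and keeps the count in a single 64-bit value. The helper read_ticks is a hypothetical name, and the clock_gettime branch is only an assumed fallback so the sketch also builds on non-x86 machines, where it counts nanoseconds rather than cycles.

```c
#include <stdint.h>
#include <time.h>

/* Read a 64-bit tick count: the TSC on x86, nanoseconds elsewhere. */
static uint64_t read_ticks(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    uint32_t hi, lo;
    /* rdtsc leaves the low half in %eax and the high half in %edx;
       "=a" and "=d" tie the outputs to exactly those registers. */
    __asm__ volatile("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);   /* assumption: POSIX fallback */
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
#endif
}

static uint64_t start_ticks;

/* Record the current value of the counter. */
void start_counter(void)
{
    start_ticks = read_ticks();
}

/* Ticks elapsed since the last call to start_counter(). */
double get_counter(void)
{
    return (double)(read_ticks() - start_ticks);
}
```

Typical use is start_counter(); run the code under test; cycles = get_counter();. The count includes any cycles spent in other processes or in the kernel during that window.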

The K-Best Measurement Scheme
Measurements of process time on a loaded system are always overestimates
  • Context switching, cache operations, branch predictions
  • Interference from other processes
Even on an unloaded system, I/O operations will vary because of the location of disk heads, etc.
Heisenberg uncertainty principle
  • Heisenbugs – bugs influenced/created by monitoring code

The K-Best Measurement Scheme
1. Warm up the instruction cache
2. Record the K fastest times
3. If these agree to within some tolerance, say 0.1%, then the fastest of them represents the true execution time.
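The bookkeeping behind steps 2 and 3 can be sketched as a small sorted array of the K fastest samples plus a convergence test. The names kbest, kbest_add and kbest_converged are hypothetical, and K = 3 is an arbitrary choice for illustration. The caller is assumed to time the code repeatedly (after a warm-up run) and feed each measurement to kbest_add until kbest_converged reports true or a retry limit is hit.

```c
#include <stddef.h>

#define KBEST_K 3   /* assumption: K = 3 for illustration */

struct kbest {
    double best[KBEST_K];   /* fastest times seen so far, ascending */
    size_t count;           /* number of filled slots (<= KBEST_K) */
};

void kbest_init(struct kbest *kb)
{
    kb->count = 0;
}

/* Insert a measurement, keeping only the K smallest in sorted order. */
void kbest_add(struct kbest *kb, double t)
{
    size_t pos;
    if (kb->count < KBEST_K) {
        pos = kb->count++;               /* still filling the array */
    } else if (t < kb->best[KBEST_K - 1]) {
        pos = KBEST_K - 1;               /* replace the slowest of the K */
    } else {
        return;                          /* not among the K fastest */
    }
    /* Shift larger entries up so the array stays ascending. */
    while (pos > 0 && kb->best[pos - 1] > t) {
        kb->best[pos] = kb->best[pos - 1];
        pos--;
    }
    kb->best[pos] = t;
}

/* True once K samples exist and the K-th fastest is within eps of the fastest. */
int kbest_converged(const struct kbest *kb, double eps)
{
    return kb->count == KBEST_K &&
           kb->best[KBEST_K - 1] <= (1.0 + eps) * kb->best[0];
}
```

With eps = 0.001 (0.1%), the reported time would be kb.best[0] once the test passes.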

Measuring Using Time of Day
Code/CSAPP/perf/tod.c
    #include <stdlib.h>
    #include <stdio.h>
    #include <time.h>
    #include <sys/time.h>
    #include <unistd.h>

    static struct timeval tstart;

    /* Record current time */
    void start_timer()
    {
        gettimeofday(&tstart, NULL);
    }

    /* Get number of seconds since last call to start_timer */
    double get_timer()
    {
        struct timeval tfinish;
        long sec, usec;

        gettimeofday(&tfinish, NULL);
        sec = tfinish.tv_sec - tstart.tv_sec;
        usec = tfinish.tv_usec - tstart.tv_usec;  /* may be negative; the sum is still correct */
        return sec + 1e-6*usec;
    }

    /* Determine how many "seconds" are in tod counter */
    static void callibrate_tod()
    {
        double quick, slow;

        start_timer();
        quick = get_timer();   /* overhead of an immediate read */
        start_timer();
        sleep(1);
        slow = get_timer();    /* elapsed time across a 1-second sleep */
        printf("%.2f - %f = %.2f seconds/sleep second\n",
               slow, quick, slow-quick);
    }

    static void run_timer()
    {
        int i = 0;
        double t1, t2, t3, t4;

        start_timer();
        t1 = get_timer();
        /* Spin until the clock value changes once ... */
        do {
            t2 = get_timer();
        } while (t2 == t1);
        /* ... then count calls until it changes again: one clock tick */
        do {
            t3 = get_timer();
            i++;
        } while (t3 == t2);
        printf("Time = %f usecs, in %d iterations, %f usec/iteration\n",
               1e6*(t3-t2), i, 1e6*(t3-t2)/i);

        /* Count how many calls fit into one second */
        start_timer();
        i = 0;
        do {
            t4 = get_timer();
            i++;
        } while (t4 < 1.0);
        printf("%d iterations in %f secs = %f usec/iteration\n",
               i, t4, 1e6*t4/i);
    }

    static void check_time()
    {
        long e;
        int s, m, h, d, y;

        start_timer();
        e = tstart.tv_sec;          /* seconds since the clock's epoch */
        s = e % 60;  e = e / 60;
        m = e % 60;  e = e / 60;
        h = e % 24;  e = e / 24;
        d = e % 365; e = e / 365;   /* ignores leap years */
        y = e;
        printf("This clock started %d years, %d days, %d hours, %d minutes, %d seconds ago\n",
               y, d, h, m, s);
    }

    int main(int arg, char *argv[])
    {
        callibrate_tod();
        run_timer();
        check_time();
        return 0;
    }

Gprof – profiling Linux programs
Links
  • https://computing.llnl.gov/tutorials/performance_tools/
  • http://www.cs.utah.edu/dept/old/texinfo/as/gprof_toc.html
  • http://www.cs.utah.edu/dept/old/texinfo/as/gprof.html
  • http://en.wikipedia.org/wiki/Profiling_%28computer_programming%29
  • http://en.wikipedia.org/wiki/List_of_performance_analysis_tools
  • http://docs.freebsd.org/44doc/psd/18.gprof/paper.pdf
  • http://linuxgazette.net/100/vinayak.html
  • http://ececmpsysweb.groups.et.byu.net/cmpsys.2004.winter/citizenship/Bryan_Wheeler/Profiling_Tutorial.html

Man gprof
GPROF(1)                         GNU                         GPROF(1)
NAME
    gprof - display call graph profile data
SYNOPSIS
    gprof [ -[abcDhilLrsTvwxyz] ] [ -[ACeEfFJnNOpPqQZ][name] ]
          [ -I dirs ] [ -d[num] ] [ -k from/to ] …]
          [ --[no-]exec-counts[=name] ] …
          [ --debug[=level] ] [ --function-ordering ] …
          [ --file-info ] [ --help ] [ --line ] [ --min-count=n ] …
          [ --static-call-graph ] [ --sum ] [ --table-length=len ] …
          [ image-file ] [ profile-file ... ]

Gprof – GNU profiler
http://www.unix-tutorials.com/go.php?id=321 - Arnout Engelen

    $ time event2dot
    // this took more than 3 minutes on this input:
    real 3m36.316s
    user 0m55.590s
    sys  0m1.070s

    $ g++ -pg dotgen.cpp readfile.cpp main.cpp graph.cpp config.cpp -o event2dot
    $ gprof event2dot | less

gprof now shows us the following functions are important (cells missing from the slide are left blank):

      %    cumulative    self                  self     total
     time    seconds    seconds      calls    s/call   s/call   name
    43.32     46.03                339952989    0.00     0.00   CompareNodes(…)
    25.06     72.66      26.63         55000    0.00            getNode(…)
    16.80     90.51      17.85     339433374    0.00            CompareEdges(…)
    12.70    104.01      13.50         51987    0.00            addAnnotatedEdge(…)
     1.98    106.11       2.10                                  addEdge(…)

Valgrind
Valgrind is a memory mismanagement detector. It reports memory leaks and deallocation errors, and it can also do cache profiling.
Memcheck can detect:
  • Use of uninitialised memory
  • Reading/writing memory after it has been free'd
  • Reading/writing off the end of malloc'd blocks
  • Reading/writing inappropriate areas on the stack
  • Memory leaks, where pointers to malloc'd blocks are lost forever
  • Mismatched use of malloc/new/new[] vs free/delete/delete[]
  • Overlapping src and dst pointers in memcpy() and related functions
  • Some misuses of the POSIX pthreads API
(Source: http://cs.ecs.baylor.edu/~donahoo/tools/valgrind/)

General Parallel Computing Wiki-Links
  • http://en.wikipedia.org/wiki/Category:Parallel_computing
  • http://en.wikipedia.org/wiki/Intel_Parallel_Studio
  • http://www.itservices.hku.hk/sp2/workshop/html/ibmhwsw/ibmhwsw.html#power

Libraries
  • http://static.msi.umn.edu/tutorial/scicomp/sp/mathnumerical-lib-SP/num_lib.html
