CSCE 713 Advanced Computer Architecture Lecture 4 Timing
- Slides: 38
Linux Processes & Threads
Topics
• …
Readings
January 19, 2012
Overview
Last Time
• POSIX Pthreads: create, join, exit, mutexes
• /class/csce713-006 Code and Data
Readings for today
• http://csapp.cs.cmu.edu/public/1e/public/ch9-preview.pdf
New
• Website alive and kicking; dropbox too!
• From last time: Gauss-Seidel Method, Barriers, Threads Assignment
Next time
• Performance evaluation, barriers and MPI intro
– 2 – CSCE 713 Spring 2012
Threads Programming Assignment
1. Matrix addition (embarrassingly parallel)
2. Versions
   a. Sequential
   b. Sequential with blocking factor
   c. Read without conversions
   d. Multithreaded, passing the number of threads as a command-line argument (args.c code should be distributed as an example)
3. Plot of several runs
4. Next time
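As a rough illustration of the multithreaded version, here is a minimal sketch that splits the rows of a row-major matrix evenly across a caller-chosen number of Pthreads. The names `add_job`, `add_rows`, and `matrix_add_threads` are hypothetical and are not taken from the course's args.c; the row-wise split is only one of several reasonable decompositions.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical work descriptor: one thread adds rows [row_begin, row_end). */
struct add_job {
    const double *a, *b;
    double *c;
    size_t cols, row_begin, row_end;
};

/* Thread body: element-wise addition over this thread's band of rows. */
static void *add_rows(void *arg)
{
    struct add_job *job = arg;
    for (size_t i = job->row_begin; i < job->row_end; i++)
        for (size_t j = 0; j < job->cols; j++)
            job->c[i * job->cols + j] =
                job->a[i * job->cols + j] + job->b[i * job->cols + j];
    return NULL;
}

/* c = a + b for rows x cols row-major matrices, using nthreads threads.
   Returns 0 on success, -1 on bad arguments or if a thread could not start. */
int matrix_add_threads(const double *a, const double *b, double *c,
                       size_t rows, size_t cols, size_t nthreads)
{
    if (nthreads == 0)
        return -1;
    pthread_t *tids = malloc(nthreads * sizeof *tids);
    struct add_job *jobs = malloc(nthreads * sizeof *jobs);
    if (tids == NULL || jobs == NULL) {
        free(tids);
        free(jobs);
        return -1;
    }
    size_t started = 0;
    int status = 0;
    for (size_t t = 0; t < nthreads; t++) {
        /* Bands are contiguous and disjoint, so no mutex is needed. */
        jobs[t] = (struct add_job){ a, b, c, cols,
                                    rows * t / nthreads,
                                    rows * (t + 1) / nthreads };
        if (pthread_create(&tids[t], NULL, add_rows, &jobs[t]) != 0) {
            status = -1;
            break;
        }
        started++;
    }
    for (size_t t = 0; t < started; t++)
        pthread_join(tids[t], NULL);
    free(tids);
    free(jobs);
    return status;
}
```

A driver would typically parse the thread count from `argv[1]` (for example with `strtol`) and pass it as `nthreads`; on older toolchains the program must be linked with `-pthread`.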
Time in the Computer World
Clock cycle time = ?
A 1 GHz processor has a clock cycle time of 10^-9 s = 1 ns
Source: Computer Systems: A Programmer's Perspective, Bryant and O'Hallaron
Times in the Unix World
• Command
• Real Time
• User Time
• System Time
• Waiting Time: time spent waiting on I/O operations and while other processes execute
• Clock Time
Which time is most important? Latency vs. throughput
The Time Command
TIME(1)
NAME
    time - run programs and summarize system resource usage
Example of a sleepy program
    real 3m10.006s
    user 0m0.004s
    sys  0m0.000s
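The sleepy program's output shows real time far exceeding user plus system time, because a sleeping process waits without consuming CPU. The following sketch reproduces that effect inside a single function; `nap_times` and `sleepy` are hypothetical names, and wall-clock time is read with `gettimeofday` while CPU time is read with the standard `clock()`.

```c
#include <stddef.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical result pair: elapsed wall-clock and CPU seconds. */
struct nap_times {
    double real_s;
    double cpu_s;
};

/* Sleep for the requested number of seconds and report both clocks. */
struct nap_times sleepy(unsigned seconds)
{
    struct timeval wall_start, wall_end;
    struct nap_times result;

    gettimeofday(&wall_start, NULL);
    clock_t cpu_start = clock();

    sleep(seconds);                 /* waiting: almost no CPU is used */

    clock_t cpu_end = clock();
    gettimeofday(&wall_end, NULL);

    result.real_s = (double)(wall_end.tv_sec - wall_start.tv_sec)
                  + 1e-6 * (double)(wall_end.tv_usec - wall_start.tv_usec);
    result.cpu_s = (double)(cpu_end - cpu_start) / CLOCKS_PER_SEC;
    return result;
}
```

Calling `sleepy(1)` should report roughly one second of real time and close to zero CPU time, mirroring the gap between `real` and `user`/`sys` in the `time` output.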
MIPS, MFLOPS, Benchmarks
SPEC 2006: relevance to this course
Multitasking and Interval Interrupts
Recall multitasking from OS class:
• Multiple processes are ready to go
• The scheduler decides who goes next
• Each process is given a small time slice (1-10 ms)
• A process stops running either because of an I/O operation, a page fault, even a cache miss, or because it uses up its time slice
• The system has an "interval timer" that interrupts at the end of the time slice
CSAPP Fig 9.2
Source: Computer Systems: A Programmer's Perspective, Bryant and O'Hallaron
Measuring Time by Interval Counting
Source: Computer Systems: A Programmer's Perspective, Bryant and O'Hallaron
Practice Problem 9.4
On a system with a timer interval of 10 ms, some segment of process A is recorded as requiring 70 ms, combining both system and user time. What are the minimum and maximum actual times used by this segment?
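One way to reason about this problem, assuming the segment runs without being descheduled: a recording of 70 ms means 7 timer interrupts arrived while the segment was running. If the segment starts just before an interrupt and ends just after the seventh, it ran for only slightly more than 6 intervals; if it starts just after an interrupt and ends just before the next one following the seventh, it ran for slightly less than 8 intervals. The sketch below encodes those open bounds; `interval_bounds` and `tick_bounds` are hypothetical names.

```c
/* Hypothetical pair of open bounds on the true running time, in ms. */
struct tick_bounds {
    double min_ms;   /* actual time is strictly greater than this */
    double max_ms;   /* actual time is strictly less than this    */
};

/* Bounds on actual time when `ticks` timer interrupts were charged
   to a segment and the timer period is `interval_ms`. */
struct tick_bounds interval_bounds(int ticks, double interval_ms)
{
    struct tick_bounds b;
    /* ticks interrupts span at least (ticks - 1) full intervals ... */
    b.min_ms = ticks > 0 ? (ticks - 1) * interval_ms : 0.0;
    /* ... and fit inside fewer than (ticks + 1) intervals. */
    b.max_ms = (ticks + 1) * interval_ms;
    return b;
}
```

For the practice problem, `interval_bounds(7, 10.0)` gives bounds of 60 ms and 80 ms: the actual time lies strictly between them.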
Programmer Access to Interval Timers
#include <sys/times.h>

struct tms {
    clock_t tms_utime;   /* user time */
    clock_t tms_stime;   /* system time */
    clock_t tms_cutime;  /* user time of reaped children */
    clock_t tms_cstime;  /* system time of reaped children */
};

clock_t times(struct tms *buf);
Returns: number of clock ticks elapsed since system started
Source: Computer Systems: A Programmer's Perspective, Bryant and O'Hallaron
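A minimal sketch of how `times` might be used to charge CPU time to a piece of work. The helper names `cpu_seconds` and `spin` are hypothetical; ticks are converted to seconds with `sysconf(_SC_CLK_TCK)`, which is the POSIX way to obtain the tick rate.

```c
#include <sys/times.h>
#include <unistd.h>

/* Hypothetical helper: user + system CPU seconds consumed by work().
   Returns -1.0 if the tick rate or the timer cannot be read. */
double cpu_seconds(void (*work)(void))
{
    struct tms before, after;
    long ticks_per_sec = sysconf(_SC_CLK_TCK);

    if (ticks_per_sec <= 0)
        return -1.0;
    if (times(&before) == (clock_t)-1)
        return -1.0;

    work();

    if (times(&after) == (clock_t)-1)
        return -1.0;

    clock_t used = (after.tms_utime - before.tms_utime)
                 + (after.tms_stime - before.tms_stime);
    return (double)used / (double)ticks_per_sec;
}

/* Hypothetical CPU-bound workload for demonstration. */
static void spin(void)
{
    volatile unsigned long sum = 0;
    for (unsigned long i = 0; i < 50000000UL; i++)
        sum += i;
}
```

Because the result is a whole number of ticks, `cpu_seconds(spin)` can only resolve time to the interval-timer granularity, which is the limitation the interval-counting slides describe.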
Measuring interval counting accuracy. Fig 9.8
Cycle Counters on the IA32
• Cycle counters increment every clock cycle
• A 32-bit counter with a 1 GHz clock wraps after 2^32 / 10^9 ≈ 4.3 seconds
• A 64-bit counter takes a lot longer: 2^64 / 10^9 seconds ≈ 585 years
• The IA32 counter is accessed with the "rdtsc" (read time stamp counter) instruction
• It places the high-order 32 bits in %edx and the low-order 32 bits in %eax
• It would be nice if we had a C interface:
  void access_counter(unsigned *hi, unsigned *lo);
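The wrap-around arithmetic can be checked with a few lines of C. This is only a sketch; `wrap_seconds` and `seconds_to_years` are hypothetical helper names, and a year is taken as 365 days.

```c
/* Seconds until a counter of the given width wraps at clock_hz. */
double wrap_seconds(unsigned bits, double clock_hz)
{
    double counts = 1.0;
    for (unsigned i = 0; i < bits; i++)
        counts *= 2.0;              /* 2^bits distinct counter values */
    return counts / clock_hz;
}

/* Convert seconds to 365-day years. */
double seconds_to_years(double seconds)
{
    return seconds / (365.0 * 24.0 * 3600.0);
}
```

For example, `wrap_seconds(32, 1e9)` gives the 32-bit wrap time in seconds, and `seconds_to_years(wrap_seconds(64, 1e9))` gives the 64-bit wrap time in years.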
Including Assembly Code in C
void access_counter(unsigned *hi, unsigned *lo)
{
    asm("rdtsc; movl %%edx,%0; movl %%eax,%1"  /* Read cycle counter */
        : "=r" (*hi), "=r" (*lo)               /* and move results to the two outputs */
        : /* No input */
        : "%edx", "%eax");
}
Closer Look at Extended ASM
asm("Instruction String"
    : Output List
    : Input List
    : Clobbers List);

void access_counter(unsigned *hi, unsigned *lo)
{
    /* Get cycle counter */
    asm("rdtsc; movl %%edx,%0; movl %%eax,%1"
        : "=r" (*hi), "=r" (*lo)
        : /* No input */
        : "%edx", "%eax");
}

Instruction String
• Series of assembly commands
  - Separated by ";" or "\n"
  - Use "%%" where you would normally use "%"
Closer Look at Extended ASM: Output List
• Expressions indicating destinations for values %0, %1, …, %j
  - Enclosed in parentheses
  - Must be an lvalue (a value that can appear on the LHS of an assignment)
• Tag "=r" indicates that the symbolic value (%0, etc.) should be replaced by a register
Closer Look at Extended ASM: Input List
• Series of expressions indicating sources for values %j+1, %j+2, …
  - Enclosed in parentheses
  - Any expression returning a value
• Tag "r" indicates that the symbolic value (%0, etc.) will come from a register
Closer Look at Extended ASM: Clobbers List
• List of register names that get altered by the assembly instructions
• The compiler will make sure it doesn't store something in one of these registers that must be preserved across the asm
  - Value set before & used after
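The `access_counter` example has an empty input list, so here is a small sketch that exercises output, input, and clobber lists together. The function `add_asm` is hypothetical and purely illustrative; the assembly path is compiled only on x86 targets with a GCC-compatible compiler, and other targets fall back to plain C.

```c
/* Hypothetical example: add two unsigned values with inline assembly. */
unsigned add_asm(unsigned a, unsigned b)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned result;
    __asm__("movl %1, %0; addl %2, %0"
            : "=&r" (result)      /* %0: output; '&' because it is written
                                     before input %2 has been read */
            : "r" (a), "r" (b)    /* %1, %2: inputs, each in a register */
            : "cc");              /* addl modifies the condition codes */
    return result;
#else
    return a + b;                 /* portable fallback on non-x86 targets */
#endif
}
```

Operand numbering continues from the output list into the input list, which is why the inputs are `%1` and `%2` here.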
Using access_counter
#include "clock.h"
void start_counter();
double get_counter();
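A possible implementation of this two-function interface, offered as a sketch rather than the contents of the course's clock.c. It assumes a GCC-compatible compiler; `read_cycles` is a hypothetical helper that uses `rdtsc` on x86 and, as a labeled stand-in, falls back to `clock()` ticks elsewhere. On many modern x86 processors the time stamp counter advances at a constant rate rather than once per core clock cycle, so the result is best read as "ticks".

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: read a 64-bit tick count. */
static uint64_t read_cycles(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned hi, lo;
    /* "=a" and "=d" bind the outputs directly to %eax and %edx,
       so no extra moves or clobber entries are needed. */
    __asm__ volatile("rdtsc" : "=a" (lo), "=d" (hi));
    return ((uint64_t)hi << 32) | lo;
#else
    return (uint64_t)clock();     /* assumption: fallback for non-x86 */
#endif
}

static uint64_t start_cycles;

/* Record the current value of the counter. */
void start_counter(void)
{
    start_cycles = read_cycles();
}

/* Number of ticks since the last call to start_counter. */
double get_counter(void)
{
    return (double)(read_cycles() - start_cycles);
}
```

Typical use is to call `start_counter()`, run the code being measured, and then read `get_counter()`.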
The K-Best Measurement Scheme
• Measurements of process time in a loaded system are always overestimates
  - Context switching, cache operations, branch predictions
  - Interference from other processes
• Even on an unloaded system, I/O operations will vary because of the locations of disk heads etc.
• Heisenberg uncertainty principle
• Heisenbugs: bugs influenced/created by monitoring code
The K-Best Measurement Scheme
1. Warm up the instruction cache
2. Record the K fastest times
3. If these agree within some tolerance, say 0.1%, then the fastest of these represents the true execution time.
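The selection step of the scheme can be sketched as follows. The function `k_best` is a hypothetical name: it scans measurements in the order they were taken, keeps the K smallest in sorted order, and stops as soon as the K-th fastest is within the tolerance of the fastest. Taking the measurements (and warming the cache) is left to the caller, and the cap of 16 on K is an arbitrary assumption of this sketch.

```c
#include <stddef.h>

#define K_BEST_MAX 16   /* assumption: upper limit on K for this sketch */

/* Returns the fastest time once the k smallest samples seen so far agree
   to within `tolerance` (0.001 means 0.1%), or -1.0 if they never do. */
double k_best(const double *samples, size_t n, size_t k, double tolerance)
{
    double best[K_BEST_MAX];      /* k smallest so far, ascending */
    size_t kept = 0;

    if (k == 0 || k > K_BEST_MAX || n < k)
        return -1.0;

    for (size_t i = 0; i < n; i++) {
        double t = samples[i];
        size_t pos;

        if (kept < k)
            pos = kept++;         /* still filling the table */
        else if (t < best[k - 1])
            pos = k - 1;          /* replaces the current k-th fastest */
        else
            continue;             /* too slow to matter */

        /* Insertion step: shift larger entries up to keep the order. */
        while (pos > 0 && best[pos - 1] > t) {
            best[pos] = best[pos - 1];
            pos--;
        }
        best[pos] = t;

        if (kept == k && best[k - 1] <= (1.0 + tolerance) * best[0])
            return best[0];       /* converged: fastest is the estimate */
    }
    return -1.0;                  /* no convergence within n samples */
}
```

A caller would collect timings in a loop and pass them with, for instance, `k = 3` and `tolerance = 0.001`, treating a return of -1.0 as "keep measuring or report failure".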
Measuring Using Time of Day
Code/CSAPP/perf/tod.c

#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#include <sys/time.h>
#include <unistd.h>

static struct timeval tstart;

/* Record current time */
void start_timer()
{
    gettimeofday(&tstart, NULL);
}
/* Get number of seconds since last call to start_timer */
double get_timer()
{
    struct timeval tfinish;
    long sec, usec;
    gettimeofday(&tfinish, NULL);
    sec = tfinish.tv_sec - tstart.tv_sec;
    usec = tfinish.tv_usec - tstart.tv_usec;
    return sec + 1e-6*usec;
}
/* Determine how many "seconds" are in tod counter */
static void callibrate_tod()
{
    double quick, slow;
    start_timer();
    quick = get_timer();
    start_timer();
    sleep(1);
    slow = get_timer();
    printf("%.2f - %f = %.2f seconds/sleep second\n", slow, quick, slow-quick);
}
static void run_timer()
{
    int i = 0;
    double t1, t2, t3, t4;
    start_timer();
    t1 = get_timer();
    /* Wait until the timer value changes, so t2 falls on a tick boundary */
    do {
        t2 = get_timer();
    } while (t2 == t1);
    /* Count the calls needed until the timer value changes again */
    do {
        t3 = get_timer();
        i++;
    } while (t3 == t2);
    printf("Time = %f usecs, in %d iterations, %f usec/iteration\n",
           1e6*(t3-t2), i, 1e6*(t3-t2)/i);
    /* Count how many calls fit into one second */
    start_timer();
    i = 0;
    do {
        t4 = get_timer();
        i++;
    } while (t4 < 1.0);
    printf("%d iterations in %f secs = %f usec/iteration\n", i, t4, 1e6*t4/i);
}
static void check_time()
{
    long e;
    int s, m, h, d, y;
    start_timer();
    e = tstart.tv_sec;
    s = e % 60; e = e / 60;
    m = e % 60; e = e / 60;
    h = e % 24; e = e / 24;
    d = e % 365; e = e / 365;
    y = e;
    printf("This clock started %d years, %d days, %d hours, %d minutes, %d seconds ago\n",
           y, d, h, m, s);
}
int main(int argc, char *argv[])
{
    callibrate_tod();
    run_timer();
    check_time();
    return 0;
}
Gprof: profiling Linux programs
Links
• https://computing.llnl.gov/tutorials/performance_tools/
• http://www.cs.utah.edu/dept/old/texinfo/as/gprof_toc.html
• http://www.cs.utah.edu/dept/old/texinfo/as/gprof.html
• http://en.wikipedia.org/wiki/Profiling_%28computer_programming%29
• http://en.wikipedia.org/wiki/List_of_performance_analysis_tools
• http://docs.freebsd.org/44doc/psd/18.gprof/paper.pdf
• http://linuxgazette.net/100/vinayak.html
• http://ececmpsysweb.groups.et.byu.net/cmpsys.2004.winter/citizenship/Bryan_Wheeler/Profiling_Tutorial.html
Man gprof
GPROF(1)                GNU                GPROF(1)
NAME
    gprof - display call graph profile data
SYNOPSIS
    gprof [ -[abcDhilLrsTvwxyz] ] [ -[ACeEfFJnNOpPqQZ][name] ]
          [ -I dirs ] [ -d[num] ] [ -k from/to ] …]
          [ --[no-]exec-counts[=name] ] …
          [ --debug[=level] ] [ --function-ordering ] …
          [ --file-info ] [ --help ] [ --line ] [ --min-count=n ] …
          [ --static-call-graph ] [ --sum ] [ --table-length=len ] …
          [ image-file ] [ profile-file ... ]
Gprof: GNU profiler
http://www.unix-tutorials.com/go.php?id=321 (Arnout Engelen)

$ time event2dot
// this took more than 3 minutes on this input:
real 3m36.316s
user 0m55.590s
sys  0m1.070s

$ g++ -pg dotgen.cpp readfile.cpp main.cpp graph.cpp config.cpp -o event2dot
$ gprof event2dot | less

gprof now shows us the following functions are important:

  %     cumulative    self
 time    seconds     seconds   name
43.32     46.03       46.03    CompareNodes(…)
25.06     72.66       26.63    getNode(…)
16.80     90.51       17.85    CompareEdges(…)
12.70    104.01       13.50    addAnnotatedEdge(…)
 1.98    106.11        2.10    addEdge(…)

calls: 339952989, 55000, 339433374, 51987 (row alignment not recoverable); self s/call and total s/call: 0.00
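To try the same workflow on something small, a toy workload along these lines could be used. It is a sketch with hypothetical function names, unrelated to event2dot: `count_pairs` is deliberately quadratic so that it should dominate the flat profile, while `sum_values` is linear. With a `main` that calls both on a large array, the usual steps are to compile with `gcc -pg`, run the program once so that it writes `gmon.out`, and then run `gprof` on the executable and `gmon.out`.

```c
#include <stddef.h>

/* Deliberately O(n^2): counts index pairs i < j holding equal values. */
unsigned long count_pairs(const int *v, size_t n)
{
    unsigned long pairs = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (v[i] == v[j])
                pairs++;
    return pairs;
}

/* O(n): a cheap function that should barely register in the profile. */
long sum_values(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}
```

In the resulting flat profile, the "% time" and "self seconds" columns for `count_pairs` would be expected to dwarf those of `sum_values`, much as `CompareNodes` dominates the listing above.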
Valgrind is a memory mismanagement detector. It reports memory leaks and deallocation errors, and also supports cache profiling.
Memcheck can detect:
• Use of uninitialised memory
• Reading/writing memory after it has been free'd
• Reading/writing off the end of malloc'd blocks
• Reading/writing inappropriate areas on the stack
• Memory leaks, where pointers to malloc'd blocks are lost forever
• Mismatched use of malloc/new/new[] vs free/delete/delete[]
• Overlapping src and dst pointers in memcpy() and related functions
• Some misuses of the POSIX pthreads API
Source: http://cs.ecs.baylor.edu/~donahoo/tools/valgrind/
General Parallel Computing Wiki-Links
• http://en.wikipedia.org/wiki/Category:Parallel_computing
• http://en.wikipedia.org/wiki/Intel_Parallel_Studio
• http://www.itservices.hku.hk/sp2/workshop/html/ibmhwsw/ibmhwsw.html#power
Libraries
• http://static.msi.umn.edu/tutorial/scicomp/sp/mathnumerical-lib-SP/num_lib.html