CSCE 513 Computer Architecture Lecture 18 – Linux Performance II
CSCE 513 Computer Architecture Lecture 18 – Linux Performance II
Topics
§ Measuring process time
§ timeval, timespec
§ Improving performance
November 27, 2017
Linux – System Info
saluda> lscpu
Architecture:          i686
CPU op-mode(s):        32-bit, 64-bit
CPU(s):                4
Thread(s) per core:    1
Core(s) per socket:    4
CPU socket(s):         1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 15
Stepping:              11
CPU MHz:               2393.830
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              4096K
saluda>
CSCE 513 Fall 2017
Control Panel System and Sec… System … …
Task Manager
Ctrl+Alt+Delete, then select Task Manager
/proc
/usr/bin/procinfo
r2d2> procinfo
The program 'procinfo' is currently not installed. To run 'procinfo' please ask your administrator to install the package 'procinfo'
time
file times - number of seconds since Jan 1, 1970
time of day - struct timeval
    seconds
    microseconds
Functions:
§ gettimeofday
§ getrusage
timeval structure
struct timeval {
    __kernel_time_t       tv_sec;   /* seconds */
    __kernel_suseconds_t  tv_usec;  /* microseconds */
};
Gettimeofday – Wall time
#include <stdio.h>
#include <sys/time.h>
int main(int argc, char **argv)
{
    struct timeval tvalBefore, tvalAfter;
    gettimeofday(&tvalBefore, NULL);
    int i = 0;
    while (i < 10000) { i++; }
    gettimeofday(&tvalAfter, NULL);
    /* elapsed wall time = seconds difference scaled to microseconds, plus the microseconds difference */
    printf("Time in microseconds: %ld microseconds\n",
           ((tvalAfter.tv_sec - tvalBefore.tv_sec) * 1000000L
            + tvalAfter.tv_usec) - tvalBefore.tv_usec);
    return 0;
}
Time Command
r2d2> time gcc mt_mm.c -lpthread
real 0m0.071s
user 0m0.036s
sys  0m0.020s
r2d2> time ./a.out 100 100 8
With 1 threads: 0.012000 seconds
With 2 threads: 0.008001 seconds
With 3 threads: 0.008000 seconds
…
With 6 threads: 0.016001 seconds
With 7 threads: 0.008001 seconds
With 8 threads: 0.004000 seconds
real 0m0.038s
user 0m0.076s
sys  0m0.008s
r2d2> man getrusage
GETRUSAGE(2)          Linux Programmer's Manual          GETRUSAGE(2)
NAME
    getrusage - get resource usage
SYNOPSIS
    #include <sys/time.h>
    #include <sys/resource.h>
    int getrusage(int who, struct rusage *usage);
DESCRIPTION
    getrusage() returns resource usage measures for who, which can be one of the following:
    § RUSAGE_SELF
    § RUSAGE_CHILDREN
    § RUSAGE_THREAD
I-7 links
• move to multicore, NY Times
• http://www.nytimes.com/2004/05/08/business/08chip.html?ex=1399348800&en=98cc44ca97b1a562&ei=5007
• http://software.intel.com/en-us/articles/adding-parallelism-sample-code
• http://software.intel.com/en-us/articles/performance-tools-for-software-developers-how-do-i-run-the-intel-ipp-samples
• http://software.intel.com/en-us/code-downloads
• http://software.intel.com/en-us/intel-parallel-universe-magazine
matmul.c - get seconds used
struct timeval {
    __kernel_time_t       tv_sec;   /* seconds */
    __kernel_suseconds_t  tv_usec;  /* microseconds */
};
double seconds(int nmode)
{
    struct rusage buf;
    double temp;
    getrusage(nmode, &buf);
    /* Get system time and user time in micro-seconds. */
    temp = (double)buf.ru_utime.tv_sec * 1.0e6 + (double)buf.ru_utime.tv_usec
         + (double)buf.ru_stime.tv_sec * 1.0e6 + (double)buf.ru_stime.tv_usec;
    /* Return the sum of system and user time in SECONDS. */
    return (temp * 1.0e-6);
}
Matrix Multiplication
tstart = seconds(RUSAGE_SELF);
for (i = 0; i < m; ++i) {
    for (j = 0; j < n; ++j) {
        C[i][j] = 0.0;
        for (x = 0; x < k; ++x) {
            C[i][j] = C[i][j] + A[i][x] * B[x][j];
        }
    }
}
tend = seconds(RUSAGE_SELF);
allocmatrix
double **allocmatrix(int nrows, int ncols)
{
    double **m;
    int i;
    /* one array of row pointers, then one array of doubles per row */
    m = (double **) malloc((unsigned)(nrows) * sizeof(double *));
    if (!m) nerror("allocation failure 1 in matrix()");
    for (i = 0; i < nrows; i++) {
        m[i] = (double *) malloc((unsigned)(ncols) * sizeof(double));
        if (!m[i]) nerror("allocation failure 2 in matrix()");
    }
    return m;
}
More Notes on Matrix Multiplication
§ drand48, random: for generating elements of arrays randomly
§ nerror(error_text)
    fprintf(stderr, "Run-time error...\n%s\n", error_text);
§ freematrix
§ m = atoi(argv[1]);
    § atoi = ASCII to integer
    § atoi(argv[1]) is equivalent to strtol(argv[1], (char **) NULL, 10);
struct timespec – finer process times
/*
 * /usr/include/time.h contains the timespec definition
 * struct timespec {
 *     __time_t tv_sec;
 *     long int tv_nsec;
 * };
 */
int clock_getres(clockid_t clk_id, struct timespec *res);
int clock_gettime(clockid_t clk_id, struct timespec *tp);
int clock_settime(clockid_t clk_id, const struct timespec *tp);
time0.c
struct timespec clk_resolution;
struct timespec start;
struct timespec finish;
int retval;
retval = clock_getres(CLOCK_MONOTONIC, &clk_resolution);
retval = clock_gettime(CLOCK_MONOTONIC, &start);
… do something …
retval = clock_gettime(CLOCK_MONOTONIC, &finish);
dumptimespec(&finish);
void dumptimespec(struct timespec *ts)
{
    printf("seconds=%ld, nanoseconds=%ld\n", ts->tv_sec, ts->tv_nsec);
}
Amdahl's Law
% Parallelizable
Suppose you have an enhancement or improvement in a design component. The improvement in the performance of the whole system is limited by the fraction of the time the enhancement can be used.
Ref. CAAQA
Cache Performance
Ref. CSAPP – Bryant and O'Hallaron
row major strides vs column strides
arrays stored in row major order
§ row major strides
§ column strides
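Matrix Multiply blocking-loop (CAAQA page 90)
/* BLK is the blocking factor; C must be all zeros before the loops start */
for (jj = 0; jj < N; jj = jj + BLK)
  for (kk = 0; kk < N; kk = kk + BLK)
    for (i = 0; i < N; ++i) {
      for (j = jj; j < min(jj + BLK, N); ++j) {
        r = 0.0;
        for (k = kk; k < min(kk + BLK, N); ++k) {
          r = r + A[i][k] * B[k][j];
        }
        C[i][j] = C[i][j] + r;   /* accumulate this block's partial sum */
      }
    }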
cpp macros
cpp – C preprocessor directives
#define MAXLINE 1024
#define min(a, b) ((a)<(b) ? (a) : (b))
Well, a safer max that evaluates each argument only once (uses GCC extensions):
#define max(a, b) \
    ({ __typeof__ (a) _a = (a); \
       __typeof__ (b) _b = (b); \
       _a > _b ? _a : _b; })
http://stackoverflow.com/questions/3437404/min-and-max-in-c
Pointer Arithmetic
If p is a pointer, what is p++?
Decl            p++ (address in bytes)
char *p         p + 1
float *p        p + 4
double *p       p + 8
struct v *p     p + sizeof(struct v)
If a is the name of an array, then a is a pointer to the base of the array, i.e. &a[0][0].
Array reference macro
&v = address-of operator
Recall from data structures:
&A[i][j] = &A[0][0] + skip i rows + skip j elements
         = &A[0][0] + i * rowsize + j * elementsize
         = &A[0][0] + i * numcols * elementsize + j * elementsize
A(i, j) = address-of macro for A[i][j] with n columns (pointer arithmetic already scales by the element size)
#define A(i, j) (A + (i)*n + (j))
Matrix multiply with address macro
for (i = 0; i < m; ++i) {
    for (j = 0; j < n; ++j) {
        *C(i, j) = 0.0;
        for (x = 0; x < k; ++x) {
            *C(i, j) = *C(i, j) + *A(i, x) * *B(x, j);
        }
    }
}
Note the pointer dereference: each macro yields an address, so every use needs a *.
Libraries
§ BLAS – Basic Linear Algebra Subprograms (1970's)
§ BLAS 2
§ BLAS 3
§ BOOST
Reading Writing matrices
fprintf(fp, "%e ", a[i][j]);
fscanf(fp, "%le", &a[i][j]);   /* reading into a double needs %le */
Problem?
Solution: fread and fwrite
§ read and write without conversion
#include <stdio.h>
size_t fread(void *ptr, size_t size, size_t nmemb, FILE *str);
size_t fwrite(const void *ptr, size_t size, size_t nmemb, FILE *str);
fread and fwrite
#include <stdio.h>
size_t fread(void *ptr, size_t size, size_t nmemb, FILE *str);
size_t fwrite(const void *ptr, size_t size, size_t nmemb, FILE *str);
Valgrind Suite of Tools
Valgrind includes six production-quality tools:
• a memory error detector,
• two thread error detectors,
• a cache and branch-prediction profiler,
• a call-graph generating cache and branch-prediction profiler, and
• a heap profiler.
Valgrind is Open Source / Free Software, and is freely available under the GNU General Public License, version 2.
http://valgrind.org/
Linux 101 Hacks eBook, by Ramesh Natarajan
Chapter 12: System Monitoring and Performance
• Hack 89. Free Command
• Hack 90. Top Command
• Hack 91. Df Command
• Hack 92. Du Command
• Hack 93. Lsof Commands
• Hack 94. Vmstat Command
• Hack 95. Netstat Command
• Hack 96. Sysctl Command
• Hack 97. Nice Command
• Hack 98. Renice Command
• Hack 99. Kill Command
• Hack 100. Ps Command
• Hack 101. Sar Command
http://www.thegeekstuff.com/linux-101-hacks-ebook/
GNU Binutils
The GNU Binutils are a collection of binary tools. The main ones are:
§ ld - the GNU linker.
§ as - the GNU assembler.
But they also include:
§ addr2line - Converts addresses into filenames and line numbers.
§ ar - A utility for creating, modifying and extracting from archives.
§ c++filt - Filter to demangle encoded C++ symbols.
§ dlltool - Creates files for building and using DLLs.
§ gold - A new, faster, ELF only linker, still in beta test.
§ gprof - Displays profiling information.
GNU Binutils (continued)
§ nlmconv - Converts object code into an NLM.
§ nm - Lists symbols from object files.
§ objcopy - Copies and translates object files.
§ objdump - Displays information from object files.
§ ranlib - Generates an index to the contents of an archive.
§ readelf - Displays information from any ELF format object file.
§ size - Lists the section sizes of an object or archive file.
§ strings - Lists printable strings from files.
§ strip - Discards symbols.
§ windmc - A Windows compatible message compiler.
§ windres - A compiler for Windows resource files.