CSCE 513 Computer Architecture

Measuring Program Performance: Matrix Multiply

Topics
- Linux times
- Matrix multiplication

Readings:

November 20, 2017

Performance: Matrix Multiplication – 1 – CSCE 513 Fall 2017

Times in Unix

1. File times
   - `ls -l` shows each file's modification time as a date; the kernel stores it as the number of seconds since the Epoch (00:00:00 UTC, January 1, 1970).
2. Process times are reported in a `struct timeval` (declared in `<sys/time.h>`):

```c
struct timeval {
    time_t      tv_sec;   /* seconds */
    suseconds_t tv_usec;  /* microseconds */
};
```
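
Because a `struct timeval` splits a time into two integer fields, timing code usually folds it into one `double`. A minimal sketch of that conversion; `tv_to_seconds` and `tv_diff_seconds` are hypothetical helper names, not part of any library.

```c
#include <assert.h>
#include <sys/time.h>  /* struct timeval */

/* Hypothetical helper: fold seconds + microseconds into one double (seconds). */
double tv_to_seconds(struct timeval tv)
{
    assert(tv.tv_usec >= 0 && tv.tv_usec < 1000000);  /* expect a normalized value */
    return (double)tv.tv_sec + (double)tv.tv_usec / 1.0e6;
}

/* Hypothetical helper: elapsed seconds between two samples, end - start. */
double tv_diff_seconds(struct timeval start, struct timeval end)
{
    return tv_to_seconds(end) - tv_to_seconds(start);
}
```

The same pattern works for the `ru_utime` and `ru_stime` fields of `struct rusage`, which are also `struct timeval` values.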
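
The time command

```
cocsce-l1d39-11> time gcc pthread1.c -lpthread -o pthread1

real    0m0.077s
user    0m0.052s
sys     0m0.012s
cocsce-l1d39-11>
```

- Note: `real` is wall-clock (elapsed) time; `user` and `sys` are the CPU time spent in user mode and in the kernel on behalf of the process.
- For a single-threaded run, real >= user + sys, because a process cannot consume more CPU time than the time that has elapsed. A multithreaded program running on several cores can accumulate more CPU time than wall-clock time, so the inequality does not hold in general.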
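
A program can sample the same three quantities that `time` prints. A minimal sketch, assuming POSIX `times(2)` (one of the interfaces named in the time(7) excerpt) and `sysconf(_SC_CLK_TCK)` for the tick rate; `struct time_sample` and `take_sample()` are hypothetical names. Resolution is one clock tick, commonly 1/100 s.

```c
#include <assert.h>
#include <sys/times.h>  /* times(), struct tms, clock_t */
#include <unistd.h>     /* sysconf(), _SC_CLK_TCK */

/* Hypothetical type: one snapshot of the quantities reported by time(1). */
struct time_sample {
    double real;  /* elapsed seconds from an arbitrary fixed origin */
    double user;  /* user-mode CPU seconds used by this process */
    double sys;   /* kernel-mode CPU seconds used on behalf of this process */
};

struct time_sample take_sample(void)
{
    struct tms t;
    long ticks = sysconf(_SC_CLK_TCK);  /* clock ticks per second */
    clock_t now = times(&t);            /* elapsed ticks; also fills t */
    assert(ticks > 0 && now != (clock_t)-1);
    struct time_sample s = {
        (double)now / ticks,
        (double)t.tms_utime / ticks,
        (double)t.tms_stime / ticks
    };
    return s;
}
```

Taking one sample before and one after a region of code and subtracting field by field gives that region's real, user, and sys times.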

```
cocsce-l1d39-11> gcc pthread1.c -lpthread -o pthread1
cocsce-l1d39-11> ./pthread1
In main: creating thread 0
In main: creating thread 1
In main: creating thread 2
Hello World! It's me, thread 0!
In main: creating thread 3
Hello World! It's me, thread 1!
Hello World! It's me, thread 2!
In main: creating thread 4
Hello World! It's me, thread 3!
Hello World! It's me, thread 4!
```
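
The source of `pthread1.c` is not shown on the slides. The following is a hypothetical reconstruction whose messages match the output above; `say_hello` and `run_hello_threads` are our own names. The creation and greeting lines interleave differently from run to run because the scheduler decides when each new thread runs relative to the creating thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

#define MAX_THREADS 64

/* Thread body: the thread number arrives by value, cast through void *. */
static void *say_hello(void *arg)
{
    long tid = (long)arg;
    printf("Hello World! It's me, thread %ld!\n", tid);
    return NULL;
}

/* Hypothetical driver: create n threads, then join them all.
   Returns how many threads were joined successfully. */
int run_hello_threads(int n)
{
    pthread_t threads[MAX_THREADS];
    assert(n >= 0 && n <= MAX_THREADS);
    for (long t = 0; t < n; ++t) {
        printf("In main: creating thread %ld\n", t);
        int rc = pthread_create(&threads[t], NULL, say_hello, (void *)t);
        assert(rc == 0);
    }
    int joined = 0;
    for (int t = 0; t < n; ++t)
        if (pthread_join(threads[t], NULL) == 0)
            ++joined;
    return joined;
}
```

Compile with `-lpthread` (or `-pthread`) as in the session above.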

TIME(7) Linux Programmer's Manual

NAME: time - overview of time and timers

DESCRIPTION (Real time and process time)

Real time is defined as time measured from some fixed point, either from a standard point in the past (see the description of the Epoch and calendar time below), or from some point (e.g., the start) in the life of a process (elapsed time).

Process time is defined as the amount of CPU time used by a process. This is sometimes divided into user and system components. User CPU time is the time spent executing code in user mode. System CPU time is the time spent by the kernel executing in system mode on behalf of the process (e.g., executing system calls). The time(1) command can be used to determine the amount of CPU time consumed during the execution of a program. A program can determine the amount of CPU time it has consumed using times(2), getrusage(2), or clock(3).

The hardware clock …
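
Of the three interfaces the man page lists, `clock(3)` is the simplest: it returns the process's CPU time (user + system) in units of `CLOCKS_PER_SEC`. A minimal sketch; `cpu_seconds` is a hypothetical wrapper name.

```c
#include <assert.h>
#include <time.h>  /* clock(), clock_t, CLOCKS_PER_SEC */

/* Hypothetical wrapper: CPU seconds consumed by the process so far. */
double cpu_seconds(void)
{
    clock_t c = clock();
    assert(c != (clock_t)-1);  /* -1 means processor time is unavailable */
    return (double)c / CLOCKS_PER_SEC;
}
```

On 32-bit systems where `CLOCKS_PER_SEC` is 1000000, `clock()` wraps around roughly every 72 minutes, so `getrusage()` is the safer choice for long runs.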

getrusage

`getrusage(who, &buf)` fills a `struct rusage` (declared in `<sys/resource.h>`):

```c
struct rusage {
    struct timeval ru_utime; /* user CPU time used */
    struct timeval ru_stime; /* system CPU time used */
    long   ru_maxrss;        /* maximum resident set size */
    long   ru_ixrss;         /* integral shared memory size */
    long   ru_idrss;         /* integral unshared data size */
    long   ru_isrss;         /* integral unshared stack size */
    long   ru_minflt;        /* page reclaims (soft page faults) */
    long   ru_majflt;        /* page faults (hard page faults) */
    long   ru_nswap;         /* swaps */
    long   ru_inblock;       /* block input operations */
    long   ru_oublock;       /* block output operations */
    long   ru_msgsnd;        /* IPC messages sent */
    long   ru_msgrcv;        /* IPC messages received */
    long   ru_nsignals;      /* signals received */
    long   ru_nvcsw;         /* voluntary context switches */
    long   ru_nivcsw;        /* involuntary context switches */
};
```
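
The non-time fields are useful for performance work too. A minimal sketch that reads `ru_minflt` before and after touching freshly allocated memory; `minor_faults` and `faults_from_touching` are hypothetical names, and the 4096-byte stride assumes a page size of at least 4 KiB.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/resource.h>

/* Hypothetical helper: minor (soft) page faults taken by this process. */
long minor_faults(void)
{
    struct rusage ru;
    int rc = getrusage(RUSAGE_SELF, &ru);
    assert(rc == 0);
    return ru.ru_minflt;
}

/* Hypothetical helper: touch nbytes of new memory, one write per page,
   and return the resulting increase in ru_minflt. */
long faults_from_touching(size_t nbytes)
{
    long before = minor_faults();
    volatile char *p = malloc(nbytes);  /* volatile keeps the writes alive */
    assert(p != NULL);
    for (size_t i = 0; i < nbytes; i += 4096)
        p[i] = 1;                        /* first write maps the page in */
    long after = minor_faults();
    free((void *)p);
    return after - before;
}
```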

matmult.c - example

3 Nested Loops to compute the product

```c
for (i = 0; i < rows; ++i) {
    for (j = 0; j < cols2; ++j) {
        for (k = 0; k < cols; ++k) {
            C[i][j] = C[i][j] + A[i][k] * B[k][j];
        }
    }
}
```

Note: the loops perform $\text{rows} \times \text{cols2} \times \text{cols}$ multiplications and the same number of additions. For square matrices with rows = cols2 = cols = $n$, there are $n^3$ multiplications (and $n^3$ additions).
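
The operation count can be checked directly by counting inner-loop iterations. A minimal sketch, assuming flat row-major arrays instead of the `double **` matrices used in matmult.c; `matmul_ijk` is a hypothetical name.

```c
#include <assert.h>

/* ijk order, C += A*B, on flat row-major arrays:
   A is rows x cols, B is cols x cols2, C is rows x cols2.
   Returns the number of multiply-add steps, i.e. rows * cols2 * cols. */
long matmul_ijk(int rows, int cols, int cols2,
                const double *A, const double *B, double *C)
{
    long madds = 0;
    for (int i = 0; i < rows; ++i)
        for (int j = 0; j < cols2; ++j)
            for (int k = 0; k < cols; ++k) {
                C[i*cols2 + j] += A[i*cols + k] * B[k*cols2 + j];
                ++madds;  /* one multiply and one add per iteration */
            }
    return madds;
}
```

For a 2 x 3 times 3 x 2 product this executes 2 * 2 * 3 = 12 multiply-adds.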

Headers

```c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <assert.h>
#include <time.h>
#include <sys/resource.h>

double **allocmatrix(int, int);
int freematrix(double **, int);
void nerror(char *error_text);
double seconds(int nmode);
double rand_gen(double fmin, double fmax);
void SetSeed(int flag);
```
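
The bodies of `allocmatrix`, `freematrix`, and `nerror` are not shown on the slides. One possible implementation matching the prototypes above, offered as an assumption rather than the course's code: an array of row pointers into a single contiguous, zero-initialized block, so `M[i][j]` indexing works and row `i+1` immediately follows row `i` in memory.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumed behavior: report the error and terminate. */
void nerror(char *error_text)
{
    fprintf(stderr, "%s\n", error_text);
    exit(EXIT_FAILURE);
}

/* Assumed layout: row pointers into one contiguous rows*cols block. */
double **allocmatrix(int rows, int cols)
{
    assert(rows > 0 && cols > 0);
    double **m = malloc((size_t)rows * sizeof *m);
    if (m == NULL) nerror("allocmatrix: out of memory");
    m[0] = calloc((size_t)rows * cols, sizeof **m);  /* zero-filled data */
    if (m[0] == NULL) nerror("allocmatrix: out of memory");
    for (int i = 1; i < rows; ++i)
        m[i] = m[0] + (size_t)i * cols;
    return m;
}

/* Release a matrix from allocmatrix; rows is unused with this layout. */
int freematrix(double **m, int rows)
{
    (void)rows;
    if (m != NULL) {
        free(m[0]);
        free(m);
    }
    return 0;
}
```

A contiguous layout matters for the loop-order experiments later: the cache behavior of each variation then depends only on the access pattern, not on where separate row allocations happened to land.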

main: command-line arguments

```c
int main(int argc, char **argv)
{
    int l, rows, cols2, cols;
    int i, j, k;
    double temp;
    double **A, **B, **C;
    double tstart, tend;

    /* The following allows matrix parameters to be entered on the command
       line to take advantage of dynamically allocated memory. You may
       modify or remove it as you wish. */
    if (argc != 4) {
        nerror("Usage: <executable> <rows> <cols> <cols2>");
    }
    rows  = atoi(argv[1]);  /* A is a rows x cols matrix */
    cols  = atoi(argv[2]);  /* B is a cols x cols2 matrix */
    cols2 = atoi(argv[3]);  /* so C = A*B is a rows x cols2 matrix */
```
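
`atoi` returns 0 for input such as `abc`, so a typo silently produces an empty matrix. A minimal sketch of stricter parsing with `strtol`; `parse_dim` is a hypothetical helper, not part of matmult.c.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse a positive matrix dimension.
   Returns the value, or -1 if s is not entirely a positive int. */
int parse_dim(const char *s)
{
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0' || v <= 0 || v > INT_MAX)
        return -1;
    return (int)v;
}
```

In `main`, `rows = parse_dim(argv[1]);` followed by a check for `-1` (and a call to `nerror`) would replace each `atoi` call.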

Initializing the arrays

```c
A = (double **) allocmatrix(rows, cols);
/* Initialize matrix elements so the compiler does not optimize out the work */
for (i = 0; i < rows; i++) {
    for (j = 0; j < cols; j++) {
        A[i][j] = rand_gen(1.0, 2.0);
        /* if (i == j) A[i][j] = 1.0; else A[i][j] = 0.0; */
    }
}
```
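
Initialization alone does not stop an optimizer from discarding a computation whose result is never used; printing something derived from `C` after the timed loop does, because output is observable behavior. A minimal sketch of such a check value; `checksum` is a hypothetical helper.

```c
#include <assert.h>

/* Hypothetical helper: sum of all elements of a rows x cols double** matrix.
   Printing this after the timed loop forces the product to be computed. */
double checksum(double **M, int rows, int cols)
{
    double s = 0.0;
    for (int i = 0; i < rows; ++i)
        for (int j = 0; j < cols; ++j)
            s += M[i][j];
    return s;
}
```

Usage after the timed region: `printf("checksum = %f\n", checksum(C, rows, cols2));`.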

rand_gen

```c
/* generate a random double between fmin and fmax */
double rand_gen(double fmin, double fmax)
{
    return fmin + (fmax - fmin) * drand48();
}
```

From the drand48(3) man page: "The drand48() and erand48() functions return nonnegative double-precision floating-point values uniformly distributed over the interval [0.0, 1.0)." So rand_gen returns values in [fmin, fmax).
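
The header declares `SetSeed(int flag)`, but its body is not shown. A plausible sketch, assuming the flag chooses between a clock-based seed (different data each run) and a fixed seed (reproducible runs); the fixed value 12345 is arbitrary. `srand48` is the standard seeding function for `drand48`.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE  /* expose drand48()/srand48() under strict -std=c99 */
#endif
#include <assert.h>
#include <stdlib.h>  /* drand48(), srand48() */
#include <time.h>    /* time() */

double rand_gen(double fmin, double fmax)
{
    return fmin + (fmax - fmin) * drand48();
}

/* Assumed semantics: nonzero flag seeds from the clock,
   zero uses a fixed seed so timing runs see identical data. */
void SetSeed(int flag)
{
    if (flag)
        srand48((long)time(NULL));
    else
        srand48(12345L);
}
```

For comparing loop orders, a fixed seed keeps the input matrices identical across the ijk, kij, and kji runs.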

seconds(): a function to combine all the times into one double

```c
/* Returns the total CPU time used in seconds. */
double seconds(int nmode)
{
    struct rusage buf;
    double temp;

    getrusage(nmode, &buf);
    /* Get system time and user time in microseconds. */
    temp = (double)buf.ru_utime.tv_sec * 1.0e6 + (double)buf.ru_utime.tv_usec
         + (double)buf.ru_stime.tv_sec * 1.0e6 + (double)buf.ru_stime.tv_usec;
    /* Return the sum of system and user time in SECONDS. */
    return temp * 1.0e-6;
}
```
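
`seconds()` adds user and system time together. Keeping them apart shows where the time goes: a compute loop like matrix multiply should be almost entirely user time. A minimal variant; `cpu_times` is a hypothetical name, and `who` takes the same `RUSAGE_SELF` / `RUSAGE_CHILDREN` values as `nmode`.

```c
#include <assert.h>
#include <sys/resource.h>

/* Hypothetical variant of seconds(): report user and system CPU time
   separately, each in seconds. */
void cpu_times(int who, double *user, double *sys)
{
    struct rusage buf;
    int rc = getrusage(who, &buf);
    assert(rc == 0);
    *user = (double)buf.ru_utime.tv_sec + (double)buf.ru_utime.tv_usec / 1.0e6;
    *sys  = (double)buf.ru_stime.tv_sec + (double)buf.ru_stime.tv_usec / 1.0e6;
}
```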

Timing a section of code

```c
tstart = seconds(RUSAGE_SELF);
for (i = 0; i < rows; ++i) {
    for (j = 0; j < cols2; ++j) {
        for (k = 0; k < cols; ++k) {
            C[i][j] = C[i][j] + A[i][k] * B[k][j];
        }
    }
}
tend = seconds(RUSAGE_SELF);
```
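
`tend - tstart` is easier to compare across machines and sizes as a rate. Since the square case does $n^3$ multiplications and $n^3$ additions, it performs $2n^3$ floating-point operations. A minimal sketch; `mflops` is a hypothetical helper.

```c
#include <assert.h>

/* Hypothetical helper: millions of floating-point operations per second
   for an n x n matrix multiply taking secs seconds (2*n^3 flops). */
double mflops(int n, double secs)
{
    assert(n > 0 && secs > 0.0);
    double flops = 2.0 * (double)n * n * n;
    return flops / secs / 1.0e6;
}
```

Usage: `printf("%.1f MFLOPS\n", mflops(n, tend - tstart));`. For very short runs, the timer's resolution dominates, so repeating the multiply several times gives a steadier figure.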

Timing a section of code – kij variation

```c
tstart = seconds(RUSAGE_SELF);
for (k = 0; k < cols; ++k) {
    for (i = 0; i < rows; ++i) {
        for (j = 0; j < cols2; ++j) {
            C[i][j] = C[i][j] + A[i][k] * B[k][j];
        }
    }
}
tend = seconds(RUSAGE_SELF);
```
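
In the kij order, `A[i][k]` is constant across the inner `j` loop, which then walks row `k` of `B` and row `i` of `C` with unit stride, a cache-friendly pattern for row-major storage. A minimal sketch on flat row-major arrays (an assumption; matmult.c uses `double **`), with `matmul_kij` as a hypothetical name and the invariant `A[i][k]` hoisted into a scalar.

```c
#include <assert.h>

/* kij order, C += A*B, flat row-major arrays:
   A is rows x cols, B is cols x cols2, C is rows x cols2. */
void matmul_kij(int rows, int cols, int cols2,
                const double *A, const double *B, double *C)
{
    for (int k = 0; k < cols; ++k)
        for (int i = 0; i < rows; ++i) {
            double a = A[i*cols + k];          /* invariant in the j loop */
            for (int j = 0; j < cols2; ++j)    /* unit stride in B and C */
                C[i*cols2 + j] += a * B[k*cols2 + j];
        }
}
```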

Timing a section of code – kji variation

```c
tstart = seconds(RUSAGE_SELF);
for (k = 0; k < cols; ++k) {
    for (j = 0; j < cols2; ++j) {
        for (i = 0; i < rows; ++i) {
            C[i][j] = C[i][j] + A[i][k] * B[k][j];
        }
    }
}
tend = seconds(RUSAGE_SELF);
```
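
In the kji order the inner loop runs over `i`, so `B[k][j]` is fixed while successive iterations step down a column of `C` and a column of `A`. With row-major storage, each step jumps a whole row (`cols2` or `cols` doubles), so consecutive accesses rarely share a cache line. A minimal sketch under the same flat-array assumption; `matmul_kji` is a hypothetical name.

```c
#include <assert.h>

/* kji order, C += A*B, flat row-major arrays:
   A is rows x cols, B is cols x cols2, C is rows x cols2. */
void matmul_kji(int rows, int cols, int cols2,
                const double *A, const double *B, double *C)
{
    for (int k = 0; k < cols; ++k)
        for (int j = 0; j < cols2; ++j) {
            double b = B[k*cols2 + j];         /* invariant in the i loop */
            for (int i = 0; i < rows; ++i)     /* strides of cols2 and cols */
                C[i*cols2 + j] += A[i*cols + k] * b;
        }
}
```

The result is the same product as ijk and kij; only the memory access order, and therefore the running time, differs.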

Performance variations

```
cocsce-l1d39-11> gcc kij.c -o kij
cocsce-l1d39-11> gcc kji.c -o kji
cocsce-l1d39-11> ./matmul 1000
The total CPU time is: 9.212000 seconds
cocsce-l1d39-11> ./kij 1000
The total CPU time is: 3.712000 seconds
cocsce-l1d39-11> ./kji 1000
The total CPU time is: 14.264000 seconds
```
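
The measured times translate into rates and ratios as follows, assuming `./matmul` is the ijk version and the single argument 1000 means square matrices with $n = 1000$ (so $2n^3$ floating-point operations per run):

```latex
\[
2n^3 = 2 \times 1000^3 = 2 \times 10^{9}\ \text{flops}
\]
\[
\text{ijk: } \frac{2 \times 10^{9}}{9.212\ \text{s}} \approx 217\ \text{MFLOPS}, \qquad
\text{kij: } \frac{2 \times 10^{9}}{3.712\ \text{s}} \approx 539\ \text{MFLOPS}, \qquad
\text{kji: } \frac{2 \times 10^{9}}{14.264\ \text{s}} \approx 140\ \text{MFLOPS}
\]
\[
\frac{9.212}{3.712} \approx 2.48, \qquad
\frac{14.264}{9.212} \approx 1.55, \qquad
\frac{14.264}{3.712} \approx 3.84
\]
```

So the same $n^3$ multiply-adds run about 2.5 times faster in kij order than in ijk order, and the kji order is about 3.8 times slower than kij, consistent with the unit-stride versus column-stride access patterns of their inner loops.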

Address Trace

- `&x` is the address-of operator: it yields the address of `x`.
- Open a trace file and write addresses to it with `%p`:

```c
FILE *mytracefile;
if ((mytracefile = fopen("trace", "w")) == NULL) {
    fprintf(stderr, "Could not open file %s!\n", "trace");
    exit(EXIT_FAILURE);  /* do not write through a NULL FILE pointer */
}
fprintf(mytracefile, "address of x is %p\n", (void *)&x);
```

```c
for (i = 0; i < rows; ++i) {
    for (j = 0; j < cols2; ++j) {
        for (k = 0; k < cols; ++k) {
            C[i][j] = C[i][j] + A[i][k] * B[k][j];
        }
    }
}
```
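
Combining the two ideas gives an address trace of the ijk loop: one line per array reference, which can then be inspected or fed to a cache simulator. A minimal sketch for square `double **` matrices; `write_ijk_trace` and the `A`/`B`/`C` tag format are hypothetical, and the `FILE *` would be the one opened from `"trace"`.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical tracer: performs the ijk update for n x n matrices and writes
   the addresses of A[i][k], B[k][j] and C[i][j] for every inner iteration.
   Returns the number of trace lines written (3 * n^3 when all succeed). */
long write_ijk_trace(FILE *out, double **A, double **B, double **C, int n)
{
    long lines = 0;
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            for (int k = 0; k < n; ++k) {
                C[i][j] = C[i][j] + A[i][k] * B[k][j];
                if (fprintf(out, "A %p\n", (void *)&A[i][k]) > 0) ++lines;
                if (fprintf(out, "B %p\n", (void *)&B[k][j]) > 0) ++lines;
                if (fprintf(out, "C %p\n", (void *)&C[i][j]) > 0) ++lines;
            }
    return lines;
}
```

In the resulting trace, consecutive `A` addresses differ by `sizeof(double)` while consecutive `B` addresses differ by a whole row, which is the locality difference behind the timings above.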