CSCE 513 Computer Architecture: Measuring Program Performance, Matrix Multiply

  • Slides: 28

CSCE 513 Computer Architecture
Measuring Program Performance
Matrix Multiply

Topics
  § Linux times
  § Matrix multiplication

Readings:

November 20, 2017

Performance: Matrix Multiplication – 1 – CSCE 513 Fall 2017

Times in Unix

1. File times
   § ls -l gives modification date (#seconds since Jan 1, 1970)
2. Process times

    struct timeval {
        long tv_sec;     /* seconds */
        long tv_usec;    /* microseconds */
    };

The time command

    cocsce-l1d39-11> time gcc pthread1.c -lpthread -o pthread1

    real    0m0.077s
    user    0m0.052s
    sys     0m0.012s
    cocsce-l1d39-11>

  § Note real == wall clock time, and
  § real-time >= user-time + system-time for a single-threaded process (a multithreaded process running on several cores can accumulate more CPU time than wall clock time)

    cocsce-l1d39-11> gcc pthread1.c -lpthread -o pthread1
    cocsce-l1d39-11> ./pthread1
    In main: creating thread 0
    In main: creating thread 1
    In main: creating thread 2
    Hello World! It's me, thread 0!
    In main: creating thread 3
    Hello World! It's me, thread 1!
    Hello World! It's me, thread 2!
    In main: creating thread 4
    Hello World! It's me, thread 3!
    Hello World! It's me, thread 4!

TIME(7)                Linux Programmer's Manual                TIME(7)

NAME
    time - overview of time and timers

DESCRIPTION
    Real time and process time

    Real time is defined as time measured from some fixed point, either from a standard point in the past (see the description of the Epoch and calendar time below), or from some point (e.g., the start) in the life of a process (elapsed time).

    Process time is defined as the amount of CPU time used by a process. This is sometimes divided into user and system components. User CPU time is the time spent executing code in user mode. System CPU time is the time spent by the kernel executing in system mode on behalf of the process (e.g., executing system calls). The time(1) command can be used to determine the amount of CPU time consumed during the execution of a program. A program can determine the amount of CPU time it has consumed using times(2), getrusage(2), or clock(3).

    The hardware clock …

Getrusage

    struct rusage {
        struct timeval ru_utime;  /* user CPU time used */
        struct timeval ru_stime;  /* system CPU time used */
        long ru_maxrss;           /* maximum resident set size */
        long ru_ixrss;            /* integral shared memory size */
        long ru_idrss;            /* integral unshared data size */
        long ru_isrss;            /* integral unshared stack size */
        long ru_minflt;           /* page reclaims (soft page faults) */
        long ru_majflt;           /* page faults (hard page faults) */
        long ru_nswap;            /* swaps */
        long ru_inblock;          /* block input operations */
        long ru_oublock;          /* block output operations */
        long ru_msgsnd;           /* IPC messages sent */
        long ru_msgrcv;           /* IPC messages received */
        long ru_nsignals;         /* signals received */
        long ru_nvcsw;            /* voluntary context switches */
        long ru_nivcsw;           /* involuntary context switches */
    };

    struct timeval {
        long tv_sec;     /* seconds */
        long tv_usec;    /* microseconds */
    };

Matmult.c - example

3 Nested Loops to compute product

    for(i=0; i<rows; ++i){
        for(j=0; j<cols2; ++j){
            for(k=0; k<cols; ++k){
                C[i][j] = C[i][j] + A[i][k] * B[k][j];
            }
        }
    }

Note: the loops perform rows*cols2*cols multiplications and the same number of additions.
For square matrices, rows = cols2 = cols = n, so there are n^3 multiplications.

Headers

    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>
    #include <assert.h>
    #include <time.h>
    #include <sys/resource.h>

    double **allocmatrix(int, int);
    int freematrix(double **, int);
    void nerror(char *error_text);
    double seconds(int nmode);
    double rand_gen(double fmin, double fmax);
    void SetSeed(int flag);

Main: args

    int main(int argc, char** argv)
    {
        int l, rows, cols2, cols;
        int i, j, k;
        double temp;
        double **A, **B, **C;
        double tstart, tend;

        /*
         * The following allows matrix parameters to be
         * entered on the command line to take advantage
         * of dynamically allocated memory. You may modify
         * or remove it as you wish.
         */
        if (argc != 4) {
            nerror("Usage: <executable> <rows-value> <cols-value> <cols2-value>");
        }
        rows  = atoi(argv[1]);   /* A is a rows x cols matrix */
        cols  = atoi(argv[2]);   /* B is a cols x cols2 matrix */
        cols2 = atoi(argv[3]);   /* So C=A*B is a rows x cols2 matrix */

Initializing the arrays

    A = (double **) allocmatrix(rows, cols);

    /*
     * Initialize matrix elements so compiler does not
     * optimize out
     */
    for(i=0; i<rows; i++) {
        for(j=0; j<cols; j++) {
            A[i][j] = rand_gen(1.0, 2.0);
            /* if(i == j) A[i][j]=1.0; else A[i][j] = 0.0; */
        }
    }

Rand_gen

    /* generate a random double between fmin and fmax */
    double rand_gen(double fmin, double fmax)
    {
        return fmin + (fmax - fmin) * drand48();
    }

    /* The drand48() and erand48() functions return nonnegative
       double-precision floating-point values uniformly distributed
       over the interval [0.0, 1.0). */

Seconds - a function to combine all the times into one double

    /* Returns the total cpu time used in seconds. */
    double seconds(int nmode){
        struct rusage buf;
        double temp;

        getrusage( nmode, &buf );

        /* Get system time and user time in micro-seconds. */
        temp = (double)buf.ru_utime.tv_sec*1.0e6 +
               (double)buf.ru_utime.tv_usec +
               (double)buf.ru_stime.tv_sec*1.0e6 +
               (double)buf.ru_stime.tv_usec;

        /* Return the sum of system and user time in SECONDS. */
        return( temp*1.0e-6 );
    }

Timing a section of code

    tstart = seconds(RUSAGE_SELF);
    for(i=0; i<rows; ++i){
        for(j=0; j<cols2; ++j){
            for(k=0; k<cols; ++k){
                C[i][j] = C[i][j] + A[i][k] * B[k][j];
            }
        }
    }
    tend = seconds(RUSAGE_SELF);

Timing a section of code – kij variation

    tstart = seconds(RUSAGE_SELF);
    for(k=0; k<cols; ++k){
        for(i=0; i<rows; ++i){
            for(j=0; j<cols2; ++j){
                C[i][j] = C[i][j] + A[i][k] * B[k][j];
            }
        }
    }
    tend = seconds(RUSAGE_SELF);

Timing a section of code – kji variation

    tstart = seconds(RUSAGE_SELF);
    for(k=0; k<cols; ++k){
        for(j=0; j<cols2; ++j){
            for(i=0; i<rows; ++i){
                C[i][j] = C[i][j] + A[i][k] * B[k][j];
            }
        }
    }
    tend = seconds(RUSAGE_SELF);

Performance variations

    cocsce-l1d39-11> gcc kij.c -o kij
    cocsce-l1d39-11> gcc kji.c -o kji
    cocsce-l1d39-11> ./matmul 1000
    The total CPU time is: 9.212000 seconds
    cocsce-l1d39-11> ./kij 1000
    The total CPU time is: 3.712000 seconds
    cocsce-l1d39-11> ./kji 1000
    The total CPU time is: 14.264000 seconds

Address Trace

&x – address of operator
  § &x – address of x
  §
    if((mytracefile = fopen("trace", "w")) == NULL)
        fprintf(stderr, "Could not open file %s!\n", "trace");
  §
    fprintf(mytracefile, "address of x is %p\n", (void *)&x);

    for(i=0; i<rows; ++i){
        for(j=0; j<cols2; ++j){
            for(k=0; k<cols; ++k){
                C[i][j] = C[i][j] + A[i][k] * B[k][j];
            }
        }
    }
