Execution time
• Execution Time (processor-related): $ET = IC \times CPI \times T$, where
  IC = instruction count
  CPI = average number of system clock periods needed to execute an instruction
  T = clock period
Review
CS 501 Advanced Computer Architecture
Lecture 03
Dr. Noor Muhammad Sheikh
Example
Consider two SRC programs containing three types of instructions, as follows:

Number of...                   Program 1   Program 2
Data transfer instructions         2           1
Control instructions               2           5
ALSU instructions                  2           1

Compare the two programs in terms of:
1. Instruction count
2. Speed of execution
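Example contd.
1. Instruction count (IC):
IC for Program 1 = 2 + 2 + 2 = 6
IC for Program 2 = 1 + 5 + 1 = 7
2. Speed of execution: use the following SRC specifications.

Instruction type    CPI
Control              2
ALSU                 3
Data transfer        4

Applying $ET = IC \times CPI \times T$ to each instruction type and summing:
$ET_1 = [(2 \times 2) + (2 \times 3) + (2 \times 4)]\,T = 18T$
$ET_2 = [(5 \times 2) + (1 \times 3) + (1 \times 4)]\,T = 17T$
Note: since both programs execute on the same machine, T is a common factor and can be ignored when comparing execution times. Program 1 has the smaller instruction count, but Program 2 executes faster.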
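As a minimal sketch (not part of the original lecture), the per-type form of $ET = IC \times CPI \times T$ can be computed directly. The dictionary keys and the function names `instruction_count` and `execution_time` are illustrative assumptions; with the default `clock_period=1.0`, ET comes out in clock periods, as on the slide.

```python
# CPI of each instruction type on the example SRC machine (from the slide)
CPI = {"control": 2, "alsu": 3, "data_transfer": 4}


def instruction_count(mix):
    """IC: total number of instructions in the program."""
    return sum(mix.values())


def execution_time(mix, cpi, clock_period=1.0):
    """ET = sum over types of IC_i * CPI_i * T (T = 1 gives ET in clock periods)."""
    return sum(count * cpi[kind] for kind, count in mix.items()) * clock_period


# Instruction mixes of the two example programs
program_1 = {"data_transfer": 2, "control": 2, "alsu": 2}
program_2 = {"data_transfer": 1, "control": 5, "alsu": 1}

for name, mix in (("Program 1", program_1), ("Program 2", program_2)):
    print(name, "IC =", instruction_count(mix), "ET =", execution_time(mix, CPI), "T")
```

Running it reproduces the slide: IC of 6 and 7, ET of 18T and 17T. Passing a real clock period (in seconds) instead of 1.0 turns the result into seconds.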
Problem: Consider the following two SRC code segments implementing the operation a = b + 5c. Find which one is more efficient in terms of instruction count and execution time.
Program 1: Multiplication using repeated addition in a for loop

            .org 0
    a:      .dw 1
    b:      .dw 1
    c:      .dw 1
            .org 80
            la   r5, 5          ; load loop count (5) in r5
            lar  r6, mpy        ; load address of mpy
            lar  r7, next       ; load address of next
            ld   r2, b          ; load contents of b
            ld   r3, c          ; load contents of c
            la   r4, 0          ; load 0 in r4
    mpy:    brzr r7, r5         ; jump to next after 5 iterations
            add  r4, r4, r3     ; r4 contains r4 + c
            addi r5, r5, -1     ; decrement index
            br   r6             ; loop again
    next:   add  r4, r4, r2     ; r4 contains sum of b and 5c
            st   r4, a          ; store at address a
            stop
Problem (contd.): the second SRC code segment implementing a = b + 5c.
Program 2: Multiplication using a subroutine call

            .org 0
    a:      .dw 1
    b:      .dw 1
    c:      .dw 1
            .org 80
            lar  r1, mpy        ; load address of mpy in r1
            ld   r2, b          ; load contents of b in r2
            la   r3, 5          ; load index in r3
            ld   r4, c          ; load contents of c in r4
            brl  r5, r1         ; r5 contains PC (return address); call mpy
            add  r2, r2, r7     ; r2 contains sum b + 5c
            st   r2, a          ; store at address a
            stop
    mpy:    la   r7, 0          ; r7 contains zero
            lar  r8, again      ; r8 contains address of again
    again:  brzr r5, r3         ; exit loop (return) when index is 0
            add  r7, r7, r4     ; r7 contains r7 + c
            addi r3, r3, -1     ; decrement index
            br   r8             ; loop again
Solution
The instructions in both programs can be divided into three types; the count of each type is:

Number of...                   Program 1   Program 2
Data transfer instructions         7           7
Control instructions               3           4
ALSU instructions                  3           3

IC for Program 1 = 7 + 3 + 3 = 13
IC for Program 2 = 7 + 4 + 3 = 14
These are static counts: each instruction in the listing is counted once, even though the loop bodies execute five times.
Solution contd.
For execution time, use the same SRC specifications (CPI: control 2, ALSU 3, data transfer 4):
$ET_1 = [(7 \times 4) + (3 \times 2) + (3 \times 3)]\,T = 43T$
$ET_2 = [(7 \times 4) + (4 \times 2) + (3 \times 3)]\,T = 45T$
Conclusion: Program 1 runs faster than Program 2, as the execution times show; it also has the smaller instruction count.
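The slide counts each instruction once. A minimal sketch (not from the lecture) can check those static figures and also count the instructions actually executed, by interpreting just the SRC subset used in the two listings. Assumptions, all hypothetical: addresses are modeled as list indices rather than byte addresses; `add`/`addi` take the three-operand SRC forms `add ra, rb, rc` and `addi ra, rb, c2`; `la`/`lar`/`ld`/`st` count as data transfer and `stop` as control (this classification reproduces the slide's counts); and b = 3, c = 4 are arbitrary test values.

```python
from collections import Counter

# CPI per instruction class (SRC specification used on the slides)
CPI = {"control": 2, "alsu": 3, "data": 4}

# Assumed classification; it reproduces the slide's static counts
CLASS = {"la": "data", "lar": "data", "ld": "data", "st": "data",
         "add": "alsu", "addi": "alsu",
         "br": "control", "brzr": "control", "brl": "control", "stop": "control"}

# Each entry is (label or None, opcode, operands...), transcribed from the listings
PROGRAM_1 = [
    (None, "la", "r5", 5), (None, "lar", "r6", "mpy"), (None, "lar", "r7", "next"),
    (None, "ld", "r2", "b"), (None, "ld", "r3", "c"), (None, "la", "r4", 0),
    ("mpy", "brzr", "r7", "r5"), (None, "add", "r4", "r4", "r3"),
    (None, "addi", "r5", "r5", -1), (None, "br", "r6"),
    ("next", "add", "r4", "r4", "r2"), (None, "st", "r4", "a"), (None, "stop"),
]
PROGRAM_2 = [
    (None, "lar", "r1", "mpy"), (None, "ld", "r2", "b"), (None, "la", "r3", 5),
    (None, "ld", "r4", "c"), (None, "brl", "r5", "r1"),
    (None, "add", "r2", "r2", "r7"), (None, "st", "r2", "a"), (None, "stop"),
    ("mpy", "la", "r7", 0), (None, "lar", "r8", "again"),
    ("again", "brzr", "r5", "r3"), (None, "add", "r7", "r7", "r4"),
    (None, "addi", "r3", "r3", -1), (None, "br", "r8"),
]


def static_counts(program):
    """Count every instruction in the listing once, as the slide does."""
    return Counter(CLASS[ins[1]] for ins in program)


def run(program, memory):
    """Interpret this small SRC subset; return final memory and dynamic class counts."""
    memory = dict(memory)
    labels = {ins[0]: i for i, ins in enumerate(program) if ins[0]}
    regs, counts, pc = {}, Counter(), 0
    while True:
        _, op, *args = program[pc]
        counts[CLASS[op]] += 1
        pc += 1  # PC already points at the next instruction, so brl saves the return address
        if op == "la":
            regs[args[0]] = args[1]
        elif op == "lar":
            regs[args[0]] = labels[args[1]]
        elif op == "ld":
            regs[args[0]] = memory[args[1]]
        elif op == "st":
            memory[args[1]] = regs[args[0]]
        elif op == "add":
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "addi":
            regs[args[0]] = regs[args[1]] + args[2]
        elif op == "br":
            pc = regs[args[0]]
        elif op == "brzr":  # branch to R[rb] if R[rc] == 0
            if regs[args[1]] == 0:
                pc = regs[args[0]]
        elif op == "brl":  # R[ra] <- PC, then branch to R[rb]
            regs[args[0]] = pc
            pc = regs[args[1]]
        elif op == "stop":
            return memory, counts


def et_in_clock_periods(counts):
    """ET / T = sum over classes of count * CPI."""
    return sum(n * CPI[kind] for kind, n in counts.items())


for name, prog in (("Program 1", PROGRAM_1), ("Program 2", PROGRAM_2)):
    static = static_counts(prog)
    memory, dynamic = run(prog, {"a": 0, "b": 3, "c": 4})
    print(f"{name}: a = {memory['a']}, static IC = {sum(static.values())}, "
          f"static ET = {et_in_clock_periods(static)}T, "
          f"dynamic IC = {sum(dynamic.values())}, dynamic ET = {et_in_clock_periods(dynamic)}T")
```

Under these assumptions the static figures match the slide (13 vs 14 instructions, 43T vs 45T), and the executed-instruction totals (30 vs 31 instructions, 85T vs 87T) lead to the same conclusion: Program 1 is slightly faster.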
MIPS
• Millions of Instructions Per Second: $MIPS = \frac{IC}{ET \times 10^6}$
• What a single instruction can do varies from machine to machine; e.g., RISC machines have simpler instructions, so the same job requires more instructions
• Was popular when the VAX-11/780 was treated as a reference machine (late 1970s and early 1980s)
MIPS as a performance metric
• For a fixed instruction count, MIPS is inversely proportional to execution time: $ET = \frac{IC}{MIPS \times 10^6}$
Example
Consider a machine with a 100 MHz clock and three instruction types with the following parameters:

Instruction type    CPI
Control              2
ALSU                 3
Data transfer        4

Now suppose that two different compilers generate code for the same program. The instruction count for each is as follows:

IC (in millions)    Code from compiler 1    Code from compiler 2
Control                      5                       10
ALSU                         1                        1
Data transfer                1                        1
Compare the two codes according to MIPS and according to execution time.
Solution: First find the CPI of each code sequence, since
$CPI = \frac{\text{total clock cycles}}{IC}$
$CPI_1 = \frac{5 \times 2 + 1 \times 3 + 1 \times 4}{7} = 2.43$
$CPI_2 = \frac{10 \times 2 + 1 \times 3 + 1 \times 4}{12} = 2.25$
Since $MIPS = \frac{IC}{ET \times 10^6} = \frac{IC \times \text{clock rate}}{IC \times CPI \times 10^6} = \frac{\text{clock rate}}{CPI \times 10^6}$ (using $ET = IC \times CPI / \text{clock rate}$):
$MIPS_1 = \frac{100 \times 10^6}{2.43 \times 10^6} = 41.15$
$MIPS_2 = \frac{100 \times 10^6}{2.25 \times 10^6} = 44.44$
Hence the code generated by compiler 2 has the higher MIPS rating.
Solution contd.
Since $ET = \frac{IC}{MIPS \times 10^6}$:
$ET_1 = \frac{7 \times 10^6}{41.15 \times 10^6} = 0.17$ s
$ET_2 = \frac{12 \times 10^6}{44.44 \times 10^6} = 0.27$ s
Hence code sequence 1 is much more efficient in terms of execution time, even though code sequence 2 has the higher MIPS rating.
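A minimal sketch (not from the lecture) reproduces the compiler comparison from the machine parameters alone; the names `metrics`, `code_1` and `code_2` are illustrative. It computes ET from cycles and clock rate rather than from the rounded MIPS values, which is why it reports MIPS of about 41.18 for compiler 1 where the slide, rounding CPI to 2.43 first, gets 41.15.

```python
# Machine parameters from the example (100 MHz clock, CPI per instruction type)
CLOCK_RATE_HZ = 100e6
CPI = {"control": 2, "alsu": 3, "data_transfer": 4}

# Instruction counts produced by the two compilers
code_1 = {"control": 5e6, "alsu": 1e6, "data_transfer": 1e6}
code_2 = {"control": 10e6, "alsu": 1e6, "data_transfer": 1e6}


def metrics(mix):
    """Return (CPI, MIPS, ET in seconds) for an instruction mix."""
    ic = sum(mix.values())
    cycles = sum(count * CPI[kind] for kind, count in mix.items())
    cpi = cycles / ic
    et = ic * cpi / CLOCK_RATE_HZ   # ET = IC x CPI x T, with T = 1 / clock rate
    mips = ic / (et * 1e6)          # equals CLOCK_RATE_HZ / (cpi * 1e6)
    return cpi, mips, et


for name, mix in (("compiler 1", code_1), ("compiler 2", code_2)):
    cpi, mips, et = metrics(mix)
    print(f"{name}: CPI = {cpi:.2f}, MIPS = {mips:.2f}, ET = {et:.2f} s")
```

The output makes the lecture's point concrete: compiler 2 wins on MIPS (about 44.44 vs 41.18) yet loses on execution time (0.27 s vs 0.17 s), because MIPS ignores how many instructions each code needs.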
MFLOPS
• Millions of FLoating-point Operations Per Second
• Some consider counting floating-point (FP) operations more meaningful than counting arbitrary instructions
• Results depend on which FP operations are measured, since different FP operations take different amounts of time
• Considered better than MIPS for two reasons:
2 reasons
1. FP operations are complex and therefore give a better picture of the capabilities of the hardware on which they run.
2. Overheads (fetching operands, storing results, etc.) are effectively lumped in with the FP operations they support.
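A small sketch, using made-up numbers (not from the lecture), contrasts the two ratings for the same run. The hypothetical program executes 200 million instructions in 0.5 s, of which 50 million are FP operations; the remaining instructions are the loads, stores and loop overhead that support them, so their time is charged to the FP operations (reason 2) instead of being counted as work of their own.

```python
def mips(instructions, seconds):
    """Millions of instructions per second."""
    return instructions / (seconds * 1e6)


def mflops(fp_operations, seconds):
    """Millions of floating-point operations per second."""
    return fp_operations / (seconds * 1e6)


# Hypothetical run: 200 million instructions in 0.5 s, 50 million of them FP operations
total_instructions = 200e6
fp_operations = 50e6
seconds = 0.5

print("MIPS   =", mips(total_instructions, seconds))
print("MFLOPS =", mflops(fp_operations, seconds))
```

Here the machine rates 400 MIPS but only 100 MFLOPS; making the overhead instructions faster would raise both numbers, while adding more of them would inflate MIPS without improving MFLOPS.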
Dhrystones***
• Dhrystone is a general "integer performance" benchmark originally developed by Reinhold Weicker in 1984
• Small program: fewer than 100 HLL (high-level language) statements
• Compiles to about 1 to 1.5 KB of code
*** The name is a play on the word Whetstone (an earlier floating-point benchmark).
Disadvantages of using Whetstones and Dhrystones
Both Whetstones and Dhrystones are now considered obsolete for the following reasons:
§ Small; they fit in cache
§ Obsolete instruction mix
§ Prone to compiler tricks
§ Difficult to reproduce results
§ Uncontrolled source code
SPEC
• The System Performance Evaluation Cooperative (SPEC) was founded in October 1988 by Apollo, Hewlett-Packard, MIPS Computer Systems and Sun Microsystems
• The latest version at the time of this lecture was SPEC CPU2000 (since superseded by SPEC CPU2006 and SPEC CPU2017)
SPEC
• The standard SPEC benchmark suite includes:
§ A compiler
§ A Boolean minimization program
§ A spreadsheet program
§ A number of other programs that stress arithmetic processing speed
• It uses a simple metric, elapsed time, to measure the performance of competing machines
• Machine-independent source code is used for a fair comparison
Advantages
• It provides for ease of publication.
• Each benchmark carries the same weight.
• SPECratio (execution time on the reference machine ÷ execution time on the machine under test) is dimensionless.
• It is not unduly influenced by long-running programs.
• It is relatively immune to performance variation on individual benchmarks.
• It provides a consistent and fair metric.
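The slide does not say how per-benchmark results are combined; SPEC CPU suites summarize them with the geometric mean of the SPECratios. The sketch below uses that rule with hypothetical times (not real SPEC data) for one short and one long program, to show why each benchmark carries the same weight: a 2x speedup on the short program counts exactly as much as a 2x speedup on the long one, whereas total elapsed time is dominated by the long-running program. The function names `spec_ratio` and `spec_score` are illustrative.

```python
from statistics import geometric_mean


def spec_ratio(reference_seconds, measured_seconds):
    """SPECratio: reference-machine time / measured time (dimensionless)."""
    return reference_seconds / measured_seconds


def spec_score(reference, measured):
    """Combine the per-benchmark SPECratios with a geometric mean."""
    return geometric_mean([spec_ratio(reference[b], measured[b]) for b in reference])


# Hypothetical times in seconds: one short and one long benchmark
reference = {"short_prog": 100.0, "long_prog": 1000.0}
machine_x = {"short_prog": 50.0, "long_prog": 1000.0}   # 2x faster on the short program
machine_y = {"short_prog": 100.0, "long_prog": 500.0}   # 2x faster on the long program

for name, times in (("X", machine_x), ("Y", machine_y)):
    print(name, "score =", round(spec_score(reference, times), 3),
          "total time =", sum(times.values()), "s")
```

Both hypothetical machines score about 1.414, even though their total times (1050 s and 600 s) differ sharply.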