CS 161 Design and Architecture of Computer Systems

CS 161 – Design and Architecture of Computer Systems: Technology Trends and Performance Evaluation

Quick recap. What is computer architecture? The science and art of designing, selecting, and interconnecting hardware components, and designing the hardware/software interface, to create a computing system that meets functional, performance, energy consumption, cost, and other specific goals.

Technology Trends: MOORE'S LAW (an enabler for computer architecture)

What is Moore's Law? Fill in the blanks: "The ____ will ____ every ____." (Options: performance / number of transistors; double / triple; 1 year / 18 months / 2 years.)

An Enabler: Moore's Law. Moore, "Cramming more components onto integrated circuits," Electronics Magazine, 1965. Component counts double every other year. [Image source: Intel]

The number of transistors on an integrated circuit doubles roughly every two years. [Image source: Wikipedia]

# of Transistors == Performance?

Constrained by power, instruction-level parallelism, and memory latency.

Technology Trends: MEMORY SCALING

Technology Trends: THE POWER WALL

FUNDAMENTAL CONCEPTS

What is a Computer? Three key components: computation, communication, and storage (memory).

The Von Neumann Model/Architecture. Also called the stored program computer (instructions in memory). Two key properties: (1) Stored program: instructions are stored in a linear memory array; memory is unified between instructions and data; the interpretation of a stored value depends on the control signals (when is a value interpreted as an instruction?). (2) Sequential instruction processing: one instruction is processed (fetched, executed, and completed) at a time; the program counter (instruction pointer) identifies the current instruction and is advanced sequentially, except for control transfer instructions.

The Von Neumann Model (of a Computer). Q: Is this the only way that a computer can operate? A: No. Qualified answer: but it has been the dominant way, i.e., the dominant paradigm for computing, for N decades.

The Dataflow Model (of a Computer). Von Neumann model: an instruction is fetched and executed in control flow order, as specified by the instruction pointer; execution is sequential unless there is an explicit control flow instruction. Dataflow model: an instruction is fetched and executed in data flow order, i.e., when its operands are ready; there is no instruction pointer. Instruction ordering is specified by data flow dependence: each instruction specifies "who" should receive the result, and an instruction can "fire" whenever all of its operands are received. Potentially many instructions can execute at the same time, so the model is inherently more parallel.

Von Neumann vs. Dataflow. Consider a Von Neumann program: what is the significance of the program order?
v <= a + b
w <= b * 2
x <= v - w
y <= v + w
z <= x * y
In the sequential (Von Neumann) view the statements execute in program order; in the dataflow view the same computation is a graph of operators (+, *2, -, +, *) from the inputs a and b to the output z, each firing when its operands arrive. Which model is more natural to you as a programmer?

WHAT IS PERFORMANCE?

Depends on what you're measuring… Algorithm: determines the number of operations executed. Programming language, compiler, architecture: determine the number of machine instructions executed per operation. Processor and memory system: determine how fast instructions are executed. I/O system (including OS): determines how fast I/O operations are executed.

Performance analogy. Car metrics: top speed (mph), acceleration (0-60), torque, horsepower, MPG. Processor metrics: frequency, response time / latency, throughput / bandwidth, power.

Response Time and Throughput. Response time: how long it takes to do a task. Throughput: total work done per unit time, e.g., tasks/transactions/… per hour. How are response time and throughput affected by replacing the processor with a faster version? By adding more processors? We'll focus on response time for now, also known as execution time.

Performance: EXECUTION TIME

Measuring Execution Time. Elapsed time: total response time, including all aspects (processing, I/O, OS overhead, idle time). Two common measurements: wall clock time and CPU time.

Measuring Execution Time. Wall clock time: real time to complete a job (seconds). CPU time: time spent processing a given job; discounts I/O time and other jobs' shares; comprises user CPU time and system CPU time. For example:
$ time make > /dev/null 2>&1
real    1m14.115s
user    0m57.853s
sys     0m10.853s

Relative Performance. "X is n times faster than Y" means Execution Time_Y / Execution Time_X = n. Example: a program takes 10 s on A and 15 s on B. Execution Time_B / Execution Time_A = 15 s / 10 s = 1.5, so A is 1.5 times faster than B.

Relative Performance. "X is n times faster than Y" means Execution Time_Y / Execution Time_X = n. Example: a program takes 60 s on A and 30 s on B. Execution Time_B / Execution Time_A = 30 s / 60 s = 0.5, so A is 0.5 times as fast as B; equivalently, B is 2 times faster than A.

Performance is defined as: Performance = 1 / Execution Time. More generally: Speedup = Performance_New / Performance_Old = Execution Time_Old / Execution Time_New.

Performance: CLOCK CYCLES

CPU Clocking. Operation of digital hardware is governed by a constant-rate clock: in each clock cycle, data transfer and computation are followed by a state update. Clock period: the duration of a clock cycle, e.g., 250 ps = 0.25 ns = 250 × 10⁻¹² s. Clock frequency (rate): cycles per second, e.g., 4.0 GHz = 4000 MHz = 4.0 × 10⁹ Hz.
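As a quick check on how period and rate relate (clock rate is simply the reciprocal of clock period), using the figures above:

Clock Rate = 1 / Clock Period = 1 / (250 × 10⁻¹² s) = 4.0 × 10⁹ Hz = 4.0 GHz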

CPU Time. Performance is improved by reducing the number of clock cycles or by increasing the clock rate. Hardware designers must often trade off clock rate against cycle count.
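For reference, the standard relation this slide relies on (the original equation did not survive extraction, so it is restated here):

CPU Time = CPU Clock Cycles × Clock Cycle Time = CPU Clock Cycles / Clock Rate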

CPU Time Example. Computer A: 2 GHz clock, 10 s CPU time. Designing Computer B: aim for 6 s CPU time; a faster clock is possible, but it causes 1.2 × as many clock cycles. How fast must Computer B's clock be?

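A reconstruction of the worked solution (the original slide showed it as an equation; the arithmetic follows directly from the numbers given):

Clock Cycles_A = CPU Time_A × Clock Rate_A = 10 s × 2 GHz = 2 × 10¹⁰ cycles
Clock Cycles_B = 1.2 × Clock Cycles_A = 2.4 × 10¹⁰ cycles
Clock Rate_B = Clock Cycles_B / CPU Time_B = 2.4 × 10¹⁰ / 6 s = 4 GHz

So Computer B needs a 4 GHz clock, twice as fast as Computer A.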

Performance: CPI

Instruction Count and CPI. Instruction count for a program: determined by the program, the ISA, and the compiler. Average cycles per instruction (CPI): determined by the CPU hardware. If different instructions have different CPIs, the average CPI is affected by the instruction mix.
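The standard relations these definitions plug into (restated here, since the slide's equations were lost in extraction):

CPU Clock Cycles = Instruction Count × Cycles per Instruction (CPI)
CPU Time = Instruction Count × CPI × Clock Cycle Time = (Instruction Count × CPI) / Clock Rate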

CPI Example. Computer A: cycle time = 250 ps, CPI = 2.0. Computer B: cycle time = 500 ps, CPI = 1.2. Same ISA. Which is faster, and by how much?

CPI Example (answer): A is faster, by a factor of 1.2, as worked out below.
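Working from the numbers on the previous slide (the slide's own derivation was an image and is reconstructed here), with the same instruction count on both machines:

CPU Time_A = Instruction Count × 2.0 × 250 ps = Instruction Count × 500 ps
CPU Time_B = Instruction Count × 1.2 × 500 ps = Instruction Count × 600 ps
CPU Time_B / CPU Time_A = 600 ps / 500 ps = 1.2, so A is 1.2 times faster than B.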

CPI in More Detail. If different instruction types take different numbers of cycles, the overall CPI is a weighted average, with each instruction class weighted by its relative frequency.
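The weighted-average formula the slide is describing, in its standard form:

CPU Clock Cycles = Σ (CPI_i × Instruction Count_i), summed over instruction classes i
CPI = CPU Clock Cycles / Instruction Count = Σ (CPI_i × (Instruction Count_i / Instruction Count))

where Instruction Count_i / Instruction Count is the relative frequency of class i.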

CPI Example. Alternative compiled code sequences use instructions of types INT, FP, and MEM:

Type              INT   FP   MEM
CPI for type        1    2     3
IC in Program 1     2    1     2
IC in Program 2     4    1     1

Program 1: IC = 5. Clock cycles = 2×1 + 1×2 + 2×3 = 10. Average CPI = 10/5 = 2.0.

Program 2: IC = 6. Clock cycles = 4×1 + 1×2 + 1×3 = 9. Average CPI = 9/6 = 1.5. Note that Program 2 is faster (9 cycles vs. 10) even though it executes one more instruction.

Performance Summary: The BIG Picture. Performance depends on: the algorithm (affects IC, possibly CPI); the programming language (affects IC, CPI); the compiler (affects IC, CPI); and the instruction set architecture (affects IC, CPI, Tc).
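The equation behind "The BIG Picture", in its standard form (the slide's version was an image):

CPU Time = (Instructions / Program) × (Clock Cycles / Instruction) × (Seconds / Clock Cycle)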

POWER

Power Trends in CMOS IC technology. [Figure: clock rate and power across processor generations; annotations show roughly ×40 and ×1000 growth while supply voltage dropped from 5 V to 1 V.]
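The figure's annotations refer to the standard CMOS dynamic power relation, restated here since the slide showed it graphically:

Power = Capacitive Load × Voltage² × Frequency Switched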

Reducing Power. Suppose a new CPU has 85% of the capacitive load of the old CPU, with a 15% voltage reduction and a 15% frequency reduction. How much power does the new CPU use compared to the old CPU?

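A reconstruction of the worked answer using the dynamic power relation above (the original slide presented it as an equation):

P_new / P_old = ((0.85 × C) × (0.85 × V)² × (0.85 × F)) / (C × V² × F) = 0.85⁴ ≈ 0.52

So the new CPU uses roughly half the power of the old one.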

The Power Wall. We can't reduce voltage further, and we can't remove more heat. How else can we improve performance?

Multiprocessors. Multicore microprocessors put more than one processor per chip and require explicitly parallel programming. Compare this with instruction-level parallelism, where the hardware executes multiple instructions at once, hidden from the programmer. Explicit parallelism is hard to do: programming for performance, load balancing, and optimizing communication and synchronization.

Pitfall: Amdahl's Law. Improving one aspect of a computer and expecting a proportional improvement in overall performance. Example: multiply operations account for 80 s of a 100 s program. How much improvement in multiply performance is needed for a 5× overall speedup? It can't be done! Corollary: make the common case fast.
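A reconstruction of the arithmetic behind "can't be done", using the relation T_improved = T_affected / improvement factor + T_unaffected:

Target: 100 s / 5 = 20 s total
20 s = 80 s / n + 20 s, which implies 80 s / n = 0

No finite improvement factor n satisfies this, so a 5× overall speedup cannot come from improving multiply alone.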

Amdahl's Law. Speedup is due to an enhancement E. Let F be the fraction of execution time where the enhancement applies (also called the parallel fraction), and (1 - F) the serial fraction.
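The law itself, in its standard form (the slide's equation was an image), where S is the speedup of the enhanced portion:

Overall Speedup = 1 / ((1 - F) + F / S)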

Amdahl's Law: Example. When we speed up only a part of the program, what is the overall speedup on the entire program? Example: we have two possible enhancements; what is the overall speedup for each? Enhancement 1: a speedup of 20 on 10% of execution time gives an overall speedup of 1.105. Enhancement 2: a speedup of 1.6 on 80% of execution time gives an overall speedup of 1.43.
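Checking both answers with the formula above:

Enhancement 1: Speedup = 1 / ((1 - 0.1) + 0.1 / 20) = 1 / 0.905 ≈ 1.105
Enhancement 2: Speedup = 1 / ((1 - 0.8) + 0.8 / 1.6) = 1 / 0.7 ≈ 1.43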

Pitfall: MIPS as a Performance Metric. MIPS: millions of instructions per second. It doesn't account for differences in ISAs between computers or for differences in complexity between instructions, and CPI varies between programs on a given CPU.
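For reference, the usual definition of MIPS (not in the extracted slide text, but standard), which makes the pitfall visible: a machine with a simpler ISA can post a higher MIPS while still taking longer overall.

MIPS = Instruction Count / (Execution Time × 10⁶) = Clock Rate / (CPI × 10⁶)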

Concluding Remarks. Cost/performance keeps improving due to underlying technology development. The instruction set architecture is the hardware/software interface. Execution time is the best performance measure. Power is a limiting factor; use parallelism to improve performance.