CMPUT 429/CMPE 382 Winter 2001, Topic 2: Technology

CMPUT 429/CMPE 382 Winter 2001 Topic 2: Technology Trend and Cost/Performance (Adapted from David A. Patterson’s CS 252 lecture slides at Berkeley) 1/17/01 CS 252/Patterson Lec 1.1

Technology Trends: Microprocessor Capacity
Moore’s Law, with the “Graduation Window” marked
Transistors per chip:
• Alpha 21264: 15 million
• Alpha 21164: 9.3 million
• PowerPC 620: 6.9 million
• Pentium Pro: 5.5 million
• Sparc Ultra: 5.2 million
CMOS improvements:
• Die size: 2X every 3 yrs
• Line width: halve / 7 yrs

Memory Capacity (Single Chip DRAM)
year    size (Mb)    cycle time
1980    0.0625       250 ns
1983    0.25         220 ns
1986    1            190 ns
1989    4            165 ns
1992    16           145 ns
1996    64           120 ns
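The DRAM table can be turned into average yearly growth factors. The following is a minimal sketch that uses only the first and last rows of the table; the variable names are illustrative and not part of the original slides.

```python
# Endpoints of the DRAM table: (year, capacity in Mb, cycle time in ns).
first = (1980, 0.0625, 250.0)
last = (1996, 64.0, 120.0)

years = last[0] - first[0]            # length of the span in years
capacity_ratio = last[1] / first[1]   # total capacity growth over the span
speed_ratio = first[2] / last[2]      # total cycle-time improvement over the span

# Equivalent constant yearly factors (geometric average over the span).
capacity_per_year = capacity_ratio ** (1 / years)
speed_per_year = speed_ratio ** (1 / years)

print(f"capacity:   {capacity_ratio:.0f}x total, {capacity_per_year:.2f}x per year")
print(f"cycle time: {speed_ratio:.2f}x total, {speed_per_year:.3f}x per year")
```

Capacity grows by roughly half again every year, while cycle time improves by only a few percent per year.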

Technology Trends (Summary)
         Capacity         Speed (latency)
Logic    2x in 3 years
DRAM     4x in 3 years    2x in 10 years
Disk     4x in 3 years    2x in 10 years
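Rates quoted as "k times in p years" compound, so they can be rescaled to any other time span. A small sketch, assuming constant exponential growth; the helper name growth_over is hypothetical.

```python
def growth_over(factor: float, period_years: float, span_years: float) -> float:
    """Total growth over span_years for a quantity that grows by `factor` every period_years."""
    return factor ** (span_years / period_years)

# DRAM or disk capacity at 4x per 3 years, over a 6-year span.
print(growth_over(4, 3, 6))

# DRAM or disk latency improving 2x per 10 years, over the same 6 years.
print(round(growth_over(2, 10, 6), 2))
```

Over six years capacity grows 16x while latency improves by only about 1.5x, which is the gap the summary table points at.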

Processor Performance Trends
[Figure: relative performance (log scale, 0.1 to 1000) versus year (1965 to 2000) for Supercomputers, Mainframes, Minicomputers, and Microprocessors]

Processor Performance (1.35X before, 1.55X now)
[Figure: processor performance over time, annotated with a growth rate of 1.54X/yr]

Performance Trends (Summary)
• Workstation performance (measured in SPECmarks) improves roughly 50% per year (2X every 18 months)
• Improvement in cost performance estimated at 70% per year
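The two ways of quoting the trend (percent per year and doubling time) are only roughly equivalent. The sketch below converts between them under the assumption of steady compound growth; both function names are illustrative.

```python
import math

def doubling_time_years(rate_per_year: float) -> float:
    """Years needed to double at a fixed yearly improvement rate (0.5 means 50%)."""
    return math.log(2) / math.log(1 + rate_per_year)

def annual_rate_for_doubling(months: float) -> float:
    """Yearly improvement rate implied by doubling every `months` months."""
    return 2 ** (12 / months) - 1

print(f"50%/yr doubles every {doubling_time_years(0.5) * 12:.1f} months")
print(f"70%/yr doubles every {doubling_time_years(0.7) * 12:.1f} months")
print(f"2X every 18 months is {annual_rate_for_doubling(18) * 100:.0f}% per year")
```

At 50% per year the doubling time comes out a little over 20 months, so "2X every 18 months" is a rounded figure, not an exact match.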

Computer Architecture Topics
• Input/Output and Storage: Disks, WORM, Tape, RAID
• Emerging Technologies, Interleaving, Bus protocols
• Memory Hierarchy: DRAM, L2 Cache, L1 Cache (Coherence, Bandwidth, Latency)
• VLSI
• Instruction Set Architecture: Addressing, Protection, Exception Handling
• Pipelining and Instruction Level Parallelism: Pipelining, Hazard Resolution, Superscalar, Reordering, Prediction, Speculation, Vector, DSP

Computer Architecture Topics
[Diagram: processor (P) and memory (M) nodes attached through a switch (S) to an Interconnection Network]
• Processor-Memory-Switch
• Multiprocessors: Shared Memory, Message Passing, Data Parallelism
• Networks and Interconnections: Network Interfaces; Topologies, Routing, Bandwidth, Latency, Reliability

Course Focus
Computer Architecture:
• Instruction Set Design
• Organization
• Hardware
Surrounding influences: Technology, Parallelism, Applications, Operating Systems, Measurement & Evaluation, Programming Languages, Interface Design (ISA), History

Measurement Tools
• Benchmarks, Traces, Mixes
• Hardware: Cost, delay, area, power estimation
• Simulation (many levels)
  – ISA, RT, Gate, Circuit
• Queueing Theory
• Rules of Thumb
• Fundamental “Laws”/Principles

Which is faster?
Plane               DC to Paris    Speed       Passengers    Throughput (pmph)
Boeing 747          6.5 hours      610 mph     470           286,700
BAC/Sud Concorde    3 hours        1350 mph    132           178,200
• Time to run the task (ExTime)
  – Execution time, response time, latency
• Tasks per day, hour, week, sec, ns … (Performance)
  – Throughput, bandwidth
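The throughput column is passengers times speed (passenger-miles per hour). A short sketch that recomputes it from the table and compares the two planes on both metrics; the dictionary layout is just one convenient way to hold the numbers.

```python
# Figures copied from the table above.
planes = {
    "Boeing 747": {"hours": 6.5, "mph": 610, "passengers": 470},
    "Concorde": {"hours": 3.0, "mph": 1350, "passengers": 132},
}

def throughput_pmph(plane: dict) -> int:
    """Passenger-miles per hour: passengers carried times cruising speed."""
    return plane["passengers"] * plane["mph"]

b747, concorde = planes["Boeing 747"], planes["Concorde"]

# Latency view: how much sooner one passenger arrives.
latency_ratio = b747["hours"] / concorde["hours"]
# Throughput view: how much more passenger traffic is moved per hour.
throughput_ratio = throughput_pmph(b747) / throughput_pmph(concorde)

print(f"Concorde is {latency_ratio:.2f}x faster in trip time")
print(f"747 has {throughput_ratio:.2f}x the throughput")
```

Which plane is "faster" depends on whether the metric is response time or throughput.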

Definitions
• Performance is in units of things per sec
  – bigger is better
• If we are primarily concerned with response time
  – performance(x) = 1 / execution_time(x)
• "X is n times faster than Y" means
  n = Performance(X) / Performance(Y) = Execution_time(Y) / Execution_time(X)
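These definitions translate directly into code. A minimal sketch; the function names and the 10 s and 15 s timings are made up for illustration.

```python
def performance(execution_time: float) -> float:
    """Performance as the reciprocal of execution time (bigger is better)."""
    return 1.0 / execution_time

def times_faster(exec_time_x: float, exec_time_y: float) -> float:
    """Return n such that X is n times faster than Y."""
    return performance(exec_time_x) / performance(exec_time_y)

# Hypothetical timings: X finishes the task in 10 s, Y in 15 s.
n = times_faster(10.0, 15.0)
print(f"X is {n:.2f} times faster than Y")
```

The ratio of performances equals the inverse ratio of execution times, so either form gives the same n.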

Cycles Per Instruction
IC = Instruction Count
CPI = Clock cycles Per Instruction

Cycles Per Instruction
We may separate the contribution of each type of instruction to the execution time by defining:
  CPI = sum over i of CPI(i) x F(i), where F(i) = IC(i) / IC is the fraction of executed instructions that are of type i

Example: Calculating CPI
Base Machine (Reg / Reg)
Op        Freq    Cycles    CPI(i)    (% Time)
ALU       50%     1         0.5       (33%)
Load      20%     2         0.4       (27%)
Store     10%     2         0.2       (13%)
Branch    20%     2         0.4       (27%)
Total                       1.5
Typical mix of instruction types in program
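The table can be reproduced by weighting each instruction class's cycle count by its frequency. A brief sketch using the numbers from the slide; the variable names are illustrative.

```python
# Instruction mix from the table: op -> (frequency, cycles per instruction).
mix = {
    "ALU": (0.50, 1),
    "Load": (0.20, 2),
    "Store": (0.10, 2),
    "Branch": (0.20, 2),
}

# Contribution of each class to the overall CPI: frequency times cycles.
cpi_contrib = {op: freq * cycles for op, (freq, cycles) in mix.items()}
cpi = sum(cpi_contrib.values())

# Share of execution time spent in each class.
time_share = {op: contrib / cpi for op, contrib in cpi_contrib.items()}

print(f"CPI = {cpi:.1f}")
for op, share in time_share.items():
    print(f"{op:6s} CPI(i) = {cpi_contrib[op]:.1f}  time = {share:.0%}")
```

ALU operations are half of all instructions but account for only a third of the time, because each takes a single cycle.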

Aspects of CPU Performance (CPU Law)
CPU time = Seconds / Program = (Instructions / Program) x (Cycles / Instruction) x (Seconds / Cycle)

                Inst Count    CPI    Clock Rate
Program         X
Compiler        X             (X)
Inst. Set       X             X
Organization                  X      X
Technology                           X
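The CPU law is a straight product of three terms. The sketch below evaluates it for a made-up workload; the instruction count, CPI, and clock rate are hypothetical values chosen only to show the arithmetic.

```python
def cpu_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time in seconds = instructions x cycles/instruction x seconds/cycle."""
    seconds_per_cycle = 1.0 / clock_rate_hz
    return instruction_count * cpi * seconds_per_cycle

# Hypothetical program: 1e9 instructions, CPI of 1.5, 500 MHz clock.
base = cpu_time(1e9, 1.5, 500e6)

# A hypothetical compiler change that removes 10% of instructions but raises CPI to 1.6.
changed = cpu_time(0.9e9, 1.6, 500e6)

print(f"base:    {base:.2f} s")
print(f"changed: {changed:.2f} s")
```

Because the three terms interact, a change that improves one (here instruction count) has to be judged by its effect on the whole product.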

Amdahl's Law
Speedup due to enhancement E:
  Speedup(E) = ExTime without E / ExTime with E = Performance with E / Performance without E
Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected.

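Amdahl’s Law
ExTime_new = ExTime_old x [(1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced]
Speedup_overall = ExTime_old / ExTime_new = 1 / [(1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced]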

Amdahl’s Law
• Example: Floating point instructions improved to run 2X; but only 10% of actual instructions are FP
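The example can be worked numerically. This sketch assumes the 10% figure is the fraction of execution time spent in floating point, which is what Amdahl's Law needs; the function name is illustrative.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when a fraction of the time is sped up by a given factor."""
    new_time = (1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced
    return 1.0 / new_time

# Floating point runs 2X faster, but only 10% of the time is FP (assumed).
overall = amdahl_speedup(0.10, 2.0)
print(f"ExTime_new = {1.0 / overall:.2f} x ExTime_old")
print(f"Speedup_overall = {overall:.3f}")

# Upper bound: even an infinitely fast FP unit leaves the other 90% untouched.
print(f"limit = {1.0 / (1.0 - 0.10):.3f}")
```

Doubling FP speed yields only about a 5% overall gain, and no FP improvement can push the speedup past roughly 1.11.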

Metrics of Performance
(system level: metric)
• Application: Answers per month, Operations per second
• Programming Language
• Compiler
• ISA: (millions) of Instructions per second: MIPS; (millions) of (FP) operations per second: MFLOP/s
• Datapath, Control: Megabytes per second
• Function Units
• Transistors, Wires, Pins: Cycles per second (clock rate)

SPEC: System Performance Evaluation Cooperative
• First Round 1989
  – 10 programs yielding a single number (“SPECmarks”)
• Second Round 1992
  – SPECint92 (6 integer programs) and SPECfp92 (14 floating point programs)
    » Compiler flags unlimited. March 93 of DEC 4000 Model 610:
      spice: unix.c:/def=(sysv,has_bcopy,"bcopy(a,b,c)=memcpy(b,a,c)"
      wave5: /ali=(all,dcom=nat)/ag=a/ur=4/ur=200
      nasa7: /norecu/ag=a/ur=4/ur2=200/lc=blas
• Third Round 1995
  – new set of programs: SPECint95 (8 integer programs) and SPECfp95 (10 floating point)
  – “benchmarks useful for 3 years”
  – Single flag setting for all programs: SPECint_base95, SPECfp_base95

How to Summarize Performance
• Arithmetic mean (weighted arithmetic mean) tracks execution time: Σ(Ti)/n or Σ(Wi*Ti)
• Harmonic mean (weighted harmonic mean) of rates (e.g., MFLOPS) tracks execution time: n/Σ(1/Ri) or 1/Σ(Wi/Ri), with weights Wi summing to 1
• Normalized execution time is handy for scaling performance (e.g., X times faster than SPARCstation 10)
• But do not take the arithmetic mean of normalized execution times; use the geometric mean: (Π xi)^(1/n)
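The warning about normalized times is easiest to see with numbers. The sketch below uses invented execution times for two programs on two machines, A and B; all data and function names are hypothetical.

```python
import math

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def harmonic_mean(rates):
    return len(rates) / sum(1.0 / r for r in rates)

def geometric_mean(xs):
    return math.prod(xs) ** (1.0 / len(xs))

# Hypothetical execution times (seconds) for programs P1 and P2.
times_a = [1.0, 1000.0]
times_b = [10.0, 100.0]

# Normalize B to A, then A to B.
b_over_a = [b / a for a, b in zip(times_a, times_b)]
a_over_b = [a / b for a, b in zip(times_a, times_b)]

# Arithmetic mean of ratios: each machine looks slower than the other.
print(arithmetic_mean(b_over_a), arithmetic_mean(a_over_b))

# Geometric mean of ratios: the same answer whichever machine is the reference.
print(geometric_mean(b_over_a), geometric_mean(a_over_b))

# Harmonic mean of two hypothetical rates, 100 and 50 MFLOPS.
print(round(harmonic_mean([100.0, 50.0]), 2))
```

With the arithmetic mean, the verdict flips depending on the reference machine; the geometric mean gives a consistent ratio, although it no longer tracks total execution time.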

Performance Evaluation
• “For better or worse, benchmarks shape a field”
• Good products are created when we have:
  – Good benchmarks
  – Good ways to summarize performance
• Since sales are in part a function of performance relative to the competition, companies invest in improving the product as reported by the performance summary
• If the benchmarks/summary are inadequate, then companies must choose between improving the product for real programs vs. improving the product to get more sales; sales almost always wins!
• Execution time is the measure of computer performance!

Instruction Set Architecture (ISA)
[Diagram: the instruction set as the interface layer between software (above) and hardware (below)]

Interface Design
A good interface:
• Lasts through many implementations (portability, compatibility)
• Is used in many different ways (generality)
• Provides convenient functionality to higher levels
• Permits an efficient implementation at lower levels
[Diagram: one interface with several uses above it and successive implementations (imp 1, imp 2, imp 3) below it over time]

Summary, #1
• Designing to Last through Trends
           Capacity         Speed
  Logic    2x in 3 years
  DRAM     4x in 3 years    2x in 10 years
  Disk     4x in 3 years    2x in 10 years
• 6 yrs to graduate => 16X CPU speed, DRAM/Disk size
• Time to run the task
  – Execution time, response time, latency
• Tasks per day, hour, week, sec, ns, …
  – Throughput, bandwidth
• “X is n times faster than Y” means
  n = ExTime(Y) / ExTime(X) = Performance(X) / Performance(Y)

Summary, #2
• Amdahl’s Law:
  Speedup_overall = ExTime_old / ExTime_new = 1 / [(1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced]
• CPI Law:
  CPU time = Seconds / Program = (Instructions / Program) x (Cycles / Instruction) x (Seconds / Cycle)
• Execution time is the REAL measure of computer performance!
• Good products are created when we have:
  – Good benchmarks, good ways to summarize performance
• Die cost goes roughly with (die area)^4
• Can PC industry support engineering/research investment?