CS 704 Advanced Computer Architecture Lecture 2 Quantitative












































- Slides: 44
CS 704 Advanced Computer Architecture
Lecture 2: Quantitative Principles
A detailed discussion of computer performance, the key to quantitative design and analysis
MAC/VU-Advanced Computer Architecture, Lecture 2 - Performance

Today's Topics
• Recap of Lecture 1
• Growth in processor performance
• Price-performance design
• CPU performance metrics
• CPU benchmark suites
• Summary

Recap of Lecture 1
Computer Systems:
• Architecture refers to those attributes of a computer visible to a programmer or compiler writer, e.g., instruction set, addressing techniques, I/O mechanisms, etc.
• Organization refers to how the features of a computer are implemented, e.g., control signals are generated using the principles of a finite state machine (FSM) or microprogramming.

Recap of Lecture 1
Computer Development:
• Academically, modern computer developments had their infancy in 1944-49
• Commercially, the first machine was built by the Eckert-Mauchly Computer Corporation in 1949
• Technological developments, from vacuum tubes to VLSI circuits, dynamic memory and network technology, gave birth to four different generations of computers
• Microprocessors and PCs were introduced starting in 1971

Recap of Lecture 1
Design Perspectives:
• Processor: ISA, ILP and cache
• Memory hierarchy: multilevel cache and virtual memory
• Input/output and storage
• Multiprocessors and networks

Recap of Lecture 1
Computer Design Cycle:
• Computer design and development has been under the influence of:
- Technology
- Performance
- Cost
• The decisive factors for rapid changes in computer development have been performance enhancements, price reductions and functional improvements.

Growth in Processor Performance
• Supercomputers and mainframes, costing millions of dollars and occupying very large spaces, were the prevailing form of computing in the 1960s; they were replaced by relatively low-cost and smaller minicomputers in the 1970s
• In the 1980s, very low-cost microprocessor-based desktop machines were introduced in the form of the personal computer (PC) and the workstation

Growth in Processor Performance
• The growth in processor performance since the mid-1980s has been substantially higher than in earlier years
• Prior to the mid-1980s, microprocessor performance growth averaged about 35% per year
• By 2001, growth had risen to a factor of about 1.58 per year, i.e., about 58% per year

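To get a feel for how much the two growth rates differ once they compound, here is a minimal sketch. It assumes the figure of 1.58 is a yearly multiplier (58% per year), and the 10-year horizon is chosen only for illustration.

```python
def compound_growth(annual_factor: float, years: int) -> float:
    """Total performance multiplier after compounding for `years` years."""
    return annual_factor ** years


# Rates quoted on the slide: about 35% per year before the mid-1980s,
# about 58% per year (a factor of 1.58) afterwards.
# The 10-year horizon is an illustrative assumption.
before = compound_growth(1.35, 10)
after = compound_growth(1.58, 10)

print(f"10 years at 35%/yr: {before:.1f}x")
print(f"10 years at 58%/yr: {after:.1f}x")
```

Over a decade the slower rate gives roughly a 20-fold improvement while the faster rate gives close to a 100-fold improvement, which is why the change in slope in the mid-1980s matters so much.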
[Figure: Growth in Processor Performance relative to MIPS. x-axis: Year (1984 to 2000); y-axis: relative performance (0 to 1600). Machines plotted: MIPS R2000, IBM Power1, HP 9000, DEC Alpha, Intel P-III.]

Price-Performance Design
• Technology improvements are used to lower cost and increase performance
• The relationship between cost and price is a complex one
• The cost is the total amount spent to produce a product
• The price is the amount for which a finished good is sold

Price-Performance Design
• The cost passes through different stages before it becomes the price
• A small change in cost may have a big impact on price

Price vs. Cost
• Manufacturing costs: total amount spent to produce a component
- Component cost: the cost at which the components are available to the designer; it ranges from 40% to 50% of the list price of the product
- Recurring costs: labor, purchasing, scrap, warranty; 4% to 16% of list price
- Gross margin (non-recurring costs): R&D, marketing, sales, equipment, rental, maintenance, financing cost, pre-tax profits, taxes

Price vs. Cost
• List price: the amount for which the finished good is sold
• It includes an average discount of 15% to 35% of the list price, given as volume discounts and/or retailer markup

Price-Performance Design (cont'd): Price vs. Cost

Cost-effective IC Design: Price-Performance Design
• Yield: percentage of manufactured components surviving testing
• Volume: higher volume increases manufacturing efficiency, which lowers the list price, and improves purchasing efficiency
• Feature size: the minimum size of a transistor or wire in either the x or y direction

Cost-effective IC Design: Price-Performance Design
• Reduction in feature size from 10 microns in 1971 to 0.18 microns in 2001 has resulted in:
- Quadratic rise in transistor count
- Linear increase in performance
• Microprocessors have grown from 4-bit to 64-bit
• Desktops have replaced time-sharing machines

Cost of Integrated Circuits
Manufacturing stages: integrated circuit manufacturing passes through many stages:
• Wafer growth and testing
• Chopping the wafer into dies
• Packaging the dies into chips
• Testing the chips

Cost of Integrated Circuits
• Die: the square area of the wafer containing the integrated circuit
• When dies are fitted onto the round wafer, a small area around the periphery goes to waste
• Cost of a die: determined from the cost of a wafer, the number of dies that fit on a wafer, and the percentage of dies that work, i.e., the die yield

[Figure: Dies of Integrated Circuits]

Cost of Integrated Circuits
• The cost of an integrated circuit is the ratio of the total cost (the sum of the cost of the die, the cost of testing the die, the cost of packaging and the cost of final testing of the chip) to the final test yield.

Calculating Integrated Circuits Cost
$$\text{Cost of IC} = \frac{\text{die cost} + \text{die testing cost} + \text{packaging cost} + \text{final testing cost}}{\text{final test yield}}$$

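The IC cost ratio can be sketched directly as a small function. The function name and all dollar figures below are hypothetical, chosen only to show how dividing by the final test yield raises the cost per good chip.

```python
def ic_cost(die_cost: float, die_test_cost: float, packaging_cost: float,
            final_test_cost: float, final_test_yield: float) -> float:
    """Cost per good chip: total per-chip cost divided by final test yield."""
    if not 0 < final_test_yield <= 1:
        raise ValueError("final_test_yield must be in (0, 1]")
    total = die_cost + die_test_cost + packaging_cost + final_test_cost
    return total / final_test_yield


# Hypothetical per-chip costs in dollars; 95% of packaged chips pass final test.
cost = ic_cost(die_cost=20.0, die_test_cost=2.0, packaging_cost=3.0,
               final_test_cost=1.0, final_test_yield=0.95)
print(f"Cost per good IC: ${cost:.2f}")
```

Because failed chips still consume a die, a package and test time, their cost is spread over the chips that pass, which is what the division by the yield expresses.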
Cost of Integrated Circuits
• The cost of a die is the ratio of the cost of the wafer to the product of the dies per wafer and the die yield

Calculating Integrated Circuits Cost (cont'd)
$$\text{Cost of die} = \frac{\text{cost of wafer}}{\text{dies per wafer} \times \text{die yield}}$$

Cost of Integrated Circuits
• The number of dies per wafer is determined by dividing the wafer area by the die area, and then subtracting the dies lost in the wasted area near the round periphery

Calculating Integrated Circuits Cost (cont'd)
$$\text{Dies per wafer} = \frac{\pi \times (\text{wafer diameter}/2)^2}{\text{die area}} - \frac{\pi \times \text{wafer diameter}}{\sqrt{2 \times \text{die area}}}$$

Example: Calculating Number of Dies
For a die 0.7 cm on a side, find the number of dies per wafer for a wafer of 30 cm diameter.
Answer: (wafer area / die area) minus the dies lost in the wasted area at the wafer edge:
$$\text{Dies per wafer} = \frac{\pi \times (30/2)^2}{0.49} - \frac{\pi \times 30}{\sqrt{2 \times 0.49}} \approx 1442.6 - 95.2 \approx 1347 \text{ dies}$$

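The worked example can be checked with a short sketch of the dies-per-wafer formula. The function name is an assumption, and rounding down to a whole die is a choice made here rather than something the slide specifies.

```python
import math


def dies_per_wafer(wafer_diameter_cm: float, die_area_cm2: float) -> int:
    """Whole dies per wafer: area ratio minus the dies lost at the round edge."""
    wafer_area = math.pi * (wafer_diameter_cm / 2) ** 2
    edge_loss = math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2)
    return math.floor(wafer_area / die_area_cm2 - edge_loss)


# Slide example: die 0.7 cm on a side (0.49 cm^2), wafer 30 cm in diameter.
n = dies_per_wafer(wafer_diameter_cm=30, die_area_cm2=0.7 * 0.7)
print(n)  # 1347
```

The second term grows with the wafer circumference, so the edge loss is proportionally smaller on larger wafers.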
Calculating Die Yield
• Die yield is the fraction or percentage of good dies out of the total number of dies on a wafer
• Wafer yield accounts for wafers that are completely bad and so need not be tested
• Die yield depends on the defect density through a parameter α, which depends on the number of masking levels
• A good estimate of α for CMOS is 4.0

Calculating Integrated Circuits Costs
$$\text{Die yield} = \text{Wafer yield} \times \left(1 + \frac{\text{defects per unit area} \times \text{die area}}{\alpha}\right)^{-\alpha}$$
Example: the yield of a die 0.7 cm on a side (die area 0.49 cm^2) with a defect density of 0.6/cm^2, taking wafer yield as 100% and α = 4.0:
$$\text{Die yield} = \left(1 + \frac{0.6 \times 0.49}{4.0}\right)^{-4} \approx 0.75$$

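A minimal sketch of the die yield model follows, combined with the cost-of-die ratio from the earlier slides. It assumes a die area of 0.7 cm x 0.7 cm = 0.49 cm^2, a wafer yield of 100% and α = 4.0. The wafer cost of $5000 is a hypothetical figure, and 1347 dies per wafer is taken from the earlier example.

```python
def die_yield(defects_per_cm2: float, die_area_cm2: float,
              alpha: float = 4.0, wafer_yield: float = 1.0) -> float:
    """Fraction of good dies under the yield model on the slide."""
    return wafer_yield * (1 + defects_per_cm2 * die_area_cm2 / alpha) ** (-alpha)


def die_cost(wafer_cost: float, dies_per_wafer: int, yield_fraction: float) -> float:
    """Cost per good die: wafer cost spread over the dies that work."""
    return wafer_cost / (dies_per_wafer * yield_fraction)


# Slide example: 0.6 defects per cm^2, die of 0.49 cm^2.
y = die_yield(defects_per_cm2=0.6, die_area_cm2=0.49)
print(f"Die yield: {y:.2f}")

# Hypothetical wafer cost; 1347 dies per 30 cm wafer from the earlier example.
c = die_cost(wafer_cost=5000.0, dies_per_wafer=1347, yield_fraction=y)
print(f"Cost per good die: ${c:.2f}")
```

Since both the number of dies per wafer and the yield fall as the die gets larger, die cost rises much faster than linearly with die area.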
Price-Performance Design
• Time to run the task:
- Execution time, response time, latency
• Throughput or bandwidth:
- Tasks per day, hour, week, sec, ns, ...

Price-Performance Design
• Example: carrying 2400 passengers from Lahore to Islamabad
• The train completes the task in 4.0 hours, while the airplane completes the same task in 6.0 hours
• That is, the airplane completes only 66.67% of the task in the time the train takes to finish it; the throughput, and hence the performance, of the train is 50% higher than that of the airplane

Price-Performance Design: Example
Vehicle | Time Lah to Isb | Passengers/trip | Time to complete job | Execution time/person | Cost/person | Cost-performance
Train | 4.0 hours | 2400 | 4.0 hours | 6.0 sec | Rs. 300 | 300 x 6 = 1,800 Rs-sec/person
Plane | 45 min | 300 | 45 x 8 min = 6.0 hours | 9.0 sec | Rs. 3000 | 3000 x 9 = 27,000 Rs-sec/person
• The plane is about 5.3 times faster per trip (45 min vs. 4.0 hours) but takes 50% more time to complete the job, i.e., it has lower throughput; thus the performance of the train is 50% better than that of the plane
• The time per person and the cost per person of the train are both less than those of the plane
• Thus the cost-performance ratio of train to plane is 1:15

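The table can be reproduced with a short sketch. It assumes back-to-back trips and ignores return legs, as the slide appears to do. The function and key names are illustrative.

```python
import math


def job_stats(trip_minutes: float, passengers_per_trip: int, fare_rs: float,
              total_passengers: int = 2400) -> dict:
    """Job time, time per person and cost-performance for one vehicle.

    Assumes trips run back to back and return legs are ignored.
    """
    trips = math.ceil(total_passengers / passengers_per_trip)
    job_seconds = trips * trip_minutes * 60
    seconds_per_person = job_seconds / total_passengers
    return {
        "job_hours": job_seconds / 3600,
        "seconds_per_person": seconds_per_person,
        "cost_performance": fare_rs * seconds_per_person,  # Rs-sec/person
    }


train = job_stats(trip_minutes=240, passengers_per_trip=2400, fare_rs=300)
plane = job_stats(trip_minutes=45, passengers_per_trip=300, fare_rs=3000)

print(train)
print(plane)
print(plane["cost_performance"] / train["cost_performance"])  # 15.0
```

Lower values of the cost-performance product are better here, since it multiplies cost per person by time per person.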
Metrics of Performance
Levels of the system, from the application down to the hardware, and the metrics typically used at each level:
• Application: answers per month, operations per second
• Programming language, compiler, instruction set architecture: MIPS (millions of instructions per second), MFLOPS (millions of FP operations per second)
• Datapath and control: megabytes per second
• Function units, transistors, pins/wires (I/O): cycles per second (clock rate)

Aspects of CPU Performance
$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$
Factor | Inst Count | CPI | Clock Rate
Program | √ | |
Compiler | √ | |
Inst. Set | √ | √ |
Organization | | √ | √
Technology | | | √

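The CPU time equation can be sketched as a one-line function. Seconds per cycle is expressed as the reciprocal of the clock rate, and the workload numbers are hypothetical.

```python
def cpu_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time in seconds = instructions x cycles/instruction x seconds/cycle."""
    return instruction_count * cpi / clock_rate_hz


# Hypothetical program: 1 billion instructions, CPI of 1.5, 500 MHz clock.
t = cpu_time(instruction_count=1_000_000_000, cpi=1.5, clock_rate_hz=500e6)
print(f"CPU time: {t} s")  # 3.0 s
```

Each of the three factors is influenced by a different part of the design, so an improvement in one (say, a faster clock) only helps if it does not worsen another (say, CPI).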
Cycles Per Instruction
• Cycles per instruction:
$$CPI = \frac{\text{CPU clock cycles for program}}{\text{Instruction count}} = \frac{\text{CPU time} \times \text{Clock rate}}{\text{Instruction count}}$$
• Instruction frequency: for an instruction mix, the relative frequency of occurrence of the different types of instructions is
$$FIC_i = \frac{IC_i}{\text{Total instruction count}}$$
• Average cycles per instruction:
$$CPI = \frac{1}{\text{Instruction count}} \sum_{i=1}^{n} IC_i \times CPI_i = \sum_{i=1}^{n} FIC_i \times CPI_i$$

Example: Calculating average CPI
Base Machine (Reg / Reg)
Op | Freq | Cycles | CPI(i) | (% Time)
ALU | 50% | 1 | 0.5 | (33%)
Load | 20% | 2 | 0.4 | (27%)
Store | 10% | 2 | 0.2 | (13%)
Branch | 20% | 2 | 0.4 | (27%)
Total | | | 1.5 |

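The average CPI and the share of time per instruction class in this example can be recomputed with a short sketch. The dictionary layout and function names are illustrative choices.

```python
# Instruction mix from the example: op -> (frequency, cycles per instruction)
mix = {
    "ALU": (0.50, 1),
    "Load": (0.20, 2),
    "Store": (0.10, 2),
    "Branch": (0.20, 2),
}


def average_cpi(mix: dict) -> float:
    """Weighted sum of per-class CPI, weighted by instruction frequency."""
    return sum(freq * cycles for freq, cycles in mix.values())


def time_share(mix: dict) -> dict:
    """Fraction of execution time spent in each instruction class."""
    total = average_cpi(mix)
    return {op: freq * cycles / total for op, (freq, cycles) in mix.items()}


print(f"Average CPI: {average_cpi(mix):.2f}")
for op, share in time_share(mix).items():
    print(f"{op}: {share:.0%}")
```

Note that ALU instructions make up half of the instruction count but only a third of the execution time, because each one takes a single cycle.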
Cycles Per Instruction
• Arithmetic mean time:
$$\frac{1}{n} \sum_{i=1}^{n} \text{Time}_i$$
• Weighted arithmetic mean time:
$$\sum_{i=1}^{n} w_i \times \text{Time}_i$$
• Geometric mean time:
$$\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}$$

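The three summary means can be sketched as follows. The sample times, weights and ratios are hypothetical. The weights are assumed to sum to 1, and `math.prod` requires Python 3.8 or later.

```python
import math


def arithmetic_mean(times: list) -> float:
    """Plain average of execution times."""
    return sum(times) / len(times)


def weighted_arithmetic_mean(times: list, weights: list) -> float:
    """Sum of w_i * Time_i; the weights are assumed to sum to 1."""
    return sum(w * t for w, t in zip(weights, times))


def geometric_mean(ratios: list) -> float:
    """n-th root of the product of execution time ratios."""
    return math.prod(ratios) ** (1 / len(ratios))


# Hypothetical execution times (seconds) for three programs.
times = [2.0, 4.0, 6.0]
print(arithmetic_mean(times))                             # 4.0
print(weighted_arithmetic_mean(times, [0.5, 0.25, 0.25]))  # 3.5
# Hypothetical execution time ratios relative to a reference machine.
print(geometric_mean([2.0, 8.0]))
```

The geometric mean is applied to ratios (times normalized to a reference machine) rather than to raw times, which is why its input is named differently.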
Summary: Price-Performance Design
• Computer cost: the total cost of manufacturing a computer is distributed among the different parts of the system, such as the cabinet, processor board and I/O devices
• Performance: time is the key measurement of performance
• Comparing the performance of two designs: the ratio η = Execution time Y / Execution time X tells how many times faster machine X is than machine Y; since performance is the inverse of execution time, η = Performance X / Performance Y

Instruction Execution Rate
• MIPS specifies performance inversely to execution time. For a given program:
$$\text{MIPS} = \frac{\text{instruction count}}{\text{execution time} \times 10^6}$$
• MIPS cannot be calculated from the instruction mix
• Relative MIPS for a machine M is defined with respect to a reference machine as:
$$\text{RMIPS} = \frac{\text{Performance}_M}{\text{Performance}_{reference}} \times \text{MIPS}_{reference} = \frac{\text{Time}_{reference}}{\text{Time}_M} \times \text{MIPS}_{reference}$$
• MFLOPS is defined for floating-point-intensive programs as millions of floating-point operations per second

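Both rate definitions can be sketched in a few lines. The function names and the sample figures are hypothetical.

```python
def mips(instruction_count: float, execution_time_s: float) -> float:
    """Native MIPS: millions of instructions executed per second."""
    return instruction_count / (execution_time_s * 1e6)


def relative_mips(time_reference_s: float, time_m_s: float,
                  mips_reference: float) -> float:
    """Relative MIPS of machine M, scaled from a reference machine's rating."""
    return (time_reference_s / time_m_s) * mips_reference


# Hypothetical program: 50 million instructions executed in 2 seconds.
print(mips(50e6, 2.0))  # 25.0

# Hypothetical comparison: the reference machine is rated at 1 MIPS and takes
# 10 s on a workload that machine M finishes in 2 s.
print(relative_mips(time_reference_s=10.0, time_m_s=2.0, mips_reference=1.0))  # 5.0
```

Relative MIPS depends only on measured execution times, so it avoids counting instructions on machine M, but its value is only meaningful for the program and reference machine used.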
CPU Benchmark Suites
• Performance comparison: comparing the execution time of the same workload running on two machines, without running the actual programs
• Benchmarks: programs specifically chosen to measure performance
• Five levels of programs, in decreasing order of accuracy:
- Real applications
- Modified applications
- Kernels
- Toy benchmarks
- Synthetic benchmarks

SPEC: System Performance Evaluation Cooperative
• First round, 1989: 10 programs yielding a single number, SPECmarks
• Second round, 1992: SPECint92 (6 integer programs) and SPECfp92 (14 floating-point programs)
• Third round, 1995: new set of programs, SPECint95 (8 integer programs) and SPECfp95 (10 floating-point programs)
- "Benchmarks useful for 3 years"
- Single flag setting for all programs: SPECint_base95, SPECfp_base95

Summary: Designing and performance comparison
• Designing to last through trends:
- Logic: 2x in 3 years
- DRAM: capacity 4x in 3 years, speed 2x in 10 years
- Disk: capacity 4x in 3 years, speed 2x in 10 years
- 6 years to graduate => 16x CPU speed, DRAM/disk size
• Time to run the task: execution time, response time, latency
• Tasks per day, hour, week, sec, ns, ...: throughput, bandwidth
• "X is n times faster than Y" means
$$n = \frac{\text{ExTime}(Y)}{\text{ExTime}(X)} = \frac{\text{Performance}(X)}{\text{Performance}(Y)}$$

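The "n times faster" definition and the compounding behind the trend figures can be sketched together. The function names and the sample execution times are hypothetical.

```python
def times_faster(ex_time_y: float, ex_time_x: float) -> float:
    """n such that X is n times faster than Y: ExTime(Y) / ExTime(X)."""
    return ex_time_y / ex_time_x


def growth_over(factor: float, period_years: float, years: float) -> float:
    """Total growth after `years` if a quantity grows by `factor` every period."""
    return factor ** (years / period_years)


# Hypothetical timings: Y takes 15 s and X takes 10 s on the same task.
print(times_faster(ex_time_y=15.0, ex_time_x=10.0))  # 1.5

# A quantity that grows 4x every 3 years grows 16x over 6 years.
print(growth_over(factor=4.0, period_years=3.0, years=6.0))  # 16.0
```

The order of the arguments matters: the slower machine's time goes in the numerator, so a result above 1 means X is the faster machine.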
Summary (cont'd)
• CPI Law:
$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$
• Execution time is the REAL measure of computer performance!
• Good products are created when you have good benchmarks and good ways to summarize performance
• Die cost goes roughly with $(\text{die area})^4$

Summary (cont'd)
• "For better or worse, benchmarks shape a field"
• Good products are created when you have:
- Good benchmarks
- Good ways to summarize performance
• Because sales depend in part on performance relative to the competition, companies invest in improving the product as reported by the performance summary
• If the benchmarks or summary are inadequate, a company must choose between improving the product for real programs and improving the product to get more sales; sales almost always wins!
• Execution time is the measure of computer performance!