ECE 472 Computer Architecture
Patrick Chiang
TA: Kang-Min Hu
Department of Electrical Engineering, Oregon State University
http://eecs.oregonstate.edu/~pchiang
EE 472 – Spring 2007 Lecture 1 - 1 P. Chiang, with Slide Help from C. Kozyrakis (Stanford)

Is this class for you?
• This class will not be easy
  – My first quarter of teaching computer architecture at Oregon State
  – Assumes good mastery of basic assembly language programming
  – What is the class makeup?
    • ECE 1/2
    • CS 1/2
  – This is “ECE 472”, and emphasizes the hardware side of Comp. Arch.
    • There is CS 472 in Spring 2008 quarter
• Class Breakdown
  – 5 Homeworks: 10%
  – 1 Midterm: 20%
  – 1 Project: 30%
  – 1 Final: 40%
• Average grade: around B/B+, with some flexibility

Today: What’s the big picture?
• Syllabus: Given this Thursday
• Start with the C-code
• Do the assembly language
• FIRST: How to evaluate whether a computer is “fast”, or “good”?
  – Execution Time (time to run the process(es))
  – Power
  – Cost
  – Flexibility (complexity, programmability)

What do Computer Architects Do?
[Figure labels: Computer Architect, Applications, Technology, Software Requirements, Interfaces, ISA, API, Link, I/O Chan, Machine Organization, Regs, IR, Measurement & Analysis; ECE 471: Digital VLSI]
The science/art of constructing efficient systems for computing tasks

What is Computer Architecture?
• Understanding every level of the complete system:
  – Software
  – Compiler
  – Computer Architecture
  – VLSI digital circuit design
    • For SOC, even analog/mixed-signal design
  – Devices
• As an engineer, you must understand both “depth” and “breadth”
  – Everything is related
  – Must understand every level of the problem to make the right “choices”
    • Cannot just black-box and say: “Not my problem. Someone else will solve it.”
    • Choice of where you want to go next depends on understanding changes along the entire vertical structure
  – How is the technology changing? Are there fundamental shifts?
  – e.g. multi-core, parallel processing
• Execution Time = ?

Write Some C Code for Me
• C code
• What does the compiler do?
  – Assembly language

Now that we have assembly code, how do we evaluate performance?
• Execution time =
• Is execution time the only metric for performance?
• What about power?
• What about cost?
• What about usability/programmability?

Notice one thing about your C Code: Application Specific
• Where are you running this code?
  – Laptop
  – Desktop
  – Cellphone
  – Google Server Farm
  – Digital Signal Processor
• Each application has completely different fundamentals and constraints

Do a DSP Calculation now
• Write C-code for DSP
  – e.g. Polygon Rendering for X-box Halo 3
  – MP3 Decode
• Write assembly code for this:

Do a Transaction Processing Code Now
• Google query?

Processor-based Digital Systems
• Systems with a programmable, general-purpose processor
  – Advantages??
• Computers are the canonical example
  – PCs, laptops, workstations, …
• However, most processors are embedded or in servers
  – Game consoles, PDAs, cell phones, …
  – Printers, car electronics systems, …
  – Web servers, database servers, …

FUTURE: Why are we going here?

Overall System Architecture
• Multiple interacting layers
  – Term “architecture” used with all of them
• This class focuses on
  – Hardware architecture
    • Memory, interconnect, IO
    • Clusters
    • Reliability & low power systems
  – Hardware-software interaction
    • Programming for performance
    • OS support
    • Cluster programming
    • Virtual machines & security
[Figure labels, layer stack: Application, Libraries, Operating System (Drivers, VM SW, Scheduler), Processor (VM HW), System Bus Controller, Main Memory, Graphics HW, IO Bus(es) Controller, IO, Net]

Application: Constraints & Opportunities
• Applications drive machine ‘balance’
  – Scientific computations
    • Floating-point performance
    • Main memory bandwidth
  – Transaction/web processing
    • ??
  – Multimedia processing
    • ??
  – Embedded control
    • ??
Architecture concepts typically exploit application behavior

Applications Change over Time
• Data-sets & memory requirements → larger
  – Cache & memory architecture become more critical
• Standalone → networked
  – IO integration & system software become more critical
• Single task → multiple tasks
  – Parallel architectures become critical
• Limited IO requirements → rich IO requirements
  – 60s: tapes & punch cards
  – 70s: character oriented displays
  – 80s: video displays, audio, hard disks
  – 90s: 3D graphics, networking, high-quality audio
  – 00s: real-time video, immersion, …

Application Properties to Exploit in Computer Design
• Locality in memory/IO references
  – Programs work on subset of instructions/data at any point in time
  – Both spatial and temporal locality
• Parallelism
  – Data-level (DLP): same operation on every element of a data sequence
  – Instruction-level (ILP): independent instructions within sequential program
  – Thread-level (TLP): parallel tasks within one program
  – Multi-programming: independent programs
  – Pipelining
• Predictability
  – Control-flow direction, memory references, data values

Technology Trends & Constraints: Yearly Improvement
• Integrated circuits: logic
  – 60% more devices per chip
  – 15% faster devices
  – Long wires don’t improve
• Integrated circuits: DRAM
  – 60% more devices per chip
  – 7% reduction in latency
  – 14% increase in bandwidth
• Magnetic Disks
  – 60% to 100% increase in density
• IO/networking
  – Little improvement in latency
  – Large improvements in bandwidth through fast/wide signaling
[Figure: chips from 1992, 1995, 1998, and 2001. By 2001: 64x more devices since 1992, 4x faster devices]
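The yearly rates above compound. As a rough check, the minimal Python sketch below applies the per-year logic rates over the nine years from 1992 to 2001. The function name `compound_growth` is an illustrative assumption, not something from the slides. The results come out near 69x for device count and 3.5x for device speed, in the same ballpark as the figure's "64x more devices" and "4x faster devices".

```python
def compound_growth(rate_per_year: float, years: int) -> float:
    """Total improvement factor after compounding a yearly rate."""
    return (1.0 + rate_per_year) ** years


years = 2001 - 1992  # nine yearly steps between the first and last chip shown

devices = compound_growth(0.60, years)  # 60% more devices per chip per year
speed = compound_growth(0.15, years)    # 15% faster devices per year

print(f"devices: {devices:.1f}x, speed: {speed:.1f}x")
```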

Changes in Technology & Applications lead to Changes in Architecture
• 1970s
  – Multi-chip CPUs
  – Semiconductor memory very expensive
  – Complex instruction sets (good code density)
  – Microcoded control
• 1980s
  – 5K – 500K transistors
  – On-chip memory possible
  – Simple, hard-wired control
  – Simple instruction sets
  – Single-chip, pipelined CPUs
  – Small on-chip caches
• 1990s
  – 1M – 64M transistors, 64b CPUs
  – Complex control to exploit instruction-level parallelism
  – Deep pipelines
  – Multi-level caches
• 2000s
  – 100M – 5B transistors
  – Slow wires, power consumption, design complexity, memory latency, IO bottlenecks, …
  – Multiprocessors & parallel systems
  – Support & programming for parallelism?
  – <<your Ph.D. thesis goes here>>
Keeps computer architecture interesting and challenging

Rules of Thumb in Data Engineering, by J. Gray and Prashant Shenoy
Storage
1. Moore’s Law: Things get 4x denser every three years.
2. You need an extra bit of addressing every 18 months.
3. Storage capacities increase 100x per decade.
4. Storage device throughput increases 10x per decade.
5. Disk data cools 10x per decade.
6. Disk page sizes increase 5x per decade.
7. NearlineTape : OnlineDisk : RAM storage cost ratios are approximately 1:3:300.
8. In ten years RAM will cost what disk costs today.
9. A person can administer a million dollars of disk storage.
  – Disks are replacing tapes as backup devices.
  – On random workloads, disk mirroring is preferable to RAID 5 parity because it spends disk space (which is plentiful) to save disk accesses (which are precious).
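Rules 1 and 2 are the same statement in different units: 4x denser every three years is two doublings in 36 months, and each doubling of capacity needs one more address bit. A small Python sketch of that arithmetic follows; the function name is a hypothetical helper, not part of the original rules.

```python
import math


def months_per_address_bit(density_factor: float, period_years: float) -> float:
    """Months for capacity to double, i.e. to need one more address bit."""
    doublings_per_period = math.log2(density_factor)
    return 12.0 * period_years / doublings_per_period


# Rule 1: 4x denser every three years
print(months_per_address_bit(4.0, 3.0))  # 18.0, which is Rule 2
```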

Metrics of Efficiency
• Desktop computing ($500 - $3K)
  – Metrics: ??
  – Prominent processors: Intel Pentium, AMD Athlon, PowerPC G5
• Server computing ($3K - $1M)
  – Metrics: ??
  – Prominent processors: IBM Power5, Sun UltraSparc, AMD Opteron
• Embedded computing ($10 - $500)
  – Metrics: ??
  – Prominent processors: ARM, MIPS, Motorola 68K, many others
Diversity in requirements leads to diversity in architectures

Performance Metrics

Plane              DC to Paris   Speed      Passengers   Throughput (pmph)
Boeing 747         6.5 hours     610 mph    470          286,700
BAC/Sud Concorde   3 hours       1350 mph   132          178,200

• Latency or execution time or response time
  – Wall-clock time to complete a task
  – Important if all we have to run is a single task or a time-critical task
• Bandwidth or throughput or execution rate
  – Number of tasks completed per unit of time
    • Bandwidth = total amount of work / total execution time
  – Metric is independent of exact number of tasks executed
  – Important when we have many tasks to run
• What about Power? What about Cost? What about Reliability?
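The plane comparison can be reproduced with a few lines of Python. This is only a sketch: the dictionary layout and function name are illustrative assumptions, and "pmph" is read as passenger-miles per hour (speed times passengers). It shows that the latency winner and the throughput winner are different planes.

```python
def throughput_pmph(speed_mph: float, passengers: int) -> float:
    """Passenger-miles per hour: how much 'work' the plane does per unit time."""
    return speed_mph * passengers


# Values taken from the table on the slide
planes = {
    "Boeing 747": {"hours": 6.5, "mph": 610, "passengers": 470},
    "Concorde": {"hours": 3.0, "mph": 1350, "passengers": 132},
}

for name, p in planes.items():
    pmph = throughput_pmph(p["mph"], p["passengers"])
    print(f"{name}: latency {p['hours']} h, throughput {pmph:,.0f} pmph")

# Lowest latency (trip time) versus highest throughput
best_latency = min(planes, key=lambda n: planes[n]["hours"])
best_throughput = max(
    planes, key=lambda n: throughput_pmph(planes[n]["mph"], planes[n]["passengers"])
)
print(best_latency, best_throughput)
```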

Examples
• Latency metric: program execution time in seconds
  – CPUtime = IC × CPI × CCT (instruction count × cycles per instruction × clock cycle time)
  – Your system architecture can affect all of them
    • CPI: memory latency, IO latency, …
    • CCT: cache organization, …
    • IC: OS overhead, …
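The three factors named on this slide (IC, CPI, CCT) multiply to give execution time in the standard CPU-time decomposition. The Python sketch below makes that concrete; every number in it is a hypothetical value chosen for illustration, not data from the lecture.

```python
def cpu_time(instruction_count: float, cpi: float, clock_cycle_time_s: float) -> float:
    """Execution time in seconds = IC x CPI x CCT."""
    return instruction_count * cpi * clock_cycle_time_s


# Hypothetical program: 2 billion instructions on a 2 GHz machine (0.5 ns cycle)
ic = 2e9
cct = 0.5e-9

baseline = cpu_time(ic, cpi=1.5, clock_cycle_time_s=cct)
# Assumed improvement: lower memory latency cuts the average CPI from 1.5 to 1.2
improved = cpu_time(ic, cpi=1.2, clock_cycle_time_s=cct)

print(f"baseline: {baseline:.2f} s, improved: {improved:.2f} s")
```

Because the terms multiply, a change that lowers CPI but lengthens the clock cycle (or adds instructions) has to be judged on the product, not on any one factor.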

A is Faster than B?
• Given the CPUtime for machines A and B, “A is X times faster than B” means:
  – X = CPUtime_B / CPUtime_A
• Example: CPUtime_A = 3.4 sec & CPUtime_B = 5.3 sec, then
  – A is 5.3/3.4 ≈ 1.56 times faster than B, or 56% faster
• If you start with bandwidth metrics of performance, use the inverse ratio
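A short Python sketch of the comparison, using the slide's example times. The function names are illustrative assumptions. The second function covers the bandwidth case, where the ratio is inverted because a larger rate is better.

```python
def times_faster_from_latency(cpu_time_a: float, cpu_time_b: float) -> float:
    """How many times faster A is than B, given execution times (smaller is better)."""
    return cpu_time_b / cpu_time_a


def times_faster_from_rate(rate_a: float, rate_b: float) -> float:
    """Same question for bandwidth metrics (larger is better): the inverse ratio."""
    return rate_a / rate_b


x = times_faster_from_latency(3.4, 5.3)
print(f"A is {x:.2f} times faster than B")  # about 1.56

# Expressed as rates (tasks per second), the answer must be identical
assert abs(times_faster_from_rate(1 / 3.4, 1 / 5.3) - x) < 1e-12
```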

Speedup and Amdahl’s Law
• Speedup = CPUtime_old / CPUtime_new
• Given an optimization x that accelerates fraction fx of the program by a factor of Sx, how much is the overall speedup?
  – Speedup = 1 / ((1 - fx) + fx / Sx)
• Lessons from Amdahl’s law
  – Make common cases fast: as fx→1, speedup→Sx
  – But don’t overoptimize the common case: as Sx→∞, speedup→1 / (1 - fx)
    • Speedup is limited by the fraction of the code that can be accelerated
    • Uncommon case will eventually become the common one

Amdahl’s Law Example
• If Sx=100, what is the overall speedup as a function of fx?
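One way to explore this question is to tabulate the usual form of Amdahl's law, speedup = 1 / ((1 - fx) + fx / Sx), for a few values of fx. The Python sketch below does that; the function name and the chosen fx values are illustrative assumptions.

```python
def amdahl_speedup(fx: float, sx: float) -> float:
    """Overall speedup when a fraction fx of the program is accelerated by sx."""
    return 1.0 / ((1.0 - fx) + fx / sx)


SX = 100.0
for fx in (0.10, 0.50, 0.90, 0.99, 1.00):
    print(f"fx = {fx:.2f}  ->  speedup = {amdahl_speedup(fx, SX):6.2f}")
```

With Sx=100, accelerating 90% of the program yields only about 9x overall, and 99% yields about 50x; the full 100x appears only when fx = 1.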

Historical Trend for Computer Performance
[Figure: integer performance over time, 55% faster per year]

To Put it Into Perspective
• 1982-2000: computers getting 55% faster per year
  – Total of 4,000x
  – Significant cost improvements as well
• What if other areas showed similar improvement rates?
  – Cars: 176,000 mph or 64,000 miles/gal
  – Airplanes: LA to NY in 5.5 sec (MACH 3200)
  – Wheat: 320,000 bushels per acre

Digital System Cost
• Cost is a very important design constraint
  – Most digital systems are consumer electronic products
• Cost distribution for $1K PC
  – Processor board: 37%
    • Processor, memory, …
  – IO devices: 37%
    • Hard disk, DVD, monitor, keyboard, …
  – Software: 20%
  – Cabinet: 6%
• Integrated circuits represent a significant part of the system cost
  – Processor, memory, hard disk controller, graphics chips, networking chip

Cost of Integrated Circuits

Chip Cost is a Function of Size
Chip cost increases roughly with (die area)^4

Cost – Performance Tradeoff
• The trade-off
  – Chip cost is primarily a function of (die area)^4
  – But bigger dies provide more resources for higher performance
• The goal of a good architect
  – Find the knee of the performance-cost curve
  OR
  – Get maximum performance for a fixed cost target
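To get a feel for how steep this trade-off is, the Python sketch below reads the slide's rule of thumb as "chip cost scales with die area to the fourth power". That reading, and the function name, are assumptions made for illustration; real die cost also depends on wafer cost, defect density, and yield.

```python
def relative_die_cost(area_ratio: float, exponent: float = 4.0) -> float:
    """Cost multiplier when die area grows by area_ratio, under cost ~ area**exponent."""
    return area_ratio ** exponent


print(relative_die_cost(1.5))  # 5.0625: a 50% bigger die costs about 5x as much
print(relative_die_cost(2.0))  # 16.0: doubling the die area costs 16x as much
```

Under this rule, the extra performance bought with a bigger die has to be very large to justify the cost, which is why the knee of the curve matters.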

Other Cost Contributors
• Testing cost
  – Cost/die = (cost/hour × test time) / yield
  – Could be $10-$20 or more for complex chips
• IC Packaging
  – Depends on die size, number of pins, and power dissipation
• Cost of cooling system
  – <2 W: no heat-sink; <10 W: no fan; 100+ W: liquid/spray cooling
• And most of all, do not forget VOLUME
  – Cost of a modern IC fabrication facility: >$2B
  – Cost of a set of masks for a wafer: $0.5M - $1M
  – Design NRE cost: often ~$10M
  – Need volume to amortize all this cost…

Cost Vs Price
• Price is really what your customer cares about
• Price components for a system vendor
  – Component cost: buying the parts
    • 47% of list price for $1K PC
  – Direct costs: labor, warranties, dealing with scrap, …
    • 10% of list price for $1K PC
  – Gross margin: company overhead
    • R&D, marketing, sales, buildings, maintenance, taxes, …
    • 19% of list price for $1K PC
  – Average discount: plan for volume discounts…
    • 25% of list price for $1K PC
• As computers become commodity components, price matters a lot!

Historical Trend for Processor Price

Summary
• Computer architecture:
  – Design of efficient systems given the requirements of applications and the capabilities/constraints of technology
  – Need to look a few years ahead with both applications & technology
• Applications
  – Look for locality, parallelism, and predictability
• Technology
  – Dealing with latency, power, and reliability are the upcoming challenges
• Performance & cost
  – Two important efficiency metrics for most systems
  – Latency Vs. bandwidth performance metrics
  – Cost Vs. price

Multiple Processors on Single Chip
• Two processors on single-chip
• Two chips (w/ two processors) in single package
• 16 – 64 – 256 processors on single die
  – Stream Processors
  – Sun Niagara
• http://www.ece.ucdavis.edu/~ocin06/talks/ho.pdf

What does Moore’s Law buy you?
