ECE 472 Computer Architecture
Patrick Chiang
TA: Kang-Min Hu
Department of Electrical Engineering
Oregon State University
http://eecs.oregonstate.edu/~pchiang
EE 472 – Spring 2007 Lecture 1 - 1 P. Chiang, with Slide Help from C. Kozyrakis (Stanford)
Is this class for you?
• This class will not be easy
  – My first quarter of teaching computer architecture at Oregon State
  – Assumes good mastery of basic assembly language programming
  – What is the class makeup?
    • ECE: 1/2
    • CS: 1/2
  – This is "ECE 472", and it emphasizes the hardware side of computer architecture
    • There is a CS 472 in the Spring 2008 quarter
• Class breakdown
  – 5 homeworks: 10%
  – 1 midterm: 20%
  – 1 project: 30%
  – 1 final: 40%
• Average grade: around B/B+, with some flexibility
EE 472 – Fall 2007 Lecture 1 - 2 P. Chiang with slides from C. Kozyrakis (Stanford)
Today: What's the big picture?
• Syllabus: given this Thursday
• Start with the C code
• Then do the assembly language
• FIRST: How do we evaluate whether a computer is "fast" or "good"?
  – Execution time (time to run the process(es))
  – Power
  – Cost
  – Flexibility (complexity, programmability)
What Do Computer Architects Do?
[Figure: the computer architect's role; labels include Computer Architect, Applications, Software Requirements, API, ISA, Interfaces, Machine Organization, IR, Regs, Link, I/O Chan, Technology, ECE 471: Digital VLSI, and Measurement & Analysis]
• The science/art of constructing efficient systems for computing tasks
What is Computer Architecture?
• Understanding every level of the complete system:
  – Software
  – Compiler
  – Computer architecture
  – VLSI digital circuit design
    • For SoC, even analog/mixed-signal design
  – Devices
• As an engineer, you must understand both "depth" and "breadth"
  – Everything is related
  – You must understand every level of the problem to make the right "choices"
    • You cannot just black-box it and say: "Not my problem. Someone else will solve it."
    • Where you choose to go next depends on understanding changes along the entire vertical structure
  – How is the technology changing? Are there fundamental shifts?
    • e.g., multi-core, parallel processing
• Execution time = ?
Write Some C Code for Me
• C code
• What does the compiler do?
  – Assembly language
Now that we have assembly code, how do we evaluate performance?
• Execution time = ?
• Is execution time the only metric for performance?
• What about power?
• What about cost?
• What about usability/programmability?
Notice One Thing about Your C Code: It Is Application-Specific
• Where are you running this code?
  – Laptop
  – Desktop
  – Cellphone
  – Google server farm
  – Digital signal processor
• Each application has completely different fundamentals and constraints
Do a DSP Calculation Now
• Write C code for a DSP
  – e.g., polygon rendering for Halo 3 on the Xbox
  – MP3 decode
• Write assembly code for this:
Do Transaction-Processing Code Now
• Example: a Google query?
Processor-Based Digital Systems
• Systems with a programmable, general-purpose processor
  – Advantages??
• Computers are the canonical example
  – PCs, laptops, workstations, …
• However, most processors are embedded or in servers
  – Game consoles, PDAs, cell phones, …
  – Printers, car electronics systems, …
  – Web servers, database servers, …
FUTURE: Why are we going here?
Overall System Architecture
• Multiple interacting layers
  – The term "architecture" is used with all of them
• This class focuses on
  – Hardware architecture
    • Memory, interconnect, IO
    • Clusters
    • Reliability & low-power systems
  – Hardware-software interaction
    • Programming for performance
    • OS support
    • Cluster programming
    • Virtual machines & security
[Figure: layered system stack; labels include Application, Libraries, Operating System, Drivers, VM SW, Scheduler, Processor, VM HW, System Bus, Controller, Main Memory, Graphics HW, IO Bus(es), IO, Net]
Application: Constraints & Opportunities
• Applications drive machine 'balance'
  – Scientific computations
    • Floating-point performance
    • Main memory bandwidth
  – Transaction/web processing
    • ??
  – Multimedia processing
    • ??
  – Embedded control
    • ??
• Architecture concepts typically exploit application behavior
Applications Change over Time
• Data sets & memory requirements → larger
  – Cache & memory architecture become more critical
• Standalone → networked
  – IO integration & system software become more critical
• Single task → multiple tasks
  – Parallel architectures become critical
• Limited IO requirements → rich IO requirements
  – 1960s: tapes & punch cards
  – 1970s: character-oriented displays
  – 1980s: video displays, audio, hard disks
  – 1990s: 3D graphics, networking, high-quality audio
  – 2000s: real-time video, immersion, …
Application Properties to Exploit in Computer Design
• Locality in memory/IO references
  – Programs work on a subset of instructions/data at any point in time
  – Both spatial and temporal locality
• Parallelism
  – Data-level (DLP): same operation on every element of a data sequence
  – Instruction-level (ILP): independent instructions within a sequential program
  – Thread-level (TLP): parallel tasks within one program
  – Multi-programming: independent programs
  – Pipelining
• Predictability
  – Control-flow direction, memory references, data values
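Spatial locality can be made concrete with a tiny cache model. The sketch below assumes a hypothetical direct-mapped cache with 16 lines of 8 words each (sizes chosen only for illustration) and counts misses when a 64×64 matrix stored in row-major order is walked row by row versus column by column.

```python
def count_misses(addresses, line_words=8, num_lines=16):
    """Count misses in a hypothetical direct-mapped cache.

    addresses are word addresses; each cache line holds line_words words.
    """
    tags = [None] * num_lines  # tag currently held by each cache line (None = empty)
    misses = 0
    for addr in addresses:
        block = addr // line_words   # memory block that holds this word
        index = block % num_lines    # direct-mapped: each block maps to exactly one line
        tag = block // num_lines
        if tags[index] != tag:       # miss: fetch the block, evicting whatever was there
            misses += 1
            tags[index] = tag
    return misses


N = 64  # N x N matrix stored row-major: element (i, j) lives at word i*N + j

row_order = [i * N + j for i in range(N) for j in range(N)]  # walk along rows
col_order = [i * N + j for j in range(N) for i in range(N)]  # walk down columns

print("row-order misses:   ", count_misses(row_order))
print("column-order misses:", count_misses(col_order))
```

With these parameters the row-order walk misses only once per 8-word line (512 misses out of 4,096 accesses), while the column-order walk misses on every access, because each access touches a different line and those lines keep evicting one another.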
Technology Trends & Constraints: Yearly Improvement
• Integrated circuits: logic
  – 60% more devices per chip
  – 15% faster devices
  – Long wires don't improve
• Integrated circuits: DRAM
  – 60% more devices per chip
  – 7% reduction in latency
  – 14% increase in bandwidth
• Magnetic disks
  – 60% to 100% increase in density
• IO/networking
  – Little improvement in latency
  – Large improvements in bandwidth through fast/wide signaling
[Figure: logic scaling from 1992 to 2001: 64x more devices and 4x faster devices since 1992]
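The yearly rates and the cumulative factors are linked by compounding. This minimal sketch assumes the chart spans 1992 to 2001 (nine years, read from its axis labels) and simply compounds the slide's per-year rates over that span.

```python
def total_factor(annual_rate, years):
    """Compound an annual improvement rate (0.60 means +60% per year) over some years."""
    return (1.0 + annual_rate) ** years


years = 2001 - 1992  # nine years, assumed from the chart labels

logic_devices = total_factor(0.60, years)  # 60% more devices per chip each year
logic_speed = total_factor(0.15, years)    # 15% faster devices each year
dram_latency = (1.0 - 0.07) ** years       # DRAM latency shrinks only 7% each year

print(f"devices per chip: {logic_devices:.1f}x")  # about 68.7x, near the chart's 64x
print(f"device speed:     {logic_speed:.1f}x")    # about 3.5x, near the chart's 4x
print(f"DRAM latency:     {dram_latency:.2f} of its 1992 value")
```

The gap between roughly 3.5x faster logic and DRAM latency that only halves over the same period is one reason cache and memory architecture keep growing in importance.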
Changes in Technology & Applications Lead to Changes in Architecture
• 1970s
  – Multi-chip CPUs
  – Semiconductor memory very expensive
  – Complex instruction sets (good code density)
  – Microcoded control
• 1980s
  – 5K-500K transistors
  – Single-chip, pipelined CPUs
  – On-chip memory possible
  – Simple, hard-wired control
  – Simple instruction sets
  – Small on-chip caches
• 1990s
  – 1M-64M transistors, 64b CPUs
  – Complex control to exploit instruction-level parallelism
  – Deep pipelines
  – Multi-level caches
• 2000s
  – 100M-5B transistors
  – Slow wires, power consumption, design complexity, memory latency, IO bottlenecks, …
  – Multiprocessors & parallel systems
  – Support & programming for parallelism?
  – <<your Ph.D. thesis goes here>>
• Keeps computer architecture interesting and challenging
Rules of Thumb in Data Engineering (J. Gray and Prashant Shenoy)
Storage
1. Moore's Law: Things get 4x denser every three years.
2. You need an extra bit of addressing every 18 months.
3. Storage capacities increase 100x per decade.
4. Storage device throughput increases 10x per decade.
5. Disk data cools 10x per decade.
6. Disk page sizes increase 5x per decade.
7. Nearline tape : online disk : RAM storage cost ratios are approximately 1 : 3 : 300.
8. In ten years RAM will cost what disk costs today.
9. A person can administer a million dollars of disk storage.
– Disks are replacing tapes as backup devices.
– On random workloads, disk mirroring is preferable to RAID 5 parity because it spends disk space (which is plentiful) to save disk accesses (which are precious).
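Several of these rules are the same growth rate stated over different periods. A minimal sketch that converts "factor F every Y years" into an equivalent annual rate, $F^{1/Y} - 1$, for four of the rules:

```python
def annual_rate(factor, years):
    """Convert 'factor x every `years` years' into an equivalent annual growth rate."""
    return factor ** (1.0 / years) - 1.0


rules = {
    "4x denser every 3 years": annual_rate(4, 3),
    "1 extra address bit (2x) every 18 months": annual_rate(2, 1.5),
    "storage capacity 100x per decade": annual_rate(100, 10),
    "storage throughput 10x per decade": annual_rate(10, 10),
}

for rule, rate in rules.items():
    print(f"{rule}: {rate:.1%} per year")
```

The first three rules all work out to roughly 58-59% per year, in line with the "60% more devices per chip" yearly figure for logic and DRAM, while device throughput grows at only about 26% per year.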
Metrics of Efficiency
• Desktop computing ($500-$3K)
  – Metrics: ??
  – Prominent processors: Intel Pentium, AMD Athlon, PowerPC G5
• Server computing ($3K-$1M)
  – Metrics: ??
  – Prominent processors: IBM Power5, Sun UltraSPARC, AMD Opteron
• Embedded computing ($10-$500)
  – Metrics: ??
  – Prominent processors: ARM, MIPS, Motorola 68K, many others
• Diversity in requirements leads to diversity in architectures
Performance Metrics
Plane               DC to Paris   Speed      Passengers   Throughput (pmph = passenger-miles per hour)
Boeing 747          6.5 hours     610 mph    470          286,700
BAC/Sud Concorde    3 hours       1350 mph   132          178,200
• Latency or execution time or response time
  – Wall-clock time to complete a task
  – Important if all we have to run is a single task or a time-critical task
• Bandwidth or throughput or execution rate
  – Number of tasks completed per unit of time
    • Bandwidth = total amount of work / total execution time
  – Metric is independent of the exact number of tasks executed
  – Important when we have many tasks to run
• What about power? What about cost? What about reliability?
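The throughput column is passengers × speed, so the two planes win on different metrics. A minimal sketch that recomputes the table; the `Plane` class is a hypothetical container introduced only for this example.

```python
from dataclasses import dataclass


@dataclass
class Plane:
    name: str
    trip_hours: float  # latency: DC to Paris
    speed_mph: float
    passengers: int

    @property
    def throughput_pmph(self) -> float:
        """Throughput in passenger-miles per hour: passengers x speed."""
        return self.passengers * self.speed_mph


planes = [
    Plane("Boeing 747", 6.5, 610, 470),
    Plane("BAC/Sud Concorde", 3.0, 1350, 132),
]

for p in planes:
    print(f"{p.name}: latency {p.trip_hours} h, throughput {p.throughput_pmph:,.0f} pmph")

lowest_latency = min(planes, key=lambda p: p.trip_hours)
highest_throughput = max(planes, key=lambda p: p.throughput_pmph)
print("lowest latency:    ", lowest_latency.name)
print("highest throughput:", highest_throughput.name)
```

The Concorde gets one passenger to Paris sooner, but the 747 moves more passenger-miles per hour, which is the latency vs. throughput distinction in miniature.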
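Examples
• Latency metric: program execution time in seconds
  $\mathrm{CPUtime} = \frac{\text{instructions}}{\text{program}} \times \frac{\text{cycles}}{\text{instruction}} \times \frac{\text{seconds}}{\text{cycle}} = \mathrm{IC} \times \mathrm{CPI} \times \mathrm{CCT}$
  – Your system architecture can affect all three terms
    • CPI (cycles per instruction): memory latency, IO latency, …
    • CCT (clock cycle time): cache organization, …
    • IC (instruction count): OS overhead, …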
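Execution time is the product of instruction count (IC), cycles per instruction (CPI), and clock cycle time (CCT). The sketch below evaluates that product for a hypothetical program (1 billion instructions, CPI of 1.5, 2 GHz clock; all numbers are illustrative assumptions) and then improves one factor at a time.

```python
def cpu_time(instruction_count, cpi, cycle_time_s):
    """Execution time = IC x CPI x CCT (instructions x cycles/instruction x seconds/cycle)."""
    return instruction_count * cpi * cycle_time_s


# Hypothetical baseline: 1e9 instructions, CPI 1.5, 2 GHz clock (0.5 ns cycle).
base = cpu_time(1e9, 1.5, 0.5e-9)

# Hypothetical improvements, one factor at a time:
fewer_instructions = cpu_time(0.8e9, 1.5, 0.5e-9)  # IC down 20%, e.g., less OS overhead
lower_cpi = cpu_time(1e9, 1.2, 0.5e-9)             # CPI down, e.g., fewer memory stalls
faster_clock = cpu_time(1e9, 1.5, 0.4e-9)          # 0.4 ns cycle (2.5 GHz)

print(f"baseline: {base:.3f} s")
for label, t in [("IC -20%", fewer_instructions), ("CPI 1.2", lower_cpi), ("2.5 GHz", faster_clock)]:
    print(f"{label}: {t:.3f} s")
```

Each change here happens to cut execution time from 0.75 s to 0.6 s, which shows that a 20% gain is worth the same whichever of the three factors it comes from.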
A Is Faster than B?
• Given the CPU times for machines A and B, "A is X times faster than B" means:
  $X = \frac{\mathrm{CPUtime}_B}{\mathrm{CPUtime}_A}$
• Example: if $\mathrm{CPUtime}_A = 3.4$ s and $\mathrm{CPUtime}_B = 5.3$ s, then
  – A is 5.3/3.4 ≈ 1.56 times faster than B, or about 56% faster
• If you start with bandwidth metrics of performance, use the inverse ratio
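A minimal sketch of the two directions of the comparison. With execution times, lower is better, so B's time goes on top. With throughput, higher is better, so the ratio flips.

```python
def times_faster(time_a, time_b):
    """How many times faster A is than B, from execution times (lower is better)."""
    return time_b / time_a


def times_faster_from_throughput(rate_a, rate_b):
    """The same comparison from bandwidth/throughput metrics (higher is better)."""
    return rate_a / rate_b


x = times_faster(3.4, 5.3)
print(f"A is {x:.2f}x faster than B ({x - 1:.0%} faster)")

# Expressed as tasks per second, the same machines give the same answer.
print(f"from throughput: {times_faster_from_throughput(1 / 3.4, 1 / 5.3):.2f}x")
```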
Speedup and Amdahl's Law
• $\mathrm{Speedup} = \mathrm{CPUtime}_{\mathrm{old}} / \mathrm{CPUtime}_{\mathrm{new}}$
• Given an optimization x that accelerates a fraction $f_x$ of the program by a factor of $S_x$, how much is the overall speedup?
  $\mathrm{Speedup} = \frac{\mathrm{CPUtime}_{\mathrm{old}}}{\mathrm{CPUtime}_{\mathrm{old}}\left[(1 - f_x) + f_x/S_x\right]} = \frac{1}{(1 - f_x) + f_x/S_x}$
• Lessons from Amdahl's law
  – Make the common case fast: as $f_x \to 1$, speedup $\to S_x$
  – But don't overoptimize the common case: as $S_x \to \infty$, speedup $\to 1/(1 - f_x)$
    • Speedup is limited by the fraction of the code that can be accelerated
    • The uncommon case will eventually become the common one
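Amdahl's law fits in one line of code: overall speedup $= 1 / ((1 - f_x) + f_x/S_x)$. This sketch also checks the two limiting cases listed in the lessons, using a very large $S_x$ to stand in for infinity.

```python
def amdahl_speedup(fx, sx):
    """Overall speedup when a fraction fx of execution time is accelerated by a factor sx."""
    if not 0.0 <= fx <= 1.0:
        raise ValueError("fx must be a fraction between 0 and 1")
    return 1.0 / ((1.0 - fx) + fx / sx)


print(amdahl_speedup(1.0, 10))    # fx -> 1: speedup approaches Sx (10 here)
print(amdahl_speedup(0.9, 1e12))  # Sx -> infinity: speedup approaches 1/(1 - fx) = 10
print(amdahl_speedup(0.5, 100))   # half the program 100x faster: still under 2x overall
```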
Amdahl's Law Example
• If $S_x = 100$, what is the overall speedup as a function of $f_x$?
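Plugging $S_x = 100$ into Amdahl's law, $\mathrm{Speedup} = 1/((1 - f_x) + f_x/S_x)$, gives a few worked points on the curve:

```latex
\begin{aligned}
\mathrm{Speedup}(f_x) &= \frac{1}{(1 - f_x) + f_x/100} \\
f_x = 0.5:&\quad \frac{1}{0.5 + 0.005} \approx 1.98 \\
f_x = 0.9:&\quad \frac{1}{0.1 + 0.009} \approx 9.17 \\
f_x = 0.99:&\quad \frac{1}{0.01 + 0.0099} \approx 50.3 \\
f_x = 0.999:&\quad \frac{1}{0.001 + 0.00999} \approx 91.0
\end{aligned}
```

Even with a 100x accelerator, the overall speedup stays below 10x until more than 90% of the execution time is covered.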
Historical Trend for Computer Performance
[Figure: integer performance over time, improving 55% per year]
To Put It into Perspective
• 1982-2000: computers getting 55% faster per year
  – Total of 4,000x
  – Significant cost improvements as well
• What if other areas showed similar improvement rates?
  – Cars: 176,000 mph or 64,000 miles/gal
  – Airplanes: LA to NY in 5.5 sec (Mach 3200)
  – Wheat: 320,000 bushels per acre
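Compounding 55% per year shows how the total factor relates to the length of the period, and how often performance doubles at that rate:

```latex
\begin{aligned}
1.55^{18} &\approx 2{,}700 \\
1.55^{19} &\approx 4{,}100 \\
n_{4000\times} &= \frac{\ln 4000}{\ln 1.55} \approx \frac{8.29}{0.438} \approx 18.9 \text{ years} \\
t_{2\times} &= \frac{\ln 2}{\ln 1.55} \approx 1.6 \text{ years}
\end{aligned}
```

A 4,000x total therefore corresponds to roughly 19 years of 55% annual growth, with performance doubling about every 1.6 years.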
Digital System Cost
• Cost is a very important design constraint
  – Most digital systems are consumer electronics products
• Cost distribution for a $1K PC
  – Processor board: 37%
    • Processor, memory, …
  – IO devices: 37%
    • Hard disk, DVD, monitor, keyboard, …
  – Software: 20%
  – Cabinet: 6%
• Integrated circuits represent a significant part of the system cost
  – Processor, memory, hard disk controller, graphics chips, networking chip
Cost of Integrated Circuits
Chip Cost Is a Function of Size
• Chip cost increases roughly with $(\text{die area})^4$
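One way to see why cost grows faster than area is the textbook die-cost model (as presented in Hennessy & Patterson's *Computer Architecture: A Quantitative Approach*): larger dies mean fewer dies per wafer *and* a lower fraction of good dies. The sketch below assumes that model; the 30 cm wafer, $5,000 wafer cost, 0.4 defects/cm², and α = 4 are illustrative assumptions, not values from the slides.

```python
import math


def dies_per_wafer(wafer_diameter_cm, die_area_cm2):
    """Gross dies per wafer (continuous approximation): wafer area / die area, minus edge loss."""
    wafer_area = math.pi * (wafer_diameter_cm / 2) ** 2
    edge_loss = math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2)
    return wafer_area / die_area_cm2 - edge_loss


def die_yield(die_area_cm2, defects_per_cm2, alpha=4.0, wafer_yield=1.0):
    """Fraction of good dies; alpha models how clustered the defects are."""
    return wafer_yield * (1 + defects_per_cm2 * die_area_cm2 / alpha) ** (-alpha)


def die_cost(wafer_cost, wafer_diameter_cm, die_area_cm2, defects_per_cm2):
    """Cost of one good die = wafer cost / (dies per wafer x die yield)."""
    good_dies = dies_per_wafer(wafer_diameter_cm, die_area_cm2) * die_yield(
        die_area_cm2, defects_per_cm2
    )
    return wafer_cost / good_dies


# Assumed numbers: 30 cm wafer costing $5,000, 0.4 defects per cm^2.
for area in (0.5, 1.0, 2.0, 4.0):
    cost = die_cost(5000, 30, area, 0.4)
    print(f"die area {area:.1f} cm^2 -> ${cost:,.2f} per good die")
```

With these assumed parameters, doubling the die area roughly triples to quadruples the cost per good die. How close the growth comes to the slide's rough fourth-power rule depends on the defect density and α you assume.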
Cost-Performance Tradeoff
• The trade-off
  – Chip cost is primarily a function of $(\text{die area})^4$
  – But bigger dies provide more resources for higher performance
• The goal of a good architect
  – Find the knee of the performance-cost curve, OR
  – Get maximum performance for a fixed cost target
Other Cost Contributors
• Testing cost
  – Cost/die = (cost/hour × test time) / yield
  – Could be $10-$20 or more for complex chips
• IC packaging
  – Depends on die size, number of pins, and power dissipation
• Cost of cooling system
  – <2 W: no heat sink; <10 W: no fan; >100 W: liquid/spray cooling
• And most of all, do not forget VOLUME
  – Cost of a modern IC fabrication facility: >$2B
  – Cost of a set of masks for a wafer: $0.5M-$1M
  – Design NRE cost: often ~$10M
  – Need volume to amortize all this cost…
Cost vs. Price
• Price is really what your customer cares about
• Price components for a system vendor
  – Component cost: buying the parts
    • 47% of list price for a $1K PC
  – Direct costs: labor, warranties, dealing with scrap, …
    • 10% of list price for a $1K PC
  – Gross margin: company overhead
    • R&D, marketing, sales, buildings, maintenance, taxes, …
    • 19% of list price for a $1K PC
  – Average discount: plan for volume discounts…
    • 25% of list price for a $1K PC
• As computers become commodity components, price matters a lot!
Historical Trend for Processor Price
Summary
• Computer architecture:
  – Design of efficient systems given the requirements of applications and the capabilities/constraints of technology
  – Need to look a few years ahead with both applications & technology
• Applications
  – Look for locality, parallelism, and predictability
• Technology
  – Latency, power, and reliability are the upcoming challenges
• Performance & cost
  – Two important efficiency metrics for most systems
  – Latency vs. bandwidth performance metrics
  – Cost vs. price
Multiple Processors on a Single Chip
• Two processors on a single chip
• Two chips (w/ two processors) in a single package
• 16, 64, 256 processors on a single die
  – Stream processors
  – Sun Niagara
• http://www.ece.ucdavis.edu/~ocin06/talks/ho.pdf
What does Moore's Law buy you?