COMP 206 Computer Architecture and Implementation Montek Singh

COMP 206: Computer Architecture and Implementation
Montek Singh
Wed., Aug 27, 2003
Lecture 1

Outline
• Course Information
  - Logistics
  - Grading
  - Syllabus
  - Course Overview
• Technology Trends
  - Moore’s Law
  - The CPU-Memory Gap

Course Information (1)
• Time and Place
  - MW 2:00-3:15 pm, Sitterson Hall 011
• Instructor
  - Montek Singh
  - montek@cs.unc.edu (not singh@cs!)
  - SN 245, 962-1832
  - Office hours: MW 3:15-4:15 pm, and by appointment
• Teaching Assistant
  - Maybe?
• Course Web Page
  - http://www.cs.unc.edu/~montek
  - Portions may be password-protected

Course Information (2)
• Prerequisites
  - COMP 120 and digital logic (PHYS 102), or equivalent
  - I assume you know the following topics:
    - CPU: ALU, control unit, registers, buses, memory management
    - Control Unit: register transfer language, implementation, hardwired and microprogrammed control
    - Memory: address space, memory capacity
    - I/O: CPU-controlled (polling, interrupt), autonomous (DMA)
  - Representative books (available in Brauer Library):
    - Baron & Higbie: Computer Architecture. Addison Wesley, 1992
    - Kuck: The Structure of Computers and Computations (Vol. 1). Wiley, 1978
    - Stallings: Computer Organization and Architecture: Designing for Performance (4th edition). Prentice Hall, 1996
    - Patterson & Hennessy: Computer Organization and Design: The Hardware/Software Interface (2nd edition). Morgan Kaufmann Publishers, 1997

Course Information (3)
• Textbook
  - Hennessy & Patterson: Computer Architecture: A Quantitative Approach (3rd edition), Morgan Kaufmann Publishers, 2002
    - available in the university bookstore
    - also from: www.amazon.com, www.bn.com, …

Course Information (4)
• Textbook (contd.)
  - We will cover the following material:
    - Chapter 1 (Fundamentals of Computer Design)
    - Chapter 2 (Instruction Set Principles and Examples)
    - Appendix A (Pipelining: Basic and Intermediate Concepts)
    - Chapters 3 & 4 (Instruction-Level Parallelism)
    - Chapter 5 (Memory-Hierarchy Design)
    - Chapter 7 (Storage Systems)
    - Chapters 6 & 8 (Multiprocessors, Interconnection Networks): selected topics, time permitting
• Additional readings/papers will be handed out in class
  - mostly on case studies

Course Information (5)
• Grading
  - 25% homework assignments (5 or so)
  - 25% midterm exam
  - 15% small project
    - no system building, no extensive programming
  - 35% final exam
• Assignments are due at the beginning of class on the due date
  - Late assignments: penalty of 20% per day
• The Honor Code is in effect for all homework, exams, and projects
  - you are encouraged to discuss ideas/concepts with others
  - work handed in must be your own

What is in COMP 206 for me?
Understand modern computer architecture so you can:
• Write better programs
  - Understand the performance implications of algorithms, data structures, and programming language choices
• Write better compilers
  - Modern computers need better optimizing compilers and better programming languages
• Write better operating systems
  - Need to re-evaluate the current assumptions and tradeoffs
  - Example: gigabit networks
• Design better computer architectures
  - There are still many challenges left
  - Example: the CPU-memory gap
• Satisfy the Distribution Requirement

Computer Architecture Is …
“…the structure of a computer that a machine language programmer must understand to write a correct (timing independent) program for that machine.”
Amdahl, Blaauw, and Brooks, 1964, “Architecture of the IBM System/360”, IBM Journal of Research and Development

COMP 206 Course Focus
Understanding the design techniques, machine structures, technology factors, and evaluation methods that will determine the form of computers in the 21st century
[Diagram: Computer Architecture (Instruction Set Design, Organization, Hardware) linked to Technology, Parallelism, Applications, Operating Systems, Measurement & Evaluation, Programming Languages, Interface Design, and History]

Computer Architecture Topics
• Input/Output and Storage: Disks, Tape, RAID, Emerging Technologies, Interleaving, Bus protocols
• Memory Hierarchy: DRAM, L2 Cache, L1 Cache; Coherence, Bandwidth, Latency
• VLSI
• Instruction Set Architecture: Addressing, Protection, Exception Handling
• Pipelining and Instruction Level Parallelism: Pipelining, Hazard Resolution, Superscalar, Reordering, Prediction, Speculation

Computer Engineering Methodology
An iterative cycle:
• Evaluate Existing Systems for Bottlenecks
• Simulate New Designs and Organizations
• Implement Next Generation System
Driving factors: Benchmarks, Workloads, Technology Trends, Implementation Complexity

Underlying Technologies
[Timeline chart, 1954 to 1992, spanning three eras: Generational, Evolutionary, Parallelism]
• Logic: Tubes, Transistor (10 µs), Hybrid (1 µs), IC (100 ns), LSI (10 ns), VLSI (10 ns), ULSI, GaAs
• Microprocessors: 8-bit µP, 16-bit µP, 32-bit µP, 64-bit µP
• Storage: core (8 ms), thin film (200 ns), 1K DRAM, 4K DRAM, 16K DRAM, 64K DRAM, 256K DRAM, 1M DRAM, 4M DRAM, 16M DRAM
• Prog. Lang.: Fortran; Algol, Cobol; Lisp, APL, Basic; PL/1, Simula, C; ADA, C++, Fortran 90; O.O.
• O/S: Batch, Multiprog., V.M., Networks

Predictions for the Early 2000s
• Technology
  - Very large dynamic RAM: 256 Mbits to 1 Gbit and beyond
  - Large fast static RAM: 16 MB, 5 ns
• Complete systems on a chip
  - 100+ million transistors
• Parallelism
  - Superscalar, Superpipelined, Vector, Multiprocessors?
  - Processor Arrays?
• Special-Purpose Architectures?
• Reconfigurable Computers?

Predictions for the Early 2000s (2)
• Low Power
  - 50% of PCs portable now (?)
  - Handheld communicators
  - Performance per watt, battery life
  - Transmeta
  - Asynchronous (clockless) design
• Parallel I/O
  - Many applications are I/O limited, not computation limited
  - Computation is scaling, but memory and I/O bandwidth are not keeping pace
• Multimedia
  - New interface technologies
  - Video, speech, handwriting, virtual reality, …

Diversion: Clocked Digital Design
Most current digital systems are synchronous:
• Clock: a global signal that paces the operation of all components
Benefit of clocking: enables a discrete-time representation
• all components operate exactly once per clock tick
• component outputs need to be ready by the next clock tick
  - this allows “glitchy” or incorrect outputs between clock ticks
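To make “exactly once per clock tick” concrete, here is a minimal Python sketch of a 3-stage shift register under an idealized clock. It assumes zero-delay combinational logic and models only the values sampled at each clock edge, so glitches between ticks are invisible by construction; the function name `clock_tick` is hypothetical.

```python
from typing import List


def clock_tick(regs: List[int], data_in: int) -> List[int]:
    """One clock edge for a hypothetical 3-stage shift register.

    Next-state values are computed from the *current* state only,
    then all registers are committed together, as if every flip-flop
    latched on the same edge.
    """
    next_regs = [data_in] + regs[:-1]  # stage i takes stage i-1's old value
    return next_regs                   # commit all registers at once


regs = [0, 0, 0]
for bit in [1, 0, 1, 1]:
    regs = clock_tick(regs, bit)
    print(regs)
```

Computing every next-state value before committing any of them is the software analogue of all registers latching on the same edge. Updating the list in place from front to back would instead let a new bit ripple through all three stages in a single tick.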

Microelectronics Trends
Current and Future Trends: Significant Challenges
• Large-Scale “Systems-on-a-Chip” (SoC)
  - 100 million to 1 billion transistors per chip
• Very High Speeds
  - multi-gigahertz clock rates
• Explosive Growth in Consumer Electronics
  - demand for ever-increasing functionality …
  - … with very low power consumption (limited battery life)
• Higher Portability/Modularity/Reusability
  - “plug ’n play” components, robust interfaces

Alternative Paradigm: Asynchronous Design
• Digital design with no centralized clock
• Synchronization using local “handshaking”
[Figure: a synchronous system (centralized control via a global clock) vs. an asynchronous system (distributed control via handshaking interfaces)]
Asynchronous Benefits:
• Higher Performance: not limited by the slowest component
• Lower Power: zero clock power; inactive parts consume little power
• Reduced Electromagnetic Noise: no clock spikes [e.g., Philips pagers]
• Greater Modularity: variable-speed interfaces; reusable components
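The handshaking idea can be sketched in Python with two threads that share no clock, only req/ack “wires”. This sketch assumes a four-phase (return-to-zero) bundled-data protocol, which is one common choice; the slide does not say which protocol is meant. `HandshakeChannel`, `sender`, `receiver`, and `run` are hypothetical names, and a `threading.Condition` stands in for the physical wires.

```python
import threading


class HandshakeChannel:
    """Hypothetical model of req/ack wires plus a data bus between two parties."""

    def __init__(self):
        self.cond = threading.Condition()  # stands in for the physical wires
        self.req = False
        self.ack = False
        self.data = None


def sender(ch, values, trace):
    for v in values:
        with ch.cond:
            ch.data = v                           # data valid before req rises
            ch.req = True                         # phase 1: req+
            trace.append(("req+", v))
            ch.cond.notify_all()
            ch.cond.wait_for(lambda: ch.ack)      # wait for receiver's ack+
            ch.req = False                        # phase 3: req-
            trace.append(("req-", v))
            ch.cond.notify_all()
            ch.cond.wait_for(lambda: not ch.ack)  # wait for ack-: channel idle


def receiver(ch, count, out, trace):
    for _ in range(count):
        with ch.cond:
            ch.cond.wait_for(lambda: ch.req)      # wait for sender's req+
            out.append(ch.data)                   # latch the data
            ch.ack = True                         # phase 2: ack+
            trace.append(("ack+", ch.data))
            ch.cond.notify_all()
            ch.cond.wait_for(lambda: not ch.req)  # wait for req-
            ch.ack = False                        # phase 4: ack-
            trace.append(("ack-", ch.data))
            ch.cond.notify_all()


def run(values):
    ch = HandshakeChannel()
    out, trace = [], []
    rx = threading.Thread(target=receiver, args=(ch, len(values), out, trace))
    tx = threading.Thread(target=sender, args=(ch, values, trace))
    rx.start()
    tx.start()
    tx.join()
    rx.join()
    return out, trace


received, events = run([7, 3, 9])
print(received)  # [7, 3, 9]
```

Each side advances only when it observes the other side's signal change, so the pace of every transfer is set by the slower party rather than by a global clock period.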

Tech Trends: Moore’s Law
CMOS improvements:
• Die size: 2x every 3 yrs
• Line width: halve / 7 yrs
Transistor counts:
• 4004 (’71): 2,250 transistors
• 8086 (’78): 29,000 transistors
• 486™ DX (’89): 1,180,000 transistors
• Pentium 4 (’00): 42,000,000 transistors
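As a rough check on these numbers, the following Python sketch fits a constant doubling time between the 4004 and each later chip. It assumes smooth exponential growth between two data points, which real product introductions only approximate; `doubling_time` is a hypothetical helper.

```python
import math

# (chip, year, transistor count), taken from the list above
CHIPS = [
    ("4004", 1971, 2_250),
    ("8086", 1978, 29_000),
    ("486 DX", 1989, 1_180_000),
    ("Pentium 4", 2000, 42_000_000),
]


def doubling_time(year0, count0, year1, count1):
    """Years per doubling, assuming exponential growth between the two points."""
    return (year1 - year0) * math.log(2) / math.log(count1 / count0)


_, base_year, base_count = CHIPS[0]
for name, year, count in CHIPS[1:]:
    t = doubling_time(base_year, base_count, year, count)
    print(f"4004 -> {name}: doubles every {t:.2f} years")
```

Each span works out to a doubling time of roughly two years (about 1.9 to 2.04 years), consistent with the usual statement of Moore’s Law.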

Tech Trends: Memory Capacity
Capacity of a single DRAM chip (in bits):
Year   Size     Cycle time
1980   64 Kb    250 ns
1983   256 Kb   220 ns
1986   1 Mb     190 ns
1989   4 Mb     165 ns
1992   16 Mb    145 ns
1995   64 Mb    100 ns
2002   512 Mb   60 ns
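A short Python sketch can turn this table into average growth rates. The helper `growth_per_period` is hypothetical, and it assumes compound (exponential) growth between the two endpoints chosen.

```python
# (year, capacity in Kbit, cycle time in ns), from the table above
DRAM = [
    (1980, 64, 250),
    (1983, 256, 220),
    (1986, 1_024, 190),
    (1989, 4 * 1_024, 165),
    (1992, 16 * 1_024, 145),
    (1995, 64 * 1_024, 100),
    (2002, 512 * 1_024, 60),
]


def growth_per_period(v0, v1, years, period):
    """Average compound growth factor per `period` years, going from v0 to v1."""
    return (v1 / v0) ** (period / years)


# Capacity, 1980-1995: 64 Kb -> 64 Mb, i.e. 1024x in 15 years
cap = growth_per_period(DRAM[0][1], DRAM[5][1], DRAM[5][0] - DRAM[0][0], 3)
# Cycle time, 1980-2002: improvement factor (old / new) per decade
speed = growth_per_period(DRAM[-1][2], DRAM[0][2], DRAM[-1][0] - DRAM[0][0], 10)
print(f"capacity: {cap:.2f}x per 3 years")  # capacity: 4.00x per 3 years
print(f"cycle time: {speed:.2f}x faster per 10 years")
```

From 1980 to 1995 the capacity grows exactly 4x every 3 years, while the cycle time over 1980 to 2002 improves only about 1.91x per decade.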

Technology Trends (Summary)
         Capacity        Speed
Logic    2x in 2 years   2x in 3 years
DRAM     4x in 3 years   1.4x in 10 years
Disk     2x in 3 years   1.4x in 10 years
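These rates are easier to compare once converted to a common horizon. A minimal Python sketch, assuming steady compound growth at each listed rate (the helper names are hypothetical), annualizes each entry and estimates how far logic speed pulls ahead of DRAM speed over a decade, one way to quantify the CPU-memory gap.

```python
# name: ((capacity factor, years), (speed factor, years)), from the table above
TRENDS = {
    "Logic": ((2.0, 2), (2.0, 3)),
    "DRAM": ((4.0, 3), (1.4, 10)),
    "Disk": ((2.0, 3), (1.4, 10)),
}


def annual_rate(factor, years):
    """Equivalent per-year growth factor for `factor`x every `years` years."""
    return factor ** (1.0 / years)


def growth_over(factor, years, horizon):
    """Total growth after `horizon` years at `factor`x every `years` years."""
    return factor ** (horizon / years)


for name, ((cap_f, cap_y), (spd_f, spd_y)) in TRENDS.items():
    print(f"{name:5s} capacity x{annual_rate(cap_f, cap_y):.3f}/yr, "
          f"speed x{annual_rate(spd_f, spd_y):.3f}/yr")

# Ratio of logic speed growth to DRAM speed growth after 10 years
gap = growth_over(*TRENDS["Logic"][1], 10) / growth_over(*TRENDS["DRAM"][1], 10)
print(f"logic/DRAM speed gap after 10 years: {gap:.1f}x")  # 7.2x
```

Under these assumed rates, logic speed grows about 10x in ten years while DRAM speed grows 1.4x, leaving a gap of roughly 7x.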

Processor Perspective

Measurement Tools
• Die Area, Power, Speed Estimation Tools
• Benchmarks, Traces, Mixes
• Simulation (many levels)
  - ISA, RT, Gate, Circuit
• Queuing Theory
• Rules of Thumb
• Fundamental Laws

The Bottom Line: Performance (and Cost)
Plane        DC to Paris   Speed      Passengers   Throughput (pmph)
Boeing 747   6.5 hours     610 mph    470          286,700
Concorde     3 hours       1350 mph   132          178,200
(pmph = passenger-miles per hour = passengers × speed)
• Time to run the task (Ex. Time)
  - Execution time, response time, latency
• Tasks per day, hour, week, sec, ns, … (Performance)
  - Throughput, bandwidth
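A small Python sketch reproduces the table and shows that the two metrics rank the planes in opposite orders. The `Plane` class is hypothetical; throughput is computed as passengers times speed, which matches the table's pmph column.

```python
from dataclasses import dataclass


@dataclass
class Plane:
    name: str
    hours: float  # DC-to-Paris trip time: the latency / execution time
    mph: float
    passengers: int

    @property
    def throughput_pmph(self) -> float:
        """Passenger-miles per hour: passengers carried times speed."""
        return self.passengers * self.mph


boeing = Plane("Boeing 747", 6.5, 610, 470)
concorde = Plane("Concorde", 3.0, 1350, 132)

latency_speedup = boeing.hours / concorde.hours
throughput_ratio = boeing.throughput_pmph / concorde.throughput_pmph
print(f"Concorde is {latency_speedup:.2f}x faster per trip")  # 2.17x
print(f"747 has {throughput_ratio:.2f}x the throughput")      # 1.61x
```

Which plane is “better” depends on the metric: the Concorde wins on execution time, the 747 on throughput, which is why both are listed.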