6.375 Complex Digital Systems, Spring 2007
Lecturers: Arvind & Krste Asanović
TAs: Myron King & Ajay Joshi
Assistant: Sally Lee
February 7, 2007 | http://csg.csail.mit.edu/6.375/ | L01-1

Do we need more chips (ASICs)?
ASIC = Application-Specific Integrated Circuit

Wide Variety of Products Rely on ASICs
Sensor nets, cameras, media players, set-top boxes, laptops, smart phones, servers, routers, automobiles, games, robots, supercomputers

What's required?
ICs with dramatically higher performance, optimized for applications, at a size and power that deliver mobility, and at a cost that addresses mass consumer markets.
Source: http://www.intel.com/technology/silicon/mooreslaw/index.htm

Let's take a look at current CMOS technology...

Chip = Transistors + Wires
[Figure: cross-section through IBM 90 nm process, 10 metal layers [ISSCC 2004]]
• Transistors are fabricated first on the original surface of the wafer, above the bulk of the wafer
• Wiring is added in layers on top of the wafer
• "Vias" connect one layer to another
• Thinner wires on lower layers are used for dense local wiring
• Thicker wires on higher layers are used for power, ground, and long-range signals
• "Glass" on top seals the chip

FET = Field-Effect Transistor
A four-terminal device (gate, source, drain, bulk)
[Figure: FET cross-section with the gate on the surface of the wafer, where inversion happens; source and drain diffusions; bulk on the reverse side of the wafer; vertical field Ev and horizontal field Eh]
• Inversion: a vertical field creates a channel between the source and drain.
• Conduction: if a channel exists, a horizontal field causes a drift current from the drain to the source.

Simplified FET Model
Binary logic values are represented by voltages: "High" = supply voltage (VDD), "Low" = ground voltage (GND = 0 V)
[Figure: PFET and NFET symbols with terminals S, G, D]
• PFET connects S and D when G = "low" = 0 V; a PFET is only good at pulling up
• NFET connects D and S when G = "high" = VDD; an NFET is only good at pulling down

NAND Gate
[Figure: CMOS NAND schematic with inputs A and B and output NOT(A·B)]
§ When both A and B are high, the output is low
§ When either A or B is low, the output is high
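
The switch behavior described on the FET and NAND slides can be checked with a minimal Python sketch. It assumes ideal switches (an NFET conducts when its gate is 1, a PFET when its gate is 0) and ignores the degraded levels mentioned above; the function names are illustrative only.

```python
# Switch-level sketch of a static CMOS 2-input NAND gate.
# Assumption: ideal switches; NFET on when gate == 1, PFET on when gate == 0.

def nfet_on(gate: int) -> bool:
    return gate == 1

def pfet_on(gate: int) -> bool:
    return gate == 0

def cmos_nand(a: int, b: int) -> int:
    # Pull-up network: two PFETs in parallel between VDD and the output.
    pull_up = pfet_on(a) or pfet_on(b)
    # Pull-down network: two NFETs in series between the output and GND.
    pull_down = nfet_on(a) and nfet_on(b)
    # In a correct static CMOS gate exactly one network conducts.
    assert pull_up != pull_down, "short circuit or floating output"
    return 1 if pull_up else 0

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, cmos_nand(a, b))  # output is 0 only for a = b = 1
```

The parallel PFETs give "either input low pulls the output high," and the series NFETs give "both inputs high pull the output low," matching the two bullets on the slide.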

NAND Gate Layout
[Figure: NAND layout with parallel PMOS transistors in P-diffusion (in N-well) connected to VDD; series NMOS transistors in N-diffusion connected to GND; a poly wire connecting the PMOS and NMOS gates of each input A and B; output NOT(A·B) on Metal-1; Metal-1-to-diffusion contacts]

Exponential growth: Moore's Law
Shown with approximate relative sizes:
• Intel 8080A, 1974: 3 MHz, 6K transistors, 6 µm
• Intel 8086, 1978, 33 mm²: 10 MHz, 29K transistors, 3 µm
• Intel 80286, 1982, 47 mm²: 12.5 MHz, 134K transistors, 1.5 µm
• Intel 386 DX, 1985, 43 mm²: 33 MHz, 275K transistors, 1 µm
• Intel 486, 1989, 81 mm²: 50 MHz, 1.2M transistors, 0.8 µm
• Intel Pentium, 1993/1994/1996, 295/147/90 mm²: 66 MHz, 3.1M transistors, 0.8/0.6/0.35 µm
• Intel Pentium II, 1997, 203/104 mm²: 300/333 MHz, 7.5M transistors, 0.35/0.25 µm
Source: http://www.intel.com/intelis/museum/exhibit/hist_micro/hof_main.htm
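
The transistor counts above can be turned into an approximate doubling time. This is a minimal sketch assuming pure exponential growth, n(t) = n0 * 2^((t - y0) / T), fitted to the two endpoints only; the name `doubling_time` is illustrative.

```python
import math

# (year, transistor count) pairs taken from the list above.
CHIPS = {
    "8080A": (1974, 6_000),
    "8086": (1978, 29_000),
    "80286": (1982, 134_000),
    "386DX": (1985, 275_000),
    "486": (1989, 1_200_000),
    "Pentium": (1993, 3_100_000),
    "Pentium II": (1997, 7_500_000),
}

def doubling_time(y0: float, n0: float, y1: float, n1: float) -> float:
    """Years per doubling, assuming n(t) = n0 * 2**((t - y0) / T)."""
    return (y1 - y0) * math.log(2) / math.log(n1 / n0)

y0, n0 = CHIPS["8080A"]
y1, n1 = CHIPS["Pentium II"]
print(f"{doubling_time(y0, n0, y1, n1):.2f} years per doubling")  # 2.24
```

A two-point fit is sensitive to the chosen endpoints; a least-squares fit of log2(count) against year over all entries would be more robust.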

Intel Penryn (2007)
• Dual core
• Quad-issue out-of-order superscalar processors
• 6 MB shared L2 cache
• 45 nm technology
  – Metal gate transistors
  – High-K gate dielectric
• 410 million transistors
• 3+? GHz clock frequency
• Could fit over 500 486 processors on the same size die

...But Design Effort Is Growing
[Figure: Nvidia graphics processing units, transistors (M) and design effort per chip, with relative staffing on front-end and back-end]
• 9x growth in back-end staff
• 5x growth in front-end staff
• Front-end is designing the logic (RTL)
• Back-end is fitting all the gates and wires on the chip, meeting timing specifications, and wiring up power, ground, and clock

Design Cost Impacts Chip Cost
• 90 nm ASIC cost breakdown, $30M total (Altera study):
  – 59% chip design (architecture, logic & I/O design, product & test engineering)
  – 30% software and applications development
  – 11% prototyping (masks, wafers, boards)
• If we sell 100,000 units, non-recurring engineering (NRE) costs add $30M/100K = $300 per chip!
• Example above is for design using automated tools
  – Similar to what we'll be using in 6.375
• Hand-crafted IBM-Sony-Toshiba Cell microprocessor achieves 4 GHz in 90 nm, but development cost was >$400M
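
The NRE arithmetic on this slide can be reproduced in a few lines of Python. The dollar total, percentages, and the 100,000-unit volume come from the slide; the helper name `nre_per_unit` is hypothetical.

```python
# NRE amortization using the 90 nm ASIC figures quoted above (Altera study).
NRE_TOTAL = 30_000_000  # dollars
BREAKDOWN = {
    "chip design": 0.59,
    "software and applications": 0.30,
    "prototyping (masks, wafers, boards)": 0.11,
}

def nre_per_unit(nre: float, units: int) -> float:
    """NRE cost spread evenly over every unit sold."""
    return nre / units

for item, share in BREAKDOWN.items():
    print(f"{item}: ${share * NRE_TOTAL / 1e6:.1f}M")
print(f"NRE per chip at 100,000 units: ${nre_per_unit(NRE_TOTAL, 100_000):.0f}")
```

Because the per-chip share falls as 1/volume, the same $30M adds only $30 per chip at one million units.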

Topics to address in 6.375
• How can we design complex billion-transistor ASICs with reasonable effort?
• How good are our designs?
  – Performance, area, power

Designer's Dilemma
• ASIC complexity:
  – 2000: 1M+ logic gates
  – 2005: 10M+ logic gates
  – 2010: 100M+ logic gates
• Constants:
  – 10-30 person design team size
  – 18-month design schedule
  – Design flow unchanged for 10+ years!
• Designer must take shortcuts:
  – Conservative design
  – No time for exploration
  – Educated guess & code
  – "Gates are free" mentality
LPM example: Pipeline. Which is best?
LPM Pipeline | Area (gates) | Speed (ns) | Memory Util (%)
Static       | 8,898        | 3.60       | 63.5
Linear       | 15,910       | 4.70       | 99.9
Circular     | 8,170        | 3.67       | 99.9
Static (2)   | 2,391        | 3.32       | 63.5
[ICCAD'04]
What happens when a designer must implement a 1M-gate block? Sub-optimal implementations! Alternatives?
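
One way to approach the "Which is best?" question for the LPM table is to rank the designs by a single figure of merit. The sketch below uses the area-delay product, which is our own illustrative choice rather than the metric used in the cited study, and treats the "Speed (ns)" column as a delay.

```python
# Data from the LPM pipeline table above [ICCAD'04]:
# name -> (area in gates, delay in ns, memory utilization in %)
DESIGNS = {
    "Static": (8_898, 3.60, 63.5),
    "Linear": (15_910, 4.70, 99.9),
    "Circular": (8_170, 3.67, 99.9),
    "Static (2)": (2_391, 3.32, 63.5),
}

def area_delay(area: float, delay_ns: float) -> float:
    """Area-delay product: lower is better on both axes at once."""
    return area * delay_ns

ranked = sorted(DESIGNS, key=lambda n: area_delay(*DESIGNS[n][:2]))
for name in ranked:
    area, delay, util = DESIGNS[name]
    print(f"{name:10s} A*D = {area_delay(area, delay):9.1f}  mem util = {util}%")
```

This metric ignores memory utilization, so "Static (2)" ranks first even though it wastes over a third of its memory; "best" depends on which resource matters most, which is the point of exploring the design space.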

6.375 Course Philosophy
• Effective abstractions to reduce design effort
  – High-level design language rather than logic gates
  – Control specified with Guarded Atomic Actions rather than with finite state machines
  – Guarded module interfaces automatically ensure correctness of composition of existing modules
• Design discipline to avoid bad design points
  – Decoupled units rather than tightly coupled state machines
• Design space exploration to find good designs
  – Architecture choice has the largest impact on solution quality
A unified view of languages, disciplines, and tools that supports rapid design space exploration to find the best area, power, and performance point with reduced design effort
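
To make "Guarded Atomic Actions" concrete, here is a toy Python interpreter. It is not Bluespec and assumes nothing about Bluespec syntax or scheduling: each hypothetical `Rule` pairs a guard (when it may fire) with a body that replaces the whole state atomically. The example expresses Euclid's subtraction GCD as two rules over registers x and y; `Rule`, `run`, and `gcd_rules` are illustrative names.

```python
# A toy guarded-atomic-action interpreter (NOT Bluespec; illustrative only).
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, int]

@dataclass
class Rule:
    name: str
    guard: Callable[[State], bool]   # may this rule fire in this state?
    body: Callable[[State], State]   # returns the complete next state

def run(rules: List[Rule], state: State, max_steps: int = 1000) -> State:
    """Fire one enabled rule per step until no guard is true."""
    for _ in range(max_steps):
        enabled = [r for r in rules if r.guard(state)]
        if not enabled:
            return state
        # Atomicity: the body reads the old state and its result replaces
        # the whole state at once, so no rule sees a half-finished update.
        state = enabled[0].body(dict(state))
    raise RuntimeError("no quiescence within max_steps")

# Euclid's GCD as two mutually exclusive rules (assumes x > 0 initially).
gcd_rules = [
    Rule("swap", lambda s: s["x"] > s["y"] and s["y"] != 0,
         lambda s: {"x": s["y"], "y": s["x"]}),
    Rule("subtract", lambda s: s["x"] <= s["y"] and s["y"] != 0,
         lambda s: {"x": s["x"], "y": s["y"] - s["x"]}),
]

print(run(gcd_rules, {"x": 15, "y": 6}))  # x holds the GCD once y is 0
```

No explicit state machine encodes "which step comes next"; the guards alone decide. A real Bluespec-compiled design can also fire several non-conflicting rules in the same clock cycle, whereas this sketch fires one per step.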

6.375 Objectives
By the end of term, you should be able to:
• Decompose system requirements into a hierarchy of sub-units that are easy to specify, implement, and verify, and which can be reused
• Develop efficient verification and test plans
• Select appropriate microarchitectures for a unit and perform microarchitectural exploration to meet price, performance, and power goals
• Use industry-standard tool flows
• Complete a working million-gate chip design!
• Make millions $$$ at a new chip startup (Don't forget your alma mater!)

6.375 Prerequisites
You must be familiar with undergraduate (6.004) logic design:
• Combinational and sequential logic design
• Dynamic discipline (clocking, setup and hold)
• Finite state machine design
• Binary arithmetic and other encodings
• Simple pipelining
• ROMs/RAMs/register files
Additional circuit knowledge (6.002, 6.374) is useful but not vital.
Architecture knowledge (6.823) is helpful for projects.
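
As a refresher on the setup and hold part of the dynamic discipline, the two constraints for a register-to-register path can be written as small functions. This sketch assumes a single clock with zero skew, uses ns throughout, and the example delays are hypothetical rather than taken from any cell library.

```python
# Dynamic-discipline timing checks for one register-to-register path.
# Assumptions: one clock, zero skew, all times in ns, hypothetical values.

def min_clock_period(t_clk_q: float, t_logic_max: float, t_setup: float) -> float:
    """Setup: data launched at one edge must be stable t_setup before the next."""
    return t_clk_q + t_logic_max + t_setup

def hold_ok(t_clk_q_min: float, t_logic_min: float, t_hold: float) -> bool:
    """Hold: new data must not reach the next register until t_hold after the edge."""
    return t_clk_q_min + t_logic_min >= t_hold

period = min_clock_period(t_clk_q=0.25, t_logic_max=1.5, t_setup=0.25)
print(f"T_clk >= {period} ns, f_max = {1e3 / period:.0f} MHz")  # 2.0 ns, 500 MHz
print("hold met:", hold_ok(t_clk_q_min=0.125, t_logic_min=0.25, t_hold=0.125))
```

The setup check uses the slowest path and limits clock frequency; the hold check uses the fastest path and cannot be fixed by slowing the clock.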

6.375 Structure
First half of term (before Spring Break):
• Lecture or tutorial MWF, 2:30 pm to 4:00 pm in 32-124
• Three labs (on Athena, lab machines in 38-301)
• Form project teams (2-3 students); prepare project proposal (watch website for project ideas)
• Closed-book 90-minute quiz (Friday before Spring Break)
Second half of term (after Spring Break):
• Weekly project milestones, with 1-2 page report
• Weekly project meeting with the instructors and TAs
• Final project presentations in last week of classes
• Final project report (~15-20 pages) due Thursday May 17 (no extensions)

6.375 Grade Breakdown
Three labs, Quiz, Five project milestones, Final project report (including presentation): 30%, 25%, 25%

6.375 Collaboration Policy
We strongly encourage students to collaborate on understanding the course material, BUT:
• Each student must turn in individual solutions to labs
• Students must not discuss quiz contents with students who have not yet taken the quiz
• If you're inadvertently exposed to quiz contents before the exam, by whatever means, you must immediately inform the instructors or a TA

ASIC Design Styles

Hardware Design Abstraction Levels (highest to lowest)
• Application
• Algorithm
• Unit-Transaction Level (UTL) Model
• Guarded Atomic Actions (Bluespec)
• Register-Transfer Level (Verilog RTL)
• Gates
• Circuits
• Devices
• Physics

ASIC Design Styles
• Full-Custom (every transistor hand-drawn)
  – Best possible performance: as used by Intel microprocessors
• Semi-Custom (some custom + some cell-based design)
  – Reduced design effort: AMD microprocessors plus recent Intel microprocessors
• Cell-Based ASICs (only use cells in standard library)
  – High-volume, moderate performance: graphics chips, network chips, cellphone chips
  – This is what we'll use in 6.375
• Mask-Programmed Gate Arrays/Structured ASICs
  – Medium-volume, moderate performance applications
• Field-Programmable Gate Arrays
  – Low-volume, low-moderate performance applications, and prototyping
Comparing styles:
  – How many design-specific mask layers per ASIC?
  – How much freedom to develop own circuits?
  – What design methods and tools are needed?

Custom and Semi-Custom
• Usually, the in-house design team develops its own libraries of cells for commonly used components:
  – memories, register files, datapath cells, random logic cells, repeaters, clock buffers, I/O pads
• In extreme cases, every transistor instance can be individually sized ($$$$)
  – approach used in Alpha microprocessor development
• The trend is towards greater use of the semi-custom design style
  – use a few great circuit designers to create cells
  – redirect most effort at microarchitecture and cell placement to keep wires short

Custom Designer Works with Low-Level Design Rules
[Figure: examples of surround, exclusion, extension, width, and spacing rules]
• Design rules are an abstraction of the fabrication process that specifies various geometric constraints on how different masks can be drawn
• Design rules can be absolute measurements (e.g., in nm) or scaled to an abstract unit, lambda (λ). The value of lambda depends on the manufacturing process finally used.
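
A minimal sketch of how lambda-scaled rules become absolute checks: each rule is stored in units of λ and multiplied by the process's λ before comparing against layout coordinates. The rule values, the λ = 45 nm choice, and the names `Rect` and `check` are all hypothetical; real rule decks contain many more rule types (surround, extension, exclusion) than the width and spacing checks shown.

```python
# Width and spacing checks for axis-aligned rectangles, with rules in lambda.
# All rule values and the lambda used below are hypothetical.
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Rect:
    layer: str
    x0: float
    y0: float
    x1: float
    y1: float

    def width(self) -> float:
        """Smallest dimension of the rectangle."""
        return min(self.x1 - self.x0, self.y1 - self.y0)

MIN_WIDTH_LAMBDA = {"poly": 2, "metal1": 3}    # hypothetical rule values
MIN_SPACING_LAMBDA = {"poly": 2, "metal1": 3}  # hypothetical rule values

def spacing(a: Rect, b: Rect) -> float:
    """Euclidean gap between two rectangles (0 if they touch or overlap)."""
    dx = max(b.x0 - a.x1, a.x0 - b.x1, 0.0)
    dy = max(b.y0 - a.y1, a.y0 - b.y1, 0.0)
    return (dx * dx + dy * dy) ** 0.5

def check(rects: List[Rect], lam_nm: float) -> List[str]:
    """Return rule violations; coordinates are in nm, rules scale with lambda."""
    errors = []
    for r in rects:
        if r.width() < MIN_WIDTH_LAMBDA[r.layer] * lam_nm:
            errors.append(f"width: {r}")
    for i, a in enumerate(rects):
        for b in rects[i + 1:]:
            # Touching or overlapping shapes on one layer merge, so skip gap 0.
            if a.layer == b.layer and 0 < spacing(a, b) < MIN_SPACING_LAMBDA[a.layer] * lam_nm:
                errors.append(f"spacing: {a} vs {b}")
    return errors

layout = [Rect("metal1", 0, 0, 135, 1000), Rect("metal1", 200, 0, 335, 1000)]
print(check(layout, lam_nm=45))  # one spacing violation: gap 65 nm < 135 nm
```

Changing only `lam_nm` retargets every rule to a new process, which is the appeal of λ-based rules; absolute rules give up that portability in exchange for tighter, process-specific limits.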

Standard Cell ASICs
aka Cell-Based ICs (CBICs)
• Fixed library of cells + memory generators, often provided by the fabrication foundry or third-party library providers
• Cells can be synthesized from HDL, or entered in schematics
• Cells are placed and routed automatically
• Requires a complete set of custom masks for each design
• Currently the most popular hard-wired ASIC type (6.375 will use this)
[Figure: cells arranged in rows, with generated memory arrays Mem 1 and Mem 2]

Standard Cell Library Components
[Figure: NAND2 and flip-flop cells showing VDD and GND power rails in M1, well contacts under the power rails, cell I/O on M2, and a clock rail (not typical)]
• Cells have standard height but vary in width
• Designed to connect power, ground, and wells by abutment
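
Because every cell has the same height and connects its rails by abutment, placement reduces to packing variable-width cells into fixed-height rows. The sketch below uses naive greedy left-to-right packing; real placers also minimize wirelength and timing, and the cell names, widths, and `place_in_rows` function are hypothetical.

```python
# Greedy packing of standard cells into fixed-height rows.
# Cell names and widths (in placement sites) are hypothetical.
from typing import Dict, List, Tuple

def place_in_rows(cells: List[Tuple[str, int]], row_width: int) -> Dict[str, Tuple[int, int]]:
    """Return name -> (row, x), placing cells left to right with no gaps."""
    placement: Dict[str, Tuple[int, int]] = {}
    row, x = 0, 0
    for name, width in cells:
        if width > row_width:
            raise ValueError(f"{name} is wider than a row")
        if x + width > row_width:  # this row is full: start the next one
            row, x = row + 1, 0
        placement[name] = (row, x)
        x += width  # the next cell abuts this one, sharing its power rails
    return placement

cells = [("nand2_0", 3), ("dff_0", 8), ("inv_0", 2), ("nand2_1", 3)]
print(place_in_rows(cells, row_width=12))
```

Only the x position varies within a row; the standard height is what lets any cell drop into any row slot.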

6.375 Standard Cell Design Flow
[Figure: flow diagram in which Bluespec SystemVerilog source is compiled by the Bluespec Compiler into Verilog 95 RTL or C. Other elements: Blueview, Bluespec C sim, Cycle Accurate Verilog sim, VCD output, Debussy Visualization, RTL synthesis, gates. Legend: files, Bluespec tools, 3rd-party tools]

Standard Cell Design Examples
[Figure: channel routing for 1.0 µm 2-metal standard cells]
[Figure: over-cell routing for 0.18 µm 6-metal standard cells]

Mask-Programmed Gate Arrays
• Can cut mask costs by prefabricating arrays of fixed-size transistors on wafers
• Only customize metal layers for each design
• Two kinds:
  § Channeled Gate Arrays: leave space between rows of transistors for routing
  § Sea-of-Gates: route over the top of unused transistors
[Figure: OCEAN Sea-of-Gates base pattern, rows of PMOS and NMOS transistors between VDD and GND rails]

Gate Array Personalization
[Figure: isolating transistors by a shared GND contact; isolating transistors with an "off" gate]

Gate Array Pros and Cons
• Cheaper and quicker, since fewer masks need to be made
  – Can stockpile wafers with diffusion and poly finished
• Memory is inefficient when made from a gate array
  – Embedded gate arrays add multiple fixed memory blocks to improve density (=> Structured ASICs)
  – Cell-based arrays are designed to provide an efficient memory cell (6 transistors in the basic cell)
• Logic is slow and big due to fixed transistors and wiring overhead
  – Advanced cell-based arrays hardwire logic functions (NANDs/NORs/LUTs) which are personalized with metal

Field-Programmable Gate Arrays (FPGAs)
• Arrays are mass-produced and programmed by the customer after fabrication
  – Can be programmed by blowing fuses, loading SRAM bits, or loading FLASH memory
• Each cell in the array contains a programmable logic function
• The array has programmable interconnect between logic functions
• The overhead of programmability makes arrays expensive and slow, but startup costs are low, so FPGAs are much cheaper than ASICs for small volumes
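
Many SRAM-based FPGAs implement each cell's programmable logic function as a small look-up table (LUT): k inputs select one of 2^k stored configuration bits. The Python sketch below models that idea only; the class `LUT` and helper `program` are hypothetical and do not reflect any vendor's actual cell structure.

```python
# A k-input look-up table: 2**k configuration bits (the "SRAM bits")
# and the inputs acting as the select lines of a 2**k-to-1 multiplexer.
from typing import Callable, Sequence

class LUT:
    def __init__(self, k: int, config: Sequence[int]):
        if len(config) != 2 ** k:
            raise ValueError("a k-input LUT needs 2**k configuration bits")
        self.k = k
        self.config = list(config)

    def __call__(self, *inputs: int) -> int:
        if len(inputs) != self.k:
            raise ValueError(f"expected {self.k} inputs")
        index = 0
        for bit in inputs:  # first input is the most significant select bit
            index = (index << 1) | bit
        return self.config[index]

def program(k: int, f: Callable[..., int]) -> LUT:
    """'Program' a LUT by tabulating f over all 2**k input combinations."""
    config = []
    for index in range(2 ** k):
        bits = [(index >> (k - 1 - i)) & 1 for i in range(k)]
        config.append(f(*bits))
    return LUT(k, config)

nand2 = program(2, lambda a, b: 1 - (a & b))
xor3 = program(3, lambda a, b, c: a ^ b ^ c)
print(nand2.config)  # [1, 1, 1, 0]
```

Any k-input Boolean function fits in the same hardware, which is the source of both the flexibility and the area overhead noted above: a 2-input NAND here costs four storage bits plus a multiplexer instead of four transistors.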

Xilinx Configurable Logic Block

FPGA Pros and Cons
• Advantages
  – Dramatically reduce the cost of errors
  – Remove the reticle costs from each design
• Disadvantages (as compared to an ASIC) [Kuon & Rose, FPGA 2006]
  – Switching power around 12X worse
  – Performance up to 3-4X worse
  – Area 20-40X greater
  – Still requires tremendous design effort at RTL level
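
The trade-off between low startup cost and high per-unit cost can be illustrated with a linear cost model, total = NRE + unit_cost × volume, and the volume at which the two options cost the same. Every dollar figure below is a hypothetical placeholder, not data from the slides or the cited study.

```python
# Break-even volume between an ASIC and an FPGA under a linear cost model:
#   total(volume) = NRE + unit_cost * volume
# All dollar figures are hypothetical placeholders.

def total_cost(nre: float, unit_cost: float, volume: int) -> float:
    return nre + unit_cost * volume

def break_even_volume(asic_nre: float, asic_unit: float,
                      fpga_nre: float, fpga_unit: float) -> float:
    """Volume above which the ASIC is cheaper (requires asic_unit < fpga_unit)."""
    if asic_unit >= fpga_unit:
        raise ValueError("model assumes the ASIC has the lower unit cost")
    return (asic_nre - fpga_nre) / (fpga_unit - asic_unit)

ASIC_NRE, ASIC_UNIT = 5_000_000, 10.0   # hypothetical
FPGA_NRE, FPGA_UNIT = 500_000, 100.0    # hypothetical

v = break_even_volume(ASIC_NRE, ASIC_UNIT, FPGA_NRE, FPGA_UNIT)
print(f"break-even at about {v:,.0f} units")  # 50,000
```

Below the break-even volume the FPGA's avoided mask and reticle costs dominate; above it, the ASIC's lower per-unit cost (smaller area, no programmability overhead) wins.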