MSc - Microprocessors
Dr. Konstantinos Tatas
com.tk@fit.ac.cy

Useful Information
• Instructor: Lecturer K. Tatas
  – Office hours: TBA
  – E-mail: com.tk@fit.ac.cy
  – http://staff.fit.ac.cy/com.tk
• Lecture periods/week: 4
• Duration: 10 weeks
• ECTS: 7 (175 hours)

Course Objectives
• By the end of the course students should be able to:
  – Evaluate the complex trade-offs involved in embedded system design
  – Write detailed embedded system requirements and specification documents
  – Write executable specifications using UML/SystemC
  – Develop applications using ARM Developer Suite
  – Write efficient ARM assembly and C programs in ARM and Thumb mode
  – Analyze program performance using traces
  – Use code transformations to improve performance, code size, and power consumption

Course Outline (1/2)
• Week 1: Introduction to microprocessors for general purpose and embedded systems
  – Embedded microprocessor evolution
  – Design metrics and constraints (performance, power, cost, time-to-market) and design optimization challenges
  – Key embedded system technologies: integrated circuit technology, microprocessor technology, CAD tool technology
• Week 2: Embedded system specification and modeling
  – Object-oriented specification (UML/C++/SystemC)
  – Assignment 1
• Week 3: Computer architecture
  – Instruction sets, RISC vs. CISC, pipelining
  – The ARM microprocessor architecture
  – ARM assembly: ARM mode, Thumb mode
  – ARM and Thumb instruction set
  – ARM conditional execution
• Week 4: Processor I/O
  – Serial I/O, busy/wait I/O, interrupts, exceptions, traps
  – ARM memory-mapped I/O
  – Caches, memory management units, protection units
  – ARM cache and MMU
  – Assignment 2

Course Outline (2/2)
• Week 5: Program design and analysis
  – DFGs, compilers, assemblers, linkers
  – Basic compiler optimizations/code transformations
  – Measuring program speed
  – Trace-driven performance analysis
  – Energy optimization
  – Program size optimization
• Week 6: Code transformations
  – Loop unrolling, loop merging, loop tiling
  – Performance-optimizing transformations
• Week 7: Test
• Week 8: Assignment 3
• Week 9:
• Week 10: Revision

Course Assessment
• Final exam: 60%
• Coursework: 40%
  – Assignment 1: 8%
  – Assignment 2: 8%
  – Assignment 3: 8%
  – Test: 10%
  – Lab exercises: 6%

Outline
• VLSI and microprocessor evolution
• Microprocessors in embedded systems
• Design challenge – optimizing design metrics
• Technologies
  – Processor technologies
  – IC technologies
  – Design technologies

ENIAC – The first electronic computer (1946)

Cross-Section of CMOS Technology

Moore’s Law

Intel 4004 Micro-Processor

Cell Processor for PlayStation 3

IBM Power5

IBM PowerPC history

Technology Process Evolution
Node years: 2007/65 nm, 2010/45 nm, 2013/33 nm, 2016/23 nm

Intel’s Technology Roadmap
Mark Bohr: Intel 04

Raising the Level of Abstraction for Design
[Figure: design abstraction levels (transistor, gate, RTL, IP blocks & NoC, software) shown on an example system with a CPU core, RTOS, DSP core, FPGA, Bluetooth controller and driver, 3G engine, compression and encryption engine and driver, and IR/RS-232 interface and driver; hardware is performance driven, software provides functional differentiation]
Courtesy of Walden C. Rhines, Mentor Graphics Corporation, DAC 2004

Bibliography
• Books
  – W. Wolf, “Computers as Components”
  – S. Furber, “ARM System-on-Chip Architecture”
  – P. Panda, “Memory Issues in Embedded Systems-on-Chip”
  – F. Vahid and T. Givargis, “Embedded System Design: A Unified Hardware/Software Introduction”
  – F. Catthoor, “Data Access and Storage Management for Embedded Programmable Processors”

Microprocessors for Embedded systems
• Computing systems are everywhere
• Most of us think of “desktop” computers
  – PCs
  – Laptops
  – Mainframes
  – Servers
• But there’s another type of computing system
  – Far more common...

Embedded systems overview
• Embedded computing systems
  – Computing systems embedded within electronic devices
  – Hard to define. Nearly any computing system other than a desktop computer
  – Billions of units produced yearly, versus millions of desktop units
  – Perhaps 50 per household and per automobile
[Figure: everyday devices, captioned “Computers are in here... and here... and even here...”]
Lots more of these, though they cost a lot less each.

A “short list” of embedded systems
Anti-lock brakes, auto-focus cameras, automatic teller machines, automatic toll systems, automatic transmission, avionic systems, battery chargers, camcorders, cell phones, cell-phone base stations, cordless phones, cruise control, curbside check-in systems, digital cameras, disk drives, electronic card readers, electronic instruments, electronic toys/games, factory control, fax machines, fingerprint identifiers, home security systems, life-support systems, medical testing systems, modems, MPEG decoders, network cards, network switches/routers, on-board navigation, pagers, photocopiers, point-of-sale systems, portable video games, printers, satellite phones, scanners, smart ovens/dishwashers, speech recognizers, stereo systems, teleconferencing systems, televisions, temperature controllers, theft tracking systems, TV set-top boxes, VCRs, DVD players, video game consoles, video phones, washers and dryers.
And the list goes on and on

Some common characteristics of embedded systems
• Single-functioned
  – Executes a single program, repeatedly
• Tightly-constrained
  – Low cost, low power, small, fast, etc.
• Reactive and real-time
  – Continually reacts to changes in the system’s environment
  – Must compute certain results in real-time without delay

An embedded system example – Digital camera
[Figure: digital camera chip with lens and CCD, containing an A2D converter, CCD preprocessor, pixel coprocessor, D2A converter, JPEG codec, microcontroller, multiplier/accumulator, DMA controller, memory controller, display controller, ISA bus interface, UART, and LCD controller]
• Single-functioned -- always a digital camera
• Tightly-constrained -- low cost, low power, small, fast
• Reactive and real-time -- only to a small extent

Embedded Software Development Requires as Much/More Design Effort Than Hardware

A System-on-a-Chip: Example
Courtesy: Philips

Design at a crossroad: System-on-a-Chip
[Figure: example system-on-a-chip with a multi-spectral imager, 500k-gate FPGA + 1 Gbit DRAM (preprocessing), 64 SIMD processor array + SRAM (image conditioning, 100 GOPS), analog block, and µC system + 2 Gbit DRAM (recognition)]
• Embedded applications where cost, performance, and energy are the real issues!
• DSP and control intensive
• Mixed-mode
• Combines programmable and application-specific modules
• Software plays a crucial role

Disciplines involved in Embedded System Design
• Digital System Design
• Software Design
• Analog/Mixed-Signal/RF System Design
• Operating Systems
• Microprocessors/Computer Architecture
• Verification
• Testing
• etc.

Languages traditionally used in Embedded System Design
• Specification/modeling
  – UML
  – SDL
  – C/C++
• Hardware design
  – VHDL
  – Verilog
• Software design
  – C/C++
  – Java
  – Assembly
• Verification
  – VHDL/Verilog
  – SystemVerilog
  – Tcl/tk
  – Vera

Design Challenges
• How much hardware do we need?
• How do we meet (system) deadlines?
  – Faster clock?
• How do we minimize power consumption?
  – Slower clock?
• How do we design for upgradeability?
• How do you know it really works?
  – Complex testing
  – Limited observability and controllability

Design challenge – optimizing design metrics
• Obvious design goal:
  – Construct an implementation with desired functionality
• Key design challenge:
  – Simultaneously optimize numerous design metrics
• Design metric
  – A measurable feature of a system’s implementation
  – Optimizing design metrics is a key challenge

Design challenge – optimizing design metrics
• Common metrics
  – Unit cost: the monetary cost of manufacturing each copy of the system, excluding NRE cost
  – NRE cost (Non-Recurring Engineering cost): the one-time monetary cost of designing the system
  – Size: the physical space required by the system
  – Performance: the execution time or throughput of the system
  – Power: the amount of power consumed by the system
  – Flexibility: the ability to change the functionality of the system without incurring heavy NRE cost

Design challenge – optimizing design metrics
• Common metrics (continued)
  – Time-to-prototype: the time needed to build a working version of the system
  – Time-to-market: the time required to develop a system to the point that it can be released and sold to customers
  – Maintainability: the ability to modify the system after its initial release
  – Correctness, safety, many more

Design metric competition -- improving one may worsen others
[Figure: competing metrics (power, performance, size, NRE cost) shown next to the digital camera chip block diagram]
• Expertise with both software and hardware is needed to optimize design metrics
  – Not just a hardware or software expert, as is common
  – A designer must be comfortable with various technologies in order to choose the best for a given application and constraints

Time-to-market: a demanding design metric
[Figure: revenues ($) versus time (months)]
• Time required to develop a product to the point it can be sold to customers
• Market window
  – Period during which the product would have highest sales
• Average time-to-market constraint is about 8 months
• Delays can be costly

Losses due to delayed market entry
[Figure: revenues ($) versus time; market rises to a peak at W and falls to zero at 2W; on-time entry at time 0 and delayed entry at time D, each with its own peak revenue]
• Simplified revenue model
  – Product life = 2W, peak at W
  – Time of market entry defines a triangle, representing market penetration
  – Triangle area equals revenue
• Loss
  – The difference between the on-time and delayed triangle areas

Losses due to delayed market entry (cont.)
[Figure: revenues ($) versus time; market rises to a peak at W and falls to zero at 2W; on-time entry at time 0 and delayed entry at time D, each with its own peak revenue]
• Area = 1/2 * base * height
  – On-time = 1/2 * 2W * W
  – Delayed = 1/2 * (W - D + W) * (W - D)
• Percentage revenue loss = (D(3W - D) / (2W^2)) * 100%
• Try some examples
  – Lifetime 2W = 52 wks, delay D = 4 wks: (4 * (3*26 - 4)) / (2 * 26^2) = 22%
  – Lifetime 2W = 52 wks, delay D = 10 wks: (10 * (3*26 - 10)) / (2 * 26^2) = 50%
  – Delays are costly!
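
The triangle model can be checked numerically. The following is a minimal sketch, not part of the original slides; the function name `revenue_loss_percent` is a hypothetical helper. It computes the two triangle areas directly and reports the lost fraction, which should agree with the closed form D(3W - D) / (2W^2).

```python
def revenue_loss_percent(w: float, d: float) -> float:
    """Percentage of revenue lost when market entry is delayed by d.

    Simplified triangular model from the slide: product life is 2*w,
    the market peaks at w, and revenue equals the triangle area.
    """
    if not 0 <= d <= w:
        raise ValueError("delay d must satisfy 0 <= d <= w")
    on_time = 0.5 * (2 * w) * w            # base 2W, height W
    delayed = 0.5 * (2 * w - d) * (w - d)  # base 2W - D, height W - D
    return (on_time - delayed) / on_time * 100


# The two examples from the slide: lifetime 2W = 52 weeks.
for delay in (4, 10):
    loss = revenue_loss_percent(26, delay)
    print(f"D = {delay:2d} wks -> {loss:.0f}% revenue loss")
```

Computing the areas rather than hard-coding the closed form keeps the sketch tied to the geometric argument, so the agreement with the formula is a check rather than a restatement.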

The performance design metric
• Widely used measure of a system, and widely abused
  – Clock frequency, instructions per second: not good measures
  – Digital camera example: a user cares about how fast it processes images, not clock speed or instructions per second
• Latency (response time)
  – Time between task start and end
  – e.g., Cameras A and B process images in 0.25 seconds
• Throughput
  – Tasks per second, e.g. Camera A processes 4 images per second
  – Throughput can be more than latency seems to imply due to concurrency, e.g. Camera B may process 8 images per second (by capturing a new image while the previous image is being stored)
• Speedup of B over A = B’s performance / A’s performance
  – Throughput speedup = 8/4 = 2
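
The latency and throughput figures for the two cameras can be reproduced with a short calculation. This is an illustrative sketch only; it assumes steady-state operation and models Camera B's concurrency as two images in flight at once, which is one way to obtain the 8 images per second quoted on the slide. The function name `throughput` is hypothetical.

```python
def throughput(images_in_flight: int, latency_s: float) -> float:
    """Steady-state images per second for a pipeline that keeps
    `images_in_flight` images overlapped, each taking `latency_s` seconds."""
    return images_in_flight / latency_s


latency = 0.25                      # both cameras: 0.25 s per image
camera_a = throughput(1, latency)   # no overlap
camera_b = throughput(2, latency)   # capture overlaps with storing (assumed depth 2)
speedup = camera_b / camera_a       # speedup of B over A

print(f"A: {camera_a:.0f} images/s, B: {camera_b:.0f} images/s, speedup: {speedup:.0f}")
```

Both cameras have the same latency, so the speedup comes entirely from concurrency, which is the point the slide makes about throughput.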

Three key embedded system technologies
• Technology
  – A manner of accomplishing a task, especially using technical processes, methods, or knowledge
• Three key technologies for embedded systems
  – Processor technology
  – IC technology
  – Design technology

Processor technology
• The architecture of the computation engine used to implement a system’s desired functionality
• Processor does not have to be programmable
  – “Processor” not equal to general-purpose processor
[Figure: three processor architectures implementing “total = 0; for i = 1 to ...”. General-purpose (“software”): controller with control logic, state register, IR, and PC; datapath with register file and general ALU; program memory holding assembly code; data memory. Application-specific: controller with control logic, state register, IR, and PC; datapath with registers and custom ALU; program memory holding assembly code; data memory. Single-purpose (“hardware”): controller with control logic and state register; datapath with index and total registers and an adder; data memory]

Processor technology
• Processors vary in their customization for the problem at hand
Desired functionality:
    total = 0
    for i = 1 to N loop
        total += M[i]
    end loop
[Figure: the desired functionality mapped onto a general-purpose processor, an application-specific processor, and a single-purpose processor]

General-purpose processors
• Programmable device used in a variety of applications
  – Also known as “microprocessor”
• Features
  – Program memory
  – General datapath with large register file and general ALU
• User benefits
  – Low time-to-market and NRE costs
  – High flexibility
• “Pentium” the most well-known, but there are hundreds of others
[Figure: controller (control logic and state register, IR, PC), datapath (register file, general ALU), program memory holding assembly code for “total = 0; for i = 1 to ...”, and data memory]

Single-purpose processors
• Digital circuit designed to execute exactly one program
  – a.k.a. coprocessor, accelerator or peripheral
• Features
  – Contains only the components needed to execute a single program
  – No program memory
• Benefits
  – Fast
  – Low power
  – Small size
[Figure: controller (control logic, state register), datapath (index and total registers, adder), and data memory]

Application-specific processors
• Programmable processor optimized for a particular class of applications having common characteristics
  – Compromise between general-purpose and single-purpose processors
• Features
  – Program memory
  – Optimized datapath
  – Special functional units
• Benefits
  – Some flexibility, good performance, size and power
[Figure: controller (control logic and state register, IR, PC), datapath (registers, custom ALU), program memory holding assembly code for “total = 0; for i = 1 to ...”, and data memory]

IC technology
• The manner in which a digital (gate-level) implementation is mapped onto an IC
  – IC: Integrated circuit, or “chip”
  – IC technologies differ in their customization to a design
  – ICs consist of numerous layers (perhaps 10 or more)
• IC technologies differ with respect to who builds each layer and when
[Figure: IC package and IC, with a transistor cross-section showing gate, oxide, source, channel, drain, and silicon substrate]

IC technology Design Approaches
IC Technology Implementation Approaches
• Custom
• Semicustom
  – Cell-based: Standard Cells, Compiled Cells, Macro Cells
  – Array-based: Pre-diffused (Gate Arrays), Pre-wired (FPGAs)

Full-custom design
• All layers are optimized for an embedded system’s particular digital implementation
  – Placing transistors
  – Sizing transistors
  – Routing wires
• Benefits
  – Excellent performance, small size, low power
• Drawbacks
  – High NRE cost (e.g., $300k), long time-to-market

The Custom Approach
Intel 4004
Courtesy Intel

Transition to Automation and Regular Structures
Intel 4004 (’71), Intel 8080, Intel 8085, Intel 80286, Intel 80486
Courtesy Intel


Semi-custom
• Lower layers are fully or partially built
  – Designers are left with routing of wires and maybe placing some blocks
• Benefits
  – Good performance, good size, less NRE cost than a full-custom implementation (perhaps $10k to $100k)
• Drawbacks
  – Still requires weeks to months to develop

Cell-based Design (or standard cells)
Routing channel requirements are reduced by the presence of more interconnect layers

Standard Cell – Example
[Brodersen 92]

Standard Cell – Example
3-input NAND cell (from ST Microelectronics): C = load capacitance, T = input rise/fall time


Programmable Logic Devices
• All layers (diffusion, polysilicon, [multi-]metal) may exist
  – Designers can purchase an IC
  – Connections on the IC are either created or destroyed to implement desired functionality
  – Field-Programmable Gate Arrays (FPGAs) and, recently, Gate Arrays are very popular
• Benefits
  – Low NRE costs, almost instant IC availability
• Drawbacks
  – Bigger, expensive (perhaps $30 per unit), power hungry, slower

Gate Array – Sea-of-gates
[Figure: uncommitted cell and committed cell (4-input NOR)]

Sea-of-gate Primitive Cells
[Figure: primitive cells using oxide isolation and using gate isolation]

Sea-of-gates
[Figure: die showing random logic and memory subsystem, LSI Logic LEA300K (0.6 µm CMOS)]

Prewired Arrays
Classification of prewired arrays (or field-programmable devices):
• Based on Programming Technique
  – Fuse-based (program-once)
  – Non-volatile EPROM based
  – RAM based
• Programmable Logic Style
  – Array-Based
  – Look-up Table
• Programmable Interconnect Style
  – Channel-routing
  – Mesh networks

Altera MAX
From Smith 97

Altera MAX Interconnect Architecture
[Figure: LABs connected through a PIA in the array-based style (MAX 3000-7000) and through row and column channels in the mesh-based style (MAX 9000)]

LUT-Based Logic Cell
[Figure: Xilinx 4000 Series logic cell; inputs C1..C4, D1..D4, and F1..F4 feed look-up-table logic function blocks, with multiplexers controlled by the configuration program]

Array-Based Programmable Wiring
[Figure: cells, input/output pins, horizontal and vertical tracks, interconnect points, and a programmed interconnection]

Transistor Implementation of Mesh
Courtesy DeHon and Wawrzynek

RAM-based FPGA
Xilinx XC4000ex

Design Technology
• The manner in which we convert our concept of desired system functionality into an implementation

Abstraction level          Compilation/Synthesis   Libraries/IP    Test/Verification
System specification       System synthesis        Hw/Sw/OS        Model simulators/checkers
Behavioral specification   Behavior synthesis      Cores           Hw-Sw cosimulators
RT specification           RT synthesis            RT components   HDL simulators
Logic specification        Logic synthesis         Gates/Cells     Gate simulators
(To final implementation)

Compilation/Synthesis: Automates exploration and insertion of implementation details for lower level.
Libraries/IP: Incorporates predesigned implementation from lower abstraction level into higher level.
Test/Verification: Ensures correct functionality at each level, thus reducing costly iterations between levels.

The co-design ladder
• In the past:
  – Hardware and software design technologies were very different
  – Recent maturation of synthesis enables a unified view of hardware and software
• Hardware/software “codesign”
[Figure: the co-design ladder, starting from sequential program code (e.g., C, VHDL). Software side: compilers (1960s, 1970s) produce assembly instructions; assemblers and linkers (1950s, 1960s) produce machine instructions; implementation is a microprocessor plus program bits: “software”. Hardware side: behavioral synthesis (1990s) produces register transfers; RT synthesis (1980s, 1990s) produces logic equations/FSMs; logic synthesis (1970s, 1980s) produces logic gates; implementation is VLSI, ASIC, or PLD: “hardware”]
The choice of hardware versus software for a particular function is simply a tradeoff among various design metrics, like performance, power, size, NRE cost, and especially flexibility; there is no fundamental difference between what hardware or software can implement.

Independence of processor and IC technologies
• Basic tradeoff
  – General vs. custom
  – With respect to processor technology or IC technology
  – The two technologies are independent
[Figure: processor technologies (general-purpose processor, ASIP, single-purpose processor) and IC technologies (PLD, semi-custom, full-custom), each ordered from general to customized]
• General, providing improved: flexibility, maintainability, NRE cost, time-to-prototype, time-to-market, cost (low volume)
• Customized, providing improved: power efficiency, performance, size, cost (high volume)

Design Decision Trade-offs

Generalised Design Flow

Architecture ReUse
• Silicon System Platform
  – Flexible architecture for hardware and software
  – Specific (programmable) components
  – Network architecture
  – Software modules
  – Rules and guidelines for design of HW and SW
• Has been successful in PCs
  – Dominance of a few players who specify and control architecture
• Application-domain specific (difference in constraints)
  – Speed (compute power)
  – Dissipation
  – Costs
  – Real / non-real time data

Platform-Based Design
• “Only the consumer gets freedom of choice; designers need freedom from choice” (Orfali, et al, 1996, p. 522)
• A platform is a restriction on the space of possible implementation choices, providing a well-defined abstraction of the underlying technology for the application developer
• New platforms will be defined at the architecture-micro-architecture boundary
• They will be component-based, and will provide a range of choices from structured-custom to fully programmable implementations
• Key to such approaches is the representation of communication in the platform model
Source: R. Newton

Platform-based Design – System-on-Chip
• Use of predefined Intellectual Property (IP)
• A platform-based system consists of a RISC processor, memories, busses and a common language
• Platform-based design poses the problem of partitioning a solution between hardware (HDL) and software (programming processors)

Platforms Enable Simplified SoC Design
[Figure: platform structure with core, near peripherals, and far peripherals]
• Customer demands
  – Fast turn-around time
  – Easy access to pre-qualified building blocks
  – Web enabled
• Design technology
  – Core platforms
  – ‘Big’ IP
  – Emerging SoC bus standards
  – Embedded software
  – HW/SW co-verification

And Automation of IP Selection & Integration

Heterogeneous Programmable Platforms
[Figure: Xilinx Virtex-II Pro with FPGA fabric, embedded memories, embedded PowerPC, hardwired multipliers, and high-speed I/O]

Xilinx’s products

Comparison of CMOS design methods
Columns: Design Method, NRE, Unit Cost, Power Dissipation, Complexity of Implementation, Time-to-Market, Performance, Flexibility
μProcessor/DSP: low, medium, high, low, low, high
PLA: low, medium, low
FPGA: low, high, medium, medium
Gate Array: medium, low, medium
Cell Based: high, low, high, low
Custom Design: high, low, high, very high, low
Platform Based: high, low/medium, low, high, medium/low, high, medium

Impact of Implementation Choices
[Figure: energy efficiency (in MOPS/mW) versus flexibility (or application scope), from none through somewhat flexible to fully flexible]
• Hardwired custom: 100-1000
• Configurable/Parameterizable: 10-100
• Domain-specific processor (e.g. DSP): 1-10
• Embedded microprocessor: 0.1-1

Design Economics (1)
• The selling price of an IC: Stotal = Ctotal / (1 - m), where Ctotal is the manufacturing cost for a single IC and m is the desired profit margin
• Costs to produce an IC
  – Non-recurring engineering costs (NREs)
  – Recurring engineering costs
  – Fixed costs
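
As a quick illustration of the selling-price relation Stotal = Ctotal / (1 - m), here is a minimal sketch. The function name and the example figures ($40 cost, 20% margin) are made up for illustration and do not come from the slides.

```python
def selling_price(c_total: float, margin: float) -> float:
    """Selling price S = C / (1 - m) for per-IC cost C and profit margin m."""
    if not 0 <= margin < 1:
        raise ValueError("margin must satisfy 0 <= m < 1")
    return c_total / (1 - margin)


# Hypothetical numbers: a $40 cost per IC and a 20% desired margin.
price = selling_price(40.0, 0.20)
print(f"selling price: ${price:.2f}")
```

Note that the margin is defined here as a fraction of the selling price, not of the cost, which is why the cost is divided by (1 - m) rather than multiplied by (1 + m).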

Design Economics (2)
• Non-recurring engineering costs (NREs)
  – Engineering design cost
  – Prototype manufacturing cost
• Recurring costs
  – Process
  – Package
  – Test

NRE and unit cost metrics
• Costs:
  – Unit cost: the monetary cost of manufacturing each copy of the system, excluding NRE cost
  – NRE cost (Non-Recurring Engineering cost): the one-time monetary cost of designing the system
  – total cost = NRE cost + unit cost * # of units
  – per-product cost = total cost / # of units = (NRE cost / # of units) + unit cost
• Example
  – NRE = $2000, unit = $100
  – For 10 units
  – total cost = $2000 + 10 * $100 = $3000
  – per-product cost = $2000/10 + $100 = $300
Amortizing NRE cost over the units results in an additional $200 per unit
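
The two cost formulas translate directly into code. The sketch below uses hypothetical function names and reproduces the slide's example (NRE = $2000, unit = $100, 10 units).

```python
def total_cost(nre: float, unit_cost: float, units: int) -> float:
    """total cost = NRE cost + unit cost * number of units"""
    return nre + unit_cost * units


def per_product_cost(nre: float, unit_cost: float, units: int) -> float:
    """per-product cost = (NRE cost / number of units) + unit cost"""
    return total_cost(nre, unit_cost, units) / units


print(total_cost(2000, 100, 10))        # total for 10 units
print(per_product_cost(2000, 100, 10))  # cost per unit, NRE amortized
```

Increasing `units` shows how the amortized NRE share shrinks: at 1000 units the same design costs $102 per product instead of $300.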

NRE and unit cost metrics
• Compare technologies by costs -- best depends on quantity
  – Technology A: NRE = $2,000, unit = $100
  – Technology B: NRE = $30,000, unit = $30
  – Technology C: NRE = $100,000, unit = $2
• But, must also consider time-to-market
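
To see how the best technology depends on quantity, the three cost lines can be compared and their break-even volumes computed. This is an illustrative sketch with hypothetical helper names; it considers cost only and ignores time-to-market.

```python
# (NRE cost, unit cost) in dollars, taken from the slide.
TECHNOLOGIES = {"A": (2_000, 100), "B": (30_000, 30), "C": (100_000, 2)}


def total_cost(name: str, units: int) -> int:
    nre, unit_cost = TECHNOLOGIES[name]
    return nre + unit_cost * units


def cheapest(units: int) -> str:
    """Technology with the lowest total cost at the given volume."""
    return min(TECHNOLOGIES, key=lambda name: total_cost(name, units))


def break_even(first: str, second: str) -> float:
    """Volume at which the two technologies cost the same in total."""
    nre1, unit1 = TECHNOLOGIES[first]
    nre2, unit2 = TECHNOLOGIES[second]
    return (nre2 - nre1) / (unit1 - unit2)


print(break_even("A", "B"), break_even("B", "C"))
for volume in (100, 1_000, 10_000):
    print(volume, cheapest(volume))
```

Under these numbers A is cheapest below 400 units, B between 400 and 2,500 units, and C above 2,500 units.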

Wafer and die cost
Die yield: number of good dies / total number of dies

Example
• Assuming:
  – 20 engineers are employed full-time for a year with a $50,000/year average salary
  – Additional 200,000 overhead costs, of which 100,000 for total testing
  – A wafer cost of $200 per wafer
  – A $2 packaging cost per chip
  – 10 dies/wafer
  – 70% die yield
  – 98% final test yield
  – A market for 100,000 items
• Calculate the minimum shelf price of the chip
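
One possible way to work through this example is sketched below. It is not the instructor's solution, and it rests on several assumptions that the slide does not state: the overhead figure is in dollars, the 100,000 for testing is already included in the 200,000 overhead, only good dies are packaged, packaged chips that fail the final test are discarded, and "minimum shelf price" means break-even (zero profit margin).

```python
def minimum_price(
    engineers: int = 20,
    salary: float = 50_000,
    overhead: float = 200_000,   # assumed dollars; testing share included
    wafer_cost: float = 200,
    dies_per_wafer: int = 10,
    die_yield: float = 0.70,
    package_cost: float = 2,
    test_yield: float = 0.98,
    volume: int = 100_000,
) -> float:
    """Break-even price per sold chip under the assumptions listed above."""
    nre = engineers * salary + overhead                   # one-time costs
    good_die_cost = wafer_cost / (dies_per_wafer * die_yield)
    # Every sold chip also pays for packaged chips that fail the final test.
    recurring = (good_die_cost + package_cost) / test_yield
    return nre / volume + recurring


print(f"minimum price per chip: ${minimum_price():.2f}")
```

With these assumptions the NRE share is $12 per chip and the recurring cost is about $31.20, giving roughly $43.20. Different readings of the overhead or yield figures would change the result.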

Design productivity exponential increase
[Figure: designer productivity in K transistors per staff-month (log scale, 0.01 to 100,000) versus year, 1981 to 2009]
• Exponential increase over the past few decades

The growing design-productivity gap
Moore’s Law: Design Productivity Crisis (SRC 1997)
[Figure: standard cell density (Kgates/mm2) and ASIC clock speed (MHz) versus year, 2001 to 2015]
[Figure: potential design complexity and designer productivity; logic transistors per chip (M) and productivity (K transistors per staff-month) versus year, 1981 to 2009; complexity grows at a 58%/yr compounded rate while productivity grows at a 21%/yr compounded rate]

Design productivity gap
• 1981 leading-edge chip required 100 designer months
  – 10,000 transistors / 100 transistors/month
• 2002 leading-edge chip requires 30,000 designer months
  – 150,000,000 transistors / 5,000 transistors/month
• Designer cost increased from $1M to $300M
While designer productivity has grown at an impressive rate over the past decades, the rate of improvement has not kept pace with chip capacity
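
The designer-month figures follow from dividing chip size by productivity. The sketch below assumes 150,000,000 transistors for the 2002 chip, the value implied by 30,000 designer months at 5,000 transistors/month, and a cost of $10,000 per designer month, the value implied by $1M for 100 designer months. Both are inferences, not figures stated explicitly in the slides.

```python
COST_PER_DESIGNER_MONTH = 10_000  # assumption: $1M / 100 designer months


def designer_months(transistors: int, transistors_per_month: int) -> float:
    """Design effort in designer months."""
    return transistors / transistors_per_month


effort_1981 = designer_months(10_000, 100)
effort_2002 = designer_months(150_000_000, 5_000)

for year, effort in ((1981, effort_1981), (2002, effort_2002)):
    cost = effort * COST_PER_DESIGNER_MONTH
    print(f"{year}: {effort:,.0f} designer months, ${cost / 1e6:,.0f}M")
```

Productivity rose 50-fold over the period while chip size rose 15,000-fold, so the effort still grew by a factor of 300.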

The mythical man-month
• The situation is even worse than the productivity gap indicates
• In theory, adding designers to a team reduces project completion time
• In reality, productivity per designer decreases due to the complexities of team management and communication
• In the software community, this is known as “the mythical man-month” (Brooks 1975)
• At some point, adding designers can actually lengthen project completion time! (“Too many cooks”)
• Example: 1M transistors, 1 designer = 5,000 transistors/month
  – Each additional designer reduces productivity by 100 transistors/month per designer
  – So 2 designers produce 4,900 transistors/month each
[Figure: team and individual productivity (transistors/month, 0 to 60,000) and months until completion (43, 24, 19, 16, 15, 16, 18, 23) versus number of designers (0 to 40)]
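
The team-size model can be simulated in a few lines. This sketch assumes a lone designer produces 5,000 transistors/month, the value consistent with the 4,900 figure quoted for two designers, and that every added designer lowers each designer's output by 100 transistors/month. The function name is hypothetical.

```python
def months_to_complete(
    designers: int,
    transistors: int = 1_000_000,
    solo_rate: int = 5_000,   # transistors/month for one designer (assumed)
    penalty: int = 100,       # per-designer loss for each added designer
) -> float:
    """Project duration in months for a team of the given size."""
    per_designer = solo_rate - penalty * (designers - 1)
    if designers < 1 or per_designer <= 0:
        raise ValueError("team size outside the range the model supports")
    return transistors / (designers * per_designer)


for n in (1, 5, 10, 15, 20, 25, 30, 35, 40):
    print(f"{n:2d} designers: {months_to_complete(n):6.1f} months")

best = min(range(1, 51), key=months_to_complete)
print("shortest schedule at", best, "designers")
```

Completion time falls until the team reaches about 25 designers and then rises again, which is the "too many cooks" effect: the per-designer penalty eventually outweighs the added headcount.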

Summary
• Embedded systems are everywhere
• Key challenge: optimization of design metrics
  – Design metrics compete with one another
• A unified view of hardware and software is necessary to improve productivity
• Three key technologies
  – Processor: general-purpose, application-specific, single-purpose
  – IC: full-custom, semi-custom, PLD
  – Design: compilation/synthesis, libraries/IP, test/verification