VLSI DESIGN 1998 TUTORIAL
Part 1. Core Building Blocks and Building Systems using Cores
=> What are cores?
Building systems using cores
Challenges in using cores
Rajesh K. Gupta, University of California, Irvine.

Available “Core” Building Blocks
[Figure: example cores: 68030, ARM810, PPC401]
© 1998 R. Gupta

What Is A Core Cell?
• Working definition: at least 5K gates, pre-designed, pre-verified, “re-usable”
• Examples:
  – Processor: LSI Logic CW4001/4010/4100, ARM7TDMI, ARM810, NEC 85x, Motorola 680x0, IBM PPC
  – DSP cores: TI TMS320C54x, Pine, Oak
  – Encryption: PKuP, DES
  – Controllers: USB, PCI, UART
  – Multimedia: JPEG comp., MPEG decoder, DAC
  – Networking: ATM SAR, Ethernet

Core Types
• Soft cores (“code”)
  – HDL description
  – flexible, i.e., can be changed to suit an application
  – technology independent: may be resynthesized across processes
  – significant IP protection risks
• Firm cores (“code+structure”)
  – gate-level netlist to be placed and routed
  – technology sampled
• Hard cores (“physical”)
  – ready for “drop in”
  – include layout and timing (technology dependent)
  – IP is easily protected
  – mostly processors and memory
  – functional test vectors or ATPG vectors available.

Core Types and Their Use
[Figure: design flow from system specification to physical design, annotated with core types and models.
Representations: Behavioral HDL, RTL HDL (“Soft”, “Synthesizable RTL”), Gate Netlist (“Firm”), Mask Data (“Hard”).
Design steps: system design (scheduling, binding); control generation, FSM synthesis; logic design; physical design (floorplanning, placement, routing).
Models: ISA model, Bus Functional, RTL Functional, Gate Functional; timing models, power models, fault coverage.
Technology: ASIC or FPGA.]

Core Portability
• Determined by technology independence and data format.
  – Technology independence is based on the type of core
  – Both open and proprietary data formats are currently in use.
DEF = Design Exchange Format (Cadence)
SPEF = Standard Parasitic Extended Format (Cadence)
GDSII = Layout format (Cadence)
ITL = Interpolated Table Lookup cell-level timing model (Mentor)
LEF = Library Exchange Format (Cadence)
MMF = Motive Modeling Format (Viewlogic)
NLDM = Non-linear Delay Model (Synopsys)
TLF = Table Lookup Format (Cadence)
VCD = Value Change Dump (Cadence)
WGL = Waveform Generation Language (TSSI)

Timing Information in Firm and Hard Cores
• Timing behavior can be generated from SPICE inputs
• However, this is not always possible for big cores
  – static timing information is necessary
• Basic delay model
  – propagation delay model from inputs to outputs
  – slew model (as a function of load and input slew)
  – input/output capacitances
  – setup and hold constraints on inputs.
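A minimal sketch of how a table-lookup timing model can return a delay as a function of load and input slew. It uses plain bilinear interpolation; the function name, the table values and the units are hypothetical illustration data, not taken from any vendor format.

```python
from bisect import bisect_right

def interpolate_delay(slews, loads, table, slew, load):
    """Bilinear interpolation in a delay table indexed by (input slew, load)."""
    def cell(axis, x):
        # Lower grid index of the cell used for x. The index is clamped to the
        # table, so points outside the grid are extrapolated linearly.
        i = bisect_right(axis, x) - 1
        return max(0, min(i, len(axis) - 2))

    i, j = cell(slews, slew), cell(loads, load)
    # Fractional position inside the cell along each axis.
    u = (slew - slews[i]) / (slews[i + 1] - slews[i])
    v = (load - loads[j]) / (loads[j + 1] - loads[j])
    low_slew = table[i][j] * (1 - v) + table[i][j + 1] * v
    high_slew = table[i + 1][j] * (1 - v) + table[i + 1][j + 1] * v
    return low_slew * (1 - u) + high_slew * u

# Hypothetical characterization data: slew in ns, load in pF, delay in ns.
SLEWS = [0.1, 0.5, 1.0]
LOADS = [0.01, 0.05, 0.10]
DELAYS = [
    [0.20, 0.35, 0.55],
    [0.25, 0.40, 0.60],
    [0.32, 0.48, 0.70],
]

delay = interpolate_delay(SLEWS, LOADS, DELAYS, slew=0.3, load=0.03)
```

The same lookup shape applies to an output-slew table; only the table contents change.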

What are cores?
=> Building systems using cores
Challenges in using cores

Building Systems-On-A-Chip Using Cores
[Figure: hub architecture connecting processor cores, memory (cache/SRAM or even DRAM), glue logic, graphics, video, VRAM, motion and encryption/decryption blocks, and the PCI, SCSI, LAN, EISA and I/O interfaces.]
• Commodity software: encryption/decryption, device drivers, legacy code, operating/runtime system
• Commodity DSP hardware: compression, encryption, modem, signal processing, image processing
SOC is a service mark (SM) of LSI Logic Corporation.

S-O-C Application Classes
[Figure: application classes, including set-top boxes and set-top derivatives, games, high-end PDAs, VOD, HQ graphics, video conferencing, MPEG 1 and MPEG 2 encoding, audio & video encoding, and bridging.]
Time-constrained computing systems.

Systems-On-A-Chip (SOCs)
Two Types:
• Technology-Driven
  – Developed in-house, maximum leverage of technology “crown-jewels”
  – Close cooperation between module developers and system designers
  – or wide-ranging cross-licensing agreements between partners
• Component-Driven
  – Core cells as IP carriers
    » IP encapsulated into “usable” products
    » design “reuse” is critical to IP products

Component-Driven SOC
• Core supplier different from core user
  – “Third party IP providers”
• Significant technology packaging without importing it
  – The IP provider wants to sell a product and not the technology behind the product
• Enormous technical and legal challenges
  – can it be done successfully?
  – who guarantees that an SOC works as required?
  – who is liable in case the end product does not perform?

ASIC Cores Availability
One-stop shops:
• LSI Logic CoreWare
• IBM Microelectronics
• Motorola FlexWare
• Lucent
Other vendors:
• Digital Design & Dev: MIDI
• Hitachi: MPEG, PCI, SCSI, uC
• Palmchip: MPEG, UART, ECC
• Silicon Engg.: micro VGA
• LogicVision: BIST, JTAG
• ROHM: UART, SIO, PIO, FIFOc, Add, Mpy, ALU
• Synopsys: DesignWare, ISA, Intel uC
• Chip Express: FIFO, RAM, ROM
• VLSI Libraries: Memory, Mpy
• Focus Semi: PLL, VCXO
• VLSI Cores: Encryption, DES
• ASIC Intl: DES
• 3Soft: uC, DSP, LAN, SCSI, PI
• ARM: uC, uP
• Plessey: per. controllers, DSP
• Scenix: uC, PCI, DMA
• Western Digital Center: uC
• TI: DSP; NEC: DSP, uC
• Symbios: ARM7TC
• VAutomation: uP, controllers
• CAST: 2910A, IDT49C410, DMAc
• Butterfly DSP: DSP, FFT, DFT, ADSL, OFDM
• Int. Sil. Systems: ADPCM, FIR
• Analog Devices: DSP
• DSP Group: Pine, Oak
• Eureka: PCI; Virtual Chips: PCI, USB
• Logic Innovations: PCI, ATM
• OKI: PCI, PCMCIA, DMA, UART
• Sand: USB, PCI
• Sierra: ATM SAR, Ether, R3000
NOT EXHAUSTIVE.

FPGA/CPLD Cores Availability
• Capacity constrained cores
  – do not include wide/high performance PCI, ATM SAR, or microprocessors
• Altera
  – 8-bit 6502
  – DMAC 8237
• Xilinx
  – PCI
• Actel
  – System Programmable Gate Array (SPGA)
    » combines FPGA with customer ASIC
    » ASIC examples: PCI, Router, DMA controller.

Current Core Market Models
[Figure labels: Foundry Captive, Licensable]
Three ways:
• 1. A design house licenses design and tools
  – DSP Group (Pine and Oak cores), 3Soft, ARM (RISC)
  – offering includes HDL simulation model, tool and/or an emulator
  – customer does the design, fab.
• 2. Core vendor designs and fabs ICs
  – TI, Motorola, Lucent
  – VLSI, SSI, Cirrus, Adaptec
• 3. Core vendor sells cores, takes customer designs and fabs ICs
  – LSI Logic, TI, Lucent
Foundry captive cores do not have to reveal the internal design and layout of the core. The foundry provides a bounding box.

Core Trends: 1997 Survey of Designers
[Figure: months to completion]
• 74% hardware designers.
• 26% plan to purchase a core for their next design:
  – 40% hard, 68% soft, 32% firm
Source: Integrated System Design

Application Needs
[Figure: core categories needed by applications: memory, processors, interface, analog, generics, etc.]
Source: Integrated System Design

Using Cores: PCI
• Class of interface cores such as
  – USB, UART, SCSI, PCI, 1394 etc.
• Identify target technology
  – ASIC, FPGA
• PCI (Peripheral Component Interconnect)
  – processor independent CPU interface to peripherals
  – multi-master, peer-to-peer protocol
  – synchronous: 8-33 MHz (132 MB/s)
  – arbitration: central, access oriented, “hidden”
  – variable length bursting on reads and writes
  – (I/O, Mem) x (Read, Write) and IACK commands
[Figure: CPU on the host bus, connected through a PCI controller ASIC to the primary PCI bus, with IDE and a PCI/IDE/ISA bus.]
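The 132 MB/s figure follows directly from the clock rate and the bus width. A small sketch of the arithmetic, assuming one bus-wide transfer on every clock (the peak burst rate); the function name is illustrative only.

```python
def peak_bandwidth_mb_per_s(clock_mhz, bus_width_bits):
    """Peak transfer rate when one bus-wide word moves on every clock."""
    bytes_per_transfer = bus_width_bits // 8
    return clock_mhz * bytes_per_transfer

pci_32 = peak_bandwidth_mb_per_s(33, 32)   # 32-bit bus at 33 MHz -> 132
pci_64 = peak_bandwidth_mb_per_s(66, 64)   # 64-bit bus at 66 MHz -> 528
```

Sustained throughput is lower in practice, since arbitration, address phases and wait states consume cycles.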

PCI Cores
• VHDL/Verilog synthesizable cores with options:
  – PCI-Host, PCI-Satellite
  – 32-bit (33 MHz) or 64-bit (66 MHz)
  – FIFO or register data storage
  – Synchronous or asynchronous host interface
• Core components
  – Master/Target Read/Write FIFOs
  – Master/Target state machines
  – Configuration registers
• Timing requirements
  – input setup time = 7 ns; clock to output delay = 11 ns
• DC specs: input pin capacitance 10 pF, clock pin 12 pF, IDSEL 8 pF
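The two timing requirements bound how much of each clock cycle is left for the bus itself. A hedged sketch of that budget, assuming a nominal 30 ns cycle for the 33 MHz case (the cycle time is an assumption; the 7 ns and 11 ns values are the ones quoted above).

```python
def bus_timing_slack_ns(clock_period_ns, clk_to_out_ns, setup_ns):
    """Time left in one cycle for wire propagation and clock skew."""
    return clock_period_ns - clk_to_out_ns - setup_ns

# Assumed 30 ns cycle; clock-to-output and setup values from the slide.
slack = bus_timing_slack_ns(clock_period_ns=30.0, clk_to_out_ns=11.0, setup_ns=7.0)
```

A core that misses either number after synthesis and layout eats directly into this remaining margin, which is why the requirements are stated at the pins.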

User Experience
• Hughes Network Systems:
  – DirecPC ASIC in a satellite receiver card
  – 80K-gate device on Chip Express process
• DirecPC consists of
  – IDT R3041 RISC controller
  – Memory, Demodulator, Error-check, PCI core
• PCI core from Virtual Chips
  – 17K gates including asynchronous FIFOs
  – Guesstimate: 4K extra gates due to the core (5%)
• Comments:
  “Their test vectors assume you have direct access to the internal interface of the core. I looked through their test vectors and tried to do the same things using my back end.”
  “They were kind of giving us a reference documentation. It wasn’t turnkey.”
Source: EE Times

Using Cores: DSPs
• 16-bit fixed point processors are most commonly used.
• DSPs
  – simple: Clarkspur Design CD2450 (variable data width)
  – compatible: DSP Group, TI, SGS-T: 320C5x
  – clone:
• Options
  – memory, memory controller, interrupt controller, host port, serial port
• Criticals
  – power consumption, as most DSP applications go into portable products

Design using DSP Cores
• Core vendors often supply a development chip or core version of the COTS processor
  – board-level prototyping fairly common
  – followed by single-chip solution
• To avoid board-level prototyping, a full-functional simulation model is a must, particularly for foundry captive cores.
• Software tools provided
  – assembler, linker, instruction set simulator, debugger, (high-level language compiler?)

DSP Sample Points
• TI TEC320C52
  – 16-bit fixed-point TMS320C52
    » 1Kx16 data RAM, 4Kx16 program RAM
    » 2 serial ports, one 16-bit timer
  – and 0.8 micron 15,000-gate array
• Motorola 7-Day CSIC
  – 8-16 MHz HC08, DMA, MMU, ...
• SGS-Thomson ST18932, ST18950
  – 16-bit fixed-point DSPs, 0.5u, 3.3 volt CMOS, 80 MHz
  – has no off-the-shelf DSP IC
  – used in PC sound cards, 950 has a better assembly
Not exhaustive, only a representative sample.

Third Party DSP Cores
• DSP Group Pine
  – 16-bit fixed-point, 0.8u CMOS, 5.0/3.3 V, 40 MHz
  – 36-bit ALU, 16-bit MPY, 2Kx16 RAM/ROM (program memory is outside the core)
  – used in pagers and answering machines
• DSP Group Oak
  – same as Pine, plus a bit manipulation unit
  – Viterbi decoding support instructions (min, max)
  – used in digital cellular telephony
• Clarkspur CD2400, CD2450
  – 16-bit fixed-point
  – 24-bit ALU, MPY, Acc, 2 x 256x16 data RAM (the CD2450 makes it 48 bits)
  – used in fax-modems

One-Stop Shops: LSI Logic CoreWare
• Cores for building ASICs for most embedded applications:
  – laser printer, ATM, PDA, set-top, router, graphics accelerators, etc.
• CPU cores: miniRISC CW4K, Oak DSP
  – miniRISC compatible with MIPS R4000
  – 0.5u CMOS, 2 mW/MHz, 60 MHz, 3-stage pipeline
  – 32-bit address/data bus
  – full scan: 99% fault coverage, gate-level timing model
• Interface: PCI, Fibre Channel, SerialLink
• Networking: Ethernet, ATM (SAR), Viterbi, RS
• Compression etc.: MPEG, JPEG, DAC/ADC.

Core Examples
• Only a representative sample of cores. Not exhaustive or even comparative.
• Processor cores
  – LSI Logic CW4001, CW4010
  – ARM (7) processors
  – Motorola FlexCore
• Memory cores
  – 16M/18M Rambus DRAM
• Multimedia cores
  – CompCore CD2
• Networking
  – Media Access Controller (MAC)
• Encryption cores
  – VLSI Cores, ASIC International.

LSI Logic: CW4001 Core
• Behavioral Verilog/VHDL model
• Gate-level timing accurate model
• Specifications
  – 60 MHz, 60 MIPS (45 MIPS average), 3-stage pipeline
  – 0.5 micron CMOS process, 4 sq. mm., 2 mW/MHz
  – Full-scan with 99% fault coverage.
• Interfaces:
  – CBUS, Computational Bolt-On (CBO), Co-processor, MMU
• Customizability:
  – BIU, cache controller, MDU, MMU, DRAM/SRAM controllers, timers, caches (<16K), RAM/ROM, DMAc
  – Up to 3 co-processors (FPU, Graphics, Compression, Network Protocol), MPY/DIV unit, CRC, direct access to CPU GPRs
[Figure: CW4001 block diagram with register file, CP0, ALU, shifter, FlexLink and CBus.]
Courtesy: S. Dey, ICCAD’96, LSI Logic.

Using CW4001 coprocessor
[Figure: CW4001 system with co-processor interface, CU, CPU bus interface, FlexLink interface, MMU, DRAM controller, cache, BIU and cache controller, DMA controller, BBus, extended BIU (XC), RAM/ROM, write buffer, Mult/Div, timer and XBus.]
Courtesy: S. Dey, ICCAD’96, LSI Logic.
• Co-processor has its own instruction set, including
• read data bus for instruction, rd/wr to external memory
• read/write to CPU registers, stall and interrupt CPU
• CW delivers the [0:5] and [26:31] opcode fields to the co-processor instruction decoder
• Co-processor executes in lockstep with CPU pipeline stages.

CW4010 CPU Core
• Verilog/VHDL model with gate-level timing
• 80 MHz, 160 MIPS (110 MIPS average), 6-stage pipeline
• 0.5 micron CMOS, 9 sq. mm., 5 mW/MHz
• Integrated cache controllers with separate I and D caches
  – cache size from 2-16 KB
• 64-bit memory and cache interface
• Up to 3 co-processors
• Full-scan with 99% fault coverage.

Advanced RISC Machines (ARM)
• A family of 32-bit RISC processor cores
• ARM6, ARM7: MPU with cache, MMU, write buffer and JTAG
• ARM7TDMI: ARM7 with Thumb ISA, ICE, Debug & MPY
• ARM8: cached, low power, 5-stage pipe (vs 3 in others)
• StrongARM1, StrongARM2: available as Digital SA-110 (21285)
• Piccolo: DSP co-processor for ARM, shares system bus (AMBA)
  – support for Viterbi, bit manipulation operations
  – four nestable zero-overhead hardware loop constructs
  – splittable ALU, 1 cycle dual 16-bit operations
  – saturation arithmetic
  – 1024-point in-place complex radix-2 FFT in 33,331 cycles
• Manufacturing partnerships and/or licensing with
  – Cirrus Logic, GEC Plessey, Sharp, TI and VLSI Tech.

ARM Processor Cores
• Enhancements: ARM7D, ARM7DMI
  – M = 64-bit result hardware multiplier running at 8 bits/cycle
  – D = 2 boundary scan chains for basic debug
  – I = Embedded ICE debug
  – Thumb instruction set
Source: ARM Inc.

ARM Enhancements: Embedded ICE
• The EmbeddedICE core cell allows debugging of an ARM core embedded within an ASIC:
  – real time address and data-dependent breakpoints
  – full access and control of the CPU
  – can be reduced for size savings once the part goes into production.
[Figure: a debug host running ARMsd connects to the ASIC through the boundary scan pins; inside the ASIC, the EmbeddedICE cell sits alongside the ARM core. Software download at 40 KB/s.]
Source: ARM Inc.

ARM Enhancements: Thumb ISA
• 8- or 16-bit external, 32-bit internal
• Thumb instruction set is a subset of the 32-bit ARM instruction set
  – 16-bit instructions
  – expanded into 32-bit ARM instructions at run time without any penalty
• Up to 65-70% smaller code size compared to ARM
• 130% of ARM performance with 8/16-bit memory
• 85% of ARM performance with 32-bit memory
Example: ADD Rd, #constant
16-bit Thumb instr.: 001 | 10 | Rd | Constant (major opcode, minor opcode, dest. and src. register, constant)
32-bit ARM instr.: 1110 | 001 | 01001 | 0 Rd | 0 Rd | 0000 | Constant (condition “always” = 1110, constant zero extended)
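The ADD example can be written out as a bit-level expansion. This is a sketch of that one instruction only, following the field layout on the slide; placing the 3-bit Thumb register number in both the source and destination register fields of the ARM word is an assumption based on the standard ARM data-processing format, and the function name is hypothetical.

```python
def expand_thumb_add_imm(thumb):
    """Expand Thumb 'ADD Rd, #imm8' (bits 001 10 Rd imm8) into its 32-bit ARM form."""
    if (thumb >> 11) & 0x1F != 0b00110:
        raise ValueError("not a Thumb ADD Rd, #imm8 instruction")
    rd = (thumb >> 8) & 0x7        # 3-bit register number
    imm8 = thumb & 0xFF            # 8-bit constant, zero extended

    cond_always = 0b1110 << 28     # condition field: always execute
    immediate_op = 0b001 << 25     # data processing with immediate operand
    add_set_flags = 0b01001 << 20  # opcode ADD (0100) with the S bit set
    # Rd is used as both source (bits 19..16) and destination (bits 15..12);
    # the rotate field (bits 11..8) stays 0000.
    return cond_always | immediate_op | add_set_flags | (rd << 16) | (rd << 12) | imm8

arm_word = expand_thumb_add_imm(0x3105)   # Thumb: ADD r1, #5
```

Because the mapping is a fixed rewiring of fields, it can be done by combinational logic in the decode stage, which is consistent with the "without any penalty" claim.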

ARM Applications
• Widely used in a variety of applications
  – low cost 16-bit applications
    » mobile phones, modems, fax machines, pagers
    » hard disk and CD drive controllers
    » engine management
  – low cost 32-bit applications
    » smart cards
    » ATM and ethernet network interfaces
    » low power, on-chip application code
  – high performance 32-bit applications
    » digital cameras
    » set top boxes, network switches, laser printers
    » external memory system (RAM, ROMs)
Courtesy: S. Dey, ICCAD’96

Motorola FlexCore
• CPU cores based on 680x0 family
  – EC000, EC020, EC030
  – all with static operation, 5/3.3 volt supplies
  – performance:
    » EC000: 2.7 MIPS @ 16.67 MHz, 33 mW
    » EC020: 7.4 MIPS @ 25 MHz, 150 mW
    » EC030: 11.8 MIPS @ 33 MHz, 258 mW
• Serial I/O cores: 68681 UART, MBus, SPI
• RT clock, dual timer cores
• SCSI, parallel I/O, 8051 interfaces
• DRAM, interrupt, JTAG controllers
• PLA, PLL, oscillators, power management cells.

Memory Core Example
• Virtual Chips 16M/18M bit Rambus DRAM
• Verilog/VHDL simulation model
• Organization
  – two banks, 512 pages per bank, 72x256 per page
  – dual internal banks, 2K byte cache per bank
• Programmable ack, write, read delays through control registers
• Synchronous protocol for fast block-oriented transfers.
• Modes of operation
  – reset, stand-by, power-down, active
• Deliverables: VHDL and Verilog source, test bench, test vectors, documentation.
• Others: Sand DRAM, VRAM Verilog models.

Multimedia Cores
• JPEG compression, MPEG decoding, Video DAC, etc.
• IBM Microelectronics, LSI Logic, PalmChip, Silicon Engineering, Mentor Graphics, CompCore, Intrinsix VGA
• Example: MPEG-2 decoder from CompCore
  – 70K-80K gates
  – 18K bits of internal SRAM
  – 16 Mbit SDRAM (external)
    » bitstream buffering, frames
  – 54 MHz, 16-bit external memory bus
[Figure: CompCore CD2 decoder: MPEG input, microcontroller interface, synchronization, audio decoder with SRAM, video decoder with SRAM, virtual memory controller, physical memory controller, external 1Mx16 SDRAM, audio stream and video stream.]
Source: CompCore

Other Core Categories
Networking
• Protocol choices:
  – switched Ether, sTR, ATM155, ATM25
• Example: SYM1000 from Symbios
  – HDL code, 3.3 V, 0.5u
  – CSMA/CD ethernet
  – programmable interpacket gap
  – optional CRC insertion and check
  – MII interface to physical layer device
  – host bus interface
• LSI Logic: ATMizer
Encryption
• VLSI Cores
  – PKuP encryption core
    » implements modular exponentiation
    » synthesizable HDL core
  – DES core as a synthesizable Verilog model
    » two models: 8 bytes/8 cycles, 8 bytes/16 cycles
• ASIC International
  – DES cores
  – Exponentiator Engine
  – Hash function cores

What are cores?
Building systems using cores
=> Challenges in using cores

Challenges in Using Cores
• A core cell is not a single product
  – a PCI cell consists of 25 separate Verilog files
    » plus as many synthesis scripts
  – immature interface abstraction
    » e.g., there is no direct access to the core from the end product. Access must be created.
• A core is not an end product
  – a core cell is design + know-how to use it for a particular process, tools and even application
• Testability and testing are a challenge
  – as opposed to design, testing is not a hierarchical problem
    » using 90% testable cores does not give 90% system testability
    » tests are core-specific, not applicable from primary I/O
What is an efficient design methodology using cores?

SOC Design Problem Components
1. Design environment, co-simulation, constraint analysis.
2. HDL modeling, architectural synthesis, logic synthesis, physical synthesis.
3. Software synthesis, optimization, retargetable code generation, debugging & programming environment.
4. Test issues, test access, isolation, ATPG.
[Figure: example SOC with processor, memory, ASIC, DMA, analog I/O and interface blocks.]
Processor cores introduce the software part of system design.

Co-Design Components
• Specification, Modeling and Analysis
  – How to capture designer intent efficiently in a design language?
    » HDL optimizations
    » Constraint modeling and analysis
• System Validation
  – How to use the description in building a (computational) prototype capable of running actual applications?
    » Co-simulation, formal verification
• System Design and Synthesis
  – Delayed partitioning of hardware and software
  – Software synthesis and optimizations
  – Interface design and optimizations.

System Specification: Goals & Characteristics
• Main purpose: provide a clear and unambiguous description of the system function, and
  – documentation of the initial design process
• Support
  – diverse models of computation
  – the application of computer-aided design tools for
    » design space exploration
    » partitioning
    » software-hardware synthesis
    » validation (verification, simulation)
    » testing
• Should not constrain the implementation options.
  – diverse implementation technologies.

Embedded System Modeling
• Reactive and time-constrained interactions
• Consist of structural and behavioral components.
• Hierarchically organized components.
• Synchronous and asynchronous communications.
• Locally or globally clocked.
• Idealized as Synchronous Reactive Systems.

Synchronous Reactive Modeling
• Zero computation time
• System outputs produced in synchrony with inputs
• Instantaneous broadcast communications
• Deterministic behavior:
  – a given sequence of inputs always produces the same output sequence.
• Example languages using this model
  – ESTEREL, LUSTRE.
  – More later.
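A minimal sketch of the synchronous reactive idea: time advances in discrete instants, each reaction consumes the set of signals present in that instant, and outputs appear in the same instant. The small controller below (emit O once both A and B have been seen, with R as a reset) is a hypothetical example written in Python, not Esterel, and does not claim to reproduce Esterel's exact semantics.

```python
def step(state, present):
    """One instant: (state, set of present signals) -> (next state, emitted signals)."""
    seen_a, seen_b, done = state
    if "R" in present:                        # reset preempts everything else
        return (False, False, False), set()
    seen_a = seen_a or "A" in present
    seen_b = seen_b or "B" in present
    if seen_a and seen_b and not done:
        return (seen_a, seen_b, True), {"O"}  # output in the same instant as its cause
    return (seen_a, seen_b, done), set()

def run(inputs):
    """Feed a sequence of instants through the machine and collect the outputs."""
    state, outputs = (False, False, False), []
    for present in inputs:
        state, emitted = step(state, present)
        outputs.append(emitted)
    return outputs

trace = [{"A"}, set(), {"B"}, {"A", "B"}, {"R"}, {"A", "B"}]
outputs = run(trace)
```

Because `step` is a pure function of state and inputs, replaying the same trace always gives the same outputs, which is the determinism property listed above.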

Example: Esterel
• Reactivity and atomicity of reactions
  – “watching” implements a generalized watchdog
  – Time as discrete “instants”
  – Easily translated into a transducer (FSM generation)
  – Perfect synchrony hypothesis
• Instantaneous broadcast
  – Implicit communication architecture.
  – Uses signals, which are present or absent and may carry a value.
  – Pure signals do not carry a value.

Constraint and Interface Modeling
• Source of timing constraints
  – Time-constrained interactions between system components and environment
  – Specified using statement tags on HDL descriptions.
• Types of constraints
  – Delay and interval constraints (latency-type)
  – Rate constraints (throughput-type)
• Constraint satisfiability
  – Are constraints satisfied for a given implementation?
  – Given an implementation, resynthesize to satisfy a given set of constraints.

Example
[Figure: vehicle cruise controller. Interface signals include RotClk, SecClk (clock, 1/sec), SecPulse, brake, gear, valve and CurFuel. Blocks: runtime system, calibration routine, get info routine (InstVel, AveVel), maintenance routine, state. Display info: speed, ave_speed, consumption. Annotated constraints: data rates of 1000/sec and 1/sec, operation delay <= 1 ms.]
Derived from events at system interfaces.

Interface Modeling using Constraints
• Interface described using events.
• Events are instances of actions.
• Most common interface action is a signal transition on a wire.
• Temporal relationship between events:
  – Propagation delays
  – Bounds on event separation intervals: min, max, linear
  – Absolute versus relative rate constraints.

Binary Delay Constraints
[Figure: three timing diagrams over events i, j and k, illustrating the MAX, MIN and LINEAR constraint types, with the min and max separations marked.]

Interface Delay Timing Constraints
• Three types: (McMillan & Dill)
  – Given events $i$ and $j$ with time stamps $t_i$ and $t_j$ respectively, and $d_{ij}$ as the delay from event $i$ to event $j$, such that $l_{ij} \le d_{ij} \le u_{ij}$:
    » min constraints: $t_j = \min_{i<j} (t_i + d_{ij})$
    » max constraints: $t_j = \max_{i<j} (t_i + d_{ij})$
    » linear constraints: $t_j - t_i \le s_{ij}$, where $s_{ij}$ is the maximum achievable separation between $i$ and $j$.
• Constraint graph:
  – nodes <=> events; edges <=> constraints.
• Synthesis: find the maximum achievable separation between pairs of events (minimum separation depends upon operation delays).
• Rate constraint analysis and “debugging.”
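A small sketch of evaluating min and max constraints on a constraint graph. To keep it short, each delay is a single fixed value rather than a bound pair, events are numbered so that every edge goes from a lower to a higher index, and events without predecessors occur at time 0. These simplifications, the function name and the example graph are all assumptions for illustration.

```python
def event_times(num_events, preds, kinds):
    """Compute t_j for each event from its predecessors.

    preds: {j: [(i, d_ij), ...]} with i < j
    kinds: {j: "max" or "min"}, selecting how arrivals at j are combined
    """
    t = [0] * num_events
    for j in range(num_events):
        arrivals = [t[i] + d for i, d in preds.get(j, [])]
        if arrivals:
            t[j] = max(arrivals) if kinds.get(j, "max") == "max" else min(arrivals)
    return t

# Hypothetical graph: event 2 waits for both of its causes (max),
# event 3 fires on whichever cause arrives first (min).
preds = {1: [(0, 2)], 2: [(0, 5), (1, 1)], 3: [(1, 4), (2, 2)]}
kinds = {2: "max", 3: "min"}
times = event_times(4, preds, kinds)

# A linear constraint t_3 - t_1 <= s_13 can then be checked directly.
separation_1_3 = times[3] - times[1]
```

With delay bounds instead of fixed values, the separation becomes a range, and finding its maximum is the analysis problem the slide refers to.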

Hardware Modeling As A Programming Activity
• Programming languages are often used for constructing system models
• Core based designs assume that all new designs originate as an “HDL” model
• Hardware
  – concurrency in operations
  – I/O ports and interconnection of blocks
  – exact event timing is important: open computation
• Software
  – typically sequential execution
  – structural information is less important
  – exact event timing is not important: closed computation.

HDL Semantic Necessities
• Abstraction
  – provide a mechanism for building larger systems by composing smaller ones
• Reactive programming
  – provide mechanisms to model non-terminating interaction with other components
  – watching (signal) and waiting (condition)
    » must be separate (else one is an implementation of the other)
  – exception handling
• Determinism
  – provide a “predictable” simulation behavior
• Simultaneity
  – model hardware parallelism, multiple clocks

HDL Pragmatics
• Data types
  – simple (bit/Boolean): HardwareC, Verilog
  – complex (records): VHDL
• Interface abstraction
  – provide an external view independent of implementation
    » Classes (packages) in C++, VHDL
    » Entity interfaces or Tasks: VHDL, ADA

Pragmatics (contd.)
• Communication
  – shared variables using explicit communication architectures
  – synchronous handshaking using implicit communications (ADA task entry call)
  – instantaneous broadcast (Esterel)
  – asynchronous message passing using explicit communication architectures
• Time
  – global, multiple clocks, logics.

Going from HLL to HDL
[Figure: a (restricted) HLL description is turned into an HDL description along two paths.
CONTROL: add reactivity, clock(s), waiting & watching.
DATA: refine data types (bit true, fixed point, saturation arithmetic).]
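The data-refinement step can be illustrated with a bit-true fixed-point model. This is a minimal sketch assuming 16-bit two's-complement words and a Q15 fraction format; the helper names are hypothetical and not taken from any particular tool.

```python
def saturate(value, bits=16):
    """Clamp an integer to the range of a signed two's-complement word."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))

def to_fixed(x, frac_bits=15, bits=16):
    """Quantize a real number to fixed point, saturating on overflow."""
    return saturate(round(x * (1 << frac_bits)), bits)

def sat_add(a, b, bits=16):
    """Add two fixed-point words; the result saturates instead of wrapping."""
    return saturate(a + b, bits)

half = to_fixed(0.5)                 # 16384 in Q15
almost_one = to_fixed(1.0)           # 1.0 is not representable in Q15, clamps to 32767
overflowed = sat_add(30000, 10000)   # clamps to 32767 rather than wrapping negative
```

Running the algorithm with these helpers in place of floating-point arithmetic shows the quantization and overflow behavior the hardware will have, before any HDL is written.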

HLL Restrictions
• Classes for synthesis target do not use
  – unions, floating, pointers (only interface with lib)
  – type casts
  – virtual functions (restricted to only library classes)
  – policy of use on shared variables
• Suggestions:
  – explicit initialization blocks
  – use “defines” instead of conditional process enables for statically determined conditions

Adding Reactivity
• Reactivity can be added in one of three ways:
1. use annotations, comments
  » commonly used in “home-grown” C-based HDLs
  » sometimes use “semantic overloads”, that is, association of an alternative interpretation.
2. use library assists
  » additional library elements that can be used by the programmer in modeling hardware.
  » example: additional classes in C++
3. use additional language constructs
  » new constructs require a specific language front end and new debugging tools.
  » example: divide operations across cycles using next()

Adding Data Types
• Identify signals
  – storage elements, structured memory blocks
• Type variables: signed, unsigned, std_logic
• Size state variables on instantiation

Language Comparisons
• Verilog, VHDL: compiler produces inputs to run a discrete-event (DES) simulator.
• Esterel: compiler produces a single deterministic FSM.
• Scenic: compiler produces (synthesizable) processes and a simulator.

From HDL to Circuit/System: Compilation & Synthesis
• Compilation spans programming language theory, architecture and algorithms
• Synthesis spans concurrency, finite automata, switching theory and algorithms
• In practice, the two tasks are inter-related.
• Compilation and synthesis tasks are done in three steps:
  – front-end, intermediate optimizations, back-end.

Compilation
• Program compilation for software target
  – Front-end parsing into intermediate form
  – Optimization over the intermediate form
  – Back-end code-generation for a given processor
• HDL compilation for hardware target
  – Front-end parsing into intermediate form
  – Optimization over the intermediate form
  – Back-end architecture, logic and physical synthesis.

Synthesis and Optimization
• Substantial growth in last twenty years
• Industry-standard tools in
  – Logic synthesis
  – Physical synthesis
• Behavioral synthesis just becoming commercial.
• Substantial room for growth when considered together with software compilation.

Behavioral to RTL
• Basic transformations needed
  – 1. Operation scheduling
  – 2. Resource binding
  – 3. Control generation: central or distributed.
• Evolutionary growth to synthesis tools
  – Designer expertise today lies in the RTL coding
  – Synthesis tools are strongly dependent upon design methodology.
• Generate a structure suitable for synchronous and single-phase circuits
  – resource performance in terms of execution delay
  – in number of clock cycles
• Design space:
  – area, cycle time, latency, throughput

Synthesis Tasks
• Operation scheduling, resource binding, control generation
• Scheduling determines operation start times
  – minimize latency
• Resource binding: resource selection, allocation
  – minimize area (maximize sharing)
• Control synthesis:
  – data-path = “connectivity synthesis”
    » detailed resource connections
    » steering logic
    » connection to the interface
  – control synthesis
    » synthesize controller that provides operations/resource enables, operation synchronization, resource arbitration
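As an illustration of operation scheduling, here is a sketch of ASAP (as soon as possible) scheduling, which assumes unlimited resources and starts every operation as soon as its predecessors finish. ASAP is chosen here only because it is the simplest scheduler; the operation names and delays are a hypothetical example computing z = (a*b + c*d) - e.

```python
def asap_schedule(delays, deps):
    """Return {op: start cycle} for an acyclic dependency graph.

    delays: {op: execution delay in clock cycles}
    deps:   {op: [operations that must finish first]}
    """
    start = {}

    def visit(op):
        if op not in start:
            # An operation starts when its latest predecessor has finished.
            start[op] = max((visit(p) + delays[p] for p in deps.get(op, [])), default=0)
        return start[op]

    for op in delays:
        visit(op)
    return start

delays = {"mul1": 2, "mul2": 2, "add": 1, "sub": 1}
deps = {"add": ["mul1", "mul2"], "sub": ["add"]}

schedule = asap_schedule(delays, deps)
latency = max(schedule[op] + delays[op] for op in delays)
```

This schedule needs two multipliers in cycle 0. With only one multiplier available, the scheduler would have to serialize mul1 and mul2, trading latency for area, which is the interaction between scheduling and resource binding noted above.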

A CAD Methodology for SW
• Automated software synthesis from specs.
  – Synthesis tools generate implementation
  – Global optimization of the program.
• Optimization used to achieve design goals.
• Analysis and verification tools for feedback.
• Compilation for embeddable software
• Software optimizations
  – Code compression
  – Optimization for power
  – Instruction-set generation
  – Static memory allocation

Compression
• Block-based compression
  – Program compressed in small blocks to preserve random-access properties (e.g., cache line blocks)
• Transparent code compression
  – ISA unchanged.
  – Compression uses compiler output.
  – Decompression performed by cache refill engine.
  – Processor sees only uncompressed code.
  – Techniques: Huffman coding.
• Key issue: code location in memory after compression?
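A sketch of block-based Huffman compression with an address table that answers the "where is the code after compression" question. Bit strings stand in for packed memory, one code table is shared by the whole program, and blocks are located by bit offset; all of these are simplifying assumptions, and the function names are hypothetical.

```python
import heapq
from collections import Counter

def huffman_code(data):
    """Build a prefix code {byte value: bit string} from byte frequencies."""
    heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(Counter(data).items())]
    heapq.heapify(heap)
    if len(heap) == 1:                       # degenerate case: a single symbol
        return {sym: "0" for sym in heap[0][2]}
    tie = len(heap)                          # unique tie-breaker for equal counts
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c1.items()}
        merged.update({s: "1" + b for s, b in c2.items()})
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]

def compress_blocks(program, block_size):
    """Compress each block separately and record where each one starts."""
    code = huffman_code(program)
    stream, table = "", []
    for start in range(0, len(program), block_size):
        table.append(len(stream))            # bit offset of this block
        stream += "".join(code[b] for b in program[start:start + block_size])
    return code, stream, table

def decompress_block(code, stream, table, index, block_len):
    """Decode one block, as a cache refill engine would on a miss."""
    decode = {bits: sym for sym, bits in code.items()}
    out, current, pos = bytearray(), "", table[index]
    while len(out) < block_len:
        current += stream[pos]
        pos += 1
        if current in decode:
            out.append(decode[current])
            current = ""
    return bytes(out)

program = bytes([0x12, 0x34, 0x12, 0x12, 0x56, 0x12, 0x34, 0x12] * 4)
code, stream, table = compress_blocks(program, block_size=8)
block2 = decompress_block(code, stream, table, index=2, block_len=8)
```

The table maps an uncompressed block number to its compressed location, so a cache miss on any line can be served without decoding the blocks before it.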

Compilation: What is New?
• Machine description
  – in terms of architecture -> programming
  – in terms of organization -> hardware
• Retargetable code generation has traditionally addressed the problem of compilation for an architecture.
• SOCs also need input about machine organization in order to perform timing analysis on generated code
  – Two approaches:
    » describe detailed machine
    » extract ISA from machine organization

Co-Design Framework
[Figure: application development produces algorithm(s) as C code; a machine definition feeds both hardware design & synthesis (EDA) and a compiler generator; the generated compiler (code generator) translates the C code into assembly.]

Test Strategy for Firm/Hard Cores
• System-level test strategy
  – build test sets for cores
    » generate functional vectors
    » fault grade for interconnects
  – prepare cores for test application from primary inputs through access/isolation, Scan/DFT
  – if BIST, schedule BIST application and signature analysis.
• System-level DFT
  – goal is to reduce testing cost
  – increase accessibility of the internal nodes
    » controllability: ability to establish a specific signal value at each node from primary inputs (PIs)
    » observability: ability to determine a signal value by controlling PIs and observing primary outputs
    » tradeoffs: area, I/O pins, performance, yield, TTM

DFT Techniques
• Commonly used approach is to modify a sequential circuit into a combinational one during test.
  – Automatic test generation is much easier for combinational circuits
• Current monitoring techniques.
• For sequential circuits, scan techniques are often used
  – link memory elements into a shift register
  – serially load and read out
  – boundary scan is commonly used to test board-level devices
• Built-In Self Test
  – minimal external support, high fault coverage, easy access requirements, protect IP
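A behavioral sketch of the scan idea: the memory elements form a shift register in test mode, a pattern is shifted in, one functional clock captures the response of the combinational logic, and the response is shifted out. The class, the three-flop chain and the example logic are hypothetical.

```python
class ScanChain:
    def __init__(self, length):
        self.flops = [0] * length            # flops[0] is next to scan-in

    def shift(self, bits_in):
        """Test mode: shift bits in at scan-in, return the bits leaving scan-out."""
        out = []
        for b in bits_in:
            out.append(self.flops[-1])       # last flop drives scan-out
            self.flops = [b] + self.flops[:-1]
        return out

    def capture(self, logic):
        """Functional mode for one clock: flops load the logic's response."""
        self.flops = logic(self.flops)

def logic_under_test(f):
    # Hypothetical combinational next-state logic between the flops.
    return [f[1], f[0] ^ f[2], f[0] & f[1]]

chain = ScanChain(3)
chain.shift([1, 1, 0])                       # load the test pattern serially
loaded = list(chain.flops)
chain.capture(logic_under_test)              # apply one functional clock
response = chain.shift([0, 0, 0])            # unload the response serially
```

During shifting the flops are fully controllable and observable, so test generation only has to handle the combinational logic between them. The cost is test time proportional to the chain length.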

Test Access for Cores
• Peripheral access techniques
  – parallel access, serial access or functional access
• Parallel access
  – add MUXs to connect core I/Os, high routing overhead, pin limitations may prevent parallel access
• Serial access
  – most common is the ring approach: during test, core I/Os are connected via a scan chain; low overhead, delay penalty, easy to test user-defined logic, long test application time
• Functional access
  – sensitize path through cores, low hardware cost, parallel test pattern translation possible.
• Also need isolation mechanisms for cores.

Summary of Part I
• Core cells present a new market opportunity
  – core cells are breathing life into many “old” designs (6502)
  – a new class of “third-party vendors” who bridge the gap between design houses and EDA vendors.
• Productization of cores faces many challenges
  – portability of cores versus design reuse
  – socketing standards (portability and reuse)
  – IP protection: encryption, product versus technology
  – design and test methodologies
• Research outlook is aligned with industry expectations
  – all new designs start with HDL description
  – immediate focus on validation, testability issues
  – long term focus on software optimization, complexity management.