Basic FPGA Architecture FPGA Design Flow Workshop 2003

Basic FPGA Architecture FPGA Design Flow Workshop © 2003 Xilinx, Inc. All Rights Reserved For Academic Use Only

Objectives
After completing this module, you will be able to:
• Recognize the basic architectural resources of the Virtex®-II FPGA
• List the differences between Virtex-II, Virtex-II Pro™, and Spartan®-3

Outline
• Overview
• Slice Resources
• I/O Resources
• Other Virtex-II Features
• Spartan-3 versus Virtex-II
• Virtex-II Pro Features
• Summary
• Appendix

Overview
• All Xilinx FPGAs contain the same basic resources
  – Slices contain combinatorial logic and register resources
  – IOBs interface between the FPGA and the outside world
  – Programmable interconnect
  – Other resources
    • Global clock buffers
    • Boundary scan logic


Slices and CLBs
• Each Virtex-II CLB contains four slices
  – Local routing provides feedback between slices in the same CLB, and it provides routing to neighboring CLBs
  – A switch matrix provides access to general routing resources
[Figure: a CLB with four slices (S0 to S3), two BUFTs, the switch matrix, local routing, the SHIFT chain, and the CIN/COUT carry connections]

Simplified Slice Structure
• Each slice has four outputs
  – Two registered outputs, two non-registered outputs
  – Two BUFTs associated with each CLB, accessible by all 16 CLB outputs
• Carry logic runs vertically, up only
  – Two independent carry chains per CLB
[Figure: Slice 0, showing two LUTs, each followed by carry logic and a register with D, CE, PRE, CLR, and Q pins]

Detailed Slice Structure
• The next slides will discuss the slice features
  – LUTs
  – MUXF5, MUXF6, MUXF7, MUXF8 (only the F5 and F6 MUX are shown in the diagram)
  – Carry Logic
  – MULT_ANDs
  – Sequential Elements

Look-Up Tables
• Combinatorial logic is stored in Look-Up Tables (LUTs)
  – Also called Function Generators (FGs)
  – Capacity is limited by number of inputs, not complexity
• Delay through the LUT is constant
[Figure: a block of combinatorial logic with inputs A, B, C, D and output Z, next to the truth table that lists Z for each combination of A, B, C, and D]
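The point that a LUT's capacity depends on its input count rather than on the complexity of the function can be sketched in Python. This is an illustrative model only, not Xilinx code: the helper names `lut4` and `make_init`, and the bit ordering of the 16-bit table (A as the least significant address bit), are assumptions made for this example.

```python
def lut4(init: int, a: int, b: int, c: int, d: int) -> int:
    """Model a 4-input LUT as a 16-entry truth table.

    Assumed encoding (for illustration): bit i of `init` holds the output
    for the input index i = a + 2*b + 4*c + 8*d.
    """
    index = a | (b << 1) | (c << 2) | (d << 3)
    return (init >> index) & 1


def make_init(func) -> int:
    """Build the 16-bit table for any Boolean function of four inputs."""
    init = 0
    for index in range(16):
        a = index & 1
        b = (index >> 1) & 1
        c = (index >> 2) & 1
        d = (index >> 3) & 1
        init |= (func(a, b, c, d) & 1) << index
    return init


# A simple function and a less regular one both fit in a single table,
# and both are evaluated by the same single lookup.
AND4 = make_init(lambda a, b, c, d: a & b & c & d)
ODD_PARITY = make_init(lambda a, b, c, d: a ^ b ^ c ^ d)
```

Because every function is one table lookup in this model, the cost and the delay do not change with the function chosen, which mirrors the "delay through the LUT is constant" bullet.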

Connecting Look-Up Tables
• MUXF5 combines LUTs in each slice
• MUXF6 combines slices S0 and S1
• MUXF6 combines slices S2 and S3
• MUXF7 combines the two MUXF6 outputs
• MUXF8 combines the two MUXF7 outputs (from the CLB above or below)
[Figure: a CLB with slices S0 to S3, showing the F5 multiplexer in each slice and the F6, F7, and F8 multiplexers that combine them]
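How a dedicated multiplexer widens a function can be shown with a small Python sketch. It models two 4-input LUTs that share inputs A to D, with a MUXF5-style 2:1 mux selecting between them on a fifth input. The names `lut4`, `mux2`, `func5`, and `split_table`, and the table bit ordering, are assumptions for illustration rather than Xilinx primitives.

```python
def lut4(init, a, b, c, d):
    """4-input LUT; bit (a + 2b + 4c + 8d) of `init` is the output (assumed order)."""
    return (init >> (a | (b << 1) | (c << 2) | (d << 3))) & 1


def mux2(sel, in0, in1):
    """2:1 multiplexer, standing in for MUXF5."""
    return in1 if sel else in0


def split_table(table32):
    """Split a 32-entry truth table into the tables for E = 0 and E = 1."""
    return table32 & 0xFFFF, (table32 >> 16) & 0xFFFF


def func5(table32, a, b, c, d, e):
    """Any 5-input function: two LUT4s plus one mux selected by E."""
    init_lo, init_hi = split_table(table32)
    return mux2(e, lut4(init_lo, a, b, c, d), lut4(init_hi, a, b, c, d))


# Example: a 5-input AND is true only for index 31 of the 32-entry table.
AND5_TABLE = 1 << 31
```

Repeating the same step with MUXF6, MUXF7, and MUXF8 adds one more input per level while doubling the number of LUTs involved.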

Fast Carry Logic
• Simple, fast, and complete arithmetic logic
  – Dedicated XOR gate for single-level sum completion
  – Uses dedicated routing resources
  – All synthesis tools can infer carry logic
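As a rough illustration of how a LUT, a carry multiplexer, and the dedicated XOR gate cooperate in an adder, the following Python sketch walks one carry chain bit by bit. The arrangement shown (LUT computes A XOR B, the mux passes the carry when that result is 1) is a commonly described structure and is assumed here; the function name `carry_chain_add` is hypothetical.

```python
def carry_chain_add(a, b, width, cin=0):
    """Add two `width`-bit values the way a LUT-plus-carry-chain adder might.

    Per bit (assumed structure, for illustration):
      LUT output    p     = a_i XOR b_i
      dedicated XOR sum_i = p XOR carry_in
      carry mux     carry_out = carry_in if p else a_i
    Returns (sum, carry_out).
    """
    carry = cin
    total = 0
    for i in range(width):
        a_i = (a >> i) & 1
        b_i = (b >> i) & 1
        p = a_i ^ b_i                 # computed in the LUT
        total |= (p ^ carry) << i     # dedicated XOR completes the sum bit
        carry = carry if p else a_i   # carry mux: propagate or generate
    return total, carry
```

In this model each bit needs only one LUT, because the sum and carry are completed by the dedicated gates rather than by additional LUTs.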

MULT_AND Gate
• Highly efficient multiply and add implementation
  – Earlier FPGA architectures require two LUTs per bit to perform the multiplication and addition
  – The MULT_AND gate enables an area reduction by performing the multiply and the add in one LUT per bit
[Figure: one bit of a multiply-and-add, showing the LUT, the MULT_AND gate (A x B), CY_MUX (S, DI, CI, CO), and CY_XOR]
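The one-LUT-per-bit claim can be illustrated with a Python sketch of a single multiplier row that adds the partial product A x b to a running value Y. The structure is an assumption based on the figure labels: the LUT produces (a_i AND b) XOR y_i, while the MULT_AND gate supplies a_i AND b separately to the carry mux data input. The function name `mult_and_add_row` is hypothetical.

```python
def mult_and_add_row(a, b_bit, y, width):
    """Compute y + (a * b_bit) for a single multiplier bit `b_bit` (0 or 1).

    Per bit (assumed structure, for illustration):
      MULT_AND  partial = a_i AND b_bit        -> DI of the carry mux
      LUT       lut_out = partial XOR y_i      -> select of the carry mux
      CY_XOR    sum_i   = lut_out XOR carry_in
      CY_MUX    carry_out = carry_in if lut_out else partial
    """
    carry = 0
    total = 0
    for i in range(width):
        a_i = (a >> i) & 1
        y_i = (y >> i) & 1
        partial = a_i & b_bit                   # MULT_AND gate
        lut_out = partial ^ y_i                 # single LUT per bit
        total |= (lut_out ^ carry) << i         # CY_XOR
        carry = carry if lut_out else partial   # CY_MUX
    return total | (carry << width)
```

Without the separate AND gate in this model, the partial-product bit needed by the carry mux would have to come from a second LUT, which is the two-LUTs-per-bit cost the slide attributes to earlier architectures.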

Flexible Sequential Elements
• Can be flip-flops or latches
• Two in each slice; eight in each CLB
• Inputs can come from LUTs or from an independent CLB input
• Separate set and reset controls
  – Can be synchronous or asynchronous
• All controls are shared within a slice
  – Control signals can be inverted locally within a slice
[Figure: symbols for FDRSE_1 (D, S, CE, R, Q), FDCPE (D, PRE, CE, CLR, Q), and LDCPE (D, PRE, CE, G, CLR, Q)]

Shift Register LUT (SRL16CE)
• Dynamically addressable serial shift registers
  – Maximum delay of 16 clock cycles per LUT (128 per CLB)
  – Cascadable to other LUTs or CLBs for longer shift registers
    • Dedicated connection from Q15 to D input of the next SRL16CE
  – Shift register length can be changed asynchronously by toggling address A
[Figure: a LUT drawn as a chain of flip-flops with D, CE, and CLK inputs, address A[3:0] selecting output Q, and Q15 as the cascade output]
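A behavioral sketch in Python may help make the addressing concrete. It is a simplified model, not the Xilinx primitive: the class layout and method names are assumptions, and only the behavior named on the slide (clock enable, address-selected tap, Q15 cascade output) is modeled.

```python
class SRL16CE:
    """Simplified model of a 16-stage shift register with an addressable tap."""

    def __init__(self, init=0):
        # bits[0] is the most recently shifted-in value (assumed ordering).
        self.bits = [(init >> i) & 1 for i in range(16)]

    def clock(self, d, ce=1):
        """One rising clock edge: shift only when the clock enable is active."""
        if ce:
            self.bits = [d & 1] + self.bits[:15]

    def q(self, addr):
        """Tap selected by A[3:0]; the read is combinational, so changing
        the address changes the effective length without a clock edge."""
        return self.bits[addr & 0xF]

    @property
    def q15(self):
        """Last stage, used to cascade into the D input of the next stage."""
        return self.bits[15]


# Usage: after shifting in a stream, address n returns the value from
# n + 1 clock edges ago.
srl = SRL16CE()
stream = [1, 0, 1, 1, 0, 0, 1, 0]
for bit in stream:
    srl.clock(bit)
```

In this model an address of 0 gives a one-cycle delay and an address of 15 gives the full 16 cycles.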

Shift Register LUT Example
• The SRL can be used to create No Operation (NOP) delays
  – This example uses 64 LUTs (8 CLBs) to replace 576 flip-flops (72 CLBs) and associated routing and delays
[Figure: two 64-bit paths of 12 cycles each. One path runs Operation A (4 cycles) then Operation B (8 cycles); the other runs Operation C (3 cycles) then Operation D, a NOP (9 cycles). Paths are statically balanced.]
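The resource counts in this example follow from simple arithmetic, reproduced here as a short Python check. The path width, the 9-cycle NOP, the 16-cycle SRL depth, and the eight flip-flops and eight LUTs per CLB all come from the slides; the variable names are illustrative.

```python
import math

PATH_WIDTH = 64      # bits in each path
NOP_CYCLES = 9       # delay needed to balance the shorter path
SRL_DEPTH = 16       # maximum delay per LUT in shift-register mode
FFS_PER_CLB = 8      # two flip-flops per slice, four slices per CLB
LUTS_PER_CLB = 8     # two LUTs per slice, four slices per CLB

# Flip-flop implementation: one register per bit per cycle of delay.
flip_flops = PATH_WIDTH * NOP_CYCLES
ff_clbs = flip_flops // FFS_PER_CLB

# SRL implementation: one LUT per bit covers any delay up to 16 cycles.
luts = PATH_WIDTH * math.ceil(NOP_CYCLES / SRL_DEPTH)
lut_clbs = luts // LUTS_PER_CLB

print(flip_flops, ff_clbs, luts, lut_clbs)
```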


IOB Element
• Input path
  – Two DDR registers
• Output path
  – Two DDR registers
  – Two 3-state enable DDR registers
• Separate clocks and clock enables for I and O
• Set and reset signals are shared
[Figure: an IOB with input registers (ICK1, ICK2), output registers with a DDR MUX (OCK1, OCK2), 3-state registers with a DDR MUX (OCK1, OCK2), and the PAD]

SelectIO Standard
• Allows direct connections to external signals of varied voltages and thresholds
  – Optimizes the speed/noise tradeoff
  – Saves having to place interface components onto your board
• Differential signaling standards
  – LVDS, BLVDS, ULVDS
  – LDT
  – LVPECL
• Single-ended I/O standards
  – LVTTL, LVCMOS (3.3 V, 2.5 V, 1.8 V, and 1.5 V)
  – PCI-X at 133 MHz, PCI (3.3 V at 33 MHz and 66 MHz)
  – GTL, GTLP
  – and more!

Digitally Controlled Impedance (DCI)
• DCI provides
  – Output drivers that match the impedance of the traces
  – On-chip termination for receivers and transmitters
• DCI advantages
  – Improves signal integrity by eliminating stub reflections
  – Reduces board routing complexity and component count by eliminating external resistors
  – Internal feedback circuit eliminates the effects of temperature, voltage, and process variations


Other Virtex-II Features
• Distributed RAM and block RAM
  – Distributed RAMs use the CLB resources (1 LUT = 16 RAM bits)
  – Block RAMs are dedicated resources on the device (18-kb blocks)
• Dedicated 18 x 18 multipliers next to block RAMs
• Clock management resources
  – Sixteen dedicated global clock multiplexers
  – Digital Clock Managers (DCMs)

Distributed SelectRAM Resources
• Uses a LUT in a slice as memory
• Synchronous write
• Asynchronous read
  – Accompanying flip-flops can be used to create synchronous read
• RAM and ROM are initialized during configuration
  – Data can be written to RAM after configuration
• Emulated dual-port RAM
  – One read/write port
  – One read-only port
[Figure: a slice with two LUTs, and symbols for RAM16X1S (D, WE, WCLK, A0 to A3, O), RAM32X1S (D, WE, WCLK, A0 to A4, O), and RAM16X1D (D, WE, WCLK, A0 to A3, SPO, DPRA0 to DPRA3, DPO)]
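The combination of synchronous write, asynchronous read, and an emulated second port can be sketched behaviorally in Python. The pin names (D, WE, A, SPO, DPRA, DPO) are taken from the RAM16X1D symbol in the figure; the class itself and its method names are an illustrative assumption, not vendor code.

```python
class RAM16X1D:
    """Simplified model of a 16 x 1 distributed RAM with a read-only second port."""

    def __init__(self, init=0):
        # Contents are set at configuration time (modeled by `init`).
        self.mem = [(init >> i) & 1 for i in range(16)]

    def write_clock(self, a, d, we):
        """Rising WCLK edge: the write happens only on the clock, and only if WE is set."""
        if we:
            self.mem[a & 0xF] = d & 1

    def spo(self, a):
        """Read/write port output: combinational read at address A."""
        return self.mem[a & 0xF]

    def dpo(self, dpra):
        """Read-only port output: combinational read at address DPRA."""
        return self.mem[dpra & 0xF]


# Usage: initialize with a pattern, then overwrite one location.
ram = RAM16X1D(init=0x0001)
ram.write_clock(a=5, d=1, we=1)
ram.write_clock(a=6, d=1, we=0)   # ignored: write enable is low
```

A synchronous read, as the slide notes, would simply register the value returned by `spo` or `dpo` in an accompanying flip-flop.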

Block SelectRAM Resources
• Up to 3.5 Mb of RAM in 18-kb blocks
  – Synchronous read and write
• True dual-port memory
  – Each port has synchronous read and write capability
  – Different clocks for each port
• Supports initial values
• Synchronous reset on output latches
• Supports parity bits
  – One parity bit per eight data bits

Dedicated Multiplier Blocks
• 18-bit twos complement signed operation
• Optimized to implement multiply and accumulate functions
• Multipliers are physically located next to block SelectRAM™ memory
[Figure: an 18 x 18 multiplier with Data_A (18 bits) and Data_B (18 bits) inputs and a 36-bit output; the signed operand sizes shown are 4 x 4, 8 x 8, 12 x 12, and 18 x 18]
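To make the 18-bit two's complement operation concrete, here is a small Python model of a signed 18 x 18 multiply with a 36-bit result. It is a sketch under the assumption that narrower signed operands (such as 8 x 8) are sign-extended to 18 bits before use; the helper names are hypothetical.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def sign_extend_to_18(value, bits):
    """Widen a `bits`-wide signed pattern to an 18-bit pattern."""
    return to_signed(value, bits) & 0x3FFFF


def mult18x18(a, b):
    """Multiply two 18-bit two's complement patterns; return the 36-bit pattern."""
    product = to_signed(a, 18) * to_signed(b, 18)
    return product & ((1 << 36) - 1)


# Usage: an 8-bit signed multiply (-128 * 127) mapped onto the 18 x 18 model.
a8 = sign_extend_to_18(0x80, 8)   # -128 as an 18-bit pattern
b8 = sign_extend_to_18(0x7F, 8)   # +127 as an 18-bit pattern
result = to_signed(mult18x18(a8, b8), 36)
```

The 36-bit output is wide enough for every 18 x 18 signed product, including the most negative operand multiplied by itself.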

Global Clock Routing Resources
• Sixteen dedicated global clock multiplexers
  – Eight on the top-center of the die, eight on the bottom-center
  – Can be driven by a clock input pad, a Digital Clock Manager (DCM), or local routing
• Global clock multiplexers provide:
  – Global clock enable capability (BUFGCE)
  – Glitch-free switching between clock signals (BUFGMUX)
  – Traditional clock buffer (BUFG) function
• Up to eight clock nets can be used in each quadrant of the device

Digital Clock Manager (DCM)
• Up to twelve DCMs per device
  – Located on the top and bottom edges of the die
  – Driven by clock input pads
• DCMs provide:
  – Delay-Locked Loop (DLL)
  – Digital Frequency Synthesizer (DFS)
  – Digital Phase Shifter (DPS)
• Up to four outputs of each DCM can drive onto global clock buffers
  – All DCM outputs can drive general routing


Spartan-3 versus Virtex-II
• Lower cost
• Smaller process = lower core voltage
  – 0.09 micron versus 0.15 micron
  – Vccint = 1.2 V versus 1.5 V
• Different I/O standard support
  – New standards: 1.2 V LVCMOS, 1.8 V HSTL and SSTL
  – Default is LVCMOS, versus LVTTL
• More I/O pins per package
• Only half of the slices support RAM or SRL16s (SLICEM)
• Fewer block RAMs and multiplier blocks
  – Same size and functionality
• 8 global clock multiplexers
• 2 or 4 DCM blocks
• No internal 3-state buffers
  – 3-state buffers are in the I/O

SLICEM and SLICEL
• Each Spartan™-3 CLB contains four slices
  – Similar to Virtex™-II
• Slices are grouped in pairs
  – Left-hand SLICEM (Memory)
    • LUTs can be configured as memory or SRL16
  – Right-hand SLICEL (Logic)
    • LUT can be used as logic only
[Figure: a Spartan-3 CLB with left-hand SLICEM slices X0Y0 and X0Y1 and right-hand SLICEL slices X1Y0 and X1Y1, the switch matrix, fast connect, SHIFTIN/SHIFTOUT, and CIN/COUT]


Virtex-II Pro Features
• 0.13 micron process
• Up to 24 RocketIO™ Multi-Gigabit Transceiver (MGT) blocks
  – Serializer and deserializer (SERDES)
  – Fibre Channel, Gigabit Ethernet, XAUI, Infiniband compliant transceivers, and others
  – 8-, 16-, and 32-bit selectable FPGA interface
  – 8B/10B encoder and decoder
• Up to four PowerPC RISC processor blocks
  – Thirty-two 32-bit General Purpose Registers (GPRs)
  – Low power consumption: 0.9 mW/MHz
  – IBM CoreConnect bus architecture support


Skills Check

Review Questions
• List the primary slice features
• List the three ways a LUT can be configured

Answers
• List the primary slice features
  – Look-up tables and function generators (two per slice, eight per CLB)
  – Registers (two per slice, eight per CLB)
  – Dedicated multiplexers (MUXF5, MUXF6, MUXF7, MUXF8)
  – Carry logic
  – MULT_AND gate
• List the three ways a LUT can be configured
  – Combinatorial logic
  – Shift register (SRL16CE)
  – Distributed memory

Summary
• Slices contain LUTs, registers, and carry logic
  – LUTs are connected with dedicated multiplexers and carry logic
  – LUTs can be configured as shift registers or memory
• IOBs contain DDR registers
• SelectIO™ standards and DCI enable direct connection to multiple I/O standards while reducing component count
• Virtex™-II memory resources include:
  – Distributed SelectRAM™ resources and distributed SelectROM (uses CLB LUTs)
  – 18-kb block SelectRAM resources

Summary
• Virtex™-II contains dedicated 18 x 18 multipliers next to each block SelectRAM™ resource
• Digital Clock Managers provide:
  – Delay-Locked Loop (DLL)
  – Digital Frequency Synthesizer (DFS)
  – Digital Phase Shifter (DPS)

Where Can I Learn More?
• User Guides
  – http://support.xilinx.com > Documentation
• Application Notes
  – http://support.xilinx.com > Documentation > App Notes


Virtex-II Architecture
• The Virtex™-II core voltage is 1.5 V
[Figure: device floorplan labeling the I/O Blocks (IOBs), block SelectRAM™ resource, programmable interconnect, dedicated multipliers, Configurable Logic Blocks (CLBs), and clock management (DCMs, BUFGMUXes)]

Double Data Rate Registers
• DDR registers can be clocked by
  – Clock and NOT(Clock) if the duty cycle is 50/50
  – The outputs CLK0 and CLK180 of a DCM
• If D1 = "1" and D2 = "0", the output is a copy of Clock
  – Use this technique to generate a clock output that is synchronized to DDR output data
[Figure: FDDR output path, with D1 and D2 feeding registers clocked by OCK1 and OCK2, a DDR mux, and an OBUF driving the PAD]
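The clock-forwarding trick can be illustrated with a simplified half-cycle model in Python. It assumes, for illustration only, that the D1 register drives the pad while Clock is high and the D2 register drives it while Clock is low; the function name `ddr_output` is hypothetical and timing details are ignored.

```python
def ddr_output(d1_stream, d2_stream):
    """Return a list of (clock_level, pad_value) pairs, one per half-cycle.

    Assumed behavior (simplified): the value registered from D1 appears
    on the pad during the high half of the clock, and the value
    registered from D2 appears during the low half.
    """
    halves = []
    for d1, d2 in zip(d1_stream, d2_stream):
        halves.append((1, d1))   # clock high: D1 register selected
        halves.append((0, d2))   # clock low: D2 register selected
    return halves


# Two data bits go out per clock cycle.
data = ddr_output([1, 0, 1], [1, 1, 0])

# Tying D1 to 1 and D2 to 0 reproduces the clock on the pad.
forwarded = ddr_output([1, 1, 1], [0, 0, 0])
```

Because the forwarded clock passes through the same kind of register and mux as the data in this model, it stays aligned with the DDR data it accompanies.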

Dual-Port Block RAM Configurations
• Configurations available on each port

  Configuration   Depth   Data Bits   Parity Bits
  16k x 1         16K     1           0
  8k x 2          8K      2           0
  4k x 4          4K      4           0
  2k x 9          2K      8           1
  1k x 18         1K      16          2
  512 x 36        512     32          4

• Independent configurations on ports A and B
  – Supports data width conversion, including parity bits
[Figure: a block RAM with port A configured 8 bits wide (IN) and port B configured 32 bits wide (OUT)]
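The configuration table can be derived from two numbers, as the short Python sketch below shows. It assumes each 18-kb block holds 16 kb of data plus 2 kb of parity, which is consistent with the table and with the one-parity-bit-per-eight-data-bits rule; the function name `port_config` is illustrative.

```python
DATA_BITS_PER_BLOCK = 16 * 1024    # data portion of one block
PARITY_BITS_PER_BLOCK = 2 * 1024   # parity portion; together 18 kb


def port_config(data_width):
    """Return (depth, data_bits, parity_bits) for one port configuration."""
    depth = DATA_BITS_PER_BLOCK // data_width
    parity_bits = data_width // 8   # one parity bit per eight data bits
    return depth, data_width, parity_bits


for width in (1, 2, 4, 8, 16, 32):
    depth, data_bits, parity_bits = port_config(width)
    print(f"{depth} x {data_bits + parity_bits}: "
          f"{data_bits} data bits, {parity_bits} parity bits")
```

The narrow configurations (1, 2, and 4 bits) have no parity bits, so only the 16 kb of data is reachable through those port widths.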

Clock Buffer Configurations
• Clock Buffer (BUFG)
  – Low-skew clock distribution
• Clock Enable Buffer (BUFGCE)
  – Holds the clock output low when CE is inactive
  – CE can be active-High or active-Low
  – Changes in CE are only recognized when the clock input is low to avoid glitches and short clock pulses
[Figure: BUFG symbol (I, O) and BUFGCE symbol (I, CE, O)]
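The BUFGCE rule that CE changes are only recognized while the clock input is low can be sketched as a sampled model in Python. The function name `bufgce` and the sample-based timing are assumptions for illustration, with CE treated as active-High.

```python
def bufgce(samples):
    """Model a clock-enable buffer over a list of (clk, ce) samples.

    The internal enable follows CE only while the clock input is low,
    so a CE change during a high phase cannot cut a pulse short or
    start one partway through.
    """
    enabled = 0
    out = []
    for clk, ce in samples:
        if clk == 0:
            enabled = ce          # CE is only recognized while I is low
        out.append(clk & enabled)
    return out


# CE rises in the middle of one high phase and falls in the middle of another.
samples = [(0, 0), (1, 0), (1, 1), (0, 1), (1, 1), (1, 0), (0, 0), (1, 0)]
output = bufgce(samples)
```

In the example the first high phase is suppressed entirely and the second is passed at full width, so the output never shows a shortened pulse.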

Clock Buffer Configurations
• Clock Multiplexer (BUFGMUX)
  – Switches glitch-free from one clock to another
  – After a change on S, the BUFGMUX waits for the currently selected clock input to go Low
  – The output is held Low until the newly selected clock goes Low, then switches
[Figure: BUFGMUX symbol (I0, I1, S, O) and a timing diagram showing S changing, the wait for a low on I0 and then on I1, and the switch on O]
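The three-step switching sequence can be sketched as a small sampled state machine in Python. This is a simplified behavioral model built only from the bullets above; the function name `bufgmux`, the sample-based timing, and the assumption that I0 is selected initially are all illustrative.

```python
def bufgmux(samples):
    """Model a glitch-free clock mux over a list of (i0, i1, s) samples."""
    selected = 0        # assumed initial selection: I0
    holding = False     # True while the output is parked low mid-switch
    out = []
    for i0, i1, s in samples:
        clocks = (i0, i1)
        # Step 1: after S changes, wait for the current clock to go low.
        if not holding and s != selected and clocks[selected] == 0:
            holding = True
        # Step 2: hold the output low until the new clock is low, then switch.
        if holding and clocks[s] == 0:
            selected = s
            holding = False
        out.append(0 if holding else clocks[selected])
    return out


# S changes while I0 is high; I1 happens to be high for a while afterwards.
samples = [
    (0, 0, 0), (1, 0, 0), (0, 1, 0),
    (1, 1, 1),   # S changes: the current I0 pulse is allowed to finish
    (0, 1, 1),   # I0 low: output parked low, waiting for I1 to go low
    (1, 1, 1),   # still waiting; a plain mux would output a partial pulse here
    (0, 0, 1),   # I1 low: switch to I1
    (1, 1, 1), (0, 0, 1),
]
output = bufgmux(samples)
```

The output in this model only ever changes source while both the old and the new clock are low, which is what prevents a runt pulse.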
- Slides: 44