Xilinx FPGA Architecture Overview

  • Slides: 41

Virtex/Spartan-II Top-level Architecture
  • Gate-array-like architecture
  • Configurable logic blocks
      - Implement logic here!
  • I/O blocks
      - 16 signal standards
  • Block RAM
      - On-chip memory for higher performance
  • Clocks & Delay-Locked Loop
  • Interconnect resources
      - Three-state internal buses
www.xilinx.com

Logic Cell Capacity
  • A better first-order alternative to gate counting
  • Better comparisons among different FPGAs
  • Logic cell definition:
      - 4-input look-up table + dedicated flip-flop
  • Logic cells per CLB:
      - XC4000/Spartan: 2.375 (2 4-LUTs, 1 3-LUT, 2 FFs)
      - Virtex/Spartan-II: 4.5 (4 4-LUTs, 1 ...)
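
As a rough illustration of how the logic-cell rating scales with array size, the sketch below multiplies a CLB count by the per-CLB figures quoted on the slide. The function name and the 28 x 42 array are assumptions made for the example, not values taken from the slides.

```python
# Logic cells per CLB, as quoted on the slide.
LOGIC_CELLS_PER_CLB = {
    "XC4000/Spartan": 2.375,
    "Virtex/Spartan-II": 4.5,
}


def logic_cell_count(clb_rows: int, clb_cols: int, family: str) -> float:
    """Return the logic-cell rating for a device with the given CLB array."""
    return clb_rows * clb_cols * LOGIC_CELLS_PER_CLB[family]


# Hypothetical 28 x 42 CLB array (1,176 CLBs), chosen only for illustration.
print(logic_cell_count(28, 42, "Virtex/Spartan-II"))  # 5292.0
```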

Configurable Logic Block (CLB)
  • Combinational logic generated in a lookup table (LUT)
      - Any function of available inputs
  • LUT output feeds CLB output or D input of flip-flop
  [Figure: Inputs -> Combinational Logic Function (LUT) -> Flip-Flop -> Outputs]

Virtex/Spartan-II Function Generators
  • Four 4-input function generators
      - Independent inputs (4 functions of 4 inputs)
  • MUXF5 combines 2 LUTs to form
      - 4x1 multiplexer
      - Or any 5-input function
  • MUXF6 combines 2 slices to form
      - 8x1 multiplexer
  [Figure: CLB with two slices, each with LUTs and a MUXF5, joined by a MUXF6]
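
One way to see why two 4-input LUTs plus MUXF5 are enough for a 4x1 multiplexer: each LUT implements a 2x1 multiplexer (two data inputs and one select, leaving one LUT input unused), and MUXF5 picks between the two LUT outputs using the second select. The Python below is a behavioral sketch of that idea only; the helper names (`lut4`, `make_init`, `mux4`) and the pin assignment are illustrative assumptions, not the vendor's mapping.

```python
def make_init(fn) -> int:
    """Build a 16-bit LUT init value from a 4-input boolean function."""
    init = 0
    for address in range(16):
        a, b, c, d = (address >> 0) & 1, (address >> 1) & 1, (address >> 2) & 1, (address >> 3) & 1
        if fn(a, b, c, d):
            init |= 1 << address
    return init


def lut4(init: int):
    """Return a function that looks up its four inputs in the init value."""
    def lookup(a: int, b: int, c: int, d: int) -> int:
        address = a | (b << 1) | (c << 2) | (d << 3)
        return (init >> address) & 1
    return lookup


# Each LUT is programmed as a 2x1 mux: inputs are (d0, d1, select, unused).
MUX2_INIT = make_init(lambda d0, d1, sel, _unused: d1 if sel else d0)
lut_low = lut4(MUX2_INIT)
lut_high = lut4(MUX2_INIT)


def muxf5(in0: int, in1: int, sel: int) -> int:
    """Dedicated 2x1 mux that combines the two LUT outputs."""
    return in1 if sel else in0


def mux4(data, s0: int, s1: int) -> int:
    """4x1 mux built from two LUTs and MUXF5; data is a 4-element sequence."""
    low = lut_low(data[0], data[1], s0, 0)
    high = lut_high(data[2], data[3], s0, 0)
    return muxf5(low, high, s1)


print(mux4([0, 0, 1, 0], s0=0, s1=1))  # selects data[2] -> 1
```

The same pattern repeats one level up: MUXF6 chooses between the MUXF5 outputs of two slices, which is how the slide arrives at an 8x1 multiplexer.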

Lookup Table
  • Generates any function of its inputs
      - Typically 4 inputs
  • Logically equivalent to a 16x1 ROM
  [Figure: LUT with example truth-table entries]
      Inputs  Output
      0000    0
      0001    1
      0010    1
      0011    0
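
The ROM equivalence can be sketched in a few lines: the four inputs form an address, and the stored bit at that address is the output. The example function (XOR of the two low-order inputs) is an assumption chosen because it reproduces the four rows shown on the slide; the helper names are hypothetical.

```python
def build_rom(fn) -> list:
    """Tabulate a 4-input boolean function into a 16-entry, 1-bit-wide ROM."""
    rom = []
    for address in range(16):
        a3, a2, a1, a0 = (address >> 3) & 1, (address >> 2) & 1, (address >> 1) & 1, address & 1
        rom.append(fn(a3, a2, a1, a0) & 1)
    return rom


def read_lut(rom, a3: int, a2: int, a1: int, a0: int) -> int:
    """Evaluate the LUT: the inputs are simply the ROM address."""
    return rom[(a3 << 3) | (a2 << 2) | (a1 << 1) | a0]


# Assumed example function: XOR of the two low-order inputs.
xor_rom = build_rom(lambda a3, a2, a1, a0: a1 ^ a0)
print(xor_rom[:4])  # [0, 1, 1, 0], matching inputs 0000 through 0011
```

Because any 16-entry table can be stored, the cost of a function depends on how many inputs it has, not on how complicated its logic expression is.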

Targeting LUT-based Logic
  • LUT limit is on inputs, not complexity
      - Reducing inputs per function (fan-in) to fit CLBs improves density and speed
      - Automatically done by Xilinx synthesis and implementation tools
  • Inverters are free
  [Figure: CLB lookup table]

Duplicating Logic Can Improve Results
  • Collapsing of logic into CLBs affects the number of levels required and therefore speed
  • The gates you use will determine mapping
      - Nets with a fanout >1 may be outside a CLB
  [Figure: before duplication] N1 must go to two places, so O1 may require a second level of logic
  [Figure: after duplication] Duplicating the first gate allows N1A to always be collapsed inside a single lookup table

Defining Lookup Tables With Gate Primitives
  • Example of gate primitive: AND2
  • Up to five inputs with all combinations of inversion
      - AND2B1 indicates 1 "bubbled" or inverted input
  • Up to nine inputs non-inverted
      - Add external INV primitives if desired

Flip-Flops
  • Stores data (D) on rising edge of clock (K)
      - Clock enable (CE)
      - Asynchronous clear (C)
  [Figure: flip-flop symbol with D, CE, K, and C inputs and Q output]
      K       CE  C  D  Q
      x       x   1  x  0
      rising  1   0  d  d
      0       x   0  x  q
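
The flip-flop behavior described above can be sketched as a small behavioral model. This is an illustration only: the class and method names are hypothetical, and the model samples its inputs once per call rather than simulating real timing.

```python
class FlipFlop:
    """D flip-flop with clock enable (CE) and asynchronous clear (C)."""

    def __init__(self) -> None:
        self.q = 0
        self._previous_k = 0

    def update(self, k: int, ce: int, c: int, d: int) -> int:
        """Apply one set of input values and return the resulting Q."""
        rising_edge = self._previous_k == 0 and k == 1
        self._previous_k = k
        if c:
            # Asynchronous clear overrides the clock and clock enable.
            self.q = 0
        elif rising_edge and ce:
            # Capture D only on a rising clock edge with CE asserted.
            self.q = d
        # Otherwise Q holds its previous value.
        return self.q


ff = FlipFlop()
print(ff.update(k=1, ce=1, c=0, d=1))  # rising edge, enabled: Q = 1
print(ff.update(k=0, ce=1, c=0, d=0))  # no rising edge: Q holds 1
print(ff.update(k=0, ce=0, c=1, d=1))  # clear asserted: Q = 0
```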

Additional Flip-Flop Controls
  • Reset (Clear) and/or Set
  • Global initialization (GSR)
      - Use to initialize all flip-flops
  • Programmable clock polarity
  • Clock enable can be left unconnected

Virtex/Spartan-II CLB Slice
  • 1 CLB holds 2 slices
  • Each slice has two sets of
      - Four-input LUT
          - Any 4-input logic function
          - Or 16-bit x 1 RAM
          - Or 16-bit shift register
      - Carry & Control
          - Fast arithmetic logic
          - Multiplier logic
          - Multiplexer logic
      - Storage element
          - Latch or flip-flop
          - Set and reset

Dedicated Multiplier Logic
  • Highly efficient 'Shift & Add' implementation
      - For a 16x16 multiplier
          - 30% reduction in area
          - One fewer logic level

On-chip RAM
  • All Xilinx FPGAs use RAM-based programming
  • Adding Write Enable to LUT creates on-chip SelectRAM memory

SelectRAM Benefits
  • Single-Port
      - Synchronous
      - Simple timing
  [Figure: single-port RAM with Data, Write Enable, Write Clock, and Address inputs and Output]
  • Dual-Port
  [Figure: dual-port RAM with Data, Write Enable, Write Clock, Write Address/Single-Port Read Address, and Dual-Port Read Address inputs; Single-Port Output and Dual-Port Output]
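
A minimal behavioral sketch of a 16 x 1 LUT RAM follows, using the port roles named in the figure labels. It assumes writes take effect on a write-clock edge and that reads are combinational; the read behavior, class name, and method names are assumptions for illustration and are not stated on the slides.

```python
class SelectRam16x1:
    """Behavioral model of a 16 x 1 LUT RAM with an optional second read port."""

    def __init__(self, init: int = 0) -> None:
        self.bits = [(init >> i) & 1 for i in range(16)]

    def write_clock_edge(self, write_enable: int, address: int, data: int) -> None:
        """Model one write-clock edge: store data only if write enable is set."""
        if write_enable:
            self.bits[address & 0xF] = data & 1

    def outputs(self, address: int, dual_port_read_address: int = 0):
        """Return (single-port output, dual-port output) for the two addresses."""
        return self.bits[address & 0xF], self.bits[dual_port_read_address & 0xF]


ram = SelectRam16x1()
ram.write_clock_edge(write_enable=1, address=5, data=1)
print(ram.outputs(address=5, dual_port_read_address=0))  # (1, 0)
```

In single-port use the second address is simply ignored; in dual-port use the write address doubles as the first read address, as the figure labels indicate.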

Memory Bandwidth and Flexibility
  • Virtex/Spartan-II on-chip SelectRAM+ memory
  [Figure: 200 MHz memory continuum]
      - Distributed RAM: 16x1 (bytes)
      - Block RAM: 4Kx1, 2Kx2, 1Kx4, 512x8, 256x16 (kilobytes)
      - External RAM: SDRAM, ZBTRAM, SSRAM, SGRAM (megabytes)
      - Application labels: large FIFOs, packet buffers, DSP coefficients, small FIFOs, shallow/wide, video line buffers, cache tag memory, deep/wide

Spartan-II Memory
  • CLB LUTs provide small distributed RAM (16 bits/LUT)
  • Block RAM provides 4K bits each
      - Dual read/write port. Each port has...
          - Independent Clock, R/W, and Enable
          - Independently configurable data width from 4K x 1 to 256 x 16
  [Figure: Spartan-II dual-R/W-port block RAM with Port A and Port B]
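
The width options follow from dividing a fixed 4K-bit block by the data width. The sketch below enumerates the depth for each width and the number of address bits each needs; the power-of-two width list is inferred from the 4Kx1 to 256x16 range given above, and the function names are hypothetical.

```python
BLOCK_RAM_BITS = 4096  # 4K bits per block, as stated on the slide


def aspect_ratios(total_bits: int = BLOCK_RAM_BITS, widths=(1, 2, 4, 8, 16)):
    """Return (depth, width) pairs that use all bits of one block."""
    return [(total_bits // width, width) for width in widths]


def address_bits(depth: int) -> int:
    """Number of address lines needed to select one of `depth` words."""
    return (depth - 1).bit_length()


for depth, width in aspect_ratios():
    print(f"{depth} x {width}: {address_bits(depth)} address bits")
```

Since each port's width is configured independently, one port could, for example, write 16-bit words while the other reads the same storage one bit at a time.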

I/O Block (IOB)
  • Periphery of identical I/O blocks
      - Input, output, or bidirectional
      - Direct or registered (or latched input)
      - Pullup/Pulldown
      - Programmable slew rate
      - Three-state output
      - Programmable thresholds
  [Figure: IOB with I, O, TS, and Clocks signals; pad bonded to package pin]

Use Special IOB Primitives
  • User explicitly defines what resources in the IOB are to be used
  • I/Os are defined with
      - 1 pad primitive
      - At least 1 function primitive
          - 1 input element, 1 output element, or both
          - Inverters may also be pulled into IOBs
  [Figure: IPAD driving IBUF]

Locking Down I/O Locations
  • LOC=Pxx attribute defines I/O pad location(s)
  • Avoid locking IOBs early
      - Makes routing more difficult
  • Use IOB LOC= to lock pins late in design cycle once PCB is built
      - Can lock IOBs if floorplanning the connected CLBs

Use Pullups/Pulldowns
  • Pullup automatically connected on unused IOBs
  • User can specify PULLUP or PULLDOWN primitive on used IOBs
  • Inputs should not be left floating
      - Add Pullup to design inputs that may be left floating to reduce power and noise
  [Figure: IPAD driving IBUF]

Faster Setup With NODELAY
  • Delay included by default
      - Compensates for clock routing delay to avoid a hold-time requirement
  • NODELAY attribute removes delay element
      - Creates hold time
  [Figure: example IOB. External Data -> Pad -> Input Buffer -> Delay -> flip-flop D; External Clock -> Pad -> Routing Delay -> Routed Clock]

Slew Rate Control
  • Slew rate controls output speed
  • Default slow slew rate reduces noise & ground bounce
  • Use fast slew rate wherever speed is important
      - FAST parameter on output logic primitive
  [Figure: OBUF with FAST attribute driving OPAD]

Output Three-State Control
  • Free inverter on output buffer control
      - Use OBUFE macro for active-high enable
      - Use OBUFT primitive for active-low enable
  [Figure: OBUFE and OBUFT symbols with OE and T control inputs]

Global Three-State
  • 3-state control either local and/or via a dedicated global net
      - Global three-state controlled by STARTUP... primitive
  [Figure: STARTUP symbol with GTS and GSR inputs]

Virtex/Spartan-II I/O Block (Simplified)
  [Figure: simplified I/O block diagram]

Multiple I/O Interface Standards
  • 16 to 20 I/O interface standards supported
  • CMOS, HSTL, SSTL, GTL, CTT, PCI
  • As many as eight banks on a device
      - Package dependent
  • Different banks can support different standards at the same time
      - Logic level translation
      - Boards with mixed standards

High Performance Routing
  • Hierarchical routing
      - Singles, Hexes, Longs
  • Sparse connections on longer interconnects for high speed
  • Routing delay depends primarily on distance
      - Direction independent
  [Figure: CLB array with 2 ns routing-delay labels]

Flexible General-Purpose Interconnect
  • Flexible, but slow if a route crosses many channels
      - Programmable switch matrix at each channel crossing
      - Connects across, changes direction, or fans out

Switch Matrix
  • Bidirectional pass transistors
  • High routing flexibility

Reduce Fanout
  • Higher fanout nets (>16 loads) are harder to route & slower
  • Consider duplicating source in schematic to improve routing or speed
  [Figure: source flip-flop (D, Q) driving net fn1]

Long Lines for High Fanout Nets
  • Metal lines that traverse length & width of chip
  • Lowest skew
  • Ideal for high fan-out signals
  • Ideal for clocking
  • Requires vertical or horizontal alignment of loads
  [Figure: long line spanning a row of CLBs]

Internal Three-State Buses
  • Two 3-state drivers per CLB
  • OR-AND logic implementation in place of 3-state drivers
      - With no drivers enabled, bus is a logic 1
  • Low power
      - No danger of contention when multiple BUFTs enabled
      - No physical pullups or large capacitance to drive
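
The OR-AND scheme can be sketched as follows: each driver contributes its data ORed with a "disabled" flag, and the bus is the AND of all contributions. A disabled driver therefore contributes 1, which is why an undriven bus reads as logic 1, and two enabled drivers simply resolve to the AND of their data instead of fighting electrically. The function name and the (data, enabled) representation are assumptions for this illustration.

```python
def bus_value(drivers) -> int:
    """Resolve an OR-AND bus; drivers is an iterable of (data, enabled) pairs."""
    value = 1
    for data, enabled in drivers:
        disabled = 0 if enabled else 1
        # OR stage: a disabled driver always contributes 1.
        contribution = (data & 1) | disabled
        # AND stage: combine all contributions.
        value &= contribution
    return value


print(bus_value([(0, False), (1, False)]))  # no driver enabled -> 1
print(bus_value([(0, True), (1, False)]))   # one driver enabled with data 0 -> 0
print(bus_value([(1, True), (0, True)]))    # two enabled drivers -> AND of data = 0
```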

General Clock Support
  • Use clock buffers for highest fanout clocks
      - Drive high-speed long line resources
          - Lowest skew across a device
          - No internal hold times
      - Use generic BUFG primitive
          - Allows software to choose best type of buffer
          - Allows easy migration across families
  • Four dedicated global low-skew buffers
      - Dedicated input pin (clock distribution only)
  • Additional shared resources (i.e., long lines)
      - Distribute low-skew/high-fanout signals (10 ns max.)
  • Four delay-locked loops on each device
      - All-digital implementation
      - Two global buffers associated with each DLL pair

Configuration
  • Schematic or HDL description is converted to a configuration file by the Xilinx development system
  • Configuration file is loaded into FPGA on power-up
      - Stored in configuration latches
      - Controls CLBs, IOBs, interconnect, etc.

Configuration Bitstream
  • Binary programming file
  • Length depends only on device, not utilization
      - Typically 1 µs per bit (total from a few ms to <1 s)
  • FPGA can load its configuration automatically on power-up, or under microprocessor control
  • Can be loaded directly into device/configuration PROM
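
A back-of-the-envelope estimate of load time is just bitstream length times time per bit. The sketch assumes about 1 µs per bit, the rate consistent with the stated total of a few milliseconds to under one second; the bitstream sizes are hypothetical round numbers, not figures for any particular device.

```python
def configuration_time_s(bitstream_bits: int, seconds_per_bit: float = 1e-6) -> float:
    """Estimate serial configuration time from bitstream length."""
    return bitstream_bits * seconds_per_bit


# Hypothetical bitstream sizes, for illustration only.
for bits in (5_000, 500_000):
    print(f"{bits} bits: {configuration_time_s(bits) * 1000:.1f} ms")
```

Because the length depends only on the device, this estimate is the same for a nearly empty design and a full one.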

Configuration Modes
  • Bit-serial configuration
      - Simple, uses few device pins
      - Controlled by FPGA (Master) or externally (Slave)
      - Xilinx serial PROMs available
  • Byte-parallel configuration
      - Can drive PROM addresses (Master)
      - Can be microprocessor-controlled

Configuration Pins
  • Configuration starts on power-up
  • Mode pin(s) checked to determine method
      - Usable as extra I/O after configuration
  • All I/O not used for configuration are disabled
  • Reconfiguration possible by pulling PROGRAM pin low

Readback
  • Configuration data can be read back serially
      - Allows verification of programming
  • Readback data can include user-register values
      - Allows in-circuit functional verification
      - Requires READBACK... symbol
  [Figure: READBACK symbol with CLK, TRIG, DATA, and RIP pins]

Boundary Scan
  • IEEE 1149.1-compatible boundary scan (JTAG)
  • Available before configuration
  • Configuration & readback possible via boundary scan logic

Power Consumption
  • CMOS SRAM technology provides low standby power
  • Operating power is mostly dynamic
      - Proportional to transition frequency of internal nodes
      - Xilinx segmented interconnect minimizes amount of metal capacitance to switch, minimizing power
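
The proportionality to transition frequency and switched capacitance can be illustrated with the standard first-order CMOS estimate P = C * V^2 * f. This formula is a general approximation, not one given on the slides; here f is taken as the rate of full charge/discharge cycles of a node, and the component values are hypothetical.

```python
def dynamic_power_w(capacitance_f: float, voltage_v: float, toggle_hz: float) -> float:
    """First-order CMOS dynamic power estimate: P = C * V^2 * f."""
    return capacitance_f * voltage_v ** 2 * toggle_hz


# Hypothetical node: 10 pF of switched capacitance at 2.5 V.
slow = dynamic_power_w(10e-12, 2.5, 25e6)
fast = dynamic_power_w(10e-12, 2.5, 50e6)
print(f"{slow * 1e3:.2f} mW at 25 MHz, {fast * 1e3:.2f} mW at 50 MHz")
```

Halving the switched capacitance has the same effect as halving the toggle rate, which is the motivation for segmented interconnect.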