
Xilinx FPGA Architecture
- Gate-array-like architecture
- Programmable logic, I/O, and interconnect
[Figure: programmable interconnect, I/O blocks (IOBs), and configurable logic blocks (CLBs)]

Logic Cell Capacity
- A better first-order alternative to gate counting
- Allows better comparisons among different FPGAs
- Logic cell definition:
  - One 4-input look-up table (LUT) + one dedicated flip-flop
- Logic cells per CLB:
  - XC4000, Spartan: 2.375 (two 4-LUTs, one 3-LUT, two FFs)
  - Virtex: 4.5 (four 4-LUTs, one F5MUX, four FFs)
  - XC5200: 4 (four 4-LUTs, four FFs)
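
Because capacity is quoted in logic cells, comparing devices reduces to multiplying the CLB count by the per-CLB rating above. The sketch below does that for a hypothetical 20 x 20 CLB array; the array size and the helper name `logic_cell_capacity` are illustrative assumptions, not figures for any specific part.

```python
# Logic cells per CLB, as rated on the slide (XC4000 and Spartan share a rating).
LOGIC_CELLS_PER_CLB = {
    "XC4000/Spartan": 2.375,  # 2 4-LUTs, 1 3-LUT, 2 FFs
    "Virtex": 4.5,            # 4 4-LUTs, 1 F5MUX, 4 FFs
    "XC5200": 4.0,            # 4 4-LUTs, 4 FFs
}


def logic_cell_capacity(family: str, clb_rows: int, clb_cols: int) -> float:
    """Logic-cell capacity of a rectangular CLB array (hypothetical helper)."""
    return LOGIC_CELLS_PER_CLB[family] * clb_rows * clb_cols


if __name__ == "__main__":
    # Hypothetical 20 x 20 CLB array, not a specific device.
    for family in LOGIC_CELLS_PER_CLB:
        print(f"{family}: {logic_cell_capacity(family, 20, 20):.1f} logic cells")
```

The same CLB count therefore yields quite different capacities across families, which is exactly what raw CLB or gate counts hide.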

Configurable Logic Block (CLB)
- Combinational logic is generated in a lookup table (LUT)
  - Any function of the available inputs
- LUT output feeds the CLB output or the D input of the flip-flop
[Figure: inputs, combinational logic function (LUT), flip-flop, outputs]

XC4000/Spartan Series Function Generators
- Two 4-input function generators
  - Independent inputs (2 functions of 4 inputs)
- One 3-input function generator
  - Independent inputs
  - Combines the 4-input functions to make any 5-input function and some 9-input functions
[Figure: function generators F, G, and H]
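
One way to see why F, G, and H cover any 5-input function is Shannon expansion: drive F and G with the same four inputs, store the two cofactors of the target (fifth input = 0 and = 1), and let H act as a 2:1 mux selected by the fifth input. The sketch below assumes that mapping and models only logic, not the real CLB wiring; all names are hypothetical.

```python
from itertools import product
from typing import Callable

Fn4 = Callable[[int, int, int, int], int]
Fn5 = Callable[[int, int, int, int, int], int]


def h_generator(f_out: int, g_out: int, h1: int) -> int:
    """3-input H generator programmed as a 2:1 mux: selects G' when h1 = 1."""
    return g_out if h1 else f_out


def five_input_from_fgh(target: Fn5) -> Fn5:
    """Realize any 5-input function with F, G, and H (Shannon expansion on e).

    F and G both see inputs a..d; F holds the cofactor with e = 0,
    G the cofactor with e = 1, and H uses e as its select input.
    """
    f: Fn4 = lambda a, b, c, d: target(a, b, c, d, 0)
    g: Fn4 = lambda a, b, c, d: target(a, b, c, d, 1)
    return lambda a, b, c, d, e: h_generator(f(a, b, c, d), g(a, b, c, d), e)


def parity5(a: int, b: int, c: int, d: int, e: int) -> int:
    """Example 5-input function: odd parity."""
    return a ^ b ^ c ^ d ^ e


mapped = five_input_from_fgh(parity5)
# Exhaustive check over all 32 input combinations.
assert all(mapped(*bits) == parity5(*bits) for bits in product((0, 1), repeat=5))
```

The "some 9-input functions" case arises when F and G instead receive four independent inputs each and H combines them with a ninth input; only functions that decompose that way fit.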

Lookup Table
- Generates any function of its inputs
  - Typically 4 inputs
- Logically equivalent to a 16 x 1 ROM
[Figure: LUT truth-table excerpt; inputs 0000, 0001, 0010, 0011 give outputs 0, 1, 1, 0]
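
The ROM equivalence can be sketched directly: the 16 output bits are the "ROM contents", and the four inputs simply form the address. Packing the table into a 16-bit integer is an illustrative choice here, and `lut_init`/`lut_read` are hypothetical names, not a Xilinx API.

```python
from typing import Callable


def lut_init(func: Callable[[int, int, int, int], int]) -> int:
    """Pack a 4-input Boolean function into a 16-bit truth table.

    Bit `addr` of the result holds func(i3, i2, i1, i0) for the bits of addr.
    """
    init = 0
    for addr in range(16):
        bits = [(addr >> k) & 1 for k in (3, 2, 1, 0)]  # i3, i2, i1, i0
        if func(*bits) & 1:
            init |= 1 << addr
    return init


def lut_read(init: int, i3: int, i2: int, i1: int, i0: int) -> int:
    """Evaluate the LUT: the inputs are just the ROM address."""
    addr = (i3 << 3) | (i2 << 2) | (i1 << 1) | i0
    return (init >> addr) & 1


# The figure's rows 0000 -> 0, 0001 -> 1, 0010 -> 1, 0011 -> 0 are consistent
# with i1 XOR i0; the rows the figure omits are an assumption here.
xor_init = lut_init(lambda i3, i2, i1, i0: i1 ^ i0)
print(hex(xor_init))  # 0x6666
```

Any other 4-input function, however complex, costs the same 16 bits, which is why the LUT limit is on inputs rather than complexity.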

Targeting LUT-Based Logic
- The LUT limit is on the number of inputs, not on complexity
  - Reducing inputs per function (fan-in) to fit CLBs improves density and speed
  - This is done automatically by the Xilinx synthesis and implementation tools
- Inverters are free: they are absorbed into the lookup table
[Figure: CLB lookup table]

Duplicating Logic Can Improve Results
- How logic is collapsed into CLBs determines the number of logic levels required, and therefore speed
- The gates you use determine the mapping
  - Nets with a fanout > 1 may be left outside a CLB
- Example: net N1 must go to two places, so output O1 may require a second level of logic
  - Duplicating the first gate allows N1A to always be collapsed inside a single lookup table
[Figure: input I1, nets N1, N1A, and N1B, output O1]

Defining Lookup Tables with Gate Primitives
- Example gate primitive: AND2
- Up to five inputs with all combinations of inversion
  - AND2B1 indicates one “bubbled” (inverted) input
- Up to nine inputs non-inverted
  - Add external INV primitives if desired
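
The point that inversion is free can be sketched by generating the truth table a mapper would store for an ANDnBm-style primitive: inverting inputs changes the table contents but not its size. The input limits below mirror the slide; treating the *first* m inputs as the bubbled ones is an assumption for illustration, and `and_nbm`/`truth_table` are hypothetical helpers.

```python
from itertools import product
from typing import Callable


def and_nbm(n: int, m: int = 0) -> Callable[..., int]:
    """Model an ANDnBm-style primitive: n-input AND with m inverted inputs.

    Assumption: the m bubbled inputs are the first m arguments.
    Limits follow the slide: up to 5 inputs with inversion, 9 without.
    """
    max_inputs = 9 if m == 0 else 5
    if not (2 <= n <= max_inputs and 0 <= m <= n):
        raise ValueError(f"unsupported primitive AND{n}B{m}")

    def gate(*inputs: int) -> int:
        if len(inputs) != n:
            raise TypeError(f"expected {n} inputs")
        # Inverted inputs are folded into the table, not built as extra gates.
        return int(all((1 - x) if k < m else x for k, x in enumerate(inputs)))

    return gate


def truth_table(gate: Callable[..., int], n: int) -> list[int]:
    """LUT contents for the gate, in binary input order."""
    return [gate(*bits) for bits in product((0, 1), repeat=n)]


print(truth_table(and_nbm(2), 2))     # [0, 0, 0, 1]
print(truth_table(and_nbm(2, 1), 2))  # [0, 1, 0, 0]
```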

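Flip-Flops
- Stores data (D) on the rising edge of the clock (K)
  - Clock Enable (CE)
  - Asynchronous Clear (C)

  C  CE  K        D  | Q
  1  X   X        X  | 0
  0  0   X        X  | Q (no change)
  0  1   rising   D  | D

[Figure: flip-flop symbol with D, CE, K, and C inputs and output Q]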
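
The truth table above can be sketched as a small behavioral model: asynchronous clear wins, otherwise D is captured only on an enabled rising edge of K. The class and method names are hypothetical; defaulting `ce=1` models a flip-flop whose clock enable is left unconnected (always enabled).

```python
class DFlipFlop:
    """Behavioral sketch of a CLB flip-flop with clock enable and async clear."""

    def __init__(self) -> None:
        self.q = 0
        self._last_k = 0

    def update(self, d: int, k: int, ce: int = 1, c: int = 0) -> int:
        """Apply new input levels; return Q afterwards."""
        rising = self._last_k == 0 and k == 1
        self._last_k = k
        if c:                  # asynchronous clear overrides everything
            self.q = 0
        elif rising and ce:    # capture D only on an enabled rising edge
            self.q = d & 1
        return self.q          # otherwise Q holds its value


ff = DFlipFlop()
ff.update(d=1, k=0)          # no edge yet: Q stays 0
ff.update(d=1, k=1)          # rising edge with CE = 1: Q becomes 1
ff.update(d=0, k=0)
ff.update(d=0, k=1, ce=0)    # rising edge but CE = 0: Q holds 1
ff.update(d=1, k=0, c=1)     # asynchronous clear: Q becomes 0
```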

Additional Flip-Flop Controls
- Reset (Clear) or Set
- Global initialization (GSR)
- Programmable clock polarity
- Clock enable can be left unconnected

Use Global Reset
- All flip-flops are initialized on configuration and via a global net
- The source of the global net is specified via the STARTUP component
[Figure: STARTUP component with pins labeled GSR, GTS, CLK, Q1, Q2, Q3, Q4, and DoneIn]

Direct Input
- Direct input (DIN) bypasses the LUT and goes directly to the flip-flop
- Provides higher speed if no logic is required
  - Frees the LUT for other functions
[Figure: DIN routed to the flip-flop D input, bypassing the LUT; output Q]

On-Chip RAM
- All Xilinx FPGAs use RAM-based programming
- Adding a write enable to a LUT creates on-chip SelectRAM memory

SelectRAM Benefits
- Asynchronous
  - Compatible with original XC4000
- Synchronous
  - Simpler timing
- Dual-port
[Figure: single-port RAM (data, address, write enable, write clock, output) and dual-port RAM (data, write enable, write clock, write address/single-port read address, dual-port read address, single-port output, dual-port output)]

Write Clock
- Same clock as for flip-flops
- Programmable polarity
  - Independent of flip-flop polarity
- Self-timed write
  - Latches data, write enable, and address on the clock edge
  - Generates the write pulse
- No effect on read operation

Supported RAM Modes (per CLB)

  Mode                     Single-Port   Dual-Port
  16 x 1                   X             X
  16 x 2                   X
  32 x 1                   X
  Edge-triggered timing    X             X
  Level-sensitive timing   X
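
The dual-port, edge-triggered mode combines the write-clock behavior described above with two combinational read ports. The sketch below models a 16 x 1 dual-port RAM at that level only: writes sample WE, address, and data together on the active write-clock edge, and both outputs are read asynchronously. Class and method names are hypothetical, loosely echoing the figure labels.

```python
class DualPortRam16x1:
    """Behavioral sketch of a 16 x 1 dual-port LUT RAM with synchronous writes."""

    def __init__(self) -> None:
        self.mem = [0] * 16

    def write_clock_edge(self, we: int, addr: int, data: int) -> None:
        """Active write-clock edge: WE, address, and data are sampled together."""
        if we:
            self.mem[addr & 0xF] = data & 1

    def single_port_out(self, addr: int) -> int:
        """Asynchronous read at the write (single-port) address."""
        return self.mem[addr & 0xF]

    def dual_port_out(self, read_addr: int) -> int:
        """Asynchronous read at the independent dual-port read address."""
        return self.mem[read_addr & 0xF]


ram = DualPortRam16x1()
ram.write_clock_edge(we=1, addr=3, data=1)
print(ram.dual_port_out(3))  # 1: the second port sees the new value
```

Reads have no clock in this model, matching "no effect on read operation": only the write path is edge-triggered.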

I/O Block (IOB)
- Periphery of identical I/O blocks
  - Input, output, or bidirectional
  - Direct or registered (or latched input)
  - Pullup/pulldown
  - Programmable slew rate
  - Three-state output
  - Programmable thresholds
[Figure: IOB with I, O, and TS signals and clocks; pad bonded to package pin]

Use Special IOB Primitives
- The user explicitly defines which resources in the IOB are used
- I/Os are defined with:
  - One pad primitive
  - At least one function primitive: one input element, one output element, or both
  - Inverters may also be pulled into IOBs
[Figure: IPAD driving IBUF]

Locking Down I/O Locations
- The LOC=Pxx attribute defines I/O pad location(s)
- Avoid locking IOBs early
  - Doing so makes routing more difficult
- Use IOB LOC= to lock pins late in the design cycle, once the PCB is built
  - IOBs can be locked if the connected CLBs are being floorplanned

Use Pullups/Pulldowns
- A pullup is automatically connected on unused IOBs
- The user can specify a PULLUP or PULLDOWN primitive on used IOBs
- Inputs should not be left floating
  - Add a pullup to design inputs that may be left floating, to reduce power and noise
[Figure: IPAD driving IBUF]

Faster Setup with NODELAY
- An input delay is included by default
  - It compensates for clock routing delay, so no external hold time is required
- The NODELAY attribute removes the delay element
  - Setup is faster, but a hold-time requirement is created
[Figure: example IOB flip-flop (D, Q); external data passes through the pad, input buffer, and delay element, while the external clock reaches the flip-flop after a routing delay]
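
The trade-off can be sketched with first-order pad-referenced timing: data arrives at the flip-flop after the pad-to-flip-flop delay plus any inserted delay, while the clock arrives after its routing delay. All values below are hypothetical nanoseconds chosen for illustration, not datasheet figures, and `pad_setup_hold` is a hypothetical helper.

```python
def pad_setup_hold(t_pad_to_ff: float, t_input_delay: float, t_clk_route: float,
                   t_su: float, t_h: float) -> tuple[float, float]:
    """First-order external setup and hold (ns) for an input flip-flop.

    Setup: how long before the pad clock edge data must arrive.
    Hold:  how long after the pad clock edge data must stay stable.
    A value <= 0 means no requirement.
    """
    data_arrival = t_pad_to_ff + t_input_delay
    setup = data_arrival + t_su - t_clk_route
    hold = t_h + t_clk_route - data_arrival
    return setup, hold


# Hypothetical delays (ns): default input delay vs. NODELAY.
default = pad_setup_hold(1.0, 4.0, 3.0, 1.0, 0.0)
nodelay = pad_setup_hold(1.0, 0.0, 3.0, 1.0, 0.0)
print("default:", default)  # (3.0, -2.0): slower setup, no hold requirement
print("NODELAY:", nodelay)  # (-1.0, 2.0): faster setup, positive hold
```

Removing the delay shortens the data path by the same amount that it raises the hold requirement, which is the trade described on the slide.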

Slew Rate Control
- Slew rate controls output speed
- The default slow slew rate reduces noise and ground bounce
- Use fast slew rate wherever speed is important
  - FAST parameter on the output logic primitive
[Figure: OBUF with FAST parameter driving OPAD]

Output Three-State Control
- Free inverter on the output buffer control
  - Use the OBUFE macro for an active-high enable
  - Use the OBUFT primitive for an active-low enable
[Figure: OBUFE and OBUFT symbols with OE and T control signals]

Global Three-State
- Three-state control can be local and/or via a dedicated global net
  - Global three-state is controlled by the STARTUP primitive
[Figure: STARTUP primitive with GTS and GSR]

I/O Thresholds
- 5 V devices have globally selectable TTL or CMOS I/O thresholds
  - Inputs and outputs are separately controllable
  - Default is TTL
- 3 V devices can interface to 5 V or 3 V logic
- 2.5 V Virtex devices have programmable interfaces

Programmable Interconnect
- Resources to create arbitrary interconnection networks
- Various types of interconnect
  - Flexible general-purpose interconnect
  - Low-skew long lines
- Internal three-state buffers
[Figure: CLBs connected by general-purpose interconnect through a switch matrix, plus long lines]

Interconnect
- Single-length, double-length, and long lines
- Clock buffers and dedicated long lines
- Global set/reset and global three-state

Fast Direct Interconnect
- Direct connections from a CLB to an adjacent CLB or IOB
- Fastest interconnect
  - < 1 ns delay
- Carry logic uses direct interconnect

Flexible General-Purpose Interconnect
- Flexible, but slow if a net crosses many channels
  - Programmable switch matrix at each channel crossing
  - Connects across, changes direction, or fans out
- Single-length lines
- Double-length lines skip every other switch matrix

Switch Matrix
- Bidirectional pass transistors
- High routing flexibility

Reduce Fanout
- Higher-fanout nets (> 16 loads) are harder to route and slower
- Consider duplicating the source in the schematic to improve routing or speed
[Figure: duplicated source function fn1 driving a flip-flop (D, Q)]

Long Lines for High-Fanout Nets
- Metal lines that traverse the length and width of the chip
- Lowest skew
- Ideal for high-fanout signals
- Ideal for clocking
- Requires vertical or horizontal alignment of loads
[Figure: CLBs connected by a long line]

Advantages of Vertical Orientation
- Bidirectional data bus lines run horizontally
  - Enable lines run vertically
- Large registered functions align vertically
  - Clock lines run vertically
  - Most non-clock, non-BUFT long lines run vertically
  - Carry logic runs vertically
[Figure: flip-flops (D, Q) stacked vertically]

Use Global Clock Buffers
- Use clock buffers for the highest-fanout clocks
  - They drive high-speed long-line resources
    - < 2 ns skew across a device
    - No internal hold times
  - Use the generic BUFG primitive
    - Allows software to choose the best type of buffer
    - Allows easy migration across families

Using a Clock Generated Off-Chip
- Connect the IPAD directly to the clock buffer primitive
  - Required for BUFGP
  - Place & route uses a special fast input pin
- Provides higher speed and uses fewer routing resources
[Figure: IPAD driving BUFG, which clocks a flip-flop (D)]

Internal Oscillator
- The oscillator used to generate the configuration clock can be used after configuration as part of the design
- Frequency tolerance of +/- 50%
- Can be divided down to the desired frequency range
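
Dividing the oscillator lowers its frequency but not its relative tolerance, so the divided clock still spans a wide range. The sketch below computes that range; the 8 MHz nominal frequency and the divide-by-16 are hypothetical values, and `divided_range` is a hypothetical helper.

```python
def divided_range(f_nominal_hz: float, divisor: int,
                  tolerance: float = 0.5) -> tuple[float, float]:
    """Min/max frequency of a divided-down oscillator with +/- tolerance."""
    if divisor < 1:
        raise ValueError("divisor must be >= 1")
    f = f_nominal_hz / divisor
    return f * (1 - tolerance), f * (1 + tolerance)


# Hypothetical 8 MHz nominal oscillator divided by 16 (500 kHz nominal).
low, high = divided_range(8e6, 16)
print(f"{low:.0f} Hz to {high:.0f} Hz")
```

With a +/- 50% source, the divided clock is still only good to +/- 50%, so it suits tasks such as timeouts or housekeeping rather than precise timing.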

Use BUFT for Buses
- BUFT references the internal three-state buffers
- Use BUFTs to multiplex signals onto long routing lines, which then act as buses
- Multiplexer macros (M4_1E, etc.) use lookup tables instead
[Figure: bus BUS<3> to BUS<0> driven through BUFTs by A3 to A0 (enable _ENABLE_A) and B3 to B0 (enable _ENABLE_B)]

BUFT Output Never Floats
- Cross-coupled inverters remember the last value to ensure that the line never floats
  - A valid signal is always read from the output of the BUFT
  - No need to reference a “keeper” circuit
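
A BUFT bus with its built-in keeper can be sketched as follows: each driver has an active-low enable (0 drives the line, matching the `_ENABLE_A`-style naming), at most one driver may be enabled, and with none enabled the line keeps its last value. The class name, the contention check, and the assumed initial value of 0 are illustrative choices, not hardware behavior guarantees.

```python
from typing import Sequence


class BuftBus:
    """Sketch of a BUFT-driven long-line bus with a bus keeper."""

    def __init__(self, width: int) -> None:
        self.mask = (1 << width) - 1
        self.value = 0  # assumed keeper state before anything is driven

    def resolve(self, drivers: Sequence[tuple[int, int]]) -> int:
        """drivers: (enable_n, value) pairs; enable_n == 0 means driving."""
        driving = [value for enable_n, value in drivers if enable_n == 0]
        if len(driving) > 1:
            raise ValueError("bus contention: more than one BUFT enabled")
        if driving:
            self.value = driving[0] & self.mask
        # With no driver enabled, the keeper holds the last value.
        return self.value


bus = BuftBus(4)
bus.resolve([(0, 0b1010), (1, 0b0101)])  # A drives: 0b1010
bus.resolve([(1, 0b1010), (1, 0b0101)])  # nobody drives: still 0b1010
```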

Special Resources
- Arithmetic/counter carry logic
- Wide decode or cascade functions
- Configuration
- Boundary scan (JTAG)

Carry Logic
- Use carry logic in CLBs to increase arithmetic speed
- High density via serial implementation of carry
- Carry propagates in the upward direction
- Use the library’s carry-based macros (RPMs) or LogiBLOX synthesis
[Figure: carry chain running up through a column of CLBs]
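
"Serial implementation of carry" means a ripple chain: each bit position computes its sum and passes the carry to the bit above it. The sketch below shows that arithmetic bit by bit (bit 0 at the bottom of the column); it models the logic only, not CLB placement or delays, and `carry_chain_add` is a hypothetical name.

```python
def carry_chain_add(a: int, b: int, width: int, cin: int = 0) -> tuple[int, int]:
    """Ripple-carry addition as a dedicated carry chain computes it.

    Bit 0 sits at the bottom; the carry moves upward one bit per stage.
    Returns (sum, carry_out).
    """
    carry, total = cin & 1, 0
    for i in range(width):  # i = 0 is the bottom of the column
        ai, bi = (a >> i) & 1, (b >> i) & 1
        total |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (carry & (ai ^ bi))  # generate or propagate
    return total, carry


print(carry_chain_add(0b1011, 0b0110, 4))  # (1, 1): 11 + 6 = 17
```

The dedicated chain makes each stage fast and keeps the adder in one vertical column, which is why carry logic favors vertical alignment.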

Wide Decoders
- A decoder is effectively a dedicated wired-AND
- 4 decoder lines per edge
- Direct inputs from all IOBs on an edge
  - Half as many general inputs
  - Useful for address decoding

Using Wide Decoders
- Use the DECODEx macro
  - A diamond on the symbol indicates an open-drain output
    - Multiple outputs can be tied together
    - A PULLUP primitive must be used
[Figure: DECODE8 with inputs A0 to A7 and output O]

Wide Wired-AND Using Three-State Buffers
- Use the WANDx symbol
[Figure: WAND8 with inputs I1 to I8 and output O; underlying implementation is BUFTs A to H driving a horizontal long line]
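
Both the DECODEx macros and WANDx rely on open-drain-style drivers sharing one line with a pullup: the line is high only if no driver pulls it low. The sketch below models that wired-AND and uses it to decode one address pattern, where each address bit pulls the line low on a mismatch. The helper names and the 0xA5 pattern are hypothetical.

```python
from typing import Sequence


def wired_and(open_drain_outputs: Sequence[int]) -> int:
    """Line with a PULLUP: high unless any open-drain output pulls it low."""
    return 0 if any(out == 0 for out in open_drain_outputs) else 1


def decode_drivers(address: int, pattern: int, width: int) -> list[int]:
    """One open-drain output per address bit (hypothetical helper).

    A bit pulls the line low (0) when it disagrees with the pattern bit,
    so per-input inversion is just the choice of pattern bit.
    """
    return [int(((address >> i) & 1) == ((pattern >> i) & 1)) for i in range(width)]


def wide_decode(address: int, pattern: int, width: int) -> int:
    return wired_and(decode_drivers(address, pattern, width))


# Hypothetical 8-bit decode of address 0xA5.
print(wide_decode(0xA5, 0xA5, 8), wide_decode(0xA4, 0xA5, 8))  # 1 0
```

Tying two open-drain decoder outputs together is itself another wired-AND, so a single PULLUP serves the combined line.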

Configuration
- The schematic or HDL description is converted to a configuration file by the Xilinx development system
- The configuration file is loaded into the FPGA on power-up
  - Stored in configuration latches
  - Controls CLBs, IOBs, interconnect, etc.

Configuration Bitstream
- Binary programming file
- Length depends only on the device, not on utilization
  - Typically 1 µs per bit (total from a few ms to < 1 s)
- The FPGA can load its configuration automatically on power-up, or under microprocessor control
- Can be loaded directly into the device or into a configuration PROM
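
Since the bitstream length is fixed per device, serial configuration time is simply length times time per bit. The sketch below uses the typical 1 µs per bit from the slide; the bitstream lengths are hypothetical values, not figures for specific devices.

```python
def serial_config_time_s(bitstream_bits: int, seconds_per_bit: float = 1e-6) -> float:
    """Bit-serial configuration time: bitstream length x time per bit."""
    if bitstream_bits <= 0:
        raise ValueError("bitstream length must be positive")
    return bitstream_bits * seconds_per_bit


if __name__ == "__main__":
    # Hypothetical bitstream lengths.
    for bits in (10_000, 200_000, 900_000):
        print(f"{bits:>7} bits -> {serial_config_time_s(bits) * 1e3:.1f} ms")
```

Because utilization does not change the length, this time is the same for a nearly empty design and a full one on the same device.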

Configuration Modes
- Bit-serial configuration
  - Simple; uses few device pins
  - Controlled by the FPGA (Master) or externally (Slave)
  - Xilinx serial PROMs available
- Byte-parallel configuration
  - Can drive PROM addresses (Master)
  - Can be microprocessor-controlled (Peripheral)

Configuration Pins
- Configuration starts on power-up
- Mode pin(s) are checked to determine the configuration method
  - Usable as extra I/O after configuration
- All I/Os not used for configuration are disabled during configuration
- Reconfiguration is possible by pulling the PROGRAM pin Low
  - No partial configuration

Readback
- Configuration data can be read back serially
  - Allows verification of programming
- Readback data can include user-register values
  - Allows in-circuit functional verification
  - Requires the READBACK symbol
[Figure: READBACK symbol with CLK, TRIG, DATA, and RIP pins]

Boundary Scan
- IEEE 1149.1-compatible boundary scan (JTAG)
- Available before configuration
- Configuration and readback possible via boundary-scan logic

Power Consumption
- CMOS SRAM technology provides low standby power
- Operating power is mostly dynamic
  - Proportional to the transition frequency of internal nodes
  - Xilinx segmented interconnect minimizes the amount of metal capacitance to switch, minimizing power
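
The first-order CMOS dynamic-power relation, P = activity x C x Vdd^2 x f, makes both points concrete: power scales with how often nodes switch and with how much capacitance switches. The sketch below evaluates it with hypothetical values (100 pF switched, activity 0.125, 5 V, 50 MHz); it is a textbook estimate, not a Xilinx power model.

```python
def dynamic_power_w(c_switched_f: float, activity: float,
                    vdd_v: float, f_clk_hz: float) -> float:
    """First-order CMOS dynamic power: activity * C * Vdd^2 * f (watts).

    activity: average fraction of clock cycles in which the capacitance switches.
    """
    return activity * c_switched_f * vdd_v ** 2 * f_clk_hz


# Hypothetical operating point.
p = dynamic_power_w(100e-12, 0.125, 5.0, 50e6)
p_half_c = dynamic_power_w(50e-12, 0.125, 5.0, 50e6)
print(f"{p * 1e3:.3f} mW vs {p_half_c * 1e3:.3f} mW with half the capacitance")
```

Halving the switched metal capacitance halves the dynamic power, which is the benefit claimed for segmented interconnect.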