Programmable Logic Training Course FPGA Combinatorial Logic Xilinx
Programmable Logic Training Course FPGA Combinatorial Logic
Xilinx FPGA Architecture • I/O Blocks (IOBs) • Programmable Interconnect • Configurable Logic Blocks (CLBs)
Configurable Logic Blocks • The Configurable Logic Block is the workhorse of the FPGA • Each CLB contains Look-Up Tables and Flip-Flops • Combinatorial Logic is implemented in Look-Up Tables (LUTs), also called Function Generators • Each CLB is associated with two Three-State Buffers – The buffers are just outside the CLB
Look-Up Table Review • Combinatorial Logic is stored in Look-Up Tables (LUTs) – Look-Up Tables are also known as Function Generators • Example: [figure: combinatorial logic with inputs A, B, C, D and output Z, next to the Look-Up Table that stores Z for each of the 16 input combinations, 0000 through 1111] • Capacity is limited by the number of inputs, not by the complexity of the function
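A LUT does not evaluate gates; it stores the function's output for every input combination and uses the inputs as an address. The Python sketch below models a 4-input LUT as a 16-entry list. The helper names `make_lut` and `lut_read`, and the choice of A as the most significant address bit, are assumptions for illustration, not part of any Xilinx tool.

```python
from itertools import product
from typing import Callable, List

def make_lut(f: Callable[[int, int, int, int], int]) -> List[int]:
    """Precompute the 16-entry truth table of any 4-input Boolean function."""
    # Address order (assumed): A is the MSB, D is the LSB.
    return [f(a, b, c, d) & 1 for a, b, c, d in product((0, 1), repeat=4)]

def lut_read(table: List[int], a: int, b: int, c: int, d: int) -> int:
    """Look up the output using the four inputs as a 4-bit address."""
    return table[(a << 3) | (b << 2) | (c << 1) | d]

# A simple function and a more complex one...
simple = make_lut(lambda a, b, c, d: a & b)
messy = make_lut(lambda a, b, c, d: (a ^ (b & (1 - c))) | (c & d & (1 - a)))

# ...occupy the same 16 bits, and each evaluation is a single lookup.
print(len(simple), len(messy))  # 16 16
```

Adding gates to the function changes only the stored bits, never the size of the table, which is why capacity is set by the number of inputs.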
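9-Input AND Example • How is a 9-input AND gate implemented in a CLB? – The figure shows the mapping process in three stages [figure: three stages mapping the 9-input AND gate into one CLB with output O] – In an XC4000-style CLB, the F and G function generators each AND four inputs, and the 3-input H function generator ANDs their outputs with the ninth input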
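The decomposition can be checked exhaustively. This Python sketch assumes an XC4000-style CLB with two 4-input function generators (F and G) and a 3-input H function generator that combines their outputs with one more input; `and9_clb` is a hypothetical model, not a tool output.

```python
from itertools import product
from typing import Sequence

def and9_reference(x: Sequence[int]) -> int:
    """The 9-input AND gate being mapped."""
    return int(all(x))

def and9_clb(x: Sequence[int]) -> int:
    f = x[0] & x[1] & x[2] & x[3]   # F function generator: inputs 1-4
    g = x[4] & x[5] & x[6] & x[7]   # G function generator: inputs 5-8
    return f & g & x[8]             # H function generator: F, G, input 9

# Compare both versions on all 2**9 = 512 input combinations.
mismatches = [x for x in product((0, 1), repeat=9) if and9_clb(x) != and9_reference(x)]
print(len(mismatches))  # 0
```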
4-to-1 MUX Example (1) • Consider the implementation of a 4-to-1 MUX with Enable: – Three Function Generators (LUTs) are used [figure: A. 4:1 MUX with data inputs D0-D3, select inputs S0 and S1, enable E, and output O; B. the same 4:1 MUX, restructured to fit into one CLB]
4-to-1 MUX Example (2) [figure: C. the 4:1 MUX implemented in one CLB, using the F, G, and H function generators] • Question: Can an 8:1 MUX be implemented in one CLB?
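The same kind of exhaustive check works for the 4:1 MUX with enable. This Python sketch assumes one particular grouping, with F handling D0 and D1, G handling D2 and D3, and H choosing between them with S1 (the figure may group the data inputs differently); with this split no function generator needs more than four inputs. The function names are hypothetical.

```python
from itertools import product
from typing import Sequence

def mux4_reference(d: Sequence[int], s1: int, s0: int, e: int) -> int:
    """4:1 MUX with enable: O = D[2*S1 + S0] when E = 1, else 0."""
    return d[2 * s1 + s0] if e else 0

def mux4_clb(d: Sequence[int], s1: int, s0: int, e: int) -> int:
    f = e & (d[1] if s0 else d[0])   # F: E, S0, D0, D1 (4 inputs)
    g = e & (d[3] if s0 else d[2])   # G: E, S0, D2, D3 (4 inputs)
    return g if s1 else f            # H: F, G, S1 (3 inputs)

ok = all(
    mux4_clb(d, s1, s0, e) == mux4_reference(d, s1, s0, e)
    for *d, s1, s0, e in product((0, 1), repeat=7)
)
print(ok)  # True
```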
Three-State Buffers • Each CLB is associated with two Three-State Buffers (BUFTs) – BUFTs are used independently of LUTs and Flip-Flops • Three-State library components: – Three-State Buffers: BUFT, BUFT4, BUFT8, BUFT16 – Wired-AND (open drain): WAND1, WAND4, WAND8, WAND16 – Two-input OR driving Wired-AND: WOR2AND • Delay varies by device – 3.7 ns in the XC4005XL (-1) – 13.6 ns in the XC4085XL (-1)
BUFTs for Multiplexers • BUFTs can be used to build large MUXes – Wide MUXes built from LUTs need multiple levels of logic – Wide MUXes built from BUFTs have only one level of logic * CLB resources are not used • Xilinx library BUFT components – BUFT, BUFT4, BUFT8, BUFT16 • Note: Multiplexer macros use Look-Up Tables – Example: M4_1E
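A behavioral Python sketch of a BUFT-based multiplexer, assuming one BUFT per data input, each enabled by a one-hot decode of the select value. Enable polarity is abstracted away, `None` stands for an undriven line, and the helper names are hypothetical.

```python
from typing import List, Optional

def buft_line(data: List[int], enables: List[bool]) -> Optional[int]:
    """Resolve a long line driven by several BUFTs; None means undriven."""
    drivers = [d for d, en in zip(data, enables) if en]
    if len(drivers) > 1:
        # Two enabled BUFTs would fight over the line; a one-hot decode prevents it.
        raise ValueError("bus contention: more than one BUFT enabled")
    return drivers[0] if drivers else None

def wide_mux(data: List[int], sel: int) -> Optional[int]:
    """N:1 MUX: decode sel one-hot, so exactly one BUFT drives the line.
    Each data bit passes through a single BUFT, however wide the MUX is."""
    enables = [i == sel for i in range(len(data))]
    return buft_line(data, enables)

data = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1]   # 16 inputs
print([wide_mux(data, s) for s in range(16)] == data)  # True
```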
Carry Logic • Each CLB contains dedicated arithmetic logic for fast carry and borrow signals – Carry logic is associated with the F and G function generators • Carry logic components have a vertical orientation – Needed for speed and utilization – Known as RPMs, or “Relationally Placed Macros” – Examples: * ADDx adders * ADSUx adder/subtractors * CCx counters * COMPMCx magnitude comparators [figure: ADD4 adder with inputs A<3:0> and B<3:0> and output Z<3:0>]
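Each sum bit depends on the two operand bits and the carry from the bit below, so the carry has to travel from bit 0 upward; the vertical orientation of the carry logic follows that path. This Python sketch models only the arithmetic of a 4-bit adder such as ADD4, not its circuit; `add4` and its carry-in parameter are assumptions.

```python
from typing import Tuple

def add4(a: int, b: int, cin: int = 0) -> Tuple[int, int]:
    """Arithmetic of a 4-bit ripple-carry adder: returns (Z<3:0>, carry out)."""
    z, carry = 0, cin
    for i in range(4):                  # bit 0 first: the carry moves up the chain
        ai, bi = (a >> i) & 1, (b >> i) & 1
        z |= (ai ^ bi ^ carry) << i     # sum bit
        carry = (ai & bi) | (carry & (ai ^ bi))  # carry into the next bit
    return z, carry

print(add4(0b1011, 0b0110))  # (1, 1): 11 + 6 = 17 = 0b1_0001
```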
Programmable Logic Training Course FPGA Registers
CLB Registers • Each register can be configured as a Flip-Flop or Latch • Independent clock (K) polarity • Asynchronous Preset or Clear • Synchronous Set or Reset • Clock Enable (EC) • Direct input from CLB input DIN (connections bypass the LUTs) [figure: two CLB registers with outputs QX and QY; each D input is selected from DIN, F, G, or H, and each register has S/R control, SET/RESET, and clock enable EC]
Set and Reset Capabilities • CLB Flip-Flop features include Asynchronous Preset/Clear or Synchronous Set/Reset – Synchronous Set/Reset is implemented in the LUT – Asynchronous Clear/Preset has two sources * Dedicated Global Set/Reset (GSR) net * Local Asynchronous Preset/Clear [figure: D and the synchronous Set/Reset feed a LUT that drives the Flip-Flop's D input; an FDC Flip-Flop in the CLB is cleared by the local asynchronous Preset/Clear or by GSR]
Global Reset • All Flip-Flops are always initialized during power-up – Via the Global Set/Reset (GSR) network • Access the Global Set/Reset network by instantiating the STARTUP primitive – Assert GSR for global set or reset – GSR is automatically connected to all CLB Flip-Flops using dedicated routing resources – This saves general-purpose routing resources for the design – DO NOT CONNECT GSR to set/reset inputs on Flip-Flops [figure: STARTUP primitive with inputs GSR, GTS, and CLK and outputs Q2, Q3, Q1Q4, and DONEIN]
Specifying the Asynchronous Initialization Value • Instantiation: library components specify asynchronous initialization – FDC (Clear) will be reset * FD also initializes this way by default – FDP (Preset) will be set • Behavioral synthesis designs: consult the synthesis interface guide • Applies to both global and local asynchronous reset
Flip-Flop Clock Enable • Dedicated CE pin – Both Flip-Flops in a CLB must use the same clock enable – Flip-Flops with Asynchronous Clear or Preset – Examples: FDCE, FDCPE • CE implemented in the Look-Up Table – Flip-Flops in a CLB can have separate clock enables – Flip-Flops with Synchronous Set or Reset – Examples: FDRE, FDSRE • Comparison of the two types of Flip-Flops: – The dedicated version is faster and leaves more of the LUT (Look-Up Table) free for other logic – The LUT version allows higher utilization of an FPGA with many clock enables
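When CE and a synchronous reset are implemented in the LUT, the Flip-Flop itself is a plain D Flip-Flop and the LUT computes its next state, with Q fed back as a LUT input. This Python sketch assumes FDRE-style behavior in which R overrides CE; `fdre_next` and `lut_d_input` are hypothetical helpers. Note that CE, R, and the fed-back Q use three of a 4-input LUT's inputs, which is the cost the comparison above refers to.

```python
from itertools import product

def fdre_next(d: int, ce: int, r: int, q: int) -> int:
    """Reference behavior: synchronous reset R wins, then CE, else hold Q."""
    if r:
        return 0
    return d if ce else q

def lut_d_input(d: int, ce: int, r: int, q: int) -> int:
    """The same next state written as the single Boolean function a LUT
    would store, with the Flip-Flop output Q fed back as a LUT input."""
    return (1 - r) & ((ce & d) | ((1 - ce) & q))

assert all(lut_d_input(*v) == fdre_next(*v) for v in product((0, 1), repeat=4))
print("LUT function matches FDRE behavior")
```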
Global Clock Buffers • Clock Buffers are low-skew, high-drive buffers – Also known as Global Buffers – Drive low-skew, high-speed long line resources – Drive all Flip-Flops and Latches in the FPGA – Can also be used for high-fanout non-clock signals • Instantiation: if the BUFG component is instantiated, the software selects one of these buffers based on the design • Synthesis: clocks are identified by different means depending on the vendor – Example: Synopsys FPGA Compiler inserts clock buffers on the nets that drive clock pins * Control clock buffer insertion with separate commands * Consult the synthesis interface guide or the vendor
Generating Clock On-Chip • The internal configuration clock is available after configuration – Use the OSC4 primitive, with outputs F8M, F500K, F16K, F490, and F15 [figure: an OSC4 output driving a BUFGS global buffer] – Nominal values (approximate): * 8 MHz, 500 kHz, 16 kHz, 490 Hz, 15 Hz
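The nominal frequencies line up with a divide chain from the 8 MHz output: dividing by 16 gives 500 kHz, and each further divide by 32 gives about 15.6 kHz, 488 Hz, and 15.3 Hz. This Python sketch only checks that arithmetic and prints the corresponding periods; whether OSC4 is actually built this way is an assumption, and the real frequencies are only approximate.

```python
NOMINAL_HZ = {"F8M": 8e6, "F500K": 500e3, "F16K": 16e3, "F490": 490.0, "F15": 15.0}

def divider_chain(base_hz: float = 8e6) -> list:
    """Hypothetical chain: divide by 16, then by 32 at each further stage."""
    freqs = [base_hz, base_hz / 16]
    for _ in range(3):
        freqs.append(freqs[-1] / 32)
    return freqs

for (name, nominal), f in zip(NOMINAL_HZ.items(), divider_chain()):
    period_ms = 1e3 / f
    print(f"{name}: chain gives {f:,.2f} Hz (nominal {nominal:,.0f} Hz), period {period_ms:.4g} ms")
```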
Flip-Flop and Latch Components • Standard and LogiBLOX libraries contain a wide variety of Flip-Flops and Latches – D and JK Flip-Flops – Rising- and falling-edge triggered – Clock Enable – Asynchronous Preset and Clear – Synchronous Set and Reset
Naming Conventions • Flip-Flop example FDPE_1: F = Flip-Flop; D = D-Type (JK = JK-Type, T = Toggle-Type); P = Asynchronous Preset (C = Asynchronous Clear, S = Synchronous Set, R = Synchronous Reset); E = Clock Enable; _1 = Inverted Clock • Latch example LDCE_1: LD = Transparent D Latch; C = Asynchronous Clear (P = Asynchronous Preset); E = Gate Enable; _1 = Inverted Gate • Sized example FD16RE: FD = Flip-Flop, D-Type; 16 = Size; R = Synchronous Reset; E = Clock Enable
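The naming rules can be captured in a small decoder. This Python sketch covers only the letters listed on this slide; the prefix spellings FJK and FT for JK and toggle Flip-Flops, and the `decode` helper itself, are assumptions.

```python
import re

FLAGS = {
    "P": "asynchronous preset",
    "C": "asynchronous clear",
    "S": "synchronous set",
    "R": "synchronous reset",
    "E": "clock enable",
}
KINDS = {
    "FD": "D Flip-Flop",
    "FJK": "JK Flip-Flop",       # assumed prefix
    "FT": "Toggle Flip-Flop",    # assumed prefix
    "LD": "transparent D latch",
}

def decode(name: str) -> dict:
    """Split a library component name into type, size, and features."""
    m = re.fullmatch(r"(FD|FJK|FT|LD)(\d*)([PCSRE]*)(_1)?", name)
    if m is None:
        raise ValueError(f"unrecognized component name: {name}")
    kind, size, flags, inverted = m.groups()
    latch = kind == "LD"
    features = [FLAGS[ch] for ch in flags]
    if latch:
        # On a latch, E is a gate enable and _1 inverts the gate.
        features = ["gate enable" if f == "clock enable" else f for f in features]
    return {
        "type": KINDS[kind],
        "size": int(size) if size else 1,
        "features": features,
        "inverted": ("gate" if latch else "clock") if inverted else None,
    }

for name in ["FDPE_1", "LDCE_1", "FD16RE"]:
    print(name, decode(name))
```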
Shift Register & Counter • A variety of shift registers are available – Example: SR8RLED = Shift Register, 8 bits, Synchronous Reset, Load, Enable, Bi-directional • Shift register function is determined by its inputs – Example: add a clock enable using the EC pin • Libraries support a wide variety of fast and efficient counters – Counters offer trade-offs between speed, utilization, and complexity – Example: LogiBLOX counter styles * Binary: predictable outputs, uses carry logic * Johnson: fastest practical counter, but uses more Flip-Flops than binary * LFSR: fast & dense, but pseudo-random outputs * One-Hot: useful for generating a series of enables * Carry Chain: high speed and utilization – Most synthesis tools select a component based on the design, or the designer can instantiate a component using LogiBLOX.
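A Johnson counter is a shift register whose inverted last bit is fed back to its first bit. With n Flip-Flops it cycles through only 2n states, where a binary counter would reach 2**n, but there is no carry to ripple and only one bit changes per clock. A minimal Python sketch; `johnson_sequence` is a hypothetical helper.

```python
from typing import List

def johnson_sequence(n_bits: int) -> List[int]:
    """States of an n-bit Johnson (twisted-ring) counter starting from 0."""
    mask = (1 << n_bits) - 1
    state, seen = 0, []
    while state not in seen:
        seen.append(state)
        msb = (state >> (n_bits - 1)) & 1
        # Shift left and feed the inverted MSB back into bit 0.
        state = ((state << 1) & mask) | (msb ^ 1)
    return seen

print([format(s, "04b") for s in johnson_sequence(4)])
# ['0000', '0001', '0011', '0111', '1111', '1110', '1100', '1000']
```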
16-Bit Counter Examples • The following are implemented in an XC4000XL-3: – Macro: CB16CLE/D, CC16CLE, X-BLOX: LFSR – CLBs: 18-20, 19, 9, 9 – Clock: 23-24 ns, 19 ns, 16 ns, 7 ns • Simpler functions are faster and smaller • Carry logic counters are generally faster (depends on size)
Pipeline for Speed • Use synchronous design • Pipelining improves speed – Consider it wherever latency is not an issue – Use for terminal counts, carry lookahead, etc. • How to estimate the clock period – Example for a 50 MHz clock frequency in an XC4000XL-3: * The clock period is 20 ns * One level (tCO + tNET + tSU) takes 8 ns, leaving a delay allowance of 12 ns * Each added level (tPD + tNET) takes 6 ns * Added levels of logic allowed: 12 ns / 6 ns = 2 CLBs [figure: path from a source CLB (tCO) through tNET, an intermediate CLB (tPD), and tNET to a destination CLB (tSU)]
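The estimate is a short calculation. This Python sketch takes the two aggregate delays from the example (8 ns for tCO + tNET + tSU and 6 ns per added tPD + tNET) as parameters rather than assuming individual timing values; `added_levels_allowed` is a hypothetical helper.

```python
import math

def added_levels_allowed(clock_mhz: float, first_level_ns: float, added_level_ns: float) -> int:
    """How many extra CLB levels fit between two registers at a given clock."""
    period_ns = 1000.0 / clock_mhz             # 50 MHz -> 20 ns
    allowance_ns = period_ns - first_level_ns  # time left after tCO + tNET + tSU
    # Each extra CLB on the path costs tPD + tNET; a partial level does not fit.
    return max(0, math.floor(allowance_ns / added_level_ns))

print(added_levels_allowed(50, 8, 6))  # 2
```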
Counter Tips • Do not use a binary sequence if it is unnecessary • Consider higher-performance or smaller counter types – Examples: LFSR, pre-scaled, Gray • Use pre-scaling on non-loadable counters to increase speed – LSBs toggle quickly • Use Gray code counters if decoding outputs – Glitch-free, since only one bit changes per transition • Consider a Linear Feedback Shift Register for speed when the terminal count is all that is needed – Or when any regular sequence is acceptable (e.g., FIFO) [figure: pre-scaled counter, in which a small, fast counter's TC drives the CE of a large, dense counter with slower carry; 10-bit shift register (Q0 to Q9) with feedback from Q6 and Q9]
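The 10-bit LFSR in the figure takes its feedback from Q6 and Q9. This Python sketch assumes XOR feedback into Q0 (an XNOR variant is also common; it locks up in the all-ones state instead of all-zeros) and counts how many clocks the register needs to return to its starting state. Only two taps feed back, so there is no carry chain, and a terminal count can be made by decoding any single state of the sequence. The function names are hypothetical.

```python
def lfsr10_step(state: int) -> int:
    """One clock of a 10-bit shift register with Q9 XOR Q6 fed back into Q0."""
    feedback = ((state >> 9) ^ (state >> 6)) & 1
    return ((state << 1) & 0x3FF) | feedback

def lfsr_period(seed: int = 1) -> int:
    """Clocks until the register returns to `seed` (seed must be nonzero)."""
    state, steps = lfsr10_step(seed), 1
    while state != seed:
        state, steps = lfsr10_step(state), steps + 1
    return steps

print(lfsr_period())  # 1023 (= 2**10 - 1, every state except all-zeros)
```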
State Machine Design Tips (1) • Use One-Hot Encoding for state machines – Shift-register-like structure – One Flip-Flop is assigned to each state – Works well in Xilinx “register-rich” FPGAs – The number of required Flip-Flops may be higher than with other encodings, but the logic to generate each state is less complex [figure: prototype One-Hot Encoded (OHE) state machine; terms gated by inputs I1 through In feed the D inputs of the Flip-Flops for states Qn+1 and Qn+2] • Qx, Qy, and Qz are composed of state variables from previous states
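In a one-hot machine, each state Flip-Flop's next value is an OR of terms, one per incoming transition, each being a previous state ANDed with its condition. This Python sketch uses a hypothetical three-state machine (IDLE, RUN, DONE, with inputs `start` and `done`) to show that structure; the states and transitions are illustrative, not from the slides.

```python
from typing import Callable, Dict, List, Tuple

Inputs = Dict[str, int]
# Hypothetical machine: (from_state, to_state, condition on the inputs)
TRANSITIONS: List[Tuple[str, str, Callable[[Inputs], int]]] = [
    ("IDLE", "IDLE", lambda i: 1 - i["start"]),
    ("IDLE", "RUN", lambda i: i["start"]),
    ("RUN", "RUN", lambda i: 1 - i["done"]),
    ("RUN", "DONE", lambda i: i["done"]),
    ("DONE", "IDLE", lambda i: 1),
]
STATES = ["IDLE", "RUN", "DONE"]

def next_state(q: Dict[str, int], inputs: Inputs) -> Dict[str, int]:
    """One Flip-Flop per state: D = OR of (previous state AND its condition)."""
    return {
        s: int(any(q[src] and cond(inputs) for src, dst, cond in TRANSITIONS if dst == s))
        for s in STATES
    }

q = {"IDLE": 1, "RUN": 0, "DONE": 0}   # reset state: exactly one bit set
for inputs in [{"start": 1, "done": 0}, {"start": 0, "done": 0},
               {"start": 0, "done": 1}, {"start": 0, "done": 0}]:
    q = next_state(q, inputs)
    print([s for s in STATES if q[s]])   # RUN, RUN, DONE, IDLE
```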
State Machine Design Tips (2) • Split complex states • In FPGAs, minimize the number of inputs per equation, not the number of Flip-Flops – Use One-Hot encoding for medium to large state machines (more than 12 states) • Complex states may be improved by breaking them into additional, simpler states [figure: State A with transition cond1 to State B, split into simpler states A1 and A2 before State B]
State Machine Design Tips (3) • Consider a pipeline: spread the work of a state over two or more clock cycles – Taking two clock cycles for a state is better than slowing the clock for the entire state machine – In practice, this means breaking up wide input equations using intermediate states in the state diagram [figure: a direct transition from State A to State C replaced by State A, then State B, then State C]
Avoid Gated Clocks and Asynchronous Resets • Move gating to a non-clock pin to prevent glitches from affecting the logic • Example: – Poor design: TC and Q may glitch during the transition of Q<0:2> from 011 to 100 [figure: binary counter clocked by CK, with outputs Q0, Q1, Q2 decoded into TC, which clocks a D Flip-Flop] – Improved design: TC will not glitch during the transition of Q<0:2> from 011 to 100 [figure: the same counter with a Carry-1 output; TC drives the CE pin of a D Flip-Flop clocked by CK]
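Why a decoded count can glitch: in the 011 to 100 transition all three counter bits change, and they never switch at exactly the same instant. This Python sketch assumes the bits change one at a time in an unknown order and lists the intermediate codes a decoder could momentarily see; TC is taken here as the all-ones count 111 of a 3-bit up counter, which is an assumption.

```python
from itertools import permutations
from typing import Set

def transient_states(old: int, new: int, n_bits: int = 3) -> Set[int]:
    """Intermediate values seen if the changing bits switch one at a time, in any order."""
    changing = [b for b in range(n_bits) if (old ^ new) >> b & 1]
    seen = set()
    for order in permutations(changing):
        value = old
        for b in order[:-1]:          # the last flip lands on `new`
            value ^= 1 << b
            seen.add(value)
    return seen

def tc(q: int) -> int:
    return int(q == 0b111)            # decoded terminal count (assumed 111)

glitches = sorted(q for q in transient_states(0b011, 0b100) if tc(q))
print(glitches)  # [7]: a momentary 111 pulses TC
```

A pulse like this on a clock pin is a real clock edge; on a CE pin it is harmless as long as it settles before the next clock edge.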
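Use Clock Enables (FDxE) • Use the dedicated clock enable when most or all of the LUT's logic inputs are in use – Avoid gating the clock signal directly [figure: FDxE Flip-Flop with D, CE, and Q] • Use MUXed data when only one or two logic inputs are in use, or for a gated clock enable – Or when two different clock enables must drive Flip-Flops in one CLB [figure: Flip-Flop whose D input is MUXed between new data and Q under control of CE]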
Programmable Logic Training Course FPGA Memory
ROM and RAM Components • RAM and ROM components are available in the Xilinx libraries – ROM components are stored in LUTs (Look-Up Tables), like standard logic – It may be easier to define logic in LUT format than in gate format • FPGA Look-Up Tables are blocks of RAM – Data is written during configuration – Data can also be read and written after configuration (when used as RAM) • Example: the gate description O = I1*I2 (a two-input AND) is equivalent to a ROM description with address inputs A0 and A1 and DATA(0)=0, DATA(1)=0, DATA(2)=0, DATA(3)=1
LUT Provides 16X the Storage of Flip-Flops • 32 bits versus 2 bits of storage per CLB – Two 16 x 1 RAMs or one 32 x 1 single-port RAM fit in one CLB – One 16 x 1 dual-port RAM fits in one CLB [figure: one CLB used as a 32-bit RAM (data D1, address A0-A4, WE) versus the same CLB's two Flip-Flops holding 2 bits] • A 32 x 8 shift register built with RAM takes 11 CLBs – Using Flip-Flops, it takes 128 CLBs for the data alone
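The CLB counts follow from the storage per CLB: 32 x 8 = 256 bits need 256 / 2 = 128 CLBs of Flip-Flops, but only 256 / 32 = 8 CLBs of RAM plus some control logic. This Python sketch models the RAM version as a circular buffer: on each clock it reads the oldest word at the current address, writes the new word there, and advances the address, so each word comes out 32 clocks later. The class name and the read-before-write timing are assumptions of this behavioral model, not a description of a library macro.

```python
from typing import List

class RamShiftRegister:
    """Behavioral model of a RAM-based delay line (hypothetical names)."""
    def __init__(self, depth: int = 32) -> None:
        # A list entry holds a whole word here; in hardware each bit of the
        # word gets its own 32 x 1 RAM, all sharing one address.
        self.mem: List[int] = [0] * depth
        self.addr = 0                       # free-running address counter
        self.depth = depth

    def clock(self, din: int) -> int:
        dout = self.mem[self.addr]          # read the oldest entry
        self.mem[self.addr] = din           # overwrite it with the newest
        self.addr = (self.addr + 1) % self.depth
        return dout

sr = RamShiftRegister()
outs = [sr.clock(x) for x in range(40)]
print(outs[32:35])  # [0, 1, 2]: each input reappears 32 clocks later
```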
RAM Guidelines • 32 words or fewer gives the fastest performance – A 32 x 1 or 16 x 2 RAM fits in one CLB * Delays are short (one level of logic) – Data and output MUXes are required to expand depth • Fewer than 256 words per RAM is recommended – Use external memory for 256 words or more • Width is easily expanded – Connect the address lines to multiple blocks • Recommendation: use less than half of the maximum memory resources – Maximum memory uses all the logic resources of the CLBs
128 X 1 RAM Example • An implementation of a synchronous single-port RAM is shown here – Available in the LogiBLOX and Unified libraries • For a 128 X 1 RAM, a 4-to-1 MUX is used on the output [figure: four 32 X 1 RAMs share data D and address A<4:0>; a decoder on A<6:5> and WE generates WE0-WE3, and a 4:1 MUX selected by A<6:5> produces DOUT]
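A behavioral Python sketch of the same structure: A<4:0> goes to all four 32 X 1 blocks, A<6:5> picks which block's write enable is asserted, and the same two bits drive the 4:1 output MUX. The class names are hypothetical, and the model ignores write timing.

```python
from typing import List, Tuple

class Ram32x1:
    """One 32 x 1 LUT RAM."""
    def __init__(self) -> None:
        self.bits: List[int] = [0] * 32

    def read(self, addr: int) -> int:
        return self.bits[addr]

    def write(self, addr: int, d: int) -> None:
        self.bits[addr] = d & 1

class Ram128x1:
    """128 x 1 RAM: four 32 x 1 blocks, a WE decoder, and a 4:1 output MUX."""
    def __init__(self) -> None:
        self.blocks = [Ram32x1() for _ in range(4)]

    @staticmethod
    def split(addr: int) -> Tuple[int, int]:
        return addr & 0x1F, (addr >> 5) & 0x3   # A<4:0>, A<6:5>

    def write(self, addr: int, d: int) -> None:
        low, high = self.split(addr)
        for i, block in enumerate(self.blocks):
            if i == high:                      # decoder output WE0..WE3
                block.write(low, d)

    def read(self, addr: int) -> int:
        low, high = self.split(addr)
        outs = [block.read(low) for block in self.blocks]  # every block sees A<4:0>
        return outs[high]                                  # 4:1 MUX on A<6:5>

ram = Ram128x1()
ram.write(0b1000101, 1)          # A<6:5> = 10 selects block 2, A<4:0> = 5
print(ram.blocks[2].bits[5], ram.read(0b1000101), ram.read(0b0000101))  # 1 1 0
```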
128 X 1 RAM with Tri-State • Three-State Buffers multiplex the outputs of four 32 X 1 RAMs – This RAM is not available in the libraries [figure: four 32 X 1 RAMs share D and A<4:0>, with write enables WE0-WE3 decoded from A<6:5>; each RAM output drives DOUT through a BUFT]
How to Generate Memory • Use the LogiBLOX utility to create RAM or ROM of arbitrary size – Select type: ROM, synchronous RAM, asynchronous RAM, or dual-port RAM – Specify depth: the number of words must be a multiple of 16, from 16 to 256 words – Specify width: word size ranges from 1 to 64 bits – Specify initialization values with a MEM file • LogiBLOX also creates the RAM interface – Entity and component declarations to cut and paste into the design (VHDL designs) – Module declaration (Verilog designs) – Symbol graphic (schematic entry designs)
Initializing Memory (1) • LogiBLOX can create a template memory – The MEM file must be edited to specify the correct values • The MEM file has two sections: – The header section describes the size and the default initialization value * Values listed use the header radix – The data section defines initialization values for each word * Values listed use the data radix * Values can be listed in order, or each address and its data can be specified * Words may be defined using ASCII strings * Comments are preceded by semicolons
Initializing Memory (2)
• Example MEM file:
    ...
    ; Header Section
    RADIX 16
    DEPTH 8
    WIDTH 4
    Default 5     ; unspecified words are initialized to 5
    ...
    ; Data Section
    RADIX 16
    DATA
    6, A          ; values for word 0 and word 1
    4: 2, 7       ; values for word 4 and word 5
    7: 3          ; value for word 7
• What is the size of the memory?
• List the value of each word after initialization.
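A minimal Python sketch of the MEM format as described on these two slides, useful for checking an edited file before generating the memory. It assumes keywords are case-insensitive, the argument of RADIX is written in decimal, each RADIX applies to the lines after it, and an `address:` prefix moves the write pointer. `parse_mem` is hypothetical and is not part of LogiBLOX; ASCII-string words are not handled.

```python
MEM_TEXT = """\
; Header Section
RADIX 16
DEPTH 8
WIDTH 4
Default 5 ; unspecified words are initialized to 5
; Data Section
RADIX 16
DATA
6, A ; values for word 0 and word 1
4: 2, 7 ; values for word 4 and word 5
7: 3 ; value for word 7
"""

def parse_mem(text: str):
    """Return (depth, width, words) for MEM text like MEM_TEXT."""
    radix = 10                  # assumed until the first RADIX line
    depth = width = None
    default = 0
    words = None
    addr = 0
    in_data = False
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()       # drop comments
        if not line:
            continue
        if not in_data:
            key, _, rest = line.partition(" ")
            key = key.upper()
            if key == "RADIX":
                radix = int(rest)                  # the radix itself is decimal
            elif key == "DEPTH":
                depth = int(rest, radix)
            elif key == "WIDTH":
                width = int(rest, radix)
            elif key == "DEFAULT":
                default = int(rest, radix)
            elif key == "DATA":
                in_data = True
                words = [default] * depth
                line = rest                        # values may follow DATA directly
            else:
                raise ValueError(f"unknown header keyword: {key}")
        if in_data:
            for item in line.split(","):
                item = item.strip()
                if not item:
                    continue
                if ":" in item:                    # "address: value" moves the pointer
                    a, _, item = item.partition(":")
                    addr = int(a, radix)
                value = int(item, radix)
                if value >= 1 << width:
                    raise ValueError(f"value {item} does not fit in {width} bits")
                words[addr] = value
                addr += 1
    return depth, width, words

depth, width, words = parse_mem(MEM_TEXT)
print(depth * width, "bits:", [format(w, "X") for w in words])
```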
Programmable Logic Training Course FPGA I/O
IOB Block Diagram • Three-state output • Registered Input or Output • Bi-directional I/O • Output Slew Rate control • Programmable setup/hold delay
Instantiation of IO Blocks (1) • The user must explicitly define the resources to be used in the IOB • I/O components are defined with – One pad primitive – At least one function primitive * Buffer, Flip-Flop, or Latch • Input examples: [figure: IOB with IPAD IN1_PAD driving an IBUF; IOB with IPAD IN2_PAD driving an ILD input latch] • Recommendation: keep I/O at the top level of the design.
Instantiation of IO Blocks (2) • Output examples: [figure: IOB with an OBUF driving OUT1_PAD; IOB with an OFD output Flip-Flop driving OUT2_PAD] • Bi-directional example: OFDT contains a Three-State Buffer, which drives net B15 [figure: IOB with OFDT driving the IOPAD B15_PAD, and an IFD capturing the pad input]
Locking Down I/O Locations • Place (or lock) IOBs late in the design cycle – Pin locking makes routing more difficult • To define I/O pad locations, assign a location constraint to the IOB – Create a file named design.ucf in the project directory * Syntax: INST Q0_LO<4>_pad LOC = P59; – More information on location constraints will be discussed later
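When many pins are locked at once, generating the constraint lines from a table keeps the syntax consistent and catches a pin assigned twice. A minimal Python sketch; `ucf_loc_lines` is hypothetical, and it only writes the `INST ... LOC = ...;` form shown on this slide.

```python
from typing import Dict

def ucf_loc_lines(pin_map: Dict[str, str]) -> str:
    """Emit one LOC constraint per IOB instance, rejecting duplicate pins."""
    owner: Dict[str, str] = {}
    lines = []
    for inst, pin in pin_map.items():
        if pin in owner:
            raise ValueError(f"pin {pin} assigned to both {owner[pin]} and {inst}")
        owner[pin] = inst
        lines.append(f"INST {inst} LOC = {pin};")
    return "\n".join(lines)

# Q0_LO<4>_pad / P59 comes from the slide; the second entry is hypothetical.
print(ucf_loc_lines({"Q0_LO<4>_pad": "P59", "Q0_LO<5>_pad": "P60"}))
```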
Output Three-State Control • The output enable may be inverted – Use the OBUFE macro for an active-high enable – Use the OBUFT primitive for an active-low enable [figure: signal OE driving the E input of an OBUFE, and the T input of an OBUFT] • Three-state control is also available via a dedicated global net – Controlled by the same STARTUP primitive (GTS input) • All I/O are disabled during configuration
I/O Combinatorial Logic • OAND2 – Can be used as a generic two-input function generator or MUX – One input can be driven by the IOB output clock signal – Requires library components beginning with “O” – The F input pin is faster than the IO pin [figure: OAND2 with inputs F and IO driving an OPAD with FAST slew] • OMUX2 – A fast output signal (from the output clock pin) MUXes the IOB output or clock enable pins to the pad – Effectively doubles the number of device outputs without requiring a larger, more expensive package – Pin-to-pin delay is less than 6 ns [figure: OMUX2 with inputs D0 and D1, select S0, and output O driving an OPAD]
IOB Flip-Flops and Latches • IOB Flip-Flop/Latch features – Programmable clock polarity – Clock enable – Global Set or Reset – No asynchronous local Set/Reset • Flip-Flops and Latches can be used in unbonded IOBs • Use IOB Flip-Flops: – When all CLB Flip-Flops are used – To minimize the Flip-Flop-to-PAD delay – To minimize skew between outputs • IO Blocks contain minimal combinatorial logic – IOB Flip-Flops can be used as part of an internal shift register – Do not use IOB Flip-Flops as part of a pipeline
Use Pull-ups/Pull-downs to Prevent Floating • Unused IOBs: – Outputs of unused IOBs are automatically disabled – Pull-ups are automatically connected on unused IOBs • Used IOBs: – A PULLUP or PULLDOWN primitive can be connected to used IOBs – Inputs should not be left floating * Add a pull-up to design inputs that may be left floating to reduce power and noise
Slew Rate Control • Slew rate controls output speed • Two slew rates – The default slow slew rate reduces noise – Use the fast slew rate wherever speed is important – The fast slew rate is approximately twice as fast as the slow slew rate • Slew rate specification – Instantiation: in the user constraint file: * INST $1I87/obuf SLOW; – Synthesis: vendor dependent [figure: OBUF driving an OPAD with FAST slew]