Programmable Logic Devices
Ernest Jamro, Dept. of Electronics, AGH UST, Kraków, Poland
PLD as a Black Box
[Figure: inputs (logic variables) enter a block of logic gates and programmable switches, which produces the outputs (logic functions).]
Programmable Logic Devices (PLD)
• PLA/PAL/GAL: very simple functions, up to roughly 30 input/output pins (EPROM/EEPROM based)
• CPLD (complex PLD): incorporates many PAL/GAL structures (EEPROM based); medium-scale logic (20 to 200 input/output pins)
• FPGA (Field Programmable Gate Array): large-scale PLD (50 to 2000 pins), SRAM based (requires configuration after power-on); System on a Chip (incorporates microprocessors, memory blocks, etc.)
Programmable Logic Array (PLA)
– The connections in the AND plane are programmable
– The connections in the OR plane are programmable
[Figure: inputs x1, …, xn pass through input buffers and inverters into the AND plane, which forms product terms P1, …, Pk; the OR plane combines them into outputs f1, …, fm.]
Gate Level Version of PLA
$f_1 = x_1x_2 + x_1x_3' + x_1'x_2'x_3$
$f_2 = x_1x_2 + x_1'x_2'x_3 + x_1x_3$
[Figure: inputs x1, x2, x3 feed the AND plane through programmable connections, forming product terms P1 to P4; the OR plane combines them into f1 and f2.]
Customary Schematic of a PLA
$f_1 = x_1x_2 + x_1x_3' + x_1'x_2'x_3$
$f_2 = x_1x_2 + x_1'x_2'x_3 + x_1x_3$
[Figure: the same PLA in simplified notation; an x marks each connection left in place after programming.]
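The two PLA outputs above can be checked with a small behavioral model. This is a sketch, not a vendor format: the dictionaries `AND_PLANE` and `OR_PLANE` are a hypothetical encoding of the connections left in place, and the assignment P1 = x1x2, P2 = x1x3', P3 = x1'x2'x3, P4 = x1x3 is read off the equations rather than the figure. Product terms P1 and P3 are shared by both outputs, which only a programmable OR plane allows.

```python
from itertools import product

# Hypothetical encoding of the programmed AND plane: each product term maps
# an input index (0 = x1, 1 = x2, 2 = x3) to the literal it uses
# (1 = true literal, 0 = complemented literal).
AND_PLANE = {
    "P1": {0: 1, 1: 1},        # x1 x2
    "P2": {0: 1, 2: 0},        # x1 x3'
    "P3": {0: 0, 1: 0, 2: 1},  # x1' x2' x3
    "P4": {0: 1, 2: 1},        # x1 x3
}

# Programmed OR plane: each output ORs a chosen subset of product terms.
OR_PLANE = {
    "f1": ["P1", "P2", "P3"],
    "f2": ["P1", "P3", "P4"],
}


def eval_pla(x):
    """Return {output name: 0/1} for the input tuple x = (x1, x2, x3)."""
    products = {
        name: all(x[i] == value for i, value in literals.items())
        for name, literals in AND_PLANE.items()
    }
    return {out: int(any(products[p] for p in terms))
            for out, terms in OR_PLANE.items()}


if __name__ == "__main__":
    for x in product((0, 1), repeat=3):
        print(x, eval_pla(x))
```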
AND Plane Implementation with Floating Gate Transistors
Programmable Array Logic (PAL/GAL)
– The connections in the AND plane are programmable
– The connections in the OR plane are NOT programmable (fixed connections)
– PAL: one-time programmable (like a fuse-programmed PROM), http://en.wikipedia.org/wiki/Programmable_Array_Logic
– GAL (Generic Array Logic): erasable and re-programmable (like EEPROM)
[Figure: inputs x1, …, xn pass through input buffers and inverters into the programmable AND plane (product terms P1, …, Pk); a fixed OR plane produces outputs f1, …, fm.]
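Example Schematic of a PAL/GAL
$f_1 = x_1x_2x_3' + x_1'x_2x_3$
$f_2 = x_1'x_2' + x_1x_2x_3$
[Figure: inputs x1, x2, x3; AND plane with product terms P1 to P4; P1 and P2 are hard-wired to the OR gate for f1, P3 and P4 to the OR gate for f2.]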
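Macrocell PAL
[Figure: the OR gate from the PAL drives a D flip-flop clocked by Clock; a 2:1 multiplexer controlled by Select chooses the OR-gate output (0) or the flip-flop output Q (1); a tri-state buffer controlled by Enable drives output f1, which is also fed back to the AND plane.]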
Macrocell Functions
– Enable = 0 allows the output pin for f1 to be used as an additional input pin to the PAL
– Enable = 1, Select = 0 is normal for typical PAL operation
– Enable = Select = 1 allows the PAL to synchronize the output changes with a clock pulse
– The feedback to the AND plane provides for multilevel design
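A behavioral sketch of the macrocell modes listed above. It assumes, from the figure, that the feedback to the AND plane is taken from the output pin, so that with Enable = 0 it carries whatever an external source drives on the pin. The class and method names are illustrative only.

```python
from typing import Optional


class Macrocell:
    """Sketch of a PAL macrocell: D flip-flop, 2:1 multiplexer controlled
    by Select, and a tri-state output buffer controlled by Enable."""

    def __init__(self) -> None:
        self.q = 0  # state of the D flip-flop

    def clock(self, or_out: int) -> None:
        """Clock edge: the flip-flop stores the output of the OR gate."""
        self.q = or_out

    def output(self, or_out: int, select: int, enable: int) -> Optional[int]:
        """Value driven on pin f1, or None when the buffer is high-impedance."""
        if not enable:
            return None  # pin released: it can serve as an extra input
        return self.q if select else or_out  # Select = 1: registered output

    def feedback(self, or_out: int, select: int, enable: int,
                 external_pin: int = 0) -> int:
        """Signal returned to the AND plane (assumed to come from the pin)."""
        driven = self.output(or_out, select, enable)
        return external_pin if driven is None else driven
```

With Enable = Select = 1 the pin changes only after `clock()`, which is the synchronous mode described on the slide.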
Multi-Level Design with PALs/GALs
$f = A'BC + A'B'C' + AB'C + ABC' = A'g + Ag'$, where $g = BC + B'C'$ and C = h
[Figure: PAL macrocells configured as Sel = 0, En = 0 (pin h used as an input); Sel = 0, En = 1 (generating g, fed back to the AND plane); and a third macrocell generating f.]
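Expanding $A'g + Ag'$ with $g = BC + B'C'$ gives exactly four minterms: $A'BC$, $A'B'C'$, $AB'C$ and $ABC'$. The sketch below checks the two-level and multi-level forms against each other on all eight input combinations; the function names are hypothetical.

```python
from itertools import product


def f_two_level(a, b, c):
    # f = A'BC + A'B'C' + AB'C + ABC'
    return bool((not a and b and c) or (not a and not b and not c)
                or (a and not b and c) or (a and b and not c))


def f_multi_level(a, b, c):
    g = bool((b and c) or (not b and not c))     # first level: g = BC + B'C'
    return bool((not a and g) or (a and not g))  # second level: f = A'g + Ag'


def equivalent():
    """True if both forms agree on every input combination."""
    return all(f_two_level(*v) == f_multi_level(*v)
               for v in product((0, 1), repeat=3))


if __name__ == "__main__":
    print("equivalent:", equivalent())
```

Because g is produced by one macrocell and fed back to the AND plane, each level needs only two product terms.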
CPLD – Complex Programmable Logic Devices (CPLD)
– SPLDs (simple PLDs: PLA, PAL) are limited in size due to the small number of input and output pins and the limited number of product terms
  • Combined number of inputs + outputs < 32 or so
– CPLDs contain multiple circuit blocks on a single chip
  • Each block is like a PAL: a PAL-like block
  • Connections between PAL-like blocks are provided by a programmable interconnection network
  • Each block is connected to an I/O block as well
Structure of a CPLD
[Figure: several PAL-like blocks, each attached to an I/O block, connected by interconnection wires.]
Internal Structure of a PAL-like Block
– Includes macrocells
  • Usually about 16 in each PAL-like block
– Fixed OR planes
  • OR gates have fan-in between 5 and 20
– XOR gates provide negation ability
  • XOR has a control input
[Figure: PAL-like block with D flip-flops in its macrocells.]
Programming a CPLD
• A CPLD is erasable and re-programmable, like EEPROM
• CPLDs have many pins; large ones have > 200
• Removing a CPLD from a PCB is difficult without breaking the pins
• Use ISP (in-system programming) to program the CPLD
• A JTAG (Joint Test Action Group) port is used to connect the CPLD to a computer
FPGA Principles
• A Field-Programmable Gate Array (FPGA) is an integrated circuit that can be configured by the user to emulate any digital circuit, as long as there are enough resources
• An FPGA can be seen as an array of Configurable Logic Blocks (CLBs) connected through programmable interconnect (Switch Boxes)
• An FPGA is usually based on SRAM configuration memory; therefore it can be re-programmed very quickly, but it must be configured after each power-on
Copied from Dr. Konstantinos Tatas, com.tk@fit.ac.cy, http://staff.fit.ac.cy/com.tk
FPGA structure
Simplified CLB Structure
Programmable Logic lab example
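Example of RAM: 4-input AND gate
A B C D | O
0 0 0 0 | 0
0 0 0 1 | 0
0 0 1 0 | 0
0 0 1 1 | 0
0 1 0 0 | 0
0 1 0 1 | 0
0 1 1 0 | 0
0 1 1 1 | 0
1 0 0 0 | 0
1 0 0 1 | 0
1 0 1 0 | 0
1 0 1 1 | 0
1 1 0 0 | 0
1 1 0 1 | 0
1 1 1 0 | 0
1 1 1 1 | 1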
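Because the LUT is just a 16 × 1 SRAM addressed by the inputs, its configuration is the output column of the truth table. A minimal sketch, assuming A is the most significant address bit (real devices number LUT inputs in their own way); the helper names are hypothetical.

```python
def lut4_init(func):
    """Return the 16-bit LUT contents: bit i holds func(A, B, C, D),
    where i = 8*A + 4*B + 2*C + D (A assumed to be the MSB)."""
    init = 0
    for i in range(16):
        a, b, c, d = (i >> 3) & 1, (i >> 2) & 1, (i >> 1) & 1, i & 1
        if func(a, b, c, d):
            init |= 1 << i
    return init


def lut4_read(init, a, b, c, d):
    """Asynchronous read: the inputs are the SRAM address."""
    return (init >> (8 * a + 4 * b + 2 * c + d)) & 1


AND4_INIT = lut4_init(lambda a, b, c, d: a & b & c & d)

if __name__ == "__main__":
    print(f"4-input AND: INIT = 0x{AND4_INIT:04X}")
```

Only the top address (1111) stores a 1, so the contents are 0x8000; any other 4-input function simply loads a different 16-bit value into the same cells.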
Example 2: Find the configuration bits for the following circuit
[Figure: circuit with inputs A0, A1 and S; six-transistor SRAM memory cell (transistors T1 to T6, supply V_DD, line).]
Interconnection Network
Example 3
• Determine the configuration bits for the following circuit implementation in a 2 × 2 FPGA, with I/O constraints as shown in the following figure. Assume 2-input LUTs in each CLB.
CLBs required
Placement: Select CLBs
Routing: Select path
Configuration Bitstream
• The configuration bitstream must include ALL CLBs and SBs (switch boxes), even unused ones
• CLB 0: 00011
• CLB 1: 01100
• CLB 2: XXXXX
• CLB 3: ???
• SB 0: 000000
• SB 1: 000010
• SB 2: 000000
• SB 3: 000000
• SB 4: 000001
The Virtex CLB
Details of One Virtex Slice
Implements any Two 4-input Functions
[Figure: one slice implementing a 4-input function and a registered 3-input function.]
Implements any 5-input Function
[Figure: one slice implementing a 5-input function.]
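The step from two 4-input LUTs to any 5-input function is Shannon expansion on the fifth input, $f = e' \, f|_{e=0} + e \, f|_{e=1}$, with a 2:1 multiplexer driven by e choosing between the two LUTs. A sketch with hypothetical helper names; applying the same expansion once more (two slices plus another multiplexer) gives the 6-input case.

```python
def _bits4(i):
    """Split a LUT address 0..15 into (a, b, c, d), a being the MSB."""
    return (i >> 3) & 1, (i >> 2) & 1, (i >> 1) & 1, i & 1


def shannon_5lut(func):
    """Fill two 16-entry LUTs with the cofactors f(e=0) and f(e=1)."""
    lut_e0 = [func(*_bits4(i), 0) for i in range(16)]
    lut_e1 = [func(*_bits4(i), 1) for i in range(16)]
    return lut_e0, lut_e1


def eval_5lut(luts, a, b, c, d, e):
    """Both LUTs see a, b, c, d; the multiplexer selects one with e."""
    lut_e0, lut_e1 = luts
    index = 8 * a + 4 * b + 2 * c + d
    return lut_e1[index] if e else lut_e0[index]
```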
Implement Some Larger Functions, e.g. 9-input
Two Slices: Any 6-input Function
[Figure: a 6-input function built from this slice and the output from the other slice.]
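Example: mod 10 counter (next-state LUT contents)
State | Q3 Q2 Q1 Q0 (next)
0 | 0 0 0 1
1 | 0 0 1 0
2 | 0 0 1 1
3 | 0 1 0 0
4 | 0 1 0 1
5 | 0 1 1 0
6 | 0 1 1 1
7 | 1 0 0 0
8 | 1 0 0 1
9 | 0 0 0 0
10 to 15 | X X X X (don't care)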
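Each next-state bit of the counter is a function of the four present-state bits, so it fits one 4-input LUT feeding that bit's flip-flop. The sketch below derives the four LUT contents and simulates the counter; filling the don't-care states 10 to 15 with 0 is an assumption (a synthesis tool may choose differently).

```python
def next_state_luts(modulus=10, dont_care=0):
    """Four 16-entry LUTs, one per next-state bit Q0..Q3 of a mod-`modulus`
    counter; states >= modulus are don't cares filled with `dont_care`."""
    luts = [[dont_care] * 16 for _ in range(4)]
    for state in range(modulus):
        nxt = (state + 1) % modulus
        for bit in range(4):
            luts[bit][state] = (nxt >> bit) & 1
    return luts


def run_counter(steps, luts):
    """Simulate the flip-flops: on each clock edge every Qk loads its LUT
    output, addressed by the present state Q3 Q2 Q1 Q0."""
    state, seq = 0, []
    for _ in range(steps):
        seq.append(state)
        state = sum(luts[bit][state] << bit for bit in range(4))
    return seq


if __name__ == "__main__":
    luts = next_state_luts()
    for bit in range(4):
        print(f"Q{bit}:", luts[bit])
    print(run_counter(12, luts))
```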
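Ripple Carry Adder
$a_i + b_i + c_{i-1} = s_i + 2c_i$
Karnaugh map for $s_i$ (rows: $c_{i-1}$; columns: $a_ib_i$):
     00 01 11 10
  0   0  1  0  1
  1   1  0  1  0
Karnaugh map for $c_i$ (rows: $c_{i-1}$; columns: $a_ib_i$):
     00 01 11 10
  0   0  0  1  0
  1   0  1  1  1
$s_i = a_i \oplus b_i \oplus c_{i-1}$
$c_i = a_ib_i + a_ic_{i-1} + b_ic_{i-1} = a_ib_i + c_{i-1}(a_i \oplus b_i)$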
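The sum and carry equations translate directly into a bit-by-bit model of the ripple chain. A sketch, assuming unsigned operands of a fixed width; the carry uses the $a_ib_i + c_{i-1}(a_i \oplus b_i)$ form, the one that maps onto dedicated FPGA carry logic. Function names are hypothetical.

```python
def full_adder(a, b, c_in):
    """One bit position: returns (s_i, c_i) for inputs a_i, b_i, c_{i-1}."""
    s = a ^ b ^ c_in                    # s_i = a_i xor b_i xor c_{i-1}
    c_out = (a & b) | (c_in & (a ^ b))  # c_i = a_i b_i + c_{i-1}(a_i xor b_i)
    return s, c_out


def ripple_carry_add(x, y, width):
    """Add two unsigned `width`-bit numbers; the carry ripples from bit 0 up."""
    carry, result = 0, 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry  # (sum bits, final carry-out)
```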
Ripple Carry Adders in FPGAs
$s_i = a_i \oplus b_i \oplus c_{i-1}$
[Figure: fragment of a Virtex Configurable Logic Block (CLB).]
Lookup Tables used as memory (16 × 2)
• Distributed memory
• Synchronous write, asynchronous read
Lookup Tables used as distributed memory (32 x 1)
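The "synchronous write, asynchronous read" behavior of LUT RAM can be sketched as follows: writes take effect only on a clock edge with write enable high, while a read is purely combinational and returns the contents at the current address immediately. Class and method names are hypothetical.

```python
class DistributedRAM:
    """Sketch of LUT-based RAM, e.g. 32 x 1 or 16 x 2."""

    def __init__(self, depth=32, width=1):
        self.mem = [0] * depth
        self.mask = (1 << width) - 1

    def clock_edge(self, we, addr, data):
        """Synchronous write: memory changes only here, and only if WE = 1."""
        if we:
            self.mem[addr] = data & self.mask

    def read(self, addr):
        """Asynchronous read: no clock needed, the address selects the cell."""
        return self.mem[addr]
```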
Virtex-5 Logic Architecture
Advanced logic structure
  – True 6-input LUTs
  – Exclusive 64-bit distributed RAM option per LUT
  – Exclusive 32-bit or 16-bit × 2 shift register
[Figure: each LUT can act as a LUT6, a 64-bit RAM (RAM64) or a 32-bit shift register (SRL32), followed by a register/latch.]
New Advanced Logic Structure
• Improved slice
  – Four LUT6s & FFs per slice
  – Better local connection
• True 6-input LUTs
  – Higher performance
  – Best logic compaction
  – Wide logic functions without MUX delays
• 65% higher capacity and one to two speed grades faster than Virtex-4 (4-input LUTs)
Logic Compaction with LUT6: Use Fewer LUTs, Faster, Less Routing
[Figure: an 8-to-1 multiplexer and a 64-bit RAM, each implemented with LUT4s versus LUT6s.]
New 6-Input LUT with Two Outputs
• True 6-input LUT
  – Any function of 6 variables
  – No input shared with other LUTs
• Second output adds functionality
  – Reduces average slice count by 10%
  – 2 independent functions of the same 5 variables
  – 1 function of 6 variables plus 1 subfunction of 5 variables
  – 1 function of 3 variables plus 1 function of 2 other variables
  – Plus other combinations of subfunctions…
[Figure: 6-input LUT with inputs A6 to A1 and two outputs, O6 and O5.]
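One way to picture the two outputs is sketched below, modelled on the Xilinx LUT6_2 primitive as we understand it: O6 reads the 64-bit contents addressed by A6..A1, while O5 reads only the lower 32 bits addressed by A5..A1. Tying A6 high then yields two functions of the same five inputs, one per 32-bit half. Treat the bit ordering as an assumption and check the device documentation; the helper names are hypothetical.

```python
def lut6_2(init, a):
    """a = [A1, ..., A6]; returns (O6, O5) for the 64-bit LUT contents `init`.
    Assumed behavior: O6 = init[A6..A1], O5 = init[A5..A1] (lower 32 bits)."""
    addr5 = sum(bit << i for i, bit in enumerate(a[:5]))
    addr6 = addr5 | (a[5] << 5)
    return (init >> addr6) & 1, (init >> addr5) & 1


def pack_two_5input(f_hi, f_lo):
    """Contents for two 5-input functions of the shared inputs A1..A5:
    f_lo in the lower half (read on O5), f_hi in the upper half
    (read on O6 when A6 = 1)."""
    init = 0
    for addr in range(32):
        bits = [(addr >> i) & 1 for i in range(5)]
        init |= f_lo(*bits) << addr
        init |= f_hi(*bits) << (addr + 32)
    return init
```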
Virtex-5 Memory Options: The Right Memory for the Application
• Distributed RAM/SRL32
  – Very granular, localized memory
  – Minimal impact on logic routing
  – Great for small FIFOs
  – Synchronous write, asynchronous read
• On-chip BRAM/FIFO
  – Efficient on-chip blocks
  – Flexible + optional FIFO logic
  – Ideal for mid-sized FIFOs/buffers
  – Synchronous read
• Fast memory interfaces to external memory
  – DRAM: SDRAM, DDR SDRAM, FCRAM, RLDRAM
  – SRAM: Sync SRAM, DDR SRAM, ZBT, QDR
  – FLASH, EEPROM
  – Cost-effective bulk storage
  – Memory controller cores
  – Large memory requirements
[Figure: the three options arranged from fine granularity to large capacity.]
Distributed RAM
• Distributed LUT memory
  – 64-bit blocks throughout the FPGA
  – Single-port, dual-port, multi-port
  – Can be used as a 32-bit shift register
• Very fast (sub-nanosecond)
  – Tightly coupled to logic
• Synchronous write, asynchronous read
Distributed memory can be placed anywhere in the FPGA
32-bit Shift Registers in 1 LUT
• Length is dynamically determined by the A inputs
[Figure: D is shifted into a 32-bit shift register on CLK; a multiplexer addressed by the A inputs selects output Qn; Q31 is the last stage.]
• Convenient way to dynamically change LUT content
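A behavioral sketch of the LUT as a 32-bit shift register: data shifts in on each enabled clock edge, and the A inputs choose which stage drives the output, so the effective length is A + 1. Shifting in 32 new bits this way also rewrites the LUT contents, which is what the last bullet refers to. The names are illustrative, and modelling Q31 as a cascade output is an assumption.

```python
class SRL32:
    """Sketch of a LUT configured as a 32-bit addressable shift register."""

    def __init__(self):
        self.stages = [0] * 32  # stages[0] holds the most recent input bit

    def clock(self, d, ce=1):
        """On an enabled clock edge, shift D into stage 0."""
        if ce:
            self.stages = [d] + self.stages[:-1]

    def q(self, a):
        """Output selected by the address a (0..31): D delayed by a + 1 clocks."""
        return self.stages[a]

    def q31(self):
        """Last stage, usable to cascade into another shift register."""
        return self.stages[31]
```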
BRAM/FIFO Features
• Independent read and write port widths
• Multiple configurations
  – True dual-port, simple dual-port, single-port
• Integrated logic for fast and efficient FIFOs
• Synchronous write and read
Each RAM block can be configured as BRAM or FIFO
BRAM Mode Top Level View
• True dual port: unrestricted flexibility
  – Read and write operations simultaneously and independently on port A and port B
  – 32K×1, 16K×2, 8K×4, 4K×9, 2K×18, 1K×36
• Each port can have a different width
[Figure: 36 Kb memory array with port A (AddrA, WdataA, RdataA, up to 36 bits wide) and port B (AddrB, WdataB, RdataB).]
In one clock cycle, 4 total operations can be performed
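The listed aspect ratios all describe the same 36 Kb array: the ×9, ×18 and ×36 widths carry one parity bit per byte, so every configuration holds 32 Kb of data. The short calculation below makes that explicit; the one-parity-bit-per-9-bit-group rule is the assumption it encodes.

```python
# (depth, width) configurations of one 36 Kb block RAM port
CONFIGS = [(32 * 1024, 1), (16 * 1024, 2), (8 * 1024, 4),
           (4 * 1024, 9), (2 * 1024, 18), (1 * 1024, 36)]


def describe(depth, width):
    """Return (address bits, data bits, parity bits) for one configuration,
    assuming one parity bit per 9-bit group in the x9/x18/x36 widths."""
    addr_bits = depth.bit_length() - 1  # depth is a power of two
    parity_per_word = width // 9 if width % 9 == 0 else 0
    data_bits = depth * (width - parity_per_word)
    return addr_bits, data_bits, depth * parity_per_word


if __name__ == "__main__":
    for depth, width in CONFIGS:
        print(f"{depth // 1024}K x {width}:", describe(depth, width))
```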
Block RAM data width (#) can be: 1, 2, 4, 8 (9), 16 (18), 32 (36); the values in parentheses include the parity bits
Virtex IOB
Virtex-7 IOB: Differential / Single-Ended Standards
Virtex-7 IOB: Digitally Controlled Impedance (DCI)
IO Standards
I/O Banking Architecture
• Many banks per device:
  – Each bank has a separate supply voltage (in order to support different I/O standards)
[Figure: LX30 and LX330 device layouts showing I/O banks, clock regions, clock management tiles (CMT), global clock (GClk) and configuration (CFG) areas.]
Edge-Aligned DDR Inputs, Opposite-Edge Mode
[Figure: in the SelectIO™ block, DATA is captured by two flip-flops clocked on opposite edges of CLK, producing QA and QB for the FPGA fabric; waveforms show CLK, DATA, QA, QA' and QB.]
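Need Frequency Conversion: Internal Must Be Lower Than External
[Figure: a 1 Gbps DDR data stream is captured by flip-flops and widened n-fold (e.g. ×4) before entering the FPGA fabric on a divided clock.]
• Placement with RLOCs or LOCs and directed routing constraints
• Timing can be tricky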
Skew Affects Setup and Hold Times
[Figure: CLK, DATA 1 and DATA 2 travel from the source through a connector to the target; the skew between them changes the setup (t_SU) and hold (t_H) margins at the target.]
Capture Window
Channel Timing Can Create Additional Clock Domains
[Figure: channels 1, 2 and 3 each start in their own fast, unaligned clock domain and pass through alignment and frequency reduction into a common fast-aligned and then slow-aligned domain.]
ISERDES Manages Incoming Data
[Figure: in ChipSync™, data enters the ISERDES, clocked by CLK through BUFIO and by CLKDIV (CLK ÷ n) through BUFR, and is passed to the FPGA fabric.]
• Frequency division
  – Data width up to 10 bits
• Dynamic signal alignment
  – Bit alignment
  – Word alignment
  – Clock alignment
Supports Dynamic Phase Alignment (DPA)
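Frequency division and word alignment can be illustrated in software: the deserializer groups n serial bits into one parallel word per CLKDIV cycle, and word alignment shifts the grouping until a known training word appears. This is a simplified sketch; the first-received-bit-as-MSB ordering and the "drop leading bits" model of bit slipping are assumptions, not the exact ISERDES Bitslip behavior.

```python
def deserialize(bits, width, bitslip=0):
    """Group a serial bit stream into `width`-bit words after skipping
    `bitslip` bits; the first bit received becomes the MSB (assumed)."""
    stream = bits[bitslip:]
    words = []
    for i in range(0, len(stream) - width + 1, width):
        word = 0
        for b in stream[i:i + width]:
            word = (word << 1) | b
        words.append(word)
    return words


def find_alignment(bits, width, pattern):
    """Try each slip value until the first word equals the training pattern."""
    for slip in range(width):
        words = deserialize(bits, width, slip)
        if words and words[0] == pattern:
            return slip
    return None
```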
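Easy Bit Alignment
[Figure: in ChipSync™, DATA passes through an IDELAY, whose INC/DEC inputs are driven by a state machine in the FPGA fabric, into the ISERDES; the IDELAYCTRL block is calibrated by a 190 to 210 MHz reference clock.]
• 64 delay elements of ~70 to 89 ps each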
OSERDES Simplifies Frequency Multiplication
[Figure: n-bit parallel data from the FPGA fabric enters the OSERDES in ChipSync™; CLK and CLKDIV are generated by a DCM/PMCD.]
Gigabit Serial Signaling is Everywhere
• Serial is faster than parallel
  – Very high multi-gigabit data rates
  – Embedded clock avoids clock/data skew
  – Reduction in EMI & power consumption
• The preferred choice in many markets
  – Telecom, datacom, computing, storage, video/imaging, instrumentation, etc.
  – Dominating all new standards activities
[Chart: percentage of engineers designing serial I/O systems, 2005 and 2006 (values shown: 64% and 92%). Source: EE Times Survey, 2005.]
Serial transceivers must be flexible, robust and easy to use
The Gigabit Transceiver
[Figure: GTP transceiver with Tx and Rx paths, each built from PMA and PCS sublayers, connected to the FPGA fabric interface.]
• 8 to 96 transceivers per device
• Supporting data rates up to 28 Gbps
Virtex-5 Delivers Powerful Clock Management
– DCMs (Digital Clock Managers), based on a DLL (Delay-Locked Loop)
– PLLs
– Clock buffers
• Combination of digital and analog technology
• Highest performance
  – 550 MHz global clocking
  – More than 2× jitter filtering
• Select by: function, component, or automatic HDL code
Virtex-5 Clock Management Tile (CMT)
• Up to 6 CMTs per device
  – Each with 2 DCMs and 1 PLL
• DCM
  – 5th generation all-digital technology
  – Provides most clocking functions
• PLL
  – Reduces internal clock jitter
  – Supports higher jitter on reference clocks
  – Replaces discrete PLLs and VCOs
Powerful combination of flexibility and precision
Filter Jitter Using the Virtex-5 PLL
• Input clock: >400 ps peak-to-peak jitter
• PLL output clock: <100 ps peak-to-peak jitter
[Figure: typical waveform examples for a 400 MHz noisy clock in a quiet FPGA.]
DCM (Digital Clock Manager) Features
• Operates from 19 MHz to 550 MHz
• Removes clock insertion delay
  – "Zero delay clock buffer"
• Corrects clock duty cycles
• Synthesizes $F_{out} = F_{in} \cdot M/D$
  – M, D values up to 32
• Additional DCM_ADV features
  – Dynamically phase shift clocks in increments of period/256 or with direct delay line control
  – Use the Dynamic Reconfiguration Port (DRP) to adjust parameters without reconfiguring
[Figure: DCM_BASE and DCM_ADV primitives with inputs CLKIN, CLKFB, RST and outputs CLK0, CLK90, CLK180, CLK270, CLK2X, CLK2X180, CLKDV, CLKFX, CLKFX180, LOCKED; DCM_ADV adds phase-shift and DRP ports.]
Each DCM can be invoked with either the DCM_BASE or DCM_ADV primitive
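Choosing M and D for $F_{out} = F_{in} \cdot M/D$ is a small search. A sketch assuming M from 2 to 32 and D from 1 to 32 (the slide only gives the upper limit of 32; the lower bounds follow the CLKFX_MULTIPLY/CLKFX_DIVIDE ranges as we recall them), and ignoring the DCM's input and output frequency limits, which a real design must also check. The function name is hypothetical.

```python
from fractions import Fraction


def best_m_d(f_in_mhz, f_target_mhz, m_range=range(2, 33), d_range=range(1, 33)):
    """Return (M, D, Fout in MHz) with Fout = Fin * M / D closest to the target.
    Exact rational arithmetic avoids floating-point ties; among equally good
    pairs the first one found (smallest M, then smallest D) is kept."""
    target = Fraction(f_target_mhz)
    best = None
    for m in m_range:
        for d in d_range:
            f_out = Fraction(f_in_mhz) * m / d
            err = abs(f_out - target)
            if best is None or err < best[0]:
                best = (err, m, d, f_out)
    _, m, d, f_out = best
    return m, d, float(f_out)


if __name__ == "__main__":
    print(best_m_d(100, 400))   # 100 MHz in, 400 MHz wanted
    print(best_m_d(27, 74.25))  # 27 MHz in, 74.25 MHz wanted
```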
DCM in VHDL
Library UNISIM;
use UNISIM.vcomponents.all;

-- DCM_SP: Digital Clock Manager
--         Spartan-6
-- Xilinx HDL Libraries Guide, version 11.2

DCM_SP_inst : DCM_SP
generic map (
   CLKDV_DIVIDE => 2.0,           -- Specifies the extent to which the CLKDLL, CLKDLLE, CLKDLLHF, or
                                  -- DCM_SP clock divider (CLKDV output) is to be frequency divided.
   CLKFX_DIVIDE => 1,             -- Specifies the frequency divider value for the CLKFX output.
   CLKFX_MULTIPLY => 4,           -- Specifies the frequency multiplier value for the CLKFX output.
   CLKIN_DIVIDE_BY_2 => FALSE,    -- Enables CLKIN divide by two features.
   CLKIN_PERIOD => "10.0",        -- Specifies the input period to the DCM_SP CLKIN input in ns.
   CLKOUT_PHASE_SHIFT => "NONE",  -- Specifies the phase shift mode. NONE = No phase shift capability;
                                  -- any set value has no effect. FIXED = DCM outputs are a fixed phase
                                  -- shift from CLKIN; value is specified by the PHASE_SHIFT attribute.
                                  -- VARIABLE = Allows the DCM outputs to be shifted in a positive and
                                  -- negative range relative to CLKIN; starting value is specified by PHASE_SHIFT.
   CLK_FEEDBACK => "1X",          -- Defines the DCM feedback mode. 1X: CLK0 as feedback. 2X: CLK2X as feedback.
   DESKEW_ADJUST => "SYSTEM_SYNCHRONOUS", -- Sets configuration bits affecting the clock delay alignment
                                  -- between the DCM_SP output clocks and an FPGA clock input pin.
   DLL_FREQUENCY_MODE => "LOW",   -- AUTO mode allows the DLL to do an automatic frequency search to decide
                                  -- whether it will operate in LOW or HIGH mode. This is a legacy attribute
                                  -- where the HIGH and LOW values have no effect; it is always in AUTO mode.
   DSS_MODE => "NONE",
   DUTY_CYCLE_CORRECTION => TRUE, -- Corrects the duty cycle of the CLK0, CLK90, CLK180 and CLK270 outputs.
   PHASE_SHIFT => 0,              -- Defines the amount of fixed phase shift from -255 to 255.
   STARTUP_WAIT => FALSE          -- Delays configuration DONE until DCM LOCK.
)
port map (
   CLK0 => CLK0,          -- 1-bit Same frequency as CLKIN, 0 degree phase shift.
   CLK180 => CLK180,      -- 1-bit Same frequency as CLKIN, 180 degree phase shift.
   CLK270 => CLK270,      -- 1-bit Same frequency as CLKIN, 270 degree phase shift.
   CLK2X => CLK2X,        -- 1-bit Two times CLKIN frequency clock, aligned with CLK0.
   CLK2X180 => CLK2X180,  -- 1-bit 180 degree shifted version of the CLK2X clock.
   CLK90 => CLK90,        -- 1-bit Same frequency as CLKIN, 90 degree phase shift.
   CLKDV => CLKDV,        -- 1-bit Divided version of CLK0. Divide value is programmable.
   CLKFX => CLKFX,        -- 1-bit Digital Frequency Synthesizer output (DFS).
   CLKFX180 => CLKFX180,  -- 1-bit 180 degree shifted version of the CLKFX clock.
   LOCKED => LOCKED,      -- 1-bit Signal indicating when the DCM has LOCKed.
   PSDONE => PSDONE,      -- 1-bit Output signal that indicates variable phase shift is done.
   STATUS => STATUS,      -- 8-bit DCM status bits.
   CLKFB => CLKFB,        -- 1-bit Feedback clock input to DCM. The feedback input is required unless
                          -- the DFS is used stand-alone. The source of CLKFB must be the CLK0 or
                          -- CLK2X output from the DCM.
   CLKIN => CLKIN,        -- 1-bit Clock input for the DCM.
   DSSEN => DSSEN,
   PSCLK => PSCLK,        -- 1-bit Phase shift clock input. The PSCLK input pin provides the source
                          -- clock for the DCM phase shift.
   PSEN => PSEN           -- 1-bit Variable phase shift enable signal, synchronous with PSCLK.
);
Using the DLL to De-Skew the Clock
Three Types of Clock Resources
• Global clocks
• Regional clocks
• I/O clocks
[Figure: I/O column, global muxes, and the three clock networks.]
BUFG - Global (Clock) Buffer
This design element is a high-fanout buffer that connects signals to the global routing resources for low-skew distribution of the signal. BUFGs are typically used on clock nets.
Library UNISIM;
use UNISIM.vcomponents.all;

-- BUFG: Global Clock Buffer
--       Virtex-6
-- Xilinx HDL Libraries Guide, version 11.2
-- (BUFG has no generics, so no generic map is given.)

BUFG_inst : BUFG
port map (
   O => O,  -- 1-bit Clock buffer output
   I => I   -- 1-bit Clock buffer input
);
BUFGCE This design element is a global clock buffer with a single gated input. Its O output is "0" when clock enable (CE) is Low (inactive). When clock enable (CE) is High, the I input is transferred to the O output. This module is race condition free.
XtremeDSP in Virtex-5
• Second-generation DSP slice architecture
  – 25 × 18 multiplier
  – Per-bit logic functions (AND, OR, XNOR, …)
• High performance for DSP "heavy lifting"
  – 550 MHz operation
Can also be used for fast counters, barrel shifters, etc.
Virtex-5 DSP48E: Full Custom Design Enabling Efficient DSP
• 25 × 18 input increases precision and efficiency
• Wider internal data path and 96-bit accumulated output enable higher precision
• Pipeline registers enable 550 MHz performance
• Pattern-detect circuitry increases functionality
[Figure: DSP48E slice with inputs A (25-bit), B (18-bit) and C (48-bit), cascade ports ACIN/BCIN/PCIN and ACOUT/BCOUT/PCOUT, a multiplier, optional pipeline registers and routing logic, and a 48-bit P output (optionally 96-bit).]
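The multiply-accumulate path can be modelled with two's-complement wrap-around at each field width: 25-bit A, 18-bit B and a 48-bit P register. This is only a sketch; it ignores pipeline registers, the C input, cascades and pattern detect, and the function names are hypothetical.

```python
def to_signed(value, bits):
    """Wrap an integer into a `bits`-wide two's-complement field."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def dsp48e_mac(pairs):
    """P <- P + A*B for each (A, B): A 25-bit and B 18-bit signed inputs,
    P a 48-bit accumulator that wraps like two's-complement hardware."""
    p = 0
    for a, b in pairs:
        product = to_signed(a, 25) * to_signed(b, 18)  # 43-bit product fits in P
        p = to_signed(p + product, 48)
    return p
```

Accumulating 128 full-scale products already overflows the 48-bit P register, which is where the wider accumulation mentioned on the slide helps.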
FPGAs For Massively Parallel DSP
• Programmable DSP (sequential): a single MAC unit with a coefficient register; a 640-tap filter needs 640 clock cycles per output sample, so 1 GHz gives about 1.6 MSPS
• FPGA (fully parallel): one multiplier per coefficient (C0, C1, C2, …) and an adder tree perform 640 operations in 1 clock cycle, so 550 MHz gives 550 MSPS
• The 640-tap filter implementation is about 340 times faster
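The sequential-versus-parallel argument can be reproduced with a toy FIR filter that counts clock cycles: a single MAC unit needs one cycle per tap, whereas a fully parallel layout produces one output per cycle. A sketch with hypothetical function names; timing is abstracted to cycle counts.

```python
def fir_sequential(coeffs, samples):
    """One MAC unit: every output sample costs len(coeffs) clock cycles."""
    history = [0] * len(coeffs)
    outputs, cycles = [], 0
    for x in samples:
        history = [x] + history[:-1]  # shift the delay line
        acc = 0
        for c, h in zip(coeffs, history):
            acc += c * h              # one multiply-accumulate per cycle
            cycles += 1
        outputs.append(acc)
    return outputs, cycles


def fir_parallel(coeffs, samples):
    """One multiplier per tap and an adder tree: one output per clock cycle."""
    history = [0] * len(coeffs)
    outputs, cycles = [], 0
    for x in samples:
        history = [x] + history[:-1]
        outputs.append(sum(c * h for c, h in zip(coeffs, history)))
        cycles += 1
    return outputs, cycles


def speedup(f_seq_hz, f_par_hz, taps):
    """Sample-rate ratio of the fully parallel design over the sequential one."""
    return f_par_hz / (f_seq_hz / taps)
```

With the slide's figures, `speedup(1e9, 550e6, 640)` evaluates to 352; the slide's "340 times" comes from rounding 1.5625 MSPS to 1.6 MSPS.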
Xilinx, 7 Series Families
Zynq = FPGA + Processor