Modular Design Techniques for the FPX
Field Programmable Port Extender (FPX)
Overview
• Motivation
• RAD Logic Resources
• RAD Infrastructure Modules
  – Reconfiguration Control
  – SRAM Interface
  – Control Cell Processor
• RAD Module Interface
• Top Level RAD Design
  – Pins and layout overview
  – Module instantiation
Motivation for Modular Design
• Definitions
  – Modules: entities that perform network data processing
    • FPX applications: packet classification, compression, etc.
  – Infrastructure: all other entities necessary for system functionality
    • Memory interfaces, control cell processor, reconfiguration control, etc.
• Assume most applications do not need all available logic and memory resources
• Multiple modules make higher performance and flexibility achievable
• Standard module interface
  – Ensures module interoperability
  – Reduces design redundancy
  – Shortens module design cycle
Dynamic Hardware Plugins (DHP)
• Programmable router with software and reconfigurable hardware packet processing
• Hardware plugins
  – Static interfaces for I/O and off-chip memory
  – User-defined on-chip memory
• Infrastructure
  – IOC
    • Slotted ring interface
  – Application Controller
    • Reconfiguration control
  – Memory Interfaces
    • SRAM/SDRAM interfaces
• Applications
  – Position independent
  – Dynamically loadable
• Prototype with WUGS/SPC/FPX
  – Partially reconfigure RAD FPGA for new applications
RAD FPGA Logic Resources
• Virtex 1000E-7 FPGA
• 4 global clock trees
  – (2) 100 MHz clocks from FPX board
• Globally accessible IOBs
  – Versa-Ring routing
  – 3 flops per IOB, supporting tri-state bussing
• 64 x 96 CLB array
  – 2 flops/LUTs per slice
  – 2 slices per CLB
  – Total = 24,576 flops/LUTs
• 96 Block SelectRAMs
  – 4096 bits per block
  – 6 columns of 16 blocks
  – 6 columns of dedicated interconnect
  – Total = 393,216 bits
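The totals on this slide follow directly from the array dimensions. A quick Python check (constant and function names are ours, chosen for illustration):

```python
# Recompute the RAD FPGA resource totals from the figures on the slide.
CLB_ROWS, CLB_COLS = 64, 96
SLICES_PER_CLB = 2
FLOPS_PER_SLICE = 2        # each slice also has 2 LUTs, so the LUT total is the same

BLOCK_RAM_COLUMNS = 6
BLOCKS_PER_COLUMN = 16
BITS_PER_BLOCK = 4096


def flop_count() -> int:
    """Flip-flops (equivalently LUTs) in the whole CLB array."""
    return CLB_ROWS * CLB_COLS * SLICES_PER_CLB * FLOPS_PER_SLICE


def block_ram_count() -> int:
    """Number of Block SelectRAMs."""
    return BLOCK_RAM_COLUMNS * BLOCKS_PER_COLUMN


def block_ram_bits() -> int:
    """Total Block SelectRAM capacity in bits."""
    return block_ram_count() * BITS_PER_BLOCK


if __name__ == "__main__":
    print(flop_count())       # 24576
    print(block_ram_count())  # 96
    print(block_ram_bits())   # 393216
```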
Reconfiguration Control Module
• Partial reconfiguration controller for the RAD FPGA
• Executes the reconfiguration handshake with the NID FPGA and RAD modules
• Module interface
  – Localized synchronous reset
  – Enable
  – Ready
SRAM Interface Module
• Interface to off-chip ZBT SRAM
• Abstracts modules from device-specific timing
• Independent interface for each module
• Arbitrates requests and issues a grant to the winning module
• Modules retain access by holding their request high after receiving the grant
  – Modules are responsible for preventing starvation
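To make the hold-based arbitration concrete, here is a cycle-level Python sketch. The fixed-priority rule (the lowest-numbered requesting module wins when the memory is free) and the class name `SramArbiter` are assumptions for illustration; the slide says only that the arbiter picks a winner and that a module keeps access while its request stays high.

```python
from typing import List, Optional


class SramArbiter:
    """Request/grant arbiter with hold (hypothetical model, fixed priority assumed)."""

    def __init__(self, num_modules: int) -> None:
        self.num_modules = num_modules
        self.owner: Optional[int] = None  # module currently holding the grant

    def step(self, requests: List[bool]) -> List[bool]:
        """Advance one clock and return the per-module grant vector."""
        # The owner keeps the memory for as long as it holds its request high.
        if self.owner is not None and not requests[self.owner]:
            self.owner = None
        # When the memory is free, the lowest-numbered requester wins (assumption).
        if self.owner is None:
            for module, req in enumerate(requests):
                if req:
                    self.owner = module
                    break
        return [m == self.owner for m in range(self.num_modules)]
```

Because a module that never drops its request keeps the memory indefinitely under this rule, avoiding starvation is up to the modules, as the last bullet states. The grant is returned in the same step for simplicity (real hardware would register it), and grant revocation is not modeled.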
Control Cell Processor
• Captures control cells for off-chip memory transactions
  – SRAM read/write
  – SDRAM read/write
    • Not yet implemented
• Checks for correct HEC
• VPI = 0x000
• VCI = 0x0023 (35)
  – Modifiable register
• ModuleID = 0x00
• OpCodes
  – Even OpCodes for command cells
  – Response OpCode = OpCode + 1
  – OpCodes 0x00 to 0x0F reserved for common operations
• Updates CRC for response cells
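The filtering and OpCode rules above can be sketched in Python. Two points are assumptions: the header word is taken to use the ATM NNI layout (12-bit VPI, consistent with the 3-digit VPI above), and the helpers take the OpCode as a plain integer because the slide does not say where it sits in the payload. The HEC itself is the standard ATM header check: CRC-8 with generator x^8 + x^2 + x + 1 over the four header bytes, XORed with 0x55.

```python
# Assumed header layout (ATM NNI): VPI[31:20] VCI[19:4] PTI[3:1] CLP[0]
CCP_VPI = 0x000
DEFAULT_CCP_VCI = 0x0023  # VCI 35; held in a modifiable register on the FPX


def atm_hec(header: bytes) -> int:
    """ATM HEC: CRC-8 (x^8 + x^2 + x + 1) over 4 header bytes, XORed with 0x55."""
    if len(header) != 4:
        raise ValueError("HEC covers exactly 4 header bytes")
    crc = 0
    for byte in header:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ 0x07) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc ^ 0x55


def is_control_cell(header_word: int, hec: int, ccp_vci: int = DEFAULT_CCP_VCI) -> bool:
    """True if the HEC is correct and the header matches the control-cell VPI/VCI."""
    vpi = (header_word >> 20) & 0xFFF
    vci = (header_word >> 4) & 0xFFFF
    hec_ok = atm_hec(header_word.to_bytes(4, "big")) == hec
    return hec_ok and vpi == CCP_VPI and vci == ccp_vci


def is_command(opcode: int) -> bool:
    """Command cells use even OpCodes."""
    return opcode % 2 == 0


def response_opcode(opcode: int) -> int:
    """A response carries the command's OpCode + 1."""
    if not is_command(opcode):
        raise ValueError(f"0x{opcode:02X} is not a command OpCode")
    return opcode + 1


def is_reserved(opcode: int) -> bool:
    """OpCodes 0x00-0x0F are reserved for common operations."""
    return 0x00 <= opcode <= 0x0F
```

For example, the header word 0x00000230 carries VPI 0 and VCI 0x23, so it is accepted when paired with `atm_hec(bytes([0, 0, 0x02, 0x30]))`.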
RAD Module Interface
• Cell I/O and Flow Control
  – 32-bit wide UTOPIA-style interface with unique timing
• Off-chip Memory Access
  – Arbitrated access to SRAM and SDRAM via a standard interface
• Control (clock, reset, and reconfiguration control)
Control Interface
• 100 MHz global clock (CLK)
  – All I/O signals should be synchronous to CLK
• Synchronous reset (RESET_L)
  – Asserted low for 1 clock cycle
• Reconfiguration handshake (ENABLE_L, READY_L)
  – ENABLE_L is asserted low at reset
  – After reset, and before accepting cells, the module must pull READY_L high to prevent reconfiguration during operation
  – ENABLE_L is deasserted (driven high) prior to reconfiguration
  – The module then stops accepting cells, flushes its internal pipelines, and asserts READY_L (drives it low) for at least one clock cycle
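The handshake can be modeled from the module's side as a small cycle-based state machine. In this Python sketch, `ReconfigHandshake`, `pipeline_depth`, the rule that an accepted cell keeps the pipeline busy for that many cycles, and the READY_L value during reset are simplifying assumptions; signal polarities follow the slide (all three control signals are active low).

```python
from typing import Tuple


class ReconfigHandshake:
    """Module-side model of the ENABLE_L / READY_L handshake (signals active low).

    Hypothetical simplification: an accepted cell keeps the internal pipeline busy
    for `pipeline_depth` cycles; the module is drained when that count reaches 0.
    """

    def __init__(self, pipeline_depth: int) -> None:
        self.pipeline_depth = pipeline_depth
        self.in_flight = 0  # cycles until the internal pipeline is empty
        self.ready_l = 0

    def step(self, reset_l: int, enable_l: int, cell_in: bool) -> Tuple[int, bool]:
        """Advance one CLK cycle and return (READY_L, cell_accepted)."""
        if reset_l == 0:                  # synchronous reset
            self.in_flight = 0
            self.ready_l = 0              # value during reset is an assumption
            return self.ready_l, False
        if self.in_flight > 0:            # pipeline drains by one cycle per clock
            self.in_flight -= 1
        if enable_l == 0:                 # enabled: normal operation
            self.ready_l = 1              # hold READY_L high so reconfiguration cannot start
            if cell_in:
                self.in_flight = self.pipeline_depth
            return self.ready_l, cell_in
        # ENABLE_L high: refuse new cells, finish in-flight work, then assert READY_L (low)
        self.ready_l = 0 if self.in_flight == 0 else 1
        return self.ready_l, False
```

With `pipeline_depth=3`, one reset cycle, one accepted cell, and then three cycles with ENABLE_L high produce READY_L values 0, 1, 1, 1, 0: the module keeps READY_L high until its pipeline has drained.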
Cell Input Interface
• Start of Cell (SOC_MOD_IN)
  – Signals the first word of the ATM cell
• 32-bit wide data path (D_MOD_IN)
  – ATM cells are transferred as fourteen 32-bit words
  – The first word arrives with SOC_MOD_IN
  – The remaining 13 words arrive on subsequent clock cycles
• Transmit Cell Available (TCA_MOD_IN)
  – Signals the module's ability to accept a cell
  – Must be valid 6 clock cycles prior to the last cycle of the current cell transfer
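A behavioral driver for the input side is handy in testbenches. The Python sketch below (helper names hypothetical) emits one cell as 14 consecutive (cycle, SOC_MOD_IN, D_MOD_IN) tuples and computes the cycle by which TCA_MOD_IN must be valid, reading "6 clock cycles prior to the last cycle" as last_cycle - 6. It does not assume any particular mapping of header, HEC, and payload bytes onto the 14 words.

```python
from typing import Iterator, List, Tuple

WORDS_PER_CELL = 14   # one ATM cell = 14 x 32-bit words on the module interface
TCA_LEAD_CYCLES = 6   # TCA_MOD_IN must be valid this many cycles before the last word


def drive_cell(words: List[int], soc_cycle: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (cycle, SOC_MOD_IN, D_MOD_IN) for one cell sent on consecutive cycles."""
    if len(words) != WORDS_PER_CELL:
        raise ValueError(f"expected {WORDS_PER_CELL} words, got {len(words)}")
    for i, word in enumerate(words):
        if not 0 <= word < 1 << 32:
            raise ValueError(f"word {i} does not fit in 32 bits")
        yield soc_cycle + i, int(i == 0), word  # SOC is high only with the first word


def tca_valid_by(soc_cycle: int) -> int:
    """Cycle by which TCA_MOD_IN must be valid for the cell starting at soc_cycle."""
    last_cycle = soc_cycle + WORDS_PER_CELL - 1
    return last_cycle - TCA_LEAD_CYCLES
```

For a cell whose first word arrives in cycle 100, the last word arrives in cycle 113 and, under this reading, TCA_MOD_IN must be valid by cycle 107.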
Cell Output Interface
• Start of Cell (SOC_OUT_MOD)
  – Signals the first word of the ATM cell
• 32-bit wide data path (D_OUT_MOD)
  – ATM cells are transferred as fourteen 32-bit words
  – The first word is sent with SOC_OUT_MOD
  – The remaining 13 words are sent on subsequent clock cycles
• Transmit Cell Available (TCA_OUT_MOD)
  – Signals the output's ability to accept a cell
  – Modules must sample TCA_OUT_MOD no sooner than 3 clock cycles prior to asserting SOC_OUT_MOD
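The output rule bounds how stale the sampled TCA_OUT_MOD value may be. This Python sketch, with hypothetical helper names, checks that bound and finds the earliest legal SOC_OUT_MOD cycle from a per-cycle TCA trace. Treating the sample as registered (so it must fall strictly before the SOC cycle) is our assumption.

```python
from typing import Optional, Sequence

MAX_SAMPLE_LEAD = 3  # TCA_OUT_MOD may be sampled at most 3 cycles before SOC_OUT_MOD


def sample_is_legal(sample_cycle: int, soc_cycle: int, max_lead: int = MAX_SAMPLE_LEAD) -> bool:
    """True if a TCA_OUT_MOD sample taken at sample_cycle may justify SOC at soc_cycle."""
    return soc_cycle - max_lead <= sample_cycle < soc_cycle


def next_soc_cycle(tca: Sequence[bool], earliest: int, lead: int = MAX_SAMPLE_LEAD) -> Optional[int]:
    """First cycle >= earliest at which SOC_OUT_MOD may be asserted.

    The module samples TCA_OUT_MOD exactly `lead` cycles before SOC
    (1 <= lead <= MAX_SAMPLE_LEAD); tca[c] is TCA_OUT_MOD in cycle c.
    Returns None if no legal start exists within the trace.
    """
    if not 1 <= lead <= MAX_SAMPLE_LEAD:
        raise ValueError("lead must be between 1 and MAX_SAMPLE_LEAD cycles")
    for soc in range(max(earliest, lead), len(tca) + lead):
        if tca[soc - lead]:
            return soc
    return None
```

Sampling exactly three cycles ahead (the default) gives the transmit logic the most pipelining slack the rule allows; a smaller `lead` uses a fresher sample at the cost of tighter timing.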
SRAM Interface
• Arbitration handshake
  – SRAM_REQ requests and holds memory access
  – SRAM_GR grants access and initiates access termination
  – A module may retain memory access for the duration of a transaction set
    • If the grant is de-asserted, the module must complete its current transaction and release memory
    • The module is responsible for preventing starvation
• Reads
  – Hold SRAM_RW high, issue address
  – Data appears inside the module 6 clock cycles later
• Writes
  – Assert SRAM_RW low, issue address and data
  – Data will be written 5 clock cycles later
IMPORTANT: HOLD SRAM_RW HIGH WHENEVER NOT WRITING TO PREVENT OVERWRITING VALID MEMORY DATA
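The read and write latencies can be captured in a behavioral model for testbenches. In this Python sketch the `SramModel` class is hypothetical, the module is assumed to already hold the grant and to issue at most one request per cycle, and unwritten locations read as zero. Idle cycles pass `rw=1`, following the advice to hold SRAM_RW high.

```python
from typing import Dict, List, Optional, Tuple

READ_LATENCY = 6   # address issue -> data visible inside the module
WRITE_LATENCY = 5  # address/data issue -> data written to SRAM


class SramModel:
    """Behavioral sketch of the SRAM interface as seen from inside a module."""

    def __init__(self) -> None:
        self.mem: Dict[int, int] = {}
        self.cycle = 0
        self.seq = 0
        # (due_cycle, issue_order, is_read, addr, data)
        self.pending: List[Tuple[int, int, bool, int, int]] = []

    def step(self, rw: int, addr: Optional[int] = None, data: int = 0) -> Optional[int]:
        """Advance one clock. rw=1 reads, rw=0 writes; pass rw=1 with no address when idle.

        Returns the read data that arrives inside the module this cycle, if any.
        """
        if addr is not None:
            latency = READ_LATENCY if rw else WRITE_LATENCY
            self.pending.append((self.cycle + latency, self.seq, bool(rw), addr, data))
            self.seq += 1
        self.pending.sort()  # complete in due order; ties resolved by issue order
        read_data = None
        while self.pending and self.pending[0][0] == self.cycle:
            _, _, is_read, a, d = self.pending.pop(0)
            if is_read:
                read_data = self.mem.get(a, 0)  # unwritten locations read as 0 (assumption)
            else:
                self.mem[a] = d
        self.cycle += 1
        return read_data
```

A write issued in cycle 0 lands in cycle 5, so a read of the same address issued in cycle 5 returns the new value in cycle 11. A read issued one cycle before a write to the same address still returns the old value, because both complete in the same cycle and ties follow issue order.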
SRAM Interface Timing
• All I/O signals must be flopped at the module boundary to ensure timing constraints are met
• Timing diagrams take their reference point from inside the module and assume boundary flops
RAD Pin Mappings
[Figure: RAD FPGA (chip view) showing the ingress (LC) and egress (SW) cell paths and the SRAM 1, SRAM 2, SDRAM 1, and SDRAM 2 interfaces]
• Ingress Path (LC)
  – Input
    • SOC_LC_NID
    • D_LC_NID
    • TCAFF_LC_RAD
  – Output
    • SOC_LC_RAD
    • D_LC_RAD
    • TCAFF_LC_NID
• Egress Path (SW)
  – Input
    • SOC_SW_NID
    • D_SW_NID
    • TCAFF_SW_RAD
  – Output
    • SOC_SW_RAD
    • D_SW_RAD
    • TCAFF_SW_NID
• SRAM Interfaces
• SDRAM Interfaces
Design Issues & Recommendations
• Keep routing delays in mind during the initial design phase; use conservative estimates
• Conform to the Module Interface Specification
• Use the provided infrastructure
• Flop all module I/O signals
  – Enables position-independent modules
• Use synchronous reset
• Perform cell I/O simulations
• Experiment with synthesis and PAR options
  – Over-constrain timing delays
  – Timing results vary significantly across options, including hierarchy ungrouping and routing algorithms
• Share experience and wisdom with other developers
Example RAD Design: IP Router using Fast IP Lookup
Overview
• FPX file tree
• Design overview
• Fast IP Lookup module overview
• Use of infrastructure modules
• Top-level RAD design flow (UNIX, Exemplar, Xilinx)
  – Module design and functional simulation (ModelSim)
  – Top-level design and functional simulation (ModelSim)
  – Synthesis (Exemplar Leonardo & Spectrum)
  – Place and Route (Xilinx Alliance Series)
    • Constraint passing caveats
    • Floorplanning to meet timing
  – Backannotated gate-level simulation (ModelSim)
FPX File Tree
• Provided directories are in ALL CAPS
  – Distinguishes the original (sub)directories from those added by Kits members
• Create a subdirectory for new module designs under MODULES
  – Perform local simulation and synthesis
• Create a subdirectory for new top-level builds under TOP
  – Instantiate modules and necessary infrastructure
  – Perform system-level simulation and top-level synthesis
Design Overview
[Figure: block diagram of the IP router design. Labeled blocks: LC and SW ports, NID FPGA, RAD FPGA, Control Cell Processor, Extract IP Headers, IP Lookup Engine, SRAM 1 Interface (Request/Grant), SRAM 1, SRAM 2, Remap VCIs for IP packets, On-Chip Cell Store, Packet Reassembler, counter]
Fast IP Lookup Module Overview
Top-level RAD Design with FIPL Module
End of Presentation
IP Lookup Design Constraints
• Maximum WUGS line rate = 1.2 Gb/s
• Minimum packet length = 1 cell
• Lookup period < 323 ns
• Access to one 256K x 36 SRAM (Micron ZBT)
• Minimum memory latency = 4 clock cycles
• Memory accesses per lookup (IPv4, worst case) = 11
• Single worst-case lookup:
  $t_{\text{lookup}} = (\text{memory accesses}) \times (\text{clock cycles/access}) \times T_{\text{clk}} = 11 \times 4 \times 10\,\text{ns} = 440\,\text{ns}$
• Since 440 ns exceeds the 323 ns lookup period, parallel engines and pipelined memory accesses are required to achieve the desired performance
• Reality check:
  – FPGA routing delays comprise ~50% to 80% of total signal delay
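The worst-case arithmetic, and the resulting need for overlapping lookups, fits in a few lines of Python. Treating the required concurrency as ceil(t_lookup / lookup period) is our simplification: it ignores arbitration and pipeline fill, and it leaves open whether the overlap comes from parallel engines, pipelined memory accesses, or both.

```python
import math

T_CLK_NS = 10               # 100 MHz RAD clock
ACCESSES_PER_LOOKUP = 11    # IPv4 worst case
CYCLES_PER_ACCESS = 4       # minimum memory latency
LOOKUP_PERIOD_NS = 323      # required lookup period from the slide


def worst_case_lookup_ns(accesses: int = ACCESSES_PER_LOOKUP,
                         cycles_per_access: int = CYCLES_PER_ACCESS,
                         t_clk_ns: int = T_CLK_NS) -> int:
    """t_lookup = accesses x cycles/access x T_clk for a single, unpipelined lookup."""
    return accesses * cycles_per_access * t_clk_ns


def concurrent_lookups_needed(lookup_ns: int, period_ns: int = LOOKUP_PERIOD_NS) -> int:
    """Lookups that must overlap so that, on average, one finishes every period_ns."""
    return math.ceil(lookup_ns / period_ns)


if __name__ == "__main__":
    t = worst_case_lookup_ns()
    print(t)                             # 440
    print(concurrent_lookups_needed(t))  # 2
```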
IP Lookup Design Techniques
• Design (VHDL)
  – Simulate the design/algorithm with a C program
  – Identify constraints
  – Design with conservative delay estimates
    • Flops for cell I/O
    • Allow one clock cycle for next-address calculation
• Simulation (Mentor Graphics ModelSim)
  – Experimental data structure written to memory from an input file via a “fake” control cell processor
  – Used a “fake” NID model with file I/O to pass cells in and out
• Synthesis (Exemplar)
  – Targeted a 9 ns clock period
• Place and Route (Xilinx Alliance Series)
  – Used a constraint file with pin mappings
  – Weighted delay vs. area
  – Used the DFS routing algorithm instead of KPATHS
IP Lookup Status and Changes
• Initial design simulates, synthesizes, and PARs
  – Timing reports specify a maximum clock frequency of 58 MHz… need ~2x speedup to reach the 100 MHz target
• Experimenting with floorplanning
  – Maintain hierarchy through synthesis
  – Hand-place data path CLBs
• Redesign pipeline
  – Add flops to SRAM interface signals
  – Increases memory latency to 6 clock cycles
  – Achieve 1.2 Gb/s lookups with two engines
  – Create position-independent module
• Perform final gate-level simulation with robust test vectors and sample data structures
Dynamic Hardware Plugins (DHP)
[Figure: RAD FPGA layout with the NID FPGA interface (cell I/O), ingress and egress paths, a DHP Control block, several DHP Modules each with BlockRAM, and SRAM and SDRAM interfaces]
• Application for partial FPGA reconfiguration
• Ingress/Egress plugin modules
  – Modules are position-independent plugins
  – Multiplexed daisy-chain enables plugin permutations
• Central control block
  – Cell routing, flow control
  – Memory management
  – Plugin reconfiguration control
• Dynamic reconfiguration
  – Plugins are dynamically loaded into the running device
  – Plugins may be bypassed during reconfiguration
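Routing cells through a chosen permutation of plugins, with bypass during reconfiguration, can be illustrated at the behavioral level. In this Python sketch the control block's routing table is a dictionary from a flow ID to an ordered list of plugin names; the flow IDs, plugin names, and the `DhpChain` class are hypothetical, and the model ignores the multiplexer-level hardware, flow control, and timing.

```python
from typing import Callable, Dict, List, Set

Cell = bytes
Plugin = Callable[[Cell], Cell]


class DhpChain:
    """Software model of per-flow plugin routing with bypass (all names hypothetical)."""

    def __init__(self, plugins: Dict[str, Plugin]) -> None:
        self.plugins = dict(plugins)
        self.routes: Dict[int, List[str]] = {}  # flow id -> ordered plugin names
        self.reconfiguring: Set[str] = set()

    def set_route(self, flow: int, order: List[str]) -> None:
        """Choose which plugins a flow visits, and in what order."""
        unknown = [name for name in order if name not in self.plugins]
        if unknown:
            raise KeyError(f"unknown plugins: {unknown}")
        self.routes[flow] = list(order)

    def begin_reconfig(self, name: str) -> None:
        """Mark a plugin as being reloaded; cells bypass it meanwhile."""
        self.reconfiguring.add(name)

    def end_reconfig(self, name: str, new_plugin: Plugin) -> None:
        """Install the reloaded plugin and stop bypassing it."""
        self.plugins[name] = new_plugin
        self.reconfiguring.discard(name)

    def process(self, flow: int, cell: Cell) -> Cell:
        """Pass a cell through its flow's plugin sequence, skipping bypassed plugins."""
        for name in self.routes.get(flow, []):
            if name in self.reconfiguring:
                continue
            cell = self.plugins[name](cell)
        return cell
```

Keeping the route as data, rather than wiring plugins in a fixed order, is what lets one set of loaded plugins serve several permutations at once.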
IP Lookup as a DHP Module
[Figure: Fast IP Lookup wrapped as a DHP module; labeled blocks: Cells IN, Extract IP Address, IP Lookup Engine, Remap VCIs, Cell Store, Cells OUT, SRAM Interface, alongside the DHP Control block, other DHP Modules with BlockRAM, SDRAM Interfaces, and the NID FPGA interface (cell I/O) with ingress and egress paths]
• Ingress module
• Cell I/O
  – Process all IP data flows passing through the switch port
  – Watch for control cell updates to the root node pointer
• Requires access to SRAM
  – Tree bitmap data structure stored in off-chip SRAM
• Implements the Cell Store, IP Address FIFO, and Output VCI FIFO in Block SelectRAM
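For readers unfamiliar with the tree bitmap structure, the Python sketch below shows its core idea: each node keeps an internal bitmap for prefixes that end inside the node and an external bitmap for child nodes, and a popcount over a bitmap locates the matching result or child in a compact array. The 4-bit stride, the in-memory object layout, and all names are assumptions for illustration; this is not the FIPL module's SRAM encoding, and its node count per lookup is not meant to reproduce the 11-access worst case quoted for the design.

```python
from typing import List, Optional

STRIDE = 4       # bits consumed per node (assumed)
ADDR_BITS = 32   # IPv4


class Node:
    def __init__(self) -> None:
        self.internal = 0                 # bit (2**l - 1 + v): prefix of length l, value v, ends here
        self.external = 0                 # bit c: a child exists for next chunk c
        self.children: List["Node"] = []  # ordered by chunk value
        self.results: List[str] = []      # next hops, ordered by internal-bitmap position


def count_below(bitmap: int, pos: int) -> int:
    """Popcount of bitmap bits below pos: the index into a compact array."""
    return bin(bitmap & ((1 << pos) - 1)).count("1")


def chunk_at(addr: int, depth: int) -> int:
    """The STRIDE-bit chunk of addr consumed by a node at this depth."""
    shift = ADDR_BITS - (depth + 1) * STRIDE
    return (addr >> shift) & ((1 << STRIDE) - 1)


def insert(root: Node, prefix: int, length: int, next_hop: str) -> None:
    node, depth = root, 0
    while length - depth * STRIDE >= STRIDE:     # prefix continues below this node
        c = chunk_at(prefix, depth)
        if not (node.external >> c) & 1:
            node.children.insert(count_below(node.external, c), Node())
            node.external |= 1 << c
        node = node.children[count_below(node.external, c)]
        depth += 1
    rem = length - depth * STRIDE                # 0 <= rem < STRIDE bits end in this node
    value = chunk_at(prefix, depth) >> (STRIDE - rem) if rem else 0
    pos = (1 << rem) - 1 + value
    idx = count_below(node.internal, pos)
    if (node.internal >> pos) & 1:
        node.results[idx] = next_hop             # replace an existing prefix
    else:
        node.results.insert(idx, next_hop)
        node.internal |= 1 << pos


def lookup(root: Node, addr: int) -> Optional[str]:
    best: Optional[str] = None
    node: Optional[Node] = root
    depth = 0
    while node is not None:
        if depth * STRIDE == ADDR_BITS:          # all bits consumed: only a length-0 prefix fits
            if node.internal & 1:
                best = node.results[0]
            break
        c = chunk_at(addr, depth)
        for l in range(STRIDE - 1, -1, -1):      # longest prefix ending inside this node
            pos = (1 << l) - 1 + (c >> (STRIDE - l))
            if (node.internal >> pos) & 1:
                best = node.results[count_below(node.internal, pos)]
                break
        if (node.external >> c) & 1:             # deeper nodes hold longer prefixes
            node = node.children[count_below(node.external, c)]
            depth += 1
        else:
            node = None
    return best
```

Because each node visit reads one node, the depth of the walk is what a hardware engine pays in memory accesses; packing the bitmaps into a single SRAM word per node is the motivation for the popcount indexing.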
Challenges
• DHP module control
  – Cell routing to the correct permutation of plugin modules
    • Flow classification and tagging of cells
  – Flow control
    • Asynchronous (non-flywheel) cell I/O interfaces
    • Plugins may arbitrarily delay cells
    • Plugins may inject more traffic than they absorb, and vice versa
• Implementing and maintaining static DHP module interfaces
  – Signal route locks for the plugin module interface
  – Signal route locks for memory and control signals
  – Reservation of logic and routing resources
• Memory resource arbitration
  – Sharing off-chip memory resources among a dynamic set of applications
• Maintaining flow state between plugins