Supporting Rip-Up and Reroute in an FPGA-Based Multilayer Maze Routing Accelerator
John A. Nestor, Kavon Nasabzadeh, and Oliver Bowen
ECE Department, Lafayette College, Easton, Pennsylvania 18042
nestorj@lafayette.edu
Outline
} Background
  } Overview of VLSI CAD
  } Maze Routing with the Lee Algorithm
  } Routing Accelerators
  } The L3 Accelerator
  } Maze Routing with Multiple Nets (Rip-Up and Reroute)
} L4 - Hardware Accelerator with Rip-Up and Reroute
  } Design Improvements
  } Etching - a new feature to support Rip-Up and Reroute
  } Control Software Implementation
} Results
} Conclusion
Nestor 2 MAPLD 2005/209
The Routing Problem
} Given
  } A set of desired connections (a netlist)
  } A set of layers available to make connections
} Create a set of connections that:
  } Completely connects the terminals of each net
  } Meets timing constraints on delay for critical nets
  } Minimizes the area consumed by routing
  } Minimizes the crossings between layers
  } Resolves crosstalk and noise issues
Maze Routing - The Lee Algorithm
} Treat routing surface as a grid
} Algorithm Operation
  } Expansion - starting with "source" terminal, label neighboring nodes with shortest path back to source; stop when target reached
  } Backtrace - follow shortest path from target back to source
  } Cleanup - remove labels
} Easily extended to multiple layers
[Figure: routing grid with source S and target T]
Lee Algorithm Operation
} Expansion - O(d^2) steps
[Figure: sequence of grids with source S and target T during expansion]
Lee Algorithm Operation (continued)
} Backtrace - O(d) steps
[Figure: sequence of grids with source S and target T during backtrace]
} Cleanup - O(N^2) steps
[Figure: sequence of grids during cleanup]
Lee Algorithm Tradeoffs
} Advantage: Guaranteed to find a shortest path connection if one exists
} Disadvantages:
  } Slow
    • Expansion: O(d^2) for connection of distance d
    • Backtrace: O(d) for connection of distance d
    • Cleanup: O(N^2) for an N x N grid
  } Shortest-path guarantee only for a single connection
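The three phases described above can be sketched in software as follows. This is a minimal single-layer illustration, not the authors' code: the function name `lee_route`, the `FREE`/`BLOCKED` encoding, and the use of a dictionary for labels are all assumptions made for the example, and both terminals are assumed to sit on free cells.

```python
from collections import deque

FREE, BLOCKED = 0, 1  # hypothetical cell encoding for this sketch


def lee_route(grid, source, target):
    """Return a shortest path as a list of (row, col) cells, or None if blocked."""
    rows, cols = len(grid), len(grid[0])

    def neighbors(r, c):
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                yield nr, nc

    # Expansion: breadth-first wavefront; each label is the distance to the source.
    labels = {source: 0}
    frontier = deque([source])
    while frontier and target not in labels:
        cell = frontier.popleft()
        for nb in neighbors(*cell):
            if nb not in labels and grid[nb[0]][nb[1]] == FREE:
                labels[nb] = labels[cell] + 1
                frontier.append(nb)
    if target not in labels:
        return None

    # Backtrace: from the target, step to any neighbor whose label is one smaller.
    path = [target]
    while path[-1] != source:
        here = path[-1]
        for nb in neighbors(*here):
            if labels.get(nb) == labels[here] - 1:
                path.append(nb)
                break

    # Cleanup: labels live in a local dict here, so they vanish on return.
    # A grid-resident implementation would have to clear every labelled cell.
    path.reverse()
    return path
```

Because the labels are kept outside the grid, cleanup is free in this sketch; the O(N^2) cleanup cost on the slide applies when labels are stored in the grid cells themselves.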
Multiple Net Routing
} Lee Algorithm: routes nets one at a time
} Problem: Early connections may block later ones
} Solution: "Rip-up-and-reroute"
  } Rip up (remove) blocking nets
  } Reroute nets in different order
} Key question: how to select net order?
[Figure: three grids with terminals S1, T1, S2, T2. Panel 1: Net 1 routed first and blocks Net 2. Panel 2: Net 1 ripped up. Panel 3: Net 2 routed first, followed by Net 1.]
Software-Based Rip-Up and Reroute
} Label unoccupied gridpoints with distance from source
} Allow expansion of occupied cells with penalty (e.g., 50)
} A path through an obstacle is found when no obstacle-free path exists
} Usage matrix sums connections in each gridpoint; a sum > 1 marks a violation and identifies nets for rip-up
[Figure: three grids. Expansion: path found through net N1 (cells behind the obstacle carry labels such as 51, 52, 53). Backtrace: routed with conflicting paths. Usage matrix: identifies violations.]
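The penalty-based expansion and the usage matrix described above might look like the sketch below. It swaps the plain wavefront for a priority-queue (Dijkstra-style) search so that penalized cells are ordered correctly; that choice, the function names, and the `occupied` grid representation are assumptions for illustration, not details taken from the slides.

```python
import heapq


def penalty_route(occupied, source, target, penalty=50):
    """Cheapest path where entering an occupied cell costs 1 + penalty."""
    rows, cols = len(occupied), len(occupied[0])
    cost = {source: 0}
    parent = {}
    heap = [(0, source)]
    while heap:
        dist, cell = heapq.heappop(heap)
        if cell == target:
            break
        if dist > cost[cell]:
            continue  # stale heap entry
        r, c = cell
        for nb in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if not (0 <= nb[0] < rows and 0 <= nb[1] < cols):
                continue
            step = 1 + (penalty if occupied[nb[0]][nb[1]] else 0)
            if dist + step < cost.get(nb, float("inf")):
                cost[nb] = dist + step
                parent[nb] = cell
                heapq.heappush(heap, (dist + step, nb))
    if target not in cost:
        return None
    path = [target]
    while path[-1] != source:
        path.append(parent[path[-1]])
    return path[::-1]


def usage_matrix(rows, cols, paths):
    """Count how many routed paths pass through each gridpoint."""
    usage = [[0] * cols for _ in range(rows)]
    for path in paths:
        for r, c in path:
            usage[r][c] += 1
    return usage


def violations(usage):
    """Gridpoints used by more than one net."""
    return [(r, c) for r, row in enumerate(usage)
            for c, count in enumerate(row) if count > 1]
```

Any net whose path touches a gridpoint returned by `violations` is a candidate for rip-up.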
Maze Routing Accelerators
} Full grid accelerators (e.g., L-Machine [Breuer & Shamsa, 1981])
  } One processing element per gridpoint
  } Custom implementation
  } Reduces execution time from O(d^2) to O(d)
  } Impractical for multiple layers
} Virtual grid accelerators (e.g., HAM [Venkataswaran & Mazumder, 1993])
  } Each processing element handles multiple gridpoints
  } Custom implementation
  } Complex PE design
The L3 Maze Routing Accelerator
} Modified Direct Grid Approach
  } Use two-dimensional grid to support multiple layers by time multiplexing
} PE Operation on each clock cycle
  } Expansion:
    • If the cell represented by the PE is unexpanded and a neighboring PE is expanded, enter an "expansion state" that marks the direction of the neighbor (shortest path to source)
    • Quit when target enters expansion state
  } Backtrace:
    • Start with target and follow marked direction
    • Quit when source is reached
  } Cleanup: clear expansion states (but not obstacles) of all PEs in parallel - one layer per clock cycle
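The per-cycle PE behavior described above can be mimicked in software to see why expansion takes O(d) cycles. The sketch below models a single layer only (the time multiplexing across layers is left out), and the character encoding, the `SOURCE` marker, and the function names are assumptions for illustration rather than the hardware's actual state encoding. The target is assumed to be an empty cell.

```python
EMPTY, BLOCKED, SOURCE = ".", "#", "*"  # hypothetical encoding for this sketch
# Each mark names the neighbor that leads back toward the source.
NEIGHBORS = {"N": (-1, 0), "S": (1, 0), "W": (0, -1), "E": (0, 1)}
EXPANDED = set(NEIGHBORS) | {SOURCE}


def expand_cycle(state):
    """One synchronous clock cycle: every PE looks at its neighbors' previous state."""
    rows, cols = len(state), len(state[0])
    nxt = [row[:] for row in state]
    for r in range(rows):
        for c in range(cols):
            if state[r][c] != EMPTY:
                continue
            for mark, (dr, dc) in NEIGHBORS.items():
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and state[nr][nc] in EXPANDED:
                    nxt[r][c] = mark
                    break
    return nxt


def route(grid, source, target):
    """Return (path, cycles); path is None if the wavefront stalls."""
    state = [row[:] for row in grid]
    state[source[0]][source[1]] = SOURCE
    cycles = 0
    while state[target[0]][target[1]] == EMPTY:
        nxt = expand_cycle(state)
        if nxt == state:
            return None, cycles  # no PE changed: target is unreachable
        state = nxt
        cycles += 1
    # Backtrace: follow the marked directions from target to source.
    r, c = target
    path = [(r, c)]
    while (r, c) != source:
        dr, dc = NEIGHBORS[state[r][c]]
        r, c = r + dr, c + dc
        path.append((r, c))
    return path[::-1], cycles
```

The cycle count equals the path length in steps, since every PE on the wavefront updates in the same cycle.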
L3 Accelerator Organization
[Figure: block diagram of a Control Unit, Column Decoder, Row Decoder, and an array of PEs containing source S and target T. CMD signal goes to all PEs; STATUS signal is the AND of all cell STATUS outputs.]
L3 - PE Details
} Efficient implementation
  } 32 LUTs in a Xilinx Spartan/Virtex FPGA for each PE
  } Large Xilinx FPGAs can support grids of 32 x 32 PEs
} Execution time comparison, hardware vs. software:
  } Expansion: O(L x d) for L layers vs. O(L x d^2)
  } Backtrace: O(L x d) vs. O(L x d)
  } Cleanup: O(L) vs. O(L x N^2)
} Single-net routing speedup of 93x over software for a 16 x 16, 4-layer array (2.5 GHz Pentium 4)
} Drawbacks
  } Limited clock speed due to long timing paths in design
  } No support for multiple net routing
L4 - An Improved Routing Accelerator
} Support for rip-up and reroute using etching
  } Hardware can't support cost-based routing (unlike SW)
  } Instead, allow expansion of "obstacle" cells
  } Analogy - solvent "etching" through a barrier
} Pipelined control unit / array interaction for increased clock rate (50 MHz vs. 24 MHz)
} Increased implementation cost (44 LUTs), but still feasible for 32 x 32 routing array
} Control algorithm in Host Processor supports multiple net routing
L4 Processing Element Design
} Shift Register SREG stores state for layers above/below current layer
} Layers processed bottom-to-top
} Register ST0 stores state for layer currently being processed
L4 Processing Element - States
} Each PE stores the state of each layer at a specific (x, y) coordinate
} States:

State     Meaning
EMPTY     Cell unoccupied and unexpanded
BLOCKED   Cell occupied by routed net
XE        Expanded - shortest backtrace path to east
XW        Expanded - shortest backtrace path to west
XN        Expanded - shortest backtrace path to north
XS        Expanded - shortest backtrace path to south
XU        Expanded - shortest backtrace path upward
XD        Expanded - shortest backtrace path downward
L4 Processing Element - Commands
} Commands are broadcast to all cells
} Decoders select rectangular regions of cells
} Cell outputs ORed together

Command   Action
READ      Return state of selected cell(s)
WRITE     Write state of selected cell(s)
EXPAND    IF EMPTY or (BLOCKED and ETCH enabled), AND a neighboring cell is expanded, THEN enter corresponding expand state (XN, ...)
CLEARX    Reset expanded cells to EMPTY state; reset etched cells to BLOCKED state
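A software model of the EXPAND and CLEARX commands above may help show how etching differs from ordinary expansion. This is a single-layer sketch under stated assumptions: the `Cell` class, the `etched` bookkeeping flag, and seeding the source by writing an arbitrary expand state are all hypothetical, since the slides do not say how the hardware records which cells were etched or how the source is marked. The XU and XD states are omitted.

```python
from dataclasses import dataclass

# Expand states for one layer; each maps to the neighbor it points back to.
OFFSETS = {"XN": (-1, 0), "XS": (1, 0), "XW": (0, -1), "XE": (0, 1)}
EXPANDED = set(OFFSETS)


@dataclass
class Cell:
    state: str = "EMPTY"
    etched: bool = False  # assumed flag: cell was BLOCKED before expansion


def expand(grid, etch):
    """Broadcast one EXPAND command; return True if any cell changed state."""
    rows, cols = len(grid), len(grid[0])
    prev = [[cell.state for cell in row] for row in grid]  # synchronous update
    changed = False
    for r in range(rows):
        for c in range(cols):
            s = prev[r][c]
            if not (s == "EMPTY" or (s == "BLOCKED" and etch)):
                continue
            for mark, (dr, dc) in OFFSETS.items():
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and prev[nr][nc] in EXPANDED:
                    grid[r][c].state = mark
                    grid[r][c].etched = (s == "BLOCKED")
                    changed = True
                    break
    return changed


def clearx(grid):
    """CLEARX: expanded cells return to EMPTY, etched cells return to BLOCKED."""
    for row in grid:
        for cell in row:
            if cell.state in EXPANDED:
                cell.state = "BLOCKED" if cell.etched else "EMPTY"
                cell.etched = False


def run_expansion(grid, source, target, etch):
    """Seed the source, then issue EXPAND until the target expands or nothing changes."""
    grid[source[0]][source[1]].state = "XN"  # assumed seeding via WRITE
    while grid[target[0]][target[1]].state not in EXPANDED:
        if not expand(grid, etch):
            return False
    return True
```

With `etch=False` a wall of BLOCKED cells stops the wavefront; with `etch=True` the wavefront passes through, and the etched cells on the resulting path identify the nets that are in conflict.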
Etching Example
} ETCH=0 - expansion fails
} ETCH=1 - expansion succeeds
[Figure: routing grids with terminals S1, T1, S2, and T2, shown after expansion and after backtrace]
Overall System
Control Algorithm
} Based on the SILK Simulated Evolution router [Lin et al., 1989]
} Key idea: score nets by fitness
  Score = α * (number of violations) + β * (number of vias - minimum number of vias) + γ * (actual wire length / lower bound of wire length)
} Probabilistically select less fit nets for rip-up
} Use hardware router to find connections
} Usage matrix (maintained in software) identifies routing violations

Youn-Long Lin, Yu-Chin Hsu, and Fur-Shing Tsai, "SILK: A Simulated Evolution Router", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 8, No. 10, pp. 1108-1114, October 1989.
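The scoring formula above translates directly into code. The selection step is less certain: the slides say only that less fit nets are selected probabilistically, so the rule used in `select_for_ripup` (rip up a net with probability equal to its score divided by the worst score) is an assumption for illustration, as are the function names and the default weights of 1.0.

```python
import random


def net_score(violations, vias, min_vias, wire_length, min_wire_length,
              alpha=1.0, beta=1.0, gamma=1.0):
    """Score from the slide; a higher score means a less fit net.

    The weights are placeholders; min_wire_length must be nonzero.
    """
    return (alpha * violations
            + beta * (vias - min_vias)
            + gamma * (wire_length / min_wire_length))


def select_for_ripup(scores, rng):
    """Assumed rule: pick each net with probability score / worst score."""
    worst = max(scores.values())
    if worst <= 0:
        return []
    return [net for net, score in scores.items() if rng.random() < score / worst]


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the demo repeatable
    scores = {
        "net1": net_score(2, 3, 1, 12, 10, alpha=10, beta=2),
        "net2": net_score(0, 1, 1, 10, 10, alpha=10, beta=2),
    }
    print(select_for_ripup(scores, rng))
```

Under this rule the worst-scoring net is always ripped up, and a net with score 0 never is; nets in between are chosen at random in proportion to how poorly they score.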
Control Algorithm
[Figure: flowchart with stages Input Netlist, Initial Routing (Greedy Algorithm), SE Algorithm (Rip up Nets, Reroute Nets, Usage Matrix), Feasible Solution, Post Optimization, and Final Routing. The HW/SW Router block shows HW - L4 Accelerator with Success and Fail outcomes, SW - Modified Lee Algorithm, and Routing Result.]
Evaluating the System
} Goal: evaluate the best-case speedup over software (interface overhead excluded)
} Method: cycle counters in the accelerator and the host processor
} Experiments on two grids
  } 4-layer 8 x 8
  } 4-layer 16 x 16
Results
[Figure: results for the 8 x 4 grid and the 16 x 4 grid]
Conclusions
} Extended accelerator for Maze Routing
  } Support for Rip-Up and Reroute
  } Support for multiple net routing
  } Faster clock speed
  } Control algorithm based on simulated evolution
} Future Work
  } Larger arrays in a PCI-based accelerator
  } Performance measurements including interface overhead
  } Hardware support to accelerate control algorithm