Development and Application of Tree Synthesis Algorithms John Lillis University of Illinois Chicago
Overview Part I: Buffer tree synthesis Formulations S/P/SP-tree Part II: Fanin tree embedding/replication Optimization across gate boundaries Interaction with placement
Part I: Buffer Tree Synthesis
Premises of Work MAIN PREMISE: Powerful Buffer Tree Synthesis is a Core for Modern Design Conservation of Resources Crucial Estimate: 700-800K Buffers/Chip in Near Future Cost-Performance Tradeoffs General Cost Model Topology / Embedding / Buffering Spaces Should be Explored Simultaneously 2-Phase Approach Not Robust / Predictable Particularly Troublesome in Presence of Blockages
Max Slack Weakness: Overoptimized subtrees [figure: slack vs. cost]
Problem Formulation Given: Location of Driver and Sinks Technology Parameters Timing Requirements Buffer Library Target Routing Graph (Blockages) Find: Topology in corresponding space its Embedding and Buffer Assignment Minimizing Cost s.t. Timing Constraints
Philosophy of Constraint Imposition Goals: Predictable Behavior Absence of ad-hoc Heuristics Main Idea: Optimally Solve Constrained Variant of the Problem Well-Designed Constraints Produce Large Flexible Solution Space Tractability Constraints: Topology Space Full space Constrained space
Topology Embedding Flexibility [figure: source s and sinks a, b, c]
Target Routing Graph Construction [figure: source s and sinks a, b, c; routing blockage; buffer blockage]
Algorithmic Description Timing-Driven Maze Routing Topology Embedding S-Tree P-Tree SP-Tree
Core Subroutine: Timing-Driven Maze Routing Generalization of [Hur et al.; TCAD Feb 2000] Single Target, Multiple Sources Finds non-dominated paths Simultaneous Buffer Insertion Handling of Blockages in Topology Synthesis [figure: target and sources]
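The notion of keeping only non-dominated paths can be illustrated with a minimal Python sketch. It assumes each candidate path is summarized by a hypothetical (cost, delay) pair in which lower is better for both; the signature used by the actual router is not given on the slide and is likely richer.

```python
from typing import List, Tuple

# Hypothetical solution signature: (cost, delay); lower is better for both.
Solution = Tuple[float, float]


def prune_dominated(solutions: List[Solution]) -> List[Solution]:
    """Return the non-dominated subset, sorted by increasing cost."""
    kept: List[Solution] = []
    best_delay = float("inf")
    # After sorting by (cost, delay), a solution survives only if it is
    # strictly faster than every solution that is at least as cheap.
    for cost, delay in sorted(set(solutions)):
        if delay < best_delay:
            kept.append((cost, delay))
            best_delay = delay
    return kept


frontier = prune_dominated([(3, 10), (5, 7), (4, 12), (6, 7), (8, 2)])
print(frontier)  # [(3, 10), (5, 7), (8, 2)]
```

A maze router in this style would apply such pruning at every graph vertex, so that each vertex carries a list of cost/delay tradeoffs rather than a single best label.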
Algorithmic Description Timing-Driven Maze Routing Topology Embedding S-Tree P-Tree SP-Tree
Topology Embedding Goal: Obtain timing feasible embedding / buffering of given topology, minimizing cost Solution: Dynamic Programming (bottom-up)
Solution sets: A(u, v) represents the set of solutions that correspond to vertex u in the topology and vertex v in the target graph. A1b = Join(A1.left, A1.right); A1 = GenDijkstra(A1b) [figure: A(u, v) pairing topology vertex u with graph vertex v]
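One plausible reading of the Join step is sketched below in Python. It assumes both child solution sets sit at the same target-graph vertex and that each solution is a (cost, load, req) triple in the style of classic buffer insertion. The field names and the dominance rule are assumptions for illustration, not taken from the slides.

```python
from itertools import product
from typing import List, NamedTuple


class Sol(NamedTuple):
    cost: float  # accumulated wire/buffer cost (assumed field)
    load: float  # downstream capacitance (assumed field)
    req: float   # required arrival time; larger is better (assumed field)


def dominates(a: Sol, b: Sol) -> bool:
    """a dominates b if it is no worse in every field and not identical."""
    return a != b and a.cost <= b.cost and a.load <= b.load and a.req >= b.req


def join(left: List[Sol], right: List[Sol]) -> List[Sol]:
    """Combine every left/right pair at a common vertex, then prune."""
    merged = {
        Sol(l.cost + r.cost, l.load + r.load, min(l.req, r.req))
        for l, r in product(left, right)
    }
    return sorted(s for s in merged if not any(dominates(o, s) for o in merged))


print(join([Sol(1, 2, 10), Sol(3, 1, 12)], [Sol(2, 1, 8)]))
```

In this sketch the two joined solutions both survive: one is cheaper, the other presents less load. The quadratic pruning is kept for brevity only.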
Algorithmic Description Timing-Driven Maze Routing Topology Embedding S-Tree P-Tree SP-Tree
S-Tree Notion of localities: Spatial Temporal Polarity Partition sinks into 2 sets based on: estimated timing criticality signal polarity requirements some other criteria... Subtrees can break topology and “stitch” at a different place
S-Tree Topology Space Sink partition: {a, c, d} {b} [figure: example topologies over source s and sinks a, b, c, d]
S-Tree Recurrence
A1b = Join(A1.left, A1.right)
A1 = GenDijkstra(A1b)
A2b = Join(A2.left, A2.right)
A2 = GenDijkstra(A2b)
A12b = Join(A12.left, A12.right) + Join(A1, A2)
A12 = GenDijkstra(A12b)
S-Tree Topology Space [figure: initial topology and alternative topologies over source s and sinks a, b, c, d, e, f]
Incorporating polarity 4 sets: critical & positive signal polarity critical & negative non-critical & positive non-critical & negative Other partitioning schemes...
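The four-way partition can be sketched in Python as follows. The `Sink` record, the `required_time` field and the single criticality threshold are hypothetical; the slides only say that criticality is estimated, not how.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Sink(NamedTuple):
    name: str
    required_time: float  # assumed criticality measure: smaller is more critical
    inverted: bool        # True if the sink needs negative signal polarity


def partition_sinks(
    sinks: Iterable[Sink], critical_threshold: float
) -> Dict[Tuple[bool, bool], List[str]]:
    """Group sink names by (is_critical, needs_negative_polarity)."""
    groups: Dict[Tuple[bool, bool], List[str]] = defaultdict(list)
    for s in sinks:
        critical = s.required_time <= critical_threshold
        groups[(critical, s.inverted)].append(s.name)
    return dict(groups)


sinks = [
    Sink("a", 1.0, False),
    Sink("b", 5.0, False),
    Sink("c", 0.5, True),
    Sink("d", 6.0, True),
]
print(partition_sinks(sinks, critical_threshold=2.0))
```

Dropping the polarity component of the key gives the two-set (critical / non-critical) partition.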
Algorithmic Description Timing-Driven Maze Routing Topology Embedding S-Tree P-Tree SP-Tree
P-Tree Topology Space All Permutation-Constrained Topologies [figure: sink order a, b, c, d, e with source s]
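To give a feel for the size of a permutation-constrained space, the Python sketch below counts topologies for a fixed sink order. It assumes, purely for illustration, that topologies are binary trees whose leaves read left to right in the given order; the exact P-Tree topology definition is not stated on the slide.

```python
from functools import lru_cache


def count_topologies(num_sinks: int) -> int:
    """Count binary trees whose leaves follow a fixed order of num_sinks sinks."""

    @lru_cache(maxsize=None)
    def count(i: int, j: int) -> int:
        # Number of topologies over the contiguous sink interval i..j.
        if i == j:
            return 1
        # Split the interval at every position k: left = i..k, right = k+1..j.
        return sum(count(i, k) * count(k + 1, j) for k in range(i, j))

    return count(0, num_sinks - 1)


print([count_topologies(n) for n in range(1, 7)])  # [1, 1, 2, 5, 14, 42]
```

The count grows exponentially, yet the recursion touches only O(n^2) intervals, which is why dynamic programming over intervals can explore such a space without enumerating it.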
Limitations of P-Tree Space Isolation of Critical / Non-Critical Subtrees: “Temporal-Locality” Min WL May Not Produce Min Cost Driver Critical Non-critical
Algorithmic Description Timing-Driven Maze Routing Topology Embedding S-Tree P-Tree SP-Tree
SP-Tree Combine everything said so far... From P-Tree Spatial locality Robustness From S-Tree Temporal locality Polarity locality Ability to fix “topology problems” by “stitching”
Solution Space Entire space SP-Tree S-Tree Fixed topo.
Experiments Randomly generated nets Non-uniform required arrival time Non-uniform sink input capacitance Buffer-biased cost Interested in: Min cost feasible solution Max slack solution for verification Runtime More details in the paper...
Algorithms for Experiments S-Tree P-Tree SP-Tree RMP [Cong, Yuan; DAC 2000] RMP-Quick [Cong, Yuan; DAC 2000]
Results Net 2-06 [chart: # buffers, min cost feasible vs. max slack]
Results Net 2-08 [chart: # buffers, min cost feasible vs. max slack]
Results Net 2-12 [chart: # buffers, min cost feasible vs. max slack]
SP-Tree vs. P-Tree
Conclusions Key Concepts: General Cost Models Routing Congestion Buffer Congestion Orthogonal Separation of Spatial and Temporal Locality Polarity Requirements Routing and Buffer Blockages Targets: Small-to-Medium Sized Signal Nets Results Summary Highly Cost-Efficient, High Performance Solutions Substantially Outperforms Prior Approaches in Solution Quality and Runtime
Part II: Fanin Tree Embedding/Replication
Replication Overview • Hrkic, Lillis, Beraudo (DAC 04, IWLS 04) • Concept: Netlist structure limits potential of timing-driven placement • Difficult for top-down synthesis to fix • Main issue: inherently non-monotone paths • Approach (Hrkic, Lillis; DAC 04) touches on placement, synthesis (netlist perturbation) and routing.
Logic Replication Duplicate logic cell Preserve functionality Improve timing Place / Move cells Adjust connections A B CR C C D E
Early Work Use replication to straighten I/O paths Local monotonicity [Beraudo, Lillis, DAC 2003] Sequence of 3 cells on the path Incremental framework [figure: cells A, B, C, D, E and replica CR]
Limitations of Local Monotonicity: even when local monotonicity is satisfied, still many non-monotone paths [figure: cells A, B, C, D, E, F]
Replication Tree Approach [Hrkic et al. DAC 04] Identify critical sink Extract critical fan-in tree (Replication Tree) Optimize fan-in tree (Fan-in Tree Embedding) Legalize placement
Slowest Paths Tree Focus on slowest paths Find slowest paths tree from critical sink Include paths within epsilon of current critical delay Focus on most critical portions of fan-in cone
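A minimal Python sketch of extracting an epsilon-slowest paths tree follows. It assumes arrival times are already known from static timing analysis and that `fanin[v]` lists (driver, edge delay) pairs; these data structures and the function name are hypothetical. Each node is attached to the tree along its slowest path to the sink, and a node is included only if that path is within `eps` of the critical delay.

```python
import heapq
from typing import Dict, List, Optional, Tuple


def slowest_paths_tree(
    fanin: Dict[str, List[Tuple[str, float]]],
    arrival: Dict[str, float],
    sink: str,
    eps: float,
) -> Dict[str, Optional[str]]:
    """Return {node: parent toward sink} for nodes on eps-critical paths."""
    parent: Dict[str, Optional[str]] = {sink: None}
    used = {sink: 0.0}      # least slack consumed on any path from node to sink
    heap = [(0.0, sink)]    # Dijkstra-style search on non-negative edge slacks
    while heap:
        slack, v = heapq.heappop(heap)
        if slack > used[v]:
            continue  # stale heap entry
        for u, delay in fanin.get(v, []):
            # Edge slack is >= 0 because arrival[v] is a max over fan-ins.
            s = slack + (arrival[v] - (arrival[u] + delay))
            if s <= eps and s < used.get(u, float("inf")):
                used[u] = s
                parent[u] = v
                heapq.heappush(heap, (s, u))
    return parent


fanin = {"g1": [("a", 3.0), ("b", 1.0)], "sink": [("g1", 2.0), ("c", 2.5)]}
arrival = {"a": 0.0, "b": 0.0, "c": 2.0, "g1": 3.0, "sink": 5.0}
print(slowest_paths_tree(fanin, arrival, "sink", eps=1.0))
```

With `eps=1.0` the input `c` (0.5 slower than critical) joins the tree while `b` (2.0 slower) does not; with `eps=0.0` only the single critical path remains.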
Replication Tree Most circuits do not contain large fan-in trees due to reconvergence Given a critical tree, temporarily replicate the entire tree Assign connections: if (u, v) is a tree edge, connect u.R to v.R; else connect u to v.R [figure: original cells A, B, C, D, E, F and replicas AR, BR, CR, DR, FR]
Placement cost Replication is temporary Placement cost is crucial Cost discount for placing a cell over its logical equivalent: low cost for placing DR over D (in that case actual replication will never occur); multiple low-cost locations possible [figure: cells A, B, D, E, F and replicas AR, BR, CR, DR, FR]
Fan-in Tree Embedding Given: Fan-in tree Placement of sink and inputs Arrival times at inputs Placement and routing graph Find: Placement of internal tree nodes (Gates) Minimizing Cost s.t. Timing Constraints cost / delay tradeoff
Fan-in Tree Embedding Example [figure: two embeddings of the same fan-in tree, one with higher delay and lower cost, one with lower delay and higher cost]
Fan-out and Fan-in Tree [figure: fan-out tree from a source, labeled bottom-up; fan-in tree into a sink, labeled top-down]
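The connection rule can be written out as a short Python sketch. The netlist is assumed to be a dictionary mapping each cell to its list of drivers, and replicas are named by appending ".R"; both conventions are illustrative choices rather than the authors' data structures.

```python
from typing import Dict, Iterable, List, Tuple


def replica_connections(
    fanin: Dict[str, List[str]], tree_edges: Iterable[Tuple[str, str]]
) -> Dict[str, List[str]]:
    """Return the driver list of every replica '<cell>.R' in the tree."""
    edges = set(tree_edges)
    tree_nodes = {node for edge in edges for node in edge}
    replicas: Dict[str, List[str]] = {}
    for v in tree_nodes:
        # Tree edge (u, v): the replica is driven by u's replica.
        # Any other fan-in edge: the replica is driven by the original u.
        replicas[v + ".R"] = [
            u + ".R" if (u, v) in edges else u for u in fanin.get(v, [])
        ]
    return replicas


fanin = {"A": [], "B": [], "C": [], "D": ["A", "B"], "F": ["D", "C"]}
tree_edges = [("A", "D"), ("D", "F")]
print(replica_connections(fanin, tree_edges))
```

Here `D.R` takes its critical input from `A.R` but its side input from the original `B`, so the replicated structure is a true tree even when the original cone reconverges.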
Fan-in Tree Embedding Adaptation of S-Tree algorithm [Hrkic, Lillis, DAC 2002] Keep: Graph Model for Embedding Target Modified Timing-Driven Maze Routing multiple source, multiple targets at each vertex keep a list of non-dominated solutions S. Hur, J. Lillis, IEEE TCAD 2000 Modify: Top-down vs. Bottom-up Solution signature (c, t): c - cost t - signal arrival time Gate placement cost p(x, y)
Fan-in Tree Embedding Non-binary tree: multiple gate inputs Top-Down Dynamic Programming Maze Routing to populate solutions deferred backtracking Join Solutions Modified maze routing $c = p(x, y) + c_1 + \dots + c_n$, $t = \max(t_1, \dots, t_n)$ Bottom-Up solution extraction backtrack to extract maze route extract gate placement
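The join formulas on this slide can be exercised with a small Python sketch. It assumes each gate input arrives at the candidate location with its own list of (cost, arrival time) solutions, and it follows the slide in omitting any gate delay term; the function and parameter names are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple

Solution = Tuple[float, float]  # (c, t): cost and signal arrival time


def join_at_gate(
    place_cost: float, input_solutions: Sequence[List[Solution]]
) -> List[Solution]:
    """Join one solution per gate input at a location with cost p(x, y)."""
    candidates = set()
    for combo in product(*input_solutions):
        c = place_cost + sum(ci for ci, _ in combo)  # c = p(x, y) + c_1 + ... + c_n
        t = max(ti for _, ti in combo)               # t = max(t_1, ..., t_n)
        candidates.add((c, t))
    # Keep only non-dominated (cost, arrival time) pairs.
    frontier: List[Solution] = []
    best_t = float("inf")
    for c, t in sorted(candidates):
        if t < best_t:
            frontier.append((c, t))
            best_t = t
    return frontier


print(join_at_gate(1, [[(1, 5), (3, 2)], [(2, 4), (4, 1)]]))
# [(4, 5), (6, 4), (8, 2)]
```

Enumerating the full cross product is exponential in the number of gate inputs; it is used here only to keep the sketch short.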
Aside: Legalization Use Modified Gain-Graph approach [Hur, Lillis; ICCAD 00] Modified to incorporate timing information
Optimization Flow Identify critical sink (static timing analysis) Extract Fan-in Tree Replication Tree epsilon-Slowest Paths Tree Embed Fan-in Tree Decide which cells to Replicate / Unify Legalize placement Repeat while there is improvement
Enhancements Post-process unification: some cells are placed close to their logical equivalents; no automatic unification; if one of the paths is non-critical, it is possible to unify without degrading performance. Unification in legalizer: during ripple-move a cell may be placed on top of its replica; unify them and stop legalization. epsilon-Slowest Paths Tree: no randomization; dynamically modify value of epsilon to enlarge the fan-in cone
Experiments Algorithms Timing-Driven VPR (Versatile Place and Route) [http://www.eecg.toronto.edu/~vaughn/vpr.html] Local Replication [Beraudo, Lillis, DAC-03] RT-Embedding 20 MCNC Benchmark Circuits Interested in: Critical delay Amount of replication Wire usage Tests performed in FPGA domain Promising results
Experimental Setup Obtain valid placement with Timing-Driven VPR placer Local Replication Tree Embedding Route and Evaluate with Timing-Driven VPR router
Normalized to VPR; average values over all 20 circuits
              critical path delay       wire length   blocks
              Winf     Wlow-stress
Local Repl    0.925    0.927            1.020         1.003
RT-Embed      0.858    0.869            1.084         1.004
Delay improved for all circuits
Best improvement for circuit pdc: 0.641
Runtime penalty under 5% on the VPR flow
Replication Statistics Circuit ex 1010: 38 replications, 12 unifications
Ongoing Work Generalize to ASICs Include simultaneous buffering • Mitigation of legalization noise Preventing (some) overlaps in embedding More sophisticated placement cost Reconvergence - arborescence approach Simultaneous technology (re-)mapping – Explore multiple Tree Topologies simultaneously (Universal Tree solver engine: U-Tree)
Review Trees are everywhere! Even in places where they seem to be absent Tree based algorithms can be very strong in generality of formulation and predictability Enable connection to general placement/routing target Can capture tradeoffs between complex objectives Can sometimes be applied to drive optimization of graph structures. References: http://cs.uic.edu/~jlillis/pubs.html S/P/SP-tree executables: http://eda.cs.uic.edu/software.html
Thank you
Timing-Driven Placement Legalization After embedding, cells could overlap in the placement Moving cells on critical path may harm timing Ripple-move strategy [Hur, Lillis, ICCAD 2000] Modified to include both timing and wiring information Overlap Empty
Timing-Driven Placement Legalization Identify overlap Identify up to 4 closest empty slots (one in each quadrant) Construct gain graph: monotone paths from congested to free slots; edges: gain of moving a cell to a neighboring slot (wire and timing gain) Find max-gain path and perform ripple-move (gain could be negative) [figure legend: overlap, empty]
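The max-gain path search can be sketched in Python for a single quadrant. The sketch assumes the overlap sits at grid position (0, 0), the empty slot at the opposite corner, and that monotone paths only step right or down; the gain tables stand in for the combined wire and timing gain and are hypothetical inputs.

```python
from typing import List, Optional, Tuple

Slot = Tuple[int, int]


def max_gain_path(
    gain_right: List[List[float]], gain_down: List[List[float]], rows: int, cols: int
) -> Tuple[float, List[Slot]]:
    """Best monotone path from (0, 0) to (rows-1, cols-1) in a gain graph.

    gain_right[r][c]: gain of moving the cell at (r, c) to (r, c+1).
    gain_down[r][c]:  gain of moving the cell at (r, c) to (r+1, c).
    """
    best = [[float("-inf")] * cols for _ in range(rows)]
    prev: List[List[Optional[Slot]]] = [[None] * cols for _ in range(rows)]
    best[0][0] = 0.0
    for r in range(rows):
        for c in range(cols):
            if c > 0 and best[r][c - 1] + gain_right[r][c - 1] > best[r][c]:
                best[r][c] = best[r][c - 1] + gain_right[r][c - 1]
                prev[r][c] = (r, c - 1)
            if r > 0 and best[r - 1][c] + gain_down[r - 1][c] > best[r][c]:
                best[r][c] = best[r - 1][c] + gain_down[r - 1][c]
                prev[r][c] = (r - 1, c)
    # Backtrack from the empty slot to the overlap.
    path: List[Slot] = []
    node: Optional[Slot] = (rows - 1, cols - 1)
    while node is not None:
        path.append(node)
        node = prev[node[0]][node[1]]
    return best[rows - 1][cols - 1], path[::-1]


gain, path = max_gain_path([[-1.0], [2.0]], [[1.0, -3.0]], rows=2, cols=2)
print(gain, path)  # 3.0 [(0, 0), (1, 0), (1, 1)]
```

A ripple-move would then shift each cell along the returned path one slot toward the empty position. Because the paths are monotone the gain graph is acyclic, so negative gains pose no difficulty for the dynamic program.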