Development and Application of Tree Synthesis Algorithms
John Lillis, University of Illinois Chicago

Overview
• Part I: Buffer tree synthesis
  – Formulations
  – S/P/SP-tree
• Part II: Fan-in tree embedding/replication
  – Optimization across gate boundaries
  – Interaction with placement

Part I: Buffer Tree Synthesis

Premises of Work
MAIN PREMISE: Powerful buffer tree synthesis is at the core of modern design
• Conservation of resources is crucial
  – Estimate: 700-800K buffers/chip in the near future
• Cost-performance tradeoffs
  – General cost model
• Topology, embedding, and buffering spaces should be explored simultaneously
  – The 2-phase approach is not robust or predictable
  – Particularly troublesome in the presence of blockages

Max Slack Weakness
• Over-optimized subtrees
[Figure: slack vs. cost]

Problem Formulation
Given:
• Location of driver and sinks
• Technology parameters
• Timing requirements
• Buffer library
• Target routing graph (blockages)
Find:
• A topology in the corresponding space, its embedding, and a buffer assignment
• Minimizing cost s.t. timing constraints

Philosophy of Constraint Imposition
Goals:
• Predictable behavior
• Absence of ad-hoc heuristics
Main Idea: Optimally solve a constrained variant of the problem
• Well-designed constraints produce a large, flexible solution space
• Tractability
Constraints: Topology Space
[Figure: full space vs. constrained space]

Topology Embedding Flexibility
[Figure: topology embeddings for driver s and sinks a, b, c]

Target Routing Graph Construction
[Figure: routing graph for driver s and sinks a, b, c, with a routing blockage and a buffer blockage]

Algorithmic Description
• Timing-Driven Maze Routing
• Topology Embedding
• S-Tree
• P-Tree
• SP-Tree

Core Subroutine: Timing-Driven Maze Routing
Generalization of [Hur et al.; TCAD, Feb. 2000]
• Single target, multiple sources
• Finds non-dominated paths
• Simultaneous buffer insertion
• Handling of blockages in topology synthesis
[Figure: maze routing from multiple sources to a single target]
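
The subroutine's central idea, keeping every non-dominated solution at each vertex, can be sketched as a multi-source label-setting search. This is a minimal illustration under stated assumptions, not the algorithm of Hur et al.: edge delay is taken as additive (a real timing model is not), buffer insertion is omitted, blocked vertices are simply left out of the graph, and all names (`pareto_maze_route`, `is_dominated`) are hypothetical.

```python
import heapq

def is_dominated(label, labels):
    """True if some existing (cost, delay) label is no worse in both components."""
    cost, delay = label
    return any(c <= cost and d <= delay for c, d in labels)

def pareto_maze_route(graph, sources, target):
    """Keep every non-dominated (cost, delay) pair while searching toward `target`.

    graph:   vertex -> list of (neighbor, wire_cost, wire_delay); every vertex,
             including the target, must be a key. Blocked vertices are omitted.
             Edge costs are assumed non-negative.
    sources: vertex -> (initial_cost, initial_delay), one entry per source.
    """
    labels = {v: [] for v in graph}
    heap = [(c, d, v) for v, (c, d) in sources.items()]
    heapq.heapify(heap)
    while heap:
        cost, delay, v = heapq.heappop(heap)
        # Labels leave the heap in (cost, delay) order, so an accepted label
        # can never be dominated by one popped later.
        if is_dominated((cost, delay), labels[v]):
            continue
        labels[v].append((cost, delay))
        if v == target:
            continue
        for w, wc, wd in graph[v]:
            cand = (cost + wc, delay + wd)
            if not is_dominated(cand, labels[w]):
                heapq.heappush(heap, (cand[0], cand[1], w))
    return sorted(labels[target])

grid = {"a": [("t", 1, 5), ("b", 1, 1)], "b": [("t", 1, 1)], "s2": [("t", 3, 1)], "t": []}
print(pareto_maze_route(grid, {"a": (0, 0), "s2": (0, 0)}, "t"))  # [(1, 5), (2, 2), (3, 1)]
```

The result is a cost/delay tradeoff curve at the target rather than a single path, which is what lets later steps pick the cheapest solution that still meets timing.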

Topology Embedding
Goal: Obtain a timing-feasible embedding/buffering of a given topology, minimizing cost
Solution: Dynamic programming (bottom-up)

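Solution Sets
A(u, v) represents a set of solutions that correspond to
• vertex u in the topology
• vertex v in the target graph
$A_1^b = \mathrm{Join}(A_{1.\mathrm{left}}, A_{1.\mathrm{right}})$
$A_1 = \mathrm{Gen.Dijkstra}(A_1^b)$
[Figure: topology vertex u mapped to target-graph vertex v, with solution set A(u, v)]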
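
One way to picture the Join step for a set A(u, v) is the merge below. It is a sketch that assumes each solution is a (cost, required time, load capacitance) triple, which the slides do not specify; `join`, `prune`, and `dominates` are hypothetical names, and the Gen. Dijkstra step that then propagates the joined solutions through the target graph is not shown.

```python
from itertools import product

def dominates(a, b):
    """a dominates b if it is no worse in cost, required time, and load (and differs)."""
    return a != b and a[0] <= b[0] and a[1] >= b[1] and a[2] <= b[2]

def prune(solutions):
    """Keep only the non-dominated (cost, required_time, load) triples."""
    sols = set(solutions)
    return sorted(s for s in sols if not any(dominates(o, s) for o in sols))

def join(left, right):
    """Merge the solution sets of two child subtrees that meet at the same vertex v."""
    merged = [
        # costs add, the tighter (smaller) required time wins, loads add
        (cl + cr, min(ql, qr), ll + lr)
        for (cl, ql, ll), (cr, qr, lr) in product(left, right)
    ]
    return prune(merged)

left = [(1, 10, 2), (3, 12, 1)]
right = [(2, 8, 1), (2, 7, 1)]
print(join(left, right))  # [(3, 8, 3), (5, 8, 2)]
```

Pruning after every join keeps each A(u, v) small enough for the dynamic program to remain practical.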

S-Tree
Notion of localities: spatial, temporal, polarity
Partition sinks into 2 sets based on:
• estimated timing criticality
• signal polarity requirements
• some other criteria...
Subtrees can break the topology and “stitch” at a different place

S-Tree Topology Space
Sink partition: {a, c, d}, {b}
[Figure: topologies in the S-Tree space for driver s and sinks a, b, c, d]

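S-Tree Recurrence
$A_1^b = \mathrm{Join}(A_{1.\mathrm{left}}, A_{1.\mathrm{right}})$, $A_1 = \mathrm{Gen.Dijkstra}(A_1^b)$
$A_2^b = \mathrm{Join}(A_{2.\mathrm{left}}, A_{2.\mathrm{right}})$, $A_2 = \mathrm{Gen.Dijkstra}(A_2^b)$
$A_{12}^b = \mathrm{Join}(A_{12.\mathrm{left}}, A_{12.\mathrm{right}}) + \mathrm{Join}(A_1, A_2)$, $A_{12} = \mathrm{Gen.Dijkstra}(A_{12}^b)$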
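
The recurrence for a two-class sink partition can be transcribed almost literally. This sketch assumes solutions are (cost, required time) pairs, reads the "+" in the recurrence as the union of candidate sets, and takes Gen. Dijkstra as a function argument that defaults to the identity; `s_tree_step` and the "1"/"2"/"12" keys are hypothetical. Cases where a class has sinks in only one child are not handled here.

```python
from itertools import product

def prune(sols):
    """Drop (cost, req) pairs dominated by a pair with no higher cost and no smaller req."""
    s = set(sols)
    return sorted(x for x in s if not any(y != x and y[0] <= x[0] and y[1] >= x[1] for y in s))

def join(a, b):
    """Costs add; the tighter (smaller) required time wins."""
    return prune([(ca + cb, min(qa, qb)) for (ca, qa), (cb, qb) in product(a, b)])

def s_tree_step(left, right, gen_dijkstra=lambda sols: sols):
    """One topology vertex of the S-Tree recurrence for sink classes 1 and 2.

    left/right map "1", "2", "12" to the children's solution sets
    (A.left and A.right on the slide).
    """
    a1 = gen_dijkstra(join(left["1"], right["1"]))
    a2 = gen_dijkstra(join(left["2"], right["2"]))
    # Follow the initial topology, or "stitch" the class-1 and class-2 subtrees here.
    a12_b = join(left["12"], right["12"]) + join(a1, a2)
    return {"1": a1, "2": a2, "12": gen_dijkstra(prune(a12_b))}

left = {"1": [(1, 10)], "2": [(1, 8)], "12": [(5, 9)]}
right = {"1": [(1, 9)], "2": [(2, 7)], "12": [(6, 7)]}
print(s_tree_step(left, right))  # {'1': [(2, 9)], '2': [(3, 7)], '12': [(5, 7)]}
```

In this toy input, stitching A1 and A2 yields a cheaper solution for the combined set than following the initial topology, which is the flexibility the S-Tree space is meant to capture.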

S-Tree Topology Space
[Figure: initial topology for driver s and sinks a-f, and alternative topologies in the S-Tree space]

Incorporating Polarity
4 sets:
• critical & positive signal polarity
• critical & negative
• non-critical & positive
• non-critical & negative
Other partitioning schemes...
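
A partition into these four sets might look like the following. The slack threshold used to call a sink critical is an assumption for illustration (the slides only say "estimated timing criticality"), and `partition_sinks` is a hypothetical name.

```python
def partition_sinks(sinks, slack_threshold):
    """Group sinks by (criticality, polarity).

    sinks: iterable of (name, estimated_slack, polarity), polarity being "+" or "-".
    A sink whose estimated slack is below the (hypothetical) threshold is critical.
    """
    groups = {(crit, pol): [] for crit in ("critical", "non-critical") for pol in ("+", "-")}
    for name, slack, polarity in sinks:
        crit = "critical" if slack < slack_threshold else "non-critical"
        groups[(crit, polarity)].append(name)
    return groups

sinks = [("a", -0.2, "+"), ("b", 0.5, "-"), ("c", 0.1, "+"), ("d", -0.1, "-")]
for key, names in partition_sinks(sinks, slack_threshold=0.0).items():
    print(key, names)
```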

P-Tree Topology Space
All permutation-constrained topologies
[Figure: sink order a, b, c, d, e and topologies for driver s consistent with it]
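
To get a feel for the P-Tree space, the sketch below enumerates every binary topology whose left-to-right leaf order matches a given sink permutation by splitting the permutation into contiguous intervals. It is illustrative only: the function name is hypothetical, and a practical algorithm would keep pruned solution sets per interval rather than listing trees, since the number of topologies grows as the Catalan numbers.

```python
from functools import lru_cache

def p_tree_topologies(order):
    """All binary topologies whose left-to-right leaf sequence equals `order`."""
    order = tuple(order)

    @lru_cache(maxsize=None)
    def build(i, j):
        if i == j:
            return (order[i],)  # a single sink is its own subtree
        trees = []
        for k in range(i, j):  # split the interval order[i..j] after position k
            for left in build(i, k):
                for right in build(k + 1, j):
                    trees.append((left, right))
        return tuple(trees)

    return list(build(0, len(order) - 1))

print(len(p_tree_topologies("abcde")))  # 14
print(p_tree_topologies("abc"))         # [('a', ('b', 'c')), (('a', 'b'), 'c')]
```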

Limitations of P-Tree Space
• Isolation of critical/non-critical subtrees: “temporal locality”
• Min WL may not produce min cost
[Figure: driver with critical and non-critical sinks]

SP-Tree
Combine everything said so far...
From P-Tree:
• Spatial locality
• Robustness
From S-Tree:
• Temporal locality
• Polarity locality
• Ability to fix “topology problems” by “stitching”

Solution Space
[Figure: entire space, SP-Tree, S-Tree, and fixed-topology solution spaces]

Experiments
• Randomly generated nets
  – Non-uniform required arrival times
  – Non-uniform sink input capacitance
  – Buffer-biased cost
• Interested in:
  – Min-cost feasible solution
  – Max-slack solution (for verification)
  – Runtime
More details in the paper...

Algorithms for Experiments
• S-Tree
• P-Tree
• SP-Tree
• RMP [Cong, Yuan; DAC 2000]
• RMP-Quick [Cong, Yuan; DAC 2000]

Results: Net 2-06
[Chart: # buffers; min-cost feasible and max-slack solutions]

Results: Net 2-08
[Chart: # buffers; min-cost feasible and max-slack solutions]

Results: Net 2-12
[Chart: # buffers; min-cost feasible and max-slack solutions]

SP-Tree vs. P-Tree

Conclusions
Key Concepts:
• General cost models
  – Routing congestion
  – Buffer congestion
• Orthogonal separation of spatial and temporal locality
• Polarity requirements
• Routing and buffer blockages
Targets: Small-to-medium-sized signal nets
Results Summary:
• Highly cost-efficient, high-performance solutions
• Substantially outperforms prior approaches in solution quality and runtime

Part II: Fan-in Tree Embedding/Replication

Replication Overview
• Hrkic, Lillis, Beraudo (DAC 04, IWLS 04)
• Concept: Netlist structure limits the potential of timing-driven placement
  – Difficult for top-down synthesis to fix
  – Main issue: inherently non-monotone paths
• Approach (Hrkic, Lillis; DAC 04) touches on placement, synthesis (netlist perturbation), and routing

Logic Replication
• Duplicate logic cell
  – Preserve functionality
  – Improve timing
• Place/move cells
• Adjust connections
[Figure: cells A-E, with C duplicated as replica CR]

Early Work
• Use replication to straighten I/O paths
• Local monotonicity [Beraudo, Lillis; DAC 2003]
  – Sequence of 3 cells on the path
  – Incremental framework
[Figure: cells A-E and replica CR]

Limitations of Local Monotonicity
• Local monotonicity satisfied
• Still many non-monotone paths
[Figure: path through cells A-F]

Replication Tree Approach [Hrkic et al.; DAC 04]
• Identify critical sink
• Extract critical fan-in tree (Replication Tree)
• Optimize fan-in tree (Fan-in Tree Embedding)
• Legalize placement

Slowest Paths Tree
• Focus on slowest paths
• Find slowest paths tree from critical sink
  – Include paths within epsilon of the current critical delay
• Focus on the most critical portions of the fan-in cone
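
The epsilon-slowest paths tree can be sketched as a backward traversal from the critical sink that keeps a fan-in edge only when some path through it is within epsilon of the current critical delay. The graph representation, the names, and the use of per-cell arrival times from static timing analysis are assumptions for illustration; because the result is a tree, a reconvergent cell reached by two kept paths simply appears once per path.

```python
def slowest_paths_tree(fanins, arrival, sink, eps):
    """Return the edges (u, v) of an epsilon-slowest paths tree rooted at `sink`.

    fanins:  v -> list of (u, delay from u to v)
    arrival: v -> latest arrival time (e.g. from static timing analysis)
    """
    threshold = arrival[sink] - eps
    edges = []
    stack = [(sink, 0.0)]  # (tree node, delay from that node to the sink along the tree)
    while stack:
        v, tail = stack.pop()
        for u, d in fanins.get(v, []):
            # arrival[u] is the latest arrival at u, so this is the slowest
            # path to the sink that uses edge (u, v) followed by the tree path.
            if arrival[u] + d + tail >= threshold:
                edges.append((u, v))
                stack.append((u, d + tail))
    return edges

fanins = {"s": [("g1", 1.0), ("g2", 1.0)], "g1": [("i1", 2.0)], "g2": [("i2", 0.5)]}
arrival = {"i1": 0.0, "i2": 0.0, "g1": 2.0, "g2": 0.5, "s": 3.0}
print(slowest_paths_tree(fanins, arrival, "s", eps=1.0))  # [('g1', 's'), ('i1', 'g1')]
```

Raising `eps` widens the extracted cone; with `eps=2.0` the slower branch through `g2` is included as well.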

Replication Tree
• Most circuits do not contain large fan-in trees due to reconvergence
• Given a critical tree, temporarily replicate the entire tree
• Assign connections:
  – if (u, v) is a tree edge, connect u.R to v.R
  – else connect u to v.R
[Figure: cells A-F and replicas AR, BR, CR, DR, FR]
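
The connection rule above translates directly into code. In this sketch each tree node v gets a replica named `v.R`, and only netlist edges that drive a tree node are reassigned; the function name and the string-based replica naming are hypothetical.

```python
def assign_connections(tree_edges, netlist_edges):
    """Apply the replication-tree connection rule to every edge (u, v) driving a tree node v."""
    tree = set(tree_edges)
    tree_nodes = {n for edge in tree_edges for n in edge}
    connections = []
    for u, v in netlist_edges:
        if v not in tree_nodes:
            continue  # edges into cells outside the tree are left untouched
        if (u, v) in tree:
            connections.append((f"{u}.R", f"{v}.R"))  # tree edge: replica drives replica
        else:
            connections.append((u, f"{v}.R"))         # side input: original drives replica
    return connections

tree_edges = [("A", "C"), ("C", "D")]
netlist_edges = [("A", "C"), ("B", "C"), ("C", "D"), ("E", "D"), ("C", "F")]
print(assign_connections(tree_edges, netlist_edges))
# [('A.R', 'C.R'), ('B', 'C.R'), ('C.R', 'D.R'), ('E', 'D.R')]
```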

Placement Cost
• Replication is temporary
• Placement cost is crucial
• Cost discount for placing a cell over its logical equivalent
  – low cost for placing DR over D
  – actual replication will never occur
  – multiple low-cost locations possible
[Figure: cells A-F and replicas AR, BR, CR, DR, FR]
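
A placement cost p(x, y) with the discount described above could be as simple as the function below. The three cost levels (free slot, replica over its logical equivalent, overlap with an unrelated cell), their default values, and all names are assumptions for illustration, not the values used in this work.

```python
def placement_cost(cell, slot, occupant, original_of, base=1.0, discount=0.1, overlap=10.0):
    """Hypothetical p(x, y) for placing `cell` at `slot`.

    occupant:    slot -> name of the cell currently placed there
    original_of: replica name -> name of its logical equivalent (e.g. "DR" -> "D")
    """
    here = occupant.get(slot)
    if here is None:
        return base                # free slot
    if original_of.get(cell) == here:
        return base * discount     # replica on top of its logical equivalent
    return base + overlap          # overlap with an unrelated cell

occupant = {(1, 1): "D", (2, 2): "E"}
original_of = {"DR": "D"}
print(placement_cost("DR", (1, 1), occupant, original_of))  # 0.1
print(placement_cost("DR", (2, 2), occupant, original_of))  # 11.0
```

Because the discounted location is cheap, the embedding is drawn toward placing a replica over its original, in which case the two can be unified and no actual replication occurs.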

Fan-in Tree Embedding
Given:
• Fan-in tree
• Placement of sink and inputs
• Arrival times at inputs
• Placement and routing graph
Find:
• Placement of internal tree nodes (gates)
• Minimizing cost s.t. timing constraints
• Cost/delay tradeoff

Fan-in Tree Embedding Example
[Figure: two embeddings of a fan-in tree with gates A, B, C driving a sink: higher delay, lower cost vs. lower delay, higher cost]

Fan-out and Fan-in Tree
[Figure: fan-out tree from a source, processed bottom-up, vs. fan-in tree into a sink, processed top-down; each with nodes A, B, C]

Fan-in Tree Embedding
Adaptation of the S-Tree algorithm [Hrkic, Lillis; DAC 2002]
Keep:
• Graph model for embedding target
• Modified timing-driven maze routing [S. Hur, J. Lillis; IEEE TCAD 2000]
  – multiple sources, multiple targets
  – at each vertex keep a list of non-dominated solutions
Modify:
• Top-down vs. bottom-up
• Solution signature (c, t): c = cost, t = signal arrival time
• Gate placement cost p(x, y)

Fan-in Tree Embedding
Non-binary tree: multiple gate inputs
Top-down dynamic programming:
• Maze routing to populate solutions (deferred backtracking)
• Join solutions: $c = p_{x,y} + c_1 + \dots + c_n$, $t = \max(t_1, \dots, t_n)$
• Modified maze routing
Bottom-up solution extraction:
• Backtrack to extract the maze route
• Extract the gate placement
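
The Join rule, c = p(x, y) + c1 + ... + cn and t = max(t1, ..., tn), can be sketched for one candidate gate location as follows. Solutions are (cost, arrival time) pairs as on the slide; `join_at` is a hypothetical name, gate delay is left out because the slide's formula does not include it, and the exhaustive product over inputs is for clarity only (it grows exponentially with the number of inputs).

```python
from itertools import product

def join_at(p_xy, input_solutions):
    """Join the solution lists of a gate's n inputs at one candidate location (x, y).

    p_xy:            placement cost of the gate at this location
    input_solutions: one list of (cost, arrival_time) pairs per gate input
    Returns the non-dominated (cost, arrival_time) pairs for the gate.
    """
    candidates = set()
    for combo in product(*input_solutions):
        c = p_xy + sum(ci for ci, _ in combo)  # c = p(x, y) + c_1 + ... + c_n
        t = max(ti for _, ti in combo)         # t = max(t_1, ..., t_n)
        candidates.add((c, t))
    # Lower cost and earlier arrival are both better.
    return sorted(
        x for x in candidates
        if not any(y != x and y[0] <= x[0] and y[1] <= x[1] for y in candidates)
    )

print(join_at(2, [[(1, 5), (3, 2)], [(1, 4), (2, 3)]]))  # [(4, 5), (6, 4), (7, 3)]
```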

Aside: Legalization
Use the gain-graph approach [Hur, Lillis; ICCAD 00], modified to incorporate timing information

Optimization Flow
• Identify critical sink (static timing analysis)
• Extract fan-in tree
  – Replication Tree
  – epsilon-Slowest Paths Tree
• Embed fan-in tree
• Decide which cells to replicate/unify
• Legalize placement
• Repeat while there is improvement

Enhancements
• Post-process unification
  – some cells are placed close to their logical equivalents
  – no automatic unification
  – if one of the paths is non-critical, it is possible to unify without degrading performance
• Unification in legalizer
  – during a ripple move, a cell may be placed on top of its replica
  – unify them and stop legalization
• epsilon-Slowest Paths Tree
  – no randomization
  – dynamically modify the value of epsilon to enlarge the fan-in cone

Experiments
Algorithms:
• Timing-Driven VPR (Versatile Place and Route) [http://www.eecg.toronto.edu/~vaughn/vpr.html]
• Local Replication [Beraudo, Lillis; DAC-03]
• RT-Embedding
20 MCNC benchmark circuits
Interested in:
• Critical delay
• Amount of replication
• Wire usage
Tests performed in the FPGA domain
Promising results

Experimental Setup
• Obtain a valid placement with the Timing-Driven VPR placer
• Local Replication / Tree Embedding
• Route and evaluate with the Timing-Driven VPR router

Average values over all 20 circuits, normalized to VPR

              critical path delay      wire length    blocks
              Winf     Wlow-stress
Local Repl    0.925    0.927           1.020          1.003
RT-Embed      0.858    0.869           1.084          1.004

• Delay improved for all circuits
• Best improvement for circuit pdc: 0.641
• Runtime penalty under 5% on the VPR flow

Replication Statistics
Circuit ex1010: 38 replications, 12 unifications

Ongoing Work
• Generalize to ASICs
• Include simultaneous buffering
• Mitigation of legalization noise
  – Preventing (some) overlaps in embedding
  – More sophisticated placement cost
• Reconvergence: arborescence approach
• Simultaneous technology (re-)mapping
  – Explore multiple tree topologies simultaneously (Universal Tree solver engine: U-Tree)

Review
• Trees are everywhere! Even in places where they seem to be absent
• Tree-based algorithms can be very strong in generality of formulation and predictability
  – Enable connection to general placement/routing targets
  – Can capture tradeoffs between complex objectives
  – Can sometimes be applied to drive optimization of graph structures
References: http://cs.uic.edu/~jlillis/pubs.html
S/P/SP-tree executables: http://eda.cs.uic.edu/software.html

Thank you

Timing-Driven Placement Legalization
• After embedding, cells could overlap in the placement
• Moving cells on the critical path may harm timing
• Ripple-move strategy [Hur, Lillis; ICCAD 2000]
  – Modified to include both timing and wiring information
[Figure: overlapped and empty slots]

Timing-Driven Placement Legalization
• Identify overlap
• Identify up to 4 closest empty slots (one in each quadrant)
• Construct gain graph
  – monotone paths from congested to free slots
  – edges: gain of moving a cell to a neighboring slot (wire and timing gain)
• Find the max-gain path and perform the ripple move
  – gain could be negative
[Figure: overlapped and empty slots]
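
Finding the max-gain ripple path can be sketched as a dynamic program over monotone grid paths: every step moves one slot closer to the chosen empty slot, so the candidate moves form a DAG. The slot coordinates, the `gain(a, b)` callback (standing in for the combined wire and timing gain of moving the cell in slot a into slot b), and both function names are assumptions for illustration.

```python
def max_gain_path(src, dst, gain):
    """Best monotone ripple path from the overlapped slot `src` to the empty slot `dst`."""
    sx = 1 if dst[0] >= src[0] else -1
    sy = 1 if dst[1] >= src[1] else -1
    nx, ny = abs(dst[0] - src[0]), abs(dst[1] - src[1])
    best = {}  # slot -> (total gain, path from src to slot)
    for i in range(nx + 1):
        for j in range(ny + 1):
            cur = (src[0] + sx * i, src[1] + sy * j)
            if i == 0 and j == 0:
                best[cur] = (0.0, [cur])
                continue
            options = []
            if i > 0:  # arrive from the x-predecessor
                prev = (cur[0] - sx, cur[1])
                g, path = best[prev]
                options.append((g + gain(prev, cur), path + [cur]))
            if j > 0:  # arrive from the y-predecessor
                prev = (cur[0], cur[1] - sy)
                g, path = best[prev]
                options.append((g + gain(prev, cur), path + [cur]))
            best[cur] = max(options, key=lambda o: o[0])
    return best[dst]

def best_ripple_move(overlap_slot, empty_slots, gain):
    """Pick the best path over the (up to 4) candidate empty slots; gain may be negative."""
    return max((max_gain_path(overlap_slot, e, gain) for e in empty_slots), key=lambda r: r[0])

gain = lambda a, b: 1.0 if b == (0, 1) else -0.5
print(best_ripple_move((0, 0), [(1, 1), (-1, 0)], gain))  # (0.5, [(0, 0), (0, 1), (1, 1)])
```

Every slot on the chosen path passes its cell one step along, so the overlap is resolved even when the best available total gain is negative.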

Review
