Min-Register Retiming Under Simultaneous Timing and Initial State Constraints
- Slides: 39
Min-Register Retiming Under Simultaneous Timing and Initial State Constraints
Aaron Hurst
Dec. 2007
Introduction
- Retiming is the structural relocation of registers such that output functionality is preserved
- A transformation with many means and many ends
  - Minimizing worst-case delay
  - Minimizing the number of registers
  - Either of the above under constraints
  - Optimally or heuristically
  - Other…?
- In an industrial setting?
  1. Diminishing returns in combinational optimization
  2. Coming of age of sequential equivalence checking
Motivation
- Register minimization is uniquely valuable
  - Area and power reduction
  - Clock network: dynamic power, design effort
  - Testability: scan chain depth
  - Verification: state representation
- Must satisfy several constraints
  - Timing, initializability, congestion, electrical...
- Constrained minimum-register retiming is hard
  - Current solutions are not scalable
Outline
1. Core problem: unconstrained register minimization
2. Constraints: Timing
3. Constraints: Initializability
4. Other constraints
Flow-Based Register Minimization: A New Approach to Unconstrained Register Minimization
Background
- Register minimization is an “original” problem in retiming
  - r(v): retiming lag
  - w_i(u, v): initial register count
- An instance of minimum-cost network flow [Goldberg 97]
- But we can do better...
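The formula on this slide did not survive extraction. For orientation, the classical linear-programming statement of unconstrained register minimization can be written with the two symbols defined above. The sign convention for the retimed register count w_r and the edge set E are assumptions taken from the standard retiming literature, not from the slide itself:

```latex
\begin{align*}
  w_r(u,v) &= w_i(u,v) + r(v) - r(u) \\
  \min_{r}\ & \sum_{(u,v) \in E} w_r(u,v)
  \quad \text{subject to} \quad w_r(u,v) \ge 0 \quad \forall (u,v) \in E
\end{align*}
```

The dual of this program is a minimum-cost flow problem, which is the connection the slide refers to.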
Orientation
- Consider one combinational frame of the circuit
- A single directed acyclic graph of combinational logic
  - Nodes: logic gates
  - Edges: pair-wise net connections
  - Inputs: register outputs, primary inputs
  - Outputs: register inputs, primary outputs
Cuts in a Frame
- Consider the circuit without primary IOs and their transitive fan-in/out
  - Retiming = a complete cut of the DAG
  - Number of registers = cut width
- Problem consists of finding the minimum cut
Max-Flow Formulation
- Min-cut/max-flow duality
  - Edges in the graph are assigned a capacity
  - Min-cut width = max flow through the graph
  - Min-cut is derived from the residual flow
- Partition the graph into {S, R} by source reachability
  - S = nodes with an augmenting path from source s
  - R = nodes with no augmenting path from source s
- Min-cut is not unique
  - Selects the one with minimum movement of registers
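The partition by source reachability can be sketched in a few lines. This is a minimal illustration using a textbook augmenting-path (Edmonds-Karp style) solver, not the solver used in the talk; the function name `max_flow_min_cut`, the toy frame, and its unit capacities are all assumptions made for the example.

```python
from collections import defaultdict, deque

def max_flow_min_cut(edges, source, sink):
    """Return (flow_value, S), where S is the set of nodes still reachable
    from `source` in the residual graph. `edges` is a list of (u, v, capacity)."""
    residual = defaultdict(lambda: defaultdict(int))
    for u, v, cap in edges:
        residual[u][v] += cap
        residual[v][u] += 0  # make sure the reverse residual arc exists
    flow = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            # No augmenting path left: the visited nodes form the source side S.
            return flow, set(parent)
        # Walk back from the sink, then push the bottleneck along the path.
        path = []
        v = sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= push
            residual[v][u] += push
        flow += push

# Toy frame (hypothetical): two register outputs a and b feed gate c.
frame = [("s", "a", 1), ("s", "b", 1), ("a", "c", 1), ("b", "c", 1), ("c", "t", 1)]
flow, source_side = max_flow_min_cut(frame, "s", "t")
cut_edges = [(u, v) for u, v, _ in frame if u in source_side and v not in source_side]
```

In this toy frame the only S to R edge is the output of gate c, so two registers on the gate inputs are replaced by one on its output.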
Constraint Type #1
- What is the effect of unconstrained edges u → v?
  - Never saturated; always present in the residual graph
  - Destination node v is always reachable from source node u
  - The minimum cut will never lie between u and v
- A useful tool for constraining the solution...
A Necessary Modification
- Min-cut guarantees every path will be cut at least once
- Retiming requires that every path is cut exactly once
- Observation: to be cut more than once, a path must cross back over the cut from R → S
- Solution: use unconstrained flow to prevent reverse edges
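A small experiment can illustrate why the reverse arcs matter. The sketch below assumes that "unconstrained" means infinite capacity and that each original edge u → v receives an extra arc v → u of infinite capacity; the graph, its capacities, and the helper names are hypothetical and chosen only so that the plain min-cut crosses one path twice.

```python
from collections import defaultdict, deque

INF = float("inf")

def min_cut_source_side(edges, source, sink):
    """Textbook augmenting-path max-flow; returns (flow, source-reachable set)."""
    residual = defaultdict(lambda: defaultdict(int))
    for u, v, cap in edges:
        residual[u][v] += cap
        residual[v][u] += 0
    flow = 0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow, set(parent)
        path = []
        v = sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= push
            residual[v][u] += push
        flow += push

def crossings(path, source_side):
    """Count how many times a node path crosses the cut from S to R."""
    return sum(1 for u, v in zip(path, path[1:])
               if u in source_side and v not in source_side)

# Hypothetical frame; capacities are arbitrary illustrative numbers.
frame = [("s", "a", 3), ("a", "b", 1), ("b", "c", 3),
         ("b", "t", 3), ("c", "t", 1), ("s", "c", 3)]
path = ["s", "a", "b", "c", "t"]

# Plain min-cut: the path leaves S, re-enters it via b -> c, and leaves again.
plain_flow, plain_side = min_cut_source_side(frame, "s", "t")

# Add an infinite-capacity reverse arc for every edge (assumed construction).
constrained = frame + [(v, u, INF) for u, v, _ in frame]
fixed_flow, fixed_side = min_cut_source_side(constrained, "s", "t")

plain_crossings = crossings(path, plain_side)
fixed_crossings = crossings(path, fixed_side)
```

With the reverse arcs in place, any edge running from R back into S would contribute infinite capacity to the cut, so the solver settles on a wider cut that every path crosses exactly once.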
Fanout Sharing
- Nets were decomposed into flow arcs
  - False model of register count
  - Reality: one register per net / hyper-edge
  - “Fanout sharing”
- Introduce a structure to simulate fanout sharing
Single Iteration
- What is the final flow graph?
  1. Reverse edges
  2. Fanout sharing
- Unitary flow simplification
  - Binary marking scheme
  - Flow computed on the original netlist
Multiple Frames
- Globally minimum solution requires moving registers beyond one frame
  - Corresponding min-cut may stretch across multiple combinational frames
- Solution: repeat over a single frame
  - Terminate when no further change
- Then, consider the backward direction
- Final result is provably optimal
Overall Algorithm
- Flowchart: Start → forward retiming (block fan-out cone of PIs; compute max-flow; while there is improvement, implement the min-cut and repeat) → backward retiming (block fan-in cone of POs; compute max-flow; while there is improvement, implement the min-cut and repeat) → Done
- Forward retiming is preferred due to initial state computation
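The control flow of the flowchart can be sketched as a pair of nested loops. This is only an outline under stated assumptions: `retime_frame` and `register_count` are hypothetical callables standing in for "compute max-flow and implement the min-cut on one frame" and "count registers", and the toy stand-ins at the bottom exist purely to exercise the loop.

```python
def minimize_registers(circuit, register_count, retime_frame):
    """Repeat single-frame retiming forward, then backward, until no direction
    reduces the register count any further."""
    for direction in ("forward", "backward"):
        while True:
            candidate = retime_frame(circuit, direction)
            if register_count(candidate) >= register_count(circuit):
                break  # no improvement: stop iterating in this direction
            circuit = candidate  # accept the improved cut and try again
    return circuit

# Toy stand-ins (hypothetical): a "circuit" is just its register count.
def toy_retime(count, direction):
    if direction == "forward":
        return max(count - 2, 5)  # forward moves cannot go below 5 registers
    return max(count - 1, 3)      # backward moves cannot go below 3 registers

result = minimize_registers(10, lambda count: count, toy_retime)
```

Because a candidate is accepted only when it strictly reduces the count, the register count decreases monotonically and the loop can be stopped early with a valid intermediate result.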
Asymptotic Analysis
- Single-iteration runtime is limited by the maximum-flow solver [Goldberg 95]
- Or, using unitary flow simplification…
- Total number of iterations is bounded by |R|
Experimental Results
- Applied to {ISCAS, ITC, OpenCores, Altera} benchmarks...
- [Chart: Register Savings per Iteration]
- The number of iterations is quite small
- Register count is monotonically decreasing
  - Runtime can be bounded
- Runtime is 5x faster than the minimum-cost formulation
  - <0.01 s for 70% of benchmarks
Summary
Key points:
1. Optimal
2. Minimum register movement
3. Fast... both absolutely and relatively
4. Scalable: early termination with improvement
Timing Constraints
Background
- Timing constraints make the problem much harder
  - D(u, v): path delay
  - W(u, v): path register count
- Complexity: pair-wise path delay constraints
  - Enumeration alone is O(n^3)
- Simplification: prune unnecessary constraints
  - Minaret: use skew-equivalence to find ASAP and ALAP register positions [Sapatnekar 99]
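The constraint formula on this slide was lost in extraction. In the classical retiming formulation, the pair-wise constraints for a target clock period T take the form below. The symbol T, the retiming lag r, and the sign convention (retimed register count on an edge equal to the initial count plus r(v) minus r(u)) are assumptions drawn from the standard literature rather than from the slide:

```latex
\[
  r(u) - r(v) \;\le\; W(u,v) - 1
  \qquad \text{for all node pairs } (u,v) \text{ with } D(u,v) > T
\]
```

Here W(u, v) is the minimum register count over all paths from u to v and D(u, v) is the largest delay among those minimum-count paths. One such constraint may exist for every pair of nodes, and computing W and D for all pairs is the source of the cubic enumeration cost.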
Conservative Constraints
- Consider retiming a register
  - Two timing constraints are made potentially critical in each direction (max arrival, min arrival)
- If the other end of the timing constraints is fixed...
  - Bound on the absolute positions of the register
  - All such constraints can be computed with two-pass STA
  - Linear time
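The linear-time, two-pass flavor of this computation can be illustrated with a plain longest-path propagation over one frame. This is a generic sketch, not the exact bounds from the talk: the function name, the toy netlist, the unit delays, and the `budget` threshold used to flag nodes are all assumptions.

```python
from graphlib import TopologicalSorter

def two_pass_delays(preds, delay):
    """preds maps each node to its fan-in nodes; delay maps each node to its
    gate delay. Returns (arrival, departure): the longest delay from any frame
    input up to and including a node, and from a node to any frame output."""
    order = list(TopologicalSorter(preds).static_order())
    # Forward pass: longest delay arriving at each node.
    arrival = {}
    for v in order:
        arrival[v] = delay[v] + max((arrival[u] for u in preds.get(v, ())), default=0)
    # Build fan-out lists, then run the backward pass.
    succs = {v: [] for v in order}
    for v, fanin in preds.items():
        for u in fanin:
            succs[u].append(v)
    departure = {}
    for v in reversed(order):
        departure[v] = delay[v] + max((departure[w] for w in succs[v]), default=0)
    return arrival, departure

# Toy frame (hypothetical): g1 feeds g2 and g4, which both feed g3; unit delays.
preds = {"g2": ["g1"], "g4": ["g1"], "g3": ["g2", "g4"]}
delay = {"g1": 1, "g2": 1, "g3": 1, "g4": 1}
arrival, departure = two_pass_delays(preds, delay)

# Assumed use: with the far end of each path fixed, nodes whose arrival exceeds
# the available delay budget cannot be crossed and are dropped from the problem.
budget = 2
blocked = {v for v in arrival if arrival[v] > budget}
```

Each node and edge is visited once per pass, which is where the linear runtime comes from.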
Exact Constraints
- Fixing the other end of a timing path is conservative
  - It may also move in the same direction, relaxing the constraint
- If the other end of the timing constraint is not fixed...
  - Conditional constraints: “Can retime R2 past v2 only if R1 is retimed past v1”
  - [Figure: registers R1, R2 and nodes v1, v2; unit delay, max delay 3]
- Computed from the edge of a bounded transitive fan-in/out cone
  - delay
  - register count
Constraint Implementation
- Conservative constraints:
  - Indicate nodes to be removed from the problem
- Exact constraints:
  - Implemented as unconstrained edges (between v1 and v2 in the figure)
  - Cut can only move beyond v2 if it moves beyond v1
Refinement
- All timing constraints are met by the initial circuit
  - Guarantees flow from source to sink remains finite
- Iteratively tighten conservative constraints into exact ones
- Simplification: only constraints limiting area improvement
- Flowchart: initialize C_cons (= all) and C_exact; compute the cut with C_cons; compute the cut without C_cons; if any conservative constraints lie between the two cuts, convert them into exact constraints and repeat; otherwise stop
Building Intuition
- [Figure: unit delay, max delay 3; cuts shown for no constraint, conservative constraint, exact constraint, and the minimum under exact + conservative constraints]
- Constraints impose relations across multiple clock cycles
Building Intuition
- [Figure: unit delay, max delay 2; cuts shown for no constraint, conservative constraint, exact constraint, and the minimum under exact + conservative constraints]
- Cycles in constraints lock retiming moves to be ‘in-step’
Experimental Results
- Max path delay ≤ initial period, min path delay ≥ 0
- Average number of exact constraints = 1.1% of design size
Summary
Key points:
1. Inherits benefits of flow-based retiming
   1. Optimal
   2. Fast
   3. Monotonic improvement
2. Problem reduced using both timing and area criticality
3. Advanced timing model and constraints
4. Scalable: intermediate solutions are timing-feasible
Initializability Constraints
Problem
- Retimed circuit must preserve initialization behavior
- Accomplished by:
  1. Additional combinational logic
  2. Identifying an equivalent initial state
     - Forward retiming: simulation
     - Backward retiming: SAT
- Backward retiming jeopardizes initializability
Background
- How to transform an uninitializable circuit into an initializable one?
  - ‘Prayer’: minimizing register movement maximizes initializability [Pan 99]
  - ‘Slash and Burn’: incrementally tighten bound on negative retiming lag [Stok 95]
  - ‘Brute Force’: mixed-integer linear program [Sapatnekar 97]
Feasibility Constraints
- SAT problem with variables
- Feasibility constraint:
  - Partial cut of the unrolled circuit
  - Sufficient to imply infeasibility
  - At least one element must be removed
- UNSAT with only variables in TFO( )
- Built incrementally as retiming progresses
- Variables switched off in SAT with additional per-clause variables z
Feasibility Constraints
- Topologically order the circuit
  - Fast: binary search on v (incremental SAT)
  - Faster: UNSAT core can be used to localize the conflict
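The binary search over the topological order can be sketched as follows. The sketch assumes monotonicity (once a prefix of the order is UNSAT, every longer prefix is UNSAT too) and treats the SAT solver as a hypothetical oracle `is_sat(k)` that checks the first k nodes; neither name comes from the talk.

```python
def first_unsat_prefix(num_nodes, is_sat):
    """Return the smallest k in 1..num_nodes such that the first k nodes of the
    topological order are UNSAT, or None if the whole order is satisfiable."""
    if is_sat(num_nodes):
        return None
    low, high = 1, num_nodes  # invariant: prefix `high` is known to be UNSAT
    while low < high:
        mid = (low + high) // 2
        if is_sat(mid):
            low = mid + 1     # conflict lies beyond the first `mid` nodes
        else:
            high = mid        # conflict already present within `mid` nodes
    return low

# Toy oracle (hypothetical): the conflict first appears at the 5th node.
calls = []
def toy_oracle(k):
    calls.append(k)
    return k < 5

boundary = first_unsat_prefix(8, toy_oracle)
```

The search needs a logarithmic number of solver calls in the number of nodes, and an incremental solver can reuse learned clauses between them.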
Constraint Type #2: Penalty Structure
- Adds exactly one unit flow path
  - Delayed insertion until the constraint comes into frame
- Biases against cuts below the feasibility constraint
  - Closest cut of width +1 returned
  - Cut is squeezed forward
- New cut of width +1 is closest and therefore most initializable
  - If it isn’t... additional penalty needed
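Algorithm
- Flowchart: initialize C_feas; compute the min-cut constrained by C_feas; if the result is initializable, stop; otherwise locate a new feasibility constraint by binary search on v (SAT checks), update C_feas, and repeat
- Complexity: already NP-complete from the test for initializability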
Experimental Results
- Equivalent initial state found after min-register retiming for most designs
  - Only one design was not initializable: s400
  - Can be easily lost if backward retiming is invoked multiple times
- A harder problem: randomized initial states

Name          Orig. Gates  Orig. Regs  Min-Reg Regs (Infeasible)  Feasible  Avg. | |  Runtime
s400          0.3k         21          18                         +1        8.0       0.08 s
oc_aes_core   16.6k        402         395                        +3        2.0       2.55 s
oc_vga_lcd    17.1k        1108        1087                       +1                  1.09 s
nut_003       6.6k         484         450                        +3        1.0       1.41 s
radar12       71.1k        3875        3771                       +27       2.3       108.3 s
oc_wb_dma     29.2k        1775        1757                       +2        3.5       5.70 s
oc_minirisc   3.9k         289         271                        +2        1.0       0.49 s
Summary
Key points:
1. Optimal
2. Compatible with timing constraints
3. Worst-case bound non-polynomial, but fast in practice
Additional Applications
- Physical constraints
  - Placement congestion: penalty structures
- Electrical constraints
  - Capacitive load on clock network drivers: penalty structures
- Others?
Contribution
- New formulation of the register minimization problem
- Constraints of different forms can be added to the problem
  1. Timing
  2. Initializability
  3. Other
- Improves upon best practices within each sub-problem
- Unified solution to synthesis-ready retiming
- Scalable