Retiming and Resynthesis

  • Slides: 39
Outline:
• Retiming and Resynthesis (RnR)
• Resynthesis of Pipelines

Optimizing Sequential Circuits by Retiming
Netlist of gates and registers:
[Figure: netlist of gates and registers between inputs and outputs]
Various goals:
– Reduce clock cycle time
– Reduce area
  • Reduce number of latches

Retiming
Problem
– Pure combinational optimization can be myopic, since relations across register boundaries are disregarded
Solutions
– Retiming: move register(s) so that
  • the clock cycle decreases, or the number of registers decreases, and
  • input-output behavior is preserved
– RnR: combine retiming with combinational optimization techniques
  • Move latches out of the way temporarily
  • Optimize larger blocks of combinational logic

Circuit Representation
[Leiserson, Rose and Saxe (1983)]
Circuit representation: G(V, E, d, w)
– V: set of gates
– E: set of wires
– d(v) = delay of gate/vertex v, (d(v) ≥ 0)
– w(e) = number of registers on edge e, (w(e) ≥ 0)

Circuit Representation
Example: Correlator
δ(x, y) = 1 if x = y, 0 otherwise
[Figure: (a) correlator circuit with host, comparators δ and adder +; (b) directed graph with vertex delays and edge register counts]
Every cycle in the graph has at least one register, i.e. there are no combinational loops.
Operation delays: δ = 3, + = 7

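Preliminaries
For a path p = u0 → u1 → ... → uk:
– d(p) = d(u0) + d(u1) + ... + d(uk), the path delay
– w(p) = sum of w(e) over the edges e of p, the number of registers on the path
Clock cycle: c = max { d(p) : w(p) = 0 }
[Figure: correlator graph; path with w(p) = 0]
For the correlator, c = 13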
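
The clock cycle definition can be checked on the correlator with a short Python sketch. The names `DELAY`, `EDGES` and `clock_cycle` are illustrative, and the edge list is an assumed reading of the slide's figure (host v0 with delay 0, comparators v1 and v2 with delay 3, adder v3 with delay 7).

```python
from functools import lru_cache

# Assumed reading of the correlator graph: vertex delays and register counts.
DELAY = {"v0": 0, "v1": 3, "v2": 3, "v3": 7}
EDGES = {("v0", "v1"): 2, ("v1", "v2"): 0, ("v1", "v3"): 0,
         ("v2", "v3"): 0, ("v3", "v0"): 0}

def clock_cycle(delay, edges):
    """Largest d(p) over all paths p with w(p) = 0."""
    zero_succ = {v: [] for v in delay}
    for (u, v), w in edges.items():
        if w == 0:
            zero_succ[u].append(v)

    @lru_cache(maxsize=None)
    def longest_from(v):
        # Terminates because every cycle carries at least one register.
        return delay[v] + max((longest_from(s) for s in zero_succ[v]), default=0)

    return max(longest_from(v) for v in delay)

print(clock_cycle(DELAY, EDGES))  # 13, matching the slide
```

The critical path here is v1 → v2 → v3 → v0 with delay 3 + 3 + 7 + 0.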

Basic Operation
• Movement of registers from input to output of a gate or vice versa
[Figure: a gate retimed by -1 and by 1]
• Does not affect gate functionalities
• A mathematical definition: retardation
  – r: V → Z, an integer vertex labeling
  – w_r(e) = w(e) + r(v) - r(u) for edge e = (u, v)

Basic Operation
Thus in the example, r(u) = -1, r(v) = -1 results in
[Figure: correlator graph before and after retiming vertices u and v by -1]
• For a path p: s → t, w_r(p) = w(p) + r(t) - r(s)
• Retardation
  – r: V → Z, an integer vertex labeling
  – w_r(e) = w(e) + r(v) - r(u) for edge e = (u, v)
  – A retiming r is legal if w_r(e) ≥ 0 for all e ∈ E
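
As a small illustration of the retardation formula, the sketch below applies a vertex labeling r to every edge and checks legality. The function names and the edge list (an assumed reading of the correlator figure, with u = v1 and v = v2) are illustrative only.

```python
# Assumed correlator edges: (tail, head) -> number of registers.
EDGES = {("v0", "v1"): 2, ("v1", "v2"): 0, ("v1", "v3"): 0,
         ("v2", "v3"): 0, ("v3", "v0"): 0}

def retime(edges, r):
    """Return w_r(e) = w(e) + r(v) - r(u) for every edge e = (u, v)."""
    return {(u, v): w + r[v] - r[u] for (u, v), w in edges.items()}

def is_legal(retimed):
    """A retiming is legal if no edge ends up with a negative register count."""
    return all(w >= 0 for w in retimed.values())

r = {"v0": 0, "v1": -1, "v2": -1, "v3": 0}
retimed = retime(EDGES, r)
print(retimed, is_legal(retimed))
```

Retiming only v1 by +1 would instead leave edge (v1, v2) with -1 registers, so that labeling is not legal.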

Retiming for minimum clock cycle
Problem statement (minimum cycle time):
Given G(V, E, d, w), find a legal retiming r so that the clock cycle c = max { d(p) : w_r(p) = 0 } is minimized
Retiming: 2 important matrices
• Register weight matrix
• Delay matrix

Retiming for minimum clock cycle
W = register path weight matrix (minimum # latches over all paths from u to v)
D = path delay matrix (maximum delay over the paths from u to v that have W(u, v) latches)
[Figure: correlator graph on v0, v1, v2, v3]

W     v0   v1   v2   v3
v0     0    2    2    2
v1     0    0    0    0
v2     0    2    0    0
v3     0    2    2    0

D     v0   v1   v2   v3
v0     0    3    6   13
v1    13    3    6   13
v2    10   13    3   10
v3     7   10   13    7

c ≤ φ ⇔ for all paths p, if d(p) > φ then w(p) ≥ 1
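
One way to compute both matrices at once is an all-pairs shortest-path pass over pairs (registers, -delay) compared lexicographically, in the style of Leiserson and Saxe. The sketch below is a minimal version under that assumption; `wd_matrices`, `DELAY` and `EDGES` are illustrative names, and the graph is an assumed reading of the correlator figure.

```python
from itertools import product

DELAY = {"v0": 0, "v1": 3, "v2": 3, "v3": 7}
EDGES = {("v0", "v1"): 2, ("v1", "v2"): 0, ("v1", "v3"): 0,
         ("v2", "v3"): 0, ("v3", "v0"): 0}

def wd_matrices(delay, edges):
    V = list(delay)
    inf = float("inf")
    # best[u, v] = (registers, -delay of all vertices on the path except v).
    # The lexicographic minimum means: fewest registers, then largest delay.
    best = {(u, v): (inf, 0) for u in V for v in V}
    for v in V:
        best[v, v] = (0, 0)
    for (u, v), w in edges.items():
        best[u, v] = min(best[u, v], (w, -delay[u]))
    # Floyd-Warshall; product() varies its first component slowest,
    # so k is the outermost loop.
    for k, i, j in product(V, repeat=3):
        (w1, d1), (w2, d2) = best[i, k], best[k, j]
        if w1 < inf and w2 < inf:
            best[i, j] = min(best[i, j], (w1 + w2, d1 + d2))
    W = {uv: w for uv, (w, _) in best.items()}
    D = {(u, v): delay[v] - d for (u, v), (_, d) in best.items()}
    return W, D

W, D = wd_matrices(DELAY, EDGES)
print(W["v0", "v3"], D["v0", "v3"])  # 2 13
```

The pass is safe here because every cycle carries at least one register, so no cycle is "negative" in the lexicographic order.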

Conditions for Retiming
Assume that we are asked to check whether a retiming exists for a clock cycle φ.
Legal retiming: w_r(e) ≥ 0 for all e. Hence
  w_r(e) = w(e) + r(v) - r(u) ≥ 0, or r(u) - r(v) ≤ w(e)
For all paths p: u → v such that d(p) > φ, we require w_r(p) ≥ 1
– Thus w_r(p) = w(p) + r(v) - r(u) ≥ 1, i.e. r(u) - r(v) ≤ w(p) - 1
– Or take the least w(p) (tightest constraint): r(u) - r(v) ≤ W(u, v) - 1
Note: this is independent of the path from u to v, so we only need to apply it to pairs u, v such that D(u, v) > φ

Solving the constraints
• All constraints are in difference-of-two-variables form
• Related to the shortest path problem
Correlator: φ = 7
Legal: r(u) - r(v) ≤ w(e)
D > 7: r(u) - r(v) ≤ W(u, v) - 1
[Figure: correlator graph with its W and D matrices]

Solving the constraints
• Do shortest path on the constraint graph: O(|V|^3)
• A solution exists if and only if there is no negative-weight cycle
Legal: r(u) - r(v) ≤ w(e)
D > 7: r(u) - r(v) ≤ W(u, v) - 1
[Figure: constraint graph on r(0), r(1), r(2), r(3)]
A solution is r(v0) = r(v3) = 0, r(v1) = r(v2) = -1
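
The feasibility check can be sketched as Bellman-Ford relaxation over the difference constraints. This is an illustrative version, not the slide author's code: vertices are indexed 0 to 3, the W and D values are the correlator matrices, and the edge list is an assumed reading of the figure.

```python
W = [[0, 2, 2, 2], [0, 0, 0, 0], [0, 2, 0, 0], [0, 2, 2, 0]]
D = [[0, 3, 6, 13], [13, 3, 6, 13], [10, 13, 3, 10], [7, 10, 13, 7]]
EDGES = {(0, 1): 2, (1, 2): 0, (1, 3): 0, (2, 3): 0, (3, 0): 0}

def feasible_retiming(edges, W, D, phi):
    """Return a retiming r for clock cycle phi, or None if none exists."""
    n = len(W)
    # Each triple (u, v, b) encodes the constraint r[u] - r[v] <= b.
    cons = [(u, v, w) for (u, v), w in edges.items()]
    cons += [(u, v, W[u][v] - 1)
             for u in range(n) for v in range(n) if D[u][v] > phi]
    r = [0] * n  # all zeros plays the role of a virtual source
    for _ in range(n):
        changed = False
        for u, v, b in cons:
            if r[v] + b < r[u]:
                r[u] = r[v] + b
                changed = True
        if not changed:
            return r
    return None  # still relaxing after n passes: negative-weight cycle

print(feasible_retiming(EDGES, W, D, 7))  # [0, -1, -1, 0]
print(feasible_retiming(EDGES, W, D, 6))  # None
```

For φ = 6 the pair (v3, v3) has D = 7 > 6, which yields the unsatisfiable constraint 0 ≤ -1, so no retiming exists.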

Retiming
To find the minimum cycle time, do a binary search among the entries of the D matrix: O(|V|^3 log |V|)
[Figure: correlator graph with its W and D matrices]
Retimed correlator:
[Figure: (a) original correlator, clock cycle = 3 + 3 + 7 = 13; (b) retimed correlator, clock cycle = 7]
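
The binary search can be sketched as follows. The feasibility test is a compact Bellman-Ford check over the difference constraints; `has_retiming` and `min_cycle_time` are illustrative names, and the matrices and edge list are the assumed correlator data.

```python
W = [[0, 2, 2, 2], [0, 0, 0, 0], [0, 2, 0, 0], [0, 2, 2, 0]]
D = [[0, 3, 6, 13], [13, 3, 6, 13], [10, 13, 3, 10], [7, 10, 13, 7]]
EDGES = {(0, 1): 2, (1, 2): 0, (1, 3): 0, (2, 3): 0, (3, 0): 0}

def has_retiming(phi):
    """True if the difference constraints for clock cycle phi are satisfiable."""
    n = len(W)
    cons = [(u, v, w) for (u, v), w in EDGES.items()]
    cons += [(u, v, W[u][v] - 1)
             for u in range(n) for v in range(n) if D[u][v] > phi]
    r = [0] * n
    for _ in range(n):
        changed = False
        for u, v, b in cons:
            if r[v] + b < r[u]:
                r[u], changed = r[v] + b, True
        if not changed:
            return True
    return False

def min_cycle_time():
    # Feasibility is monotone in phi, and the largest entry of D is always
    # feasible (r = 0), so binary search over the sorted entries works.
    candidates = sorted({x for row in D for x in row})
    lo, hi = 0, len(candidates) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if has_retiming(candidates[mid]):
            hi = mid
        else:
            lo = mid + 1
    return candidates[lo]

print(min_cycle_time())  # 7
```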

Retiming: 2 more algorithms
1. Relaxation based:
   – Repeatedly find critical path; retime vertex at end of path by +1: O(|V||E| log |V|)
   [Figure: critical path from u to v, with v retimed by +1]
2. Also, Mixed Integer Linear Program formulation
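
A relaxation-style check can be sketched along the lines of Leiserson and Saxe's FEAS procedure: for |V| - 1 rounds, compute arrival times in the current retimed graph and add 1 to r(v) for every vertex whose arrival time exceeds the target. The code below is a hedged illustration with made-up names (`arrival_times`, `feas`) on the assumed correlator graph.

```python
DELAY = {"v0": 0, "v1": 3, "v2": 3, "v3": 7}
EDGES = {("v0", "v1"): 2, ("v1", "v2"): 0, ("v1", "v3"): 0,
         ("v2", "v3"): 0, ("v3", "v0"): 0}

def arrival_times(delay, edges):
    """Longest zero-register path delay ending at each vertex."""
    preds = {v: [] for v in delay}
    for (u, v), w in edges.items():
        if w == 0:
            preds[v].append(u)
    memo = {}
    def delta(v):
        if v not in memo:
            memo[v] = delay[v] + max((delta(u) for u in preds[v]), default=0)
        return memo[v]
    return {v: delta(v) for v in delay}

def feas(delay, edges, c):
    """Return a retiming achieving clock cycle c, or None."""
    def retimed(r):
        return {(u, v): w + r[v] - r[u] for (u, v), w in edges.items()}
    r = {v: 0 for v in delay}
    for _ in range(len(delay) - 1):
        for v, t in arrival_times(delay, retimed(r)).items():
            if t > c:
                r[v] += 1  # retime the vertex at the end of a too-long path
    ok = max(arrival_times(delay, retimed(r)).values()) <= c
    return r if ok else None

print(feas(DELAY, EDGES, 7))  # {'v0': 1, 'v1': 0, 'v2': 0, 'v3': 1}
print(feas(DELAY, EDGES, 6))  # None
```

The result for c = 7 differs from r(v0) = r(v3) = 0, r(v1) = r(v2) = -1 only by a constant shift of +1, which leaves every edge weight unchanged.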

Retiming for minimum area (minimum # latches)
Goal: minimize number of registers used
  min Σ_e w_r(e) = Σ_e w(e) + Σ_v a_v r(v)
where a_v is a constant (the number of edges into v minus the number of edges out of v).

Minimum registers formulation
Minimize: Σ_v a_v r(v)
Subject to: w_r(e) = w(e) + r(v) - r(u) ≥ 0 for all e ∈ E
• Reducible to a flow problem
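
The objective can be checked numerically. The sketch below assumes the simplest cost model, one register counted per edge with no sharing across fanout, and uses illustrative names (`area_coefficients`, `register_count`) on the assumed correlator edge list.

```python
EDGES = {("v0", "v1"): 2, ("v1", "v2"): 0, ("v1", "v3"): 0,
         ("v2", "v3"): 0, ("v3", "v0"): 0}

def area_coefficients(edges):
    """a_v = (number of edges into v) - (number of edges out of v)."""
    a = {}
    for (u, v) in edges:
        a[v] = a.get(v, 0) + 1
        a[u] = a.get(u, 0) - 1
    return a

def register_count(edges, r):
    """Total registers after retiming: sum of w(e) + r(v) - r(u)."""
    return sum(w + r[v] - r[u] for (u, v), w in edges.items())

a = area_coefficients(EDGES)
r = {"v0": 0, "v1": -1, "v2": -1, "v3": 0}
direct = register_count(EDGES, r)
via_coefficients = sum(EDGES.values()) + sum(a[v] * r[v] for v in a)
print(direct, via_coefficients)  # 3 3
```

Both expressions agree, so under this model the cycle-7 retiming of the correlator uses 3 registers instead of the original 2.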

Retiming and resynthesis: motivation
Goal: incorporate combinational optimization into sequential optimization
• Naïve approach: carve out combinational regions, do optimization on each region. Only local gains made.
• Can we do any better?
RnR: a new approach
Sentovich, Malik, Brayton and Sangiovanni-Vincentelli ('89)
3 step approach
1. Move registers to boundary of circuit
2. Optimize network
3. Move registers back in an optimal way

RnR: circuit representation
Circuit representation: communication graph
– internal/peripheral edges
– edge-weight = register count
[Figure: communication graph with nodes i, j, a, b, c; internal and peripheral edges labeled with register counts]

Extended Retiming
• Move registers to the periphery
• Negative edge-weights permitted
[Figure: an edge of weight -1 (negative latch) next to an edge of weight 1]
• A negative latch has the interpretation that it advances its output by 1 clock cycle instead of delaying it

Peripheral Retiming
A retiming is called a peripheral retiming if it results in all internal edges having zero weight
Peripheral edges can have negative weight
[Figure: circuit whose internal edge weights are all 0, with peripheral weights on its n inputs and k outputs]

Peripheral Retiming
A circuit that undergoes peripheral retiming followed by a legal retiming, i.e. one that results in all weights ≥ 0, is “functionally equivalent” to the original circuit
Functional equivalence: equivalence of finite state automata
But we have to be careful about the initial conditions and initializing sequences. The resulting circuit may only exhibit equivalence after an appropriate delay. See [Singhal et al., ICCAD 1995]. This holds even for regular retiming

Peripheral Retiming - an example
[Figure: communication graph on i, j, a, b, c before and after peripheral retiming]
Not possible for all circuits:
[Figure: circuit with inputs i, j, nodes a, b, c, d and outputs o1, o2]

Path Weight Matrix (PWM)
Matrix W
– rows: inputs
– columns: outputs
[Figure: circuit with inputs i, j, k, nodes a, b and outputs o1, o2]

      o1   o2
i      1    *
j      0    ~
k      *    0

(* = no path from the input to the output; ~ = paths with different weights)

PWM and Peripheral Retiming
Satisfiable path weight matrix:
1. W_ij ≠ ~ for all i, j, and
2. there exist α_i, β_j such that W_ij = α_i + β_j for all W_ij ≠ *
Peripheral retiming possible ⇔ matrix is satisfiable
– α_i, β_j specify registers on peripheral edges
– some α_i, β_j can be negative
Complexity: linear in size of communication graph

Path Weight Matrix - generation
From inputs to outputs: generate output columns of W.
Example (each vector lists the path weights from inputs i, j, k to the node):
  i:  [0 * *]
  j:  [* 0 *]
  k:  [* * 0]
  a:  ([0 * *] + 1) & ([* 0 *] + 0) = [1 * *] & [* 0 *] = [1 0 *]
  b:  ([* 0 *] + 0) & ([* * 0] + 0) = [* 0 *] & [* * 0] = [* 0 0]
  o1: [1 0 *]
  o2: ([* 0 *] + 1) & [* 0 0] = [* 1 *] & [* 0 0] = [* ~ 0]

      o1   o2
i      1    *
j      0    ~
k      *    0
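
The column generation can be sketched as a single pass over the nodes in topological order, using a shift (add the edge weight) and a merge (the & above). The fan-in lists below are an assumed reading of the example circuit, and the names `shift`, `merge` and `path_weight_vectors` are illustrative.

```python
STAR, TILDE = "*", "~"  # no path / conflicting path weights

def shift(vec, k):
    """Add k registers to every numeric entry; '*' and '~' are unchanged."""
    return [x + k if isinstance(x, int) else x for x in vec]

def merge(a, b):
    """Combine two vectors entry by entry (the '&' of the slide)."""
    out = []
    for x, y in zip(a, b):
        if x == STAR:
            out.append(y)
        elif y == STAR or x == y:
            out.append(x)
        else:
            out.append(TILDE)  # two different weights, or an earlier conflict
    return out

def path_weight_vectors(inputs, fanin, order):
    vec = {name: [0 if i == idx else STAR for i in range(len(inputs))]
           for idx, name in enumerate(inputs)}
    for node in order:  # order must be topological
        acc = [STAR] * len(inputs)
        for src, w in fanin[node]:
            acc = merge(acc, shift(vec[src], w))
        vec[node] = acc
    return vec

# Assumed fan-in lists: node -> [(source, register count), ...]
FANIN = {"a": [("i", 1), ("j", 0)], "b": [("j", 0), ("k", 0)],
         "o1": [("a", 0)], "o2": [("j", 1), ("b", 0)]}
vec = path_weight_vectors(["i", "j", "k"], FANIN, ["a", "b", "o1", "o2"])
print(vec["o1"], vec["o2"])  # [1, 0, '*'] ['*', '~', 0]
```

Each edge is visited once, which is consistent with a cost linear in the size of the communication graph for a fixed number of inputs.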

Computing α, β
Each constraint is of the form α_i + β_j = W_ij
Procedure
1. Set α_1 = 0
2. Use the first row to generate the β's
3. Determine the α's from the β's
4. Check for consistency
Example:
[Figure: circuit with inputs i, j, nodes a, b, c and output o, before and after peripheral retiming]

      o
i     2
j     3

α_1 = 0, α_2 = 1, β_1 = 2

Computing α, β
Example:
[Figure: circuit with inputs i, j, nodes a, b, c, d and outputs o1, o2]

      o1   o2
i      0    0
j      0    1

α_1 + β_1 = 0
α_2 + β_1 = 0
α_1 + β_2 = 0
α_2 + β_2 = 1
---------------
α_1 - α_2 = 0
α_1 - α_2 = -1
contradiction
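
The procedure generalizes to matrices with * entries by propagating values across rows and columns until everything reachable is fixed. The sketch below is one possible implementation under that assumption; `solve_alpha_beta` is an illustrative name, `None` stands for *, and the matrix is assumed to contain no ~ entries.

```python
from collections import deque

def solve_alpha_beta(W):
    """Find alpha, beta with alpha[i] + beta[j] == W[i][j] wherever W[i][j]
    is not None. Returns (alpha, beta), or None on a contradiction."""
    n, m = len(W), len(W[0])
    alpha, beta = [None] * n, [None] * m
    for start in range(n):
        if alpha[start] is not None:
            continue
        alpha[start] = 0  # free choice, like "set alpha_1 = 0"
        queue = deque([("row", start)])
        while queue:
            kind, idx = queue.popleft()
            if kind == "row":
                for j in range(m):
                    if W[idx][j] is None:
                        continue
                    b = W[idx][j] - alpha[idx]
                    if beta[j] is None:
                        beta[j] = b
                        queue.append(("col", j))
                    elif beta[j] != b:
                        return None  # consistency check failed
            else:
                for i in range(n):
                    if W[i][idx] is None:
                        continue
                    a = W[i][idx] - beta[idx]
                    if alpha[i] is None:
                        alpha[i] = a
                        queue.append(("row", i))
                    elif alpha[i] != a:
                        return None
    return alpha, beta

print(solve_alpha_beta([[2], [3]]))        # ([0, 1], [2])
print(solve_alpha_beta([[0, 0], [0, 1]]))  # None
```

The first call reproduces α_1 = 0, α_2 = 1, β_1 = 2; the second is the 2 x 2 example whose constraints contradict each other.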

Optimizing Acyclic Sequential Circuits
Acyclic circuits with satisfiable W
– Do peripheral retiming, putting α_i, β_j registers at the I/O
– Resynthesize interior
– Do a legal retiming (move registers in)
  • May not always be possible
Acyclic circuits with unsatisfiable W
– Identify maximal sub-circuits with satisfiable W
– Cut connections
– Repeat previous procedure

Acyclic Sequential Circuits
[Figure: original circuit and the two cut circuits]
Note that both of the cut circuits can be peripherally retimed, unlike the original.

Optimizing Cyclic Sequential Circuits
Cyclic circuits
1. Make circuit acyclic by breaking cycles
   – Different feedback cuts give different W's
2. Find sub-circuits with satisfiable W's
3. Repeat procedure

FSM Optimization
Break feedback
[Figure: FSM with inputs X, Y, output OUT and state signals A, B, before and after the feedback is broken]

FSM Optimization
Peripheral retiming, then resynthesize
[Figure: the broken-feedback circuit made combinational by peripheral retiming, and the same circuit after resynthesis]

FSM Optimization
Retime, then reconnect
[Figure: the resynthesized circuit after retiming, and after the feedback is reconnected]

FSM Optimization
[Figure: original circuit and resynthesized circuit, each with output OUT and state signals A, B]

Resynthesis of Pipelines
Goal: Performance optimization of pipeline circuits
Example: Pipeline circuit C
[Figure: pipeline with inputs I1..In, stages C1..Cn and outputs O1..On; after peripheral retiming, input Ii carries -(i-1) registers and output Oi carries i-1 registers around a combinational circuit]

Resynthesis of Pipelines
Parameters:
– A: vector of arrival times for inputs
– R: vector of required times for outputs
– c: target clock cycle
– C: circuit
Pipeline performance problem: P_P(C_P, c_P, A_P, R_P)
Combinational performance problem: P_C(C_C, c_C, A_C, R_C)

Resynthesis of Pipelines
Problem transformation:
– Relax P_P(C_P, c_P, A_P, R_P) to P'_P(C_P, c_P + ε, A_P, R_P)
  • where ε = largest possible single gate delay
– Convert: pipeline problem P'_P to combinational problem P'_C

Performance Synthesis of Pipeline
• Arrival and required times for P'_C:
[Figure: pipeline with inputs I1..In, stages C1..Cn and outputs O1..On; after peripheral retiming, the combinational circuit has arrival time (i-1)c + A_i at input Ii and required time (i-1)c + R_i at output Oi]
Theoretical contribution:
– If P'_C(c_P + ε) has a solution, then retiming yields a solution to P'_P(c_P + ε)
– If there is a solution to P_P(c_P), then peripheral retiming yields a solution to P'_C(c_P + ε)