Synthesis of Asynchronous Control Circuits with Automatically Generated Relative Timing Assumptions

Jordi Cortadella, Universitat Politècnica de Catalunya
Mike Kishinevsky, Intel Corporation
Steven M. Burns, Intel Corporation
Ken Stevens, Intel Corporation
Earlier contributions: Luciano Lavagno, Alex Kondratyev, Alex Yakovlev, Alexander Taubin

Outline
• Why asynchronous
• Relative timing
• Reminder: design flow for asynchronous circuits
• Lazy transition systems
• Timing assumptions and constraints
• Automatic generation of timing assumptions
• Results

Why asynchronous?
– All high-performance “synchronous” design styles are “asynchronous in the small” (within one or a few clock cycles). Example: [ISSCC 2001 Intel paper on the 4 GHz IEU for 0.18 µm CMOS in the Pentium 4(TM)]. Such designs require asynchronous-style timing analysis.
– The relative sequential distance within a die for global wires is growing.
– Can we deliver a global clock N years from now?

Timing assumptions in design flow
• Synchronous circuits (e.g., static CMOS):
– max delay: stabilize within a clock period (- setup - clock2q - clock_skew)
– min delay: stabilize after the hold time (+ clock_skew - clock2q)
• Speed-independent = quasi-delay insensitive: wire delays after a fork are smaller than fan-out gate delays [Muller 59, Varshavsky et al. 80, Martin 89, …]. Problem: fat circuits
• Burst-mode FSM: the circuit stabilizes between two changes at the inputs [Nowick 91, Yun 94]. Problem: fundamental mode is similar to synchronous (external alignment by the worst case)
• Timed circuits: absolute bounds on gate / environment delays are known a priori (before physical design) [Myers 95]. Problem: how do you know absolute delays before sizing/physical design?
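The two synchronous bounds can be written out as a small check. This is only a sketch of the arithmetic on the slide: the function names and the picosecond figures in the usage lines are hypothetical, not values from the presentation.

```python
def max_delay_budget(period, setup, clock2q, clock_skew):
    """Longest combinational delay that still stabilizes within one clock."""
    return period - setup - clock2q - clock_skew


def min_delay_bound(hold, clock2q, clock_skew):
    """Shortest combinational delay that still respects the hold time."""
    return hold + clock_skew - clock2q


def path_meets_timing(d_min, d_max, period, setup, hold, clock2q, clock_skew):
    """True when a path with delays in [d_min, d_max] satisfies both bounds."""
    return (d_max <= max_delay_budget(period, setup, clock2q, clock_skew)
            and d_min >= min_delay_bound(hold, clock2q, clock_skew))


# Hypothetical numbers, all in picoseconds.
budget = max_delay_budget(period=1000, setup=50, clock2q=80, clock_skew=30)
bound = min_delay_bound(hold=70, clock2q=80, clock_skew=30)
ok = path_meets_timing(100, 800, 1000, 50, 70, 80, 30)
```

Both bounds are absolute numbers tied to the clock, which is the contrast with the relative orderings used in the rest of the talk.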

Relative Timing Asynchronous Circuits
[Figure: speed-independent C-element with inputs a, b and output c, next to the RT C-element]
• Timing assumption (on the environment): a- before b-
• RT C-element: faster and smaller; correct only under the timing constraint a- before b-
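A minimal sketch of why the assumption helps, using the standard C-element next-state function c = ab + c(a + b). If a always falls before b, the state a=1, b=0, c=1 never occurs and can be treated as a don't care. The simplified function `rt_c` below is one possible result of using that don't care, chosen here for illustration; it is an assumption, not necessarily the gate drawn on the slide.

```python
from itertools import product


def si_c(a, b, c):
    """Speed-independent C-element: output follows the inputs when they agree."""
    return (a & b) | (c & (a | b))


def rt_c(a, b, c):
    """Illustrative simplified gate, valid only if a- always precedes b-."""
    return b & (a | c)


# States in which the two functions disagree.
differing = [s for s in product((0, 1), repeat=3) if si_c(*s) != rt_c(*s)]

# State (a, b, c) excluded by the assumption "a- before b-".
unreachable_under_assumption = {(1, 0, 1)}
safe = set(differing) <= unreachable_under_assumption
```

The two functions differ only in the excluded state, so the smaller gate behaves like a C-element as long as the timing constraint holds.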

Relative Timing Circuits
• Assumptions: “a before b”
– for concurrent events: reduces the reachable state space
– for ordered events: permits early enabling
– both increase the don’t care space for logic synthesis => simpler logic (better area and timing)
• “Assume - if useful - guarantee” approach: the tool uses assumptions to derive a circuit, together with the timing constraints that must then be met in the physical design flow
• Applied to the design of the Revolving Asynchronous Pentium Processor(TM) Instruction Decoder (RAPPID) (K. Stevens, S. Rotem et al., Intel Corporation)

STG for the READ cycle
[Figure: VME Bus Controller block with signals DSr, DTACK, LDS, LDTACK, D, and its STG over the transitions DSr+, LDS+, LDTACK+, D+, DTACK+, DSr-, D-, DTACK-, LDS-, LDTACK-]

State Graph (Read cycle)
[Figure: state graph of the READ cycle; arcs labeled with the transitions DSr+, DSr-, LDS+, LDS-, LDTACK+, LDTACK-, D+, D-, DTACK+, DTACK-]

Binary encoding of signals
[Figure: state graph of the READ cycle with each state labeled by a binary code; codes recovered from the slide: 10000, 10010, 10110, 01100, 01110, 00110, and a second state also coded 10110]
Signal order: (DSr, DTACK, LDTACK, LDS, D)
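The encoding step can be sketched as a reachability analysis that tags each state with its signal values and then looks for two states that share a code but enable different output events. Two things below are assumptions: the arc list, which follows the commonly published form of this VME example because the slide's figure is not recoverable from the transcript, and the signal order (DSr, DTACK, LDTACK, LDS, D), which is what the five-bit codes suggest.

```python
from collections import deque

SIGNALS = ["DSr", "DTACK", "LDTACK", "LDS", "D"]   # assumed bit order
OUTPUTS = {"LDS", "D", "DTACK"}                    # DSr and LDTACK are inputs

# Assumed STG (a marked graph): each arc is a place between two transitions.
ARCS = [
    ("DSr+", "LDS+"), ("LDS+", "LDTACK+"), ("LDTACK+", "D+"),
    ("D+", "DTACK+"), ("DTACK+", "DSr-"), ("DSr-", "D-"),
    ("D-", "DTACK-"), ("D-", "LDS-"), ("DTACK-", "DSr+"),
    ("LDS-", "LDTACK-"), ("LDTACK-", "LDS+"),
]
INITIAL_MARKING = frozenset({("DTACK-", "DSr+"), ("LDTACK-", "LDS+")})
TRANSITIONS = sorted({t for arc in ARCS for t in arc})


def enabled(marking):
    """Transitions whose input arcs all carry a token."""
    return [t for t in TRANSITIONS
            if all(arc in marking for arc in ARCS if arc[1] == t)]


def fire(marking, code, t):
    """Move tokens across t and update the bit of the signal it toggles."""
    consumed = {arc for arc in ARCS if arc[1] == t}
    produced = {arc for arc in ARCS if arc[0] == t}
    bits = list(code)
    bits[SIGNALS.index(t[:-1])] = 1 if t[-1] == "+" else 0
    return frozenset((marking - consumed) | produced), tuple(bits)


def state_graph():
    """Breadth-first exploration of all (marking, code) pairs."""
    start = (INITIAL_MARKING, (0, 0, 0, 0, 0))
    seen, queue = {start}, deque([start])
    while queue:
        marking, code = queue.popleft()
        for t in enabled(marking):
            nxt = fire(marking, code, t)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def csc_conflicts(states):
    """Codes shared by states that enable different output events."""
    by_code = {}
    for marking, code in states:
        outs = frozenset(t for t in enabled(marking) if t[:-1] in OUTPUTS)
        by_code.setdefault(code, set()).add(outs)
    return {"".join(map(str, c)) for c, outs in by_code.items() if len(outs) > 1}


states = state_graph()
conflicts = csc_conflicts(states)
```

Under these assumptions the only conflicting code is 10110, the code that appears twice on the slide: one of the two states enables D+, the other enables LDS-.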

Karnaugh map for LDS
[Figure: Karnaugh map for the next value of LDS, split into sub-maps for LDS = 0 and LDS = 1; axis labels recovered from the slide: D, LDTACK, DSr; entries are 0, 1 or - (don't care), and one cell is marked 0/1?]

Speed-independent netlist

Transition systems
[Figure: state graph fragment with the excitation regions ER(LDS+) and ER(LDS-) marked; LDS changes from 0 to 1 and from 1 to 0 respectively]
Excitation region: enabling = firing, since delay can be zero

Lazy Transition Systems
[Figure: state graph fragment with an excitation region ER and the firing region FR(LDS-) marked; arcs labeled LDS+, LDS-, DTACK-]
Event LDS- is lazy: firing region = subset of enabling region
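A small sketch of the distinction: in an ordinary transition system an event fires in every state where it is enabled, while a lazy event has a firing region that is a subset of its enabling region. The function name and the state codes in the usage lines are illustrative assumptions, not data read from the slide.

```python
def classify_event(enabling_region, firing_region):
    """Return 'lazy' if the event may fire in only part of its enabling region."""
    if not firing_region <= enabling_region:
        raise ValueError("firing region must be contained in enabling region")
    return "lazy" if firing_region < enabling_region else "non-lazy"


# Hypothetical regions for LDS-: enabled in three states, allowed to fire in two.
er_lds_minus = {"01111", "01110", "00110"}
fr_lds_minus = {"01110", "00110"}

kind_lazy = classify_event(er_lds_minus, fr_lds_minus)
kind_plain = classify_event(fr_lds_minus, fr_lds_minus)
```

Logic synthesis works with the enabling region, while correctness is judged on the firing region, which is where the extra don't cares come from.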

Timing assumptions
• (a before b) for concurrent events: concurrency reduction for firing and enabling
• (a before b) for ordered events: early enabling
• (a simultaneous to b wrt c) for triples of events: combination of the above

Speed-independent Netlist
[Figure: STG of the READ cycle mapped to a speed-independent netlist; netlist signals recovered from the slide: D, DTACK, LDS, LDTACK, csc]

Adding timing assumptions (I)
[Figure: the same STG and netlist, with one path marked SLOW and another marked FAST]
Timing assumption: LDTACK- before DSr+

State space domain
LDTACK- before DSr+
[Figure: state graph of the READ cycle under this assumption, shown in several animation steps]
Two more unreachable states
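Concurrency reduction can be sketched on a plain transition system: wherever both events are enabled, the later one is not allowed to fire, and reachability is recomputed. The four-state diamond below is a hypothetical fragment with made-up state names; only the event names come from the slide.

```python
def assume_before(ts, first, second):
    """Remove 'second' from every state in which 'first' is also enabled."""
    return {state: {e: nxt for e, nxt in succ.items()
                    if not (e == second and first in succ)}
            for state, succ in ts.items()}


def reachable(ts, start):
    """Depth-first collection of the states reachable from 'start'."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in ts[stack.pop()].values():
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


# Hypothetical diamond: LDTACK- and DSr+ are concurrent in state s0.
ts = {
    "s0": {"LDTACK-": "s1", "DSr+": "s2"},
    "s1": {"DSr+": "s3"},
    "s2": {"LDTACK-": "s3"},
    "s3": {},
}

before = reachable(ts, "s0")
after = reachable(assume_before(ts, "LDTACK-", "DSr+"), "s0")
pruned = before - after
```

Every pruned state is a don't care for all signals, which is what enlarges the freedom available to Boolean minimization.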

Boolean domain
[Figure: Karnaugh map for LDS (sub-maps LDS = 0 and LDS = 1) before and after the assumption; the cell previously marked 0/1? becomes a don't care]
• One more DC vector for all signals
• One state conflict is removed

Netlist with one constraint
[Figure: STG of the READ cycle and the resulting netlist; netlist signals recovered from the slide: D, DTACK, DSr, LDS, LDTACK]
TIMING CONSTRAINT: LDTACK- before DSr+

Timing assumptions
• (a before b) for concurrent events: concurrency reduction for firing and enabling
• (a before b) for ordered events: early enabling
• (a simultaneous to b wrt c) for triples of events: combination of the above

Ordered events: early enabling
[Figure: gates F and G driving signals a, b, c, with waveforms of a, b, c before and after early enabling]
Logic for gate c may change

Adding timing assumptions (II)
[Figure: STG of the READ cycle and netlist with signals D, DTACK, DSr, LDS, LDTACK]
Timing assumption: D- before LDS-

State space domain
D- before LDS-
[Figure: state graph with the state reached after DSr- marked as potential enabling for LDS-]
• Reachable space is unchanged
• For LDS-, enabling can be changed in one state

Boolean domain
[Figure: Karnaugh map for LDS (sub-maps LDS = 0 and LDS = 1) before and after early enabling]
• One more DC vector for one signal: LDS
• If used: LDS = DSr; otherwise: LDS = DSr + D
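The choice between the two covers can be checked by brute force. The table below lists states as (DSr, DTACK, LDTACK, LDS, D) with the implied next value of LDS; it is an assumption derived from the commonly published form of this example with "LDTACK- before DSr+" already applied, not data read from the slide.

```python
# (DSr, DTACK, LDTACK, LDS, D) -> implied next value of LDS (assumed table).
NEXT_LDS = {
    (0, 0, 0, 0, 0): 0, (1, 0, 0, 0, 0): 1, (1, 0, 0, 1, 0): 1,
    (1, 0, 1, 1, 0): 1, (1, 0, 1, 1, 1): 1, (1, 1, 1, 1, 1): 1,
    (0, 1, 1, 1, 1): 1, (0, 1, 1, 1, 0): 0, (0, 0, 1, 1, 0): 0,
    (0, 1, 1, 0, 0): 0, (0, 0, 1, 0, 0): 0, (0, 1, 0, 0, 0): 0,
}


def mismatches(cover):
    """Codes of the states where 'cover' disagrees with the implied value."""
    return sorted("".join(map(str, s)) for s, value in NEXT_LDS.items()
                  if cover(*s) != value)


def lds_without_early_enabling(dsr, dtack, ldtack, lds, d):
    return dsr | d          # LDS = DSr + D


def lds_with_early_enabling(dsr, dtack, ldtack, lds, d):
    return dsr              # LDS = DSr


strict = mismatches(lds_without_early_enabling)
early = mismatches(lds_with_early_enabling)
```

In this table the shorter cover disagrees in exactly one state, the one between DSr- and D-, where "D- before LDS-" allows LDS- to be enabled early.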

Before early enabling
[Figure: STG of the READ cycle and netlist with signals D, DTACK, DSr, LDS, LDTACK]

Netlist with two constraints
[Figure: STG of the READ cycle and netlist with signals D, DTACK, DSr, LDS, LDTACK]
TIMING CONSTRAINTS: LDTACK- before DSr+ and D- before LDS-
Both timing assumptions are used for optimization and become constraints

Deriving automatic timing assumptions
• Rule I (out of 6): a, b non-input events
– Untimed ordering: (a || b) and (a enabled before b), but not vice versa
– Derived assumption: a fires before b
– Justification: the delay of one gate can be made shorter than the delay of two (or more) gates: del(a) < del(c) + del(b)
– Effect I: a state becomes DC for all signals
– Effect II: another state becomes local DC for the signal of event b
[Figure: state graph fragment with events a, b, c illustrating the rule]
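Rule I can be sketched as a search over a transition system. The reading of "a enabled before b" used here is deliberately narrow and is an assumption: a is enabled in a state where b is not, and a single other event leads to a state where both are enabled. The slides do not spell out the exact definition, and the example graph is hypothetical.

```python
def concurrent(ts, a, b):
    """a || b: some state enables both events."""
    return any(a in succ and b in succ for succ in ts.values())


def enabled_first(ts, a, b):
    """a is enabled alone, and one other event then enables b as well."""
    for succ in ts.values():
        if a in succ and b not in succ:
            for event, nxt in succ.items():
                if event != a and a in ts[nxt] and b in ts[nxt]:
                    return True
    return False


def rule_one(ts, non_input_events):
    """Pairs (a, b) for which the assumption 'a fires before b' is derived."""
    return [(a, b)
            for a in non_input_events for b in non_input_events
            if a != b and concurrent(ts, a, b)
            and enabled_first(ts, a, b) and not enabled_first(ts, b, a)]


# Hypothetical graph: a is enabled from s0; b only becomes enabled after c.
ts = {
    "s0": {"a": "s1", "c": "s2"},
    "s1": {"c": "s3"},
    "s2": {"a": "s3", "b": "s4"},
    "s3": {"b": "s5"},
    "s4": {"a": "s5"},
    "s5": {},
}

assumptions = rule_one(ts, ["a", "b"])
```

Applying the derived assumption removes the firing of b in s2, so s4 becomes unreachable, matching Effect I on the slide.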

Backannotation of Timing Constraints
• Timed circuits require post-verification
• Can synthesis tools help?
– Report the least stringent set of timing constraints required for the correctness of the circuit
– Not all initial timing assumptions may be required
• Petrify reports a set of constraints on the order of firing that guarantee circuit correctness

Timing constraints generation
Assumptions: d before b and c before e and a before d
[Figure: circuit over signals a, b, c, d, e with its state graph, shown in several animation steps; one trace is marked "Correct behavior" and another, with points labeled 1 and 2, is marked "Incorrect behavior"]

Covering incorrect behavior
Assumptions: d before b and c before e and a before d
[Figure: state graph with numbered incorrect behaviors (1 to 5) and candidate constraints annotated with the sets they cover: {1}, {1, 3}, {2, 4}; constraint labels recovered from the slide: d before c, d before b, c before e]
• Other possible constraints remove states from the assumption domain => invalid
• Constraints for the minimal cost solution: d before c and c before e
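Choosing the constraints can be viewed as a covering problem: each candidate ordering removes some incorrect behaviors, and the cheapest set that removes all of them is kept. The sketch below is a brute-force version with unit costs. The candidate names and covered sets are hypothetical and do not reproduce the slide's example, and the actual algorithm used by the tool is not described in the transcript.

```python
from itertools import combinations


def minimal_cover(candidates, incorrect):
    """Smallest group of candidates whose covered sets include every incorrect item."""
    names = list(candidates)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            covered = set().union(*(candidates[name] for name in combo))
            if incorrect <= covered:
                return combo
    return None


# Hypothetical candidates: constraint -> incorrect behaviors it removes.
candidates = {
    "p before q": {1},
    "q before r": {1, 3},
    "r before s": {2, 4},
    "p before r": {2},
}

chosen = minimal_cover(candidates, {1, 2, 3, 4})
```

Candidates that would also remove states inside the assumption domain have to be filtered out before this step, as the slide notes.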

Timing-aware state encoding
• Solve only state conflicts reachable in the RT assumptions domain
• Generate automatic timing assumptions for inserted state signals => state signals can be implemented as RT logic
• State variables inserted concurrently with I/O events => latency and cycle time reduction

Value of Relative Timing
• RT circuits provide up to 2-3x (1.3-2x) delay and area reduction with respect to SI circuits synthesized without (with) concurrency reduction
• Automatic generation of timing assumptions => foundation for automatic synthesis of RT circuits with area/performance comparable to or better than manual design
• Back-annotation of timing constraints => minimal required timing information for the back-end tools
• Timing-aware state encoding allows significant area/performance optimization

Design flow without timing
Specification (STG)
– Reachability analysis => State Graph
– State encoding => SG with CSC
– Boolean minimization => Next-state functions
– Logic decomposition => Decomposed functions
– Technology mapping => Gate netlist

Design Flow with Timing
Specification (STG + user assumptions)
– Reachability analysis => Lazy State Graph
– Automatic Timing Assumptions
– Timing-aware state encoding => Lazy SG with CSC
– Boolean minimization => Next-state functions
– Logic decomposition => Decomposed functions
– Technology mapping => Gate netlist
Required Timing Constraints

FIFO example
[Figure: FIFO block with signals li, lo, ri, ro, and its STG; transitions recovered from the slide: lo-, li+, ri-, ro-, li-, lo+, ri+]

Speed-Independent Implementation without concurrency reduction
3 state signals are required

SI implementation with concurrency reduction
[Figure: netlist with signals li, lo, ri, ro, two gC gates and the state signal x; STG extended with the transitions x+ and x-]

RT implementation
[Figure: netlist with signals li, lo, ri, ro, the state signal x and an OR gate; STG extended with the transitions x+ and x-]
To satisfy the constraint:
Delay(x-) < Delay(ri+) and Delay(lo+) + Delay(x-) < Delay(ro+) + Delay(ri+)
All constraints are either satisfied by default or easy to satisfy by sizing
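The two inequalities can be checked directly once delay estimates are available. This is only a sketch: the function name and the delay values are hypothetical, in arbitrary units, and are not measurements from the presentation.

```python
def fifo_rt_constraints_hold(delay):
    """Check both relative-timing inequalities from the slide on a delay table."""
    first = delay["x-"] < delay["ri+"]
    second = delay["lo+"] + delay["x-"] < delay["ro+"] + delay["ri+"]
    return first and second


# Hypothetical delays: an internal gate (x-) versus environment responses.
typical = {"x-": 1, "lo+": 2, "ro+": 2, "ri+": 5}
slow_state_signal = {"x-": 6, "lo+": 2, "ro+": 2, "ri+": 5}

typical_ok = fifo_rt_constraints_hold(typical)
slow_ok = fifo_rt_constraints_hold(slow_state_signal)
```

Each inequality compares one internal gate delay against a path through the environment, which is why the slide describes such constraints as satisfied by default or easy to meet by sizing.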