Elastic Circuits: blending synchronous and asynchronous technologies
Jordi Cortadella, Universitat Politècnica de Catalunya, Barcelona (joint work with M. Kishinevsky and M. Galceran-Oms)
Collège de France, May 21st, 2013
Synchronous circuit [Figure: combinational logic between flip-flops, clocked from a PLL]
Asynchronous circuit (4-phase) [Figure: latches (L) around combinational logic, with a delay element and C-elements (C)]
Asynchronous circuit [Figure: chain of C-elements (C) with Req. In / Req. Out and Ack. Out / Ack. In signals]
• David Muller's pipeline (late 50's)
• Sutherland's Micropipelines (Turing Award lecture, 1989)
Globally-asynchronous locally-synchronous (GALS)
SoC design with GALS
• Most IPs are synchronous
• Different components may have different operating frequencies
• Some components have variable latencies (e.g., cache hit/miss latency)
• Multiple clock domains are essential
[Figure: DSP, P and Mem blocks in clock domains CLK1, CLK2 and CLK3, attached to a fast bus and a slow bus through CDC bridges]
Multiple clock domains [Figure: three cases: independent clocks (CLK1, CLK2, CLK3); rational clock frequencies (f1/f0, f2/f0, f3/f0 derived from CLK at f0); single clock (mesochronous, controllable skew)]
Synchronous handshakes [Figure: sender (CLK1) and receiver (CLK2) connected by Data, Valid and Ack]
• The arrival of data is unpredictable
• Handshakes solve the problem
The problem: metastability [Figure: flip-flops clocked by CLKS and CLKR; D changes between the setup and hold times of CLKR, so Q is undetermined (?)]
Metastability. Source: W. J. Dally, Lecture notes for EE108A, Lecture 13: Metastability and Synchronization Failure (or When Good Flip-Flops Go Bad), 11/9/2005.
Metastability [Figure: logic 0, logic 1 and metastable states]
Classical synchronous solution [Figure: two cascaded flip-flops between CLKT and CLKR]
Mean Time Between Failures (MTBF), in terms of:
fΦ: frequency of the clock
fD: frequency of the data
tr: resolve time available
W: metastability window
τ: resolve time constant
Example:
# FFs   MTBF
1 FF    15 min
2 FF    9 days
3 FF    23 years
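The formula itself did not survive on this slide, but the quantities listed are those of the textbook synchronizer model, MTBF = exp(tr/τ) / (W · fΦ · fD). The sketch below assumes that model. The function name and all numeric values are hypothetical and are not the ones behind the 15 min / 9 days / 23 years example.

```python
import math

def mtbf_seconds(t_r, tau, window, f_clk, f_data):
    """Textbook model (assumed here): MTBF = exp(t_r / tau) / (W * f_clk * f_data)."""
    return math.exp(t_r / tau) / (window * f_clk * f_data)

# Hypothetical technology and traffic parameters, for illustration only.
TAU = 50e-12        # resolve time constant (s)
WINDOW = 100e-12    # metastability window W (s)
F_CLK = 1e9         # receiver clock frequency (Hz)
F_DATA = 100e6      # rate of data changes (Hz)

if __name__ == "__main__":
    # Assumption: each extra synchronizer flip-flop adds a fixed amount of
    # resolve time, so the MTBF grows exponentially with the number of stages.
    for stages in (1, 2, 3):
        t_r = stages * 0.5 / F_CLK
        print(stages, "FF:", mtbf_seconds(t_r, TAU, WINDOW, F_CLK, F_DATA), "s")
```

Under this model, adding τ to the available resolve time multiplies the MTBF by e, which is why a second or third flip-flop helps so much.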
Handshake with synchronizers [Figure: sender (CLK1) and receiver (CLK2) connected by Data, Valid and Ack]
• Simple solution
• Throughput can be highly degraded: a long round trip for every transaction
Asynchronous FIFOs [Figure: circular buffer and FIFO control between Clk In and Clk Out, with Data, Valid and Ack signals; labels: 3-4 cycles, 1 cycle, 1 cycle]
• Ack is issued as soon as data has been delivered
• No impact on throughput (1 token/cycle)
• Min latency determined by the internal synchronizers
• Some tricky structures for the FIFO pointers (e.g., Gray encoding)
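Why Gray encoding for the FIFO pointers: consecutive pointer values differ in exactly one bit, so a pointer sampled in the other clock domain is either the old value or the new one, never a mix of several changing bits. A minimal sketch, with a hypothetical FIFO depth of 8 and hypothetical helper names:

```python
def bin_to_gray(n: int) -> int:
    """Binary-reflected Gray code of n."""
    return n ^ (n >> 1)

def gray_to_bin(g: int) -> int:
    """Inverse of bin_to_gray: XOR of all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

def bits_changed(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

DEPTH = 8  # hypothetical FIFO depth (a power of two, so the wrap-around also changes one bit)
GRAY_POINTERS = [bin_to_gray(i) for i in range(DEPTH)]

if __name__ == "__main__":
    for i, g in enumerate(GRAY_POINTERS):
        print(i, format(g, "03b"))
```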
SoC design with GALS
• Bridges for Clock Domain Crossing usually contain asynchronous FIFOs
• Latency cost only when interfacing with synchronous domains
• No latency penalty between asynchronous domains
[Figure: DSP, P and Mem blocks in clock domains CLK1, CLK2 and CLK3, attached to a fast bus and a slow bus through CDC bridges]
Synchronous and Asynchronous meeting each other
Meanwhile, a small village of indomitable engineers was resisting the synchronous occupation … [Figure: map labeled Asynchronia]
Bill Grundmann (Intel’s director of CAD research, Technical director for CAD technology for the Alpha Microprocessor): “The specification of a complex system is usually asynchronous (functional units, messages, queues, …), … however the clock appears when we move down to the implementation levels” (in a technical discussion about system design with M. Kishinevsky and J. Cortadella, 2004)
Async and Sync meeting each other
Async
• Modular (time elasticity)
• But hard to analyze and synthesize
Sync
• Easy to analyze and synthesize
• Not modular (time rigid)
Elastic Circuits (Sync / Async)
• J. O’Leary and G. Brown, 1997: Synchronous emulation of asynchronous circuits
• L. Carloni et al., 1999: A methodology for correct-by-construction latency-insensitive design
• A. Peeters and K. Van Berkel, 2001: Synchronous handshake circuits
• Cortadella et al., 2003: Desynchronization
Different flavors of elasticity [Figure: an adder (+) processing streams of numeric tokens under three timing models: rigid time, asynchronous, synchronous]
Why synchronous elasticity?
• Time is discrete (cycle based), but unpredictable (unknown number of cycles)
• Examples
  – Short/long integer addition (8 bits, 64 bits)
  – Floating-point units
  – Cache latency: fast hit (2), slow hit (3), miss (>20)
  – Bus arbitration
  – Latencies in Network-on-Chip
  – … and many others
… even at design time [Figure: sender and receiver sharing CLK] Can we add a register without modifying the functionality of the system?
Many systems are already elastic [Figure: AMBA AXI bus protocol, handshake signals]
Designing with synchronous elasticity
Communication channel [Figure: sender and receiver connected by Data] Long wires: slow transmission
Pipelined communication [Figure, animated over 5 frames: Data moves from sender to receiver through pipeline stages; the last frame shows ? ? ? on the channel] What if the sender does not always send valid data?
The Valid bit [Figure, animated over 5 frames: a Valid bit travels alongside Data from sender to receiver] What if the receiver is not always ready?
The Stop bit [Figure, animated over 10 frames: Data and Valid travel from sender to receiver and a Stop bit travels from receiver to sender; the Stop values shown along the pipeline go from all 0 to all 1 (back-pressure) and back to all 0] Long combinational path
Relay stations (Carloni, 1999) [Figure, animated over 10 frames: sender and receiver pearls wrapped in shells, connected through relay stations with main and aux registers]
• Handshakes with short wires
• Double storage required
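A behavioral sketch of the double storage idea, assuming a relay station with one main and one aux register whose stop output depends only on its own state (hence the short wires). The class, method and function names are hypothetical, and this is a simplified model rather than the circuit on the slides.

```python
class RelayStation:
    """One relay station: 'main' holds the token offered downstream,
    'aux' absorbs the token already in flight when a stall arrives."""

    def __init__(self):
        self.main = None
        self.aux = None

    @property
    def stop_out(self):
        # Registered stop towards the sender: asserted while aux is occupied.
        return self.aux is not None

    def step(self, in_item, stop_in):
        """Advance one clock cycle; return the token delivered downstream, or None."""
        accept = in_item is not None and not self.stop_out
        delivered = None if stop_in else self.main
        if delivered is not None:
            self.main, self.aux = self.aux, None   # aux (if any) moves to main
        if accept:
            if self.main is None:
                self.main = in_item
            else:
                self.aux = in_item                 # stalled: use the second slot
        return delivered

def run(tokens, stop_pattern):
    """Feed tokens through one relay station under the given receiver stop pattern."""
    station, pending, received = RelayStation(), list(tokens), []
    for stop_in in stop_pattern:
        offered = pending[0] if pending else None
        accepted = offered is not None and not station.stop_out
        out = station.step(offered, stop_in)
        if accepted:
            pending.pop(0)
        if out is not None:
            received.append(out)
    return received
```

With no stalls the station forwards one token per cycle; under back-pressure it stores at most two tokens and loses none.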
Flip-flops vs. latches [Figure, animated over 11 frames: sender and receiver one cycle apart, first with flip-flops (FF), then with each flip-flop shown as its H and L latches]
• Flip-flops already have a double storage capability, but …
• Not allowed in conventional FF-based design!
• Let’s make the master/slave latches independent (½ cycle per latch)
• Only half of the latches (H or L) can move tokens
Synchronous elasticity [Figure, animated over 28 frames: sender and receiver connected by Data latches with enable signals (En) and a chain of V/S controllers carrying Valid and Stop; the frames step the Valid and Stop values (0/1) through the chain]
Basic VS block [Figure: VS controller with signals V_{i-1}, V_i, S_{i-1}, S_i and latch enable En_i]
Elastic netlists [Figure: VS controllers connected through Join, Fork and Join/Fork blocks; the controllers generate the enable signals to the data latches]
Join [Figure: adder (+) datapath; VS controllers with inputs V1/S1 and V2/S2 joined into one V/S channel]
Lazy Fork [Figure: logic between the input channel V/S and the output channels V1/S1 and V2/S2]
Eager Fork [Figure: logic between the input channel V/S and the output channels V1/S1 and V2/S2]
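The gate-level figures for the join and fork did not survive, so the sketch below is a reconstruction under stated assumptions, not the equations from the slides. It assumes that a channel transfers a token exactly when Valid is 1 and Stop is 0, that a join fires only when both inputs are valid, and that a lazy fork fires only when both outputs can accept. The function names are hypothetical. The eager fork needs extra state per branch and is not modeled.

```python
from itertools import product

def transfer(valid, stop):
    """A token moves on a channel when it is valid and not stopped."""
    return valid and not stop

def join(v1, v2, s):
    """Assumed join control: output valid needs both inputs valid;
    both inputs are stopped unless the joined transfer happens."""
    v = v1 and v2
    stop_inputs = not transfer(v, s)
    return v, stop_inputs, stop_inputs   # (V, S1, S2)

def lazy_fork(v, s1, s2):
    """Assumed lazy fork control: each branch is offered the token only
    if the other branch is able to take it in the same cycle."""
    v1 = v and not s2
    v2 = v and not s1
    s = s1 or s2
    return v1, v2, s                     # (V1, V2, S)

if __name__ == "__main__":
    for v, s1, s2 in product([False, True], repeat=3):
        print(v, s1, s2, "->", lazy_fork(v, s1, s2))
```

In both blocks the transfers on all attached channels happen in the same cycle or not at all, which is the property the tests below check exhaustively.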
Variable Latency Units [Figure: unit taking [0 - k] cycles, with go, done and clear signals, between V/S controllers]
Design automation
Transforming sync into elastic [Figure, animated over 3 frames] Behavioral equivalence is preserved
Elastic Esterel
  module ABRO:
  input A, B, R;
  output O;
  loop
    [ await A || await B ];
    emit O
  each R
  end module
Marc Galceran-Oms, Master's thesis, 2007
Elastic Esterel [Figure: circuit for ABRO with inputs A, B, R, output O, a Boot register and pause registers PauseReg 7 and PauseReg 11; in the second frame an Elastic Control Layer adds Valid_A/Stop_A, Valid_B/Stop_B, Valid_R/Stop_R and Valid_O/Stop_O]
Circuit vs. μarchitectural cycles
Synchronous handshake circuits (Peeters, 2001)
  int = type [0..255]
  & gcd: main proc (in? chan <<int, int>> & out! chan int)
  begin x, y: var int
  | forever do
      in?<<x, y>>
    ; do x <> y then
        if x < y then y := y-x
        else x := x-y
        fi
      od
    ; out!x
    od
  end
Sources:
J. Kessels and A. Peeters. DESCALE: A Design Experiment for a Smart Card Application Consuming Low Energy, in Principles of Asynchronous Circuit Design, A Systems Perspective, J. Sparsø and S. Furber (Eds.), Kluwer Academic Publishers, 2001.
P. A. Beerel, R. O. Ozdag and M. Ferretti. A Designer’s Guide to Asynchronous VLSI, Cambridge University Press, 2010.
Generalization: bounded FIFOs [Figure: network from In to Out through bounded FIFOs B1, B2, B3] Bounded Dataflow Networks
Behavioral equivalence
Synchronous: D: a b c d e f g h i j k …
Elastic:     D: a a b b b c d e e f g g h i i i j k …
             V: 1 0 0 1 1 1 0 0 1 1 …
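One way to read this slide: the elastic trace is equivalent to the synchronous one if keeping only the data items whose valid bit is 1 gives back the synchronous stream. The sketch below checks that property. The function names and the example trace are hypothetical, since the alignment between D and V on the slide was lost in extraction.

```python
def valid_stream(data, valid):
    """Keep only the data items sampled in cycles where the valid bit is 1."""
    return [d for d, v in zip(data, valid) if v]

def transfer_equivalent(sync_data, elastic_data, elastic_valid):
    """True if the valid elastic items form a prefix of the synchronous stream."""
    projected = valid_stream(elastic_data, elastic_valid)
    return projected == list(sync_data[:len(projected)])

# Hypothetical traces: the elastic circuit repeats a value while it waits
# (valid = 0) and produces each new value exactly once with valid = 1.
SYNC = list("abcdef")
ELASTIC_D = list("aabbbcdeef")
ELASTIC_V = [1, 0, 1, 0, 0, 1, 1, 1, 0, 1]

if __name__ == "__main__":
    print(valid_stream(ELASTIC_D, ELASTIC_V))
    print(transfer_equivalent(SYNC, ELASTIC_D, ELASTIC_V))
```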
Early evaluation
Early evaluation [Figure, animated over 3 frames: a multiplier (x) consuming numeric tokens; frame labels: 3, 15, 2, 5; then 3, 6, 2; then 0, 0, 8]
Early evaluation
• Only wait for required inputs
• Late arriving tokens are cancelled by anti-tokens
Example: mux for next-PC calculation [Figure: mux selecting between PC+4 (no branch) and the branch target address (take branch)]
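The two bullets above can be sketched as a token-level model of a 2-input mux: it fires as soon as the select token and the selected data token are present, and leaves an anti-token on the other input if that token has not arrived yet. The class name, method names and the queue-based model are assumptions for illustration, not the controller from the slides.

```python
from collections import deque

class EarlyEvalMux:
    """Token-level model of a 2-input mux with early evaluation."""

    def __init__(self):
        self.select = deque()             # pending select tokens (0 or 1)
        self.data = (deque(), deque())    # pending data tokens per input
        self.anti = [0, 0]                # anti-tokens waiting per input

    def push_select(self, s):
        self.select.append(s)

    def push_data(self, i, value):
        if self.anti[i] > 0:
            self.anti[i] -= 1             # late token cancelled by an anti-token
        else:
            self.data[i].append(value)

    def try_fire(self):
        """Fire if the select and the selected input are available; else return None."""
        if not self.select:
            return None
        s = self.select[0]
        if not self.data[s]:
            return None
        self.select.popleft()
        other = 1 - s
        if self.data[other]:
            self.data[other].popleft()    # unused token already there: discard it
        else:
            self.anti[other] += 1         # not there yet: leave an anti-token
        return self.data[s].popleft()
```

In the next-PC example, input 0 would carry PC+4 and input 1 the branch target: when the branch is not taken, the mux does not wait for the target address.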
How to implement anti-tokens? [Figure: each channel carries Valid+/Stop+ for tokens (+) and Valid–/Stop– for anti-tokens (-)]
Dual elastic controllers [Figure: latch enables (En) driven by controllers with V+, S+, V- and S- signals]
Fork/join [Figures: dual fork/join; join with early evaluation]
Re-designing for average performance [Figure: function F replaced by Ffast and Fslow, with early evaluation on the slow/fast selection]
H.264 CABAC decoder. Gotmanov, Kishinevsky and Galceran-Oms, Evaluation of flexible latencies: designing synchronous elastic H.264 CABAC decoder, Proc. Problems in design of micro- and nano-electronic systems, Moscow, Oct. 2010 (in Russian)
Profiling [Figure]
H.264 CABAC decoder [Figure]
Area vs. Performance [Plot: Area vs. Effective Cycle Time]
Conclusions
• Rigid systems preserve timing equivalence (data always valid at every cycle)
• Elastic systems waive timing equivalence to enable more concurrency (bubbles decrease throughput, but reduce cycle time)
• A new avenue of performance optimizations can emerge to build correct-by-construction pipelines
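The trade-off in the second bullet can be made concrete with the effective cycle time, taken here to be cycle time divided by throughput (average time per valid token). That definition is an assumption, not stated on the slide, and the numbers are hypothetical.

```python
def effective_cycle_time(cycle_time, throughput):
    """Average time per valid token: cycle time / throughput (tokens per cycle)."""
    if not 0.0 < throughput <= 1.0:
        raise ValueError("throughput must be in (0, 1] tokens per cycle")
    return cycle_time / throughput

# Hypothetical designs: the elastic one inserts a bubble (throughput 0.75)
# but the extra register cuts the critical path from 10 to 6 time units.
RIGID = effective_cycle_time(cycle_time=10.0, throughput=1.0)
ELASTIC = effective_cycle_time(cycle_time=6.0, throughput=0.75)

if __name__ == "__main__":
    print("rigid:", RIGID, "elastic:", ELASTIC)
```

In this made-up case the bubble pays off because the cycle time shrinks by more than the throughput does.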
Unifying sync/async elasticity
• J. Carmona, J. Cortadella, M. Kishinevsky and A. Taubin, Elastic Circuits, IEEE Trans. on CAD, Oct. 2009.