Clockless Logic
- Recap: Lookahead Pipelines
- High-Capacity Pipelines

Recap: Lookahead Pipeline Styles
Two strategies:
1. Early Evaluation
2. Early Done

Lookahead Pipelines: Strategy #1
Use non-neighbor communication:
- stage receives information from multiple later stages
- allows "early evaluation"
Benefit: the stage gets a head start on its next cycle

Lookahead Pipelines: Strategy #2
Use early completion detection:
- completion detector moved before the stage (not after)
- stage indicates "early done" in parallel with computation
[Figure: pipeline stage with an early completion detector]
Benefit: again, the stage gets a head start on its next cycle

Single-Rail Styles
Adapt dual-rail styles to single-rail:
- replace dual-rail function blocks by single-rail blocks
- replace completion detectors by matched delays
[Figure: single-rail stage; "request" passes through a matched delay to produce "done", alongside data bits labeled bit 1, bit n, bit m]
- request/done indicate valid data
Example: LPsr 2/2
[Figure: LPsr 2/2 stage with matched delay]

Single-Rail Styles (contd.)
Example: LPsr 2/1
[Figure: LPsr 2/1 stage with matched delay]
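
The matched-delay idea from the single-rail slides can be sketched behaviourally: "done" is simply "request" delayed by an amount chosen to be no shorter than the slowest data path through the function block. The function names and the delay values below are hypothetical, picked only for illustration; they are not taken from the slides.

```python
from typing import Sequence


def done_time(request_time: float, matched_delay: float) -> float:
    """Time at which 'done' rises: the request, delayed by the matched delay."""
    return request_time + matched_delay


def is_matched_delay_safe(matched_delay: float, path_delays: Sequence[float]) -> bool:
    """True if 'done' cannot arrive before the slowest data path has settled."""
    return matched_delay >= max(path_delays)


# Hypothetical per-bit delays of a function block, in arbitrary time units.
PATH_DELAYS = [0.8, 1.1, 0.9]

if __name__ == "__main__":
    print(is_matched_delay_safe(1.2, PATH_DELAYS))  # delay covers the slowest path
    print(is_matched_delay_safe(1.0, PATH_DELAYS))  # delay is too short
```

In a real design the margin on the matched delay trades robustness against cycle time; the sketch only captures the safety condition.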

High-Capacity Pipelines
Singh/Nowick: WVLSI-00, ISSCC-02, Async-02

Recent Approaches
3 novel styles for high-speed async pipelining:
- "Lookahead Pipelines" (LP) [Singh/Nowick, Async-00]
- "High-Capacity Pipelines" (HC) [Singh/Nowick, WVLSI-00]
- MOUSETRAP Pipelines [Singh/Nowick, TAU-00]
Goal: significantly improve the throughput of PS0
Two distinct strategies:
- LP: introduce protocol optimizations
  - "shave off" components from the critical cycle
- HC: fundamentally new protocol
  - greater concurrency: "loosely-coupled" stages

High-Capacity Pipeline: HC
Key idea: decouple control for pull-up and pull-down
- increases pipeline concurrency
- once N+1 evaluates, it can enter the "isolate" (hold) phase
  - stage N is then allowed to complete its entire next cycle
[Figure: stages N, N+1, N+2 with stage controller, pc, eval, ack and delay; annotation: "initiates next cycle early"]

Inside an HC stage
Decoupled control: the pull-up and pull-down stacks are independently controllable.
[Figure: dynamic gate with precharge control (pc), evaluation control (eval), "keeper", data inputs, pull-down stack, data outputs]
- pc asserted: precharge
- eval asserted: evaluate
- both de-asserted: enter "isolate" (hold) phase
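
The three control combinations listed above can be written down as a small decoder. This is a minimal sketch that works on logical assertion (True means "asserted") and deliberately ignores the electrical polarity of pc and eval. The names `Phase` and `stage_phase` are hypothetical, and treating "both asserted" as an error is an assumption based on the usual rule that the pull-up and pull-down stacks of a dynamic gate should not conduct at the same time; the slide itself does not mention that case.

```python
from enum import Enum


class Phase(Enum):
    PRECHARGE = "precharge"
    EVALUATE = "evaluate"
    ISOLATE = "isolate"


def stage_phase(pc_asserted: bool, eval_asserted: bool) -> Phase:
    """Map the two decoupled control inputs to the phase of an HC stage."""
    if pc_asserted and eval_asserted:
        # Assumption: enabling both stacks at once is never intended.
        raise ValueError("pc and eval must not be asserted together")
    if pc_asserted:
        return Phase.PRECHARGE
    if eval_asserted:
        return Phase.EVALUATE
    return Phase.ISOLATE  # both de-asserted: outputs are held


if __name__ == "__main__":
    for pc, ev in [(True, False), (False, True), (False, False)]:
        print(pc, ev, stage_phase(pc, ev).value)
```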

Cycle of an LPHC Stage
[Figure: three-phase cycle of a stage: Eval, Isolate, Precharge, annotated with pc/eval values (extracted as "pc=1 eval=0", "pc=0 eval=0", "pc=1 eval=1")]
- Only a single backward synchronization arc:
  - once stage N+1 has completed Eval, N can perform its entire next cycle
  - why safe? N+1 enters the isolate phase; this is the key to greater concurrency
  - almost all existing approaches require 2 arcs
- One (natural) forward synchronization arc:
  - stage N+1 evaluates new data only after N has evaluated

Formal Specification of Controller
[Figure: signal transition graph of the stage controller. Transitions: pc+, eval+ (start evaluate); S+ (evaluate complete); eval- (isolate); pc- (start precharge); S- (precharge complete); T+ (evaluate of N+1 complete); T- (precharge of N+1 complete)]
Problem: the specification is too concurrent for direct synthesis
- desired precharge condition: N and N+1 have evaluated the same data
- problem: this condition is not uniquely captured by the given signals
  - N may evaluate the next data item while N+1 is stuck on the current item

Modified Specification of Controller
Solution: add a state variable ok2pc
[Figure: modified signal transition graph with transitions pc+, eval+, S+, eval-, pc-, S-, T+ (evaluate of N+1 complete), T- (precharge of N+1 complete), ok2pc+, ok2pc-]
ok2pc records whether N+1 has "absorbed" N's data item:
- ok2pc resets immediately when N deletes the item (N precharges)
- ok2pc is set when N+1 deletes the item (N+1 precharges)
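
The role of ok2pc can be illustrated with a small behavioural model of stage N's precharge decision. This is a sketch under stated assumptions, not the controller from the slides: the class `PrechargeGuard`, its method names, the initial value of ok2pc, and the guard "S and T and ok2pc" are one plausible reading of the set/reset rules above. Here S means "N has evaluated" and T means "N+1 has evaluated".

```python
class PrechargeGuard:
    """Assumed behavioural model of when stage N may precharge."""

    def __init__(self) -> None:
        self.S = False      # stage N has completed evaluation
        self.T = False      # stage N+1 has completed evaluation
        self.ok2pc = True   # assumed initial value: N+1 starts empty

    def n_evaluates(self) -> None:
        self.S = True

    def n_precharges(self) -> None:
        self.S = False
        self.ok2pc = False  # reset as soon as N deletes its item

    def next_evaluates(self) -> None:
        self.T = True

    def next_precharges(self) -> None:
        self.T = False
        self.ok2pc = True   # set once N+1 deletes its item

    def may_precharge(self, use_ok2pc: bool = True) -> bool:
        naive = self.S and self.T  # ambiguous: T may refer to an older item
        return naive and (self.ok2pc or not use_ok2pc)


def hazard_scenario():
    """N races ahead to item 2 while N+1 is still holding item 1."""
    g = PrechargeGuard()
    g.n_evaluates()        # N evaluates item 1
    g.next_evaluates()     # N+1 evaluates item 1
    first = g.may_precharge()
    g.n_precharges()       # N deletes item 1
    g.n_evaluates()        # N evaluates item 2; N+1 still stuck on item 1
    naive = g.may_precharge(use_ok2pc=False)
    guarded = g.may_precharge()
    g.next_precharges()    # N+1 deletes item 1
    g.next_evaluates()     # N+1 evaluates item 2
    final = g.may_precharge()
    return first, naive, guarded, final


if __name__ == "__main__":
    print(hazard_scenario())
```

In this model the S-and-T condition alone would let N precharge item 2 before N+1 has consumed it, while the ok2pc term blocks that until N+1 has gone through its own precharge and re-evaluated.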

Controller implementation
[Figure: gate-level controller; labels: S, T, aC (with a "+" input), NAND3, INV, and signals pc, ok2pc, eval]
The controller implementation is very simple:
- each signal is implemented using a single gate
- ok2pc is typically off the critical path
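
Performance
[Figure: timing diagram of the critical cycle across stages N, N+1, N+2, with numbered events: N evaluates, N+1 evaluates, N isolates, N precharges, N enables itself for next evaluation. The cycle-time formula following "Cycle Time =" was an image and is not preserved.]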

Ripple-Carry Adder: One Stage
Mixed dual-rail/single-rail datapath:
- single-rail: sum
- dual-rail: A, B, Carry-in and Carry-out
  - must implement binate functions using unate dynamic logic
[Figure: full-adder stage with inputs A (a1, a0), B (b1, b0), Carry-in (cin1, cin0), request inputs reqab and reqc, and outputs Carry-out (cout1, cout0), sum, done]
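
To see why dual-rail inputs let a binate function such as carry or sum be built from unate (inversion-free) logic, the stage can be modelled with AND/OR terms only. This is a logic-level sketch, not the transistor-level design from the slides: the helper names `encode`, `full_adder_stage` and `ripple_add` are hypothetical, and the handshake signals (reqab, reqc, done) are left out.

```python
def encode(bit):
    """Dual-rail encoding of a valid bit as (x1, x0): exactly one rail is 1."""
    return (1, 0) if bit else (0, 1)


def full_adder_stage(a, b, cin):
    """One stage: dual-rail A, B, carry-in; single-rail sum, dual-rail carry-out.

    Only AND and OR are used; no signal is ever inverted.
    """
    a1, a0 = a
    b1, b0 = b
    c1, c0 = cin
    cout1 = (a1 & b1) | (a1 & c1) | (b1 & c1)   # at least two inputs are 1
    cout0 = (a0 & b0) | (a0 & c0) | (b0 & c0)   # at least two inputs are 0
    # Sum is 1 when an odd number of inputs is 1.
    s = (a1 & b0 & c0) | (a0 & b1 & c0) | (a0 & b0 & c1) | (a1 & b1 & c1)
    return s, (cout1, cout0)


def ripple_add(x, y, width=32):
    """Chain the stages from the least significant bit upward."""
    carry = encode(0)
    total = 0
    for i in range(width):
        s, carry = full_adder_stage(encode((x >> i) & 1), encode((y >> i) & 1), carry)
        total |= s << i
    return total, carry[0]  # carry[0] is the "1" rail of the final carry-out


if __name__ == "__main__":
    print(ripple_add(5, 7))
```

The "0" rails carry the complemented information that a single-rail unate gate could not produce, which is why A, B and the carries are dual-rail while the sum, which no later stage needs in complemented form, can stay single-rail.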

Final Adder Architecture
[Figure: ripple-carry adder from least significant to most significant stage, with carry in and carry out]
- shift registers provide operand bits A, B
- shift registers accumulate sum bits

Results
Designed/simulated the adder in each pipeline style
Experimental setup:
- design: 32-bit ripple-carry adder
- technology: 0.6 μm HP CMOS, at 3.3 V and 300 K
New LPHC style: 10% faster than LPsr 2/1

Conclusions
Introduced 2 new asynchronous adders:
- Use novel pipeline protocols:
  - observe events from multiple later stages
  - decouple control of pull-up/pull-down
- Especially suitable for fine-grain (gate-level) pipelining
- Very high throughputs obtained:
  - 0.93-1.02 GHz in 0.6 μm
  - expected to outperform the best (IPCMOS: 3.3-4.5 GHz in 0.18 μm)
- LPHC doubles the typical storage capacity
- Robustly handle arbitrary-speed environments
  - useful as IPs
Future work: layout/fabrication, application to DSPs