Clockless Logic
• Recap: Lookahead Pipelines
• High-Capacity Pipelines

Recap: Lookahead Pipeline Styles
2 Strategies:
1. Early Evaluation
2. Early Done

Lookahead Pipelines: Strategy #1
Use non-neighbor communication:
• a stage receives information from multiple later stages
• this allows “early evaluation”
Benefit: the stage gets a head start on the next cycle

Lookahead Pipelines: Strategy #2
Use early completion detection:
• the completion detector is moved before the stage (not after it)
• the stage indicates “early done” in parallel with its computation
[Figure: stage with an early completion detector placed in front of it]
Benefit: again, the stage gets a head start on the next cycle

Single-Rail Styles
Adapt the dual-rail styles to single-rail:
• replace dual-rail function blocks by single-rail blocks
• replace completion detectors by matched delays
[Figure: single-rail function block with data bits labeled bit 1, bit n, bit m; the request passes through a matched delay to become done; request/done indicate valid data]
Example: LPSR2/2
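
Replacing completion detectors by matched delays rests on a bundled-data timing assumption: "done" must not arrive before every single-rail data bit has settled. A minimal sketch of that check follows, assuming the worst-case delay of each bit of the function block is known; the function names, the optional safety margin and all delay values are hypothetical.

```python
# Bundled-data (single-rail) timing rule, as a sketch: the matched delay on
# the request line must be at least as slow as the slowest data bit, so that
# "done" never signals valid data too early. Delays are hypothetical, in ps.


def matched_delay_is_safe(bit_delays_ps, matched_delay_ps, margin_ps=0.0):
    """True if request->done (through the matched delay) cannot overtake
    any data bit, allowing for an optional safety margin."""
    worst_bit = max(bit_delays_ps)
    return matched_delay_ps >= worst_bit + margin_ps


def done_time(req_time_ps, matched_delay_ps):
    """Time at which done is signalled: request time plus the matched delay."""
    return req_time_ps + matched_delay_ps


if __name__ == "__main__":
    bits = [120.0, 135.0, 128.0]  # hypothetical per-bit logic delays
    print(matched_delay_is_safe(bits, 150.0, margin_ps=10.0))  # True
    print(matched_delay_is_safe(bits, 140.0, margin_ps=10.0))  # False
    print(done_time(0.0, 150.0))  # 150.0
```

The margin parameter reflects the usual practice of padding a matched delay against process and environmental variation; how much margin a real design uses is not stated in the slides.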

Single-Rail Styles (contd.)
Example: LPSR2/1
[Figure: LPSR2/1 stage with matched delay]

High-Capacity Pipelines
Singh/Nowick, WVLSI-00, ISSCC-02, Async-02

Recent Approaches
3 novel styles for high-speed async pipelining:
• “Lookahead Pipelines” (LP) [Singh/Nowick, Async-00]
• “High-Capacity Pipelines” (HC) [Singh/Nowick, WVLSI-00]
• MOUSETRAP Pipelines [Singh/Nowick, TAU-00]
Goal: significantly improve the throughput of PS0
Two Distinct Strategies:
• LP: introduce protocol optimizations
  ➢ “shave off” components from the critical cycle
• HC: a fundamentally new protocol
  ➢ greater concurrency: “loosely-coupled” stages

High-Capacity Pipeline: HC
Key Idea: decouple the control of pull-up and pull-down
• increases pipeline concurrency (stage N initiates its next cycle early)
• once N+1 evaluates, it can enter an “isolate (hold) phase”
  ➢ stage N is allowed to complete its entire next cycle!
[Figure: pipeline stages N, N+1, N+2 with stage controllers; labels pc, eval, ack, delay]

Inside an HC stage
Decoupled control: the pull-up and pull-down stacks are independently controllable:
[Figure: dynamic gate with a precharge control input (pc), an evaluation control input (eval) on the pull-down stack, data inputs, data outputs, and a “keeper”]
• pc asserted: precharge
• eval asserted: evaluate
• both de-asserted: enter the “isolate” (hold) phase
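
The three control modes listed above can be captured as a small decode table. This sketch assumes pc is active-low (a low pc turns on the precharge device) and eval is active-high, and it treats asserting both as illegal; the names `Phase`, `NEXT_PHASE`, `phase_from_asserted` and `phase_from_levels` are hypothetical.

```python
from enum import Enum


class Phase(Enum):
    PRECHARGE = "precharge"
    EVALUATE = "evaluate"
    ISOLATE = "isolate (hold)"


# Order in which an HC stage visits its phases: Eval -> Isolate -> Precharge.
NEXT_PHASE = {
    Phase.EVALUATE: Phase.ISOLATE,
    Phase.ISOLATE: Phase.PRECHARGE,
    Phase.PRECHARGE: Phase.EVALUATE,
}


def phase_from_asserted(pc_asserted: bool, eval_asserted: bool) -> Phase:
    """Decode the stage's phase from which control inputs are asserted."""
    if pc_asserted and eval_asserted:
        # Assumption: enabling the pull-up (precharge) and the pull-down
        # (evaluate) paths together could short the output node, so this
        # sketch rejects the combination instead of assigning it a phase.
        raise ValueError("pc and eval must not be asserted at the same time")
    if pc_asserted:
        return Phase.PRECHARGE
    if eval_asserted:
        return Phase.EVALUATE
    return Phase.ISOLATE  # both de-asserted: the keeper holds the output


def phase_from_levels(pc: int, eval_level: int) -> Phase:
    """Decode from logic levels, assuming pc is active-low, eval active-high."""
    return phase_from_asserted(pc_asserted=(pc == 0),
                               eval_asserted=(eval_level == 1))


if __name__ == "__main__":
    for pc, ev in [(1, 1), (1, 0), (0, 0)]:
        print(pc, ev, phase_from_levels(pc, ev).value)
```

Under these assumptions, evaluate is pc=1/eval=1, isolate is pc=1/eval=0, and precharge is pc=0/eval=0.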

Cycle of an LPHC Stage
[Figure: phase cycle of stage N+1: Eval (pc=1, eval=1) → Isolate (pc=1, eval=0) → Precharge (pc=0, eval=0) → Eval]
• Only a single backward synchronization arc:
  ➢ once stage N+1 has completed Eval, N can perform its entire next cycle!
  ➢ why is this safe? N+1 enters the isolate phase … key to greater concurrency
  ➢ almost all existing approaches require 2 arcs
• One (natural) forward synchronization arc:
  ➢ stage N+1 evaluates new data only after N has evaluated

Formal Specification of Controller
Signal transitions in the specification of stage N's controller:
• pc+, eval+ : start evaluate
• S+ : evaluate complete
• eval- : isolate
• pc- : start precharge
• S- : precharge complete
• T+ : evaluate of N+1 complete
• T- : precharge of N+1 complete
Problem: the specification is too concurrent for direct synthesis
• desired precharge condition: N and N+1 have evaluated the same data
• problem: this condition is not uniquely captured by the given signals!
  ➢ N may evaluate the next data item while N+1 is stuck on the current item!

Modified Specification of Controller
Solution: add a state variable ok2pc
[Figure: modified specification with transitions pc+, eval+, S+, eval-, T+ (evaluate of N+1 complete), pc-, S-, T- (precharge of N+1 complete), ok2pc+, ok2pc-]
ok2pc records whether N+1 has “absorbed” N’s data item:
  ➢ ok2pc resets immediately when N deletes the item (N precharges)
  ➢ ok2pc is set when N+1 deletes the item (N+1 precharges)
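
The role of ok2pc can be checked with an event-level model of stage N's controller. This is a sketch under stated assumptions: the pipeline starts empty (so ok2pc starts at 1), precharge start (pc-) and completion (S-) are collapsed into one event, and the class and method names are hypothetical. The scenario reproduces the problem from the formal specification: N evaluates a new item while N+1 is still holding the old one, so S and T are both high even though N+1 has not absorbed N's new item.

```python
class HCControllerState:
    """Event-level model of stage N's controller (hypothetical names).

    S: stage N's completion signal (1 = evaluated, 0 = precharged)
    T: stage N+1's completion signal
    ok2pc: 1 once N+1 has absorbed N's current data item
    """

    def __init__(self) -> None:
        self.S = 0
        self.T = 0
        self.ok2pc = 1  # assumption: pipeline starts empty, N+1 is ready

    def naive_precharge_ok(self) -> bool:
        # Original (too concurrent) condition: looks only at S and T.
        return self.S == 1 and self.T == 1

    def precharge_ok(self) -> bool:
        # Modified condition: N+1 must also have absorbed this item.
        return self.S == 1 and self.T == 1 and self.ok2pc == 1

    def n_evaluates(self) -> None:  # S+
        self.S = 1

    def n_precharges(self) -> None:  # pc- then S-, collapsed (assumption)
        assert self.precharge_ok(), "N would delete an item N+1 has not absorbed"
        self.S = 0
        self.ok2pc = 0  # reset when N deletes its item

    def n1_evaluates(self) -> None:  # T+
        self.T = 1

    def n1_precharges(self) -> None:  # T-
        self.T = 0
        self.ok2pc = 1  # set when N+1 deletes its item


def stuck_successor_scenario():
    """Return (naive, modified) precharge decisions when N holds item 1
    while N+1 is still stuck on item 0."""
    c = HCControllerState()
    c.n_evaluates()   # N evaluates item 0
    c.n1_evaluates()  # N+1 evaluates item 0
    c.n_precharges()  # N deletes item 0 (safe: N+1 absorbed it)
    c.n_evaluates()   # N evaluates item 1; N+1 still holds item 0
    return c.naive_precharge_ok(), c.precharge_ok()


if __name__ == "__main__":
    print(stuck_successor_scenario())  # (True, False)
```

The naive S-and-T condition would let N precharge and lose item 1; with ok2pc, N waits until N+1 has precharged and then evaluated item 1.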

Controller implementation
[Figure: controller built from an asymmetric C-element (aC+), a NAND3 and an inverter (INV); signals S, T, ok2pc, pc, eval]
Controller implementation is very simple:
• each signal is implemented using a single gate
• ok2pc is typically off the critical path

Performance
[Figure: event sequence across stages N, N+1, N+2: (1) N evaluates; (2) N+1 evaluates and N isolates; (3) N precharges; N then enables itself for the next evaluation]
Cycle Time =

Ripple-Carry Adder: One Stage
Mixed Dual-Rail/Single-Rail Datapath:
• single-rail: sum
• dual-rail: A, B, Carry-in and Carry-out
  ➢ must implement binate functions (e.g., the XOR in sum) using unate dynamic logic, so both polarities of each input are needed
[Figure: Full-Adder Stage; signals A, B (a1, a0, b1, b0), reqab, Carry-in (cin1, cin0), reqc, done, Carry-out (cout1, cout0), sum]
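
Why dual-rail inputs make unate logic sufficient can be shown with a Boolean model: once both rails of every input are available, the carry-out rails and even the binate sum become AND/OR-only (monotonic) functions, which is what non-inverting dynamic gates can compute. The sketch below is an illustrative logic model, not the transistor-level stage from the slide; the (rail1, rail0) encoding and the function names are assumptions.

```python
from itertools import product


def dual_rail(bit: int) -> tuple:
    """Encode a bit as (rail1, rail0): 1 -> (1, 0), 0 -> (0, 1)."""
    return (1, 0) if bit else (0, 1)


def full_adder_dual_rail(a, b, cin):
    """Unate (AND/OR only) full adder on dual-rail inputs.

    a, b, cin are (rail1, rail0) pairs. Returns ((cout1, cout0), s) with a
    dual-rail carry-out and a single-rail sum, mirroring the mixed datapath.
    No inversions are used, so every output is monotonic in the input rails.
    """
    a1, a0 = a
    b1, b0 = b
    c1, c0 = cin
    cout1 = (a1 & b1) | (a1 & c1) | (b1 & c1)  # majority of the 1-rails
    cout0 = (a0 & b0) | (a0 & c0) | (b0 & c0)  # majority of the 0-rails
    # The binate XOR becomes a unate sum of products over the dual rails:
    s = (a1 & b0 & c0) | (a0 & b1 & c0) | (a0 & b0 & c1) | (a1 & b1 & c1)
    return (cout1, cout0), s


if __name__ == "__main__":
    for a, b, c in product((0, 1), repeat=3):
        (c1, c0), s = full_adder_dual_rail(dual_rail(a), dual_rail(b), dual_rail(c))
        print(a, b, c, "->", "sum", s, "cout", c1, "(rails", c1, c0, ")")
```

A side effect of the unate form: when all input rails are low (the precharged, "no data" state), every output stays low, so the stage cannot produce a spurious result before valid data arrives.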

Final Adder Architecture
[Figure: chain of full-adder stages from the least significant stage (carry in) to the most significant stage (carry out); shift-registers provide the operand bits A, B; shift-registers accumulate the sum bits]
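
When every bit position is its own pipeline stage, bit i of an operand pair has to reach stage i just as the carry for that same pair arrives from stage i-1; this skewing (and the de-skewing of the sum bits) is one way to read the role of the input and output shift registers. The sketch below assumes that reading and abstracts the asynchronous handshakes into lock-step discrete steps, so it models dataflow, not timing; `pipelined_ripple_add` and its parameters are hypothetical.

```python
def pipelined_ripple_add(pairs, width, carry_in=0):
    """Bit-level pipelined ripple-carry adder with skewed operands.

    pairs: list of (a, b) unsigned operands; width: number of bit stages.
    At step t, stage i works on operand pair k = t - i, so several pairs
    are in flight at once. Returns (sum mod 2**width, carry_out) per pair.
    """
    n = len(pairs)
    sums = [0] * n
    couts = [0] * n
    carry_reg = [0] * width  # carry_reg[i]: carry stage i produced last step
    for t in range(n + width - 1):
        new_carry = [0] * width
        for i in range(width):
            k = t - i  # skew: stage i handles pair k at step k + i
            if not 0 <= k < n:
                continue  # stage i holds no data at this step
            a, b = pairs[k]
            cin = carry_reg[i - 1] if i > 0 else carry_in
            total = ((a >> i) & 1) + ((b >> i) & 1) + cin
            sums[k] |= (total & 1) << i  # output register collects bit i
            new_carry[i] = total >> 1
            if i == width - 1:
                couts[k] = new_carry[i]  # carry out of the MSB stage
        carry_reg = new_carry
    return list(zip(sums, couts))


if __name__ == "__main__":
    print(pipelined_ripple_add([(5, 3), (255, 1), (200, 100)], width=8))
```

Stage i reads the carry that stage i-1 produced one step earlier, which belongs to the same operand pair because of the one-step skew per bit; without the skew, the carry would pair up with the wrong operands.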

Results
Designed/simulated the adder in each pipeline style
Experimental Setup:
• design: 32-bit ripple-carry adder
• technology: 0.6 µm HP CMOS, at 3.3 V and 300 K
New LPHC style: 10% faster than LPSR2/1

Conclusions
Introduced 2 new asynchronous adders:
• Use novel pipeline protocols:
  ➢ observe events from multiple later stages
  ➢ decouple control of pull-up/pull-down
• Especially suitable for fine-grain (gate-level) pipelining
• Very high throughputs obtained:
  ➢ 0.93 to 1.02 GHz in 0.6 µm
  ➢ expected to outperform the best (IPCMOS: 3.3 to 4.5 GHz in 0.18 µm)
• LPHC doubles the typical storage capacity
• Robustly handle arbitrary-speed environments
  ➢ useful as IPs
Future Work: layout/fabrication, application to DSPs
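
Since a pipeline that delivers one result per cycle has a throughput equal to the reciprocal of its cycle time, the reported throughputs correspond to cycle times of roughly one nanosecond:

```latex
T_{\text{cycle}} = \frac{1}{f}:\qquad
\frac{1}{0.93\ \text{GHz}} \approx 1.08\ \text{ns},\qquad
\frac{1}{1.02\ \text{GHz}} \approx 0.98\ \text{ns}
```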