Speed and Power Tradeoffs Applied to Adder Design
- Slides: 77
Speed and Power Trade-offs Applied to Adder Design • Vojin G. Oklobdzija, Ram Krishnamurthy • ACSEL Laboratory, University of California, Davis / Intel AMR, Intel Corp. • www.ece.ucdavis.edu/acsel • From: Tutorial Presentation, 16th International Symposium on Computer Arithmetic, Santiago de Compostela, Spain, June 18, 2003
Issues to be addressed • How do we compare different topologies for their efficiency? • How do we estimate the speed and efficiency of our algorithm? • What criteria should we use when developing a new algorithm? • How does power enter into this equation?
Additional Issues • Determine which topology is best for a given power or delay budget • Determine which topology can stretch the furthest in terms of speed or power
Metric
Previously used estimates • Counting the number of gates (logic levels): not accurate
Critical path in Motorola's 64-bit CLA
Motorola's 64-bit CLA • Modified PG block • Intermediate propagate signals Pi:0 are generated to speed up C3
Fan-In and Fan-Out Dependency (Oklobdzija, Barnes: IBM 1985)
Delay Comparison: Variable Block Adder (Oklobdzija, Barnes: IBM 1985) [plot: delay vs. complexity]
Design Objective • Design takes time: – finding results afterward is not of much value • There is a disconnect between the measures used in computer arithmetic when developing an algorithm and what is obtained after implementation – we want estimates as close as possible to the measured results • A simple tool that can evaluate different design trade-offs for a given technology is needed • Power trade-off is the most important – speed and power are tradable
Logical Effort Theory • “Back of the Envelope” complexity: good for estimating speed • Gate delay = linear function of load – Slope: logical effort (gate driving characteristics) – Intercept: parasitic delay (gate internal load) • “Logical Effort” accuracy is not sufficient – We needed to extend and refine the method – However, that becomes more than “Back of the Envelope” • Logical Effort does not account for possible power-delay trade-offs
Logical Effort Theory • Excel – a platform of choice (ARITH-16) – Simple enough – Can provide computation quickly – Easy to enter a given design • Technology characterization is needed: – This needs to be done only once: available for every design afterwards – Domino gate = 2 stages: dynamic and static • Different driving characteristics of these stages • Multi-output gate (carry-look-ahead, Ling/conditional sum) • Energy model needs to be included
Energy Motivation (*courtesy of Intel Corp.) [figure: processor thermal map, temperature in °C; cache and execution core regions, execution core at 120 °C] • AGUs: performance and peak-current limiters • High activity: thermal hotspot • Goal: energy-efficient high-performance design
Critical Paths of Representative 64-bit Adders
Kogge-Stone Adder [figure: prefix tree over bits 31 to 0 with PG, carry-merge, and XOR gates] • Critical path = PG + 5 + XOR = 7 gate stages • Generate, propagate fanout of 2, 3 • Maximum interconnect spans 16 b • Energy inefficient
Sparse-tree Adder Architecture • Generate every 4th carry in parallel • Side-path: 4-bit conditional sum generator • 73% fewer carry-merge gates • Energy-efficient
Kogge-Stone adder (8-stage) • $D = 8\,(GBH)^{1/8} \times 2.2 + 3.8\,P$
MXA2 – Architecture & Result • Multiplexer-based • Generate carries using radix-2 (P, G) • 4-bit conditional sum selected by carries • 4-b cell width = 17 µm • 9-stage critical path – Per-stage effort = 3.7 – Total effort delay = 33.3 – Total parasitic = 22.5 – Total delay = 55.8
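The MXA2 figures above follow the usual logical-effort bookkeeping: total effort delay is the stage count times the per-stage effort, and total delay adds the path parasitic. A minimal sketch that re-derives the slide's numbers; the function name `path_delay` is chosen here for illustration and is not part of the original tutorial.

```python
def path_delay(num_stages, stage_effort, total_parasitic):
    """Return (effort delay, total delay) for a path with equal stage efforts.

    All quantities are in the same normalized delay units used on the slide.
    """
    effort_delay = num_stages * stage_effort
    return effort_delay, effort_delay + total_parasitic


# MXA2 values taken from the slide: 9 stages, per-stage effort 3.7,
# total parasitic 22.5.
effort, total = path_delay(9, 3.7, 22.5)
print(round(effort, 1), round(total, 1))
```

With the slide's inputs this reproduces the quoted 33.3 effort delay and 55.8 total delay.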
HC2 – Architecture • Generate even carries using radix-2 (P, G) • Generate odd carries from even carries • CMOS adder for sum • 1-b cell width 4 µm • 10-stage critical path
HC2 – Circuits & Results
KS2 – Architecture & Results • Generate carries using radix-2 (P, G) • CMOS adder for sum • Similar circuits to HC2 • 1-b cell width 4 µm • 9-stage critical path
KS4 – Architecture • Generate carries using redundant radix-4 (P, G) • Dynamic circuit • 1-b cell width 4 µm • 6-stage critical path
KS4 – Circuits & Result
CLA4 – Architecture • Generate carries using radix-4 (P, G, C) • 1-b cell width 4 µm • 15-stage critical path [figure labels: P-path, G-path, (P, G, C) network]
CLA4 – Circuits & Result
LNG4 – Architecture • Generate carries using Ling pseudo-carries • Conditional sums selected by local & long carries • 1-b cell width 5.1 µm; 9-stage critical path
LNG4 – Circuits & Result
Results from Simulation • Fairly consistent with logical effort analysis • Per-stage delay – 1.4 FO4 (static) – 0.8 FO4 (dynamic)
Delay of Representative 64-b Adders
What happens when power is considered?
What happens when power is considered? • Must look at the Energy-Delay space of designs
Energy-Delay Space [plot: energy vs. delay for different adders; labels: speed barrier, power limit, Emin, Dmin]
Logical Effort in Energy-Delay Space • Most design approaches focus here • Is it possible to lower energy by trading delay? or …
Logical Effort
Delay in a Logic Gate (*from Mathew Sanu / D. Harris) • Delay of a logic gate has two components: d = f + p, where f is the effort delay (stage effort) and p is the parasitic delay • f = gh, where g is the logical effort and h = Cout/Cin is the electrical effort (also called “fanout”) • Logical effort describes the relative ability of a gate topology to deliver current (defined to be 1 for an inverter) • Electrical effort is the ratio of output to input capacitance
Logical Effort Parameters: Inverter (*from Mathew Sanu / D. Harris) [plot: delay vs. fanout h = Cout/Cin; slope = logical effort (g = 2.2), intercept = parasitic delay (p = 3.8 ps); d = gh + p] • d = gh + p • Delay increases linearly with fanout • More complex gates have greater g and p
Normalized Logical Effort: Inverter (*from Mathew Sanu / D. Harris) [plot: normalized delay d vs. fanout h = Cout/Cin for an inverter; g = 1, p = 1, d = gh + p = h + 1; effort delay and parasitic delay marked] • Define delay of unloaded inverter = 1 • Define logical effort ‘g’ of inverter = 1 • Delay of complex gates can be defined w.r.t. d = 1
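The linear model d = gh + p is simple enough to evaluate directly. A minimal sketch, assuming the normalized inverter from this slide (g = 1, p = 1); the function name `gate_delay` is illustrative only.

```python
def gate_delay(g, h, p):
    """Normalized gate delay d = g*h + p.

    g: logical effort, h: electrical effort (Cout/Cin), p: parasitic delay.
    """
    return g * h + p


# Normalized inverter (g = 1, p = 1): d = h + 1.
for fanout in range(1, 6):
    print(fanout, gate_delay(1, fanout, 1))
```

For a fanout of 4 the normalized inverter delay is 5 units, which is the usual way to relate these normalized units to the FO4 delay.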
Computing Logical Effort (*from Mathew Sanu / D. Harris) • DEF: Logical effort is the ratio of the input capacitance of a gate to the input capacitance of an inverter delivering the same output current • Measured from delay vs. fanout plots of simulated gates • Or estimated, counting capacitance in units of transistor W
L.E. for Adder Gates (*from Mathew Sanu / D. Harris) • Logical effort parameters obtained from simulation for std cells • Define logical effort ‘g’ of inverter = 1 • Delay of complex gates can be defined w.r.t. d = 1
Normalized L.E. (*from Mathew Sanu) • Logical effort & parasitic delay normalized to that of inverter
Gate type      Logical Eff. (g)   Parasitics (Pinv)
Inverter       1                  1
Dyn. NAND      0.6                1.34
Dyn. CM        0.6                1.62
Dyn. CM-4N     1                  3.71
Static CM      1.48               2.53
Mux            1.68               2.93
XOR            1.69               2.97
Delay of a string of gates (*from Mathew Sanu / D. Harris) • Delay of a path: $D = \sum_i d_i = \sum_i g_i h_i + \sum_i p_i$ • $g_i$ and $p_i$ are constants • To minimize path delay, optimal values of $h_i$ are to be determined • D is minimized when each stage bears the same effort, i.e. $g_i h_i = g_{i+1} h_{i+1}$
Minimizing path delay (*from Mathew Sanu / D. Harris) • Logical effort of a string of gates: $G = \prod_i g_i$ • Path electrical effort: $H = C_{out(path)} / C_{in(path)}$ • Branching effort: $b = (C_{on\text{-}path} + C_{off\text{-}path}) / C_{on\text{-}path}$ • Path branching effort: $B = \prod_i b_i$ • Path effort: $F = GBH$ • Delay is minimized when each stage bears the same effort: $f = g_i h_i = F^{1/N}$ • The minimum delay of an N-stage path is: $D = N F^{1/N} + P$
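The path-effort formulas translate into a few lines of code. The sketch below is an illustration only: the function name `min_path_delay` and the three-inverter example are assumptions made here, not taken from the tutorial's Excel tool.

```python
import math


def min_path_delay(g, b, H, p):
    """Minimum delay of an N-stage path under the logical effort model.

    g: per-stage logical efforts, b: per-stage branching efforts,
    H: path electrical effort (Cout/Cin), p: per-stage parasitic delays.
    Returns (optimal stage effort f, minimum path delay D).
    """
    N = len(g)
    G = math.prod(g)          # path logical effort
    B = math.prod(b)          # path branching effort
    F = G * B * H             # path effort
    f = F ** (1.0 / N)        # equal effort per stage
    D = N * f + sum(p)        # D = N * F^(1/N) + P
    return f, D


# Hypothetical example: three normalized inverters (g = 1, p = 1),
# no branching, driving a load 64 times the input capacitance.
f, D = min_path_delay([1, 1, 1], [1, 1, 1], 64, [1, 1, 1])
print(f, D)
```

In this example the stage effort comes out at about 4 and the path delay at about 15 normalized units. `math.prod` requires Python 3.8 or later.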
Inclusion of Wire Delay into Logical Effort
Wiring Load • Wiring in hand analysis – Only lumped capacitance included • Wiring in HSPICE – Short wire: 1-segment π-model RC network – Long wire: 4-segment π-model RC network – Using worst-case wire capacitance • Wire length – Estimated from most critical 1-bit pitch
Modeling interconnect cap. • Include interconnect cap in branching factor • Without wire: $b = (C_{on\text{-}path} + C_{off\text{-}path}) / C_{on\text{-}path} = 2$ • With wire: $b = (C_{on\text{-}path} + C_{off\text{-}path} + C_{int}) / C_{on\text{-}path} = 2 + I$ • $I = C_{int} / C_{on\text{-}path}$: ratio of interconnect cap to gate cap in 1 adder bit pitch [figure: PG cell driving on-path and off-path CM0 gates across one adder bit pitch, with wire capacitance Cint]
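Folding the wire into the branching factor amounts to adding its capacitance to the off-path load. A minimal sketch, assuming equal on-path and off-path gate capacitance as on the slide; the function name and the capacitance values are hypothetical.

```python
def branching_effort(c_on_path, c_off_path, c_int=0.0):
    """Branching effort b = (Con + Coff + Cint) / Con.

    c_int is the lumped interconnect capacitance; with c_int = 0 this is
    the plain logical-effort branching factor.
    """
    return (c_on_path + c_off_path + c_int) / c_on_path


# Hypothetical capacitances in arbitrary units: equal on-path and off-path
# gate loads, plus a wire worth 50% of the on-path gate capacitance.
print(branching_effort(1.0, 1.0))        # no wire: b = 2
print(branching_effort(1.0, 1.0, 0.5))   # with wire: b = 2 + I, I = 0.5
```

The second call shows how a wire equal to half the gate capacitance raises the branching effort from 2 to 2.5.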
Branching [figure: circuit with gates g0, g1, g2, g3] • Logical Effort assumes the “branching” factor of this circuit to be 2. This is incorrect and can create inaccuracies
Correction on Branching [figure: circuit with gates g0, g1, g2, g3] • f0 = f1, f2 = f3 • Td1 = (f0 + f1 + parasitics) • Td2 = (f2 + f3 + parasitics) • Minimum delay occurs when Td1 = Td2
“Real” Branching Calculation • Branching only equals 2 when: • This explains why we had to resort to Excel!
Technology Characterization
Characterization Setup • Logical Effort requirements: – Equalize input and output transitions • Logical Effort is characterized by varying the h (Cout/Cin) of a gate. By using a variable load of inverters, each gate can be characterized over the same range of loads • The Logical Effort of each gate is characterized for each input • Energy is characterized for each output transition of the gate caused by each input transition, e.g. for an inverter, energy is measured for tLH and tHL
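Once delay has been measured over a range of h, the logical effort and parasitic delay fall out of a straight-line fit: the slope gives g and the intercept gives p (in whatever delay units the measurements use). The sketch below uses an ordinary least-squares fit on synthetic data; the function name and the data points are assumptions for illustration, not measurements from the tutorial.

```python
def fit_logical_effort(h_values, delays):
    """Least-squares fit of d = slope*h + intercept.

    With delays normalized to the technology time constant, the slope is
    the logical effort g and the intercept is the parasitic delay p.
    """
    n = len(h_values)
    mean_h = sum(h_values) / n
    mean_d = sum(delays) / n
    sxx = sum((h - mean_h) ** 2 for h in h_values)
    sxy = sum((h - mean_h) * (d - mean_d) for h, d in zip(h_values, delays))
    slope = sxy / sxx
    intercept = mean_d - slope * mean_h
    return slope, intercept


# Synthetic (hypothetical) sweep generated from g = 1.5, p = 2.0.
loads = [1, 2, 3, 4, 5]
measured = [1.5 * h + 2.0 for h in loads]
g, p = fit_logical_effort(loads, measured)
print(g, p)
```

Real simulation data would not lie exactly on a line, so the fit residual is also a quick check of how well the linear model holds for a given gate.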
LE Characterization Setup for Static Gates • tLH • tHL • Average • Energy [figure: In, Gate, Variable Load]
LE Characterization Setup for Dynamic Gates • tHL • Energy [figure: In, Gate, Variable Load]
LE Table (Static CMOS) • Technology: P/N ratio = 2; INV = 3.67, pINV = 4.29 • Measured on worst-case single-input switching
Static CMOS Gates: Delay Graphs
Static Gates: Pull-up Delay Graph
LE Table (Dynamic CMOS) • Technology: • Minimum-sized keeper included • Measured on all-input switching of worst path
Dynamic CMOS: Delay Graphs
Dynamic CMOS: Delay Graphs
Energy Calculation
Energy Calculation • 16X minimal size Dyn-NAND • 8X minimal size Dyn-NAND
Energy Calculation
Energy Calculation
Energy Calculation: NAND-2
Examples
64-Bit Adders • Han-Carlson (prefix-2, HC2): Static and Dynamic • Han-Carlson (prefix-2, HC2-2): Dynamic-Static • Kogge-Stone (prefix-2, KS2): Static and Dynamic • Kogge-Stone (prefix-2, KS2-2): Dynamic-Static • Quaternary-Tree (prefix-2, QT2): Static and Dynamic • Included wire delay, $t_{delay} = 0.7\,R_{wire} C_{wire}$ • Included wire energy, $E_w = C_{wire} V^2$
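The two wire terms are easy to evaluate by hand or in code. A minimal sketch of both formulas; the resistance, capacitance, and supply values below are hypothetical placeholders, not technology data from the tutorial.

```python
def wire_delay(r_wire, c_wire):
    """Wire delay estimate t = 0.7 * R_wire * C_wire (seconds)."""
    return 0.7 * r_wire * c_wire


def wire_energy(c_wire, vdd):
    """Wire switching energy E = C_wire * V^2 (joules)."""
    return c_wire * vdd ** 2


# Hypothetical wire: 100 ohm, 200 fF, switched at an assumed 1.2 V supply.
R_WIRE = 100.0       # ohms (assumed)
C_WIRE = 200e-15     # farads (assumed)
VDD = 1.2            # volts (assumed)

print(wire_delay(R_WIRE, C_WIRE) * 1e12, "ps")
print(wire_energy(C_WIRE, VDD) * 1e15, "fJ")
```

For these assumed values the wire adds roughly 14 ps of delay and 288 fJ per transition, which would then be added to the gate delay and gate energy totals along the path.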
Test Setup [figure: 64-bit adder with inputs A0 to A63 and outputs S0 to S63; 1 mm wire loads Cwire] • H = (Cin + Cwire)/Cin
Energy-Delay Estimates
Adders: Energy [plot labels: Dynamic (KS, HC), QT, KS, Dynamic-Static, Static, HC]
Dynamic-Static Implementation of Carry-Merge Stage • Inverters to be eliminated • Regular domino implementation • Compound-domino implementation
Energy-Delay comparison of 64-bit KS, HC and QT adders
Adders: Critical Path Energy [plot labels: QT dynamic-static, HC dynamic, KS dynamic, HC dynamic-static, KS dynamic-static, QT static, KS static, HC static]
Intel 32-bit Adder, 0.13 µm, 1.2 V [VLSI-2002] [plot labels: KS, QT, KS estimated, QT estimated]
Energy-Delay comparison of 32-bit QT and KS adders: estimated vs. simulation in 0.10 µm technology
Est. Results: All Adders w/o Wires
Est. Results: All Adders w/ Wires
Energy-Delay Trade-offs • 90 nm technology • Worst-case energy vector with 100% input activity • Collaboration with Intel AMR [plot labels: initial design, optimized design, delay saving, energy saving]
Conclusion • Using realistic measures for comparing various designs leads to better design choices • Power is as important as speed • Making comparisons in Energy-Delay space is necessary: – power can always be traded for speed and vice versa • Wire effects are significant • Leakage currents?