
Accuracy-Configurable Adder for Approximate Arithmetic Designs
Andrew B. Kahng, Seokhyeong Kang
VLSI CAD Laboratory, UC San Diego
49th Design Automation Conference, June 6, 2012

Outline
- Background and Motivation
- Accuracy-Configurable Adder Design
- Experimental Setup and Results
- Conclusions and Ongoing Works

Why Approximate Designs?
- Threats to the traditional IC design approach
  - Extreme variations: PVT variation and uncertainty lead to design overhead
  - Reliability issues: hard errors (NBTI, latchup), soft errors (α-particle)
  - Cost: the cost (power/performance) of perfect accuracy is too high!
- What is the square root of 10? "A little more than three" vs. "3.162278..."
- Approximate designs: relaxing the requirement of correctness can dramatically reduce the costs of the design
- Approximation could be faster and more powerful

Previous Approximate Adders
- Lu et al., IEEE Computer 2004
  - Faster adder with shorter carry chain
  - High performance with small error rate
  - Large area overhead: not applicable for low-energy design
- Zhu et al., TVLSI 2010
  - ETAI: accurate part + inaccurate part
  - Reduces error size
  - Error rate is high
- Output accuracy is fixed, so the benefits can be limited by the required accuracy

Our Work: Accuracy-Configurable Approximate Adder
- How power benefits can be achieved …
- An accuracy-configurable design adapts to changing requirements by using different modes in each situation

Our Work: Accuracy-Configurable Approximate Adder
- How power benefits can be achieved …
- Accuracy-configurable approximate adder with two error correction stages (ECC-1, ECC-2)
  - Mode 1 (accuracy: 90%): turn off ECC-1 and ECC-2
  - Mode 2 (accuracy: 95%): turn off ECC-2
  - Mode 3 (accuracy: 100%): turn on all ECC


Approximate Adder Implementation: 16-bit adder case
- Carry chain is cut to reduce critical path delay
- Sub-adders generate results of partial summation
- Middle sub-adder improves accuracy (error rate: 50% → 5.5%)
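The cut carry chain can be illustrated with a small behavioral model. This is a minimal sketch, not the authors' RTL: it assumes each sub-adder is 2k bits wide, that neighboring sub-adders overlap by k bits, that every sub-adder has a carry-in of 0, and that k divides n. The function name `aca_add` and its parameters are hypothetical.

```python
def aca_add(a: int, b: int, n: int = 16, k: int = 4) -> int:
    """Behavioral model of an n-bit adder whose carry chain is cut.

    Assumptions (not stated on the slide): sub-adder i adds bits
    [i*k, i*k + 2k) with carry-in 0; the first sub-adder supplies its
    full 2k-bit sum, every later sub-adder supplies only its upper k bits.
    The final carry-out is dropped, so the result is n bits wide.
    """
    if n % k != 0 or n < 2 * k:
        raise ValueError("n must be a multiple of k and at least 2*k")
    window_mask = (1 << (2 * k)) - 1      # selects one 2k-bit window
    upper_mask = ((1 << k) - 1) << k      # upper k bits of a window sum

    # First sub-adder: bits [0, 2k) are exact.
    result = ((a & window_mask) + (b & window_mask)) & window_mask

    # Remaining sub-adders: each sees only k bits of carry history.
    for i in range(1, n // k - 1):
        shift = i * k
        partial = ((a >> shift) & window_mask) + ((b >> shift) & window_mask)
        result |= (partial & upper_mask) << shift
    return result


if __name__ == "__main__":
    print(hex(aca_add(0x1234, 0x1111)))  # no long carry chain involved
    print(hex(aca_add(0x00FF, 0x0001)))  # carry must cross 4 propagating bits
```

Under these assumptions, `aca_add(0x00FF, 0x0001)` yields 0x0000 instead of 0x0100: the carry out of bits 0 to 3 has to ripple through four propagating bits to reach bit 8, and the middle sub-adder (bits 4 to 11) never sees it.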

Approximate Adder Implementation: N-bit adder case
- Probability of correct result (pass rate) and implementation estimates relative to CLA (N = 16)

  K                  2      3      4      5      6
  Min. clock cycle   0.50   0.65   0.75   0.83   0.89
  Area               0.87   1.05   1.12   1.15   1.12
  Power              0.44   0.68   0.84   0.95   1.00
  Pass rate          0.554  0.829  0.942  0.982  0.995

- Approximate adder can be configured with "k"
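The dependence of the pass rate on k can be worked out exactly for a simplified model. The sketch below assumes uniformly random operands, 2k-bit sub-adders that overlap by k bits, and k dividing N; under those assumptions a result is wrong exactly when a true carry arrives at a sub-adder boundary and the next k bits all propagate it. The function name `exact_pass_rate` is hypothetical, and the model is not guaranteed to reproduce the slide's figures, in particular for k values that do not divide N.

```python
from fractions import Fraction


def exact_pass_rate(n: int, k: int) -> Fraction:
    """Probability that the assumed cut-carry adder gives the exact sum.

    The adder is viewed as n/k blocks of k bits. For a block:
      q = P(all k bits propagate)            = 2**-k
      h = P(carry-out is decided in-block as 0), and likewise as 1
        = (1 - q) / 2
    s0 / s1 track P(no error so far AND true carry into next block is 0 / 1).
    """
    if n % k != 0 or n < 2 * k:
        raise ValueError("n must be a multiple of k and at least 2*k")
    q = Fraction(1, 2 ** k)
    h = (1 - q) / 2

    # Block 0 has a true carry-in of 0, so it can never cause an error.
    s0, s1 = h + q, h

    # Blocks 1 .. n/k - 2 are the low halves of the later sub-adders.
    # An error occurs iff the true carry-in is 1 and the block propagates.
    for _ in range(n // k - 2):
        s0, s1 = (h + q) * s0 + h * s1, h * s0 + h * s1
    return s0 + s1


if __name__ == "__main__":
    for k in (2, 4, 8):
        rate = exact_pass_rate(16, k)
        print(k, rate, float(rate))
```

For N = 16 and k = 4 this evaluates to 241/256 ≈ 0.941, close to the 0.942 reported in the table; for k = 8 there is a single 16-bit sub-adder and the rate is 1.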

Error Detection and Correction
- Errors can be detected and corrected with small overhead
  - Error detection: AND gates
  - Error correction: incrementor circuit
- Variable-latency operation: error detection and correction can take more time than the critical path delay of a sub-adder, so the throughput can be reduced
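The detect-then-increment idea can be sketched behaviorally. This is an illustration under assumptions, not the authors' circuit: sub-adders are 2k bits wide with k-bit overlap, an error is flagged when all k low bits of a sub-adder propagate and a carry arrives from below (the AND condition), and the fix adds 1 to that sub-adder's upper k bits (the incrementor). Corrections are applied in order from the least significant sub-adder upward, which is one way to read the variable-latency remark. The name `aca_add_with_correction` is hypothetical.

```python
from typing import List, Tuple


def aca_add_with_correction(
    a: int, b: int, n: int = 16, k: int = 4
) -> Tuple[int, int, List[bool]]:
    """Return (approximate_sum, corrected_sum, per-sub-adder error flags)."""
    if n % k != 0 or n < 2 * k:
        raise ValueError("n must be a multiple of k and at least 2*k")
    k_mask = (1 << k) - 1
    w_mask = (1 << (2 * k)) - 1

    # First sub-adder is exact on bits [0, 2k).
    approx = ((a & w_mask) + (b & w_mask)) & w_mask
    corrected = approx
    carry = ((a & k_mask) + (b & k_mask)) >> k   # true carry into bit k
    flags = []

    for i in range(1, n // k - 1):
        shift = i * k
        a_lo, b_lo = (a >> shift) & k_mask, (b >> shift) & k_mask
        partial = ((a >> shift) & w_mask) + ((b >> shift) & w_mask)
        upper = (partial >> k) & k_mask

        # Detection: every low bit propagates AND a carry arrives.
        error = (a_lo ^ b_lo) == k_mask and carry == 1
        flags.append(error)

        approx |= upper << (shift + k)
        # Correction: increment the upper k bits of this sub-adder.
        fixed = (upper + 1) & k_mask if error else upper
        corrected |= fixed << (shift + k)

        # True carry into the next boundary, used by the next stage.
        carry = (a_lo + b_lo + carry) >> k
    return approx, corrected, flags


if __name__ == "__main__":
    approx, fixed, flags = aca_add_with_correction(0x00FF, 0x0001)
    print(hex(approx), hex(fixed), flags)
```

With the inputs 0x00FF and 0x0001, the first flag is set and the incrementor turns the approximate result 0x0000 into the exact 0x0100.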

Accuracy Configuration with Pipeline
- Each stage generates a result with different accuracy
- Later stages can be turned off with power gating according to the accuracy requirement

  Config.   Power gating      Accuracy   Power reduction
  Mode-1    None              1.000      -11.5%
  Mode-2    Stage 4           0.960      12.4%
  Mode-3    Stages 3, 4       0.925      31.0%
  Mode-4    Stages 2, 3, 4    0.900      51.6%
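A runtime controller could pick a mode from this table by choosing the largest power reduction that still meets the required accuracy. The sketch below uses the table's numbers as data; the `Mode` type and the `select_mode` function are hypothetical names, and the selection policy itself is an assumption rather than something the slides specify.

```python
from typing import List, NamedTuple, Tuple


class Mode(NamedTuple):
    name: str
    gated_stages: Tuple[int, ...]   # pipeline stages that are power gated
    accuracy: float
    power_reduction_pct: float      # negative means overhead


# Values copied from the table above.
MODES: List[Mode] = [
    Mode("Mode-1", (), 1.000, -11.5),
    Mode("Mode-2", (4,), 0.960, 12.4),
    Mode("Mode-3", (3, 4), 0.925, 31.0),
    Mode("Mode-4", (2, 3, 4), 0.900, 51.6),
]


def select_mode(required_accuracy: float) -> Mode:
    """Return the mode with the largest power reduction that is accurate enough."""
    feasible = [m for m in MODES if m.accuracy >= required_accuracy]
    if not feasible:
        raise ValueError("no mode meets the required accuracy")
    return max(feasible, key=lambda m: m.power_reduction_pct)


if __name__ == "__main__":
    for req in (1.00, 0.95, 0.93, 0.90):
        mode = select_mode(req)
        print(req, mode.name, mode.gated_stages, mode.power_reduction_pct)
```

Note that a requirement of 0.93 falls between Mode-2 and Mode-3, so the sketch falls back to Mode-2, the cheapest mode that still meets it.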


Experimental Setup and Metrics
- Experimental setup
  - Library: TSMC 65GP
  - Implementation: Synopsys Design Compiler
  - Simulation: Cadence NC-Sim
  - Input patterns: random data and actual data
  - Library preparation: Cadence Library Characterizer
- Accuracy metrics

  Metric    Definition            Data type
  ACCamp    1 - |Rc - Re| / Rc    Amplitude data
  ACCinf    1 - Be / Bw           Information data

  - Rc and Re: correct and obtained results
  - Be: number of error bits; Bw: bit-width of data
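The two metrics translate directly into code. The sketch below follows the definitions in the table; the function names `acc_amp` and `acc_inf` are hypothetical, results are assumed to be unsigned integers, and counting error bits as the population count of Rc XOR Re is an interpretation of "number of error bits".

```python
def acc_amp(correct: int, obtained: int) -> float:
    """ACCamp = 1 - |Rc - Re| / Rc, for amplitude data (assumes Rc > 0)."""
    if correct <= 0:
        raise ValueError("ACCamp is undefined here for Rc <= 0")
    return 1.0 - abs(correct - obtained) / correct


def acc_inf(correct: int, obtained: int, bit_width: int) -> float:
    """ACCinf = 1 - Be / Bw, for information data.

    Be is taken as the number of bit positions where Rc and Re differ.
    """
    mask = (1 << bit_width) - 1
    error_bits = bin((correct ^ obtained) & mask).count("1")
    return 1.0 - error_bits / bit_width


if __name__ == "__main__":
    rc, re = 0x0100, 0x0000
    print(acc_amp(rc, re), acc_inf(rc, re, 16))
```

For a correct result of 0x0100 obtained as 0x0000, ACCamp is 0 while ACCinf is 0.9375: a single wrong bit in a high position is severe for amplitude data but mild for information data, which is why the two data types get separate metrics.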

Approximate Adder Comparison
- Accuracy vs. power consumption: image smoothing (Gaussian filter)
  - (a) Original image
  - (b) Accurate adder
  - (c) ACA (PSNR 24.5 dB)
  - (d) ETAI (25.3 dB)
  - (e) ETAII (16.2 dB)
  - (f) LU (11.1 dB)
- (c)~(f) consume 50% of the power of the accurate adder (b)
- * ETAI cannot detect and correct errors

Approximate Adder Comparison
- Accuracy vs. power consumption with voltage scaling (1.0 V to 0.6 V)
- [Figure: two plots, ACCamp and ACCinf vs. total power (W), comparing ACA adder, CLA, Lu's adder, and ETAIIM]
- ACA adder shows a good accuracy vs. power trade-off on both the ACCamp and ACCinf metrics

Accuracy Configuration and Power Saving
- Power saving from voltage scaling + mode change (4-stage 32-bit adder case)
- [Figure: total power consumption (W) vs. ACCinf for a conventional pipelined adder and the ACA adder in modes 1 to 4; annotations: voltage scaling, mode change, accurate result, accuracy 1.0 → 0.9, 4X reduction]
- Accuracy configuration with mode change is more effective than with voltage scaling

Accuracy Configuration and Power Saving
- Power consumption when the accuracy requirement is varying (with SPEC 2006 benchmarks)
- [Figure: normalized power consumption, split by mode-1 to mode-4, for benchmarks astar, bzip2, calculix, gcc, h264ref, mcf, sjeng, soplex; accuracy axis marked at 0.95 and 1.00]
- Average 30% power savings over no accuracy configuration


Conclusions and Ongoing Works
- Conclusions
  - We proposed an accuracy-configurable approximate (ACA) adder, which can adapt to changing accuracy requirements
  - ACA can provide 30% power reduction with accuracy configuration during runtime
- Ongoing works
  - Accuracy-configurable design for other arithmetic units (multiplier, divider)
  - Automated synthesis flow (minimize power under the required accuracy)
  - [Figure: synthesis flow with RTL and required accuracy as inputs, exact adder / approximate adder, accuracy estimation, synthesis]

Thank You!

Accuracy-Configurable Approximate Design
- Required accuracy can change during runtime
- Idea of High-Efficiency Math highlighted by Intel Labs at ISSCC 2012
  - Variable-precision floating point unit with accuracy tracking: 24-bit, 12-bit, or 6-bit variable-precision mantissa as needed
- Accuracy-configurable design adapts to changing requirements, maximizing the benefits of the approximate design paradigm