Disciplined Approximate Computing From Language to Hardware and
- Slides: 49
Disciplined Approximate Computing: From Language to Hardware and Beyond Adrian Sampson, Hadi Esmaelizadeh, 1 Michael Ringenburg, Reneé St. Amant, 2 Luis Ceze, Dan Grossman, Mark Oskin, Karin Strauss, 3 and Doug Burger 3 University of Washington Now at Georgia Tech 2 UT-Austin 3 Microsoft Research 1
Approximation with HW eyes. . .
Approximation with SW eyes. . . • Approximation is yet another complexity ‒ Most of a code base can’t handle it • But many expensive kernels are naturally error-tolerant ‒ Multiple “good enough” results
Must work across the stack… User Application Algorithm Language Compiler Architecture Circuits Devices • From above: Where approximation acceptable • From below: More savings for less effort Working at one level is inherently limiting
Errors Energy
Across the stack • Higher level knows pixel buffer vs. header bits and control flow • Lower level knows how to save energy for approximate pixel buffer We are working on: 1. Expressing and monitoring disciplined approximation 2. Exploiting approximation in architectures and devices 3. Much, much more…
Plan for today Application language for where-toapproximate Ener. J Monitoring dynamic quality evaluation Compiler traditional architecture Truffle Architecture. NPU neural networks as accelerators Circuits Approximate Storage Approximate Wi-Fi give high-level results mention in passing
Application language for where-toapproximate Ener. J: Approximate Data Types for Safe and General Low-Power Computation, PLDI 2011
Safety ✓ Statically separate critical and non -critical program components Precise ✗Approximate Generality Various approximation strategies encapsulated Disciplined Approximate Programming Storage Logic Algorithms λ Comunication
Type qualifiers @approx int a =. . . ; @precise int p =. . . ; ✗ ✓ p = a; a = p;
Control flow: implicit flows @approx int a =. . . ; @precise int p =. . . ; ✗ if (a == 10) { p = 2; }
Endorsement: escape hatch @approx int a = expensive(); @precise int p; ✓ p = endorse(a); quick. Checksum(p); output(p);
Algorithmic approximation with Polymorphism class Float. Set { @context float[] nums =. . . ; Also in the PLDI’ 11 paper: float mean() { calculate mean Formal semantics } Noninterference claim @approx float mean_APPROX() { Object-layout calculate meanissues of half } Simulation results } … @approx Float. Set some. Set =. . . ; some. Set. mean();
Application language for where-toapproximate Ener. J Monitoring [under submission] dynamic quality evaluation
Controlling approximation • • Ener. J: approximate cannot interfere with precise But semantically, approximate results are unspecified “best effort” • • • “Worst effort” of “skip; return 0” is “correct” Only higher levels can measure quality: inherently app-specific Lower levels can make monitoring convenient
Quality of Result (Qo. R) monitoring Offline: Profile, auto-tune • Pluggable energy model Online: Can react – recompute or decrease approximation-level • Handle situations not seen during testing • But needs to be cheap! First design space of online monitors • Identify several common patterns • And apps where they work well
Online monitoring approaches General Strawman: Rerun computations precisely; compare. More energy than no approximation. Sampling: Rerun some computations precisely; compare. Verification Functions: Run a cheap check after every computation (e. g. , within expected bounds, solves equation, etc. ) Fuzzy Memoization: Record past inputs and outputs of computations. Check that “similar” inputs produce “similar” outputs.
Results [See paper for energy model, benchmarks, etc. ] Application Monitor Type Ray Tracer Sampling Asteroids Verifier Triangle Verifier Sobel Fuzzy Memo FFT Fuzzy Memo
Results [See paper for energy model, benchmarks, etc. ] Application Monitor Type Precise Approx No Monitor Ray Tracer Sampling 100% 67% Asteroids Verifier 100% 91% Triangle Verifier 100% 83% Sobel Fuzzy Memo 100% 86% FFT Fuzzy Memo 100% 73%
Results [See paper for energy model, benchmarks, etc. ] Application Monitor Type Precise Approx Retained No Monitor Savings Ray Tracer Sampling 100% 67% 85% 44% Asteroids Verifier 100% 91% 95% 56% Triangle Verifier 100% 83% 87% 78% Sobel Fuzzy Memo 100% 86% 93% 49% FFT Fuzzy Memo 100% 73% 82% 70%
Application language for where-toapproximate Ener. J Monitoring Compiler traditional architecture Truffle Architecture Support for Disciplined Approximate Programming, ASPLOS 2012 dynamic quality evaluation
Hardware support for disciplined approximate execution int p = 5; @approx int a = 7; for (int x = 0. . ) { a += func(2); @approx int z; z = p * 2; p += 4; } a /= 9; func 2(p); a += func(2); @approx int y; z = p * 22 + z; p += 10; VDDH Compiler Truffle Core VDDL
Approximation-aware ISA ld ld add st 0 x 04 r 1 0 x 08 r 2 r 1 r 2 r 3 0 x 0 c r 3
Approximation-aware ISA ld ld add. a st. a 0 x 04 r 1 0 x 08 r 2 r 1 r 2 r 3 0 x 0 c r 3 operations ALU storage registers caches main memory
Approximate semantics add. a r 1 r 2 r 3: some value writes the sum of r 1 and r 2 to r 3 Informally: r 3 gets something approximating Still guarantee: r 1 + r 2 • No other register modified • No floating point trap raised Actual error pattern depends on • Program counter doesn’t jump approximation technique, • No missiles launched microarchitecture, voltage, process, • … variation, …
Dual-voltage pipeline replicated functional units Fetch Decode Reg Read Branch Predictor Instruction Cache Execute Memory Write Back Integer FU Data Cache Decoder ITLB Register File FP FU DTLB dual-voltage SRAM arrays 7– 24% energy saved on average (fft, game engines, raytracing, QR code readers, etc) (scope: processor + memory)
Diminishing returns on data Fetch Decode Reg Read Branch Predictor Instruction Cache ITLB Execute Memory Write Back Integer FU Data Cache Decoder Register File FP FU DTLB • Instruction control cannot be approximated ‒ Can we eliminate instruction bookkeeping?
Application language for where-toapproximate Ener. J Monitoring dynamic quality evaluation Compiler traditional architecture Truffle Architecture. NPU Neural Acceleration for General-Purpose Approximate Programs, MICRO 2012 neural networks as accelerators
Using Neural Networks as Approximate Accelerators Very efficient hardware implementations! Trainable to mimic many computations! CPU Recall is imprecise. Fault tolerant [Temam, ISCA 2012] Map entire regions of code to an (approximate) accelerator: ‒ Get rid of instruction processing (von Neumann bottleneck)
Neural acceleration Program
Neural acceleration Find an approximable program component Program
Neural acceleration Find an approximable program component Program Compile the program and train a neural network
Neural acceleration Find an approximable program component Program Compile the program and train a neural network Execute on a fast Neural Processing Unit (NPU)
Example: Sobel filter @approx float grad(approx float[3][3] p) { … } @approx float dst[][]; edge. Detection() void edge. Detection(a. Image &src, a. Image &dst) { for (int y = …) { for (int x = …) { dst[x][y] = grad(window(src, x, y)); } } }
Code region criteria √ Approximable √ Well-defined no side effects inputs and outputs small errors do not corrupt output
Code observation record(p); record(result); p [[NPU]] float grad(float[3][3] p) { … } + test cases void edge. Detection(Image &src, Image &dst) { for (int y = …) { for (int x = …) { dst[x][y] = grad(window(src, x, y)); } } } instrumented program => grad(p) 323, 231, 122, 93, 321, 49 49, 423, 293, 2 34, 129, 493, 49, 31, 11 21, 85, 47, 62, 21, 577 7, 55, 28, 96, 552, 921 5, 129, 493, 49, 31, 11 49, 423, 293, 2 34, 129, 72, 49, 5, 2 323, 231, 122, 93, 321, 49 6, 423, 293, 2 ➝ ➝ ➝ ➝ ➝ 53. 2 94. 2 1. 2 64. 2 18. 1 92. 2 6. 5 120 53. 2 49. 7 sample arguments & outputs
Training Inputs Backpropagation Training
Training Inputs 70% 98% 99% faster less robust slower more accurate
Neural Processing Unit (NPU) Core NPU
Software interface: ISA extensions input enq. d Core output deq. d enq. c deq. c configuration NPU
A digital NPU Bus Scheduler scheduling Processing Engines input output
A digital NPU scheduling multiply-add unit Bus Scheduler input neuron weights output accumulator sigmoid LUT Processing Engines Many other implementations FPGAs Analog input output Hybrid HW/SW SW only? . . .
How does the NN look in practice? edge detection triangle intersection 88 static instructions 56% of dynamic instructions 1, 079 static x 86 -64 instructions 97% of dynamic instructions 18 neurons 60 neurons 2 hidden layers
Results Summary 2. 3 x average speedup Ranges from 0. 8 x to 11. 1 x Also in the MICRO paper: 3. 0 x average energy reductionlatency Sensitivity to communication Sensitivitybenefit to NN evaluation efficiency All benchmarks Sensitivity to PE count Quality loss belowstatistics 10% in all cases Benchmark All-software NN slowdown • As always, quality metrics application-specific
Ongoing effort Disciplined Approximate Programming (Ener. J, Ener. C, . . . ) Approximate Compiler X 86, ARM off-the-shelf Truffle: dual-Vdd u. Arch Approximate Accelerators Approximate compiler (no special HW) • Analog components • Storage via PCM • Wireless communication • Approximate HDL • Approximate Wireless Networks and Storage
Conclusion Goal: approximate computing across the system Key: Co-design programming model with approximation techniques: disciplined approximate programming Early results encouraging Approximate computing can potentially save our bacon in a post-Dennard era and be in the survival kit for dark silicon
Disciplined Approximate Computing: From Language to Hardware and Beyond University of Washington Adrian Sampson, Hadi Esmaelizadeh, Michael Ringenburg, Reneé St. Amant, Luis Ceze, Dan Grossman, Mark Oskin, Karin Strauss, and Doug Burger Thanks!
- Approximate computing
- Approximate computing
- External hardware
- Intellectually disciplined
- Dac certified disciplined agilist (cda)
- Disciplined agile consortium
- Shuhari agile
- Superlative tasty
- The five minds
- Disciplined agile delivery roles
- What is discipline
- Vhdl hardware description language
- Conventional computing and intelligent computing
- Hardware assisted virtualization in cloud computing
- Fast exact and approximate geodesics on meshes
- Mb unit
- Board ga,e
- Narrative definition poetry
- The approximate dates of the baroque period are
- What is consonance
- A building bent deflects in the same way as
- A guided tour to approximate string matching
- Approximate counting algorithm
- The diagram below represents the placoderm fish
- It is the inner terminus of the fingerprints pattern.
- Lshzoo
- Approximate the best fitting line for the data
- Asymmetrically balanced art
- What is the approximate percentage of oxygen in the air?
- Sketch techniques for approximate query processing
- Cell decomposition diagram
- What is the approximate
- Refrain figurative language
- Collector feedback bias
- Times are approximate
- A guided tour to approximate string matching
- Vhsic hardware description language
- Language
- Hardware description language
- The verilog® hardware description language
- Vhsic hardware description language
- Hardware programming language
- Verilog
- Open computing language
- Hardware and control structures
- Hardware acquisition in system analysis and design
- Diffrence between hardware and software
- It infrastructure objectives
- It essentials virtual desktop
- Definition of hardware and software