Progress towards Self-Optimising and Self-Verifying Design, Wayne Luk
Progress towards Self-Optimising and Self-Verifying Design
Wayne Luk, Imperial College
Stamatis Vassiliadis Symposium, 19 July 2017
Outline
1. Back to the future: where we start
2. Example: remote sensing
3. Self-optimisation: incremental machine learning
4. Self-verification: monitoring based on in-circuit assertions
5. Summary
Acknowledgement:
• S. J. Wang, N. Ma, Y. Peng: Harbin Institute of Technology
• X. Niu, T. Todman, S. Shao: Imperial College
• S. Stilkerich: Airbus
• E. Hung: Invionics
• P. H. W. Leong: University of Sydney
• M. Flynn: Stanford University and Maxeler
• O. Mencer, G. Gaydadjiev, T. Becker: Maxeler
1. Back to the Future of Computing: 2007
• self * (optimising + verifying) = trusted re-use
  - unify: autonomic, self-test, dynamic optimization…
  - better design + more productive
• self-optimising self-verifying design platform
  - systems based on field-programmable technology: large + small
  - autonomous system-on-chip (ASOC) + network of ASOCs (M. Flynn)
  - applications: ubiquitous, dependable, secure, robust
source: Amazon
Architecture
[Figure: architecture diagram with inputs (external + internal), models (external + internal), new models, a Self-Optimiser (incremental machine learning), a Self-Verifier (in-circuit assertions) and outputs (external + internal)]
• 10-year advances: custom computing + dataflow machines
  - self-optimisation: incremental machine learning (data analysis)
  - self-verification: in-circuit assertions (low-overhead monitoring)
  - applicable to data-centre computing and embedded systems
Custom computing
• conventional computing: fit program to processor
  [Figure: program -> software tools -> fixed processor]
• custom computing: fit processor to program
  [Figure: program -> software + hardware tools -> customised processor]
• customise operation + data: field-programmable technology
FPGA: Field Programmable Gate Array
[Figure: Xilinx Virtex-6 FPGA with logic cells (10^5 elements), DSP blocks, block RAM (20 TB/s) and IO blocks. Source: Maxeler]
Custom computing + dataflow machines = CDM
[Figure: Custom Dataflow Machine (CDM). Main memory; Java code compiled to a DataFlow Engine (DFE) with (compute) kernels K1 to K9 on the FPGA and several DFE memories; only the final results return to main memory. Source: Maxeler]
• DFEs run for very long times
• DFE is a customisable accelerator
Accelerate clouds: Microsoft + Amazon
• www.top500.org/news/microsoft-goes-all-in-for-fpgas-to-build-out-cloud-based-ai/
• aws.amazon.com/ec2/instance-types/f1/ (based on Maxeler DataFlow Engines)
2. Example: remote sensing with hyperspectral imaging
• spectral bands > 200
• image data > 50 GBps
• downlink < 10 Gbps
Source: http://www.markelowitz.com/Hyperspectral.html
Hyperspectral image classification
[Figure: multiple sensor images stacked into a data cube; one image; a pixel and its spectrum curve; pseudo color image]
• large computation under strict power constraint: 30 Gops/s at 20 W
3. Self-optimisation: multi-class SVM classifiers
• SVM: Support Vector Machine for binary classification
• multi-class: each class is a possible interpretation of the image data
• One-Against-One (OAO): best accuracy when used with Hamming distance
[Figure: binary classifiers 1 vs 2, 1 vs 3, …, 7 vs 8 feeding a voting stage]
OAO multiple classifiers with Hamming distance
[Figure: image data feeds T+1 = K×(K-1)/2 binary classifiers (BC) 1 vs 2, 1 vs 3, …, (K-1) vs K; their outputs at positions 0, 1, 2, …, T are compared with the Hamming codes from training (the identifying codes, IDs, of classes 1 to K) by a Hamming distance judge, which outputs the class label]
• Hamming distance of two strings: the number of corresponding positions that differ
• compare the outputs of classifiers 1 vs 2, 1 vs 3, … with the identifying code of each class
• small Hamming distance: the image pixel belongs to that class
Algorithm for multi-class SVM classifier
[Figure: algorithm listing, annotated "radial basis function hyperparameters, found by training", "(identify each class)" and "(treat X as 0)"]
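A minimal Python sketch of the Hamming-distance decision may help. It rests on our reading, not stated on the slides: classifier i vs j outputs 1 when it favours class i and 0 when it favours class j; class k's identifying code holds 1 or 0 at the positions of its own classifiers and X elsewhere; and "treat X as 0" is taken to mean that an X position adds 0 to the distance. All function names are hypothetical.

```python
from itertools import combinations

X = None  # don't-care position in an identifying code


def pair_list(num_classes):
    """One-against-one pairs (i, j), i < j, ordered 1 vs 2, 1 vs 3, ..., (K-1) vs K."""
    return list(combinations(range(1, num_classes + 1), 2))


def class_codes(num_classes):
    """Identifying code per class: 1 if the class is first in the pair, 0 if second, X otherwise."""
    pairs = pair_list(num_classes)
    return {
        k: [1 if k == i else 0 if k == j else X for (i, j) in pairs]
        for k in range(1, num_classes + 1)
    }


def hamming_distance(outputs, code):
    """Count positions where classifier outputs differ from the code; X positions add 0."""
    return sum(1 for o, c in zip(outputs, code) if c is not X and o != c)


def classify(outputs, num_classes):
    """Return the class with the nearest identifying code (ties go to the lowest index)."""
    codes = class_codes(num_classes)
    return min(codes, key=lambda k: (hamming_distance(outputs, codes[k]), k))


# K = 4 gives 4*3/2 = 6 classifiers; these outputs are nearest to class 3's code.
print(classify([1, 0, 1, 0, 0, 1], 4))  # → 3
```

Under these assumptions the distance for class k equals the number of its own K-1 classifiers that voted against it, so the nearest code is the one-against-one vote winner (up to tie-breaking). A literal reading in which X is replaced by the bit 0 would give a different decoder.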
Accelerator architecture
[Figure: accelerator architecture; BC: binary classifier]
Binary Classifier: datapath of kernel
[Figure: kernel datapath, annotated "radial basis function hyperparameters, found by training"]
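In software, such a binary classifier can be modelled as a standard SVM decision function with a radial basis function kernel. In this sketch the function names, the Gaussian form exp(-γ‖x - s‖²) and the 1/0 output convention are assumptions; γ, the support vectors, their coefficients (α_i·y_i) and the bias stand in for the values found by training.

```python
import math


def rbf_kernel(x, s, gamma):
    """Gaussian RBF kernel exp(-gamma * ||x - s||^2)."""
    return math.exp(-gamma * sum((xi - si) ** 2 for xi, si in zip(x, s)))


def binary_classify(x, support_vectors, coeffs, bias, gamma):
    """Return 1 if sum_i coeffs[i] * K(x, s_i) + bias >= 0, else 0.

    coeffs[i] plays the role of alpha_i * y_i from training.
    """
    f = sum(c * rbf_kernel(x, s, gamma) for s, c in zip(support_vectors, coeffs)) + bias
    return 1 if f >= 0 else 0
```

For a hyperspectral pixel, x would be its 224-band spectrum; every one-against-one classifier has the same structure and differs only in its trained support vectors, coefficients and bias.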
Evaluation
• hardware platform
  - Maxeler MAX4 DFE (DataFlow Engine)
  - Altera Stratix V 5SGSMD8N2F45C2 FPGA
• data sets
  - Airborne Visible/Infrared Imaging Spectrometer (AVIRIS): Northwestern Indiana scene and Salinas Valley scene
  - 224 spectral bands
  - 16 classes
Experimental results

Overall accuracy (OA) comparison
Method               OA on 1st image (%)   OA on 2nd image (%)
Our method           98.3                  97.8
ANN-based AdaBoost   98.02                 -
MLRsub               92.5                  -
HA-PSO-SVM           98.2                  -
SdA                  91.9                  95.5

FPGA resource utilisation
Resource      Logic     FFs       DSPs     Block memory
Used          234666    443688    1680     1715
Available     262400    524800    1963     2567
Utilisation   89.43%    84.55%    85.58%   66.81%
Runtime and energy consumption
Platform          Zynq    ARM       DSP     Xeon    DFE
T (μs/pixel)      25.8    1321.2    65.8    14.1    0.99
Power (W)         3.9     3.3       16      95      26.3
E (mJ/pixel)      0.1     4.3       1.05    1.33    0.03
DFE speedup       26.0    1334.5    66.4    14.2    1
• Zynq: XC7Z020
• ARM: Cortex-A9 @ 667 MHz
• DSP: TMS320C6678, 8 cores @ 1 GHz
• Xeon: Intel E5-2620, 12 cores, OpenMP optimised
• DFE running frequency: 120 MHz
• 8 million pixels for the Xeon test and 1 million pixels for the others
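The energy and speedup figures follow from the time and power figures: energy per pixel is time multiplied by power (μs × W = μJ), and the DFE's speedup over a platform is that platform's per-pixel time divided by the DFE's. The short Python check below (platform labels copied from the table, function names hypothetical) reproduces the reported values to within their rounding.

```python
# Per-pixel time (microseconds) and power (watts), copied from the table.
PLATFORMS = {
    "Zynq": (25.8, 3.9),
    "ARM": (1321.2, 3.3),
    "DSP": (65.8, 16.0),
    "Xeon": (14.1, 95.0),
    "DFE": (0.99, 26.3),
}


def energy_mj_per_pixel(time_us, power_w):
    """Energy per pixel: microseconds x watts = microjoules; / 1000 gives millijoules."""
    return time_us * power_w / 1000.0


def dfe_speedup(name):
    """How many times faster the DFE is than platform `name`: ratio of per-pixel times."""
    return PLATFORMS[name][0] / PLATFORMS["DFE"][0]


for name, (t_us, p_w) in PLATFORMS.items():
    print(f"{name:5s} E = {energy_mj_per_pixel(t_us, p_w):.3f} mJ/pixel, "
          f"DFE speedup = {dfe_speedup(name):.1f}")
```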
Run-time adaptation: incremental SVM training
• batch training: solve a QP problem; the global optimum is characterised by the KKT conditions
• incremental training: update the existing KKT conditions
QP: quadratic programming; KKT: Karush-Kuhn-Tucker
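For reference, a standard soft-margin SVM dual and its KKT conditions, in the form used for incremental/decremental SVM learning by Cauwenberghs and Poggio, are sketched below. The notation (α_i, C, Q, g_i, f) is ours rather than the slides', and we assume, without confirmation from the talk, that its incremental method follows this formulation. Here f(x) = Σ_j α_j y_j K(x_j, x) + b.

```latex
\begin{aligned}
\min_{0 \le \alpha_i \le C}\; W &= \tfrac{1}{2}\sum_{i,j} \alpha_i Q_{ij} \alpha_j - \sum_i \alpha_i + b \sum_i y_i \alpha_i,
\qquad Q_{ij} = y_i y_j K(x_i, x_j) \\
g_i &= \frac{\partial W}{\partial \alpha_i} = \sum_j Q_{ij}\alpha_j + y_i b - 1 = y_i f(x_i) - 1,
\qquad \frac{\partial W}{\partial b} = \sum_j y_j \alpha_j = 0 \\
g_i &\begin{cases} \ge 0, & \alpha_i = 0 \\ = 0, & 0 < \alpha_i < C \\ \le 0, & \alpha_i = C \end{cases}
\end{aligned}
```

Incremental training then adds a new sample with α_c = 0 and adjusts α_c, the coefficients of the margin vectors and b so that these conditions keep holding, instead of re-solving the whole QP.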
Hardware architecture
• incremental SVM update: dense linear algebra
• dataflow architecture for updating SVM coefficients
  - supports both incremental and decremental training
  - all operations with quadratic time complexity are parallelised
  - scalable
• prototype on Maxeler DataFlow Engine: over 40 times faster than optimised software
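One way the dense linear algebra of an incremental SVM update stays quadratic is to grow the inverse of the (symmetric) margin-vector kernel matrix by one row and column instead of re-inverting it, as in Cauwenberghs and Poggio's incremental/decremental SVM. We assume, without confirmation from the slides, that the dataflow design performs an update of this kind; the pure-Python function below (name hypothetical) shows the O(n²) bordering step via the Schur complement.

```python
def bordered_inverse(r, b, c):
    """Given R = A^{-1} for a symmetric n x n matrix A, return the inverse of
    [[A, b], [b^T, c]] using O(n^2) operations (Schur complement update)."""
    n = len(r)
    beta = [sum(r[i][k] * b[k] for k in range(n)) for i in range(n)]  # beta = R b
    gamma = c - sum(b[i] * beta[i] for i in range(n))                  # Schur complement
    if abs(gamma) < 1e-12:
        raise ValueError("bordered matrix is singular")
    # Top-left block: R + beta beta^T / gamma; last column: -beta / gamma.
    new = [[r[i][j] + beta[i] * beta[j] / gamma for j in range(n)] + [-beta[i] / gamma]
           for i in range(n)]
    new.append([-beta[j] / gamma for j in range(n)] + [1.0 / gamma])  # last row
    return new
```

For example, `bordered_inverse([[0.5]], [1.0], 3.0)` returns the inverse of [[2, 1], [1, 3]], namely [[0.6, -0.2], [-0.2, 0.4]] up to floating-point rounding. Each new entry depends only on R b and one scalar, so the n² updates are independent and suit a parallel datapath; decremental training needs the reverse (shrinking) step, which is not shown.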
4. Monitoring: in-circuit assertions
• assertions: monitor the standard datapath
  – Boolean expressions: properties of function, timing…
  – when true, the standard datapath is behaving as expected
  – in-circuit: run at the same rate as the standard datapath
  – propagate to software as extra outputs
• example: statistical assertions
  – adaptation can depend on signal statistics
  – assertion language: e = a | uop e | e bop e | mean(e) | variance(e) | …
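To make the assertion grammar concrete, here is a small software model in Python. The tuple encoding, the operator tables and the use of population variance are our assumptions, since the slide's grammar is open-ended ("…"); in hardware these operators run in-circuit at datapath rate rather than over a stored window as here.

```python
import operator
import statistics

# Hypothetical operator tables; the slide leaves uop and bop unspecified.
UOPS = {"neg": operator.neg, "abs": abs, "not": operator.not_}
BOPS = {
    "+": operator.add, "-": operator.sub, "*": operator.mul,
    "<": operator.lt, "<=": operator.le, ">": operator.gt, "==": operator.eq,
    "and": lambda a, b: a and b, "or": lambda a, b: a or b,
}


def evaluate(expr, window, t=-1):
    """Evaluate an assertion expression at sample index t (default: latest sample).

    expr is a nested tuple: ("sig", name), ("const", v), ("uop", op, e),
    ("bop", op, e1, e2), ("mean", e) or ("variance", e).
    window is a list of {signal_name: value} dicts, oldest first;
    mean and variance range over every sample in the window.
    """
    kind = expr[0]
    if kind == "sig":
        return window[t][expr[1]]
    if kind == "const":
        return expr[1]
    if kind == "uop":
        return UOPS[expr[1]](evaluate(expr[2], window, t))
    if kind == "bop":
        return BOPS[expr[1]](evaluate(expr[2], window, t), evaluate(expr[3], window, t))
    if kind in ("mean", "variance"):
        values = [evaluate(expr[1], window, i) for i in range(len(window))]
        return statistics.fmean(values) if kind == "mean" else statistics.pvariance(values)
    raise ValueError(f"unknown expression kind: {kind!r}")
```

For instance, the assertion `variance(airspeed) < 4.0` (a hypothetical threshold) is encoded as `("bop", "<", ("variance", ("sig", "airspeed")), ("const", 4.0))` and holds for airspeed samples 100, 102 and 98, whose population variance is 8/3.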
Statistical assertion: self-adaptive system
Case study: smart avionics
• air-speed sensor: Pitot tube
• can fail when frozen
[Figure annotation: could still fail]
True airspeed: statistical check
• true airspeed: important input to avionics
• statistics on true airspeed can indicate sensor failure
  – trigger self-adaptation
• standard datapath: calculates true airspeed from the sensor
  – monitored by in-circuit variance operators
[Figure: standard datapath with variance operators]
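The variance check can be approximated in software with running sums over a sliding window, so that each new sample costs a constant amount of work, much as a pipelined in-circuit operator would. The class name, window length and bounds are hypothetical; because the slides do not say whether a failing Pitot tube raises or lowers the variance, this sketch checks that the variance stays within a range.

```python
from collections import deque


class VarianceMonitor:
    """Sliding-window variance assertion (hypothetical software model).

    Keeps a running sum and sum of squares of the last `window` samples,
    so each update does a constant amount of work.
    """

    def __init__(self, window, min_variance, max_variance):
        self.window = window
        self.min_variance = min_variance
        self.max_variance = max_variance
        self.samples = deque()
        self.total = 0.0
        self.total_sq = 0.0

    def update(self, x):
        """Add one airspeed sample; return True while the assertion holds."""
        self.samples.append(x)
        self.total += x
        self.total_sq += x * x
        if len(self.samples) > self.window:
            old = self.samples.popleft()  # drop the oldest sample from both sums
            self.total -= old
            self.total_sq -= old * old
        n = len(self.samples)
        mean = self.total / n
        variance = self.total_sq / n - mean * mean  # E[x^2] - (E[x])^2
        return self.min_variance <= variance <= self.max_variance
```

The sum-of-squares form is easy to pipeline but loses precision when the mean is large relative to the spread, so a hardware version would need word lengths chosen with that in mind.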
Assertions: efficient implementation
• properties to be monitored
  – functional
  – statistical
  – timing
• run-time hardware monitoring
  – high-level description: assertion
  – same speed as the hardware being monitored
  – provably-correct optimisation
  – minimum area overhead: O(N) -> O(log N)
Optimising assertion: correctness-preserving transformation
• obvious (S = )
• efficient
• proof: use algebraic transformations in the Ruby hardware description language
Assertions: efficient implementation
• properties to be monitored
  – functional
  – statistical
  – timing
• run-time hardware monitoring
  – high-level description: assertion
  – same speed as the hardware being monitored
  – provably-correct optimisation
  – minimum area overhead: O(N) -> O(log N) -> O()
  – minimise compile time
Self-monitoring without overhead
• user circuit implements standard datapath
• add monitoring to user circuit
  – introduce new Monitoring Circuit
  – without modifying user design
  – use only spare resources on chip
• accelerate Monitoring Circuit
  – pipeline its input connections
[Figure: user circuit and Monitoring Circuit]
New design flow: resource-aware development
[Figure: design flow with Monitoring Circuit (XDL)]
• minimum area overhead: O(N) -> O()
• reduce compile time
Self-monitoring circuitry: results
• pipeline the connections to the added hardware to reduce or eliminate the impact on timing
• up to 3.9 times faster than re-compilation on large circuits (LEON3 CPU)
[Figure examples: assertion that the PC is in range; assertion on statistics of AES output]
Summary
• current and future work
  – tools: automate implementation and verification
  – applications: adaptive and resilient systems
  – extension: assertion management and optimisation
  – unification: with self-tuning control, approximate computing…
  – prototyping: next-generation data centres, planes, drones…
• future computing systems: custom dataflow machines
  – incremental machine learning: self-optimisation
  – assertion-based monitoring: self-verification
  – resource-aware development: reduced space/time overhead