Progress towards Self-Optimising and Self-Verifying Design
Wayne Luk, Imperial College
Stamatis Vassiliadis Symposium, 19 July 2017

Outline
1. Back to the future: where we start
2. Example: remote sensing
3. Self-optimisation: incremental machine learning
4. Self-verification: monitoring based on in-circuit assertions
5. Summary

Acknowledgement:
S. J. Wang, N. Ma, Y. Peng: Harbin Institute of Technology
X. Niu, T. Todman, S. Shao: Imperial College
S. Stilkerich: Airbus
E. Hung: Invionics
P. H. W. Leong: University of Sydney
M. Flynn: Stanford University and Maxeler
O. Mencer, G. Gaydadjiev, T. Becker: Maxeler

1. Back to the Future of Computing: 2007
• self * (optimising + verifying) = trusted re-use
  - unify: autonomic, self-test, dynamic optimization…
  - better design + more productive
• self-optimising self-verifying design platform
  - systems based on field-programmable technology: large + small
  - autonomous system-on-chip + network of ASOCs (M. Flynn)
  - applications: ubiquitous, dependable, secure, robust
source: Amazon

Architecture
Figure labels:
  - Inputs: external + internal
  - Models: external + internal
  - New models
  - Self-Optimiser: incremental machine learning
  - Self-Verifier: in-circuit assertions
  - Outputs: external + internal
• 10-year advances: custom computing + dataflow machines
  - self-optimisation: incremental machine learning (data analysis)
  - self-verification: in-circuit assertions (low-overhead monitoring)
  - applicable to data-centre computing and embedded systems

Custom computing
• conventional computing: fit program to processor
  - Program, Software Tools, Fixed Processor
• custom computing: fit processor to program
  - Program, Software + Hardware Tools, Customised Processor
• customise operation + data: field programmable technology

FPGA: Field Programmable Gate Array
Figure labels (Xilinx Virtex-6 FPGA):
  - Logic Cell (10^5 elements)
  - DSP Block
  - IO Block
  - Block RAM (20 TB/s)
source: Maxeler

Custom computing + dataflow machines = CDM
Figure: Custom Dataflow Machine (CDM), showing main memory, a compiler, and a DFE with its own memory running kernels K1..K9
Figure labels:
  - Java, DataFlow Engine, (compute) Kernels on FPGA
  - DFE is a customisable accelerator
  - DFEs run for very long times
  - Only the final results
source: Maxeler

Accelerate clouds: Microsoft + Amazon
www.top500.org/news/microsoft-goes-all-in-for-fpgas-to-build-out-cloud-based-ai/
aws.amazon.com/ec2/instance-types/f1/
(based on Maxeler DataFlow Engines)

2. Example: remote sensing with hyperspectral imaging
• spectral bands > 200
• Image data > 50 GBps
• downlink < 10 Gbps
Source: http://www.markelowitz.com/Hyperspectral.html

Hyperspectral image classification
Figure labels: Multiple sensor images, Spectrum curve, One image, Pixel, Pseudo color image, Data cube
Large computation under strict power constraint: 30 Gops/s @ 20 W

3. Self-optimisation: multi-class SVM classifiers
• SVM: Support Vector Machine for binary classification
• multi-class: each class being possible interpretation of image data
• One-Against-One: best accuracy when used with Hamming Distance
Figure labels: 1 vs 2, 1 vs 3, …, 7 vs 8, Voting

OAO Multiple classifiers with Hamming Distance
Figure: Image data feeds the Binary Classifiers (BC) 1 vs 2, 1 vs 3, …, (K-1) vs K; their outputs (bits 0, 1, 2, …, T) are compared with the Class 1 ID, Class 2 ID, …, Class K ID Hamming codes (from training) by the Hamming Distance Judge, which outputs the class label
• number of binary classifiers: T+1 = K×(K-1)/2
• Hamming Distance of 2 strings: number of corresponding positions that are different
• compare 1 vs 2, 1 vs 3… results with an Identifying Code for Class 1 etc
• small Hamming Distance: image data pixel is in this class
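A minimal Python sketch of the One-Against-One decoding step described above. The bit convention (output 1 means the first class of the pair wins), the use of `None` for positions where a class is not involved, and the choice to skip those positions when counting differences are all assumptions made for illustration; the function names are hypothetical and not taken from the talk.

```python
from itertools import combinations


def oao_pairs(num_classes):
    """All K*(K-1)/2 class pairs (i, j) with i < j, one per binary classifier."""
    return list(combinations(range(num_classes), 2))


def identifying_code(cls, pairs):
    """Expected classifier outputs if the pixel belongs to class `cls`.

    Assumed convention: 1 if `cls` is the first class of the pair,
    0 if it is the second, None (don't care) if it is not involved.
    """
    code = []
    for i, j in pairs:
        if cls == i:
            code.append(1)
        elif cls == j:
            code.append(0)
        else:
            code.append(None)
    return code


def hamming_distance(outputs, code):
    """Number of positions that differ, skipping don't-care positions."""
    return sum(1 for o, c in zip(outputs, code) if c is not None and o != c)


def classify(outputs, num_classes):
    """Return the class whose identifying code is closest to the outputs."""
    pairs = oao_pairs(num_classes)
    distances = [
        hamming_distance(outputs, identifying_code(c, pairs))
        for c in range(num_classes)
    ]
    return min(range(num_classes), key=distances.__getitem__)
```

With three classes the classifier order is (0 vs 1), (0 vs 2), (1 vs 2), so `classify([1, 0, 0], 3)` selects class 2: both classifiers involving class 2 voted for it.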

Algorithm for multi-class SVM classifier
Annotations on the algorithm listing:
  - Radial basis function hyperparameters, found by training
  - (Identify each class)
  - (treat X as 0)

Accelerator architecture
BC: Binary Classifier

Binary Classifier: datapath of kernel
Radial basis function hyperparameters, found by training
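For readers unfamiliar with radial basis function SVMs, the sketch below shows the textbook decision function that such a binary classifier evaluates for each pixel. It is a software illustration only, not the datapath from the slide; the parameter names (`gamma`, `coeffs`, `bias`) and the 0/1 output convention are assumptions.

```python
import math


def rbf_kernel(x, sv, gamma):
    """Radial basis function kernel: exp(-gamma * ||x - sv||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, sv)))


def binary_classify(x, support_vectors, coeffs, bias, gamma):
    """Return 1 if the decision value is non-negative, else 0.

    coeffs[i] is assumed to hold alpha_i * y_i for support vector i,
    as produced by training.
    """
    score = bias + sum(
        c * rbf_kernel(x, sv, gamma)
        for sv, c in zip(support_vectors, coeffs)
    )
    return 1 if score >= 0 else 0
```

A pixel spectrum close to a positively weighted support vector yields 1; one close to a negatively weighted support vector yields 0.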

Evaluation
• hardware platform
  - Maxeler MAX4 DFE (DataFlow Engine)
  - Altera Stratix V 5SGSMD8N2F45C2 FPGA
• data sets
  - Airborne Visible Infra-Red Imaging Spectrometer (AVIRIS), Northwestern Indiana scene and Salinas Valley scene
  - 224 spectral bands
  - 16 classes

Experimental results

Overall Accuracy Comparison
Methods              OA on 1st image (%)   OA on 2nd image (%)
Our method           98.3                  97.8
ANN based Adaboost   98.02                 -
MLRsub               92.5                  -
HA-PSO-SVM           98.2                  -
SdA                  91.9                  95.5

FPGA Resources Utilization
Resources     Logics    FFs       DSPs     Block Mem
Used          234666    443688    1680     1715
Available     262400    524800    1963     2567
Utilization   89.43%    84.55%    85.58%   66.81%

Runtime and energy consumption
Platform       Zynq    ARM      DSP     Xeon    DFE
T (μs/Pixel)   25.8    1321.2   65.8    14.1    0.99
Power (W)      3.9     3.3      16      95      26.3
E (mJ/Pixel)   0.1     4.3      1.05    1.33    0.03
Speedup        26.0    1334.5   66.4    14.2    1
• Zynq: XC7Z020
• ARM: Cortex-A9 @ 667 MHz
• DSP: TMS320C6678, 8 cores @ 1 GHz
• Xeon: Intel E5-2620, 12 cores, OpenMP optimized
• DFE running frequency: 120 MHz
• 8 million pixels for Xeon test and 1 million pixels for others
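The energy and speedup rows follow from the time and power rows. The short Python sketch below recomputes them from the figures on the slide; the dictionary layout and function names are illustrative. Small differences from the slide (for example 26.1 versus 26.0 for Zynq) are expected, presumably because the slide used unrounded measurements.

```python
# (time in microseconds per pixel, power in watts), as listed on the slide
PLATFORMS = {
    "Zynq": (25.8, 3.9),
    "ARM": (1321.2, 3.3),
    "DSP": (65.8, 16.0),
    "Xeon": (14.1, 95.0),
    "DFE": (0.99, 26.3),
}


def energy_mj_per_pixel(name):
    """Energy = time * power; microseconds * watts = microjoules, so divide by 1000."""
    t_us, power_w = PLATFORMS[name]
    return t_us * power_w / 1000.0


def dfe_speedup_over(name):
    """How many times faster the DFE is than the named platform."""
    return PLATFORMS[name][0] / PLATFORMS["DFE"][0]


if __name__ == "__main__":
    for name in PLATFORMS:
        print(f"{name:5s} E = {energy_mj_per_pixel(name):.2f} mJ/pixel, "
              f"DFE speedup = {dfe_speedup_over(name):.1f}x")
```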

Run-time adaptation: incremental SVM training
• batch training: QP problem, global optimum by KKT conditions
• incremental training: update existing KKT conditions
QP: quadratic programming
KKT: Karush-Kuhn-Tucker
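For reference, the textbook soft-margin SVM dual and its KKT conditions are given below. The talk does not show its exact formulation, so the symbols here ($\alpha_i$, $C$, $K$, $b$, $g_i$) are the usual ones and are an assumption. Incremental training adds or removes one sample and adjusts the $\alpha_i$ so that these conditions continue to hold for all samples, rather than re-solving the QP from scratch.

```latex
\begin{aligned}
&\min_{\alpha}\; \frac{1}{2}\sum_{i}\sum_{j} \alpha_i \alpha_j y_i y_j K(x_i, x_j) - \sum_{i} \alpha_i
\quad \text{s.t.}\quad 0 \le \alpha_i \le C,\;\; \sum_{i} y_i \alpha_i = 0 \\[4pt]
&f(x) = \sum_{j} \alpha_j y_j K(x_j, x) + b, \qquad g_i = y_i f(x_i) - 1 \\[4pt]
&\text{KKT:}\quad
g_i \ge 0 \;\text{if}\; \alpha_i = 0, \qquad
g_i = 0 \;\text{if}\; 0 < \alpha_i < C, \qquad
g_i \le 0 \;\text{if}\; \alpha_i = C
\end{aligned}
```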

Hardware architecture
• incremental SVM update: dense linear algebra
• dataflow architecture updating SVM coefficients
  - supports both incremental and decremental training
  - all operations with quadratic time complexity parallelised
  - scalable
Prototype on Maxeler Dataflow Engine: over 40 times faster than optimized software

4. Monitoring: in-circuit assertions
• assertions: monitoring standard datapath
  - Boolean expressions: properties of function, timing…
  - when true, standard datapath is behaving as expected
  - in-circuit: runs at the same rate as standard datapath
  - propagate to software as extra outputs
• example: statistical assertions
  - adaptation can depend on signal statistics
  - assertion language: e = a | uop e | e bop e | mean(e) | variance(e) | …
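To make the assertion grammar concrete, here is a small software model of it in Python. It is a sketch only: the class names are hypothetical, atoms are modelled as named signals or numeric constants, and `mean`/`variance` are assumed to be running statistics over all samples seen so far, which the slide does not specify. The real assertions are compiled to hardware that runs alongside the datapath.

```python
import operator
from dataclasses import dataclass
from statistics import mean, pvariance
from typing import Any, Callable


@dataclass
class Signal:          # atom `a`: a named signal in the datapath
    name: str


@dataclass
class Unary:           # `uop e`
    op: Callable[[Any], Any]
    arg: Any


@dataclass
class Binary:          # `e bop e`
    op: Callable[[Any, Any], Any]
    left: Any
    right: Any


@dataclass
class Mean:            # `mean(e)`
    arg: Any


@dataclass
class Variance:        # `variance(e)`
    arg: Any


def evaluate(expr, trace, cycles):
    """Return the value of `expr` at each of the first `cycles` clock cycles."""
    if isinstance(expr, (int, float)):          # numeric constant atom
        return [expr] * cycles
    if isinstance(expr, Signal):
        return list(trace[expr.name][:cycles])
    if isinstance(expr, Unary):
        return [expr.op(v) for v in evaluate(expr.arg, trace, cycles)]
    if isinstance(expr, Binary):
        left = evaluate(expr.left, trace, cycles)
        right = evaluate(expr.right, trace, cycles)
        return [expr.op(a, b) for a, b in zip(left, right)]
    if isinstance(expr, (Mean, Variance)):
        values = evaluate(expr.arg, trace, cycles)
        stat = mean if isinstance(expr, Mean) else pvariance
        # Running statistic over all samples up to and including cycle i.
        return [stat(values[: i + 1]) for i in range(len(values))]
    raise TypeError(f"unsupported expression: {expr!r}")
```

For example, `Binary(operator.lt, Variance(Signal("x")), 1.0)` asserts that the running variance of `x` stays below 1; evaluating it over a trace gives one Boolean per cycle, mirroring how an assertion result is propagated as an extra output.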

Statistical assertion: self-adaptive system
[figure]

Case study: smart avionics
• air-speed sensor: Pitot tube
• can fail when frozen
Figure annotation: could still fail

True airspeed: statistical check
• true airspeed: important input to avionics
• statistics on true airspeed: indicate sensor failure
  - trigger self-adaptation
• standard datapath: calculate true airspeed for sensor
  - monitored by in-circuit variance operators
Figure label: standard datapath
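The idea of a variance-based check can be illustrated in software as follows. This is a sketch under assumptions: the sliding window, the window length and the variance bounds are hypothetical choices, and the talk's in-circuit operators may compute the statistic differently. A frozen sensor that reports a constant value would drive the variance towards zero, while an erratic one would push it above the upper bound.

```python
from collections import deque


class VarianceAssertion:
    """Checks that the variance of the last `window` samples stays within bounds."""

    def __init__(self, window, low, high):
        self.samples = deque(maxlen=window)
        self.low = low
        self.high = high

    def update(self, value):
        """Add one sample; return True while the assertion holds."""
        self.samples.append(value)
        if len(self.samples) < self.samples.maxlen:
            return True  # not enough samples yet to judge
        avg = sum(self.samples) / len(self.samples)
        var = sum((s - avg) ** 2 for s in self.samples) / len(self.samples)
        return self.low <= var <= self.high
```

A failing assertion would be the trigger for self-adaptation, for instance switching to another airspeed estimate.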

Assertions: efficient implementation
• properties to be monitored
  - functional
  - statistical
  - timing
• run-time hardware monitoring
  - high-level description: assertion
  - same speed as hardware to be monitored
  - provably-correct optimisation
  - minimum area overhead: O(N) -> O(log N)

Optimising assertion: correctness-preserving transformation
• obvious (S = )
• efficient
• proof: use algebraic transformations in Ruby language

Assertions: efficient implementation
• properties to be monitored
  - functional
  - statistical
  - timing
• run-time hardware monitoring
  - high-level description: assertion
  - same speed as hardware to be monitored
  - provably-correct optimisation
  - minimum area overhead: O(N) -> O(log N) -> O()
  - minimise compile time

Self-monitoring without overhead
• user circuit implements standard datapath
• add monitoring to user circuit
  - introduce new Monitoring Circuit
  - without modifying user design
  - use only spare resources on chip
• accelerate Monitoring Circuit
  - pipeline its input connections
Figure label: Monitoring Circuit

New design flow: resource-aware development
Figure label: Monitoring Circuit (XDL)
• minimum area overhead: O(N) -> O()
• reduce compile time

Self-monitoring circuitry: results
• pipeline the connections to the added hardware to reduce or eliminate impact on timing
• up to 3.9 times faster on large circuits (LEON3 CPU) than re-compilation
Figure labels: Assertion: PC in range; Assertion: statistics of AES output

Summary
• current and future work
  - tools: automate implementation and verification
  - applications: adaptive and resilient systems
  - extension: assertion management and optimisation
  - unification: with self-tuning control, approx. computing…
  - prototyping: next-generation data centres, planes, drones…
• future computing systems: custom dataflow machines
  - incremental machine learning: self-optimisation
  - assertion-based monitoring: self-verification
  - resource-aware development: reduced space/time overhead