Building Fake Body Parts: Digital Mockups (Frank Vahid)

Building Fake Body Parts: Digital Mockups
Frank Vahid, Univ. of California, Riverside
Support provided by NSF, SRC, and CareFusion

Building fake body parts
• How test medical equipment software?
http://www.nhlbi.nih.gov/

Simulation: Slow/Inaccurate
• Weibel lung complexity
  – 4 gen: 32 ODEs
  – 6 gen: 128 ODEs
  – 8 gen: 512 ODEs
  – 10 gen: 2048 ODEs
• Accurate simulation is slow
  – 2-3 minutes to simulate one breath accurately
  – Decrease accuracy for real-time
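The four data points above are consistent with the ODE count doubling for every added airway generation. A minimal sketch of that growth, assuming the pattern 2^(generations + 1) holds between and beyond the listed points (the slide gives only the four values, and the function names are hypothetical):

```c
#include <stdio.h>

/* Number of ODEs for a Weibel lung model with `generations` airway
 * generations. ASSUMPTION: the count is 2^(generations + 1), a pattern
 * fitted to the four values on the slide (4 -> 32, 6 -> 128, 8 -> 512,
 * 10 -> 2048); the slide itself does not state a formula. */
unsigned long weibel_ode_count(unsigned generations)
{
    return 1UL << (generations + 1);
}

/* Hypothetical helper: prints one line per even generation count,
 * from 4 up to max_generations. */
void print_weibel_table(unsigned max_generations)
{
    for (unsigned g = 4; g <= max_generations; g += 2)
        printf("%2u gen: %lu ODEs\n", g, weibel_ode_count(g));
}
```

Calling print_weibel_table(10) lists the four cases from the slide; larger arguments show how quickly a more detailed lung model grows.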

Mockups
• Device: transducers, processing core (physical phenomena, digital communication)
• Physical mockup: http://www.youtube.com/watch?feature=player_embedded&v=rb0ik1HopBk
• Digital mockup: physical phenomena disconnected; transducer models and an environment model connect to the device through intercepted transducer packets
• How run in real-time?

Physical models are inherently parallel
[Figure: ODE dependency graph with nodes V[1], F[1]; V[2], F[2]; V[7], F[7]]

GPUs
• Tried, failed
  – GPU research group also
  – (results later)

FPGAs: Sw circuits (parallel)
• C code for FIR filter:
    for (i = 0; i < 128; i++)
        y[i] += c[i] * x[i];
• [Figure: circuit for FIR filter, a row of multipliers (*) feeding adders (+)]
• Processor
  – 1000's of instructions
  – Several thousand cycles
• FPGA
  – ~7 cycles (though slower clock)
  – Speedup > 10x-100x
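For reference, here is a self-contained sketch of the multiply-accumulate that the circuit's multipliers and adders suggest. It assumes the dot-product form of one FIR output sample, which differs slightly from the per-element accumulation written in the slide's loop; the function name and signature are hypothetical.

```c
#include <stddef.h>

/* One FIR output sample: multiply each coefficient by its input sample
 * and sum the products. A processor executes the taps one after another;
 * an FPGA circuit can perform the multiplications side by side and then
 * combine them in an adder tree.
 * ASSUMPTION: dot-product form, not the slide's y[i] accumulation. */
double fir_sample(const double *c, const double *x, size_t taps)
{
    double acc = 0.0;
    for (size_t i = 0; i < taps; i++)
        acc += c[i] * x[i];   /* one multiply-accumulate per tap */
    return acc;
}
```

With 128 taps the loop body runs 128 times in sequence on a processor, which is where the slide's "1000's of instructions" comes from.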

FPGAs "101" (A Quick Intro)
[Figure: functions F and G of inputs a and b stored in a 4x2 memory LUT; LUTs connected through 2x2 switch matrices (SM) to form an FPGA]

Differential Equation Processing Element
• General PE
• Diffeq can't be solved exactly
• Use iterative approximation (Euler, RK4)
• Computes equation solutions at given timestep (e.g. 0.1 ms timesteps).
[Figure: FPGA digital mockup containing a DEPE, connected to the device under test]
Huang, Vahid, Givargis. A Custom FPGA Processor for Physical Model Ordinary Differential Equation Solving. Embedded Systems Letters, Dec, 2011.
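The two iterative methods named above can be sketched in software as follows. This is a generic illustration for a scalar ODE y' = f(t, y), not the DEPE's actual datapath; the test equation y' = -y and all function names are assumptions made for the example.

```c
#include <stddef.h>

/* Right-hand side of a scalar ODE y' = f(t, y). */
typedef double (*ode_rhs)(double t, double y);

/* One forward-Euler step of size h. */
double euler_step(ode_rhs f, double t, double y, double h)
{
    return y + h * f(t, y);
}

/* One classical fourth-order Runge-Kutta (RK4) step of size h. */
double rk4_step(ode_rhs f, double t, double y, double h)
{
    double k1 = f(t, y);
    double k2 = f(t + h / 2, y + h / 2 * k1);
    double k3 = f(t + h / 2, y + h / 2 * k2);
    double k4 = f(t + h, y + h * k3);
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4);
}

/* Hypothetical test equation y' = -y, whose exact solution is exp(-t). */
double decay(double t, double y)
{
    (void)t;   /* the right-hand side does not depend on t */
    return -y;
}

/* Advance from t = 0 over `steps` timesteps of size h with the
 * chosen stepper (euler_step or rk4_step). */
double integrate(double (*step)(ode_rhs, double, double, double),
                 ode_rhs f, double y0, double h, size_t steps)
{
    double y = y0;
    for (size_t i = 0; i < steps; i++)
        y = step(f, (double)i * h, y, h);
    return y;
}
```

For example, integrate(rk4_step, decay, 1.0, 0.1, 10) approximates exp(-1) far more closely than integrate(euler_step, decay, 1.0, 0.1, 10), at the price of four right-hand-side evaluations per step instead of one.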

Single DEPE
• CPU(1), (4): Pentium IV, 3.0 GHz
• DEPE: Xilinx Virtex 6-240T
• Microblaze: 2000-4000 LUTs.

Homogeneous network of general PEs
• Map ODEs to homogeneous PE network
• ODE dependency graph
• Scheduling
[Figure: a synthesis tool maps the ODE dependency graph (V[1], F[1]; V[2], F[2]; V[7], F[7]) onto PE 1, PE 2, PE 3 of the FPGA digital mockup; 100s of PEs]
Huang, Vahid, Givargis. 2012. Synthesis of networks of custom processing elements for real-time physical system emulation. Transactions on Design Automation of Electronic Systems (TODAES). *To Appear (Dec-2012)

Homogeneous network of general PEs
[Figure: FPGA digital mockup]

Homogeneous network of general PEs
• ODE mapping via simulated annealing
[Figure: mappings after 10K iterations and after 150K iterations]
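A small sketch of simulated annealing applied to an ODE-to-PE mapping is given below. Everything specific in it is an assumption for illustration: the problem size, the chain-shaped dependency list, the cost function (busiest-PE load plus dependencies that cross PEs), the cooling schedule, and the libm-free approximation of exp(-x). The deck does not state the authors' actual cost function or parameters.

```c
#define N_ODE 8   /* hypothetical problem size */
#define N_PE  4
#define N_DEP 7

/* Hypothetical dependency edges of a small chain-like ODE model. */
static const int dep[N_DEP][2] = {
    {0, 1}, {1, 2}, {2, 3}, {3, 4}, {4, 5}, {5, 6}, {6, 7}
};

/* Small linear congruential generator so that runs are repeatable. */
static unsigned long rng_state = 12345UL;
static unsigned rng_next(void)
{
    rng_state = (rng_state * 1103515245UL + 12345UL) & 0x7fffffffUL;
    return (unsigned)(rng_state >> 8);          /* 23-bit value */
}

/* Approximates exp(-x) for x >= 0 as (1 - x/1024)^1024 (no libm). */
static double exp_neg(double x)
{
    double b = 1.0 - x / 1024.0;
    if (b <= 0.0)
        return 0.0;
    for (int i = 0; i < 10; i++)
        b *= b;                                  /* square 10 times */
    return b;
}

/* ASSUMED cost: number of ODEs on the busiest PE (a proxy for cycles
 * per timestep) plus the number of dependencies crossing PEs. */
int mapping_cost(const int *map)
{
    int load[N_PE] = {0}, worst = 0, cut = 0;
    for (int i = 0; i < N_ODE; i++)
        if (++load[map[i]] > worst)
            worst = load[map[i]];
    for (int e = 0; e < N_DEP; e++)
        if (map[dep[e][0]] != map[dep[e][1]])
            cut++;
    return worst + cut;
}

/* Anneals map[] in place; on return map[] holds the best mapping seen
 * and the function result is its cost. */
int anneal(int *map, int iterations)
{
    int best[N_ODE], cur = mapping_cost(map), best_cost = cur;
    double temp = 2.0;                           /* assumed start temperature */
    for (int i = 0; i < N_ODE; i++)
        best[i] = map[i];

    for (int it = 0; it < iterations; it++) {
        int ode = (int)(rng_next() % N_ODE);
        int old_pe = map[ode];
        map[ode] = (int)(rng_next() % N_PE);     /* propose moving one ODE */
        int next = mapping_cost(map);
        int delta = next - cur;
        double u = rng_next() / 8388608.0;       /* uniform in [0, 1) */
        if (delta <= 0 || u < exp_neg(delta / temp)) {
            cur = next;                          /* accept the move */
            if (cur < best_cost) {
                best_cost = cur;
                for (int i = 0; i < N_ODE; i++)
                    best[i] = map[i];
            }
        } else {
            map[ode] = old_pe;                   /* reject and undo */
        }
        temp *= 0.999;                           /* assumed geometric cooling */
    }
    for (int i = 0; i < N_ODE; i++)
        map[i] = best[i];
    return best_cost;
}
```

Starting from all ODEs on one PE and calling anneal(map, 10000) spreads the equations across PEs while keeping dependent equations together; more iterations give the search more chances to escape poor mappings, which is what the 10K versus 150K comparison illustrates.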

Homogeneous network of general PEs
[Figure]

Homogeneous network of general PEs – FPGA Usage
• 150K LUTs available on Virtex 6-240T
• Demo: http://www.youtube.com/watch?v=ThUKVhqoA3Q

Custom Processing Element
• Custom PE
• Custom datapath to solve specific type of equation
• Custom PE for each ODE type
  – V' = F1 - F2
  – F' = P1-P2-(F*CR)*CL
[Figure: PE datapath with inputs, Input_sel, data RAM (Address, We), controller, MUL and SUB units, Const ROM (Address), and output, inside the FPGA digital mockup]
Huang, Vahid, Givargis. 2012. Synthesis of networks of custom processing elements for real-time physical system emulation. Transactions on Design Automation of Electronic Systems (TODAES). *To Appear (Dec-2012)
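The two equation types on this slide can be written out in software as a rough illustration of what each custom datapath computes. The expressions are transcribed literally from the slide; the struct, the function names, and the use of a single Euler update are assumptions, and the actual PE is a hardware datapath rather than C code.

```c
/* V' = F1 - F2 (one SUB). */
double dv_dt(double f1, double f2)
{
    return f1 - f2;
}

/* F' = P1 - P2 - (F*CR)*CL, transcribed literally from the slide
 * (two MULs and two SUBs). CR and CL are constants, presumably held
 * in the PE's constant ROM. */
double df_dt(double p1, double p2, double f, double cr, double cl)
{
    return p1 - p2 - (f * cr) * cl;
}

/* Hypothetical pair of state variables handled together. */
typedef struct {
    double v;
    double f;
} pe_state;

/* One Euler update of both states with timestep h (an assumption;
 * the deck also mentions RK4). */
pe_state custom_pe_step(pe_state s, double f1, double f2,
                        double p1, double p2,
                        double cr, double cl, double h)
{
    pe_state next;
    next.v = s.v + h * dv_dt(f1, f2);
    next.f = s.f + h * df_dt(p1, p2, s.f, cr, cl);
    return next;
}
```

Because each function contains only the operations its equation needs, a hardware version needs no instruction fetch or general-purpose ALU, which is the source of the custom PE's speed and of its inflexibility.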

Custom Processing Element
[Figure]

Custom Processing Element – FPGA Usage
[Figure]

Networks of Heterogeneous Processing Elements
• General PE:
  – Slow, flexible (can solve any type of ODE)
• Custom PE:
  – Fast, inflexible (only solves one type of ODE)
• Multi-Type PE
  – Combines multiple types of ODEs into a single custom PE
Huge solution space:
• How to choose types of PEs?
• How many PEs to allocate?
• How to bind ODEs to PEs?
[Figure: FPGA digital mockup]
Huang, Miller, Vahid, Givargis. Synthesis of Heterogeneous Processing Elements for Physical System Emulation. CODES+ISSS 2012, Oct, 2012.

Automatic allocation and binding
[Flowchart: an initial random allocation goes to the simulated annealing ODE-to-PE mapper; the cycles of each PE feed the PE allocator, which proposes a new PE allocation; a "Better solution?" check (Y/N) keeps the best solution]

Networks of Heterogeneous Processing Elements
[Figure]

Heterogeneous Networks – FPGA Usage
[Figure]

Network of PEs vs GPU and PC
Speedup vs real-time:
• PC(1): 0.76x
• PC(4): 3.07x
• GPU: 1.63x
• HLS: 3.23x
• General PE: 4.94x
• Custom PE: 6.1x
• Hetero PE: 34.5x
Chart labels: 1430, 1490, 1522, 1184

Network of general/custom/heterogeneous PEs vs HLS (regularity extraction)
• Performance (ms): time to emulate 1000 ms, using Euler with 0.01 ms step.
• Size (equivalent LUTs)
• Heterogeneous PE, as (speed, size) relative to each alternative:
  – (10x, 1.1x) vs HLS
  – (7x, 0.85x) vs general PE
  – (6x, 1.35x) vs custom PE

Speedup / dollar
• Heterogeneous PEs:
  – 3x better than PC(4)
  – 4.5x better than GPU
• FPGA: Easier to build custom interfaces
• Platform cost:
  – CPU (I7-950 + Intel X58 board): $480
  – GPU (GTX 460 + I3-540 + H55 board): $380
  – FPGA (Xilinx Virtex 6 240T-2 board): $1800
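The 3x and 4.5x figures can be reproduced from the prices on this slide together with the speedups from the earlier "Network of PEs vs GPU and PC" slide. The sketch below assumes that PC(4) corresponds to the $480 CPU platform and that the heterogeneous PE network runs on the $1800 FPGA board; the type and function names are hypothetical.

```c
/* One hardware platform: its speedup vs real-time and its price. */
typedef struct {
    const char *name;
    double speedup;    /* from the speedup results slide */
    double cost_usd;   /* from this slide */
} platform;

/* Speedup obtained per dollar spent. */
double speedup_per_dollar(platform p)
{
    return p.speedup / p.cost_usd;
}

/* How many times better platform a is than platform b per dollar. */
double advantage(platform a, platform b)
{
    return speedup_per_dollar(a) / speedup_per_dollar(b);
}
```

With Hetero PE = (34.5x, $1800), PC(4) = (3.07x, $480), and GPU = (1.63x, $380), advantage() gives roughly 3.0 against PC(4) and roughly 4.5 against the GPU, in line with the slide.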

Current: Embedding-based placement of networks
• Most physical models have a regular structure
  – Meshes, trees, grids, etc.
• We can apply theoretical graph embedding techniques to embed models into FPGA
  – Minimal network dilation
[Figure: heart cells, lungs, and neuron mesh models embedded into an FPGA]
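Dilation, the quantity being minimized above, is the largest distance in the host network between two nodes that are neighbors in the model graph. A minimal sketch of measuring it, assuming PEs sit on a 2D grid and distance is Manhattan distance (both assumptions, as are the type and function names):

```c
#include <stdlib.h>

/* Position of a PE on the FPGA, modeled as a 2D grid (assumption). */
typedef struct {
    int row;
    int col;
} grid_pos;

/* One connection between two nodes of the model graph. */
typedef struct {
    int a;
    int b;
} edge;

/* Dilation of a placement: the largest grid distance between the
 * positions of two model nodes that share an edge. place[i] is the
 * grid position assigned to model node i. */
int dilation(const edge *edges, int n_edges, const grid_pos *place)
{
    int worst = 0;
    for (int e = 0; e < n_edges; e++) {
        grid_pos p = place[edges[e].a];
        grid_pos q = place[edges[e].b];
        int d = abs(p.row - q.row) + abs(p.col - q.col);  /* Manhattan */
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

For a four-node chain 0-1-2-3 on a 2x2 grid, a snake-shaped placement keeps every neighbor adjacent (dilation 1), while a row-by-row placement separates nodes 1 and 2 by two hops (dilation 2).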

Embedding-based placement of networks
• Physical model equations: Eq. P1/Eq. V1 through Eq. P7/Eq. V7
• No placement strategy: simulated annealing placement
• Embedding placement:
  – Map equations to virtual PEs (structured virtual PE graph)
  – Map virtual PEs to physical PEs via embedding (physical placement)

Embedding-based placement of networks
[Figure; label: "Not routable"]
Work submitted to FPGA'13 (Miller/Vahid/Givargis)

Other projects
• Assistive monitoring
  – www.cs.ucr.edu/~vahid/assistivemonitoring/
  – http://www.youtube.com/watch?feature=player_embedded&v=Sf8tU-78lXs
• Web-based learning
  – "Textbook is dead"
  – pcpp.zyante.com (C++)
• Embedded systems educ
  – New prog. model, virtual lab
  – Also riosscheduler.org
• Drunk driving (DUI)
  – duicam.org

https://docs.google.com/file/d/0B7I3PmI9QsJTM2MzY2QyYWQtZjk4Mi00YWE0LTk1NzQtZTUwMTM5ZDA5ZDc5/edit
• Fastest cost-effective execution of physical models
• Real-time (or faster) cyberphysical system testing
• Scientific research
• More apps
Contributors
• Chen Huang (UC Riverside, now Amazon)
• Bailey Miller (UC Riverside)
• Prof. Tony Givargis (UC Irvine)
• Ting-Shuo Chou (UC Irvine)
• Others...

Key contributors
• Chen Huang (UC Riverside, now Amazon)
• Bailey Miller (UC Riverside)
• Prof. Tony Givargis (UC Irvine)
• Ting-Shuo Chou (UC Irvine)
• Others...