Rapid Exploration of Accelerator-Rich Architectures: Automation from Concept to Prototyping















![PARADE: Platform for Accelerator-Rich Architectural Design & Exploration [ICCAD 15]](https://slidetodoc.com/presentation_image_h2/94a2faa1be19708cb48f1ea21b0eab2c/image-16.jpg)

![Contributions: WIICA accelerator workload characterization [ISPASS'13] and MachSuite accelerator benchmark suite](https://slidetodoc.com/presentation_image_h2/94a2faa1be19708cb48f1ea21b0eab2c/image-18.jpg)

- Slides: 19
Rapid Exploration of Accelerator-Rich Architectures: Automation from Concept to Prototyping
David Brooks, Jason Cong, Zhenman Fang, Yakun Sophia Shao, and Sam Xi
Harvard University & UCLA
Tutorial Outline

| Time | Topic | Speaker |
| --- | --- | --- |
| 8:30 am – 9:00 am | Accelerator Research Infrastructure Overview | Sophia Shao |
| 9:00 am – 9:30 am | Aladdin: Accelerator Pre-RTL Modeling | Sophia Shao |
| 9:30 am – 10:00 am | Rapid Hardware Specialization with HLS: Glass Half Full | Prof. Zhiru Zhang |
| 10:00 am – 10:30 am | PARADE: HLS-Based Accelerator-Rich Architecture Simulation | Zhenman Fang |
| 10:30 am – 11:00 am | Break | |
| 11:00 am – 11:30 am | gem5-Aladdin: Accelerator System Co-Design | Sam Xi |
| 11:30 am – 12:00 pm | ARAPrototyper: FPGA Prototyping | Zhenman Fang |
| 12:00 pm – 13:30 pm | Lunch | |
| 13:30 pm – 14:00 pm | Virtual Machine Setup | Sophia Shao & Sam Xi |
| 14:00 pm – 14:30 pm | Hands-on: Accelerator Design Space Exploration using Aladdin | Sophia Shao |
| 14:30 pm – 15:00 pm | Hands-on: SoC Design Space Exploration using gem5-Aladdin | Sam Xi |
Moore’s Law
CMOS Scaling is Slowing Down
Process nodes shown: 180 nm, 130 nm, 90 nm, 65 nm, 45 nm, 32 nm, 22 nm, 14 nm, 10 nm
Source: http://www.anandtech.com/show/9447/intel-10nm-and-kaby-lake
CMOS Technology Scaling: Technological Fallow Period
Potential for Specialized Architectures [Zhang and Brodersen]
Labeled chips in the figure:
- 16: Encryption
- 17: Hearing Aid
- 18: FIR for disk read
- 19: MPEG Encoder
- 20: 802.11 Baseband
Cores, GPUs, and Accelerators: Apple A8 SoC
Out-of-Core Accelerators (figure labels: Maltiel Consulting estimates; our estimates)
Challenges in Accelerators
- Flexibility: Fixed-function accelerators are designed only for their target applications.
- Programmability: Today’s accelerators are explicitly managed by programmers.
Today’s SoC: OMAP 4 SoC
Today’s SoC: OMAP 4 SoC (block diagram)
- Blocks: ARM Cores, Audio DSP, Video DSP, Face Imaging, GPU, DMA, USB, SD
- Interconnect: System Bus, Secondary Buses, Tertiary Bus
Challenges in Accelerators
- Flexibility: Fixed-function accelerators are designed only for their target applications.
- Programmability: Today’s accelerators are explicitly managed by programmers.
- Design Cost: Accelerator (and RTL) implementation is inherently tedious and time-consuming.
Today’s SoC (simplified view): CPU, GPU/DSP, and several accelerators (Acc), connected by buses to the memory interface
Future Accelerator-Centric Architectures
Components: Big Cores, Small Cores, GPU/DSP, Sea of Fine-Grained Accelerators, Shared Resources, Memory Interface
- How to decompose applications into accelerators?
- How to rapidly design lots of accelerators?
- How to design and manage the shared resources?
Challenges addressed: Flexibility, Design Cost, Programmability
PARADE: Platform for Accelerator-Rich Architectural Design & Exploration [ICCAD 15]
- Extended gem5 (McPAT) for x86 CPU, with OS
- Auto-generated accelerators based on HLS (AutoPilot)
- Added SPM, DMA, GAM & TLB models
- Extended Garnet (DSENT) for NoC
- Extended Ruby (CACTI) for coherent cache hierarchy
- gem5 memory model [ISPASS 14]
ARAPrototyper: Prototyping an ARA on FPGA
- Using Xilinx Zynq SoC (FPGA fabric + ARM)
- Major components of an ARA:
  - General processor cores
  - A sea of heterogeneous accelerators
  - Memory system + interconnects (NoC)
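The three major ARA components listed above can be pictured as a small configuration record. The following is only an illustrative sketch: the class names, fields, kernel names, and sizes are all hypothetical and do not come from ARAPrototyper or any of the tools in this tutorial.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Accelerator:
    """One fixed-function accelerator (all fields are hypothetical)."""
    name: str      # kernel the accelerator implements, e.g. "fir"
    spm_kib: int   # assumed private scratchpad size in KiB


@dataclass
class ARAConfig:
    """Toy description of an accelerator-rich architecture (ARA)."""
    num_cores: int                                   # general processor cores
    accelerators: List[Accelerator] = field(default_factory=list)
    interconnect: str = "NoC"                        # memory system + interconnect

    def summary(self) -> str:
        # Report the distinct accelerator kinds in sorted order.
        kinds = sorted({a.name for a in self.accelerators})
        return (f"{self.num_cores} core(s), "
                f"{len(self.accelerators)} accelerator(s) "
                f"[{', '.join(kinds)}], {self.interconnect}")


def build_example() -> ARAConfig:
    # Hypothetical instance: two cores and three accelerators of two kinds.
    return ARAConfig(
        num_cores=2,
        accelerators=[
            Accelerator("fir", 16),
            Accelerator("aes", 8),
            Accelerator("fir", 16),
        ],
    )


if __name__ == "__main__":
    print(build_example().summary())
```

A record like this only names the pieces. A real prototyping or simulation flow would also need to describe how accelerators reach memory and how they are scheduled, which this sketch leaves out.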
Contributions
- WIICA: Accelerator Workload Characterization [ISPASS'13]
- MachSuite: Accelerator Benchmark Suite [IISWC'14]
- Aladdin: Accelerator Pre-RTL, Power-Performance Simulator [ISCA'14, TopPicks'15]
- Accelerator Design w/ High-Level Synthesis [ISLPED'13_1]
- gem5-Aladdin: Accelerator-System Co-Design [MICRO'16]
Diagram labels: Big Cores, Small Cores, GPU/DSP, Shared Resources, Memory Interface, Sea of Fine-Grained Accelerators