Customizable Domain-Specific Computing
Proposal for NSF “Expedition in Computing” Program
Point of Contact: Prof. Jason Cong, cong@cs.ucla.edu
Participating Universities: UCLA (lead), Rice, Ohio State, and UC Santa Barbara
(Complete list of PIs/Co-PIs available inside)

Focus: Power/Energy Efficient Computation
Current Solution: Parallelization
Source: Shekhar Borkar, Intel

Our Proposal: Beyond Parallelization – Customizable Domain-Specific Computing
Figure labels: Parallelization; Customization (adapt the architecture to the application)
Source: Shekhar Borkar, Intel

Motivation and Vision
• A few facts
  - We have sufficient computing power for most applications
  - Each user/enterprise needs high computing power for only limited tasks in his/her application domain
  - Application-specific integrated circuits (ASICs) can lead to 10,000x+ better power-performance efficiency, but are too expensive to design and manufacture
• Our vision and approach
  - A general, customizable platform for the given domain(s)
    * Can be customized to a wide range of applications in the domain with novel compilation and runtime systems
    * Can be mass-produced with cost efficiency
    * Can be programmed efficiently
• Goal: A “supercomputer-in-a-box” with 100x performance/power improvement via customization for the intended domain(s)
• Analogy: Advance of civilization via specialization/customization

Application Domains: Medical Image Processing & Hemodynamic Simulation
• Medical imaging has transformed healthcare
  - An in vivo method for understanding disease development and patient condition
  - Estimated at $100 billion/year
  - More powerful & efficient computation can help
    * Less exposure, using compressive sensing with a lower sampling frequency
    * Better clinical assessment, using improved registration and segmentation algorithms to provide quantitative measures of disease (e.g., cancer)
• Hemodynamic simulation
  - Very useful for surgical procedures involving blood flow and vasculature
• Both may take hours to days to construct
Figures: Magnetic resonance (MR) angiography of an aneurysm; intracranial aneurysm reconstruction with hemodynamics

Application Domains: Medical Image Processing Pipeline
Pipeline stages and representative algorithms:
  reconstruction: compressive sensing
  denoising: total variational algorithm
  registration: fluid registration
  segmentation: level set methods
  analysis: Navier-Stokes equations

Application Domains: Medical Image Processing Pipeline
Pipeline stages: reconstruction, denoising, registration, segmentation, analysis
• These algorithms have diverse computation & communication patterns
• A single, homogeneous system cannot perform very well on all of these algorithms
• Need architecture customization and hardware-software co-optimization
• Include many common computation kernels (“motifs”)
• Applicable to other domains

Algorithm characteristics:
  compressive sensing: iterative; local or global communication; dense and sparse linear algebra, optimization methods
  total variational algorithm: non-iterative, highly parallel; local & global communication; sparse linear algebra, structured grid, optimization methods
  fluid registration: parallel; global communication; dense linear algebra, optimization methods
  level set methods: local communication; dense linear algebra, spectral methods, MapReduce
  Navier-Stokes equations: local communication; sparse linear algebra, n-body methods, graphical models

Bi-harmonic registration (using the same algorithm on all platforms):
  Platform              Speedup   Power
  CPU (Xeon 2.0 GHz)    1x        ~100 W
  GPU (Tesla C1060)     93x       ~150 W
  FPGA (xc4vlx100)      11x       ~5 W

3D median filter: for each voxel, compute the median of the 3x3 neighboring voxels
  Platform              Algorithm                     Speedup   Power
  CPU (Xeon 2.0 GHz)    Quick select                  1x        ~100 W
  GPU (Tesla C1060)     Median of medians             70x       ~140 W
  FPGA (xc4vlx100)      Bit-by-bit majority voting    1200x     ~3 W

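The bit-by-bit majority voting approach named for the FPGA median filter can be illustrated in software. The following is a minimal Python sketch, not the implementation behind the slide's numbers: the function name `median_by_majority_voting`, the 8-bit voxel width, and the sample window are assumptions made for illustration. The idea is to build the median one bit at a time from the most significant bit down. At each bit position the majority bit across the window is the median's bit, and any value that disagrees with the majority is then known to lie above or below the median, so it votes with that fixed bit from then on.

```python
def median_by_majority_voting(values, bits=8):
    """Median of an odd-length list of unsigned ints, computed bit by bit.

    Illustrative software model of a bit-serial majority-voting scheme;
    the 8-bit default width is an assumption, not taken from the slides.
    """
    n = len(values)
    if n % 2 == 0:
        raise ValueError("window size must be odd")
    # forced[i] is None while value i still matches the median's prefix.
    # Once it disagrees with a majority bit it is known to be above (1)
    # or below (0) the median and votes with that constant bit afterwards.
    forced = [None] * n
    median = 0
    for b in range(bits - 1, -1, -1):  # most significant bit first
        votes = [
            ((v >> b) & 1) if f is None else f
            for v, f in zip(values, forced)
        ]
        majority = 1 if 2 * sum(votes) > n else 0
        median = (median << 1) | majority
        for i in range(n):
            if forced[i] is None and votes[i] != majority:
                forced[i] = votes[i]
    return median


# Hypothetical 3x3 window of 8-bit voxel intensities.
window = [12, 200, 15, 14, 13, 255, 0, 16, 11]
print(median_by_majority_voting(window))
```

Each iteration needs only a population count and a comparison per bit, with no sorting or data-dependent branching, which is one reason this formulation is a natural fit for custom hardware, whereas quick select suits a CPU.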
Overview of the Proposed Research
• Domain-specific modeling (healthcare applications)
  - Application modeling
  - Domain characterization
  - Architecture modeling
• CHP creation (design once)
  - Customizable computing engines
  - Customizable interconnects
• CHP mapping (invoke many times)
  - Source-to-source CHP mapper
  - Reconfiguring & optimizing backend
  - Adaptive runtime
• Customizable Heterogeneous Platform (CHP)
  - Reconfigurable RF-I bus
  - Reconfigurable optical bus
  - Transceiver/receiver
  - Optical interface

CHP Creation – Design Space Exploration
• Core parameters
  - Frequency & voltage
  - Datapath bit width
  - Instruction window size
  - Issue width
  - Cache size & configuration
  - Register file organization
  - # of thread contexts
  - …
• NoC parameters
  - Interconnect topology
  - # of virtual channels
  - Routing policy
  - Link bandwidth
  - Router pipeline depth
  - Number of RF-I enabled routers
  - RF-I channel and bandwidth allocation
  - …
• Custom instructions & accelerators
  - Amount of programmable fabric
  - Shared vs. private accelerators
  - Custom instruction selection
  - Choice of accelerators
  - …
• Customizable Heterogeneous Platform (CHP) diagram: caches ($), fixed cores, custom cores, programmable fabric, reconfigurable RF-I bus, reconfigurable optical bus, transceiver/receiver, optical interface
• Key questions
  - Optimal trade-off of efficiency & customizability
  - Which options to fix at CHP creation? Which to be set by the CHP mapper?

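To make the idea of design-space exploration concrete, here is a minimal Python sketch. Everything in it is hypothetical: the three knobs, their values, and the `toy_model` cost function are invented for illustration and do not come from the proposal. It enumerates every configuration, scores each with the toy performance/power model, and keeps the Pareto-optimal points, i.e. those that no other configuration beats on both performance and power.

```python
from itertools import product

# Hypothetical, tiny design space; a real CHP space has many more knobs
# (NoC topology, RF-I channel allocation, accelerator choice, ...).
DESIGN_SPACE = {
    "issue_width": [1, 2, 4],
    "cache_kb": [16, 64, 256],
    "accelerators": [0, 1, 2],
}


def toy_model(cfg):
    """Made-up analytical model returning (performance, power), arbitrary units."""
    perf = (
        cfg["issue_width"] ** 0.5
        * (1 + 0.1 * (cfg["cache_kb"] / 16) ** 0.5)
        * (1 + 2 * cfg["accelerators"])
    )
    power = cfg["issue_width"] + 0.01 * cfg["cache_kb"] + 0.5 * cfg["accelerators"]
    return perf, power


def enumerate_configs(space):
    """Yield every combination of parameter values as a dict."""
    keys = list(space)
    for combo in product(*(space[k] for k in keys)):
        yield dict(zip(keys, combo))


def pareto_front(points):
    """Keep points that no other point matches or beats on both axes
    (higher performance, lower power) while being strictly better on one."""
    front = []
    for cfg, perf, power in points:
        dominated = any(
            p2 >= perf and w2 <= power and (p2 > perf or w2 < power)
            for _, p2, w2 in points
        )
        if not dominated:
            front.append((cfg, perf, power))
    return front


points = [(cfg, *toy_model(cfg)) for cfg in enumerate_configs(DESIGN_SPACE)]
front = pareto_front(points)
best = max(front, key=lambda t: t[1] / t[2])  # best performance per unit power
print(len(points), "configurations,", len(front), "on the Pareto front")
print("best performance/power:", best[0])
```

Exhaustive enumeration only works for a toy space like this one. With the number of parameters listed on the slide, the space grows combinatorially, which is why deciding what to fix at CHP creation and what to leave to the CHP mapper matters.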
CHP Mapping – Compilation and Runtime Software Systems for Customization
Goals:
• Efficient compiler and runtime support to map domain-specification to customizable hardware
• Adapt the CHP to a given application for drastic performance/power efficiency improvement
Flow (from the diagram):
• Inputs: domain-specific applications; application characteristics (from abstract execution); CHP architecture models
• Programmer: domain-specific programming model (domain-specific coordination graph and domain-specific language extensions)
• Source-to-source CHP mapper: produces C/C++ code with analysis annotations and C/SystemC behavioral specs
• C/C++ front-end, then reconfiguring and optimizing back-end: binary code for fixed & customized cores
• RTL synthesizer (xPilot): RTL for the programmable fabric
• Customized target code runs on the adaptive runtime (lightweight threads and adaptive configuration)
• Targets: CHP architectural prototypes (CHP hardware testbeds, CHP simulation testbed, full CHP)
• Performance feedback is returned to the mapping flow

Center for Domain-Specific Computing (CDSC) Organization
A diversified & highly accomplished team: 8 in CS&E; 1 in EE; 2 in medical school; 1 in ap…
Team: Aberle, Baraniuk, Bui, Chang, Cheng, Cong (Director), Palsberg, Potkonjak, Reinman, Sadayappan, Sarkar (Associate …), Vese
Sites: UCLA, Rice, UCSB, Ohio State
Research thrusts and participants:
  Domain-specific modeling: Bui, Reinman, Potkonjak, Sarkar, Baraniuk
  CHP creation: Chang, Cong, Reinman
  CHP mapping: Cong, Palsberg, Cheng
  Application modeling: Aberle, Bui, Vese, Baraniuk
  Experimental systems: All (led by Cong & Bui)
  Also listed in the table: Sadayappan, Cheng, Sarkar, Potkonjak

Milestones (Year 1 to Year 5)
• Application modeling
  - Year 1: Form benchmark sets in medical imaging and hemodynamics & establish baseline results
  - Year 2: Demonstration of benchmark sets on Prototype 1a
  - Year 3: Model the benchmark sets on DSCG & DSLE and drive the CHP optimizations
  - Year 4: Demonstration of benchmark sets on optimized CHP runtime environment
  - Year 5: Evaluation of benchmarks on final CHP and quantification of the impact on real-world clinical data
• Domain specification (in chronological order)
  - Develop Domain-Specific Coordination Graph (DSCG) with abstract metrics
  - Implementation of DSCG+DSLE executable models for benchmark sets
  - Refinement of DSCG+DSLE executable models for benchmark sets
  - Public release of DSCG infrastructure and the DSCG+DSLE executable models for benchmark sets
• CHP creation (in chronological order)
  - CHP hierarchical simulation infrastructure
  - CHP initial design-space tuning; domain-specific component synthesis & selection
  - CHP design-space exploration with full system simulation
  - Refinement of CHP design-space exploration with detailed simulation
  - System integration
• CHP mapping (in chronological order)
  - Source-to-source CHP mapper for Prototype 1a; identification of abstract execution metrics to guide CHP exploration
  - Fine-grained task scheduling system with locality and load balance adaptations
  - Design of software reliability components
  - Reconfiguring and optimizing back-end transformations; phase-based adaptations in adaptive runtime
  - Demonstration of the full CHP mapping system on Prototypes 1a & 2
  - Support of software reliability

Milestones for Experimental Platforms
• Prototype 1a: Heterogeneous integration of off-the-shelf CMPs + GPUs + FPGAs, e.g.,
  - Intel Xeon CPU + Xilinx V5 FPGA (via FSB) + Nvidia Tesla GPU (via PCI Express 2.0)
  - Initial HW platform for CHP compilation and runtime system development
• Prototype 1b: RF-interconnect prototype
  - RF-I implementation at 45 nm CMOS with multiple digital cores/traffic generators
  - Performance, power, and reliability study
  - RF-I tape-out at IBM 90 nm CMOS
• Prototype 2: Final CHP implementation for the proposed healthcare domains
  - Single-chip integration or 3D integration

Integrated Research and Education
• New courses planned based on the research
  - “Architecture and Compilation for Domain-specific Computing”
  - “Computational Techniques for Medical Imaging”
  - “Programming Models and Application Development for Domain-specific Computing”
    * With projects for new domains, e.g., scientific computing, VLSI CAD, and digital entertainment
  - May be jointly taught (multi-disciplinary)
  - Developed and shared via Connexions (cnx.org), an open-access education platform now with over 1M users/month (based at Rice)
• Graduate student training
  - An estimated 18 students in total across four campuses
  - Seminars and workshops on interdisciplinary research, career development, ethics, entrepreneurship …
• Undergraduate student training
  - 10 summer research fellowships each year, via UCLA FOCUS, Rice AGEP, and similar programs
• Outreach to high-school students

Outreach Partner: Frontier Opportunities in Computing for Underrepresented Students (FOCUS)
• Aims to increase the number of underrepresented minorities interested in computing disciplines
• Currently has 50 underrepresented undergraduates:
  - 23 in CS
  - 27 in CSE
• 2007 summer research poster competition (photo: the first prize winner)
http://ceed.ucla.edu

Outreach Partner: Science Mathematics Achievement and Research Technology for Students (SMARTS)
• A six-week summer college preparation program at UCLA
  - Engages underrepresented students in science, technology, engineering, and math training
• SMARTS activities
  - Course-related activities
    * Math courses (Intro to Statistics and AP Calculus Readiness)
    * SAT preparation
  - Research activities
• CDSC faculty and graduate students will serve as mentors and provide projects
• This year, the SMARTS program has over 80 applicants
  - 30-35 will be admitted (due to limited funding)

Knowledge Transfer
• Main outcomes of the project
  1. CHP prototypes
  2. Compilation and runtime system for CHP mapping
  3. Application drivers: original source code & modified code with domain-specific modeling
  4. General methodology for customizable computing (mainly through publications)
  Items 1-3 will be shared with the research community via the web as they become available
• Industrial partners
  - Altera, IBM, Intel, Magma, Mentor Graphics, Nvidia, Xilinx
  - More will be contacted and included if the project is officially funded
• Campus partners
  - UCLA Institute for Digital Research and Education (IDRE)
  - Institute for Pure and Applied Mathematics (IPAM)
  - UCLA Wireless Health Institute (WHI)
• Technology transfer experience
  - Impact via industrial partners: IBM, Intel, Xilinx …
  - Startups: Aplus (acquired by Magma in 2003), AutoESL (Magma and Xilinx were investors)

Why an Expedition
• Address a fundamental problem: energy-efficient computing
  - What’s beyond parallelization?
  - Our proposal: a transformative approach using customization
• Many challenging research topics
  - Domain-specific modeling/specification
  - Novel architecture & microarchitecture for customization
  - Compilation and runtime software to support intelligent customization
  - New research in testing, verification, reliability, etc., in customizable computing
• Integrated effort in modeling, HW, SW, & application development
• Demonstration in a critical application domain
  - Healthcare has a significant impact on the economy and society