35.1 Reconfigurable Computing: What, Why, and Implications for Design Automation
André DeHon and John Wawrzynek
June 23, 1999
BRASS Project, University of California at Berkeley
www.cs.berkeley.edu/projects/brass

Outline
- Traditional Hardware vs. Software
- Characteristics of reconfigurable (RC) arrays
- Hybrid: Mixing and Matching
- Opportunities for Design Automation

Traditional Choice: Hardware vs. Software
- Hardware fast
  - “spatial execution”
  - fine-grained parallelism
  - no parasitic connections
- Hardware compact
  - operators tailored to function
  - simple control
  - direct wire connections between operators
But fixed!

Traditional Choice: Hardware vs. Software
- Software slow
  - sequential execution
  - overhead time “interpreting” operations
- Software inefficient in area
  - fixed width operators, may not match problem
  - general operators, bigger than required
  - area to store instructions, control execution
But flexible!

Reconfigurable Hardware
- RC hardware fast
  - spatial parallelism like hardware
  - problem specific operators, control
- RC hardware flexible
  - operators and interconnect programmable like software

Reconfigurable Hardware
- Flexibility comes at a cost:
  - area in:
    - switches
    - configuration
  - delay in:
    - switches (added resistance)
    - logic (more spread out)
    - modifying configuration (traditionally)
  - challenging “compiler” target

New Design Space

Important Distinction
- Instruction Binding Time
  - When do we decide what operation needs to be performed?
- General Principle
  - The earlier the decision is bound, the less area and delay the implementation requires.

Reconfigurable Advantage
1. Exploit cases where operation can be bound and then reused a large number of times.
2. Customization of operator type, width, and interconnect.
3. Flexible low overhead exploitation of application parallelism.

Specialization
- Late binding of operations
  - exploit cases where data can be “wired” into computation
  - narrows the performance gap between custom hardware and reconfigurable implementation
  - Example: Multiplication
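One way to read the multiplication example, offered here as an illustrative assumption rather than the authors' own design: when one operand is a known constant, it can be “wired” in, so the multiplier reduces to a few shifted copies of the other operand added together. The function names below are hypothetical.

```python
from typing import List


def shift_add_terms(constant: int) -> List[int]:
    """Return the positions of the 1 bits in a non-negative constant."""
    if constant < 0:
        raise ValueError("this sketch handles non-negative constants only")
    return [i for i in range(constant.bit_length()) if (constant >> i) & 1]


def multiply_by_constant(x: int, constant: int) -> int:
    """Multiply x by a fixed constant using only shifts and adds.

    Each 1 bit in the constant becomes one shifted copy of x; in a spatial
    implementation the shifts are just wiring, so only the adds cost logic.
    """
    return sum(x << shift for shift in shift_add_terms(constant))


# The constant 10 is binary 1010, so x * 10 == (x << 1) + (x << 3):
# two shifted copies and a single adder.
terms = shift_add_terms(10)
product = multiply_by_constant(7, 10)
print(terms, product)
```

A general multiplier must provide a partial product for every bit of both operands, whereas the specialized version only pays for the 1 bits of the constant it was bound to.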

Runtime Reconfiguration*
- Data-driven customization
  - ex: MPEG encode with partial reconfiguration between (I, P, B) frame types (every 33 ms)
- Hardware Virtualization
  - demand paging, like virtual memory
- Dynamic specialization
  - ex: bind program variables on loop entry
* FPGAs are poor at supporting this. All very experimental.
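The demand-paging analogy can be sketched as a small simulation. This is a hypothetical model, not something described in the slides: the array is assumed to hold a fixed number of resident configurations, a missing configuration is loaded on demand, and the least recently used one is evicted. The function name, the slot model, and the LRU policy are all assumptions.

```python
from collections import OrderedDict
from typing import Iterable


def count_reconfigurations(requests: Iterable[str], resident_slots: int) -> int:
    """Count configuration loads under LRU demand paging.

    requests: sequence of configuration names, in the order they are needed.
    resident_slots: how many configurations fit on the array at once (assumed).
    """
    if resident_slots < 1:
        raise ValueError("need at least one resident slot")
    resident = OrderedDict()  # keys ordered from least to most recently used
    loads = 0
    for name in requests:
        if name in resident:
            resident.move_to_end(name)    # hit: mark as most recently used
            continue
        if len(resident) >= resident_slots:
            resident.popitem(last=False)  # miss with full array: evict LRU
        resident[name] = True
        loads += 1                        # each miss costs one reconfiguration
    return loads


# Hypothetical request stream named after the MPEG frame types on the slide.
frames = ["I", "P", "B", "P", "B", "I"]
print(count_reconfigurations(frames, resident_slots=2))
```

As with virtual memory, the number of loads depends on how the working set compares with the resident capacity: with three slots each configuration in this stream is loaded once, while with one slot every request forces a reconfiguration.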

Programmable Device Space
- Two important variables:
  - “instruction” or context depth, c
  - operator word width, w

Programmable Application Space
[Figure: yield for FPGA (c=w=1) and “Processor” (c=1024, w=64)]
- Bit-level, reconfigurable organization is complementary to processors

Case for Hybrid Architectures
- In general, applications have a mix of word sizes and binding times
- ...and even a mix of fixed and variable processing requirements
- Previous slide suggests no single architecture is robust across the entire space
=> Need heterogeneous components to best

Heterogeneous Architecture

Design Automation Opportunities
- Currently, a limiter to the advancement of this technology is the state of the software flow.
- The ideal is HLL compilation with a short compile/debug cycle.
  - Must combine elements of parallelizing compilers,
    - thread- and ILP-level parallelism extraction
  - with elements of hardware/software co-design,
    - partitioning of “circuits” for RC array from “software” for processor
    - coordination of memory accesses

Design Automation Opportunities
  - and elements of FPGA and ASIC CAD.
    - low-level spatial mapping (PPR)
    - more importance on pipelining/retiming
    - fixed resource constraints: wire tracks, memory/compute ratio preallocated
- Flexible nature of the RC array encourages other optimizations:
  - specialization of circuit instances around early bound data
  - fast, online algorithms to support run-time specialization

Design Automation Opportunities
- Most importantly, the tools must run fast
  - development requirements similar to software-only environment
  - need to better understand tool quality/time tradeoff
- Short of complete integrated HLL compilation
  - “hand partitioning” between processor and RC array
  - combined FPGA flow with HLL
  - library based approach

Summary
- Reconfigurable architectures
  - spatial computing style like hardware
  - programmable like software
  - more computation per unit area than processors
  - efficient where processors are inefficient
- Heterogeneous architectures (mix processors, reconfigurable, custom)
  - “general-purpose” and “application-targeted” processing components
- Exploiting these architectures: new opportunities for DA optimization.

Extra Slides

Brief History
- 1960: Estrin (UCLA) “fixed plus variable structure computer”
- 1980s: Researchers using FPGAs report “Supercomputer level performance at orders of magnitude lower costs”
- Mid 1990s: DARPA invests $100M in “Adaptive Computing”
- Late 1990s: 6 startup companies doing “Reconfigurable Computing”

Why the fuss now?
- The Promise: “Programmability of microprocessors with performance of ASICs”
  - Programmability key for:
    - standard (low cost) components
    - shorter time to market
    - adapting to changing standards
    - adaptability within a given application
- Technology pull:
  - greater processing capacity per IC
  - higher costs, fewer new designs
  - SOC benefits from on-chip flexibility

Application Successes
- Research: >10x performance density advantage over microprocessors and DSPs
  - Pattern matching
  - Data encryption
  - Data compression
  - Video and image processing
- Commercial Push
  - telecom switches
  - network routers
  - mobile phones

Programmable Design Space
- Variable Effects:
  - operator instruction depth can make an order of magnitude difference in density
  - operator word width can make an order of magnitude difference in yielded density
    - consider narrow (bit) data on wide word architecture
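The narrow-data case can be made concrete with a little arithmetic. The sketch below uses a deliberately simple model that is an assumption, not taken from the slides: data wider than the operator is split across several operators, and yield is the fraction of operator bit slices that carry useful data. The function name is hypothetical.

```python
import math


def width_yield(data_width: int, operator_width: int) -> float:
    """Fraction of operator bit slices doing useful work (simplified model).

    data_width: bits the application actually needs per operation.
    operator_width: fixed word width w of the architecture's operators.
    """
    if data_width < 1 or operator_width < 1:
        raise ValueError("widths must be positive")
    operators_needed = math.ceil(data_width / operator_width)
    return data_width / (operators_needed * operator_width)


# Bit-level data on a 64-bit operator uses 1 of 64 bit slices.
print(width_yield(1, 64))   # 0.015625
# The same data on a 1-bit operator (w=1) wastes nothing.
print(width_yield(1, 1))    # 1.0
# Byte-wide data on a 64-bit operator.
print(width_yield(8, 64))   # 0.125
```

Under this model a 1-bit operation on a w=64 datapath leaves more than 98% of the operator idle, which is consistent with the order-of-magnitude differences in yielded density noted above.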

Programmable Design Space Density
- Small slice of space
- 100x density across
- Large difference in peak densities
  - large design space!