A High Performance Application Representation for Reconfigurable Systems

Wenrui Gong, Gang Wang, Ryan Kastner
Department of Electrical and Computer Engineering
University of California, Santa Barbara, CA 93106-9560
{gong, wanggang, kastner}@ece.ucsb.edu
http://express.ece.ucsb.edu
June 22, 2004

GONG et al: A High Performance Application Representation for Reconfigurable Systems, 6/21/2004

Outline
- Reconfigurable computing systems
- Compilation process
- Synthesizing to hardware
- Experimental results
- Concluding remarks

Outline
- Reconfigurable computing systems
  - Challenges of application representations
- Compilation process
- Synthesizing to hardware
- Experimental results
- Concluding remarks

Reconfigurable Computing Systems
- Standard programmable platforms
- Post-manufacturing customization
- Designs shift from physical chips to configuration files
- A software design flow
- Feature hardware speed with software flexibility
- Enable higher productivity

Application Representations
- A common application representation is needed to tame the complexity of system synthesis
- Requirements
  - Able to generate software code for microprocessors
  - Able to be easily translated to hardware configuration files
  - Allows a variety of transformations and optimizations to improve performance

Parallelism Exploration
- Fine grain parallelism
  - Multiple functional units
  - Issuing an operation to a free functional unit
  - Operations executed independently
- Coarse grain parallelism
  - Executing multiple threads
  - With occasional synchronization
- Reconfigurable computing systems support both fine and coarse grain parallelism

PDG + SSA
- The PDG + SSA representation can be used for both hardware synthesis and software generation
- The PDG and SSA forms are common representations for software generation
- Here we concentrate on hardware synthesis

Outline
- Reconfigurable computing systems
- Compilation process
  - Overview
  - Constructing the PDG
  - Incorporating the SSA form
- Synthesizing to hardware
- Experimental results
- Concluding remarks

Overview

Program Dependence Graph
- PDG: Program Dependence Graph
  - ENTRY node: the root node of a PDG
  - PREDICATE nodes: produce predicate values from expressions
    - Diamond-shaped nodes 2, 3, and 4
  - STATEMENTS nodes: an arbitrary set of operations
    - Circle nodes 1, 4, 6, 7, and 8
  - REGION nodes: summarize all operations with the same control conditions
    - House-shaped nodes R2, R3, R4, …
    - R3: the predicate value of 2 is True
- Edges represent dependencies
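The node kinds listed above could be held in a structure like the following. This is only a minimal sketch: the type and function names (`Pdg`, `pdg_add_node`, and so on), the fixed array sizes, and the split of edges into control and data dependences are assumptions for illustration, not the authors' implementation.

```c
#include <assert.h>

/* Node kinds named on the slide. */
typedef enum { PDG_ENTRY, PDG_PREDICATE, PDG_STATEMENTS, PDG_REGION } PdgKind;

/* Edge kinds: control or data dependence (an assumed split). */
typedef enum { DEP_CONTROL, DEP_DATA } DepKind;

#define PDG_MAX_NODES 32   /* hypothetical capacity */
#define PDG_MAX_EDGES 64   /* hypothetical capacity */

typedef struct { int from, to; DepKind kind; } PdgEdge;

typedef struct {
    PdgKind kind[PDG_MAX_NODES];
    int     num_nodes;
    PdgEdge edge[PDG_MAX_EDGES];
    int     num_edges;
} Pdg;

/* Initialise a graph that contains only the ENTRY (root) node, index 0. */
void pdg_init(Pdg *g) {
    g->num_nodes = 0;
    g->num_edges = 0;
    g->kind[g->num_nodes++] = PDG_ENTRY;
}

/* Append a node of the given kind and return its index. */
int pdg_add_node(Pdg *g, PdgKind kind) {
    assert(g->num_nodes < PDG_MAX_NODES);
    g->kind[g->num_nodes] = kind;
    return g->num_nodes++;
}

/* Append a dependence edge from -> to. */
void pdg_add_edge(Pdg *g, int from, int to, DepKind kind) {
    assert(g->num_edges < PDG_MAX_EDGES);
    assert(from >= 0 && from < g->num_nodes);
    assert(to >= 0 && to < g->num_nodes);
    g->edge[g->num_edges++] = (PdgEdge){ from, to, kind };
}

/* Count the nodes of one kind. */
int pdg_count_kind(const Pdg *g, PdgKind kind) {
    int n = 0;
    for (int i = 0; i < g->num_nodes; ++i)
        if (g->kind[i] == kind) ++n;
    return n;
}
```

A REGION node would typically sit between a PREDICATE node and the STATEMENTS nodes it guards, so that all operations sharing one control condition hang off a single parent.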

Constructing the PDG from the CDFG
- Implemented based on Ferrante's algorithm
- Using the post-dominator tree

    val = pred;
    for (i = 0; i < len; ++i) {
        val += diff;
        if (val > 32767) val = 32767;
        else if (val < -32768) val = -32768;
    }
    return val;
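To make the role of post-dominance concrete, here is a small sketch that computes control dependences for the if/else-if part of the loop body. It uses the standard definition (y is control dependent on x if, for some edge x -> s, y post-dominates s but does not strictly post-dominate x) over bitmask post-dominator sets instead of walking a post-dominator tree, so it is not the exact procedure of the slide. The block numbering and function names are hypothetical, and the ENTRY dependences of a full PDG are left out.

```c
#include <assert.h>

#define CFG_NODES 6   /* hypothetical block numbering, listed below */

/* succ[n] holds up to two successors of block n; -1 means "none".
 * 0: entry                      -> 1
 * 1: val += diff; val > 32767 ? -> 2 (true), 3 (false)
 * 2: val = 32767                -> 5
 * 3: val < -32768 ?             -> 4 (true), 5 (false)
 * 4: val = -32768               -> 5
 * 5: join / exit
 */
static const int succ[CFG_NODES][2] = {
    {1, -1}, {2, 3}, {5, -1}, {4, 5}, {5, -1}, {-1, -1}
};

/* pdom[n] is a bitmask of the blocks that post-dominate n (n included). */
void compute_postdominators(unsigned pdom[CFG_NODES]) {
    const unsigned all = (1u << CFG_NODES) - 1u;
    const int exit_node = CFG_NODES - 1;
    for (int n = 0; n < CFG_NODES; ++n) pdom[n] = all;
    pdom[exit_node] = 1u << exit_node;

    int changed = 1;
    while (changed) {                        /* iterate to a fixed point */
        changed = 0;
        for (int n = 0; n < exit_node; ++n) {
            unsigned meet = all;
            for (int k = 0; k < 2; ++k)
                if (succ[n][k] >= 0) meet &= pdom[succ[n][k]];
            unsigned next = meet | (1u << n);
            if (next != pdom[n]) { pdom[n] = next; changed = 1; }
        }
    }
}

/* cd[y] is a bitmask of the blocks x that y is control dependent on. */
void compute_control_dependence(unsigned cd[CFG_NODES]) {
    unsigned pdom[CFG_NODES];
    compute_postdominators(pdom);
    for (int y = 0; y < CFG_NODES; ++y) cd[y] = 0;
    for (int x = 0; x < CFG_NODES; ++x) {
        const unsigned strict_x = pdom[x] & ~(1u << x);
        for (int k = 0; k < 2; ++k) {
            const int s = succ[x][k];
            if (s < 0) continue;
            /* blocks that post-dominate s but not (strictly) x */
            const unsigned dependents = pdom[s] & ~strict_x;
            for (int y = 0; y < CFG_NODES; ++y)
                if (dependents & (1u << y)) cd[y] |= 1u << x;
        }
    }
}
```

For this graph, blocks 2 and 3 come out control dependent on block 1, and block 4 on block 3, which matches the nesting of the two `if` tests.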

Constructing the PDG (cont'd)

The Static Single Assignment Form
- Each variable has exactly one assignment
- A variable is always referenced using the same name
- At join points of control flow, special φ-nodes are inserted

Original code:

    val += diff;
    if (val > 32767) val = 32767;
    else if (val < -32768) val = -32768;

SSA form:

    val_2 = val_1 + diff;
    if (val_2 > 32767) val_3 = 32767;
    else if (val_2 < -32768) val_4 = -32768;
    val_5 = phi(val_2, val_3, val_4);
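The renaming can be checked by writing both versions as ordinary C. In this sketch the phi is modelled as a select driven by the two branch predicates; the function names and the predicate names `p_hi` and `p_lo` are assumptions for illustration only.

```c
#include <assert.h>

/* The original fragment: val may be assigned twice. */
int clamp_original(int val, int diff) {
    val += diff;
    if (val > 32767) val = 32767;
    else if (val < -32768) val = -32768;
    return val;
}

/* SSA-style version: every name is assigned exactly once, and the phi
 * at the join point is modelled as a select on the branch predicates. */
int clamp_ssa(int val_1, int diff) {
    const int val_2 = val_1 + diff;
    const int p_hi  = val_2 > 32767;    /* predicate of the first test  */
    const int p_lo  = val_2 < -32768;   /* predicate of the second test */
    const int val_3 = 32767;
    const int val_4 = -32768;
    const int val_5 = p_hi ? val_3 : (p_lo ? val_4 : val_2);   /* phi */
    return val_5;
}
```

Both functions return the same result for any inputs whose sum does not overflow `int`; the SSA version simply makes explicit which definition reaches the join.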

Extending the PDG with φ-Nodes

The Program Representation
- Loop-independent φ-nodes
  - Take two or more input values and a predicate value
  - Commit one of the inputs depending on this predicate
- Loop-carried φ-nodes
  - Inputs: the initial value, the loop-carried value, and a predicate value
  - Outputs: one to the iteration body, and the other to the loop exit
  - Direct proper values to proper outputs
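The behaviour of a loop-carried φ-node can be sketched in software as follows. The slide does not say how the node tells the initial value from the loop-carried one, so the `first` flag here is an assumed extra control input, and `LoopPhiOut`, `loop_phi`, `saturating_step`, and `run_loop` are hypothetical names; the loop body reuses the saturating accumulation from the earlier example.

```c
#include <assert.h>
#include <stdbool.h>

/* Output of the loop-carried phi: one value, steered to one of two ports. */
typedef struct {
    bool to_body;   /* true: iteration body port; false: loop exit port */
    int  value;
} LoopPhiOut;

/* first: assumed flag marking the first arrival at the loop.
 * pred:  the loop-continuation predicate. */
LoopPhiOut loop_phi(bool first, int initial, int carried, bool pred) {
    LoopPhiOut out;
    out.value   = first ? initial : carried;
    out.to_body = pred;
    return out;
}

/* One iteration of the example loop body: accumulate, then saturate. */
int saturating_step(int val, int diff) {
    val += diff;
    if (val > 32767) return 32767;
    if (val < -32768) return -32768;
    return val;
}

/* Runs "val = start; for (i = 0; i < len; ++i) {...} return val;" with
 * every value passing through the loop-carried phi. */
int run_loop(int start, int diff, int len) {
    int carried = 0;   /* not read until the second arrival */
    for (int i = 0; ; ++i) {
        LoopPhiOut o = loop_phi(i == 0, start, carried, i < len);
        if (!o.to_body) return o.value;          /* exit port */
        carried = saturating_step(o.value, diff); /* body port */
    }
}
```

With `len` equal to zero the initial value goes straight to the exit port, which is the case that makes the second output necessary.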

Outline
- Reconfigurable computing systems
- Compilation process
- Synthesizing to hardware
  - Data-path elements
  - φ-nodes
- Experimental results
- Concluding remarks

Synthesizing the Data-Path
- A one-to-one mapping is used
  - Different resource allocation and binding algorithms can be used (on-going work)
- Each operation has an operator and several operands
  - Operands are synthesized directly to wires in the circuit
  - Each variable in the SSA form has only one definition point
- PREDICATE nodes: synthesized to Boolean logic signals that control next-stage transitions and direct multiplexers to commit the correct value
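A rough way to picture the one-to-one mapping: every operation gets its own functional unit, and every SSA name becomes one wire with a single driver. The sketch below only counts units and checks the single-definition property; the `SsaOp` record, the operator strings, and the function names are hypothetical and say nothing about how the authors' tool represents operations.

```c
#include <assert.h>
#include <string.h>

/* One SSA operation: dst = src1 <op> src2. All names are illustrative. */
typedef struct {
    const char *op;     /* operator, e.g. "add", "gt", "lt" */
    const char *dst;    /* SSA name defined here; maps to one wire */
    const char *src1;
    const char *src2;
} SsaOp;

/* Without resource sharing, the number of functional units of one type
 * equals the number of operations that use that operator. */
int units_needed(const SsaOp *ops, int n, const char *op) {
    int count = 0;
    for (int i = 0; i < n; ++i)
        if (strcmp(ops[i].op, op) == 0) ++count;
    return count;
}

/* Returns 1 if every destination name is defined exactly once, which is
 * what allows each name to become a wire with a single driver. */
int single_definition(const SsaOp *ops, int n) {
    for (int i = 0; i < n; ++i)
        for (int j = i + 1; j < n; ++j)
            if (strcmp(ops[i].dst, ops[j].dst) == 0) return 0;
    return 1;
}
```

Resource sharing would lower the unit count at the cost of extra multiplexing and scheduling, which is the trade-off the allocation and binding work mentioned above would explore.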

Synthesizing φ-nodes
- A loop-independent φ-node is synthesized to a multiplexer
  - The multiplexer selects input values depending on the predicate values
- For a loop-carried φ-node, an additional switch is generated to direct the loop-exiting values

Synthesize to Hardware
- Simplifications and optimizations
  - Removing unnecessary control dependencies
  - Cascading/expanding multipliers to obtain better performance
- Flip-flops are inserted
  - Guarantee that correct values will be available no matter which execution path is taken

Outline
- Reconfigurable computing systems
- Compilation process
- Synthesizing to hardware
- Experimental results
  - Setup and benchmarks
  - Results
- Concluding remarks

Setup and Benchmarks
- Benchmark suites
  - Functions from the MediaBench suite
  - Profiled using sample data
  - Only report conservative results
- Estimated execution time
  - Aggressive predicated execution
  - Only report conservative results
- Area
  - One-to-one mapping without resource sharing
  - Reported in numbers of FPGA slices

Estimated Execution Time

Estimated Execution Time (cont'd)

Estimated FPGA Area

Outline
- Reconfigurable computing systems
- Compilation process
- Synthesizing to hardware
- Experimental results
- Concluding remarks
  - On-going/future work

Concluding Remarks
- The PDG+SSA form supports a variety of transformations and enables both coarse and fine grain parallelism
- A method to synthesize this form to hardware
- This form gives faster execution time using similar area when compared with CFG and PSSA forms

On-going/Future Work
- Investigate transformations to create coarse grained parallelism using the PDG+SSA form
- Augment the PDG+SSA form with architectural information to provide fast estimation
- Integrate resource sharing and other architectural synthesis techniques

Thank You
- Prof. Ryan Kastner and Gang Wang
- All audiences

Questions
- Slides: 29