CprE / ComS 583 Reconfigurable Computing
Prof. Joseph Zambreno
Department of Electrical and Computer Engineering, Iowa State University
Lecture #15 – Midterm Review
Project Proposals
• Group 1 – FPGA Implementation of Frequency-Domain Audio Effects Processor
  • Five-band equalizer
  • Frequency shifter
October 10, 2006 CprE 583 – Reconfigurable Computing Lect-15.2
Project Proposals (cont.)
• Group 2 – Transparent FPGA-Based Network Analyzer
  • Layer I pass-through
  • Layer II passive analyzer
Project Proposals (cont.)
• Group 3 – FPGA-Based Library Design for Linear Algebra Applications
  • Floating-point sparse matrix-vector multiplication
  • Floating-point banded matrix-vector multiplication
  • Floating-point lower-upper matrix decomposition
Project Proposals (cont.)
• Group 4 – An Improved Approach of Configuration Compression for FPGA-Based Embedded Systems
  • Improved compression algorithms
  • LUT-reordering techniques
Project Proposals (cont.)
• Other Projects:
  • Group 5 – FPGA Ternary Data Conversion
  • Group 6 – Analysis of Sobel Edge Detection Implementations
  • Group 7 – Design and Analysis of Artificial Neural Networks on FPGAs
• Reminders:
  • 11/16 – Project Updates (10 minutes)
  • 12/5–12/7 – Final Presentations (25 minutes)
  • 12/15 – Final Reports
Midterm Review
• Using the Silicon
[Figure labels: PE, SSE, $, FFT, MPP, More Cache, CISC, Reconfigurable Fabric, AES, Superscalar, MMX, Vector, Reconfigurable Processor]
Computational Density (Qualitative)
[Figure: Actel ProASIC compared with Intel Pentium 4]
• FPGAs can complete more work per unit time than a processor or DSP:
  • Less instruction overhead
  • More active computation packed onto the same silicon area (allows for more parallelism)
  • Can control operations at the bit level (as opposed to word level)
Coupling in a Reconfigurable System
[Figure labels: Workstation, Coprocessor, CPU, FU, Attached Processing Unit, Memory Caches, Standalone Processing Unit, I/O Interface]
• Many places to put reconfigurable computing components
• Most implementations involve multiple discrete devices
• How should these devices be connected together?
Generic FPGA Architecture
• FPGA = Field-Programmable Gate Array
• Island-style FPGA architecture
[Figure: array of Configurable Logic Blocks (CLBs) ringed by Input/Output Buffers (IOBs) and joined by a programmable interconnect mesh]
FPGA Technology
• Various FPGA programming technologies (anti-fuse, (E)EPROM, Flash, SRAM):
  • SRAM most popular
LUTs and Digital Logic
• $k$ inputs → $2^k$ possible input values
• A $k$-LUT corresponds to a $2^k \times 1$-bit memory
• Truth table is stored
• $2^{2^k}$ possible functions – $O(2^{2^k} / k!)$ unique
• Example: $F = A_0 A_1 A_2 + \bar{A}_0 A_1 \bar{A}_2 + \bar{A}_0 \bar{A}_1 \bar{A}_2$
[Figure: truth tables over inputs $A_0$, $A_1$, $A_2$, enumerating functions 0 through 255]
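The LUT-as-memory idea on this slide can be sketched in a few lines of Python. This is only an illustration: the helper names (`make_lut`, `lut_read`) and the bit ordering (A0 as the least significant address bit) are assumptions, not something the slides specify.

```python
def make_lut(func, k):
    """Store func's truth table as a 2**k x 1-bit memory (a list of bits).

    Assumption: input A_i drives address bit i (A_0 is least significant).
    """
    table = []
    for addr in range(2 ** k):
        inputs = [(addr >> i) & 1 for i in range(k)]
        table.append(func(*inputs) & 1)
    return table


def lut_read(table, *inputs):
    """Evaluate the LUT: the inputs simply form the address of the stored bit."""
    addr = sum(bit << i for i, bit in enumerate(inputs))
    return table[addr]


def f(a0, a1, a2):
    """F = A0 A1 A2 + ~A0 A1 ~A2 + ~A0 ~A1 ~A2 (the slide's example function)."""
    n0, n1, n2 = 1 - a0, 1 - a1, 1 - a2
    return (a0 & a1 & a2) | (n0 & a1 & n2) | (n0 & n1 & n2)


K = 3
F_LUT = make_lut(f, K)           # 8 stored bits configure the 3-LUT
NUM_FUNCTIONS = 2 ** (2 ** K)    # each distinct table content is a distinct function
```

Changing the eight stored bits changes the function without touching the lookup logic, which is why a 3-LUT can realize any of the 256 functions of three inputs.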
Architectural Issues [AhmRos04A]
• What values of N, I, and K (LUTs per cluster, inputs per cluster, inputs per LUT) minimize the following parameters?
  • Area
  • Delay
  • Area-delay product
• Assumptions
  • All routing wires length 4
  • Fully populated IMUX
  • Wiring is half pass transistor, half tri-state
FPGA Arithmetic
• Traditional microprocessors, DSPs, etc. don’t use LUTs
• Instead use a w-bit Arithmetic and Logic Unit (ALU)
• Carry connections are hard-wired
  • No switches, no stubs, short wires
[Figure: an ALU with operands A, B and an Op select (AND2, OR2, XOR2, ADD, SUB, CMP), next to LUT implementations: a 2-LUT for the logic operations (1), and 3-LUTs producing Sum and Cout from A, B, Cin for the arithmetic operations (2)]
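As a rough illustration of the LUT side of this comparison, the sketch below builds a ripple-carry adder from two 3-input lookup tables per bit (one for Sum, one for Cout). The table layout and the function name `lut_add` are assumptions made for this example; the point is that the carry has to pass from one LUT pair to the next, which is the path an ALU hard-wires.

```python
# Truth tables for one full-adder bit, addressed by (cin << 2) | (b << 1) | a.
SUM_LUT = [0, 1, 1, 0, 1, 0, 0, 1]     # a XOR b XOR cin
CARRY_LUT = [0, 0, 0, 1, 0, 1, 1, 1]   # majority(a, b, cin)


def lut_add(a, b, width):
    """Add two width-bit unsigned values using only LUT reads.

    Returns (sum modulo 2**width, carry out).
    """
    result = 0
    carry = 0
    for i in range(width):
        addr = (carry << 2) | (((b >> i) & 1) << 1) | ((a >> i) & 1)
        result |= SUM_LUT[addr] << i
        carry = CARRY_LUT[addr]   # ripples into the next bit position
    return result, carry
```

In a plain LUT fabric each carry hop would go through general-purpose routing, which motivates dedicated carry logic in FPGA logic blocks.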
FPGA Arithmetic (cont.)
• Hard-wired carry logic support
[Figure: carry logic in the Altera FLEX 8000 and the Xilinx XCV 4000]
Arithmetic (cont.)
• Carry save multiplication
[Figure: carry-save multiplier array with inputs X0–X3 and Y0–Y3, rows of adder (+) cells, and product outputs Z0, Z1, Z2, ...]
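A minimal sketch of the carry-save idea, assuming unsigned operands: each partial product is folded into a (sum, carry) pair with a 3:2 compressor, so no carry propagates until one final addition. The function names are illustrative, and the loop is a behavioral model rather than a cell-by-cell copy of the array in the figure.

```python
def carry_save_add(x, y, z):
    """3:2 compressor: reduce three operands to (sum, carry) with no carry propagation.

    Invariant: x + y + z == sum + carry.
    """
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def carry_save_multiply(x, y, width=4):
    """Multiply two unsigned width-bit values by accumulating partial products in carry-save form."""
    s, c = 0, 0
    for i in range(width):
        partial = (x << i) if (y >> i) & 1 else 0   # partial product for bit Y_i
        s, c = carry_save_add(s, c, partial)
    return s + c   # the only carry-propagate addition
```

Because every row only does bitwise work, the per-row delay is independent of the operand width; the carry chain is paid once, at the end.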
LUT-Based Constant Multipliers
[Figure: 10101011 × N computed as A + B = S, where A = N * 1011 (LSN), B = N * 1010 (MSN), and S is the product; banks of 4-LUTs driven by N0–N7 produce A0–A11 and B4–B15, which are added to give S0–S15]
• Constants can be changed in the LUTs to program new multipliers
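One common way to build such a multiplier is sketched below, under the assumption that the variable operand N is split into two 4-bit nibbles and each nibble addresses a table of precomputed multiples of the constant. The slide's figure may partition the product differently; `build_table` and `kcm_multiply` are names invented for this example.

```python
CONSTANT = 0b10101011   # the constant from the slide (171 decimal)


def build_table(constant):
    """Contents of one LUT bank: constant * nibble for every 4-bit nibble."""
    return [constant * nibble for nibble in range(16)]


def kcm_multiply(n, table):
    """Multiply an 8-bit value n by the constant baked into table."""
    low = table[n & 0xF]            # partial product from the low nibble (12 bits)
    high = table[(n >> 4) & 0xF]    # partial product from the high nibble
    return low + (high << 4)        # high partial product is weighted by 16


TABLE = build_table(CONSTANT)
```

Loading a different table, for example `build_table(77)`, reprograms the multiplier for a new constant without changing the adder structure, which is the point of the final bullet.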
Capacity Trends
[Figure: Xilinx device complexity by year; axis ticks 1985, 1987, 1991, 1995, 1998, 1999, 2000, 2002, 2003, 2004, 2006]
Device         Clock     Capacity
XC2000         50 MHz    1K gates
XC3000         85 MHz    7.5K gates
XC4000         100 MHz   250K gates
XC5200         50 MHz    23K gates
Spartan        80 MHz    40K gates
Virtex         200 MHz   1M gates
Spartan-II     200 MHz   200K gates
Virtex-E       240 MHz   4M gates
Virtex-II      450 MHz   8M gates
Virtex-II Pro  450 MHz   8M gates*
Spartan-3      326 MHz   5M gates
Virtex-4       500 MHz   16M gates*
Virtex-5       550 MHz   24M gates*
Splash 1 Architecture
[Figure: VME bus and VSB bus interface with FIFO IN, FIFO OUT, and control, feeding an array of 32 FPGAs (F0–F31), each paired with a memory (M0–M31)]
FPGA-based Router
• FPX module contains two FPGAs
  • NID – network interface device
    • Performs data queuing
  • RAD – reprogrammable application device
    • Specialized control sequences
Mesh Topology
• Chips are connected in a nearest-neighbor pattern
• Simplicity is key
• Linear array is essentially a one-dimensional mesh
[Figure: 3 × 3 mesh of chips A–I]
Other Topologies
• Crossbar topology:
  • Devices A-D are routing only
  • Gives predictable performance
  • Potential waste of resources for near-neighbor connections
[Figure: crossbar of devices A–D and W–Z]
Logic Emulation
• Emulation takes a sizable amount of resources
• Compilation time can be large due to FPGA compiles
Systolic Architectures
• Goal – general methodology for mapping computations into hardware (spatial computing) structures
• Composition:
  • Simple compute cells (e.g. add, sub, max, min)
  • Regular interconnect pattern
  • Pipelined communication between cells
  • I/O at boundaries
[Figure: example compute cells (×, +, min, constant c)]
Finite Impulse Response
• Sequential
  • Memory bandwidth per output – 2k+1
  • O(k) cycles per output
  • O(1) hardware
• Systolic
  • Memory bandwidth per output – 2
  • O(1) cycles per output
  • O(k) hardware
[Figure: systolic FIR with input xi, weights w1–w4, a multiplier (×) and adder (+) per tap, and output yi]
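The trade-off can be made concrete with a small simulation. The sketch assumes the causal form y[i] = Σ_j w[j]·x[i−j] with zero initial state, and models one well-known systolic arrangement (weights stay in place, x advances through two registers per cell, partial sums through one). The slides do not say which arrangement the figure uses, so treat the register scheme and function names as assumptions.

```python
def fir_sequential(x, w):
    """One multiply-accumulate unit: k steps (and k weight/sample fetches) per output."""
    out = []
    for i in range(len(x)):
        acc = 0
        for j, wj in enumerate(w):
            if i - j >= 0:
                acc += wj * x[i - j]
        out.append(acc)
    return out


def fir_systolic(x, w):
    """k cells, one new sample in and one result out per cycle after a k-1 cycle fill."""
    k = len(w)
    x_pipe = [[0, 0] for _ in range(k - 1)]   # x_pipe[j]: two registers feeding cell j+1
    y_pipe = [0] * (k - 1)                    # y_pipe[j]: one register feeding cell j+1
    outputs = []
    for t in range(len(x) + k - 1):
        x_in = x[t] if t < len(x) else 0      # flush the pipeline with zeros
        cell_x = [x_in] + [pipe[1] for pipe in x_pipe]
        cell_y = [0] + y_pipe
        cell_out = [cell_y[j] + w[j] * cell_x[j] for j in range(k)]
        outputs.append(cell_out[-1])
        # Clock edge: shift samples and partial sums one stage to the right.
        x_pipe = [[cell_x[j], x_pipe[j][0]] for j in range(k - 1)]
        y_pipe = cell_out[:-1]
    return outputs[k - 1:]                    # drop the pipeline-fill cycles
```

Both functions return the same values; the systolic version does k multiply-adds per cycle in k cells while touching memory only for the incoming sample and the outgoing result.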
Matrix-Vector Product
[Figure: matrix elements enter a linear array of cells holding x1 ... xn in skewed order; results y1 ... y4 leave at t = n ... n+3]
t=4   a41  a32  a23  a14  –
t=3   a31  a22  a13  –    –
t=2   a21  a12  –    –    –
t=1   a11  –    –    –    –
      x1   x2   x3   x4   …  xn
Outputs: y1 at t = n, y2 at t = n+1, y3 at t = n+2, y4 at t = n+3
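The skewed schedule can be checked with a short simulation. It assumes cell j holds x_j, partial sums move one cell per cycle, and element a_ij reaches cell j at time t = i + j − 1 (1-indexed), which is the pattern the timing labels suggest. The function name and the returned timing list are illustrative choices.

```python
def systolic_matvec(a, x):
    """Simulate y = A x on a linear array; returns (results, output_times)."""
    m, n = len(a), len(x)
    y_regs = [0] * n                  # partial sum each cell produced last cycle
    results = [None] * m
    times = [None] * m
    for t in range(1, m + n):         # cycles t = 1 .. m+n-1
        new_y = [0] * n
        for j in range(n):            # 0-indexed cell j holds x[j]
            i = t - j - 1             # 0-indexed row whose element arrives at cell j now
            if 0 <= i < m:
                y_in = y_regs[j - 1] if j > 0 else 0
                new_y[j] = y_in + a[i][j] * x[j]
            # otherwise the slot is empty (a '–' in the schedule)
        y_regs = new_y
        row_done = t - n              # row leaving the last cell this cycle
        if 0 <= row_done < m:
            results[row_done] = y_regs[n - 1]
            times[row_done] = t
    return results, times
```

For a 3 × 2 matrix the first result appears at t = n = 2 and one more follows on every cycle after that, matching the y1, y2, ... timing on the slide.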
Circuit Netlist and Mapping
[Figure: netlist mapped to LUT0–LUT5 and flip-flops FF1, FF2]
Placing and Routing
[Figure: placed netlist on the FPGA, routed through programmable connections]
Next Steps
• VHDL / VHDL for Synthesis

    LIBRARY ieee ;
    USE ieee.std_logic_1164.all ;

    ENTITY implied IS
        PORT ( A, B : IN  STD_LOGIC ;
               AeqB : OUT STD_LOGIC ) ;
    END implied ;

    ARCHITECTURE Behavior OF implied IS
    BEGIN
        PROCESS ( A, B )
        BEGIN
            IF A = B THEN
                AeqB <= '1' ;
            END IF ;
            -- No ELSE branch: AeqB keeps its previous value when A /= B,
            -- so synthesis infers memory (a latch) for AeqB.
        END PROCESS ;
    END Behavior ;

[Figure: synthesized circuit with inputs A, B and output AeqB]
HW/SW Co-Design
[Figure: co-simulation setup. An ARM core simulator (ARMulator: ARM core, cache, ARM local memory, ARMulator API, comm. buffer, socket handler) connects over SOCKET #1 and SOCKET #2 to an HDL simulator (Modelsim: Modelsim FLI, socket handler, mem. access, comm. buffer) with AHB master and slave interfaces on an AMBA bus, an ASIC / FPGA block, and shared memory]
Multi-Context FPGAs
Function Unit Architectures
• RaPiD: Reconfigurable Pipelined Datapath
  • Linear array of function units
    • Function type determined by application
  • Function units are connected together as needed using segmented buses
  • Data enters the pipeline via input streams and exits via output streams
High-Level Compilation
[Figure: compilation flow. Boxes: C Program; SUIF frontend; Directives and Automation; C Libraries on various Targets; HW / SW Partitioner; SUIF to GCC; GCC compiler for embedded; Object code for Embedded (SA); C to RTL VHDL/Verilog; VHDL to FPGA Synthesis; Binaries for FPGAs (Xilinx); VHDL to ASIC Synthesis; Chip layouts (0.18u TSMC)]
Other Topics?
• Second course survey next week
  • Provide general feedback, suggest additional topics
Midterm Exam
• Three questions
  • Review
  • Analysis
  • Extension
• Any paper mentioned in class is fair game
• Due in 48 hours (10/12 – 2:00 pm)
• No class on Thursday!
• Some restrictions:
  • Work alone
  • Can ask if something is unclear (“what does this mean?” questions, not “how do I do this?” questions)
  • No late submissions – strict WebCT deadline