CPRE 583 Reconfigurable Computing
Lecture 3: Wed 9/2/2009 (Reconfigurable Computing Architectures, VHDL Overview 3)
Instructor: Dr. Phillip Jones (phjones@iastate.edu)
Reconfigurable Computing Laboratory, Iowa State University, Ames, Iowa, USA
http://class.ece.iastate.edu/cpre583/
CPRE 583 (Reconfigurable Computing): Reconfigurable Computing Archs, VHDL 3, Iowa State University
Overview
• Reinforce some common questions
• Finish Chapter 1 lecture
• Continue Chapter 2
• VHDL review
Common Questions
• How does an FPGA work?
• How does VHDL execute on an FPGA?
• How many LUTs are on the class's FPGA? 44,000
• State machines will be covered more in the next lecture
• Final project group selection: choose your own groups
• Class machine resources
  – Coover 2048, 1212; Coover 2041: ML507 (will be 2)
  – Distance students: xilinx.ece.iastate.edu (other servers on the way)
What you should learn
• Basic trade-offs associated with different aspects of a Reconfigurable Architecture (Chapter 2)
• Practice with timing diagrams; start state machines
Reconfigurable Architectures
• Main idea Chapter 2's author wants to convey
  – Applications often have one or more small, computationally intense regions of code (kernels)
  – Can these kernels be sped up using dedicated hardware?
  – Different kernels have different needs. How do a kernel's requirements guide design decisions when implementing a Reconfigurable Architecture?
Reconfigurable Architectures
• Forces that drive a Reconfigurable Architecture
  – Price
    • Mass production: 100K to millions
    • Experimental: 1 to 10s
  – Granularity of reconfiguration
    • Fine grain
    • Coarse grain
  – Degree of system integration/coupling
    • Tightly coupled
    • Loosely coupled
All are a function of the application that will run on the architecture.
Example Points in (Price, Granularity, Coupling) Space
[Figure: axes Price ($100's to $1M's), Granularity (Fine to Coarse), and Coupling (Loose to Tight); example points include an Intel/AMD processor pipeline (Decode, Exec: Int, Float, RFU, Store) and an ML507 board connected to a PC over Ethernet]
What's the point of a Reconfigurable Architecture?
• Performance metrics
  – Computational
    • Throughput
    • Latency
  – Power
    • Total power dissipation
    • Thermal
  – Reliability
    • Recovery from faults
Increase application performance!
Typical Approach for Increasing Performance
• Implement the application/algorithm in software
  – It is often easier to write an application in software
• Profile the application (e.g., gprof)
  – Determine where the application is spending its time
• Identify kernels of interest
  – e.g., the application spends 90% of its time in function matrix_multiply()
• Design custom hardware/instructions to accelerate the kernel(s)
  – Analyze the kernel to determine how to extract fine/coarse grain parallelism (does any parallelism even exist?)
Amdahl's Law!
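The profile-then-identify steps above can be sketched as a small filter over a flat profile. This is a minimal sketch, assuming per-function self times have already been collected (for example, from the "self seconds" column of a gprof flat profile); the function name `find_kernels`, the 50% threshold, and the profile numbers are hypothetical, chosen to mirror the matrix_multiply() example.

```python
def find_kernels(profile, threshold=0.5):
    """Return (function, fraction of total time) for functions at or above threshold.

    profile maps a function name to its self time in seconds.
    """
    total = sum(profile.values())
    hot = [(name, t / total) for name, t in profile.items() if t / total >= threshold]
    # Largest share first: the best candidate for hardware acceleration.
    return sorted(hot, key=lambda item: item[1], reverse=True)


# Hypothetical flat profile for a 100-second run.
profile = {"matrix_multiply": 90.0, "read_input": 6.0, "write_output": 4.0}
print(find_kernels(profile))  # [('matrix_multiply', 0.9)]
```

The fraction it reports is exactly the input Amdahl's Law needs: a kernel's share of the total time bounds the achievable overall speedup.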
Amdahl's Law: Example
• Application My_app
  – Running time: 100 seconds
  – Spends 90 seconds in matrix_mul()
• What is the maximum possible speedup of My_app if I place matrix_mul() in hardware?
  – Even if the hardware took zero time, 10 seconds remain: at most 10x faster
• What if the original My_app spends 99 seconds in matrix_mul()?
  – 1 second remains: at most 100x faster
Good recent FPGA paper that illustrates increasing an algorithm's performance with hardware: "NOVEL FPGA BASED HAAR CLASSIFIER FACE DETECTION ALGORITHM ACCELERATION", FPL 2008, http://class.ece.iastate.edu/cpre583/papers/Shih-Lien_Lu_FPL2008.pdf
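The slide's numbers follow from Amdahl's Law: if a kernel takes time t_k out of a total T and is sped up by a factor s, the new running time is (T - t_k) + t_k / s, and the overall speedup is T divided by that. A minimal sketch (the function name is illustrative); passing `math.inf` models the slide's best case of hardware that takes zero time.

```python
import math


def amdahl_speedup(total_s, kernel_s, kernel_speedup):
    """Overall speedup when only the kernel portion of the run is accelerated."""
    if not 0 <= kernel_s <= total_s:
        raise ValueError("kernel time must be between 0 and the total time")
    new_total_s = (total_s - kernel_s) + kernel_s / kernel_speedup
    return total_s / new_total_s


# Best case from the slide: hardware that takes zero time (infinite kernel speedup).
print(amdahl_speedup(100, 90, math.inf))  # 10.0
print(amdahl_speedup(100, 99, math.inf))  # 100.0
# A more realistic kernel that runs 10x faster in hardware.
print(f"{amdahl_speedup(100, 90, 10):.2f}")  # 5.26
```

The last line shows the usual lesson: a 10x faster kernel yields only about 5.3x overall, because the 10 unaccelerated seconds still dominate.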
Reconfigurable Architectures
• RPF -> VIC (short slide)
Granularity
Granularity: Coarse Grain
• rDPA: reconfigurable Data Path Array
• Function units with programmable interconnects
Example: [Figure: three ALUs joined by programmable interconnect]
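A coarse-grain array is configured at the word level: each function unit is told which operation to perform and where its operands come from. A minimal sketch of that idea, assuming a 16-bit datapath (the slide gives no width), a small hypothetical operation set, and hypothetical names (`AluConfig`, `run_rdpa`); the routing is simplified to naming a primary input or an earlier ALU's output.

```python
import operator
from dataclasses import dataclass

WIDTH_MASK = 0xFFFF  # assumption: 16-bit words

OPS = {
    "add": operator.add,
    "sub": operator.sub,
    "and": operator.and_,
    "or": operator.or_,
    "xor": operator.xor,
}


@dataclass(frozen=True)
class AluConfig:
    op: str     # which operation this function unit performs
    src_a: str  # routing: a primary input or an earlier ALU's output
    src_b: str


def run_rdpa(config, inputs):
    """Evaluate the ALUs in configuration order, routing operands by name."""
    values = dict(inputs)
    for name, alu in config.items():  # dicts preserve insertion order
        a, b = values[alu.src_a], values[alu.src_b]
        values[name] = OPS[alu.op](a, b) & WIDTH_MASK
    return values


# One configuration of three ALUs computing (x + y) ^ (x - z).
config = {
    "alu0": AluConfig("add", "x", "y"),
    "alu1": AluConfig("sub", "x", "z"),
    "alu2": AluConfig("xor", "alu0", "alu1"),
}
print(run_rdpa(config, {"x": 5, "y": 3, "z": 2})["alu2"])  # 11
```

Reconfiguring the array amounts to swapping in a different `config`; only a few words of configuration per ALU are needed, which is the appeal of coarse grain.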
Granularity: Fine Grain
• FPGA: Field Programmable Gate Array
• Sea of general-purpose logic gates
[Figure: array of CLBs (Configurable Logic Blocks)]
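The basic fine-grain element inside a CLB is a look-up table (LUT): a small memory whose configuration bits are the truth table of the function, addressed by the input bits. A minimal sketch with a hypothetical `make_lut` helper, assuming the first input is the most significant address bit; loading different bits into the same "hardware" changes the function.

```python
def make_lut(config_bits):
    """Return a function that behaves like a k-LUT loaded with config_bits.

    config_bits[i] is the output for the input combination whose binary value is i
    (first argument = most significant address bit).
    """
    k = len(config_bits).bit_length() - 1
    if k < 1 or len(config_bits) != 1 << k:
        raise ValueError("a k-LUT needs exactly 2**k configuration bits")
    bits = tuple(config_bits)

    def lut(*inputs):
        if len(inputs) != k:
            raise ValueError(f"this LUT has {k} inputs")
        index = 0
        for bit in inputs:
            index = (index << 1) | (bit & 1)
        return bits[index]

    return lut


xor2 = make_lut([0, 1, 1, 0])  # 2-LUT loaded with the XOR truth table
and2 = make_lut([0, 0, 0, 1])  # same structure, AND truth table
print([xor2(a, b) for a in (0, 1) for b in (0, 1)])  # [0, 1, 1, 0]
```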
Granularity: Trade-offs associated with LUT size
Example: 2-LUT (2^2 = 4 configuration bits, drawn as 2 x 2) vs. 10-LUT (2^10 = 1024 bits, drawn as 32 x 32)
[Figure: the same 1024 bits of configuration memory organized as 2-LUTs or as a single 10-LUT, with a microprocessor shown for comparison]
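The sizes on the slide follow from one rule: a k-input LUT stores one output bit for each of its 2^k input combinations, and (in the slide's model) it produces exactly one output. A minimal sketch of that arithmetic for the slide's 1024-bit budget; the intermediate sizes 4 and 6 are included only for comparison, and the names are hypothetical.

```python
BUDGET_BITS = 1024  # the slide's configuration-memory budget


def lut_config_bits(k):
    """A k-input LUT stores one bit per input combination: 2**k bits."""
    return 1 << k


def luts_in_budget(k, budget_bits=BUDGET_BITS):
    """How many k-LUTs, and therefore how many independent outputs, fit in the budget."""
    return budget_bits // lut_config_bits(k)


for k in (2, 4, 6, 10):
    print(f"{k:>2}-LUT: {lut_config_bits(k):>4} bits each, "
          f"{luts_in_budget(k):>3} LUTs/outputs in {BUDGET_BITS} bits")
```

The same 1024 bits buy 256 independent 2-input outputs or a single 10-input output; the 10-LUT wins only when a function really needs all 10 inputs in one output.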
Granularity: Trade-offs associated with LUT size (cont.)
Example: a microprocessor-style operation with inputs op, A, and B (labeled widths: op 4, A 3, B 3)
[Figure, built up over several slides: the operation mapped onto 2-LUTs vs. onto 10-LUTs, each within a 1024-bit budget]
Granularity: Trade-offs associated with LUT size (cont.)
Example: bit logic and constants, (A and "1100") or (B or "1000"), with 4-bit A and B
[Figure, built up over several slides: the expression mapped onto 2-LUTs (AND, OR, OR) vs. onto 10-LUTs, each within a 1024-bit budget]
• Compare with the area that was required using 2-LUTs: it's much worse, because each 10-LUT has only one output
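To see why this is "bit logic", each output bit of (A and "1100") or (B or "1000") can be simplified on its own once the constants are known. A minimal sketch, assuming 4-bit operands with the leftmost character of each string as bit 3 (the VHDL `3 downto 0` convention) and assuming the tool folds constants; the names are hypothetical. The brute-force check confirms the simplified form matches the original expression for every A and B.

```python
C_AND = 0b1100  # "1100"
C_OR = 0b1000   # "1000"
WIDTH = 4


def classify_bits(c_and, c_or, width=WIDTH):
    """Per output bit, what (A and c_and) or (B or c_or) reduces to."""
    kinds = {}
    for i in range(width):
        if (c_or >> i) & 1:
            kinds[i] = "const 1"      # (...) or (B_i or 1) = 1
        elif not (c_and >> i) & 1:
            kinds[i] = "wire B"       # (A_i and 0) or B_i = B_i
        else:
            kinds[i] = "LUT(A or B)"  # A_i or B_i: a real 2-input function
    return kinds


def eval_simplified(a, b, kinds):
    result = 0
    for i, kind in kinds.items():
        ai, bi = (a >> i) & 1, (b >> i) & 1
        bit = 1 if kind == "const 1" else bi if kind == "wire B" else ai | bi
        result |= bit << i
    return result


kinds = classify_bits(C_AND, C_OR)
# Check against the original expression for every 4-bit A and B.
assert all(eval_simplified(a, b, kinds) == ((a & C_AND) | (b | C_OR))
           for a in range(16) for b in range(16))
print({i: kinds[i] for i in sorted(kinds, reverse=True)})
```

Only bit 2 still needs a 2-input function. As a 2-LUT that costs 4 configuration bits; as a 10-LUT it occupies a whole 1024-bit table, since each 10-LUT has only one output.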
Granularity: Example Architectures
• Fine grain: GARP
• Coarse grain: PipeRench
Granularity: GARP
[Figure: the Garp chip, containing a CPU, an RFU, an I-cache, a D-cache, and a config cache, connected to memory]
Granularity: GARP (cont.)
[Figure: the Garp chip (CPU, RFU, config cache, I-cache, D-cache, memory); the RFU is drawn as an array of PEs (Processing Elements), labeled N, with RFU control (1) and execution (16, 2-bit) sections]
Example computations in one cycle:
• A<<10 | (b&c)
• (A-2*b+c)
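A bit-accurate software model is a common way to pin down what an RFU configuration should compute before writing hardware, and to check the hardware against it later. A minimal sketch of the two example computations, assuming 32-bit words (suggested by the 16 x 2-bit execution blocks) with results wrapped modulo 2^32; the function names are hypothetical.

```python
MASK32 = (1 << 32) - 1  # assumption: 32-bit words (16 blocks x 2 bits)


def shift_or(a, b, c):
    """A<<10 | (b&c), keeping only the low 32 bits."""
    return ((a << 10) | (b & c)) & MASK32


def add_sub(a, b, c):
    """A - 2*b + c in 32-bit two's-complement arithmetic."""
    return (a - 2 * b + c) & MASK32


print(shift_or(1, 0b1010, 0b0110))  # 1026
print(hex(add_sub(1, 1, 0)))        # 0xffffffff
```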
Granularity: GARP, impact of configuration size
[Figure: the Garp chip (CPU, RFU, config cache, I-cache, D-cache) connected to memory]
• 1 GHz bus frequency
• 128-bit memory bus
• 512 Kbits of configuration
On an RFU context switch, how long does it take to load a new full configuration? About 4 microseconds.
An estimate of the time for the CPU to perform a context switch is ~5 microseconds, so loading the configuration roughly doubles (~2x) the context switch latency!
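The ~4 microsecond figure is bus arithmetic: configuration bits divided by bus width gives bus cycles, and cycles divided by bus frequency gives time. A minimal sketch, assuming one full-width transfer per bus cycle with no protocol overhead and taking 1 Kbit = 1024 bits (which gives about 4.1 us; the slide rounds to 4). The function name is hypothetical.

```python
import math


def config_load_time_us(config_bits, bus_width_bits, bus_freq_hz):
    """Time to stream a full configuration over the memory bus.

    Assumes one bus-width transfer per bus cycle and no other overhead.
    """
    bus_cycles = math.ceil(config_bits / bus_width_bits)
    return bus_cycles / bus_freq_hz * 1e6


load_us = config_load_time_us(512 * 1024, 128, 1e9)  # 4096 bus cycles
cpu_switch_us = 5.0                                  # the slide's estimate
print(f"load: {load_us:.2f} us")  # load: 4.10 us
print(f"switch latency grows by {(cpu_switch_us + load_us) / cpu_switch_us:.2f}x")  # 1.82x
```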
Granularity: GARP
[Figure: the Garp chip (CPU, RFU, config cache, I-cache, D-cache, memory) with the RFU's control (1) and execution (16, 2-bit) PE array]
Reference: "The Garp Architecture and C Compiler", http://www.cs.cmu.edu/~tcal/IEEE-Computer-Garp.pdf
Granularity: PipeRench
• Coarse granularity
• Higher-level programming
• Reference papers
  – PipeRench: A Coprocessor for Streaming Multimedia Acceleration (ISCA 1999): http://www.cs.cmu.edu/~mihaib/research/isca99.pdf
  – PipeRench Implementation of the Instruction Path Coprocessor (Micro 2000): http://class.ee.iastate.edu/cpre583/papers/piperench_Micro_2000.pdf
Granularity: PipeRench
[Figure: rows of PEs, each with an 8-bit ALU and a register file, connected by an interconnect and a global bus]
Granularity: PipeRench (cont.)
[Figure, built up over several slides: a chart of pipeline stages 0-4 vs. cycles 1-6, followed by the same five-stage pipeline mapped onto three physical pipeline stages (0-2), with physical stages reconfigured to hold later virtual stages (3, 4, then 0 again) as the cycles advance]
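Scheduling virtual stages onto physical stages can be sketched with a small model. This is an illustrative sketch, not PipeRench's actual controller: it assumes that each cycle the next physical stage (PipeRench calls one a stripe), in round-robin order, is reconfigured with the next virtual stage, wrapping back to stage 0, and that a stripe being configured does no useful work that cycle. With 5 virtual stages on 3 stripes, it prints the pattern the slides' 3-stage chart appears to show.

```python
def virtualize(n_virtual, n_physical, n_cycles):
    """Simplified pipeline-virtualization schedule.

    Returns one entry per cycle: (virtual stage held by each physical stripe,
    index of the stripe being reconfigured that cycle). None = not yet configured.
    """
    stripes = [None] * n_physical
    schedule = []
    for cycle in range(n_cycles):
        target = cycle % n_physical          # round-robin over physical stripes
        stripes[target] = cycle % n_virtual  # next virtual stage, wrapping to 0
        schedule.append((tuple(stripes), target))
    return schedule


# Five virtual stages (0-4) on three physical stripes, cycles 1-6.
for cycle, (stripes, target) in enumerate(virtualize(5, 3, 6), start=1):
    cells = " ".join("-" if s is None else str(s) for s in stripes)
    print(f"cycle {cycle}: {cells}   (configuring stripe {target})")
```

In this model each stripe holds a virtual stage for 3 cycles: 1 to configure it and 2 to compute with it.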
Degree of Integration/Coupling
• Independent reconfigurable coprocessor
  – The reconfigurable fabric does not communicate directly with the CPU
• Processor + Reconfigurable Processing Fabric (RPF)
  – Loosely coupled on the same chip
  – Tightly coupled on the same chip
Degree of Integration/Coupling
[Figure, built up over several slides: a PC-style system with a CPU pipeline (Fetch, Decode, Execute with ALU and FPU, Memory, Write Back), L1 and L2 caches, a memory controller and main memory, a DMA controller, and an I/O controller serving USB, PCI, PCI-Express, a SATA hard drive, and a NIC. Successive builds attach the RPF at different points in this system, some with a configuration interface (Config I/F); the final build places an RFU inside the CPU alongside the ALU and FPU.]
MP2
[Figure: an FPGA containing a PowerPC and a User Defined Instruction; a PC connected over Ethernet (UDP/IP); Display.c; VGA output to a monitor]
MP2 Notes
• MUCH less VHDL coding than MP1
• But you will be writing most of the VHDL from scratch
• The focus will be more on learning to read a specification (the PowerPC coprocessor interface protocol) and designing hardware that follows that protocol
• You will be dealing with some pointer-intensive C code. It's a small amount of C code, but it is somewhat challenging to get the pointer math right.
Lecture 3 notes / slides in progress
Granularity: PipeRench
• Scheduling virtual stages onto physical stages
• Partial/dynamic reconfiguration (each cycle)
Granularity: GARP
• Impact of configuration size on performance
• Context switching
• Garp features
  – Dynamically reconfigurable
  – Stores multiple configurations in an on-chip cache (4)
  – One configuration active at a time
• Example app mapping to GARP (loop)
• Amdahl's Law
"The Garp Architecture and C Compiler": http://www.cs.cmu.edu/~tcal/IEEE-Computer-Garp.pdf
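An on-chip configuration cache (4 entries on the slide) can be modeled to see how it hides the full reload cost. A minimal sketch with loudly stated assumptions: least-recently-used replacement, a cache hit treated as free, a miss charged the ~4 us full-load time from the configuration-size slide, and hypothetical configuration names. None of these policies are taken from the Garp paper.

```python
from collections import OrderedDict


class ConfigCache:
    """Toy model of an on-chip configuration cache (assumed LRU replacement)."""

    def __init__(self, capacity=4, load_time_us=4.0):
        self.capacity = capacity          # configurations held on chip
        self.load_time_us = load_time_us  # assumed cost of a full load from memory
        self.entries = OrderedDict()      # least recently used first
        self.active = None                # only one configuration runs at a time
        self.hits = 0
        self.misses = 0
        self.stall_us = 0.0

    def activate(self, config_id):
        if config_id in self.entries:
            self.hits += 1
            self.entries.move_to_end(config_id)   # hit: assumed free to switch
        else:
            self.misses += 1
            self.stall_us += self.load_time_us    # miss: load the full configuration
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)  # evict least recently used
            self.entries[config_id] = True
        self.active = config_id


# Hypothetical sequence of kernels an application switches between.
cache = ConfigCache()
for name in ["fir", "fft", "fir", "crc", "aes", "sha", "fft"]:
    cache.activate(name)
print(cache.hits, cache.misses, cache.stall_us)  # 1 6 24.0
```

With `capacity=1` every switch in this sequence is a miss (28 us of stalls), which is the no-cache situation the configuration-size estimate describes.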
Overview
• Dimensions
  – Price
  – Granularity
  – Coupling
  – To optimize app performance (compute (throughput, latency), power, reliability)
• RPF to efficiently implement VICs
  – Main picture the author wants to convey
• What's the point of having a reconfigurable architecture?
  – Example (increase app performance)
    • App -> SW/CPU
    • Profile
    • ID kernels of intense compute
    • Design custom hardware/instruction (Amdahl's Law)
  – Intel FPL paper: a great example; read it by Friday