CS 295: Modern Systems
What Are FPGAs and Why Should You Care
Sang-Woo Jun
Spring 2019
What Are FPGAs
- Field-Programmable Gate Array
- Can be configured to act like any circuit (more later!)
- Can do many things, but we focus on computation acceleration
FPGAs Come In Many Forms
- PCIe-attached
- CPU-integrated
- In-storage
- In-network
How Is It Different From CPUs/GPUs
- GPU: the other major accelerator
- CPU/GPU hardware is fixed
  - "General purpose"
  - We write programs (sequences of instructions) for them
- FPGA hardware is not fixed
  - "Special purpose"
  - Hardware can be whatever we want
  - Will our hardware require/support software? Maybe!
- Optimized hardware is very efficient
  - GPU-level performance**
  - 10x power efficiency (300 W vs 30 W)
Analogy
- CPU/GPU comes with fixed circuits
- FPGA gives you a big bag of components to build whatever
  - Could be a CPU/GPU!
- Image credits: "The Z-Berry"; "Experimental Investigations on Radiation Characteristics of IC Chips"; benryves.com, "Z80 Computer"; Shadi Soundation, "Homebrew 4 bit CPU"
Fine-Grained Parallelism of Special-Purpose Circuits
- Target computation: Ret = (G × m1 × m2) / ((x1 - x2)^2 + (y1 - y2)^2)
- 4 cycles with basic operations
  - Cycle 1: A = G × m1, C = x1 - x2, E = y1 - y2
  - Cycle 2: B = A × m2, D = C^2, F = E^2
  - Cycle 3: H = D + F
  - Cycle 4: Ret = B / H
- 3 cycles with compound operations (may slow down clock)
  - Cycle 1: A = G × m1 × m2, B = (x1 - x2)^2, C = (y1 - y2)^2
  - Cycle 2: D = B + C
  - Cycle 3: Ret = A / D
- 1 cycle with even further compound operations
  - Ret = (G × m1 × m2) / ((x1 - x2)^2 + (y1 - y2)^2)
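The cycle counts on this slide can be checked by treating the computation as a dataflow graph and scheduling every operation as early as its inputs allow. The following is a minimal Python sketch, not a hardware description: the dictionaries and the `schedule` helper are illustrative, and the name `H` for the sum of squares is an assumption made here so that it does not clash with the constant G.

```python
# Each entry maps a computed value to the values it depends on.
# Names that never appear as keys (G, m1, x1, ...) are primary inputs.

BASIC_OPS = {
    "A": ["G", "m1"],       # A = G * m1
    "B": ["A", "m2"],       # B = A * m2
    "C": ["x1", "x2"],      # C = x1 - x2
    "D": ["C"],             # D = C^2
    "E": ["y1", "y2"],      # E = y1 - y2
    "F": ["E"],             # F = E^2
    "H": ["D", "F"],        # H = D + F (hypothetical name for the sum)
    "Ret": ["B", "H"],      # Ret = B / H
}

COMPOUND_OPS = {
    "A": ["G", "m1", "m2"],  # A = G * m1 * m2
    "B": ["x1", "x2"],       # B = (x1 - x2)^2
    "C": ["y1", "y2"],       # C = (y1 - y2)^2
    "D": ["B", "C"],         # D = B + C
    "Ret": ["A", "D"],       # Ret = A / D
}


def schedule(ops):
    """Return the earliest cycle in which each operation can finish.

    Primary inputs are available at cycle 0; every operation takes one cycle
    and can start once all of its operands are ready.
    """
    cycle = {}

    def visit(name):
        if name not in ops:          # primary input
            return 0
        if name not in cycle:
            cycle[name] = 1 + max(visit(dep) for dep in ops[name])
        return cycle[name]

    for name in ops:
        visit(name)
    return cycle


basic_cycles = schedule(BASIC_OPS)
compound_cycles = schedule(COMPOUND_OPS)
```

Operations that land in the same cycle (for example A, C, and E in the basic version) are independent and run side by side in hardware, which is the fine-grained parallelism the slide refers to.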
Coarse-Grained Parallelism of Special-Purpose Circuits
- The typical unit of parallelism for general-purpose units is the thread (~= cores)
- Special-purpose processing units can also be replicated for parallelism
  - Large, complex processing units: few can fit on a chip
  - Small, simple processing units: many can fit on a chip
- Only hardware useful for the application is generated
  - Instruction? Decoding? Cache? Coherence?
How Is It Different From ASICs
- ASIC (Application-Specific Integrated Circuit)
  - Special chip purpose-built for an application
  - E.g., ASIC bitcoin miner, Intel neural network accelerator
  - Function cannot be changed once expensively built
- (+) FPGAs can be field-programmed
  - Function can be changed completely whenever
  - FPGA fabric emulates custom circuits
- (-) Emulated circuits are not as efficient as bare metal
  - ASICs achieve ~10x performance (larger circuits, faster clock)
  - ASICs achieve ~10x power efficiency
Basic FPGA Architecture
- "Configurable logic block (CLB)": 6-input look-up table (LUT), flip-flop (FF), latch
- Programmable interconnect
- Programmable I/O block
- Ex) 2-LUT for "AND":
    Input 1  Input 2  Output
    0        0        0
    0        1        0
    1        0        0
    1        1        1
- Sequential circuit construction
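A look-up table is just a small memory indexed by its input bits, which is why it can stand in for any Boolean function of those inputs. The sketch below models this in Python; the `LUT` class and the `lut_from_function` helper are hypothetical names for illustration and do not correspond to any vendor primitive.

```python
class LUT:
    """An n-input look-up table: a 2**n-entry truth table indexed by the inputs."""

    def __init__(self, num_inputs, table):
        if len(table) != 2 ** num_inputs:
            raise ValueError("truth table must have 2**num_inputs entries")
        self.num_inputs = num_inputs
        self.table = list(table)

    def __call__(self, *inputs):
        if len(inputs) != self.num_inputs:
            raise ValueError("wrong number of inputs")
        index = 0
        for bit in inputs:               # first input is the most significant bit
            index = (index << 1) | (bit & 1)
        return self.table[index]


def lut_from_function(num_inputs, fn):
    """'Configure' a LUT by evaluating fn on every input combination."""
    table = []
    for index in range(2 ** num_inputs):
        bits = [(index >> (num_inputs - 1 - i)) & 1 for i in range(num_inputs)]
        table.append(int(fn(*bits)) & 1)
    return LUT(num_inputs, table)


# The 2-LUT configured as AND: rows are ordered 00, 01, 10, 11.
and2 = LUT(2, [0, 0, 0, 1])

# Reconfiguring the same structure gives a different gate.
xor2 = lut_from_function(2, lambda a, b: a ^ b)

# A 6-input LUT holds 64 entries; here it is configured as a majority-of-six vote.
majority6 = lut_from_function(6, lambda *bits: sum(bits) >= 4)
```

Changing the stored table, not the wiring, is what changes the function, which mirrors how programming the FPGA changes the circuit.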
Basic FPGA Architecture – DSP Blocks
- CLBs act as gates: many are needed to implement high-level logic
- Arithmetic operations are provided as efficient ALU blocks
  - "Digital Signal Processing (DSP) blocks"
  - Each block provides an adder (+/-) and a multiplier (×)
Basic FPGA Architecture – Block RAM
- CLBs can act as flip-flops
  - (~1 bit/block): tiny!
- Some on-chip SRAM is provided as blocks ("Block RAM")
  - ~18/36 Kbit/block, MBs per chip
  - Massively parallel access to data → multi-TB/s bandwidth
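The multi-TB/s figure follows from every block being accessible in parallel on every clock cycle. The arithmetic can be sketched as below; the block count, port count, port width, and clock frequency are hypothetical round numbers chosen for illustration, not the specification of any particular device.

```python
def aggregate_bram_bandwidth(num_blocks, ports_per_block, port_width_bits, clock_hz):
    """Peak on-chip bandwidth in bytes/second if every port transfers every cycle."""
    bits_per_cycle = num_blocks * ports_per_block * port_width_bits
    return bits_per_cycle * clock_hz / 8


# Assumed (hypothetical) chip: 2,000 blocks, 2 ports each, 72-bit ports, 250 MHz.
bandwidth = aggregate_bram_bandwidth(
    num_blocks=2_000,
    ports_per_block=2,
    port_width_bits=72,
    clock_hz=250_000_000,
)
bandwidth_tb_per_s = bandwidth / 1e12
```

Under these assumptions the peak is 9 TB/s, reachable only if the design actually keeps every block busy each cycle.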
Basic FPGA Architecture – Hard Cores
- Some functions are provided as efficient, non-configurable "hard cores"
  - Multi-core ARM cores ("Zynq" series)
  - Multi-gigabit transceivers
  - PCIe/Ethernet PHY
  - Memory controllers
  - …
Example Accelerator Card Architecture
- FPGA connected to DRAM, 1 GbE, 40 GbE, PCIe, and FMC
- "FPGA Mezzanine Card" (FMC) expansion
  - General-purpose I/O pins, multi-gigabit transceivers
  - Network ports, memory, storage, PCIe, …
Example Accelerator Card (VCU108)
Programming FPGAs
- Languages and tools overlap with ASIC/VLSI design
- FPGA designs for acceleration are typically written with either
  - Hardware Description Languages (HDL): Register-Transfer Level (RTL) languages
  - High-Level Synthesis: a compiler translates software programming languages to RTL
- RTL models a circuit using:
  - Registers (state), and
  - Combinational logic (computation)
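The RTL view of a circuit (registers holding state, combinational logic computing the next state) can be mimicked in software with a loop that applies a pure function once per clock cycle. This is a minimal sketch of that model, assuming an 8-bit accumulator with an enable input; the circuit and the names `next_acc` and `simulate` are made up for illustration.

```python
WIDTH_MASK = 0xFF  # assumption: the register is 8 bits wide


def next_acc(acc, enable, value):
    """Combinational logic: next register value from current state and inputs."""
    if enable:
        return (acc + value) & WIDTH_MASK   # wraps like an 8-bit adder
    return acc


def simulate(inputs, acc=0):
    """Step the circuit one clock cycle per (enable, value) pair.

    Returns the final register value and the value visible during each cycle.
    """
    trace = []
    for enable, value in inputs:
        trace.append(acc)                    # register output during this cycle
        acc = next_acc(acc, enable, value)   # clock edge: register takes new value
    return acc, trace


final, trace = simulate([(1, 5), (1, 7), (0, 100), (1, 1)])
```

The register only changes at the end of each loop iteration, which corresponds to the clock edge; everything inside `next_acc` corresponds to wires and gates.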
Hardware Description Language
- Software programming languages: describe a process
- Hardware description languages: describe a structure

Software (C++):

    #include <queue>

    // Exists in memory
    std::queue<float> input_queue;
    std::queue<float> output_queue;
    float factor;

    // Instructions for the CPU
    while (true) {
        if (!input_queue.empty()) {
            float ret = input_queue.front() * factor;
            output_queue.push(ret);
            input_queue.pop();
        }
    }

Hardware (Bluespec):

    // Exists on chip
    FIFO#(Float) input_queue <- mkFIFO;
    FIFO#(Float) output_queue <- mkFIFO;
    Reg#(Float) factor <- mkReg;
    FloatMultIfc mult <- mkFloatMult;

    // Creates circuits
    rule in;
        mult.enq(factor, input_queue.first);
        input_queue.deq;
    endrule
    rule out;
        let ret <- mult.result;
        output_queue.enq(ret);
    endrule
Major Hardware Description Languages
- Verilog: most widely used in industry
  - Relatively low-level language supported by everyone
- Chisel: compiles to Verilog
  - Relatively high-level language from Berkeley
  - Embedded in the Scala programming language
  - Prominently used in RISC-V development (Rocket core, etc.)
- Bluespec: compiles to Verilog
  - Relatively high-level language from MIT
  - Supports types, interfaces, etc.
  - Also active in RISC-V development (Piccolo, etc.)
High-Level Synthesis
- Compiler translates software programming languages to RTL
- High-Level Synthesis compilers from Xilinx, Altera/Intel
  - Compile C/C++, annotated with #pragmas, into RTL
  - The theory/history behind it is a complex can of worms we won't go into
  - Personal experience: code needs to be HEAVILY annotated to get performance
  - Anecdote: naïve RISC-V in Vivado HLS achieves an IPC of 0.0002 [1], 0.04 after optimizations [2]
- OpenCL
  - Inherently parallel language, more efficiently translated to hardware
  - Stable software interface

[1] http://msyksphinz.hatenablog.com/entry/2019/02/20/040000
[2] http://msyksphinz.hatenablog.com/entry/2019/02/27/040000
FPGA Compilation Toolchain
- High-level HDL code → language compiler (high-level language vendor tool) → Verilog/VHDL
- Verilog/VHDL + constraint file → FPGA vendor toolchain (few open source)
  - Synthesize → netlist → map/place/route → bitfile
- Constraint file: "Which transceiver instance should top_transceiver_01 map to?" And so, so much more…
- Simulation: functional simulation, cycle-level simulation
Programming/Using an FPGA Accelerator
- Bitfile is programmed to the FPGA over the "JTAG" interface
  - Typically used over a USB cable
  - Supports FPGA programming, limited debugging access, etc.
- A PCIe-attached FPGA accelerator card is typically used similarly to a GPU
  - Program the FPGA, execute software
  - Software copies data to the FPGA board and notifies the FPGA -> FPGA logic performs computations -> software copies data back from the FPGA
- FPGA flexibility gives immense freedom of usage patterns
  - Streaming, coherent memory, …
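The GPU-like usage pattern (copy in, notify, compute, copy out) can be sketched from the host's point of view as follows. `MockFpgaAccelerator` is a software stand-in invented for this example, not a real driver or vendor API; it simply multiplies each value by a constant factor, echoing the scaling circuit from the HDL slide.

```python
class MockFpgaAccelerator:
    """Hypothetical stand-in for a PCIe-attached FPGA that scales its input."""

    def __init__(self, factor):
        self.factor = factor
        self.device_memory = []
        self.results = []

    def copy_to_device(self, data):
        # Stands in for a host-to-board transfer (e.g. DMA over PCIe).
        self.device_memory = list(data)

    def start(self):
        # Stands in for notifying the FPGA; the "FPGA logic" then runs.
        self.results = [x * self.factor for x in self.device_memory]

    def copy_from_device(self):
        # Stands in for a board-to-host transfer of the results.
        return list(self.results)


def offload_scale(device, data):
    """Host-side flow: copy data in, notify the device, copy results back."""
    device.copy_to_device(data)
    device.start()
    return device.copy_from_device()


scaled = offload_scale(MockFpgaAccelerator(factor=2.0), [1.0, 2.5, 4.0])
```

A real deployment would replace the three methods with whatever transfer and notification mechanism the board's driver provides; the order of the steps is the part that carries over.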
Partial Reconfiguration
- Parts of the FPGA (sub-components) can be swapped out dynamically without turning off the FPGA
  - Physical area is drawn on chip
- Used in Amazon F1, etc.
- Toolchain support for isolation
FPGAs In The Cloud
- Amazon EC2 F1 instance (1 – 4 FPGAs)
- Microsoft Azure, etc.