Digital Design using FPGAs and Verilog HDL Project

Digital Design using FPGAs and Verilog HDL Project IT – Autumn 2016 Mahdad Davari <mahdad.davari@it.uu.se>

Programmable Devices
• Since 1969: PROM, (E)EPROM, PAL, PLA, GAL, CPLD, FPGA
• Key players in the programmable-device industry:
  – Altera (first CPLD)
  – Xilinx (first FPGA)

FPGA from a Bird's-Eye View [figure]

FPGA in a Nutshell [figure]

Logic Slice [figure]

Look-Up Table (LUT) [figure: SRAM cells hold the stored output bits; inputs a, b, c select one of them]
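The LUT slide can be read as a small memory addressed by the logic inputs. The following is a minimal behavioural sketch of that idea, not a vendor primitive: the module name lut3 and the parameter INIT are assumptions made for illustration, and the default contents happen to encode a 3-input majority function.

```verilog
// Sketch of a 3-input LUT. INIT is a hypothetical parameter standing in
// for the eight SRAM configuration cells; it is not taken from the slides.
module lut3 #(parameter [7:0] INIT = 8'b1110_1000) (
  input  wire a, b, c,
  output wire y
);
  // Model the configuration cells as a constant 8-bit vector.
  wire [7:0] cells = INIT;

  // The three inputs form an address that selects one stored bit.
  assign y = cells[{a, b, c}];
endmodule
```

Changing INIT changes the implemented function without changing the structure, which is the property that makes the device programmable.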

FPGA Bird's-Eye View [figure]

Roadmap
• Programmable Devices
• FPGA Design Flow
• FPGA vs GP-CPU vs ASIC
• Accelerator Design Example
• Verilog HDL Example

FPGA Design Flow
1. Design Entry (RTL design using HDL)
2. Behavioral Simulation (ModelSim). Behaviour OK? If NO, revise the design and simulate again.
3. Synthesis (Quartus II)
4. Place and Route (PAR) (Quartus II)
5. Timing Analysis (Quartus II). Speed OK? If NO, revise the design and repeat the flow.
6. Generate Bit Stream & Programme the Device (Quartus II)

CPU vs FPGA vs ASIC [figure: CPU, FPGA and ASIC compared on a High-to-Low scale for Cost, Power, Speed, Time-to-Market and Flexibility]

One Monday Morning …
• FFT algorithm on CPU

Butterfly Operation [figure]

4-Point Butterfly Operation [figure: inputs X0, X2, X1, X3 pass through 2-Point BF blocks to outputs Y0, Y1, Y2, Y3]

8-Point Butterfly Operation [figure]

16-Point Butterfly Operation [figure]

32-Point Butterfly Operation [figure]

Speedup
• CPU: 8*TMem + 24*TALU
• Accel.: 1*TMem + 3*TALU
• Speedup = CPU / Accel. = (8*TMem + 24*TALU) / (1*TMem + 3*TALU) ≈ 8x
• Assumptions: CPU TMem ≈ Accel. TMem and CPU TALU ≈ Accel. TALU
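The factor of 8 follows directly once the memory and ALU latencies are taken to be the same on both platforms, as the slide assumes. A short worked form of the ratio, with the latencies written as symbols:

```latex
\[
\text{Speedup}
  = \frac{8\,T_{\mathrm{Mem}} + 24\,T_{\mathrm{ALU}}}
         {1\,T_{\mathrm{Mem}} + 3\,T_{\mathrm{ALU}}}
  = \frac{8\,\bigl(T_{\mathrm{Mem}} + 3\,T_{\mathrm{ALU}}\bigr)}
         {T_{\mathrm{Mem}} + 3\,T_{\mathrm{ALU}}}
  = 8
\]
```

One plausible reading of the coefficients, which is an assumption and not stated on the slide, is that the CPU performs 8 memory accesses and 3 stages of 8 ALU operations for an 8-point FFT, whereas the accelerator fetches all inputs in one access and evaluates each stage in parallel.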

Top-Down Design: 8-Point FFT [figure: top-level block with inputs 0 to 7 and outputs O0 to O7]

Top-Down Design [figure]

Top-Down Design, Top: 8-Point FFT [figure: three FFT stages (Stage 1, Stage 2, Stage 3), each built from four 2-Point BF blocks with inputs i0, i1 and outputs o0, o1, driving top-level outputs O0 to O7]

Top-Down Design, Top: 8-Point FFT [figure: as above, with pipeline registers Pipe 1, Pipe 2 and Pipe 3 following Stage 1, Stage 2 and Stage 3]

Top-Down Design [figure]

Top-Down Design [figure]

Top-Down Design, Top: 8-Point FFT [figure: as above, with the top-level inputs wired to Stage 1 in the order I0, I4, I2, I6, I1, I5, I3, I7]

Top-Down Design, Top: 8-Point FFT [figure: as above, with control inputs Valid, Reset and Clock and the output Ready]
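The Stage 1 input order in the diagram (I0, I4, I2, I6, I1, I5, I3, I7) matches the bit-reversed index order commonly used at the input of a radix-2 FFT. The sketch below reproduces that order by reversing a 3-bit index; the module and function names (bitrev_demo, bitrev3) are hypothetical and not part of the slides.

```verilog
// Sketch: print which top-level input feeds each Stage 1 port,
// assuming the wiring is a 3-bit bit reversal of the port index.
module bitrev_demo;
  // Reverse the three index bits, e.g. 3'b001 -> 3'b100.
  function [2:0] bitrev3(input [2:0] idx);
    bitrev3 = {idx[0], idx[1], idx[2]};
  endfunction

  integer k;
  initial
    for (k = 0; k < 8; k = k + 1)
      $display("stage-1 port %0d <- I%0d", k, bitrev3(k[2:0]));
endmodule
```

Ports 0 to 7 map to inputs 0, 4, 2, 6, 1, 5, 3, 7, which is the same sequence used for the stage1 instance in the top module.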

Bottom-Up Implementation: 2-Point Butterfly (inputs X0, X1; outputs Y0, Y1)

  module butterfly (x0, x1, y0, y1);
    input  x0, x1;
    output y0, y1;

    // Dataflow style
    assign y0 = x0 + x1;
    assign y1 = x0 - x1;

    // Equivalent structural style, instantiating separate Adder and
    // Subtractor modules (assumed to be defined elsewhere):
    //   Adder      Add1 (y0, x0, x1);
    //   Subtractor Sub1 (y1, x0, x1);
  endmodule
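The butterfly on the slide uses single-bit ports, which keeps the structure readable but cannot carry real sample values. A possible multi-bit variant is sketched below; the module name butterfly_w and the WIDTH parameter are assumptions for illustration, and, as on the slide, no twiddle-factor multiplication is included.

```verilog
// Sketch only: butterfly_w and WIDTH are illustrative names, not from the slides.
module butterfly_w #(parameter WIDTH = 8) (
  input  wire signed [WIDTH-1:0] x0, x1,
  output wire signed [WIDTH:0]   y0, y1   // one extra bit so sum and difference cannot overflow
);
  assign y0 = x0 + x1;  // sum output
  assign y1 = x0 - x1;  // difference output
endmodule
```

Growing the result by one bit per stage is one common way to avoid overflow; scaling or saturating at each stage are alternatives with different accuracy and area trade-offs.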

Bottom-Up Implementation: FFT stage built from four 2-Point BFs

  module fft (i0, i1, i2, i3, i4, i5, i6, i7,
              o0, o1, o2, o3, o4, o5, o6, o7);
    input  i0, i1, i2, i3, i4, i5, i6, i7;
    output o0, o1, o2, o3, o4, o5, o6, o7;

    // Positional port connections
    butterfly bf1 (i0, i1, o0, o1);
    butterfly bf2 (i2, i3, o2, o3);
    butterfly bf3 (i4, i5, o4, o5);
    // Named port connections (the order does not matter)
    butterfly bf4 (.y0(o6), .y1(o7), .x0(i6), .x1(i7));
  endmodule

Top-Down Design, Top: 8-Point FFT [figure: complete design with FFT Stages 1 to 3, pipeline registers Pipe 1 to Pipe 3, inputs I0 to I7, Valid, Reset and Clk, and outputs O0 to O7 and Ready]

Bottom-Up Implementation: top module

  // Non-ANSI alternative: list every scalar port, then declare directions:
  //   module top (i0, i1, i2, i3, i4, i5, i6, i7, valid, rst, clk,
  //               o0, o1, o2, o3, o4, o5, o6, o7, ready);
  //   input  i0, i1, i2, i3, i4, i5, i6, i7, valid, rst, clk;
  //   output o0, o1, o2, o3, o4, o5, o6, o7, ready;

  // ANSI-style port list with vectors
  module top (input      [7:0] i,
              input            valid, rst, clk,
              output reg [7:0] o,       // reg because o is assigned in an always block
              output reg       ready);

    reg  [8:0] pipe1;   // bit 8 carries valid, bits 7:0 carry data
    reg  [8:0] pipe2;
    reg  [8:0] pipe3;
    wire [7:0] w1;
    wire [7:0] w2;
    wire [7:0] w3;

    // fft has sixteen scalar ports, so every output bit is connected explicitly.
    fft stage1 (i[0], i[4], i[2], i[6], i[1], i[5], i[3], i[7],
                w1[0], w1[1], w1[2], w1[3], w1[4], w1[5], w1[6], w1[7]);
    fft stage2 (pipe1[0], pipe1[2], pipe1[1], pipe1[3], pipe1[4], pipe1[6], pipe1[5], pipe1[7],
                w2[0], w2[1], w2[2], w2[3], w2[4], w2[5], w2[6], w2[7]);
    fft stage3 (pipe2[0], pipe2[4], pipe2[2], pipe2[6], pipe2[1], pipe2[5], pipe2[3], pipe2[7],
                w3[0], w3[1], w3[2], w3[3], w3[4], w3[5], w3[6], w3[7]);

    // continued in the next slide …

Bottom-Up Implementation (continued): pipeline registers

    // continued from the previous slide …
    always @ (posedge clk) begin
      if (rst) begin
        // Three equivalent ways of writing zero
        pipe1 <= 9'b00000;
        pipe2 <= 9'd0;
        pipe3 <= 0;
      end else begin
        // The valid bit travels with the data through each pipeline stage
        pipe1 <= {valid, w1};
        pipe2 <= {pipe1[8], w2};
        pipe3 <= {pipe2[8], w3};
      end
    end
    // continued in the next slide …

Bottom-Up Implementation (continued): outputs

    // continued from the previous slide …
    // A sensitivity list can be written with "or", e.g. @ (a or b), with
    // commas, e.g. @ (a, b), or simply as @ (*) for all signals read in the
    // block. This block reads pipe3, so it must be sensitive to pipe3.
    always @ (*) begin
      {ready, o} = pipe3;
    end
    // The always form requires o and ready to be declared as reg.
    // Alternative when they are plain wire outputs:
    //   assign {ready, o} = pipe3;
  endmodule

Testbench [figure: an Input Generator drives the Input of Top (Design Under Test); the Output is compared (==) with the Expected Result, and a match reports "Test OK!"]

Testbench

  module fft_tb;
    reg        clk, rst, valid;
    reg  [7:0] i;        // widths match the 8-bit ports of top
    wire [7:0] o;
    wire       ready;

    top dut (.i(i), .valid(valid), .rst(rst), .clk(clk), .o(o), .ready(ready));

    always #5 clk = !clk;   // clock period of 10 time units

    initial begin
      clk = 0; valid = 0; i = 8'h00;
      rst = 1'b1;           // reset is active high in top, so assert it first
      #20 rst = 1'b0;       // release reset
      #20 i = 8'hff;
      valid = 1'b1;
      #10 valid = 1'b0;
      #50 $finish;
    end
  endmodule
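The testbench diagram compares the DUT output with an expected result, whereas the listing above only applies stimulus. A self-checking version of the same idea might look like the sketch below. It targets a small stand-in butterfly so that it is self-contained; the names bf_dut and bf_tb, the 4-bit width and the exhaustive sweep are all assumptions for illustration.

```verilog
`timescale 1ns/1ps
// Stand-in DUT so the sketch is self-contained (bf_dut is a hypothetical name).
module bf_dut (input wire [3:0] x0, x1, output wire [4:0] y0, y1);
  assign y0 = x0 + x1;
  assign y1 = x0 - x1;   // wraps modulo 32 when x1 > x0
endmodule

module bf_tb;
  reg  [3:0] x0, x1;
  wire [4:0] y0, y1;
  integer a, b, errors;

  bf_dut dut (.x0(x0), .x1(x1), .y0(y0), .y1(y1));

  initial begin
    errors = 0;
    // Input generator: sweep every pair of 4-bit inputs.
    for (a = 0; a < 16; a = a + 1)
      for (b = 0; b < 16; b = b + 1) begin
        x0 = a; x1 = b;
        #1; // let the combinational outputs settle
        // Compare against the expected result computed in the testbench.
        if (y0 !== a + b || y1 !== ((a - b) & 5'h1f)) begin
          errors = errors + 1;
          $display("Mismatch: x0=%0d x1=%0d y0=%0d y1=%0d", x0, x1, y0, y1);
        end
      end
    if (errors == 0) $display("Test OK!");
    else             $display("Test FAILED: %0d mismatches", errors);
    $finish;
  end
endmodule
```

The same pattern extends to the pipelined top module by waiting for ready before comparing o with the expected value.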

Net Types in Verilog
• Wire
  – Used only as a connector, or
  – on the left-hand side of "assign", e.g. "assign w = a & b"
• Reg
  – Implements combinatorial or sequential logic
  – Assigned inside "always" blocks

Combinatorial vs. Sequential

Each snippet below is a separate alternative, not part of one module.

  // combinatorial, using a wire
  wire myWire;
  assign myWire = a | b;

  // combinatorial, using a reg
  reg myReg;
  always @ (a or b)   // also @ (a, b)
    myReg = a | b;

  // combinatorial, the same logic written with if/else
  always @ (a or b) begin
    if (a == 1 || b == 1) myReg = 1;
    else                  myReg = 0;
  end

  // sequential
  reg myReg;
  always @ (posedge Clk)
    myReg <= a | b;

N.B.
- a net should be assigned ONLY in a single block
- combinatorial: =
- sequential: <=
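The rule "sequential: <=" matters most when registers depend on each other within the same clock edge. The sketch below illustrates this with a register swap; the module name swap_demo and its ports are hypothetical and chosen only for the example.

```verilog
// Sketch: swap_demo, load, p and q are illustrative names, not from the slides.
module swap_demo (
  input  wire       clk,
  input  wire       load,
  input  wire [3:0] a, b,
  output reg  [3:0] p, q
);
  always @(posedge clk) begin
    if (load) begin
      p <= a;
      q <= b;
    end else begin
      // Non-blocking: both right-hand sides are sampled before either
      // register updates, so p and q exchange values.
      p <= q;
      q <= p;
      // With blocking "=" here, q would be assigned the already-updated p,
      // and both registers would end up holding the old value of q.
    end
  end
endmodule
```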

Two-Dimensional Input Ports

  // Array ports like these are not accepted by Verilog-2001
  // (SystemVerilog allows them):
  //   module myModule (input [7:0] i [0:3], output [7:0] o [0:3]);

  // Instead, flatten the array into one wide port and unpack it inside:
  module myModule (input [31:0] i, output [31:0] o);
    wire [7:0] myArray [0:3];

    assign {myArray[3], myArray[2], myArray[1], myArray[0]} = i;
    assign o = {myArray[0], myArray[1], myArray[2], myArray[3]};
  endmodule
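For larger arrays the explicit concatenations become tedious to write. A generate loop with indexed part-selects expresses the same unpacking and repacking; this is a sketch under assumptions, with the module name bus_unpack and the parameters N and W introduced only for illustration.

```verilog
// Sketch: bus_unpack, N and W are hypothetical names, not from the slides.
module bus_unpack #(parameter N = 4, parameter W = 8) (
  input  wire [N*W-1:0] i,
  output wire [N*W-1:0] o
);
  wire [W-1:0] myArray [0:N-1];

  genvar k;
  generate
    for (k = 0; k < N; k = k + 1) begin : g_unpack
      // Element k occupies bits [k*W +: W] of the flat input bus.
      assign myArray[k] = i[k*W +: W];
      // Repack in reversed element order, matching the slide's concatenation
      // {myArray[0], myArray[1], myArray[2], myArray[3]}.
      assign o[(N-1-k)*W +: W] = myArray[k];
    end
  endgenerate
endmodule
```

With the default parameters this behaves like the 32-bit example: myArray[0] takes i[7:0] and appears on o[31:24].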

Useful References
• https://www.doulos.com/knowhow/verilog_designers_guide/ (good starting point into Verilog)
• https://inst.eecs.berkeley.edu/~cs150/Documents/Nets.pdf (net types in Verilog, wire vs. reg)
• http://www.asic-world.com/tidbits/blocking.html (blocking vs. non-blocking assignments, see the example)
• http://web.mit.edu/6.111/www/f2007/handouts/L06.pdf (another reference for blocking vs. non-blocking assignments and finite-state-machine design; slides 1 to 7 and slides 11 to 15)
• http://www.asic-world.com/verilog/art_testbench_writing1.html (writing testbenches in Verilog)
• http://www.rfwireless-world.com/source-code/ (useful source code examples; jump to Verilog part)
• http://www.fpl2016.org/slides/Gupta%20--%20Accelerating%20Datacenter%20Workloads.pdf (HARP-related material)
• http://web.cs.ucla.edu/~haoyc/pdf/dac16.pdf (HARP-related paper)
• https://pdfs.semanticscholar.org/8b8f/8cb7885bc751fa919d216d96caf4a0234717.pdf (HARP-related paper)

Thank you!