6.375 Tutorial 4: RISC-V and Final Projects

Ming Liu, March 4, 2016, http://csg.csail.mit.edu/6.375

Overview
  • Branch Target Buffers
  • RISC-V Infrastructure
  • Final Project

Two-stage Pipeline with BTB
[Figure: two-stage pipeline. Stage 1: Fetch. Stage 2: Decode-RegisterFetch-Execute-Memory-WriteBack. Blocks: PC, BTB (Predict Next PC), f2d, Register File, Decode, Execute, Inst Memory, Data Memory. Signals: kill, misprediction, correct pc, Update BTB {PC, correct PC}.]
  • BTB: Branch Target Buffer
  • At fetch: use the BTB to predict the next PC
  • At execute: update the BTB with the correct next PC
      • Only if the instruction is a branch (iType == J, Jr, Br)

Next Address Predictor: Branch Target Buffer (BTB)
[Figure: 2^k-entry direct-mapped BTB. Labels: pc, k, iMem, valid, pc_i, target_i, match.]
  • Even small BTBs are effective
  • The BTB remembers recent targets for a set of control instructions
      • Fetch: looks for the pc and the associated target in the BTB; if the pc is not found, then ppc is pc+4
      • Execute: checks the prediction; if it is wrong, kills the instruction and updates the BTB (only for branches and jumps)

Next Addr Predictor

    interface AddrPred;
        method Addr nap(Addr pc);
        method Action update(Redirect rd);
    endinterface

  • Two implementations:
      a) Simple PC+4 predictor
      b) Predictor using BTB

Simple PC+4 predictor

    module mkPcPlus4(AddrPred);
        // Always predict the sequential next instruction.
        method Addr nap(Addr pc);
            return pc + 4;
        endmethod

        // Nothing to learn from redirects, so the body is empty.
        method Action update(Redirect rd);
        endmethod
    endmodule

BTB predictor

    module mkBtb(AddrPred);
        RegFile#(BtbIndex, Addr) ppcArr <- mkRegFileFull;
        RegFile#(BtbIndex, BtbTag) entryPcArr <- mkRegFileFull;
        Vector#(BtbEntries, Reg#(Bool)) validArr <- replicateM(mkReg(False));

        // Index: low bits of the word address. Tag: upper bits of the pc.
        function BtbIndex getIndex(Addr pc) = truncate(pc >> 2);
        function BtbTag getTag(Addr pc) = truncateLSB(pc);

        method Addr nap(Addr pc);
            BtbIndex index = getIndex(pc);
            BtbTag tag = getTag(pc);
            if (validArr[index] && tag == entryPcArr.sub(index))
                return ppcArr.sub(index);
            else
                return (pc + 4);
        endmethod

        method Action update(Redirect redirect); ...
    endmodule

BTB predictor update method
  • The redirect input contains a pc, the correct next pc, and whether the branch was taken (to avoid making entries for not-taken branches)

    method Action update(Redirect redirect);
        // index and tag are computed up front so that the else branch can use them
        let index = getIndex(redirect.pc);
        let tag = getTag(redirect.pc);
        if (redirect.taken) begin
            validArr[index] <= True;
            entryPcArr.upd(index, tag);
            ppcArr.upd(index, redirect.nextPc);
        end
        else if (tag == entryPcArr.sub(index))
            validArr[index] <= False;
    endmethod
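The nap and update behavior of the BTB predictor can be tried out in software. The following is a minimal Python sketch, not the course's BSV code: the class name Btb, the default of 16 entries, and the tag width (all address bits above the index and the 2 byte-offset bits) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Redirect:
    pc: int        # address of the branch or jump
    next_pc: int   # correct next PC computed at execute
    taken: bool    # whether the branch was taken


class Btb:
    """Software model of a direct-mapped BTB with 2**index_bits entries."""

    def __init__(self, index_bits: int = 4, addr_bits: int = 32):
        self.index_bits = index_bits
        self.addr_bits = addr_bits
        # Assumption: the tag is every address bit above index + byte offset.
        self.tag_bits = addr_bits - index_bits - 2
        entries = 1 << index_bits
        self.valid = [False] * entries
        self.tags = [0] * entries
        self.targets = [0] * entries

    def get_index(self, pc: int) -> int:
        # Drop the 2 byte-offset bits, keep the low index_bits bits.
        return (pc >> 2) & ((1 << self.index_bits) - 1)

    def get_tag(self, pc: int) -> int:
        # Keep the most significant tag_bits bits of the address.
        return pc >> (self.addr_bits - self.tag_bits)

    def nap(self, pc: int) -> int:
        """Predict the next PC: stored target on a hit, pc + 4 otherwise."""
        i = self.get_index(pc)
        if self.valid[i] and self.tags[i] == self.get_tag(pc):
            return self.targets[i]
        return (pc + 4) & ((1 << self.addr_bits) - 1)

    def update(self, rd: Redirect) -> None:
        """Record taken branches; invalidate a matching entry if not taken."""
        i = self.get_index(rd.pc)
        tag = self.get_tag(rd.pc)
        if rd.taken:
            self.valid[i] = True
            self.tags[i] = tag
            self.targets[i] = rd.next_pc
        elif self.tags[i] == tag:
            self.valid[i] = False
```

For example, after update(Redirect(0x200, 0x100, True)), a call to nap(0x200) returns 0x100, while nap(0x240) maps to the same entry with a different tag and falls back to 0x244.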

Multiple Predictors: BTB + Branch Direction Predictors
[Figure: pipeline PC, Decode, Reg Read, Execute, Write Back. Next Addr Pred forms a tight loop at the PC; Br Dir Pred sits at Decode; "correct mispred" paths feed back from the later stages.]
  • Mispredicted instructions must be filtered
  • PC: need next PC immediately
  • Decode: instr type, PC-relative targets available
  • Reg Read: simple conditions, register targets available
  • Execute: complex conditions available

RISC-V Processor SCE-MI Infrastructure

RISC-V Interface
[Figure: Host (Testbench) connected to mkProc (BSV) through cpuToHost, hostToCpu, iMemInit, and dMemInit. Inside mkProc: CSR, PC, Core, iMem, dMem.]

RISC-V Interface

    interface Proc;
        method Action hostToCpu(Addr startpc);
        method ActionValue#(CpuToHostData) cpuToHost;
        interface MemInit iMemInit;
        interface MemInit dMemInit;
    endinterface

    typedef struct {
        CpuToHostType c2hType;
        Bit#(16) data;
    } CpuToHostData deriving(Bits, Eq);

    typedef enum {
        ExitCode, PrintChar, PrintIntLow, PrintIntHigh
    } CpuToHostType deriving(Bits, Eq);

RISC-V Interface: cpuToHost
  • Write mtohost CSR: csrw mtohost, rs1
      • rs1[15:0]: data
          • A 32-bit integer needs two writes
      • rs1[17:16]: c2hType
          • 0: Exit code
          • 1: Print character
          • 2: Print low 16 bits
          • 3: Print high 16 bits

    typedef struct {
        CpuToHostType c2hType;
        Bit#(16) data;
    } CpuToHostData deriving(Bits, Eq);
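The rs1 layout above can be illustrated with a small Python sketch of the values a program would write to mtohost. The function names are hypothetical, and sending the low half before the high half is an assumption; the slide only states that a 32-bit integer needs two writes.

```python
# c2hType encodings from the slide (rs1[17:16]).
EXIT_CODE, PRINT_CHAR, PRINT_INT_LOW, PRINT_INT_HIGH = range(4)


def encode_mtohost(c2h_type: int, data: int) -> int:
    """Build the rs1 value: c2hType in bits 17:16, data in bits 15:0."""
    if not 0 <= c2h_type <= 3:
        raise ValueError("c2hType must fit in 2 bits")
    return (c2h_type << 16) | (data & 0xFFFF)


def encode_print_int(value: int) -> list:
    """Split a 32-bit integer into two mtohost writes.

    Assumption: the low 16 bits are written first, then the high 16 bits.
    """
    value &= 0xFFFFFFFF
    return [
        encode_mtohost(PRINT_INT_LOW, value & 0xFFFF),
        encode_mtohost(PRINT_INT_HIGH, value >> 16),
    ]
```

For instance, printing the character 'A' corresponds to writing 0x10041, and printing 0x12345678 corresponds to writing 0x25678 followed by 0x31234.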

RISC-V Interface: Others
  • hostToCpu
      • Tells the processor to start running from the given address
  • iMemInit/dMemInit
      • Used to initialize iMem and dMem
      • Can also be used to check when initialization is done
      • Defined in MemInit.bsv

SceMi Interface
[Figure: tb (C++) connected to mkProc (BSV). Inside mkProc: CSR, PC, Core, iMem, dMem.]

Load Program
[Figure: tb (C++) loads add.riscv.vmh into iMem and dMem of mkProc (BSV). Inside mkProc: CSR, PC, Core, iMem, dMem.]
  • Bypass this step in simulation

Load Program
[Figure: mem.vmh is loaded into iMem and dMem of mkProc (BSV). Inside mkProc: CSR, PC, Core, iMem, dMem.]
  • Simulation: load with mem.vmh (fixed file name)
      • Copy <test>.riscv.vmh to mem.vmh
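The copy step for simulation could be scripted as in the sketch below. The helper name stage_program and the directory arguments are hypothetical; only the file names <test>.riscv.vmh and mem.vmh come from the slide.

```python
import shutil
from pathlib import Path


def stage_program(test_name: str, program_dir: str, run_dir: str) -> Path:
    """Copy <test_name>.riscv.vmh to the fixed name mem.vmh in run_dir."""
    src = Path(program_dir) / f"{test_name}.riscv.vmh"
    dst = Path(run_dir) / "mem.vmh"
    shutil.copyfile(src, dst)  # overwrites any previously staged program
    return dst
```

Calling stage_program("add", programs, sim) before starting the simulator would make add.riscv.vmh the program that the simulated memories load.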

Start Processor
[Figure: tb (C++) sends the starting PC 0x200 to mkProc (BSV). Inside mkProc: CSR, PC, Core, iMem, dMem.]

Print & Exit
[Figure: tb (C++) gets c2hType and data from the CSR of mkProc (BSV). Inside mkProc: CSR, PC, Core, iMem, dMem.]
  • c2hType 1, 2, 3: print
  • c2hType 0: exit
      • Data == 0: PASSED
      • Data != 0: FAILED
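The host-side handling described above can be sketched as a small Python loop. This is an illustrative model, not the course's C++ testbench: the function name run_host is hypothetical, and printing the integer as unsigned decimal once the high half arrives is an assumption.

```python
def run_host(messages):
    """Consume (c2h_type, data) pairs and return (printed_text, status).

    status is "PASSED" or "FAILED" once an exit message is seen,
    or None if the stream ends without one.
    """
    printed = []
    low_half = 0
    for c2h_type, data in messages:
        data &= 0xFFFF
        if c2h_type == 0:            # exit code
            status = "PASSED" if data == 0 else "FAILED"
            return "".join(printed), status
        elif c2h_type == 1:          # print character
            printed.append(chr(data & 0xFF))
        elif c2h_type == 2:          # low 16 bits of an integer
            low_half = data
        elif c2h_type == 3:          # high 16 bits: integer is complete
            printed.append(str((data << 16) | low_half))
    return "".join(printed), None
```

A stream of [(1, ord("x")), (2, 0x5678), (3, 0x1234), (0, 0)] yields the text "x305419896" and the status "PASSED".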

Final Project

Overview
  • Groups of 2-3 students
  • Each group assigned to a graduate mentor in our group
  • Groups meet individually with Arvind, mentor, and me
  • Weekly reports due before the meeting
      • Email to 6.375-admin@mit.edu and mentor

Schedule

Project Considerations
  • Design a complex digital system
  • Choose an application that could benefit from hardware acceleration or FPGAs
  • Application should be well understood
      • Find/implement reference software code
  • Look at past years' projects on the website

FPGA IPs and Resources
  • Many Xilinx-related IPs are available in the BSV library
      • $BLUESPECDIR/BSVSource/Xilinx
      • BRAMs, DRAM, clock generators/buffers, LED controller, HDMI controller, LCD controller
  • Can wrap Verilog libraries/IPs in BSV code using importBVI
      • Tutorial: http://wiki.bluespec.com/Home/ExperiencedUsers/Import-BVI

BRAMs on FPGAs
  • Fast, small, on-chip distributed RAM on FPGA
      • 1 cycle access latency
      • 36 Kbits x 1500 (approx) = ~6.75 MB total
      • Up to 2 ports
[Figure: BRAM with Port A and Port B, each with Request and Resp.]
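The total capacity figure follows from simple arithmetic, sketched below in Python. The function name is hypothetical, and treating 36 Kbits as 36,000 bits is an assumption chosen because it reproduces the slide's ~6.75 MB; with 36 x 1024 bits per block the total comes out slightly higher.

```python
def bram_total_bytes(bits_per_block: int = 36_000, num_blocks: int = 1500) -> int:
    """Total on-chip BRAM capacity in bytes (8 bits per byte)."""
    return bits_per_block * num_blocks // 8


decimal_total = bram_total_bytes()            # 36,000-bit blocks
binary_total = bram_total_bytes(36 * 1024)    # 36,864-bit blocks
```

Here decimal_total is 6,750,000 bytes (6.75 MB) and binary_total is 6,912,000 bytes, so the slide's figure is best read as an approximation.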

BRAMs in BSV Library
  • 2-ported BRAM server: mkBRAM2Server()
  • Large FIFOs: mkSizedBRAMFIFO()
  • Large sync FIFOs: mkSyncBRAMFIFO()
  • Primitive BRAM: mkBRAMCore2()

    import BRAM::*;

    BRAM_Configure cfg = defaultValue;
    cfg.memorySize = 1024*32; // define custom memorySize

    // instantiate 32K x 16 bits BRAM module
    BRAM2Port#(UInt#(15), Bit#(16)) bram <- mkBRAM2Server(cfg);

    rule doWrite;
        bram.portA.request.put(BRAMRequest {
            write: True,
            responseOnWrite: False,
            address: 15'h01,
            datain: data
        });
    endrule

DRAM on FPGA
  • Large capacity (1 GB on VC707)
  • Longer access latency, especially random access
  • BSV library at $BLUESPECDIR/BSVSource/
        Xilinx/XilinxVC707DDR3.bsv
        Misc/DDR3.bsv
      • Not officially in documentation
  • Example code will be given as part of Lab 6
[Figure: off-chip DRAM connects through DDR3_Pins to the DRAM Controller IP on the FPGA; a BSV Wrapper exposes it as DDR3_User.]

DRAM Request/Response
  • 512-bit wide user interface
  • DDR Request:
      • Write: write or read
      • Byteen: byte enable mask; which of the 8-bit bytes in the 512 bits will be written
      • Address: DRAM address for 512-bit words
      • Data: data to be written
  • DDR Response:
      • Bit#(512) read data
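A 512-bit word holds 64 bytes, so the byte enable mask has 64 bits. The Python sketch below models how such a masked write might behave. It is not the DDR3 library's implementation: the class and function names are hypothetical, and it assumes that bit i of the mask enables byte i and that unwritten memory reads as zero.

```python
WORD_BYTES = 512 // 8  # 64 bytes per 512-bit word


def apply_write(old_word: bytes, data: bytes, byteen: int) -> bytes:
    """Merge data into old_word; bit i of byteen enables byte i (assumed)."""
    if len(old_word) != WORD_BYTES or len(data) != WORD_BYTES:
        raise ValueError("words must be exactly 64 bytes")
    return bytes(
        data[i] if (byteen >> i) & 1 else old_word[i]
        for i in range(WORD_BYTES)
    )


class DramModel:
    """Word-addressed memory model with a request() method."""

    def __init__(self):
        self.words = {}  # address (in 512-bit words) -> 64-byte value

    def request(self, write: bool, byteen: int, address: int, data: bytes = None):
        """Perform a write (returns None) or a read (returns 64 bytes)."""
        old = self.words.get(address, bytes(WORD_BYTES))
        if write:
            self.words[address] = apply_write(old, data, byteen)
            return None
        return old
```

Writing a full word of 0xAA with all 64 enable bits set, then a word of 0x55 with byteen = 0b11, leaves the first two bytes as 0x55 and the remaining 62 bytes as 0xAA.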

Indirect Memory Access
  • Host CPU loads/stores data from host DRAM to PCIe device (FPGA)
      • Low bandwidth, consumes CPU cycles
      • Used in SceMi: ~50 MB/s
[Figure: Host CPU, Host DRAM, and FPGA with FPGA DRAM, connected by a bus.]

Direct Memory Access (DMA)
  • Host CPU sets up DMA; DMA engine performs data transfer
      • High bandwidth, minimal CPU involvement: 1-4 GB/s
      • Not supported by SceMi
[Figure: Host CPU, Host DRAM, DMA Eng, and FPGA with FPGA DRAM, connected by a bus.]
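To get a feel for the bandwidth gap between indirect access (~50 MB/s) and DMA (1-4 GB/s), the Python sketch below estimates ideal transfer times. The function name is hypothetical, decimal units are assumed (1 MB = 10^6 bytes), and setup overhead and latency are ignored.

```python
MB = 10**6
GB = 10**9


def transfer_seconds(num_bytes: int, bytes_per_second: int) -> float:
    """Ideal transfer time, ignoring setup overhead and latency."""
    return num_bytes / bytes_per_second


indirect = transfer_seconds(1 * GB, 50 * MB)   # SceMi-style indirect access
dma_slow = transfer_seconds(1 * GB, 1 * GB)    # low end of the DMA range
dma_fast = transfer_seconds(1 * GB, 4 * GB)    # high end of the DMA range
```

Under these assumptions, moving 1 GB takes 20 seconds over the indirect path, versus 1 second down to 0.25 seconds with DMA.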

Connectal: A SceMi Alternative
  • Open source hardware/software codesign library
      • Generates glue logic between software/hardware
      • Supports DMA
  • https://github.com/cambridgehackers/connectal
  • Guest lecture next Wed on this