6.375 Tutorial 4: RISC-V and Final Projects

- Slides: 31

6.375 Tutorial 4: RISC-V and Final Projects
Ming Liu
March 4, 2016
http://csg.csail.mit.edu/6.375

Overview
- Branch Target Buffers
- RISC-V Infrastructure
- Final Project

Two-stage Pipeline with BTB
[Diagram: Fetch stage and Decode-RegisterFetch-Execute-Memory-WriteBack stage connected by f2d. Labels: PC, BTB (Predict Next PC), Inst Memory, Decode, Register File, Execute, Data Memory, kill, misprediction, correct pc, Update BTB {PC, correct PC}]
BTB: Branch Target Buffer
- At fetch: use BTB to predict next PC
- At execute: update BTB with correct next PC
  - Only if instruction is a branch (iType == J, Jr, Br)

Next Address Predictor: Branch Target Buffer (BTB)
[Diagram: 2^k-entry direct-mapped BTB. Labels: pc, k, iMem, pc_i, target_i, valid, match]
- Even small BTBs are effective
BTB remembers recent targets for a set of control instructions
- Fetch: looks for the pc and the associated target in the BTB; if the pc is not found, then ppc is pc+4
- Execute: checks the prediction; if it is wrong, kills the instruction and updates the BTB (only for branches and jumps)

Next Addr Predictor
    interface AddrPred;
        method Addr nap(Addr pc);
        method Action update(Redirect rd);
    endinterface
- Two implementations:
  a) Simple PC+4 predictor
  b) Predictor using BTB

Simple PC+4 predictor
    module mkPcPlus4(AddrPred);
        // Always predict the fall-through address
        method Addr nap(Addr pc);
            return pc + 4;
        endmethod
        // Nothing to learn, so update is a no-op
        method Action update(Redirect rd);
        endmethod
    endmodule

BTB predictor
    module mkBtb(AddrPred);
        RegFile#(BtbIndex, Addr) ppcArr <- mkRegFileFull;
        RegFile#(BtbIndex, BtbTag) entryPcArr <- mkRegFileFull;
        Vector#(BtbEntries, Reg#(Bool)) validArr <- replicateM(mkReg(False));

        // Index: low pc bits above the 2-bit byte offset; tag: upper pc bits
        function BtbIndex getIndex(Addr pc) = truncate(pc >> 2);
        function BtbTag getTag(Addr pc) = truncateLSB(pc);

        method Addr nap(Addr pc);
            BtbIndex index = getIndex(pc);
            BtbTag tag = getTag(pc);
            if (validArr[index] && tag == entryPcArr.sub(index))
                return ppcArr.sub(index);
            else
                return (pc + 4);
        endmethod

        method Action update(Redirect redirect); ...
    endmodule

BTB predictor update method
The redirect input contains a pc, the correct next pc, and whether the branch was taken or not (to avoid making entries for not-taken branches).
    method Action update(Redirect redirect);
        // Compute index and tag up front so both branches can use them
        let index = getIndex(redirect.pc);
        let tag = getTag(redirect.pc);
        if (redirect.taken) begin
            validArr[index] <= True;
            entryPcArr.upd(index, tag);
            ppcArr.upd(index, redirect.nextPc);
        end
        else if (tag == entryPcArr.sub(index))
            // Not taken: invalidate the entry only if it belongs to this pc
            validArr[index] <= False;
    endmethod
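The predict/update behaviour above can be checked against a small software model. The following Python sketch is not part of the course code: the class name `Btb`, the default of 16 entries, and the choice of tag (all address bits above the index and byte-offset bits) are assumptions made for illustration, since the slides do not give the widths of `BtbIndex` and `BtbTag`.

```python
class Btb:
    """Software model of a direct-mapped BTB with 2**index_bits entries.

    Assumption: the tag is every pc bit above the index and the 2 byte-offset
    bits. The slides leave the BtbIndex/BtbTag widths unspecified.
    """

    def __init__(self, index_bits=4):
        self.index_bits = index_bits
        entries = 1 << index_bits
        self.valid = [False] * entries
        self.tags = [0] * entries
        self.targets = [0] * entries

    def _index(self, pc):
        # Drop the 2-bit byte offset, keep the low index_bits bits.
        return (pc >> 2) & ((1 << self.index_bits) - 1)

    def _tag(self, pc):
        return pc >> (self.index_bits + 2)

    def nap(self, pc):
        """Predict the next pc: stored target on a hit, pc + 4 otherwise."""
        i = self._index(pc)
        if self.valid[i] and self.tags[i] == self._tag(pc):
            return self.targets[i]
        return pc + 4

    def update(self, pc, next_pc, taken):
        """Record a taken branch, or invalidate a matching entry if not taken."""
        i, tag = self._index(pc), self._tag(pc)
        if taken:
            self.valid[i] = True
            self.tags[i] = tag
            self.targets[i] = next_pc
        elif self.tags[i] == tag:
            self.valid[i] = False
```

With these sizes, 0x200 and 0x240 map to the same entry but carry different tags, so training the BTB on a branch at 0x200 leaves the prediction for 0x240 at 0x244.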

Multiple Predictors: BTB + Branch Direction Predictors
[Diagram: pipeline PC, Decode, Reg Read, Execute, Write Back, with a Next Addr Pred in a tight loop around the PC and a Br Dir Pred after Decode; later stages feed back correct/mispred signals, and mispredicted insts must be filtered]
- PC: need next PC immediately
- Decode: instr type, PC relative targets available
- Reg Read: simple conditions, register targets available
- Execute: complex conditions available

RISC-V Processor SCE-MI Infrastructure

RISC-V Interface
[Diagram: Host (Testbench) connected to mkProc (BSV) through cpuToHost, hostToCpu, iMemInit, and dMemInit; mkProc contains CSR, PC, Core, iMem, and dMem]

RISC-V Interface
    interface Proc;
        method Action hostToCpu(Addr startpc);
        method ActionValue#(CpuToHostData) cpuToHost;
        interface MemInit iMemInit;
        interface MemInit dMemInit;
    endinterface

    typedef struct {
        CpuToHostType c2hType;
        Bit#(16) data;
    } CpuToHostData deriving(Bits, Eq);

    typedef enum {
        ExitCode, PrintChar, PrintIntLow, PrintIntHigh
    } CpuToHostType deriving(Bits, Eq);

RISC-V Interface: cpuToHost
Write mtohost CSR: csrw mtohost, rs1
- rs1[15:0]: data
  - A 32-bit integer needs two writes
- rs1[17:16]: c2hType
  - 0: Exit code
  - 1: Print character
  - 2: Print low 16 bits
  - 3: Print high 16 bits
    typedef struct {
        CpuToHostType c2hType;
        Bit#(16) data;
    } CpuToHostData deriving(Bits, Eq);
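The bit layout of the value written to mtohost can be made concrete with a few lines of Python. This is only a sketch of the encoding described above: the function names are hypothetical, and sending the low half before the high half is an assumption, since the slide only says that a 32-bit integer needs two writes.

```python
# c2hType encodings from the slide (rs1[17:16])
EXIT_CODE, PRINT_CHAR, PRINT_INT_LOW, PRINT_INT_HIGH = range(4)


def pack_mtohost(c2h_type, data):
    """Build the rs1 value: type in bits [17:16], data in bits [15:0]."""
    return ((c2h_type & 0x3) << 16) | (data & 0xFFFF)


def unpack_mtohost(rs1):
    """Split an rs1 value back into (c2h_type, data)."""
    return (rs1 >> 16) & 0x3, rs1 & 0xFFFF


def print_int_writes(value):
    """Return the two rs1 values needed to print one 32-bit integer.

    Assumption: the low 16 bits are written first, then the high 16 bits.
    """
    value &= 0xFFFFFFFF
    return [
        pack_mtohost(PRINT_INT_LOW, value),
        pack_mtohost(PRINT_INT_HIGH, value >> 16),
    ]
```

For example, printing the character 'A' corresponds to writing 0x10041, and printing 0x12345678 takes the two writes 0x25678 and 0x31234.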

RISC-V Interface: Others
hostToCpu
- Tells the processor to start running from the given address
iMemInit/dMemInit
- Used to initialize iMem and dMem
- Can also be used to check when initialization is done
- Defined in MemInit.bsv

SceMi Interface
[Diagram: tb (C++) connected to mkProc (BSV); mkProc contains CSR, PC, Core, iMem, and dMem]

Load Program
[Diagram: tb (C++) loads add.riscv.vmh into iMem and dMem of mkProc (BSV); mkProc contains CSR, PC, Core, iMem, and dMem]
- Bypass this step in simulation

Load Program
[Diagram: in simulation, iMem and dMem of mkProc (BSV) are loaded from mem.vmh rather than by tb (C++)]
- Simulation: load with mem.vmh (fixed file name)
  - Copy <test>.riscv.vmh to mem.vmh

Start Processor
[Diagram: tb (C++) sends the starting PC, 0x200, to the PC of mkProc (BSV)]

Print & Exit
[Diagram: tb (C++) gets the CSR value from mkProc (BSV)]
- c2hType 1, 2, 3: print
- c2hType 0: exit
  - Data == 0: PASSED
  - Data != 0: FAILED
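The host-side handling above amounts to a small dispatch loop. The course testbench is written in C++; the Python sketch below only models its logic. The function `run_testbench` and its list-of-pairs input are hypothetical, and the sketch assumes that the low half of an integer arrives before the high half and that integers are printed as unsigned decimals.

```python
def run_testbench(messages):
    """Process (c2h_type, data) pairs and return (printed_text, verdict).

    verdict is "PASSED" or "FAILED" once an exit message is seen,
    or None if the message stream ends without one.
    """
    printed = []
    low_half = 0
    for c2h_type, data in messages:
        data &= 0xFFFF
        if c2h_type == 0:                      # exit code
            verdict = "PASSED" if data == 0 else "FAILED"
            return "".join(printed), verdict
        elif c2h_type == 1:                    # print character
            printed.append(chr(data & 0xFF))
        elif c2h_type == 2:                    # low 16 bits of an integer
            low_half = data
        elif c2h_type == 3:                    # high 16 bits: integer complete
            printed.append(str((data << 16) | low_half))
    return "".join(printed), None
```

A program that prints "x=" followed by 0x12345678 and then exits with code 0 would produce the text "x=305419896" and the verdict PASSED.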

Final Project

Overview
- Groups of 2-3 students
- Each group assigned to a graduate mentor in our group
- Groups meet individually with Arvind, mentor and me
- Weekly reports due before the meeting
  - Email to 6.375-admin@mit.edu and mentor

Schedule

Project Considerations
- Design a complex digital system
- Choose an application that could benefit from hardware acceleration or FPGAs
- Application should be well understood
  - Find/implement reference software code
- Look at past years' projects on the website

FPGA IPs and Resources
- Many Xilinx related IPs are available in the BSV library
  - $BLUESPECDIR/BSVSource/Xilinx
  - BRAMs, DRAM, clock generators/buffers, LED controller, HDMI controller, LCD controller
- Can wrap Verilog libraries/IPs in BSV code using importBVI
  - Tutorial: http://wiki.bluespec.com/Home/ExperiencedUsers/Import-BVI

BRAMs on FPGAs
- Fast, small, on-chip distributed RAM on FPGA
  - 1 cycle access latency
  - 36 Kbits x 1500 (approx) = ~6.75 MB total
  - Up to 2 ports
[Diagram: BRAM with Port A and Port B, each carrying Request and Resp]
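The capacity figure above follows from simple arithmetic, reproduced here as a Python sketch. It assumes "36 Kbits" is read with K = 1000, which is what makes the product come out to 6.75 MB; the helper `blocks_needed` is hypothetical and gives only a lower bound, because real block RAMs support a limited set of width/depth shapes.

```python
BLOCK_BITS = 36_000   # one block RAM, reading "36 Kbits" with K = 1000 (assumption)
NUM_BLOCKS = 1500     # approximate block count quoted on the slide

total_bytes = BLOCK_BITS * NUM_BLOCKS // 8
total_mb = total_bytes / 1_000_000
print(total_mb)       # 6.75


def blocks_needed(depth, width_bits):
    """Lower bound on blocks for a depth x width memory (ignores port shapes)."""
    bits = depth * width_bits
    return -(-bits // BLOCK_BITS)   # ceiling division
```

By this estimate, a 32K x 16-bit memory needs at least 15 of the roughly 1500 blocks.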

BRAMs in BSV Library
- 2-ported BRAM server: mkBRAM2Server()
- Large FIFOs: mkSizedBRAMFIFO()
- Large sync FIFOs: mkSyncBRAMFIFO()
- Primitive BRAM: mkBRAMCore2()
    import BRAM::*;

    BRAM_Configure cfg = defaultValue;
    cfg.memorySize = 1024*32;  // define custom memorySize

    // instantiate 32K x 16 bits BRAM module
    BRAM2Port#(UInt#(15), Bit#(16)) bram <- mkBRAM2Server(cfg);

    // write `data` to address 1 through port A, with no write response
    rule doWrite;
        bram.portA.request.put(BRAMRequest{
            write: True,
            responseOnWrite: False,
            address: 15'h01,
            datain: data
        });
    endrule

DRAM on FPGA
- Large capacity (1 GB on VC707)
- Longer access latency, especially random access
- BSV library at $BLUESPECDIR/BSVSource/
  - Xilinx/XilinxVC707DDR3.bsv
  - Misc/DDR3.bsv
  - Not officially in documentation
- Example code will be given as part of Lab 6
[Diagram: off-chip DRAM connected through DDR3_Pins to the DRAM Controller IP on the FPGA; a BSV Wrapper exposes DDR3_User]

DRAM Request/Response
- 512-bit wide user interface
- DDR Request:
  - Write: write or read
  - Byteen: byte enable mask; selects which of the 8-bit bytes in the 512 bits will be written
  - Address: DRAM address for 512-bit words
  - Data: data to be written
- DDR Response:
  - Bit#(512) read data
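The effect of the byte enable mask on a 512-bit write can be illustrated with a short Python model. This is a sketch, not the DDR3 controller's behaviour as documented: the function `apply_write` is hypothetical, and the mapping of mask bit i to byte i of the word is an assumption.

```python
WORD_BYTES = 64  # 512 bits / 8 bits per byte, so the mask has 64 bits


def apply_write(old_word, data, byteen):
    """Return the stored word after a masked write.

    old_word, data: 64-byte `bytes` objects.
    byteen: 64-bit integer; bit i set means byte i is overwritten (assumption).
    """
    if len(old_word) != WORD_BYTES or len(data) != WORD_BYTES:
        raise ValueError("words must be exactly 64 bytes")
    return bytes(
        new if (byteen >> i) & 1 else old
        for i, (old, new) in enumerate(zip(old_word, data))
    )
```

A mask of all ones replaces the whole word, a mask of zero leaves it unchanged, and a mask of 0b101 updates only bytes 0 and 2.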

Indirect Memory Access
- Host CPU loads/stores data from host DRAM to the PCIe device (FPGA)
  - Low bandwidth, consumes CPU cycles
  - Used in SceMi: ~50 MB/s
[Diagram: Host CPU, Bus, Host DRAM, FPGA, DRAM]

Direct Memory Access (DMA)
- Host CPU sets up the DMA; the DMA engine performs the data transfer
  - High bandwidth, minimal CPU involvement: 1-4 GB/s
  - Not supported by SceMi
[Diagram: Host CPU, Bus, Host DRAM, DMA Eng, FPGA, DRAM]
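To get a feel for the gap between the two transfer methods, the quoted bandwidths can be turned into transfer times. The Python sketch below assumes decimal units (1 GB = 10^9 bytes) and uses a 1 GB payload, matching the DRAM capacity quoted for the VC707, purely as an illustration; the function name is hypothetical.

```python
MB = 10**6
GB = 10**9


def transfer_seconds(num_bytes, bytes_per_second):
    """Ideal transfer time, ignoring setup cost and protocol overhead."""
    return num_bytes / bytes_per_second


payload = 1 * GB
indirect = transfer_seconds(payload, 50 * MB)   # SceMi-style indirect access
dma_low = transfer_seconds(payload, 1 * GB)     # low end of the DMA range
dma_high = transfer_seconds(payload, 4 * GB)    # high end of the DMA range
print(indirect, dma_low, dma_high)              # 20.0 1.0 0.25
```

Under these assumptions, DMA is 20 to 80 times faster than indirect access for bulk data.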

Connectal: A SceMi Alternative
- Open source hardware/software codesign library
  - Generates glue logic between software/hardware
  - Supports DMA
- https://github.com/cambridgehackers/connectal
- Guest lecture next Wed on this