
Constructive Computer Architecture: Multistage Pipelined Processors and Modular Refinement

Arvind
Computer Science & Artificial Intelligence Lab.
Massachusetts Institute of Technology

October 16, 2013    http://csg.csail.mit.edu/6.s195    L13-1

Contributors to the course material

  • Arvind, Rishiyur S. Nikhil, Joel Emer, Muralidaran Vijayaraghavan
  • Staff and students in 6.375 (Spring 2013), 6.S195 (Fall 2012), 6.S078 (Spring 2012)
      • Asif Khan, Richard Ruhler, Sang Woo Jun, Abhinav Agarwal, Myron King, Kermin Fleming, Ming Liu, Li-Shiuan Peh
  • External
      • Prof Amey Karkare & students at IIT Kanpur
      • Jihong Kim & students at Seoul National University
      • Derek Chiou, University of Texas at Austin
      • Yoav Etsion & students at Technion

Normal Register File

    module mkRFile(RFile);
      Vector#(32, Reg#(Data)) rfile <- replicateM(mkReg(0));

      method Action wr(RIndx rindx, Data data);
        if(rindx != 0) rfile[rindx] <= data;
      endmethod
      method Data rd1(RIndx rindx) = rfile[rindx];
      method Data rd2(RIndx rindx) = rfile[rindx];
    endmodule

Scheduling: {rd1, rd2} < wr

Bypass Register File using EHR

    module mkBypassRFile(RFile);
      Vector#(32, Ehr#(2, Data)) rfile <- replicateM(mkEhr(0));

      method Action wr(RIndx rindx, Data data);
        if(rindx != 0) (rfile[rindx])[0] <= data;
      endmethod
      method Data rd1(RIndx rindx) = (rfile[rindx])[1];
      method Data rd2(RIndx rindx) = (rfile[rindx])[1];
    endmodule

Scheduling: wr < {rd1, rd2}
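The two register files differ only in the order in which a read and a write in the same cycle appear to happen. The following is a minimal behavioral sketch in Python, not BSV: the class names and the explicit end_cycle call (standing in for the clock edge) are assumptions made for illustration and are not part of the course library.

```python
class NormalRFile:
    """Software model of {rd1, rd2} < wr: a read sees the value from before this cycle's write."""

    def __init__(self, size=32):
        self.regs = [0] * size
        self.pending = None  # write issued in the current cycle, applied at the clock edge

    def wr(self, rindx, data):
        if rindx != 0:  # register 0 is never written, as on the slide
            self.pending = (rindx, data)

    def rd1(self, rindx):
        return self.regs[rindx]

    rd2 = rd1  # both read ports behave identically

    def end_cycle(self):
        """Model the clock edge: commit the pending write, if any."""
        if self.pending is not None:
            rindx, data = self.pending
            self.regs[rindx] = data
            self.pending = None


class BypassRFile(NormalRFile):
    """Software model of wr < {rd1, rd2}: a read in the same cycle sees the new value."""

    def rd1(self, rindx):
        if self.pending is not None and self.pending[0] == rindx:
            return self.pending[1]  # bypass the value being written this cycle
        return self.regs[rindx]

    rd2 = rd1
```

With the normal file, a value written in cycle t is first visible in cycle t+1; with the bypass file it is already visible to a read in cycle t.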

Bypass Register File with external bypassing

[Figure: a normal register file rf ({rd1, rd2} < wr) fed through a bypass FIFO by rule move; the combined module behaves as wr < {rd1, rd2}.]

    module mkBypassRFile(BypassRFile);
      RFile rf <- mkRFile;
      Fifo#(1, Tuple2#(RIndx, Data)) bypass <- mkBypassSFifo;

      rule move;
        match {.idx, .data} = bypass.first;
        rf.wr(idx, data);
        bypass.deq;
      endrule

      method Action wr(RIndx rindx, Data data);
        if(rindx != 0) bypass.enq(tuple2(rindx, data));
      endmethod
      method Data rd1(RIndx rindx) =
        (!bypass.search1(rindx)) ? rf.rd1(rindx) : bypass.read1(rindx);
      method Data rd2(RIndx rindx) =
        (!bypass.search2(rindx)) ? rf.rd2(rindx) : bypass.read2(rindx);
    endmodule

Scoreboard implementation using searchable Fifos

    module mkCFScoreboard(Scoreboard#(size));
      SFifo#(size, Maybe#(RIndx)) f <- mkCFSFifo(isFound);

      method insert  = f.enq;
      method remove  = f.deq;
      method search1 = f.search1;
      method search2 = f.search2;
    endmodule

    function Bool isFound(Maybe#(RIndx) dst, Maybe#(RIndx) src);
      return isValid(dst) && isValid(src) &&
             (fromMaybe(?, dst) == fromMaybe(?, src));
    endfunction
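As a rough illustration of what the scoreboard does, here is a small Python model. It is a sketch under assumptions: None stands in for an Invalid Maybe#(RIndx), a single search method replaces search1 and search2, and a full scoreboard raises an exception where the BSV method guard would simply block the caller.

```python
from collections import deque


class Scoreboard:
    """FIFO of destination registers of instructions still in the pipeline."""

    def __init__(self, size):
        self.size = size
        self.entries = deque()

    def insert(self, dst):
        # In hardware the enq guard would stall the rule; here we raise instead.
        if len(self.entries) >= self.size:
            raise RuntimeError("scoreboard full")
        self.entries.append(dst)

    def remove(self):
        # Drop the oldest in-flight instruction.
        self.entries.popleft()

    def search(self, src):
        # Mirrors isFound: both sides must be valid and name the same register.
        return src is not None and any(
            dst is not None and dst == src for dst in self.entries
        )
```

A decode stage would stall whenever search returns True for one of its source registers, and call insert only when it actually issues the instruction.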

3-Stage-DH pipeline

[Figure: PC with next-address predictor (pred), fEpoch and eEpoch registers, redirect FIFO, Decode and Execute stages connected by the d2e and e2c FIFOs, Register File, scoreboard, Inst Memory and Data Memory.]

    Exec2Commit{Maybe#(RIndx) dst, Data data};

3-Stage-DH pipeline

    module mkProc(Proc);
      Reg#(Addr) pc <- mkRegU;
      RFile      rf <- mkBypassRFile;
      IMemory  iMem <- mkIMemory;
      DMemory  dMem <- mkDMemory;
      Fifo#(1, Decode2Execute) d2e <- mkPipelineFifo;
      Fifo#(1, Exec2Commit)    e2c <- mkPipelineFifo;
      Scoreboard#(2) sb <- mkPipelineScoreboard;  // contains two instructions
      Reg#(Bool) fEpoch <- mkReg(False);
      Reg#(Bool) eEpoch <- mkReg(False);
      Fifo#(1, Addr) redirect <- mkBypassFifo;

3-Stage-DH pipeline: doFetch rule

Unchanged from 2-stage

    rule doFetch;
      let inst = iMem.req(pc);
      if(redirect.notEmpty) begin
        fEpoch <= !fEpoch; pc <= redirect.first;
        redirect.deq;
      end
      else begin
        let ppc = nextAddrPredictor(pc);
        let dInst = decode(inst);
        let stall = sb.search1(dInst.src1) || sb.search2(dInst.src2)
                    || sb.search3(dInst.dst);
        if(!stall) begin
          let rVal1 = rf.rd1(validRegValue(dInst.src1));
          let rVal2 = rf.rd2(validRegValue(dInst.src2));
          d2e.enq(Decode2Execute{pc: pc, ppc: ppc, dInst: dInst,
                                 epoch: fEpoch, rVal1: rVal1, rVal2: rVal2});
          sb.insert(dInst.dst);
          pc <= ppc;
        end
      end
    endrule

3-Stage-DH pipeline: doExecute rule

    rule doExecute;
      let x = d2e.first;
      let dInst = x.dInst; let pc = x.pc;
      let ppc = x.ppc; let epoch = x.epoch;
      let rVal1 = x.rVal1; let rVal2 = x.rVal2;
      if(epoch == eEpoch) begin
        let eInst = exec(dInst, rVal1, rVal2, pc, ppc);
        if(eInst.iType == Ld)
          eInst.data <- dMem.req(MemReq{op: Ld, addr: eInst.addr, data: ?});
        else if(eInst.iType == St)
          let d <- dMem.req(MemReq{op: St, addr: eInst.addr, data: eInst.data});
        // The result is passed to doCommit through e2c. The register-file
        // write and sb.remove of the 2-stage rule are performed by doCommit.
        e2c.enq(Exec2Commit{dst: eInst.dst, data: eInst.data});
        if(eInst.mispredict) begin
          redirect.enq(eInst.addr); eEpoch <= !eEpoch;
        end
      end
      else e2c.enq(Exec2Commit{dst: Invalid, data: ?});
      d2e.deq;
    endrule
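The epoch comparison in doExecute is what discards wrong-path instructions. The Python function below is a minimal sketch of that one idea only: run_execute and its arguments are hypothetical names, and the model ignores the FIFOs, the scoreboard and timing.

```python
def run_execute(tagged_instrs, mispredicted_at):
    """Return the pcs that the execute stage actually processes.

    tagged_instrs: list of (pc, epoch) pairs in the order they leave d2e.
    mispredicted_at: set of pcs whose predicted next pc turns out to be wrong.
    """
    e_epoch = False
    executed = []
    for pc, epoch in tagged_instrs:
        if epoch != e_epoch:
            continue  # wrong-path instruction: dropped without side effects
        executed.append(pc)
        if pc in mispredicted_at:
            # Instructions already fetched with the old epoch are now stale.
            e_epoch = not e_epoch
    return executed
```

For example, if the instruction at pc 4 mispredicts, an instruction at pc 8 that was fetched with the old epoch is skipped, and execution resumes with the first instruction carrying the new epoch.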

3-Stage-DH pipeline: doCommit rule

    rule doCommit;
      let dst = e2c.first.dst;
      let data = e2c.first.data;
      if(isValid(dst))
        rf.wr(fromMaybe(?, dst), data);
      e2c.deq;
      sb.remove;
    endrule

Successive refinement & Modular Structure

[Figure: a 2-stage CPU (fetch & decode and execute, connected by d2e, with pc and rf) next to a multistage CPU (fetch, decode, execute, memory, writeback, with pc, rf, iMem and dMem).]

Can we derive a multistage pipeline by successive refinement of a 2-stage pipeline?

Architectural refinements

  • Separating Fetch and Decode
  • Replace magic memory by multicycle memory
  • Multicycle functional units
  • …

Nirav Dave, M. C. Ng, M. Pellauer, Arvind [Memocode 2010], "A design flow based on modular refinement"

2-stage Processor Pipeline

[Figure: doFetch and doExecute connected by the d2e and redirect FIFOs; the Register File exposes rd1, rd2 and wr, and the Scoreboard exposes search, insert and remove.]

  • Encapsulate Fetch and Execute in their own modules
  • Pass methods of other modules as parameters
  • For correctness, an instruction should be deleted from sb only after rf has been updated
      • remove and wr should happen atomically
      • search and rd1, rd2 should happen atomically

Interface Arguments

Any subset of methods from a module interface can be used to define a partial interface:

    interface FifoEnq#(type t);
      method Action enq(t x);
    endinterface

A function can be defined to extract the desired methods from an interface:

    function FifoEnq#(t) getFifoEnq(Fifo#(n, t) f);
      return interface FifoEnq#(t);
        method Action enq(t x) = f.enq(x);
      endinterface
    endfunction
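The same idea can be sketched outside BSV. The Python analogue below is an illustration only: Fifo, FifoEnq and get_fifo_enq are stand-in names, and Python enforces the restriction by convention rather than by type checking as BSV does.

```python
from collections import deque


class Fifo:
    """Full FIFO interface: enq, deq and first."""

    def __init__(self):
        self._items = deque()

    def enq(self, x):
        self._items.append(x)

    def deq(self):
        self._items.popleft()

    def first(self):
        return self._items[0]


class FifoEnq:
    """Partial interface: exposes only the enq method of an underlying Fifo."""

    def __init__(self, enq):
        self.enq = enq


def get_fifo_enq(f):
    # Analogous to getFifoEnq: forward enq and hide every other method.
    return FifoEnq(f.enq)
```

A module that receives only the result of get_fifo_enq can add items to the shared FIFO but has no way to dequeue from it, which is the property the modular processor relies on.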

Modular Processor

    module mkModularProc(Proc);
      IMemory iMem <- mkIMemory;
      DMemory dMem <- mkDMemory;
      Fifo#(1, Decode2Execute) d2e <- mkPipelineFifo;
      Fifo#(1, Addr) redirect <- mkBypassFifo;
      RFile rf <- mkBypassRFile;
      Scoreboard#(1) sb <- mkPipelineScoreboard;

      Fetch fetch <- mkFetch(iMem, getFifoEnq(d2e), getFifoDeq(redirect),
                             getRfRead(rf), getSbInsSearch(sb));
      Execute execute <- mkExecute(dMem, getFifoDeq(d2e), getFifoEnq(redirect),
                                   getRfW(rf), getSbRem(sb));
    endmodule

No rules: all communication takes place via method calls to shared modules.

Fetch Module

    module mkFetch(IMemory iMem,
                   FifoEnq#(Decode2Execute) d2e,
                   FifoDeq#(Addr) redirect,
                   RegisterFileRead rf,
                   ScoreboardInsert sb)
      Reg#(Addr) pc <- mkRegU;
      Reg#(Bool) fEpoch <- mkReg(False);

      rule fetch;
        if(redirect.notEmpty) begin
        ..
      endrule
    endmodule

Fetch Module continued

Unchanged from 2-stage

    rule fetch;
      if(redirect.notEmpty) begin
        fEpoch <= !fEpoch; pc <= redirect.first;
        redirect.deq;
      end
      else begin
        let instF = iMem.req(pc);
        let ppcF = nextAddrPredictor(pc);
        let dInst = decode(instF);
        let stall = sb.search1(dInst.src1) || sb.search2(dInst.src2);
        if(!stall) begin
          let rVal1 = rf.rd1(validRegValue(dInst.src1));
          let rVal2 = rf.rd2(validRegValue(dInst.src2));
          d2e.enq(Decode2Execute{pc: pc, ppc: ppcF, dInst: dInst,
                                 epoch: fEpoch, rVal1: rVal1, rVal2: rVal2});
          sb.insert(dInst.dst);
          pc <= ppcF;
        end
      end
    endrule

Execute Module

    module mkExecute(DMemory dMem,
                     FifoDeq#(Decode2Execute) d2e,
                     FifoEnq#(Addr) redirect,
                     RegisterFileWrite rf,
                     ScoreboardInsert sb)  // type name as on the slide; Execute only calls sb.remove
      Reg#(Bool) eEpoch <- mkReg(False);

      rule doExecute;
        ...
      endrule
    endmodule

Execute Module continued

Unchanged from 2-stage

    rule doExecute;
      let x = d2e.first;
      let dInstE = x.dInst; let pcE = x.pc;
      let ppcE = x.ppc; let epoch = x.epoch;
      let rVal1E = x.rVal1; let rVal2E = x.rVal2;
      if(epoch == eEpoch) begin
        let eInst = exec(dInstE, rVal1E, rVal2E, pcE, ppcE);
        if(eInst.iType == Ld)
          eInst.data <- dMem.req(MemReq{op: Ld, addr: eInst.addr, data: ?});
        else if(eInst.iType == St)
          let d <- dMem.req(MemReq{op: St, addr: eInst.addr, data: eInst.data});
        if(isValid(eInst.dst))
          rf.wr(fromMaybe(?, eInst.dst), eInst.data);
        if(eInst.mispredict) begin
          redirect.enq(eInst.addr); eEpoch <= !eEpoch;
        end
      end
      d2e.deq; sb.remove;
    endrule

Modular refinement: Separating Fetch and Decode

[Figure: pc, fetch, decode, rf and iMem.]

Fetch Module refinement

    module mkFetch(IMemory iMem,
                   FifoEnq#(Decode2Execute) d2e,
                   FifoDeq#(Addr) redirect,
                   RegisterFileRead rf,
                   ScoreboardInsert sb)
      Reg#(Addr) pc <- mkRegU;
      Reg#(Bool) fEpoch <- mkReg(False);
      Fifo#(1, Fetch2Decode) f2d <- mkPipelineFifo;

      rule fetch;
        if(redirect.notEmpty) begin
        ..
      endrule

      rule decode;
        ..
      endrule
    endmodule

Fetch Module: Fetch rule

    rule fetch;
      if(redirect.notEmpty) begin
        fEpoch <= !fEpoch; pc <= redirect.first;
        redirect.deq;
      end
      else begin
        let instF = iMem.req(pc);
        let ppcF = nextAddrPredictor(pc);
        f2d.enq(Fetch2Decode{pc: pc, ppc: ppcF, inst: instF, epoch: fEpoch});
        pc <= ppcF;
      end
    endrule

Fetch Module: Decode rule

    rule decode;
      let x = f2d.first;
      let instD = x.inst; let pcD = x.pc;
      let ppcD = x.ppc; let inEp = x.epoch;
      let dInst = decode(instD);
      let stall = sb.search1(dInst.src1) || sb.search2(dInst.src2)
                  || sb.search3(dInst.dst);
      if(!stall) begin
        let rVal1 = rf.rd1(validRegValue(dInst.src1));
        let rVal2 = rf.rd2(validRegValue(dInst.src2));
        d2e.enq(Decode2Execute{pc: pcD, ppc: ppcD, dInst: dInst,
                               epoch: inEp, rVal1: rVal1, rVal2: rVal2});
        sb.insert(dInst.dst);
        f2d.deq;
      end
    endrule

Separate refinement

  • Notice that our refined Fetch&Decode module should work correctly with the old Execute module or its refinements
  • This is a very important aspect of modular refinements

Modular refinement: Replace magic memory by multicycle memory

[Figure: pc, fetch1, fetch2 and iMem.]

Memory and Caches

  • Suppose iMem is replaced by a cache which takes 0 or 1 cycles in case of a hit and an unknown, variable number of cycles on a cache miss
  • View iMem as a request/response system and split the fetch stage into two rules: one to send a req and one to receive a res

[Figure: Epoch, PC, Next Addr Pred, iMem, f2d and Decode.]

Splitting the fetch stage

To split the fetch stage into two rules, insert a bypass FIFO to deal with a (0, n)-cycle memory response.

[Figure: Epoch, PC, Next Addr Pred, iMem, f12f2, f2d and Decode.]

Fetch Module: 2nd refinement

[Figure: Epoch, PC, Pred, iMem, f12f2 and f2d.]

    module mkFetch(IMemory iMem,
                   FifoEnq#(Decode2Execute) d2e,
                   FifoDeq#(Addr) redirect,
                   RegisterFileRead rf,
                   ScoreboardInsert sb)
      Reg#(Addr) pc <- mkRegU;
      Reg#(Bool) fEpoch <- mkReg(False);
      Fifo#(1, Fetch2Decode) f2d <- mkPipelineFifo;
      Fifo#(1, Fetch2Decode) f12f2 <- mkBypassFifo;

      rule fetch1;
        ....
      endrule

      rule decode;
        ..
      endrule
    endmodule

Fetch Module: Fetch1 rule

    rule fetch1;
      if(redirect.notEmpty) begin
        fEpoch <= !fEpoch; pc <= redirect.first;
        redirect.deq;
      end
      else begin
        let ppcF = nextAddrPredictor(pc);
        pc <= ppcF;
        iCache.req(MemReq{op: Ld, addr: pc, data: ?});
        f12f2.enq(Fetch2Decode{pc: pc, ppc: ppcF, inst: ?, epoch: fEpoch});
      end
    endrule

Fetch Module: Fetch2 rule

    rule doFetch2;
      let inst <- iCache.resp;
      let x = f12f2.first;
      x.inst = inst;
      f12f2.deq;
      f2d.enq(x);
    endrule
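To see why the f12f2 FIFO keeps each pc paired with the right instruction when the memory latency varies, the following Python sketch simulates the two rules cycle by cycle. It is a simplified model under stated assumptions: DelayedMemory, latency_of and simulate are hypothetical names, responses return in request order, f12f2 is unbounded, pc + 4 stands in for nextAddrPredictor, and redirects and epochs are left out.

```python
from collections import deque


class DelayedMemory:
    """Request/response memory; responses arrive in order after a per-request latency."""

    def __init__(self, contents, latency_of):
        self.contents = contents
        self.latency_of = latency_of  # maps an address to its latency in cycles
        self.inflight = deque()       # entries are [remaining_cycles, data], oldest first

    def req(self, addr):
        self.inflight.append([self.latency_of(addr), self.contents[addr]])

    def resp_ready(self):
        return bool(self.inflight) and self.inflight[0][0] <= 0

    def resp(self):
        return self.inflight.popleft()[1]

    def tick(self):
        for entry in self.inflight:
            entry[0] -= 1


def simulate(contents, latency_of, start_pc, n_instrs, max_cycles=100):
    mem = DelayedMemory(contents, latency_of)
    f12f2, f2d = deque(), []
    pc, issued = start_pc, 0
    for _ in range(max_cycles):
        # fetch1: send the request and remember the pc that goes with it.
        if issued < n_instrs:
            mem.req(pc)
            f12f2.append(pc)
            pc += 4
            issued += 1
        # fetch2: pair the oldest outstanding pc with the response once it arrives.
        # A zero-latency response is consumed in the same cycle (bypass behavior).
        if mem.resp_ready():
            f2d.append((f12f2.popleft(), mem.resp()))
        mem.tick()
        if len(f2d) == n_instrs:
            break
    return f2d
```

Because both the memory and f12f2 are first-in first-out, a slow response (a miss) delays the instructions behind it but never reorders them.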

Takeaway

  • Multistage pipelines are straightforward extensions of 2-stage pipelines
  • Modular refinement is a powerful idea; it lets different teams work on different modules with only an early implementation of other modules
  • The BSV compiler currently does not permit separate compilation of modules with interface parameters
  • Recursive call structure amongst modules is supported by the compiler in a limited way
      • The syntax is complicated
      • The compiler detects and rejects truly cyclic method calls