Constructive Computer Architecture: Data Hazards in Pipelined Processors

Constructive Computer Architecture: Data Hazards in Pipelined Processors
Arvind
Computer Science & Artificial Intelligence Lab.
Massachusetts Institute of Technology
October 11, 2013
http://csg.csail.mit.edu/6.S195
L12-1

Contributors to the course material

- Arvind, Rishiyur S. Nikhil, Joel Emer, Muralidaran Vijayaraghavan
- Staff and students in 6.375 (Spring 2013), 6.S195 (Fall 2012), 6.S078 (Spring 2012)
  - Asif Khan, Richard Ruhler, Sang Woo Jun, Abhinav Agarwal, Myron King, Kermin Fleming, Ming Liu, Li-Shiuan Peh
- External
  - Prof Amey Karkare & students at IIT Kanpur
  - Jihong Kim & students at Seoul National University
  - Derek Chiou, University of Texas at Austin
  - Yoav Etsion & students at Technion

A different 2-Stage pipeline: 2-Stage-DH pipeline

[Figure: block diagram of the 2-Stage-DH pipeline. Stage 1: Fetch, Decode, RegisterFetch. Stage 2: Execute, Memory, WriteBack. Labels: PC, pred, fEpoch, Inst Memory, Decode, Register File, d2e, redirect, eEpoch, Execute, Data Memory. d2e and redirect are Fifos.]

Use the same epoch solution for control hazards as before.

Type Decode2Execute

The Fetch stage, in addition to fetching the instruction, also decodes the instruction and fetches the operands from the register file. It passes these operands to the Execute stage.

    typedef struct {
        Addr pc;
        Addr ppc;
        Bool epoch;
        DecodedInst dInst;
        Data rVal1;   // values instead of register names
        Data rVal2;   // values instead of register names
    } Decode2Execute deriving (Bits, Eq);

2-Stage-DH pipeline

    module mkProc(Proc);
        Reg#(Addr) pc <- mkRegU;
        RFile rf <- mkRFile;
        IMemory iMem <- mkIMemory;
        DMemory dMem <- mkDMemory;
        Fifo#(Decode2Execute) d2e <- mkFifo;
        Reg#(Bool) fEpoch <- mkReg(False);
        Reg#(Bool) eEpoch <- mkReg(False);
        Fifo#(Addr) execRedirect <- mkFifo;

        rule doFetch …
        rule doExecute …

2-Stage-DH pipeline: doFetch rule, first attempt

    rule doFetch;
        let instF = iMem.req(pc);
        if(execRedirect.notEmpty) begin
            fEpoch <= !fEpoch; pc <= execRedirect.first;
            execRedirect.deq;
        end
        else begin
            let ppcF = nextAddrPredictor(pc); pc <= ppcF;
            // the next three lines are moved from Execute
            let dInst = decode(instF);
            let rVal1 = rf.rd1(validRegValue(dInst.src1));
            let rVal2 = rf.rd2(validRegValue(dInst.src2));
            d2e.enq(Decode2Execute{pc: pc, ppc: ppcF,
                dInst: dInst, epoch: fEpoch,
                rVal1: rVal1, rVal2: rVal2});
        end
    endrule

2-Stage-DH pipeline: doExecute rule, first attempt

    rule doExecute;
        let x = d2e.first;
        let dInstE = x.dInst; let pcE = x.pc;
        let ppcE = x.ppc; let epoch = x.epoch;
        let rVal1E = x.rVal1; let rVal2E = x.rVal2;
        if(epoch == eEpoch) begin
            // no change
            let eInst = exec(dInstE, rVal1E, rVal2E, pcE, ppcE);
            if(eInst.iType == Ld) eInst.data <-
                dMem.req(MemReq{op: Ld, addr: eInst.addr, data: ?});
            else if (eInst.iType == St) let d <-
                dMem.req(MemReq{op: St, addr: eInst.addr, data: eInst.data});
            if (isValid(eInst.dst) &&
                validValue(eInst.dst).regType == Normal)
                rf.wr(validRegValue(eInst.dst), eInst.data);
            if(eInst.mispredict) begin
                execRedirect.enq(eInst.addr);
                eEpoch <= !eEpoch;
            end
        end
        d2e.deq;
    endrule

Not quite correct. Why? Fetch is potentially reading stale values from rf.

Data Hazards

[Figure: fetch & decode stage and execute stage connected by d2e, with pc, rf and dMem.]

    time      t0    t1    t2    t3    t4    t5    t6    t7 ...
    FDstage   FD1   FD2   FD3   FD4   FD5
    EXstage         EX1   EX2   EX3   EX4   EX5

    I1  Add(R1, R2, R3)
    I2  Add(R4, R1, R2)

I2 must be stalled until I1 updates the register file.

[Figure: the same FDstage/EXstage timing diagram (t0 to t7), redrawn with I2 stalled.]

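The I1/I2 example can be checked mechanically. The following is a minimal Python sketch, not part of the course code: it assumes each instruction is written as (destination, sources), and the helper name `raw_hazards` and the `window` parameter (how many older instructions may still be in the pipeline) are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical encoding: (destination register or None, source registers).
Instr = Tuple[Optional[int], Tuple[int, ...]]

def raw_hazards(program: List[Instr], window: int = 1) -> List[Tuple[int, int, int]]:
    """Return (older index, younger index, register) for every RAW dependence
    between an instruction and the `window` instructions just before it."""
    hazards = []
    for i, (_, srcs) in enumerate(program):
        for j in range(max(0, i - window), i):
            dst = program[j][0]
            if dst is not None and dst in srcs:
                hazards.append((j, i, dst))
    return hazards

# I1: Add(R1, R2, R3) writes R1; I2: Add(R4, R1, R2) reads R1.
program = [(1, (2, 3)), (4, (1, 2))]
print(raw_hazards(program))  # [(0, 1, 1)]
```

With `window=1` the sketch matches the two-stage pipeline, where only one older instruction can be in Execute while the next one is being decoded.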
Dealing with data hazards

- Keep track of instructions in the pipeline and determine if the register values to be fetched are stale, i.e., will be modified by some older instruction still in the pipeline. This condition is referred to as a read-after-write (RAW) hazard.
- Stall the Fetch from dispatching the instruction as long as the RAW hazard prevails.
- The RAW hazard will disappear as the pipeline drains.

Scoreboard: a data structure to keep track of the instructions in the pipeline beyond the Fetch stage.

Data Hazard

- A data hazard depends upon the match between the source registers of the fetched instruction and the destination register of an instruction already in the pipeline.
- Both the source and destination registers must be Valid for a hazard to exist.

    function Bool isFound(Maybe#(FullIndx) x, Maybe#(FullIndx) y);
        if(x matches tagged Valid .xv &&& y matches tagged Valid .yv
           &&& yv == xv)
            return True;
        else return False;
    endfunction

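The same predicate can be written as a small Python sketch, assuming `None` stands for an Invalid `Maybe` value and a plain integer for a Valid register index (this encoding is an assumption for illustration, not the course's types).

```python
from typing import Optional

def is_found(dst: Optional[int], src: Optional[int]) -> bool:
    """True only when both register indices are Valid (not None) and equal."""
    return dst is not None and src is not None and dst == src

print(is_found(1, 1))     # True: R1 is pending and R1 is requested
print(is_found(1, 2))     # False: different registers
print(is_found(None, 1))  # False: the older instruction writes no register
```

Note that register index 0 still counts as Valid here, because the check is against `None` rather than truthiness.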
Scoreboard: Keeping track of instructions in execution

Scoreboard: a data structure to keep track of the destination registers of the instructions beyond the fetch stage

- method insert: inserts the destination (if any) of an instruction in the scoreboard when the instruction is decoded
- method search1(src): searches the scoreboard for a data hazard
- method search2(src): same as search1
- method remove: deletes the oldest entry when an instruction commits

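As a rough illustration of this interface, here is a Python sketch of a scoreboard backed by a bounded queue. It assumes `None` models an instruction with no destination; the class name and the single `search` method (standing in for both `search1` and `search2`, which only differ as hardware ports) are choices made for this sketch.

```python
from collections import deque
from typing import Deque, Optional

class Scoreboard:
    """Queue of pending destination registers, oldest entry first."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.entries: Deque[Optional[int]] = deque()

    def insert(self, dst: Optional[int]) -> None:
        # Every dispatched instruction takes a slot, even with no destination.
        if len(self.entries) >= self.size:
            raise RuntimeError("scoreboard full")
        self.entries.append(dst)

    def search(self, src: Optional[int]) -> bool:
        # Hazard only if the source is Valid and matches a Valid pending entry.
        return src is not None and any(
            d is not None and d == src for d in self.entries
        )

    def remove(self) -> None:
        # Drop the oldest entry when its instruction commits.
        self.entries.popleft()

sb = Scoreboard(1)
sb.insert(1)         # I1 will write R1
print(sb.search(1))  # True: a reader of R1 must stall
print(sb.search(2))  # False
sb.remove()          # I1 commits
print(sb.search(1))  # False: the hazard has disappeared
```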
2-Stage-DH pipeline: Scoreboard and Stall logic

[Figure: the 2-Stage-DH pipeline block diagram (PC, pred, fEpoch, Inst Memory, Decode, Register File, d2e, redirect, eEpoch, Execute, Data Memory) with a scoreboard added.]

2-Stage-DH pipeline, corrected

    module mkProc(Proc);
        Reg#(Addr) pc <- mkRegU;
        RFile rf <- mkRFile;
        IMemory iMem <- mkIMemory;
        DMemory dMem <- mkDMemory;
        Fifo#(Decode2Execute) d2e <- mkFifo;
        Reg#(Bool) fEpoch <- mkReg(False);
        Reg#(Bool) eEpoch <- mkReg(False);
        Fifo#(Addr) execRedirect <- mkFifo;

        Scoreboard#(1) sb <- mkScoreboard;
        // contains only one slot because Execute
        // can contain at most one instruction

        rule doFetch …
        rule doExecute …

2-Stage-DH pipeline: doFetch rule, second attempt

    rule doFetch;
        if(execRedirect.notEmpty) begin
            fEpoch <= !fEpoch; pc <= execRedirect.first;
            execRedirect.deq;
        end
        else begin
            let instF = iMem.req(pc);
            let ppcF = nextAddrPredictor(pc); pc <= ppcF;
            let dInst = decode(instF);
            let stall = sb.search1(dInst.src1) || sb.search2(dInst.src2);
            if(!stall) begin
                let rVal1 = rf.rd1(validRegValue(dInst.src1));
                let rVal2 = rf.rd2(validRegValue(dInst.src2));
                d2e.enq(Decode2Execute{pc: pc, ppc: ppcF,
                    dInst: dInst, epoch: fEpoch,
                    rVal1: rVal1, rVal2: rVal2});
                sb.insert(dInst.rDst);
            end
        end
    endrule

What should happen to pc when Fetch stalls? pc should change only when the instruction is enqueued in d2e.

2-Stage-DH pipeline: doFetch rule, corrected

    rule doFetch;
        if(execRedirect.notEmpty) begin
            fEpoch <= !fEpoch; pc <= execRedirect.first;
            execRedirect.deq;
        end
        else begin
            let instF = iMem.req(pc);
            let ppcF = nextAddrPredictor(pc);
            let dInst = decode(instF);
            let stall = sb.search1(dInst.src1) || sb.search2(dInst.src2);
            if(!stall) begin
                let rVal1 = rf.rd1(validRegValue(dInst.src1));
                let rVal2 = rf.rd2(validRegValue(dInst.src2));
                d2e.enq(Decode2Execute{pc: pc, ppc: ppcF,
                    dInst: dInst, epoch: fEpoch,
                    rVal1: rVal1, rVal2: rVal2});
                sb.insert(dInst.rDst);
                pc <= ppcF;   // pc advances only when the instruction is dispatched
            end
        end
    endrule

To avoid structural hazards, the scoreboard must allow two search ports.

2-Stage-DH pipeline: doExecute rule, corrected

    rule doExecute;
        let x = d2e.first;
        let dInstE = x.dInst; let pcE = x.pc;
        let ppcE = x.ppc; let epoch = x.epoch;
        let rVal1E = x.rVal1; let rVal2E = x.rVal2;
        if(epoch == eEpoch) begin
            let eInst = exec(dInstE, rVal1E, rVal2E, pcE, ppcE);
            if(eInst.iType == Ld) eInst.data <-
                dMem.req(MemReq{op: Ld, addr: eInst.addr, data: ?});
            else if (eInst.iType == St) let d <-
                dMem.req(MemReq{op: St, addr: eInst.addr, data: eInst.data});
            if (isValid(eInst.dst))
                rf.wr(validRegValue(eInst.dst), eInst.data);
            if(eInst.mispredict) begin
                execRedirect.enq(eInst.addr);
                eEpoch <= !eEpoch;
            end
        end
        d2e.deq;
        sb.remove;
    endrule

A correctness issue

[Figure: doFetch calls rd1, rd2 on the Register File and search, insert on the Scoreboard; doExecute calls wr on the Register File and remove on the Scoreboard; the two rules communicate through d2e and redirect.]

If the search by Decode does not see an instruction in the scoreboard, then its effect must have taken place. This means that any updates to the register file by that instruction must be visible to the subsequent register reads.

- remove and wr should happen atomically
- search and rd1, rd2 should happen atomically

Fetch and Execute can execute in any order.

Concurrently executable Fetch and Execute

[Figure: doFetch and doExecute with the Register File (rd1, rd2, wr), Scoreboard (search, insert, remove), d2e and redirect.]

Case 1: doExecute < doFetch
- rf: wr < rd (bypass rf)
- sb: remove < {search, insert}
- d2e: {first, deq} {<, CF} enq (pipelined or CF Fifo)
- redirect: enq {<, CF} {deq, first} (bypass or CF Fifo)

Case 2: doFetch < doExecute
- rf: rd < wr (normal rf)
- sb: {search, insert} < remove
- d2e: enq {<, CF} {deq, first} (bypass or CF Fifo)
- redirect: {first, deq} {<, CF} enq (pipelined or CF Fifo)

Which is better?

Performance issues

[Figure: doFetch and doExecute with the Register File (rd1, rd2, wr), Scoreboard (search, insert, remove), d2e and redirect.]

To avoid a stall due to a RAW hazard between successive instructions
- sb: remove < search
- rf: wr < rd (bypass rf)

To minimize stalls due to control hazards
- redirect: bypass fifo

What kind of fifo should be used for d2e?
- Either a pipeline or CF fifo would do fine

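The effect of the two orderings on stalls can be seen in a toy cycle-level model. The Python sketch below is only an illustration under simplifying assumptions: no control hazards, at most one instruction waiting between the stages, and instructions encoded as (destination, sources). The function name `simulate` and the flag `execute_first` are hypothetical. With `execute_first=True`, Execute's remove and wr are visible to Fetch in the same cycle (remove < search, wr < rd); with `execute_first=False`, Fetch sees the state from the start of the cycle.

```python
from typing import List, Optional, Tuple

Instr = Tuple[Optional[int], Tuple[int, ...]]  # (destination, sources)

def simulate(program: List[Instr], execute_first: bool
             ) -> List[Tuple[Optional[int], Optional[int]]]:
    """Return one (FD index, EX index) pair per cycle; None marks an idle stage."""
    pc = 0
    d2e: Optional[int] = None  # instruction waiting between the two stages
    trace = []
    while pc < len(program) or d2e is not None:
        ex, d2e_old = d2e, d2e
        d2e = None  # Execute always consumes the waiting instruction
        if execute_first or d2e_old is None:
            pending = None  # the scoreboard entry is already gone when Fetch searches
        else:
            pending = program[d2e_old][0]  # Fetch still sees the pending destination
        fd = None
        if pc < len(program):
            _, srcs = program[pc]
            stall = pending is not None and pending in srcs
            if not stall:
                fd, d2e = pc, pc  # dispatch: enqueue and advance pc
                pc += 1
        trace.append((fd, ex))
    return trace

# I1 writes R1, I2 reads R1, I3 is independent.
prog = [(1, (2, 3)), (4, (1, 2)), (5, (6, 7))]
print(len(simulate(prog, execute_first=False)))  # 5 cycles: I2 stalls once
print(len(simulate(prog, execute_first=True)))   # 4 cycles: no stall
```

In this two-stage model the execute-first ordering removes the RAW stall entirely; the cost in real hardware is the longer combinational path through the bypass.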
2-Stage-DH pipeline with proper specification of Fifos, rf, scoreboard

    module mkProc(Proc);
        Reg#(Addr) pc <- mkRegU;
        RFile rf <- mkBypassRFile;
        IMemory iMem <- mkIMemory;
        DMemory dMem <- mkDMemory;
        Fifo#(Decode2Execute) d2e <- mkPipelineFifo;
        Reg#(Bool) fEpoch <- mkReg(False);
        Reg#(Bool) eEpoch <- mkReg(False);
        Fifo#(Addr) execRedirect <- mkBypassFifo;

        Scoreboard#(1) sb <- mkPipelineScoreboard;
        // contains only one slot because Execute
        // can contain at most one instruction

        rule doFetch …
        rule doExecute …

Can a destination register name appear more than once in the scoreboard?

WAW hazards

If multiple instructions in the scoreboard can update the register which the current instruction wants to read, then the current instruction has to read the update for the youngest of those instructions.

This is not a problem in our design because
- instructions are committed in order
- the RAW hazard for the instruction at the decode stage will remain as long as any instruction with the required destination is present in sb

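A short Python sketch of the second point, assuming a scoreboard with more than one slot modeled as a plain queue of destination registers (the helper `search` is hypothetical): the reader stays stalled until the youngest writer of the register has also left the scoreboard.

```python
from collections import deque
from typing import Deque, Optional

def search(sb: Deque[Optional[int]], src: Optional[int]) -> bool:
    """Hazard if any pending destination equals the (Valid) source."""
    return src is not None and src in sb

sb: Deque[Optional[int]] = deque()
sb.append(1)          # older instruction writing R1
sb.append(1)          # younger instruction also writing R1
sb.popleft()          # the older one commits (in-order)
print(search(sb, 1))  # True: the younger writer is still pending
sb.popleft()          # the younger one commits
print(search(sb, 1))  # False: R1 now holds the youngest value
```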
An alternative design for sb

Instead of keeping track of the destination of every instruction in the pipeline, we can associate a bit with every register to indicate if that register is the destination of some instruction in the pipeline
- The appropriate register bit is set when an instruction enters the execute stage and cleared when the instruction is committed

The design will not work if multiple instructions in the pipeline have the same destination
- don't let an instruction with a WAW hazard enter the pipeline

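The failure mode can be reproduced with a small Python sketch of the one-bit-per-register design. The class name `BitScoreboard` and the 32-register default are assumptions for illustration only.

```python
from typing import List, Optional

class BitScoreboard:
    """One busy bit per register instead of a queue of destinations."""

    def __init__(self, num_regs: int = 32) -> None:
        self.busy: List[bool] = [False] * num_regs

    def insert(self, dst: Optional[int]) -> None:
        if dst is not None:
            self.busy[dst] = True

    def remove(self, dst: Optional[int]) -> None:
        if dst is not None:
            self.busy[dst] = False

    def search(self, reg: Optional[int]) -> bool:
        return reg is not None and self.busy[reg]

# Two in-flight writers of R1: a single bit cannot count them.
sb = BitScoreboard()
sb.insert(1)
sb.insert(1)
sb.remove(1)          # only the older writer commits
print(sb.search(1))   # False, although the younger writer is still pending

# Remedy from the slide: also search the destination and stall on a match.
sb = BitScoreboard()
sb.insert(1)
print(sb.search(1))   # True: the second writer of R1 must wait
```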
Fetch rule to avoid WAW hazard

    rule doFetch;
        if(execRedirect.notEmpty) begin
            fEpoch <= !fEpoch; pc <= execRedirect.first;
            execRedirect.deq;
        end
        else begin
            let instF = iMem.req(pc);
            let ppcF = nextAddrPredictor(pc);
            let dInst = decode(instF);
            let stall = sb.search1(dInst.src1) || sb.search2(dInst.src2)
                     || sb.search3(dInst.dst);
            if(!stall) begin
                let rVal1 = rf.rd1(validRegValue(dInst.src1));
                let rVal2 = rf.rd2(validRegValue(dInst.src2));
                d2e.enq(Decode2Execute{pc: pc, ppc: ppcF,
                    dInst: dInst, epoch: fEpoch,
                    rVal1: rVal1, rVal2: rVal2});
                sb.insert(dInst.rDst);
                pc <= ppcF;
            end
        end
    endrule

Summary

- Instruction pipelining requires dealing with control and data hazards
- Speculation is necessary to deal with control hazards
- Data hazards are avoided by withholding instructions in the decode stage until the hazard disappears
- Performance issues are subtle
  - For instance, the value of having a bypass network depends on how frequently it is exercised by programs
  - Bypassing necessarily increases combinational paths which can slow down the clock

Next: module implementations and multistage pipelines

Time permitting...

Normal Register File

    module mkRFile(RFile);
        Vector#(32, Reg#(Data)) rfile <- replicateM(mkReg(0));

        method Action wr(RIndx rindx, Data data);
            if(rindx!=0) rfile[rindx] <= data;
        endmethod
        method Data rd1(RIndx rindx) = rfile[rindx];
        method Data rd2(RIndx rindx) = rfile[rindx];
    endmodule

{rd1, rd2} < wr

Bypass Register File using EHR

    module mkBypassRFile(RFile);
        Vector#(32, Ehr#(2, Data)) rfile <- replicateM(mkEhr(0));

        method Action wr(RIndx rindx, Data data);
            if(rindx!=0) (rfile[rindx])[0] <= data;
        endmethod
        method Data rd1(RIndx rindx) = (rfile[rindx])[1];
        method Data rd2(RIndx rindx) = (rfile[rindx])[1];
    endmodule

wr < {rd1, rd2}

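The difference between the two orderings, {rd1, rd2} < wr and wr < {rd1, rd2}, can be mimicked in software. The Python sketch below is an assumption-laden model, not the EHR semantics in full: writes made during a cycle are held in `pending` until `tick()` ends the cycle, and the `bypass` flag decides whether a read in the same cycle sees them.

```python
from typing import Dict, List

class RegFile:
    """Toy register file; `bypass=True` lets reads see same-cycle writes."""

    def __init__(self, bypass: bool, num_regs: int = 32) -> None:
        self.regs: List[int] = [0] * num_regs
        self.pending: Dict[int, int] = {}  # writes made in the current cycle
        self.bypass = bypass

    def wr(self, idx: int, data: int) -> None:
        if idx != 0:  # register 0 is never written
            self.pending[idx] = data

    def rd(self, idx: int) -> int:
        if self.bypass and idx in self.pending:
            return self.pending[idx]  # wr < rd: forward the new value
        return self.regs[idx]         # rd < wr: value from the start of the cycle

    def tick(self) -> None:
        # End of cycle: commit the pending writes.
        self.regs = [self.pending.get(i, v) for i, v in enumerate(self.regs)]
        self.pending.clear()

normal, bypass = RegFile(bypass=False), RegFile(bypass=True)
normal.wr(1, 42)
bypass.wr(1, 42)
print(normal.rd(1), bypass.rd(1))  # 0 42
normal.tick()
bypass.tick()
print(normal.rd(1), bypass.rd(1))  # 42 42
```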
Bypass Register File with external bypassing

    module mkBypassRFile(BypassRFile);
        RFile rf <- mkRFile;
        Fifo#(1, Tuple2#(RIndx, Data)) bypass <- mkBypassSFifo;

        rule move;
            // unpack the (index, data) pair and move it into rf
            rf.wr(tpl_1(bypass.first), tpl_2(bypass.first));
            bypass.deq;
        endrule

        method Action wr(RIndx rindx, Data data);
            if(rindx!=0) bypass.enq(tuple2(rindx, data));
        endmethod
        method Data rd1(RIndx rindx) =
            (!bypass.search1(rindx)) ? rf.rd1(rindx)
                                     : bypass.read1(rindx);
        method Data rd2(RIndx rindx) =
            (!bypass.search2(rindx)) ? rf.rd2(rindx)
                                     : bypass.read2(rindx);
    endmodule

wr < {rd1, rd2}

Scoreboard implementation using searchable Fifos

    function Bool isFound(Maybe#(RIndx) dst, Maybe#(RIndx) src);
        return isValid(dst) && isValid(src) &&
               (validValue(dst)==validValue(src));
    endfunction

    module mkCFScoreboard(Scoreboard#(size));
        SFifo#(size, Maybe#(RIndx)) f <- mkCFSFifo(isFound);

        method insert = f.enq;
        method remove = f.deq;
        method search1 = f.search1;
        method search2 = f.search2;
    endmodule