Constructive Computer Architecture: Data Hazards in Pipelined Processors

Constructive Computer Architecture: Data Hazards in Pipelined Processors
Arvind
Computer Science & Artificial Intelligence Lab., Massachusetts Institute of Technology
October 14, 2015 http://csg.csail.mit.edu/6.175 L13-1

Consider a different two-stage pipeline
[Figure: two-stage pipeline with stages labeled Fetch; Decode, Register Fetch; and Execute, Memory, Write Back; blocks PC, pred, Inst Memory, Decode, Register File, Execute and Data Memory; the f2d FIFO sits between the stages]
Suppose we move the pipeline stage boundary from after Fetch to after Decode and Register Fetch, for a better balance of work between the two stages. The pipeline will still have control hazards.

A different 2-stage pipeline: 2-Stage-DH pipeline
[Figure: stage 1 = Fetch, Decode, Register Fetch (PC, pred, fEpoch, Inst Memory, Decode, Register File); stage 2 = Execute, Memory, Write Back (Execute, eEpoch, Data Memory); the d2e and redirect FIFOs connect the stages]
Use the same epoch solution for control hazards as before.

Converting the old pipeline into the new one
    rule doFetch;
      ...
      let instF = iMem.req(pc);
      f2d.enq(Fetch2Execute{... inst: instF ...});
      ...
    endrule
    rule doExecute;
      ...
      let dInst = decode(instD);
      let rVal1 = rf.rd1(fromMaybe(?, dInst.src1));
      let rVal2 = rf.rd2(fromMaybe(?, dInst.src2));
      let eInst = exec(dInst, rVal1, rVal2, pcD, ppcD);
      ...
    endrule
Moving the decode and the register-file reads from doExecute into doFetch, so that instF is decoded directly, is not quite correct. Why? Fetch is potentially reading stale values from rf.

Data Hazards
[Figure: FD stage (fetch & decode: pc, rf) and EX stage (execute: dMem) connected by d2e]
    I1: R1 ← R2 + R3
    I2: R4 ← R1 + R2
time      t0   t1   t2   t3   t4   t5   t6   t7 ...
FDstage   FD1  FD2  FD3  FD4  FD5
EXstage        EX1  EX2  EX3  EX4  EX5
I2 must be stalled until I1 updates the register file.
[Table: the same FD/EX schedule, revised so that I2 waits in the FD stage until I1 has written R1]
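
The timing diagram can be reproduced with a small cycle-level model. This is a Python sketch, not BSV and not taken from the slides. It assumes a normal register file: the FD stage reads values as they were at the start of a cycle, and the EX stage writes at the end of the cycle. The initial register values are made up for illustration. With stall_on_raw=False, I2 reads the stale R1.

```python
from typing import Optional

# Each instruction is (dst, src1, src2) and computes dst <- src1 + src2.
Inst = tuple[str, str, str]


def simulate(program: list[Inst], stall_on_raw: bool):
    """Run `program` through a two-stage FD/EX pipeline, one loop iteration per cycle.

    Assumes a normal register file: the FD stage reads values as they were at the
    start of the cycle; the EX stage writes its result at the end of the cycle.
    Returns the final register values and a per-cycle (FD, EX) trace.
    """
    rf = {"R1": 0, "R2": 5, "R3": 7, "R4": 0}  # hypothetical initial values
    d2e: Optional[tuple[int, int, int]] = None   # (instruction index, rVal1, rVal2)
    pc = 0
    trace: list[tuple[str, str]] = []
    while pc < len(program) or d2e is not None:
        fd, ex, new_d2e = "-", "-", None
        # FD stage: stall if the instruction now in EX writes one of our sources.
        if pc < len(program):
            _, src1, src2 = program[pc]
            busy = program[d2e[0]][0] if d2e is not None else None
            if stall_on_raw and busy in (src1, src2):
                fd = f"FD{pc + 1}(stall)"
            else:
                fd = f"FD{pc + 1}"
                new_d2e = (pc, rf[src1], rf[src2])  # register fetch
                pc += 1
        # EX stage: execute the latched instruction and write back.
        if d2e is not None:
            index, val1, val2 = d2e
            rf[program[index][0]] = val1 + val2
            ex = f"EX{index + 1}"
        d2e = new_d2e  # the pipeline register is updated at the clock edge
        trace.append((fd, ex))
    return rf, trace


program = [("R1", "R2", "R3"),   # I1: R1 <- R2 + R3
           ("R4", "R1", "R2")]   # I2: R4 <- R1 + R2
print(simulate(program, stall_on_raw=False)[0]["R4"])  # 5: I2 read the stale R1 (0)
print(simulate(program, stall_on_raw=True)[0]["R4"])   # 17: I2 waited one cycle
```

With stalling enabled, the trace shows FD2 repeated: first as a stall cycle while EX1 runs, then again once R1 has been written.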

Dealing with data hazards
• Keep track of instructions in the pipeline and determine whether the register values to be fetched are stale, i.e., will be modified by some older instruction still in the pipeline. This condition is referred to as a read-after-write (RAW) hazard.
• Stall Fetch from dispatching the instruction as long as the RAW hazard prevails.
• The RAW hazard will disappear as the pipeline drains.
Scoreboard: a data structure to keep track of the instructions in the pipeline beyond the Fetch stage.

Data Hazard
• A data hazard depends on a match between a source register of the fetched instruction and the destination register of an instruction already in the pipeline.
• Both the source and the destination registers must be Valid for a hazard to exist.
    function Bool isFound(Maybe#(RIndx) x, Maybe#(RIndx) y);
      if (x matches tagged Valid .xv &&& y matches tagged Valid .yv &&& yv == xv)
        return True;
      else
        return False;
    endfunction
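
For readers more at home in software, isFound corresponds roughly to the following Python sketch. Maybe#(RIndx) is modeled as Optional[int], with None playing the role of Invalid. Because both sides must be valid, an instruction that writes no register can never trigger a stall.

```python
from typing import Optional


def is_found(x: Optional[int], y: Optional[int]) -> bool:
    """Python analogue of isFound: True only if both register indices are
    valid (not None) and equal."""
    return x is not None and y is not None and x == y


print(is_found(3, 3), is_found(3, None), is_found(None, None))  # True False False
```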

Scoreboard: Keeping track of instructions in execution
Scoreboard: a data structure to keep track of the destination registers of the instructions beyond the Fetch stage
• method insert: inserts the destination (if any) of an instruction into the scoreboard when the instruction is decoded
• method search1(src): searches the scoreboard for a data hazard
• method search2(src): same as search1
• method remove: deletes the oldest entry when an instruction commits
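
A software model of this interface may help. The sketch below is in Python and uses a hypothetical fixed capacity. The scoreboard is a FIFO of destination registers kept in program order. Because instructions commit in order, remove can simply drop the oldest entry, and search1 and search2 perform the same associative lookup.

```python
from collections import deque
from typing import Optional


class Scoreboard:
    """FIFO model of the scoreboard: destination registers of the instructions
    beyond Fetch, oldest first. None stands for "no destination"."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.entries: deque = deque()

    def insert(self, dst: Optional[int]) -> None:
        # Called when Fetch/Decode dispatches an instruction.
        if len(self.entries) == self.size:
            raise RuntimeError("scoreboard full; Fetch must not dispatch")
        self.entries.append(dst)

    def _search(self, src: Optional[int]) -> bool:
        # Associative lookup: hazard if any in-flight destination equals src.
        return src is not None and any(d == src for d in self.entries)

    def search1(self, src: Optional[int]) -> bool:
        return self._search(src)

    def search2(self, src: Optional[int]) -> bool:
        return self._search(src)

    def remove(self) -> None:
        # Instructions commit in order, so the committing one is the oldest entry.
        self.entries.popleft()


sb = Scoreboard(size=1)
sb.insert(5)          # an instruction writing r5 enters Execute
print(sb.search1(5))  # True: a reader of r5 must stall
sb.remove()           # that instruction commits
print(sb.search1(5))  # False
```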

2-Stage-DH pipeline: Scoreboard and Stall logic
[Figure: the 2-Stage-DH pipeline (PC, pred, redirect, fEpoch, Inst Memory, Decode, Register File, d2e, Execute, eEpoch, Data Memory) with a scoreboard added]

2-Stage-DH pipeline
    module mkProc(Proc);
      Reg#(Addr) pc <- mkRegU;
      RFile rf <- mkRFile;
      IMemory iMem <- mkIMemory;
      DMemory dMem <- mkDMemory;
      Fifo#(Decode2Execute) d2e <- mkFifo;
      Reg#(Bool) fEpoch <- mkReg(False);
      Reg#(Bool) eEpoch <- mkReg(False);
      Fifo#(Addr) redirect <- mkFifo;
      Scoreboard#(1) sb <- mkScoreboard; // contains only one slot because Execute
                                         // can contain at most one instruction
      rule doFetch …
      rule doExecute …

2-Stage-DH pipeline: doFetch
What should happen to pc when Fetch stalls?
    rule doFetch;
      if (redirect.notEmpty) begin
        fEpoch <= !fEpoch;
        pc <= redirect.first;
        redirect.deq;
      end
      else begin
        let instF = iMem.req(pc);
        let ppcF = nextAddrPredictor(pc);
        pc <= ppcF;
        let dInst = decode(instF);
        let stall = sb.search1(dInst.src1) || sb.search2(dInst.src2);
        if (!stall) begin
          let rVal1 = rf.rd1(fromMaybe(?, dInst.src1));
          let rVal2 = rf.rd2(fromMaybe(?, dInst.src2));
          d2e.enq(Decode2Execute{pc: pc, ppc: ppcF, dInst: dInst, epoch: fEpoch,
                                 rVal1: rVal1, rVal2: rVal2});
          sb.insert(dInst.rDst);
        end
      end
    endrule
pc should change only when the instruction is enqueued in d2e.

2-Stage-DH pipeline: doFetch rule corrected
    rule doFetch;
      if (redirect.notEmpty) begin
        fEpoch <= !fEpoch;
        pc <= redirect.first;
        redirect.deq;
      end
      else begin
        let instF = iMem.req(pc);
        let ppcF = nextAddrPredictor(pc);
        let dInst = decode(instF);
        let stall = sb.search1(dInst.src1) || sb.search2(dInst.src2);
        if (!stall) begin
          let rVal1 = rf.rd1(fromMaybe(?, dInst.src1));
          let rVal2 = rf.rd2(fromMaybe(?, dInst.src2));
          d2e.enq(Decode2Execute{pc: pc, ppc: ppcF, dInst: dInst, epoch: fEpoch,
                                 rVal1: rVal1, rVal2: rVal2});
          sb.insert(dInst.rDst);
          pc <= ppcF; // pc advances only when the instruction is enqueued in d2e
        end
      end
    endrule
To avoid structural hazards, the scoreboard must allow two search ports.
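
The effect of the correction shows up when the control decisions of doFetch are isolated. The sketch below is a hypothetical Python model, not the BSV rule. The next-address predictor is replaced by pc + 4, `stall` stands for the result of the scoreboard search, and the register reads and d2e contents are left out. The `corrected=False` flag reproduces the first version, which loses the stalled instruction.

```python
from typing import NamedTuple, Optional


class FetchResult(NamedTuple):
    pc: int          # pc register value after the rule fires
    epoch: bool      # fEpoch after the rule fires
    enqueued: bool   # whether an instruction was sent to d2e


def fetch_step(pc: int, epoch: bool, redirect: Optional[int], stall: bool,
               corrected: bool = True) -> FetchResult:
    """Control decisions of one doFetch firing.

    Hypothetical simplifications: the next-address predictor is pc + 4, and
    `stall` is the scoreboard-search result for the decoded instruction.
    """
    if redirect is not None:
        # Execute reported a misprediction: flip the epoch and restart at the target.
        return FetchResult(redirect, not epoch, False)
    ppc = pc + 4
    if stall:
        # Nothing enters d2e. The first version still moved pc to ppc, silently
        # dropping the stalled instruction; the corrected version keeps pc.
        return FetchResult(pc if corrected else ppc, epoch, False)
    return FetchResult(ppc, epoch, True)


print(fetch_step(0x100, False, None, stall=True))
# FetchResult(pc=256, epoch=False, enqueued=False)
print(fetch_step(0x100, False, None, stall=True, corrected=False))
# FetchResult(pc=260, epoch=False, enqueued=False)
```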

2-Stage-DH pipeline: doExecute
    rule doExecute;
      let x = d2e.first;
      let dInstE = x.dInst;
      let pcE = x.pc;
      let ppcE = x.ppc;
      let epoch = x.epoch;
      let rVal1E = x.rVal1;
      let rVal2E = x.rVal2;
      if (epoch == eEpoch) begin
        let eInst = exec(dInstE, rVal1E, rVal2E, pcE, ppcE);
        if (eInst.iType == Ld)
          eInst.data <- dMem.req(MemReq{op: Ld, addr: eInst.addr, data: ?});
        else if (eInst.iType == St) begin
          let d <- dMem.req(MemReq{op: St, addr: eInst.addr, data: eInst.data});
        end
        if (isValid(eInst.dst))
          rf.wr(fromMaybe(?, eInst.dst), eInst.data);
        if (eInst.mispredict) begin
          redirect.enq(eInst.addr);
          eEpoch <= !eEpoch;
        end
      end
      d2e.deq;
      sb.remove;
    endrule
d2e.deq and sb.remove execute for every instruction, including wrong-path ones whose epoch does not match, because doFetch inserted each of them into the scoreboard.

A correctness issue
[Figure: doFetch (rd1, rd2, search, insert) and doExecute (wr, remove) communicating through the Register File, the Scoreboard, d2e and redirect]
If the search by Decode does not see an instruction in the scoreboard, then that instruction's effect must already have taken place. This means that any updates to the register file by that instruction must be visible to subsequent register reads:
• remove and wr should happen atomically
• search and rd1, rd2 should happen atomically
Fetch and Execute can execute in any order.

Concurrently executable Fetch and Execute
Which is better? (Here a < b means method a is scheduled before method b within a cycle, and CF means the methods are conflict-free.)
Case 1: doExecute < doFetch
- rf: wr < rd (bypass rf)
- sb: remove < {search, insert}
- d2e: {first, deq} {<, CF} enq (pipelined or CF Fifo)
- redirect: enq {<, CF} {deq, first} (bypass or CF Fifo)
Case 2: doFetch < doExecute
- rf: rd < wr (normal rf)
- sb: {search, insert} < remove
- d2e: enq {<, CF} {deq, first} (bypass or CF Fifo)
- redirect: {first, deq} {<, CF} enq (pipelined or CF Fifo)
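
To compare the two cases, the following Python model counts cycles for a sequence of instructions. It is a sketch under simplifying assumptions. Only RAW stalls are modeled, so d2e and scoreboard capacity are assumed never to block Fetch. execute_first=True stands for Case 1 and False for Case 2.

```python
from typing import Optional

# An instruction is (dst, src1, src2); None means "no register".
Inst = tuple[Optional[int], Optional[int], Optional[int]]


def count_cycles(program: list[Inst], execute_first: bool) -> int:
    """Cycles needed to push `program` through Fetch -> d2e -> Execute.

    execute_first=True models Case 1 (doExecute < doFetch): sb.remove and rf.wr
    of the committing instruction happen before Fetch's sb.search and rf.rd.
    execute_first=False models Case 2 (doFetch < doExecute): Fetch still sees
    the committing instruction in the scoreboard and must stall.
    Simplification: d2e and scoreboard capacity never block Fetch.
    """
    in_execute: Optional[Inst] = None
    pc = cycles = 0
    while pc < len(program) or in_execute is not None:
        # Entry visible to sb.search in this cycle, depending on rule order.
        visible = None if execute_first else in_execute
        fetched = None
        if pc < len(program):
            _, src1, src2 = program[pc]
            busy = visible[0] if visible is not None else None
            if busy is None or busy not in (src1, src2):
                fetched = program[pc]  # no RAW hazard: dispatch into d2e
                pc += 1
        in_execute = fetched  # Execute commits the old entry; the new one takes its place
        cycles += 1
    return cycles


chain = [(1, 2, 3), (4, 1, 2), (5, 4, 1)]  # each instruction reads the previous result
print(count_cycles(chain, execute_first=True))   # 4
print(count_cycles(chain, execute_first=False))  # 6
```

For a chain of dependent instructions, Case 1 dispatches one instruction per cycle, while Case 2 inserts a bubble after each one. Independent instructions take the same number of cycles under both orders.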

Performance issues
To avoid a stall due to a RAW hazard between successive instructions:
- sb: remove < search
- rf: wr < rd (bypass rf)
To minimize stalls due to control hazards:
- redirect: bypass fifo
What kind of fifo should be used for d2e?
- Either a pipeline or CF fifo would do fine

2-Stage-DH pipeline with proper specification of Fifos, rf, scoreboard
    module mkProc(Proc);
      Reg#(Addr) pc <- mkRegU;
      RFile rf <- mkBypassRFile;
      IMemory iMem <- mkIMemory;
      DMemory dMem <- mkDMemory;
      Fifo#(Decode2Execute) d2e <- mkPipelineFifo;
      Reg#(Bool) fEpoch <- mkReg(False);
      Reg#(Bool) eEpoch <- mkReg(False);
      Fifo#(Addr) redirect <- mkBypassFifo;
      Scoreboard#(1) sb <- mkPipelineScoreboard; // contains only one slot because Execute
                                                 // can contain at most one instruction
      rule doFetch …
      rule doExecute …
Can a destination register name appear more than once in the scoreboard?

WAW hazards
• If multiple instructions in the scoreboard can update the register that the current instruction wants to read, then the current instruction has to read the update of the youngest of those instructions.
• This is not a problem in our design because
  - instructions are committed in order
  - the RAW hazard for the instruction at the decode stage will remain as long as any instruction with the required destination is present in sb
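
The argument can be checked with a tiny Python sketch. The register numbers and values are hypothetical, and each stall cycle is assumed to commit exactly one instruction. Two older instructions both write r5 and commit in order. A reader of r5 waits until neither is in flight, so the value it finally reads is the one written by the younger writer.

```python
from collections import deque


def read_after_drain(in_flight: deque, rf: list, src: int) -> tuple:
    """Stall a reader of `src` until no in-flight instruction writes it.

    in_flight holds (dst, value) pairs, oldest first; each loop iteration
    commits the oldest instruction, as in an in-order pipeline.
    Returns (value read, number of stall cycles).
    """
    stalls = 0
    while any(dst == src for dst, _ in in_flight):  # RAW hazard still present
        dst, value = in_flight.popleft()            # in-order commit
        rf[dst] = value
        stalls += 1
    return rf[src], stalls


rf = [0] * 32
writers = deque([(5, 100), (5, 200)])  # hypothetical: two older writers of r5
print(read_after_drain(writers, rf, 5))  # (200, 2): the younger writer's value
```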

An alternative design for sb
• Instead of keeping track of the destination of every instruction in the pipeline, we can associate a counter with every register to indicate the number of instructions in the pipeline for which that register is the destination.
  - The appropriate counter is incremented when an instruction enters the execute stage and decremented when the instruction is committed.
• This design is more efficient (less hardware) because it avoids an associative search.
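
A Python sketch of the counter-based alternative follows. It is a software model and relies on one assumption: remove is passed the destination of the committing instruction, which Execute already has, whereas the FIFO version only drops its oldest entry. In hardware, each counter would need enough bits to count the maximum number of instructions in flight.

```python
from typing import Optional


class CounterScoreboard:
    """Sketch of the counter-based scoreboard: one counter per register
    counting in-flight instructions that write it."""

    def __init__(self, num_regs: int = 32) -> None:
        self.count = [0] * num_regs

    def insert(self, dst: Optional[int]) -> None:
        if dst is not None:
            self.count[dst] += 1

    def search(self, src: Optional[int]) -> bool:
        # Indexed lookup instead of comparing src against every entry.
        return src is not None and self.count[src] > 0

    def remove(self, dst: Optional[int]) -> None:
        # Unlike the FIFO version, remove is told which register to decrement.
        if dst is not None:
            self.count[dst] -= 1


sb = CounterScoreboard()
sb.insert(5)
sb.insert(5)         # two in-flight writers of r5 (a WAW pair)
sb.remove(5)
print(sb.search(5))  # True: one writer is still in flight
sb.remove(5)
print(sb.search(5))  # False
```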

Summary
• Instruction pipelining requires dealing with control and data hazards
• Speculation is necessary to deal with control hazards
• Data hazards are avoided by withholding instructions in the decode stage until the hazard disappears
• Performance issues are subtle
  - For instance, the value of having a bypass network depends on how frequently it is exercised by programs
  - Bypassing necessarily increases combinational path lengths, which can slow down the clock
The rest of the slides will be discussed in recitation.

Normal Register File
    module mkRFile(RFile);
      Vector#(32, Reg#(Data)) rfile <- replicateM(mkReg(0));
      method Action wr(RIndx rindx, Data data);
        if (rindx != 0) rfile[rindx] <= data;
      endmethod
      method Data rd1(RIndx rindx) = rfile[rindx];
      method Data rd2(RIndx rindx) = rfile[rindx];
    endmodule
{rd1, rd2} < wr

Bypass Register File using EHR
    module mkBypassRFile(RFile);
      Vector#(32, Ehr#(2, Data)) rfile <- replicateM(mkEhr(0));
      method Action wr(RIndx rindx, Data data);
        if (rindx != 0) (rfile[rindx])[0] <= data;
      endmethod
      method Data rd1(RIndx rindx) = (rfile[rindx])[1];
      method Data rd2(RIndx rindx) = (rfile[rindx])[1];
    endmodule
wr < {rd1, rd2}
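
The difference between the two register files comes down to what a read sees in the cycle of a write. The Python sketch below models both behaviors in one class. It does not model EHRs in general. A write becomes architectural state only at end_cycle(), the modeled clock edge. With bypass=True, a same-cycle read already returns the written value.

```python
class RegFile:
    """Cycle-level sketch of mkRFile (bypass=False) and mkBypassRFile (bypass=True).

    A write becomes architectural state only at end_cycle(), the clock edge.
    With bypass=True, a read in the same cycle already sees the written value,
    which models wr < {rd1, rd2}; callers must then call wr before rd.
    With bypass=False, reads see the old value, which models {rd1, rd2} < wr.
    """

    def __init__(self, bypass: bool) -> None:
        self.bypass = bypass
        self.regs = [0] * 32
        self.pending = None  # (rindx, data) written during this cycle

    def wr(self, rindx: int, data: int) -> None:
        if rindx != 0:  # register 0 always reads as zero
            self.pending = (rindx, data)

    def rd(self, rindx: int) -> int:
        if self.bypass and self.pending is not None and self.pending[0] == rindx:
            return self.pending[1]
        return self.regs[rindx]

    def end_cycle(self) -> None:
        if self.pending is not None:
            rindx, data = self.pending
            self.regs[rindx] = data
            self.pending = None


for bypass in (False, True):
    rf = RegFile(bypass)
    rf.wr(1, 42)             # doExecute writes r1 ...
    print(bypass, rf.rd(1))  # ... and doFetch reads r1 in the same cycle
    rf.end_cycle()
# False 0
# True 42
```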

Bypass Register File with external bypassing
    module mkBypassRFile(BypassRFile);
      RFile rf <- mkRFile;
      SFifo#(1, Tuple2#(RIndx, Data)) bypass <- mkBypassSFifo;
      rule move;
        match {.rindx, .data} = bypass.first;
        rf.wr(rindx, data);
        bypass.deq;
      endrule
      method Action wr(RIndx rindx, Data data);
        if (rindx != 0) bypass.enq(tuple2(rindx, data));
      endmethod
      method Data rd1(RIndx rindx) = (!bypass.search1(rindx)) ? rf.rd1(rindx) : bypass.read1(rindx);
      method Data rd2(RIndx rindx) = (!bypass.search2(rindx)) ? rf.rd2(rindx) : bypass.read2(rindx);
    endmodule
wr < {rd1, rd2}

Scoreboard implementation using searchable Fifos
    function Bool isFound(Maybe#(RIndx) dst, Maybe#(RIndx) src);
      return isValid(dst) && isValid(src) && (fromMaybe(?, dst) == fromMaybe(?, src));
    endfunction
    module mkCFScoreboard(Scoreboard#(size));
      SFifo#(size, Maybe#(RIndx)) f <- mkCFSFifo(isFound);
      method insert = f.enq;
      method remove = f.deq;
      method search1 = f.search1;
      method search2 = f.search2;
    endmodule