Constructive Computer Architecture: Control Hazards
Arvind
Computer Science & Artificial Intelligence Lab., Massachusetts Institute of Technology
October 12, 2016
http://csg.csail.mit.edu/6.175
Lecture L12

Two-Cycle RISC-V: Analysis
[Figure: two-cycle datapath with Fetch and Execute steps; labels: PC, +4, Inst Memory, fr, stage, Decode, Register File, Execute, Data Memory]
• In any given clock cycle, a lot of the hardware is unused!
• Pipeline the execution of instructions to increase throughput

Problems in Instruction pipelining
[Figure: two-stage pipeline holding Inst_(i+1) in fetch and Inst_i in f2d; labels: PC, +4, Inst Memory, f2d, Decode, Register File, Execute, Data Memory]
• Control hazard: Inst_(i+1) is not known until Inst_i is at least decoded. So which instruction should be fetched?
• Structural hazard: Two instructions in the pipeline may require the same resource at the same time, e.g., contention for memory
• Data hazard: Inst_i may affect the state of the machine (pc, rf, dMem); Inst_(i+1) must be fully cognizant of this change
None of these hazards were present in the FFT pipeline.

Arithmetic versus Instruction pipelining
• The data items in an arithmetic pipeline, e.g., FFT, are independent of each other
[Figure: arithmetic pipeline x -> inQ -> f0 -> sReg1 -> f1 -> sReg2 -> f2 -> outQ]
• The entities in an instruction pipeline affect each other
   - This causes pipeline stalls or requires other fancy tricks to avoid stalls
   - Processor pipelines are significantly more complicated than arithmetic pipelines

The power of computers comes from the fact that the instructions in a program are not independent of each other, so we must deal with hazards.

Control Hazards
[Figure: two-stage pipeline holding Inst_(i+1) in fetch and Inst_i in f2d; labels: PC, +4, Inst Memory, f2d, Decode, Register File, Execute, Data Memory]
Inst_(i+1) is not known until Inst_i is at least decoded. So which instruction should be fetched?
• General solution: speculate, i.e., predict the next instruction address
   - requires the next-instruction-address prediction machinery; can be as simple as pc+4
   - prediction machinery is usually elaborate because it dynamically learns from the past behavior of the program
• What if speculation goes wrong?
   - machinery to kill the wrong-path instructions, restore the correct processor state and restart the execution at the correct pc
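The speculate-and-check idea on this slide can be sketched in a few lines. This is a behavioral Python model rather than the lecture's Bluespec, written so it can be run directly. The program encoding (a dict from a pc to its branch target) and the helper names actual_next and count_mispredictions are assumptions made for illustration; nap stands for the simplest next-address predictor, pc+4.

```python
# Toy model (hypothetical encoding): `taken_branches` maps a pc to its true
# successor only when that successor differs from the fall-through pc + 4.

def nap(pc):
    """Simplest next-address predictor: always guess the fall-through pc."""
    return pc + 4

def actual_next(pc, taken_branches):
    """True next pc: the branch target if pc holds a taken branch, else pc + 4."""
    return taken_branches.get(pc, pc + 4)

def count_mispredictions(start_pc, taken_branches, steps):
    """Execute `steps` instructions and count how often nap() guessed wrong."""
    pc, wrong = start_pc, 0
    for _ in range(steps):
        predicted = nap(pc)
        correct = actual_next(pc, taken_branches)
        if predicted != correct:
            wrong += 1      # a wrong-path fetch that would have to be killed
        pc = correct        # execution always resumes at the correct pc
    return wrong

# A three-instruction loop: the instruction at pc 8 jumps back to pc 0.
loop = {8: 0}
mispredicts = count_mispredictions(0, loop, 9)
```

With pc+4 prediction every taken branch is a misprediction, which is why real predictors learn from past behavior.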

Two-stage Pipelined SMIPS
[Figure: Fetch stage and Decode-RegisterFetch-Execute-Memory-WriteBack stage; labels: PC, nap, Inst Memory, f2d, Decode, Register File, Execute, Data Memory, kill, misprediction, correct pc]
• Fetch stage must predict the next instruction to fetch to have any pipelining
• In case of a misprediction the Execute stage must kill the mispredicted instruction in f2d

Elastic two-stage pipeline
[Figure: PC -> Fetch -> f2d -> Execute, with a pc redirect path from Execute back to PC; f2d carries <inst, pc, ppc>]
• We replace the f2d register by a FIFO to make the machine more elastic, that is, Fetch keeps putting instructions into f2d and Execute keeps removing and executing instructions from f2d
• Fetch passes the pc and predicted pc in addition to the inst to Execute; Execute redirects the PC in case of a misprediction

An elastic Two-Stage pipeline

rule doFetch;
   let inst = iMem.req(pc);
   let ppc = nap(pc);
   pc <= ppc;
   // pass the pc and predicted pc to the execute stage
   f2d.enq(Fetch2Decode{pc: pc, ppc: ppc, inst: inst});
endrule

rule doExecute;
   let x = f2d.first;
   let inpc = x.pc;
   let ppc = x.ppc;
   let inst = x.inst;
   let dInst = decode(inst);
   ... register fetch ...;
   // exec returns a flag to indicate misprediction
   let eInst = exec(dInst, rVal1, rVal2, inpc, ppc);
   ... memory operation ...
   ... rf update ...
   if (eInst.mispredict) begin
      pc <= eInst.addr;
      f2d.clear;
   end
   else f2d.deq;
endrule

Can these rules execute concurrently, assuming the FIFO allows concurrent enq, deq and clear?
No: double writes in pc.

An elastic Two-Stage pipeline: for concurrency make pc into an EHR

rule doFetch;
   let inst = iMem.req(pc[0]);
   let ppc = nap(pc[0]);
   pc[0] <= ppc;
   f2d.enq(Fetch2Decode{pc: pc[0], ppc: ppc, inst: inst});
endrule

rule doExecute;
   let x = f2d.first;
   let inpc = x.pc;
   let ppc = x.ppc;
   let inst = x.inst;
   let dInst = decode(inst);
   ... register fetch ...;
   let eInst = exec(dInst, rVal1, rVal2, inpc, ppc);
   ... memory operation ...
   ... rf update ...
   if (eInst.mispredict) begin
      pc[1] <= eInst.addr;
      f2d.clear;
   end
   else f2d.deq;
endrule

Should the FIFO schedule be enq > clear or enq < clear?

A correctness issue
[Figure: PC -> Fetch -> f2d -> Execute, with a pc redirect path from Execute back to PC; f2d carries <inst, pc, ppc>]
Once Execute redirects the PC,
   - no wrong-path instruction should be executed
   - the next instruction executed must be the redirected one
This requires enq < clear: when Fetch enqueues in the same cycle, the clear is ordered after the enq and also removes that wrong-path instruction.
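The effect of the two possible orderings can be checked with a very small model. The sketch below is Python rather than Bluespec, and apply_cycle is a hypothetical helper: it applies Fetch's enq and Execute's clear to a list in a chosen logical order within one cycle.

```python
def apply_cycle(fifo, wrong_path_inst, order):
    """Return the FIFO contents after one cycle in which Fetch enqueues a
    wrong-path instruction and Execute clears, applied in the given order."""
    contents = list(fifo)               # do not mutate the caller's list
    for method in order:
        if method == "enq":
            contents.append(wrong_path_inst)
        elif method == "clear":
            contents.clear()
    return contents

# enq < clear: the clear also wipes the instruction enqueued this cycle.
after_enq_then_clear = apply_cycle(["i1"], "wrong", ("enq", "clear"))
# clear < enq: the wrong-path instruction survives and would be executed.
after_clear_then_enq = apply_cycle(["i1"], "wrong", ("clear", "enq"))
```

Only the first ordering leaves f2d empty after a redirect, which is what the correctness condition demands.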

Killing fetched instructions
• In the simple design with combinational memory we have discussed so far, all the mispredicted instructions were present in f2d. So the Execute stage can atomically:
   - Clear f2d
   - Set pc to the correct target
• In highly pipelined machines there can be multiple mispredicted and partially executed instructions in the pipeline; it will generally take more than one cycle to kill all such instructions
• Need a more general solution than clearing the f2d FIFO

Epoch: a method to manage control hazards
• Add an epoch register in the processor state
• The Execute stage changes the epoch whenever the pc prediction is wrong and sets the pc to the correct value
• The Fetch stage associates the current epoch with every instruction when it is fetched
• The epoch of the instruction is examined when it is ready to execute. If the processor epoch has changed, the instruction is thrown away
[Figure: Fetch and Execute stages; labels: PC, nap, iMem, inst, f2d, Epoch, targetPC]

An epoch based solution

rule doFetch;
   let instF = iMem.req(pc[0]);
   let ppcF = nap(pc[0]);
   pc[0] <= ppcF;
   f2d.enq(Fetch2Decode{pc: pc[0], ppc: ppcF, epoch: epoch, inst: instF});
endrule

rule doExecute;
   let x = f2d.first;
   let pcD = x.pc;
   let inEp = x.epoch;
   let ppcD = x.ppc;
   let instD = x.inst;
   if (inEp == epoch) begin
      let dInst = decode(instD);
      ... register fetch ...;
      let eInst = exec(dInst, rVal1, rVal2, pcD, ppcD);
      ... memory operation ...
      ... rf update ...
      if (eInst.mispredict) begin
         pc[1] <= eInst.addr;
         epoch <= next(epoch);
      end
   end
   f2d.deq;
endrule

Can these rules execute concurrently? Yes.
Two values for epoch are sufficient.
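To see the epoch scheme drop wrong-path instructions, here is a cycle-by-cycle behavioral sketch in Python (not Bluespec). It assumes a toy instruction set of ("nop", None), ("jmp", target) and ("halt", None), a two-entry f2d, pc+4 prediction, and that both rules read start-of-cycle state with Execute's pc write taking priority, mirroring pc[0] and pc[1]. The function name run_epoch_pipeline and the program encoding are hypothetical.

```python
from collections import deque

NOP = ("nop", None)

def run_epoch_pipeline(imem, max_cycles=50):
    """Two-stage pipeline model; returns the pcs of instructions that execute."""
    pc, epoch = 0, False
    f2d = deque()                      # entries: (pc, ppc, epoch, inst)
    committed = []
    for _ in range(max_cycles):
        # Both rules observe the state as it was at the start of the cycle.
        head = f2d[0] if f2d else None
        next_pc, next_epoch = pc, epoch

        # doFetch: predict pc + 4 and tag the instruction with the epoch.
        if len(f2d) < 2:
            f2d.append((pc, pc + 4, epoch, imem.get(pc, NOP)))
            next_pc = pc + 4

        # doExecute: always dequeue; execute only if the epochs match.
        if head is not None:
            f2d.popleft()
            ipc, ppc, in_ep, (op, target) = head
            if in_ep == epoch:
                committed.append(ipc)
                if op == "halt":
                    return committed
                actual = target if op == "jmp" else ipc + 4
                if actual != ppc:           # misprediction detected
                    next_pc = actual        # Execute's write to pc wins
                    next_epoch = not epoch  # two epoch values suffice
        pc, epoch = next_pc, next_epoch
    return committed

# The jump at pc 4 skips the instructions at pcs 8 and 12.
program = {0: NOP, 4: ("jmp", 16), 8: NOP, 12: NOP, 16: NOP, 20: ("halt", None)}
trace = run_epoch_pipeline(program)
```

In this model the instruction at pc 8 is fetched before the jump resolves, but it carries the old epoch and is discarded one instruction at a time when it reaches Execute.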

Discussion
• Epoch based solution kills one wrong-path instruction at a time in the execute stage
• It may be slow, but it is more robust in more complex pipelines, if you have multiple stages between fetch and execute or if you have outstanding instruction requests to the iMem
• It requires the Execute stage to set the pc and epoch registers simultaneously, which may result in a long combinational path from Execute to Fetch

Decoupled Fetch and Execute
[Figure: Fetch sends <inst, pc, ppc, epoch> to Execute; Execute sends <corrected pc, new epoch> back to Fetch]
• In decoupled systems a subsystem reads and modifies only local state atomically
   - In our solution, pc and epoch are read by both rules
• Properly decoupled systems permit greater freedom in independent refinement of subsystems

A decoupled solution using epochs
[Figure: Fetch with fEpoch, Execute with eEpoch]
• Add fEpoch and eEpoch registers to the processor state; initialize them to the same value
• The epoch changes whenever Execute detects the pc prediction to be wrong. This change is reflected immediately in eEpoch and eventually in fEpoch via a message from Execute to Fetch
• Associate fEpoch with every instruction when it is fetched
• In the execute stage, reject, i.e., kill, the instruction if its epoch does not match eEpoch

Control Hazard resolution: A robust two-rule solution
[Figure: PC, +4, fEpoch, Inst Memory, f2d FIFO, eEpoch, Decode, Register File, Execute, Data Memory, redirect FIFO from Execute back to PC]
Execute sends information about the target pc to Fetch, which updates fEpoch and pc whenever it examines the redirect (PC) fifo.

Two-stage pipeline: Decoupled code structure

module mkProc(Proc);
   Fifo#(Fetch2Execute) f2d <- mkFifo;
   Fifo#(Addr) redirect <- mkFifo;
   Reg#(Bool) fEpoch <- mkReg(False);
   Reg#(Bool) eEpoch <- mkReg(False);

   rule doFetch;
      let instF = iMem.req(pc);
      ...
      f2d.enq(... instF ..., fEpoch);
   endrule

   rule doExecute;
      if (inEp == eEpoch) begin
         // Decode and execute the instruction; update state;
         // in case of misprediction, redirect.enq(correct pc);
      end
      f2d.deq;
   endrule
endmodule

The Fetch rule

rule doFetch;
   let instF = iMem.req(pc);
   if (!redirect.notEmpty) begin
      let ppcF = nap(pc);
      pc <= ppcF;
      f2d.enq(Fetch2Execute{pc: pc, ppc: ppcF, inst: instF, epoch: fEpoch});
   end
   else begin
      fEpoch <= !fEpoch;
      pc <= redirect.first;
      redirect.deq;
   end
endrule

Notice: In case of PC redirection, nothing is enqueued into f2d.

The Execute rule

rule doExecute;
   let instD = f2d.first.inst;
   let pcD = f2d.first.pc;
   let ppcD = f2d.first.ppc;
   let inEp = f2d.first.epoch;
   if (inEp == eEpoch) begin
      let dInst = decode(instD);
      let rVal1 = rf.rd1(fromMaybe(?, dInst.src1));
      let rVal2 = rf.rd2(fromMaybe(?, dInst.src2));
      let eInst = exec(dInst, rVal1, rVal2, pcD, ppcD);
      if (eInst.iType == Ld)
         eInst.data <- dMem.req(MemReq{op: Ld, addr: eInst.addr, data: ?});
      else if (eInst.iType == St)
         let d <- dMem.req(MemReq{op: St, addr: eInst.addr, data: eInst.data});
      if (isValid(eInst.dst))
         rf.wr(fromMaybe(?, eInst.dst), eInst.data);
      if (eInst.mispredict) begin
         redirect.enq(eInst.addr);
         eEpoch <= !inEp;
      end
   end
   f2d.deq;
endrule

Can these rules execute concurrently? Yes, assuming CF FIFOs.
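The decoupled Fetch and Execute rules can be exercised together in a small behavioral model. The sketch below is Python, not Bluespec, and makes several assumptions: a toy instruction set of ("nop", None), ("jmp", target) and ("halt", None); pc+4 prediction; conflict-free FIFOs modeled by snapshotting the FIFO heads at the start of each cycle; and a redirect that is consumed even when f2d is full. The name run_decoupled_pipeline is hypothetical.

```python
from collections import deque

NOP = ("nop", None)

def run_decoupled_pipeline(imem, max_cycles=50):
    """Fetch owns pc and f_epoch, Execute owns e_epoch; they talk via FIFOs."""
    pc, f_epoch, e_epoch = 0, False, False
    f2d, redirect = deque(), deque()
    committed = []
    for _ in range(max_cycles):
        # Snapshot so each rule sees start-of-cycle FIFO state (CF FIFOs).
        head = f2d[0] if f2d else None
        pending = redirect[0] if redirect else None
        f2d_has_room = len(f2d) < 2

        # doFetch: on a redirect, update pc and fEpoch and enqueue nothing.
        if pending is not None:
            redirect.popleft()
            pc, f_epoch = pending, not f_epoch
        elif f2d_has_room:
            f2d.append((pc, pc + 4, f_epoch, imem.get(pc, NOP)))
            pc = pc + 4

        # doExecute: kill the instruction if its epoch does not match eEpoch.
        if head is not None:
            f2d.popleft()
            ipc, ppc, in_ep, (op, target) = head
            if in_ep == e_epoch:
                committed.append(ipc)
                if op == "halt":
                    return committed
                actual = target if op == "jmp" else ipc + 4
                if actual != ppc:
                    redirect.append(actual)  # message from Execute to Fetch
                    e_epoch = not in_ep
    return committed

program = {0: NOP, 4: ("jmp", 16), 8: NOP, 12: NOP, 16: NOP, 20: ("halt", None)}
trace = run_decoupled_pipeline(program)
```

Neither rule in the model reads the other's epoch register: eEpoch changes immediately on a misprediction, while fEpoch catches up one cycle later when Fetch examines the redirect FIFO.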

The epoch mechanism is independent of the branch prediction scheme used. We will study sophisticated branch prediction schemes later.

Conflict-free FIFO with a Clear method
To be discussed in the tutorial
[Figure: two-element FIFO with slots db and da]

module mkCFFifo(Fifo#(2, t)) provisos(Bits#(t, tSz));
   // va and db are indexed at port 2 by canonicalize, so all four EHRs need three ports
   Ehr#(3, t) da <- mkEhr(?);
   Ehr#(3, Bool) va <- mkEhr(False);
   Ehr#(3, t) db <- mkEhr(?);
   Ehr#(3, Bool) vb <- mkEhr(False);

   rule canonicalize if (vb[2] && !va[2]);
      da[2] <= db[2];
      va[2] <= True;
      vb[2] <= False;
   endrule

   method Action enq(t x) if (!vb[0]);
      db[0] <= x;
      vb[0] <= True;
   endmethod

   method Action deq if (va[0]);
      va[0] <= False;
   endmethod

   method t first if (va[0]);
      return da[0];
   endmethod

   method Action clear;
      va[1] <= False;
      vb[1] <= False;
   endmethod
endmodule

If there is only one element in the FIFO it resides in da.
Schedule: first CF enq; deq CF enq; first < deq; enq < clear.
Canonicalize must be the last rule to fire!
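The port ordering of this FIFO can be tried out in a behavioral model. The Python sketch below is an assumption-laden stand-in for the EHR semantics, not Bluespec: one call to cycle applies first, deq and enq against start-of-cycle state (port 0), then clear (port 1), then canonicalize (port 2). The class name CFFifoModel is hypothetical, None is used as the "no enq" marker, and a method whose guard is false is silently ignored instead of blocking a rule.

```python
class CFFifoModel:
    """Behavioral model of a two-element FIFO with slots da (head) and db."""

    def __init__(self):
        self.da = self.db = None
        self.va = self.vb = False

    def cycle(self, enq=None, deq=False, clear=False):
        """Apply one clock cycle; returns the value `first` saw, or None."""
        # Port 0: first, deq and enq all observe start-of-cycle state.
        can_enq, can_deq = not self.vb, self.va
        first = self.da if self.va else None
        if deq and can_deq:
            self.va = False                 # deq touches only va
        if enq is not None and can_enq:
            self.db, self.vb = enq, True    # enq touches only db and vb
        # Port 1: clear is ordered after enq and deq, so it overrides both.
        if clear:
            self.va = self.vb = False
        # Port 2: canonicalize fires last and moves db into an empty da.
        if self.vb and not self.va:
            self.da, self.va, self.vb = self.db, True, False
        return first

fifo = CFFifoModel()
seen = [
    fifo.cycle(enq=1),               # empty FIFO: first sees nothing
    fifo.cycle(enq=2, deq=True),     # deq and enq in the same cycle
    fifo.cycle(),                    # the element enqueued last cycle
    fifo.cycle(enq=3, clear=True),   # clear also removes the new element
    fifo.cycle(),                    # FIFO is empty again
]
```

Because deq and enq write disjoint state in the model, their relative order within a cycle does not matter, which is the conflict-free property the schedule asks for.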

Why canonicalize must be the last rule to fire

rule foo;
   f.deq;
   if (p) f.clear;
endrule

Consider rule foo. If p is false then canonicalize must fire after deq for proper concurrency. If canonicalize uses EHR indices between deq and clear, then canonicalize won’t fire when p is false.
[Figure: FIFO method schedule: first CF enq; deq CF enq; first < deq; enq < clear]