COMP 206 Computer Architecture and Implementation Montek Singh

COMP 206: Computer Architecture and Implementation
Montek Singh
Mon., Sep 22, 2004
Topic: Pipelining: Intermediate Concepts (Control Hazards)

Control Hazard
- A peculiar kind of RAW hazard involving the program counter
  - PC written by branch instruction
  - PC read by instruction fetch unit (not by another instruction)
- Possible misbehavior: the instructions fetched and executed after the branch instruction are not the ones specified by the branch instruction

Control Hazard: Example
[Figure: timing of instructions Br-1, Br, Br+1, Br+2, Br+3, …, T in an unpipelined implementation and in a pipeline using the PNT strategy]

More on Control Hazards
- Branch delay: the length of the control hazard
- What determines branch delay?
  - We need to know that we have a branch instruction
  - We need to have the branch target address (BTA)
  - We need to know the branch outcome
  - So, we have to wait until we know all of these quantities
- An older pipeline (DLX, HP 2):
  - …computes BTA in EX
  - …computes branch outcome in EX
  - …changes PC in MEM
- To reduce branch delay, MIPS (HP 3) moves these steps to earlier pipeline stages:
  - Can’t move up beyond ID (need to know it’s a branch instruction)
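The reasoning above reduces to "wait for the latest of the required events". A minimal Python sketch, under assumptions that are ours rather than the slides': stages are numbered from IF = 1, one instruction is fetched per cycle, and the correct successor can be fetched in the cycle after the latest of decode, BTA known, outcome known, and PC written. The helper `branch_delay` is hypothetical.

```python
def branch_delay(decode_stage: int, bta_stage: int,
                 outcome_stage: int, pc_write_stage: int) -> int:
    """Control-hazard length in cycles (assumes IF is stage 1)."""
    # The PC cannot be redirected until every required quantity is known
    # and the PC has actually been written.
    ready = max(decode_stage, bta_stage, outcome_stage, pc_write_stage)
    # The branch itself occupies IF in cycle 1, so every later stage that
    # must finish first costs one cycle of delay.
    return ready - 1


# Older DLX pipeline (HP 2): ID = 2, BTA and outcome in EX = 3, PC changed in MEM = 4.
dlx = branch_delay(decode_stage=2, bta_stage=3, outcome_stage=3, pc_write_stage=4)
# MIPS (HP 3): all steps moved up to ID = 2.
mips = branch_delay(decode_stage=2, bta_stage=2, outcome_stage=2, pc_write_stage=2)
print(dlx, mips)  # 3 1
```

Under these assumptions the older DLX pipeline has a 3-cycle branch delay and the MIPS arrangement a 1-cycle delay; the same rule gives 5 cycles for a taken branch resolved in stage 6 of the 12-stage pipeline below.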

Reducing Branch Delays
Example:
          sub  $10, $4, $8
          beq  $10, $3, go
          add  $12, $5
          . . .
    go:   lw   $4, 16($12)
[Figure: pipeline datapath showing the IF/ID and ID/EX pipeline registers]

Dealing with Branch Delays
- Four strategies
  - Stall
  - Predict Taken, variation A (PTA)
  - Predict Taken, variation B (PTB)
  - Predict Not Taken (PNT)
- Consider a hypothetical 12-stage pipeline
  - Instruction is fetched in stage 1 (IF)
  - Opcode becomes known in stage 2 (ID)
  - BTA becomes known in stage 4
  - Branch outcome becomes known in stage 6
- Parameters
  - PU, PT, PNT: penalties of unconditional branch, taken branch, untaken branch
  - T: probability of branch being taken

Stall Strategy: 12-Stage Pipeline
- Pipeline stalls on all branches
- Instructions 1 and 8 are branches
  - 1 is not taken, 8 is taken
- Opcode determination in stage 2 stalls the pipeline
- Branch outcome determination in stage 6 restarts the pipeline from IF (taken branches) or from ID (untaken branches, whose fall-through instruction was already fetched)
- BTA determination in stage 4 would restart the pipeline from IF for jumps
- PU = 3, PT = 5, PNT = 4

PNT Strategy: 12-Stage Pipeline
- Pipeline continues execution assuming that the branch will fall through
- Instructions 1 and 12 are branches
  - 1 is not taken, 12 is taken
- Branch outcome determination in stage 6 restarts the pipeline from IF for taken branches (cancelling instructions already in the pipeline)
- BTA determination in stage 4 would restart the pipeline from IF for jumps
- PU = 3, PT = 5, PNT = 0

PTA Strategy: 12-Stage Pipeline
- Pipeline predicts all branches to be taken and restarts the pipeline from IF at the BTA as soon as the BTA is known (cancelling instructions already in the pipeline)
- Instructions 1 and 7 are branches
  - 1 is not taken, 7 is taken
- Branch outcome determination in stage 6 restarts the pipeline from IF for untaken branches (cancelling instructions already in the pipeline)
- PU = 3, PT = 3, PNT = 5

PTB Strategy: 12-Stage Pipeline
- Pipeline predicts all branches to be taken and starts fetching from the BTA as soon as it is known in stage 4 (but without cancelling instructions already in the pipeline)
- Instructions 1 and 10 are branch instructions
  - 1 is not taken, 10 is taken
- Branch outcome determination in stage 6 restarts the pipeline from IF on the fall-through path (for untaken branches) and cancels the wrong-path instructions
- PU = 3, PT = 3, PNT = 2
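The four sets of penalties can be cross-checked against the stage numbers of the 12-stage pipeline (opcode in stage 2, BTA in stage 4, outcome in stage 6). The Python sketch below encodes one reading of each strategy as a formula; the formulas and the function name `penalties` are our assumptions, and the assertions only confirm that they reproduce the values on the slides.

```python
from typing import Tuple

OPCODE, BTA, OUTCOME = 2, 4, 6  # stage in which each fact becomes known


def penalties(strategy: str, opcode: int = OPCODE, bta: int = BTA,
              outcome: int = OUTCOME) -> Tuple[int, int, int]:
    """Return (PU, PT, PNT) in cycles for one branch-handling strategy."""
    pu = bta - 1  # every strategy restarts jumps from IF once the BTA is known
    if strategy == "stall":
        # Taken: restart from IF. Untaken: the fall-through instruction was
        # fetched before the stall, so it resumes from ID (stage `opcode`).
        return pu, outcome - 1, outcome - opcode
    if strategy == "pnt":
        # Fall-through work is kept; only taken branches squash it.
        return pu, outcome - 1, 0
    if strategy == "pta":
        # Redirect to the BTA as soon as it is known (cancelling fall-through
        # work); an untaken branch then restarts the fall-through path from IF.
        return pu, bta - 1, outcome - 1
    if strategy == "ptb":
        # Fetch from the BTA without cancelling fall-through work in flight;
        # an untaken branch wastes only the target fetches made between the
        # BTA stage and the outcome stage.
        return pu, bta - 1, outcome - bta
    raise ValueError(f"unknown strategy: {strategy!r}")


SLIDE_VALUES = {"stall": (3, 5, 4), "pnt": (3, 5, 0), "pta": (3, 3, 5), "ptb": (3, 3, 2)}
for name, expected in SLIDE_VALUES.items():
    assert penalties(name) == expected, name
    print(name, penalties(name))
```

Parameterizing by stage makes it easy to ask how the penalties would move if, say, the outcome were resolved earlier.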

Effect of Control Hazards on Pipelines
Assume that 20% of all instructions are transfers of control, split 5% for unconditional jumps and 15% for conditional branches. For each of the four branching schemes for the 12-stage pipeline, determine the branch penalty as a function of T, the probability of a branch being taken.

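Solution for 12-Stage Pipeline
- Of the transfers of control, 5/20 = 0.25 are unconditional and 15/20 = 0.75 are conditional, so the penalty is $0.25\,P_U + 0.75\,(T\,P_T + (1-T)\,P_{NT})$
- Stall: $0.25 \cdot 3 + 0.75\,(5T + 4(1-T)) = 3.75 + 0.75T$
- PTA: $0.25 \cdot 3 + 0.75\,(3T + 5(1-T)) = 4.5 - 1.5T$
- PTB: $0.25 \cdot 3 + 0.75\,(3T + 2(1-T)) = 2.25 + 0.75T$
- PNT: $0.25 \cdot 3 + 0.75\,(5T + 0(1-T)) = 0.75 + 3.75T$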
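A small Python sketch for evaluating these expressions exactly, using `fractions.Fraction` to avoid rounding. The weights come from the 5%/15% split of the 20% transfers of control and the penalty triples from the strategy slides; the helper names are illustrative.

```python
from fractions import Fraction

F_UNCOND = Fraction(5, 20)  # unconditional jumps among transfers of control
F_COND = Fraction(15, 20)   # conditional branches among transfers of control

# (PU, PT, PNT) for the 12-stage pipeline, from the strategy slides.
PENALTIES = {
    "Stall": (3, 5, 4),
    "PTA": (3, 3, 5),
    "PTB": (3, 3, 2),
    "PNT": (3, 5, 0),
}


def branch_penalty(pu: int, pt: int, pnt: int, t: Fraction) -> Fraction:
    """Average penalty in cycles per transfer of control, with P(taken) = t."""
    return F_UNCOND * pu + F_COND * (t * pt + (1 - t) * pnt)


def best_strategy(t: Fraction) -> str:
    """Strategy with the smallest penalty at t (first in PENALTIES on a tie)."""
    return min(PENALTIES, key=lambda name: branch_penalty(*PENALTIES[name], t))


for t in (Fraction(1, 4), Fraction(3, 4)):
    row = {name: float(branch_penalty(*p, t)) for name, p in PENALTIES.items()}
    print(f"T = {t}:", row, "best:", best_strategy(t))
```

Comparing the closed forms, PNT has the lowest penalty for T < 1/2 and PTB for T > 1/2; they tie at T = 1/2, and PTB never does worse than PTA or Stall for any T in [0, 1].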

Delayed Branches on MIPS
- One branch delay slot on MIPS
- Always execute the instruction in the branch delay slot (irrespective of branch outcome)
- Question: What instruction do we put in the branch delay slot?
  - Fill with NOP (always possible, penalty = 1)
  - Fill from before (not always possible, penalty = 0)
  - Fill from target (not always possible, penalty = 1 - T)
    - Not possible if the BTA is dynamic
    - Not possible if the instruction at the BTA is another branch
  - Fill from fall-through (not always possible, penalty = T)
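The expected penalty of one delay slot follows directly from the fill sources listed above. A minimal Python sketch, assuming a compiler fills some fraction of slots from each source; the source keys and function names are hypothetical.

```python
from fractions import Fraction
from typing import Dict


def slot_penalty(source: str, t: Fraction) -> Fraction:
    """Cycles wasted by one delay slot filled from `source`; t = P(taken)."""
    if source == "nop":
        return Fraction(1)   # the slot never does useful work
    if source == "before":
        return Fraction(0)   # useful on both paths
    if source == "target":
        return 1 - t         # useful only when the branch is taken
    if source == "fall-through":
        return t             # useful only when the branch is not taken
    raise ValueError(f"unknown fill source: {source!r}")


def mixed_penalty(mix: Dict[str, Fraction], t: Fraction) -> Fraction:
    """Expected slot penalty when a fraction mix[source] of slots uses each source."""
    if sum(mix.values()) != 1:
        raise ValueError("fractions must sum to 1")
    return sum((frac * slot_penalty(src, t) for src, frac in mix.items()), Fraction(0))


# Hypothetical mix, for illustration only.
example_mix = {"before": Fraction(1, 2), "target": Fraction(1, 4), "nop": Fraction(1, 4)}
print(mixed_penalty(example_mix, Fraction(1, 2)))  # 3/8
```

With this made-up mix, half the slots cost nothing, the target-filled quarter wastes a cycle only on untaken branches, and the NOP quarter always wastes one.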

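Details of Various Branch Flavors
[Figure: control-flow graph in which A, B, C, D are followed by the test X: cond; the true path continues with E, F, G, H and the false path with M, N, P, Q]

Ordinary:
    A: B: C: D:
    X: if cond goto E
    M: N: P: Q: …
    E: F: G: H:

Delayed, filled from before:
    A: B: C:
    X: if cond goto E
    D:    (delay slot)
    M: N: P: Q: …
    E: F: G: H: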
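To see why filling from before preserves behavior, here is a toy Python model of the two layouts, under our own assumptions: each label stands for one instruction, `X` is the only branch and jumps to `E`, `"..."` marks the end of the modeled fall-through path, and D does not affect `cond` (otherwise D could not be moved past X). The helpers `run` and `useful` are hypothetical.

```python
from typing import List

HALT = "..."  # end of the modeled fall-through path (rest of program not shown)

ORDINARY = ["A", "B", "C", "D", "X", "M", "N", "P", "Q", HALT, "E", "F", "G", "H"]
DELAYED_FROM_BEFORE = ["A", "B", "C", "X", "D", "M", "N", "P", "Q", HALT, "E", "F", "G", "H"]


def run(layout: List[str], cond: bool, delay_slots: int) -> List[str]:
    """Labels executed, in order; "X" means "if cond goto E"."""
    trace: List[str] = []
    pc = 0
    while pc < len(layout) and layout[pc] != HALT:
        label = layout[pc]
        trace.append(label)
        if label == "X":
            # Instructions in the delay slot(s) run whatever the outcome.
            trace.extend(layout[pc + 1 : pc + 1 + delay_slots])
            pc = layout.index("E") if cond else pc + 1 + delay_slots
        else:
            pc += 1
    return trace


def useful(trace: List[str]) -> List[str]:
    """Drop the branch itself so only the work it guards is compared."""
    return [label for label in trace if label != "X"]


for cond in (True, False):
    assert useful(run(ORDINARY, cond, 0)) == useful(run(DELAYED_FROM_BEFORE, cond, 1))
    print(cond, run(DELAYED_FROM_BEFORE, cond, 1))
```

Ignoring where X itself sits, both layouts do the same work on either path; running the delayed layout on a machine without a delay slot (`delay_slots=0`) would skip D on the taken path.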

Instruction Sequence Alteration Strategies
- To allow more aggressive filling of the branch delay slot from the target or fall-through path, we can selectively cancel instructions
- Classification of branches
  - Delayed branch
    - Instruction in branch delay slot is always executed
  - Plain branch
    - Instruction in branch delay slot is cancelled if branch is taken
    - Useful if compiler filled branch delay slot from fall-through
  - Canceling (annulling, nullifying) branch
    - Instruction in branch delay slot is cancelled if branch is not taken
    - Useful if compiler filled branch delay slot from target
- Should not cancel instruction if it may cause exception
- A bit in the instruction, set by the compiler, makes the choice
  - MIPS, SPARC, PA-RISC: delayed (0), canceling (1)
  - M88000, i860: delayed (0), plain (1)
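The cancellation rules above can be written as two small predicates: whether the delay-slot instruction is committed, and whether it belongs on the path actually followed. A Python sketch under those definitions; the names `BranchKind`, `slot_committed`, and `slot_useful` are illustrative, and `BIT_MEANING` simply restates the slide.

```python
from enum import Enum


class BranchKind(Enum):
    DELAYED = "delayed"      # delay-slot instruction always executed
    PLAIN = "plain"          # cancelled if the branch is taken
    CANCELING = "canceling"  # cancelled if the branch is not taken


def slot_committed(kind: BranchKind, taken: bool) -> bool:
    """Does the delay-slot instruction complete for this branch kind and outcome?"""
    if kind is BranchKind.DELAYED:
        return True
    if kind is BranchKind.PLAIN:
        return not taken
    return taken  # BranchKind.CANCELING


def slot_useful(source: str, taken: bool) -> bool:
    """Does an instruction filled from `source` belong on the path actually taken?"""
    return {"before": True, "target": taken, "fall-through": not taken}[source]


# Meaning of the compiler-set bit, restating the slide.
BIT_MEANING = {
    "MIPS, SPARC, PA-RISC": {0: BranchKind.DELAYED, 1: BranchKind.CANCELING},
    "M88000, i860": {0: BranchKind.DELAYED, 1: BranchKind.PLAIN},
}

for taken in (True, False):
    # The pairings on the slide commit the slot exactly when it is useful.
    assert slot_committed(BranchKind.PLAIN, taken) == slot_useful("fall-through", taken)
    assert slot_committed(BranchKind.CANCELING, taken) == slot_useful("target", taken)

# With an ordinary delayed branch, a target fill still executes on the untaken path.
print(slot_committed(BranchKind.DELAYED, False), slot_useful("target", False))  # True False
```

The last line is the case the exception caveat is about: without cancellation, a wrong-path instruction in the slot really does execute.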

Example: Branch Penalties
Consider a DLX pipeline with a single branch delay slot in which 25% of branches are unconditional. Of the unconditional branches, 50% have their delay slots filled from before, 40% from the target, and 10% with NOPs. The branch delay slots of the conditional branches are filled from various sources as shown in the table below, depending on the kind of branch used. For each of the cases, determine the branch penalty as a function of T, the probability that a conditional branch is taken. How do these penalties compare to those obtained by using a Stall, PT, or PNT strategy?
For all of Stall, PT, and PNT on DLX: PU = 1, PT = 1, PNT = 0
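The unconditional-branch part of this example can be worked out from the numbers in the problem; the conditional part depends on the table, which is not reproduced in this text, so the Python sketch below takes it as a parameter. Treating an unconditional branch as always taken (T = 1), a slot filled from the target costs nothing, so the unconditional slot penalty is 0.1 cycle. Function names are illustrative.

```python
from fractions import Fraction

F_UNCOND, F_COND = Fraction(1, 4), Fraction(3, 4)  # 25% of branches unconditional

# Unconditional slots: 50% from before (0), 40% from target (1 - T = 0), 10% NOP (1).
UNCOND_SLOT_PENALTY = Fraction(1, 2) * 0 + Fraction(2, 5) * 0 + Fraction(1, 10) * 1


def delayed_penalty(cond_slot_penalty: Fraction) -> Fraction:
    """Average penalty per branch, given the conditional-slot penalty from the table."""
    return F_UNCOND * UNCOND_SLOT_PENALTY + F_COND * cond_slot_penalty


def baseline_penalty(t: Fraction) -> Fraction:
    """Stall, PT, or PNT on DLX with PU = 1, PT = 1, PNT = 0."""
    return F_UNCOND * 1 + F_COND * (t * 1 + (1 - t) * 0)


def delayed_wins(cond_slot_penalty: Fraction, t: Fraction) -> bool:
    return delayed_penalty(cond_slot_penalty) < baseline_penalty(t)


t = Fraction(1, 2)
# Hypothetical case: every conditional slot filled from fall-through, so its penalty is T.
print(delayed_penalty(t), baseline_penalty(t))  # 2/5 5/8
```

Algebraically, $0.025 + 0.75\,P_c < 0.25 + 0.75\,T$ reduces to $P_c < T + 0.3$, so a delayed-branch case from the table beats Stall, PT, and PNT whenever its conditional-slot penalty $P_c$ stays below $T + 0.3$.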

Solution: Branch Penalties