Pipeline Hazards
CSE 308 – Computer Architecture
Prof. Muhamed Mudawar
Computer Engineering Department
King Fahd University of Petroleum and Minerals

Pipeline Hazards
  • Hazards: situations that would cause incorrect execution if the next instruction were launched during its designated clock cycle
    1. Structural hazards
       - Caused by resource contention
       - Two instructions use the same resource during the same cycle
    2. Data hazards
       - An instruction may compute a result needed by the next instruction
       - Hardware can detect dependencies between instructions
    3. Control hazards
       - Caused by instructions that change control flow (branches/jumps)
       - Delays in changing the flow of control
  • Hazards complicate pipeline control and limit performance
Pipeline Hazards © Muhamed Mudawar, CSE 308 – KFUPM

Conflict due to one Memory Port
[Figure: pipeline timing diagram, Cycle 1 to Cycle 7, for Load followed by Instr 2 through Instr 5; each instruction passes through Ifetch, Reg, ALU, DMem, Reg]
  • Same memory is used for instructions and data
  • Structural hazard: can't load data and fetch Instruction 4 during clock cycle 4

Resolving structural hazards
  • Problem
    - Attempt to use the same hardware resource by two different instructions during the same cycle
  • Solution 1: Wait
    - Must detect the hazard
    - Must have a mechanism to delay instruction access to the resource
    - Serious: hazard cannot be ignored
  • Solution 2: Redesign the pipeline
    - Add more hardware to eliminate the structural hazard
    - In our example: use two memories with two memory ports
      * Instruction Memory
      * Data Memory
    - Can be implemented as caches

Solution 1: Detect Hazard and Delay
[Figure: pipeline timing diagram, Cycle 1 to Cycle 7, instruction order Load, Instr 1, Instr 2, Stall, Instr 3; the Stall row carries a bubble through every stage]
  • Introduce a bubble to delay instruction fetching
  • A bubble is a NOP instruction

Solution 2: Add More Hardware
  • Eliminate structural hazard at design time
  • Use two separate memories with two memory ports
    - Instruction and data memories can be implemented as caches
[Figure: five-stage pipelined datapath (IF, ID, EX, MEM, WB) with pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB and separate Instruction Memory and Data Memory]

Data Hazards
  • Dependency between instructions causes a data hazard
  • The dependent instructions are close to each other
    - Pipelined execution might change the order of operand access
  • Read After Write – RAW Hazard
    - Given two instructions I and J, where I comes before J …
    - Instruction J should read an operand after it is written by I
    - Called a data dependence in compiler terminology
        I: add $1, $2, $3    # r1 is written
        J: sub $4, $1, $3    # r1 is read
    - Hazard occurs when J reads the operand before I writes it

Example of a RAW Data Hazard
[Figure: pipeline timing diagram, CC 1 to CC 8, in program execution order; each instruction passes through IM, Reg, ALU, DM, Reg; the value of $2 changes from 10 to 20 during CC 5 (shown as 10/20)]
        sub $2, $1, $3
        and $4, $2, $5
        or  $6, $3, $2
        add $7, $2
        sw  $8, 10($2)
  • Result of sub is needed by the and, or, add, & sw instructions
  • The and & or instructions will read the old value of $2 from the register file
  • During CC 5, $2 is written and read – new value is read

Stalling the Pipeline
[Figure: pipeline timing diagram, CC 1 to CC 8, for the sequence below; two bubbles are inserted into ID/EX ahead of the and instruction; the value of $2 changes from 10 to 20 during CC 5]
        sub $2, $1, $3
        and $4, $2, $5
        or  $6, $3, $2
  • The and instruction cannot fetch $2 until CC 5
  • Two bubbles (NOP instructions) are inserted in ID/EX
    - At the end of CC 3 and CC 4 cycles
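
The two bubbles can be derived from the stage numbers alone. The sketch below assumes the timing stated on these slides: registers are read in stage 2, written in stage 5, and a register written and read in the same cycle yields the new value. The function name and the notion of "distance" (how many instructions after the producer the consumer sits) are illustrative, not part of the original slides, and the function treats one producer/consumer pair in isolation.

```python
READ_STAGE = 2   # operands are read in the ID stage (stage 2)
WRITE_STAGE = 5  # results are written in the WB stage (stage 5)


def raw_stall_cycles(distance: int) -> int:
    """Bubbles needed without forwarding for one producer/consumer pair.

    distance = 1 means the consumer immediately follows the producer.
    The consumer reaches stage 2 in cycle (1 + distance) + 1 when the
    producer is fetched in cycle 1, and must not read before cycle 5.
    """
    if distance < 1:
        raise ValueError("consumer must come after the producer")
    return max(0, (WRITE_STAGE - READ_STAGE) - distance)


# sub $2, $1, $3 followed directly by and $4, $2, $5
print(raw_stall_cycles(1))
```

For the sub/and pair on this slide the distance is 1, which gives the two bubbles shown; a consumer three or more instructions later needs none.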

Forwarding ALU Result
  • The ALU result is forwarded (fed back) to the ALU input
    - No bubbles are inserted into the pipeline and no cycles are wasted
  • ALU result exists in either EX/MEM or MEM/WB register
[Figure: pipeline timing diagram, CC 1 to CC 8, in program execution order; each instruction passes through IM, Reg, ALU, DM, Reg]
        sub $2, $1, $3
        and $4, $2, $5
        or  $6, $3, $2
        add $7, $2
        sw  $8, 10($2)

Implementing Forwarding
  • Two multiplexers are added at the inputs of the ALU
  • ALU result in EX/MEM is forwarded (fed back)
  • Writeback data in MEM/WB is also forwarded
  • Two signals, ForwardA and ForwardB, control forwarding
[Figure: datapath from IF/ID through ID/EX, EX/MEM, and MEM/WB; the two forwarding multiplexers at the ALU inputs are selected by ForwardA and ForwardB and fed by the EX/MEM ALU result and the MEM/WB writeback data]

Determining Rb and Rw in ID/EX
  • Rb-Rw selector determines Rb and Rw in ID/EX
  • Controlled by RegDst and RegWrite
    - Slight modification and extension to P&H textbook
  • ID/EX.Rw can be determined as follows:
        if (RegWrite == '0') ID/EX.Rw = '0'
        else if (RegDst == Rt) ID/EX.Rw = IF/ID.Rt
        else ID/EX.Rw = IF/ID.Rd
  • ID/EX.Rb can be determined as follows:
        if (RegDst == Rt) ID/EX.Rb = '0'
        else ID/EX.Rb = IF/ID.Rt
[Figure: IF/ID fields Rs, Rt, Rd, and Imm16 feeding the register file, the sign extender (Imm32), and the Rb-Rw selector that writes Ra, Rb, and Rw into ID/EX]
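
The Rb-Rw selector described above can be sketched as a small function. This is an illustrative model only: the function name, the boolean encoding of RegDst (True when the destination is Rt), and the example instructions are assumptions, not part of the slides.

```python
def select_rb_rw(reg_write: bool, reg_dst_is_rt: bool, rt: int, rd: int):
    """Return (Rb, Rw) as they would be latched into ID/EX.

    Register number 0 is used to mean "no register", which later lets the
    hazard checks skip the comparison.
    """
    # Rb: the second ALU source register, or 0 when Rt is the destination.
    rb = 0 if reg_dst_is_rt else rt

    # Rw: the destination register, or 0 when nothing is written.
    if not reg_write:
        rw = 0
    elif reg_dst_is_rt:
        rw = rt
    else:
        rw = rd
    return rb, rw


# R-type example: add $1, $2, $3 has Rt = 3 and Rd = 1
print(select_rb_rw(reg_write=True, reg_dst_is_rt=False, rt=3, rd=1))
```

Encoding "no register" as 0 is what allows the later hazard conditions to use a single `≠ 0` test.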

RAW Hazard Detection
  • RAW hazards can be detected by the pipeline
  • Current instruction is decoded and is in ID/EX register
  • Previous instruction is in the EX/MEM register
  • Second previous instruction is in the MEM/WB register
  • RAW Hazard Conditions:
    - RAW hazard detected with the previous instruction:
        ID/EX.Ra = EX/MEM.Rw ≠ 0
        ID/EX.Rb = EX/MEM.Rw ≠ 0
    - RAW hazard detected with the second previous instruction:
        ID/EX.Ra = MEM/WB.Rw ≠ 0
        ID/EX.Rb = MEM/WB.Rw ≠ 0

Forwarding Control Signals

    Control Signal   Source   Explanation
    ForwardA = 00    ID/EX    First ALU operand comes from the register file
    ForwardA = 10    EX/MEM   Forwarded from the previous ALU result
    ForwardA = 01    MEM/WB   Forwarded from data memory or 2nd previous result
    ForwardB = 00    ID/EX    Second ALU operand comes from the register file
    ForwardB = 10    EX/MEM   Forwarded from the previous ALU result
    ForwardB = 01    MEM/WB   Forwarded from data memory or 2nd previous result

    if (EX/MEM.Rw ≠ 0 and EX/MEM.Rw == ID/EX.Ra) ForwardA = 10
    else if (MEM/WB.Rw ≠ 0 and MEM/WB.Rw == ID/EX.Ra) ForwardA = 01
    else ForwardA = 00

    if (EX/MEM.Rw ≠ 0 and EX/MEM.Rw == ID/EX.Rb) ForwardB = 10
    else if (MEM/WB.Rw ≠ 0 and MEM/WB.Rw == ID/EX.Rb) ForwardB = 01
    else ForwardB = 00
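
The forwarding conditions above translate almost line for line into code. The following is a minimal sketch, with hypothetical function names, that takes the register numbers held in the pipeline registers and returns the two mux selects. Checking EX/MEM first matters: when both earlier instructions write the same register, the more recent result must win.

```python
def forward_select(src: int, ex_mem_rw: int, mem_wb_rw: int) -> int:
    """Mux select for one ALU operand: 0b10, 0b01, or 0b00."""
    if ex_mem_rw != 0 and ex_mem_rw == src:
        return 0b10  # take the previous instruction's ALU result
    if mem_wb_rw != 0 and mem_wb_rw == src:
        return 0b01  # take the writeback data of the 2nd previous instruction
    return 0b00      # no hazard: use the value read from the register file


def forwarding_unit(id_ex_ra: int, id_ex_rb: int, ex_mem_rw: int, mem_wb_rw: int):
    """Return (ForwardA, ForwardB) for the instruction in ID/EX."""
    return (
        forward_select(id_ex_ra, ex_mem_rw, mem_wb_rw),
        forward_select(id_ex_rb, ex_mem_rw, mem_wb_rw),
    )


# and $4, $2, $5 in ID/EX while sub $2, $1, $3 is in EX/MEM
forward_a, forward_b = forwarding_unit(id_ex_ra=2, id_ex_rb=5, ex_mem_rw=2, mem_wb_rw=0)
print(format(forward_a, "02b"), format(forward_b, "02b"))
```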

Forwarding Unit
  • Forwarding unit generates ForwardA and ForwardB
    - These control the two forwarding multiplexers
  • Uses Ra and Rb in ID/EX and Rw in EX/MEM & MEM/WB
[Figure: the forwarding datapath with a Forwarding Unit that reads Ra and Rb from ID/EX and Rw from EX/MEM and MEM/WB, and drives ForwardA and ForwardB]

Data Hazard Even with Forwarding
  • Unfortunately, not all data hazards can be resolved by forwarding
    - The load has a delay that cannot be eliminated by forwarding
  • In the example shown below …
    - The LW instruction does not have data until end of CC 4
    - AND instruction wants data at beginning of CC 4 - NOT possible
[Figure: pipeline timing diagram, CC 1 to CC 8, in program order; each instruction passes through IM, Reg, ALU, DM, Reg]
        lw  $2, 20($1)
        and $4, $2, $5
        or  $6, $3, $2
        add $7, $2

Pipeline Interlock
  • Interlock detects a RAW hazard and stalls the pipeline
    - Example: pipeline interlock causes a stall to the AND instruction
      * Freezes the IF/ID register holding the AND instruction
      * Allows the LW instruction to proceed
      * Introduces a bubble (no-op) instruction into the ID/EX register
      * No instruction is started during CC 4, and PC is kept unchanged
[Figure: pipeline timing diagram, CC 1 to CC 8, in program order, with one bubble inserted into ID/EX ahead of the and instruction]
        lw  $2, 20($1)
        and $4, $2, $5
        or  $6, $3, $2

Implementing the Pipeline Interlock
  • Detecting a RAW hazard after a load instruction
    - The load instruction will be in the ID/EX register
    - An instruction that needs the load data will be in the IF/ID register
  • Condition for stalling the pipeline
        if ((ID/EX.MemRead == 1) and (ID/EX.Rw ≠ 0) and
            ((ID/EX.Rw == IF/ID.Rs) or (ID/EX.Rw == IF/ID.Rt))) Stall
  • To stall the pipeline when a load hazard is detected:
    - Freeze the PC and IF/ID registers
      * No new instruction is fetched
      * Instruction after Load is stalled
    - Load moves normally from the ID/EX to the EX/MEM register
    - A bubble is introduced into the ID/EX register
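
The stall condition above is a pure function of four pipeline-register fields, so it can be sketched directly. The function and parameter names below are illustrative; only the condition itself comes from the slide.

```python
def must_stall(id_ex_mem_read: bool, id_ex_rw: int, if_id_rs: int, if_id_rt: int) -> bool:
    """True when the instruction in IF/ID needs data from a load in ID/EX."""
    return bool(
        id_ex_mem_read                 # the instruction in ID/EX is a load
        and id_ex_rw != 0              # it actually writes a register
        and (id_ex_rw == if_id_rs or id_ex_rw == if_id_rt)
    )


# lw $2, 20($1) in ID/EX, and $4, $2, $5 in IF/ID
print(must_stall(id_ex_mem_read=True, id_ex_rw=2, if_id_rs=2, if_id_rt=5))
```

When the function returns True, the hardware described on the slide would hold PC and IF/ID for one cycle and place a bubble in ID/EX.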

Hazard Detection and Stall Unit
[Figure: pipelined datapath with a Hazard Detection and Stall Unit that reads ID/EX.MemRead and drives PCWrite, IF/IDWrite, and a Bubble select on the control signals entering ID/EX]
  • The pipeline is stalled by making PCWrite = '0' and IF/IDWrite = '0' and introducing a bubble into the ID/EX control signals

Compiler Scheduling
  • Compilers can schedule code in a way to avoid load stalls
  • Consider the following statements:
        a = b + c;
        d = e – f;
  • Slow code:
        lw  $10, 0($1)       // $1 = &b
        lw  $11, 0($2)       // $2 = &c
        add $12, $10, $11    // stall
        sw  $12, 0($3)       // $3 = &a
        lw  $13, 0($4)       // $4 = &e
        lw  $14, 0($5)       // $5 = &f
        sub $15, $13, $14    // stall
        sw  $15, 0($6)       // $6 = &d
  • Fast code: No Stalls
        lw  $10, 0($1)
        lw  $11, 0($2)
        lw  $13, 0($4)
        lw  $14, 0($5)
        add $12, $10, $11
        sw  $12, 0($3)
        sub $15, $13, $14
        sw  $15, 0($6)
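
To check a schedule like the ones above, one can count load-use stalls mechanically. The sketch below assumes a pipeline with full forwarding, so the only remaining stall is a load immediately followed by an instruction that reads the loaded register. The parser handles only the few MIPS forms used on this slide, and the two listings are this transcript's reading of the slide's code; all names are illustrative.

```python
import re


def parse(line: str):
    """Return (op, dest, sources) for lw, sw, and three-register ALU ops."""
    op, rest = line.split(None, 1)
    regs = re.findall(r"\$\d+", rest)
    if op == "lw":
        return op, regs[0], [regs[1]]   # lw dest, offset(base)
    if op == "sw":
        return op, None, regs           # sw reads both value and base
    return op, regs[0], regs[1:]        # op dest, src1, src2


def count_load_use_stalls(program) -> int:
    """Count one-cycle stalls caused by a load followed by a dependent instruction."""
    stalls = 0
    prev_op, prev_dest = None, None
    for line in program:
        op, dest, sources = parse(line)
        if prev_op == "lw" and prev_dest != "$0" and prev_dest in sources:
            stalls += 1
        prev_op, prev_dest = op, dest
    return stalls


SLOW = [
    "lw  $10, 0($1)", "lw  $11, 0($2)", "add $12, $10, $11", "sw  $12, 0($3)",
    "lw  $13, 0($4)", "lw  $14, 0($5)", "sub $15, $13, $14", "sw  $15, 0($6)",
]
FAST = [
    "lw  $10, 0($1)", "lw  $11, 0($2)", "lw  $13, 0($4)", "lw  $14, 0($5)",
    "add $12, $10, $11", "sw  $12, 0($3)", "sub $15, $13, $14", "sw  $15, 0($6)",
]

print(count_load_use_stalls(SLOW), count_load_use_stalls(FAST))
```

Hoisting the loads for the second statement above the first add is what separates each load from its consumer by at least one instruction.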

Write After Read – WAR Hazard
  • Instruction J should write its result after it is read by I
  • Called an anti-dependence by compiler writers
        I: sub $4, $1, $3    # r1 is read
        J: add $1, $2, $3    # r1 is written
  • Results from reuse of the name $1
  • Hazard occurs when J writes $1 before I reads it
  • Can't happen in our basic 5-stage pipeline because:
    - All instructions take 5 stages, and
    - Reads are always in stage 2, and
    - Writes are always in stage 5

Write After Write – WAW Hazard
  • Instruction J should write its result after instruction I
  • Called an output-dependence in compiler terminology
        I: sub $1, $4, $3    // $1 is written
        J: add $1, $2, $3    // $1 is written again
  • This hazard also results from the reuse of name $1
  • Hazard occurs when writes occur in the wrong order
  • Can't happen in our basic 5-stage pipeline because:
    - All instructions take 5 stages, and
    - Writes are ordered and always take place in stage 5
  • WAR and WAW hazards can occur in complex pipelines
  • Notice that Read After Read – RAR is NOT a hazard
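
The three dependence types (RAW, WAR, WAW) differ only in which of the two instructions reads and which writes the shared register. A minimal classifier, assuming each instruction is described by one destination register and a list of source registers (an illustrative representation, not one used in the slides), might look like this:

```python
def classify_dependences(i_dest, i_sources, j_dest, j_sources):
    """Dependences of J on an earlier instruction I, as a list of names."""
    kinds = []
    if i_dest is not None and i_dest in j_sources:
        kinds.append("RAW")  # J reads what I writes (data dependence)
    if j_dest is not None and j_dest in i_sources:
        kinds.append("WAR")  # J writes what I reads (anti-dependence)
    if i_dest is not None and i_dest == j_dest:
        kinds.append("WAW")  # both write the same register (output dependence)
    return kinds


# I: sub $4, $1, $3   J: add $1, $2, $3
print(classify_dependences(4, [1, 3], 1, [2, 3]))
```

Two instructions that only share a source register produce an empty list, matching the note that read after read is not a hazard.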

Control Hazards
  • Branch instructions can cause great performance loss
  • Branch instructions need two things:
    - The result of branch: Taken or Not Taken
    - Branch target:
      * PC + 4                   Branch NOT taken
      * PC + 4 + immediate*4     Branch Taken
  • Branch instruction is not detected until the ID stage
    - At which point a new instruction has already been fetched
  • For our original pipeline:
    - Effective address is not calculated until EX stage
    - Branch condition gets set in the EX/MEM register (EX/MEM.zero)
    - 3-cycle branch delay
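
As a worked example of the two next-PC formulas above, the sketch below computes both cases. The function names are illustrative. The sample values (PC = 1000, immediate = 100) are the ones used for beq $1, $3, 100 in the Branch Delay slides, where the branch target appears as 1404. Sign extension of the 16-bit immediate is included because backward branches use negative offsets.

```python
def sign_extend16(imm16: int) -> int:
    """Interpret the low 16 bits of imm16 as a signed two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def next_pc(pc: int, imm16: int, taken: bool) -> int:
    """PC + 4 when not taken, PC + 4 + immediate*4 when taken."""
    fall_through = pc + 4
    if not taken:
        return fall_through
    return fall_through + sign_extend16(imm16) * 4


print(next_pc(1000, 100, taken=False))  # fall-through address
print(next_pc(1000, 100, taken=True))   # branch target address
```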

Branch Delay – CC 1
  • Consider the pipelined execution of: beq $1, $3, 100
  • During the first cycle, beq is fetched in the IF stage
[Figure: pipelined datapath in CC 1; PC = 1000, beq $1, $3, 100 is in the IF stage, and PC + 4 = 1004]

Branch Delay – CC 2
  • During the second cycle, beq is decoded in the ID stage
  • The next_1 instruction is fetched in the IF stage
[Figure: pipelined datapath in CC 2; PC = 1004, next_1 is in IF, and beq $1, $3, 100 is in ID reading $1 and $3 with Imm16 = 100; PC + 4 = 1008]

Branch Delay – CC 3
  • During the third cycle, beq is executed in the EX stage
  • The next_2 instruction is fetched in the IF stage
[Figure: pipelined datapath in CC 3; PC = 1008, next_2 is in IF, next_1 is in ID, and beq $1, $3, 100 is in EX with Beq = 1; PC + 4 = 1012]

Branch Delay – CC 4
  • During the fourth cycle, beq reaches MEM stage
  • The next_3 instruction is fetched in the IF stage
[Figure: pipelined datapath in CC 4; PC = 1012, next_3 is in IF, next_2 in ID, next_1 in EX, and beq $1, $3, 100 in MEM with Zero = 1 and Beq = 1; PCSrc = 1 selects the branch target 1404 from EX/MEM]

Branch Delay – CC 5
  • During the fifth cycle, branch_target instruction is fetched
  • Next_1 thru next_3 should be converted into NOPs
[Figure: pipelined datapath in CC 5; PC = 1404, branch_target is in IF, with next_3, next_2, and next_1 in the ID, EX, and MEM stages]

3-Cycle Branch Delay
  • Next_1 thru Next_3 will be fetched anyway
  • Pipeline should flush Next_1 thru Next_3 if branch is taken
  • Otherwise, they can be executed normally
[Figure: pipeline timing diagram, cc 1 to cc 7, for the sequence below; Next_1 through Next_3 are turned into bubbles before Branch_Target proceeds]
        beq $1, $3, 100
        Next_1           // bubble
        Next_2           // bubble
        Next_3           // bubble
        Branch_Target

Reducing the Delay of Branches
  • Branch delay can be reduced from 3 cycles to just 1 cycle
  • Branch decision is moved from 4th into 2nd pipeline stage
    - Branches can be determined earlier in the ID stage
    - Branch address calculation adder is moved to ID stage
    - A comparator in the ID stage to compare the two fetched registers
      * To determine branch decision, whether the branch is taken or not
  • Only one instruction that follows the branch will be fetched
  • If the branch is taken then only one instruction is flushed
  • We need a control signal IF.Flush to zero the IF/ID register
    - This will convert the fetched instruction into a NOP

Reducing the Delay of Branches
[Figure: pipelined datapath with the branch resolved in the ID stage: a register comparator (=), Shift left 2 and an adder for the branch target, the IF.Flush signal into IF/ID, the hazard detection unit, and the forwarding unit]

Branch Hazard Alternatives
  • Always stall the pipeline until branch direction is known
    - Next instruction is always flushed (turned into a NOP)
  • Predict Branch Not Taken
    - Fetch successor instruction: PC+4 already calculated
    - Almost half of MIPS branches are not taken on average
    - Flush instructions in pipeline only if branch is actually taken
  • Predict Branch Taken
    - Can predict backward branches in loops taken most of time
    - However, branch target address is determined in ID stage
    - Must reduce branch delay from 1 cycle to 0, but how?
  • Delayed Branch
    - Define branch to take place AFTER the following instruction

Delayed Branch
  • Define branch to take place after the next instruction
  • For a 1-cycle branch delay, we have one delay slot
        branch instruction
        branch delay slot – next instruction
        . . .
        branch target – if branch taken
[Figure: pipeline timing for branch instruction (taken), branch delay slot (next instruction), and branch target, each passing through IF, ID, EX, MEM, WB]
  • Compiler/assembler fills the branch delay slot
    - By selecting a useful instruction

Scheduling the Branch Delay Slot
  • From an independent instruction before the branch
  • From a target instruction when branch is predicted taken
  • From fall through when branch is predicted not taken
[Figure: three code fragments labeled From Before, From Target, and From Fall Through, each showing an instruction (add $t2, $t3, $t4 or sub $t4, $t5, $t6) moved into the Delay Slot after beq $s1, $s0]

More on Delayed Branch
  • Scheduling delay slot with
    - Independent instruction is the best choice
      * However, not always possible to find an independent instruction
    - Target instruction is useful when branch is predicted taken
      * Such as in a loop branch
      * May need to duplicate instruction if it can be reached by another path
      * Cancel branch delay instruction if branch is not taken
    - Fall through is useful when branch is predicted not taken
      * Cancel branch delay instruction if branch is taken
  • Disadvantages of delayed branch
    - Branch delay can increase to multiple cycles in deeper pipelines
    - Zero-delay branching + dynamic branch prediction are required

Zero-Delayed Branch
  • How can we achieve zero-delay for a taken branch …
    - If the branch target address is computed in the ID stage?
  • Solution
    - Check the PC to see if the instruction being fetched is a branch
    - Store the branch target address in a table in the IF stage
    - Such a table is called the branch target buffer
    - If branch is predicted taken then
      * Next PC = branch target fetched from target buffer
    - Otherwise, if branch is predicted not taken then
      * Next PC = PC + 4
    - Zero-delay is achieved because Next PC is determined in IF stage

Branch Target and Prediction Buffer
  • The branch target buffer is implemented as a small cache
    - That stores the branch target address of taken branches
  • We also have a branch prediction buffer
    - To store the prediction bits for branch instructions
    - The prediction bits are dynamically determined by the hardware
[Figure: the PC is looked up in the Branch Target Buffer (PC of Branch, Target Address) and the Prediction Buffer; a mux selects either the target address or PC + 4 as the next PC]
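
The lookup described above can be modeled in a few lines. This is a deliberately simplified sketch: it uses Python dictionaries keyed by the full branch PC instead of a small cache indexed by low address bits, keeps a single prediction bit per branch, and the class and method names are hypothetical. It is meant to show how Next PC is chosen in the IF stage, not how real hardware is organized.

```python
class BranchTargetBuffer:
    """Toy branch target buffer plus 1-bit prediction buffer."""

    def __init__(self):
        self.targets = {}        # branch PC -> last known target address
        self.predict_taken = {}  # branch PC -> prediction bit

    def next_pc(self, pc: int) -> int:
        """Choose the next fetch address during IF."""
        if pc in self.targets and self.predict_taken.get(pc, False):
            return self.targets[pc]  # predicted taken: fetch from the target
        return pc + 4                # unknown or predicted not taken

    def update(self, pc: int, taken: bool, target: int) -> None:
        """Record the actual outcome once the branch is resolved."""
        self.predict_taken[pc] = taken
        if taken:
            self.targets[pc] = target


btb = BranchTargetBuffer()
btb.update(1000, taken=True, target=1404)  # beq at PC 1000 was taken
print(btb.next_pc(1000))
```

A PC that has never been seen as a taken branch simply falls through to PC + 4, which is why ordinary instructions need no special handling.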

Dynamic Branch Prediction
  • Prediction of branches at runtime using prediction bits
    - One or few prediction bits are associated with a branch instruction
  • Branch prediction buffer is a small memory
    - Indexed by the lower portion of the address of branch instruction
  • The simplest scheme is to have 1 prediction bit per branch
  • We don't know if the prediction bit is correct or not
  • If correct prediction …
    - Continue normal execution – no wasted cycles
  • If incorrect prediction (misprediction) …
    - Flush the instructions that were incorrectly fetched – wasted cycles
    - Update prediction bit and target address for future use

2-bit Prediction Scheme
  • Prediction is just a hint that is assumed to be correct
  • If incorrect then fetched instructions are flushed
  • 1-bit prediction scheme has a performance shortcoming
    - A loop branch is almost always taken, except for last iteration
    - 1-bit scheme will predict incorrectly twice, rather than once
    - On the first and last loop iterations
  • 2-bit prediction schemes are often used
    - A prediction must be wrong twice before it is changed
    - A loop branch is mispredicted only once on the last iteration
[Figure: state diagram of the 2-bit predictor; recoverable labels are the Predict Taken states and the Taken and Not Taken transitions]
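
The difference between the two schemes shows up as soon as a loop is executed more than once. The simulation below is a sketch under stated assumptions: the 2-bit predictor is modeled as a saturating counter (one common variant, not necessarily the exact state machine drawn on the slide), both predictors start out predicting taken, and the loop branch is taken three times and then falls through, repeated for three runs of the loop. Function names are illustrative.

```python
def mispredictions_1bit(outcomes, predict_taken=True) -> int:
    """1-bit scheme: always predict whatever the branch did last time."""
    misses = 0
    for taken in outcomes:
        if predict_taken != taken:
            misses += 1
        predict_taken = taken
    return misses


def mispredictions_2bit(outcomes, counter=3) -> int:
    """2-bit saturating counter: 0-1 predict not taken, 2-3 predict taken."""
    misses = 0
    for taken in outcomes:
        if (counter >= 2) != taken:
            misses += 1
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
    return misses


# Loop branch: taken, taken, taken, not taken; the whole loop runs three times.
outcomes = [True, True, True, False] * 3
print(mispredictions_1bit(outcomes), mispredictions_2bit(outcomes))
```

With these inputs the 1-bit scheme misses on the last iteration of every run and again on the first iteration of each later run, while the 2-bit counter misses only on the last iteration of each run.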