ECE 232 Hardware Organization and Design Part 13

ECE 232: Hardware Organization and Design
Part 13: Branch prediction (Chapter 4/6)
http://www.ecs.umass.edu/ece232/
Adapted from Computer Organization and Design, Patterson & Hennessy, UCB

Branch Instructions Cause Control Hazards
[Figure: pipeline timing diagram. Instructions in program order (including lw, beq, jr, Inst 3, Inst 4) pass through the stages F, D, EX, M, W, drawn as IM, Reg, ALU, DM, Reg.]
Adapted from Computer Organization and Design, Patterson&Hennessy, UCB, Kundu, UMass Koren

BEQ resolved during the MEM stage
[Figure: pipelined datapath with PC, Instruction Memory, Register File, Sign Extend (16 to 32 bits), Shift left 2, two adders, ALU, and Data Memory, separated by the pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB; the Control unit, Branch signal, and PCSrc select the next PC.]

One Way to “Fix” a Control Hazard
[Figure: pipeline timing diagram. beq is followed by stall cycles before lw and Inst 3 enter the pipeline (IM, Reg, ALU, DM, Reg).]
Fix branch hazard by waiting - introduce stalls

Reducing branch penalty through HW design

Reducing Control Hazards’ Penalties
§ Stalls - hurt performance
§ Deeper pipelines have higher penalties
§ 1. Move decision point as early in the pipeline as possible - reduces number of stalls at the cost of additional hardware
§ 2. Delay decision (requires compiler support) - “Delayed Branch” (the label NEXT marks the branch target):
      beq $1, $2, NEXT
      add $4, $3, $5
      sub $7, $2, $8
  • not effective for deeper pipes, which require more than one delay slot to be filled
§ 3. Predict outcome of branch
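
The cost of stalling can be estimated with the usual pipeline formula, CPI = base CPI + (branch fraction × stall cycles per branch). The sketch below is only an illustration: the 20% branch fraction and the base CPI of 1 are assumed numbers, not taken from the slides, and `effective_cpi` is a hypothetical helper name. Resolving the branch in the MEM stage corresponds to 3 stall cycles in the 5-stage pipeline; moving the decision to ID corresponds to 1.

```python
def effective_cpi(base_cpi, branch_fraction, stall_cycles):
    """CPI when every branch stalls the pipeline for stall_cycles cycles."""
    return base_cpi + branch_fraction * stall_cycles


# Assumed, illustrative workload: 20% of instructions are branches.
for stalls in (1, 2, 3):
    print(f"{stalls} stall cycle(s) per branch: CPI = {effective_cpi(1.0, 0.20, stalls):.2f}")
```

Under these assumptions, each extra stall cycle per branch adds 0.2 to the CPI, which is why deeper pipelines pay more for the same branch.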

Branch Prediction
§ Easiest - static prediction
  • Always taken, always not taken
  • Opcode based
  • Displacement based (forward not taken, backward taken)
  • Compiler directed (branch likely, branch not likely)
§ Dynamic prediction - prediction per branch in program
  • 1 bit predictor - remember last taken/not taken per branch
  • Use a branch-history table (BHT) with 1 bit entry
  • Use part of the PC (low-order bits) to index table - Why?
  • Multiple branches may share the same bit
  • Invert the bit if prediction is wrong
[Figure: BHT indexed by the branch PC, with entries Predictor 0, Predictor 1, ..., Predictor 127]
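
A minimal sketch of the 1-bit BHT described above, using the 128 entries shown in the figure. The class name `OneBitBHT` and the assumption of 4-byte, word-aligned instructions (so the two lowest PC bits are dropped before indexing) are illustrative choices, not something the slides specify.

```python
class OneBitBHT:
    """1-bit branch-history table indexed by low-order PC bits."""

    def __init__(self, entries=128):
        self.entries = entries        # assumed to be a power of two
        self.bits = [0] * entries     # 0 = predict not taken, 1 = predict taken

    def index(self, pc):
        # Drop the 2 byte-offset bits (assumption: word-aligned instructions),
        # then keep only the low-order bits that select a table entry.
        return (pc >> 2) & (self.entries - 1)

    def predict(self, pc):
        return self.bits[self.index(pc)] == 1

    def update(self, pc, taken):
        # Storing the last outcome is the same as inverting the bit
        # whenever the prediction was wrong.
        self.bits[self.index(pc)] = 1 if taken else 0


bht = OneBitBHT()
bht.update(0x1000, True)
# 0x1200 is 128 words past 0x1000, so both branches map to the same entry.
print(bht.index(0x1000), bht.index(0x1200), bht.predict(0x1200))
```

Because only the low-order bits are used, the table stays small, at the price that two different branches can alias to the same bit, as the last line shows.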

Branch Prediction
§ 1 bit predictor
  • Backward branches for loops will be mispredicted twice
  • EX: If a loop branch is taken 9 times in a row and then not taken once, what is the prediction accuracy?
    Misprediction at the first and last loop iteration => 80% prediction accuracy, although the branch is taken 90% of the time
    [Figure: taken/not-taken sequence ... T T T T N   N T T ... T]
§ Modern processors - multiple instructions issued per cycle, more branch hazards will occur per cycle
  • Cost of a branch misprediction goes up
  • Pentium II - 3 instructions issued per cycle, 12+ cycle misprediction penalty
  • Huge penalty for a misfetched path following a branch
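
The 80% figure can be checked with a short simulation. This is a sketch under stated assumptions: the loop is run 100 times (an arbitrary choice), the predictor starts out predicting "not taken", and `one_bit_accuracy` is a hypothetical helper name.

```python
def one_bit_accuracy(outcomes, initial=False):
    """Fraction of correct predictions for a 1-bit (last-outcome) predictor."""
    last = initial                  # the single history bit
    correct = 0
    for taken in outcomes:
        if last == taken:           # prediction is simply the previous outcome
            correct += 1
        last = taken
    return correct / len(outcomes)


loop = [True] * 9 + [False]         # taken 9 times, then not taken once
trace = loop * 100                  # assumed: the loop is entered 100 times
print(one_bit_accuracy(trace))      # prints 0.8
```

Each pass through the loop mispredicts the first iteration (the bit still says "not taken" from the previous exit) and the last one (the bit says "taken"), giving 8 correct predictions out of 10.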

2-bit Branch Prediction
§ 4 states instead of 2, allowing for more information about tendencies
§ A prediction must miss twice before it is changed
§ Good for backward branches of loops
[Figure: state diagram of the 2-bit saturating counter, with two "Predict taken" states and two "Predict not taken" states connected by T and N transitions; taken/not-taken sequence ... T T T T N T T ... T]
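
A sketch of a 2-bit saturating counter, run on the same loop pattern as the 1-bit example (taken 9 times, then not taken once). The state encoding (0 and 1 predict not taken, 2 and 3 predict taken), the initial state, and the name `two_bit_accuracy` are assumptions made for illustration.

```python
def two_bit_accuracy(outcomes, counter=3):
    """Fraction of correct predictions for one 2-bit saturating counter.

    counter is 0..3; values 2 and 3 predict taken, 0 and 1 predict not taken.
    """
    correct = 0
    for taken in outcomes:
        if (counter >= 2) == taken:
            correct += 1
        # Count up on taken, down on not taken, saturating at 0 and 3.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return correct / len(outcomes)


loop = [True] * 9 + [False]
print(two_bit_accuracy(loop * 100))   # prints 0.9
```

The single not-taken outcome at loop exit only moves the counter from 3 to 2, so the first iteration of the next pass is still predicted taken. Only the loop exit is mispredicted, which raises the accuracy from 80% to 90% on this pattern.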

Branch History Table - BHT
§ 2 bits by N (e.g. 4K entries)
§ Uses low-order bits of branch PC to choose entry
§ Plot misprediction instead of prediction
[Figure: BHT indexed by the branch PC, with entries Predictor 0, Predictor 1, ..., Predictor 4095; example entry contents 01, 01]
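
For the 4K-entry example, the table geometry works out as follows. The helper names `bht_geometry` and `bht_index` are hypothetical, and dropping two byte-offset bits again assumes 4-byte, word-aligned instructions.

```python
def bht_geometry(entries, bits_per_entry=2):
    """Return (index bits, total storage in bits) for a power-of-two table."""
    index_bits = entries.bit_length() - 1      # log2(entries) for a power of two
    return index_bits, entries * bits_per_entry


def bht_index(pc, entries=4096):
    # Assumption: skip the 2 byte-offset bits, then take the low-order bits.
    return (pc >> 2) & (entries - 1)


index_bits, total_bits = bht_geometry(4096)
print(index_bits, total_bits, total_bits // 8)   # prints 12 8192 1024
print(bht_index(0x00400010))                     # prints 4
```

So a 4K-entry table of 2-bit counters needs 12 index bits and 8192 bits (1 KB) of storage.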

Is Branch Predictor Enough?
§ When is using branch prediction beneficial?
  • Clearly when the outcome is known later than the target
  • Otherwise - If we predict the branch is taken (and suppose it is correct), what is the target address?
  • Need a mechanism to provide target address as well
  • Use a Branch Target Buffer (BTB) that includes the target address
§ Can we eliminate the one cycle delay for the 5-stage pipeline?
  • Need to fetch from branch target immediately after branch was fetched

Branch Target Buffer (BTB)
BTB is a cache that contains the predicted PC value instead of whether the branch will take place or not (Ex. Loop address)
§ Is the current instruction a branch?
  • BTB provides the answer before the current instruction is decoded and therefore enables fetching to begin after IF-stage (for branch)
§ What is the branch target?
  • BTB provides the branch target if the prediction is a taken branch (for not taken branches the target is simply PC+4)
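
The lookup described above can be sketched as a small cache keyed by the branch PC. This is a deliberate simplification: a Python dict stands in for what would be a fixed-size, tagged hardware table, and the class and method names are hypothetical.

```python
class BranchTargetBuffer:
    """Toy BTB: maps the PC of a branch to its predicted target PC."""

    def __init__(self):
        self.targets = {}            # branch PC -> predicted target PC

    def next_pc(self, pc):
        # A hit answers both questions during IF: the instruction is a branch
        # predicted taken, and this is its target. A miss falls through to PC+4.
        return self.targets.get(pc, pc + 4)

    def record_taken(self, pc, target):
        self.targets[pc] = target

    def remove(self, pc):
        self.targets.pop(pc, None)


btb = BranchTargetBuffer()
print(hex(btb.next_pc(0x40)))        # miss: prints 0x44
btb.record_taken(0x40, 0x20)         # e.g. a backward loop branch
print(hex(btb.next_pc(0x40)))        # hit: prints 0x20
```

Because the lookup uses only the PC, the predicted next PC is available before the instruction has been decoded.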

BTB

BTB operations
[Flowchart]
IF: Send PC to memory and branch-target buffer. Entry found in branch-target buffer?
  • No → ID: Is instruction a taken branch?
      - No: Normal instruction execution
      - Yes → EX: Enter branch instruction address and next PC into branch-target buffer
  • Yes → ID: Send out predicted PC. Taken branch?
      - No → EX: Mispredicted branch, kill fetched instruction; restart fetch at other target; update target buffer
      - Yes → EX: Branch correctly predicted; continue execution with no stalls
§ BTB hit, prediction taken → 0 cycle delay
§ BTB hit, misprediction ≥ 2 cycle penalty - Correct BTB
§ BTB miss, branch ≥ 1 cycle penalty (Detected at the ID stage and entered in BTB)
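
The flowchart and the three penalty cases can be sketched as one function that handles a single executed branch. Several details are assumptions: the BTB is modeled as a dict, the penalties use the minimum values listed above (0, 2, and 1 cycles), "update target buffer" is interpreted as deleting the entry when the branch falls through, and `btb_step` is a hypothetical name.

```python
def btb_step(btb, pc, taken, target):
    """Process one branch; update btb (dict: PC -> target) and return the penalty in cycles."""
    if pc in btb:
        if taken and btb[pc] == target:
            return 0                 # BTB hit, correctly predicted taken
        # BTB hit but mispredicted: correct the buffer (assumed policy).
        if taken:
            btb[pc] = target
        else:
            del btb[pc]
        return 2
    if taken:
        btb[pc] = target             # BTB miss on a taken branch: enter it
        return 1
    return 0                         # BTB miss, not taken: normal execution


btb = {}
loop = [True] * 9 + [False]          # loop branch: taken 9 times, then falls through
print(sum(btb_step(btb, 0x40, t, 0x20) for t in loop))   # prints 3
```

For this loop, the first iteration pays 1 cycle (miss on a taken branch), the next eight pay nothing, and the exit pays 2 cycles, for 3 penalty cycles across 10 branch executions.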

Branch Prediction Summary
§ The better we predict, the lower penalty we might incur
§ 2-bit predictors capture tendencies well
§ Correlating predictors improve accuracy, particularly when combined with 2-bit predictors
§ Accurate branch prediction does no good if we don’t know there was a branch to predict
§ BTB identifies branches in IF stage
§ BTB combined with branch prediction table identifies branches to predict, and predicts them well