CIS 501 Computer Organization and Design Unit 7
![CIS 501 Computer Organization and Design, Unit 7: Branch Prediction](https://slidetodoc.com/presentation_image_h/312fdd52f69c6fa305e2e62590d8d45a/image-1.jpg)
CIS 501 Computer Organization and Design
Unit 7: Branch Prediction
Based on slides by Profs. Amir Roth, Milo Martin & C. J. Taylor
CIS 501: Comp. Arch. | Dr. Joe Devietti | Branch Prediction
![This Unit: Branch Prediction](https://slidetodoc.com/presentation_image_h/312fdd52f69c6fa305e2e62590d8d45a/image-2.jpg)
This Unit: Branch Prediction

- Control hazards
- Branch prediction
![Readings](https://slidetodoc.com/presentation_image_h/312fdd52f69c6fa305e2e62590d8d45a/image-3.jpg)
Readings

- P&H, Chapter 4
![Control Dependences and Branch Prediction](https://slidetodoc.com/presentation_image_h/312fdd52f69c6fa305e2e62590d8d45a/image-4.jpg)
Control Dependences and Branch Prediction
![What About Branches?](https://slidetodoc.com/presentation_image_h/312fdd52f69c6fa305e2e62590d8d45a/image-5.jpg)
What About Branches?

- Branch speculation
  - Could just stall to wait for branch outcome (two-cycle penalty)
  - Fetch past branch insns before branch outcome is known
    - Default: assume "not-taken" (at fetch, can't tell it's a branch)
![Big Idea: Speculative Execution](https://slidetodoc.com/presentation_image_h/312fdd52f69c6fa305e2e62590d8d45a/image-6.jpg)
Big Idea: Speculative Execution

- Speculation: "risky transactions on chance of profit"
- Speculative execution
  - Execute before all parameters known with certainty
  - Correct speculation
    - (+) Avoid stall, improve performance
  - Incorrect speculation (mis-speculation)
    - (–) Must abort/flush/squash incorrect insns
    - (–) Must undo incorrect changes (recover pre-speculation state)
- Control speculation: speculation aimed at control hazards
  - Unknown parameter: are these the correct insns to execute next?
![Control Speculation Mechanics](https://slidetodoc.com/presentation_image_h/312fdd52f69c6fa305e2e62590d8d45a/image-7.jpg)
Control Speculation Mechanics

- Guess branch target, start fetching at guessed position
  - Doing nothing is implicitly guessing target is PC+4
    - We were already speculating before!
  - Can actively guess other targets: dynamic branch prediction
- Execute branch to verify (check) guess
  - Correct speculation? Keep going
  - Mis-speculation? Flush mis-speculated insns
    - Hopefully haven't modified permanent state (Regfile, DMem)
    - (+) Happens naturally in an in-order 5-stage pipeline
![When to Perform Branch Prediction?](https://slidetodoc.com/presentation_image_h/312fdd52f69c6fa305e2e62590d8d45a/image-8.jpg)
When to Perform Branch Prediction?

- Option #1: During Decode
  - Look at instruction opcode to determine branch instructions
  - Can calculate next PC from instruction (for PC-relative branches)
  - (–) One-cycle "mis-fetch" penalty even if branch predictor is correct

| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| `bnez r3,targ` | F | D | X | M | W | | | | |
| `targ: add r4←r5,r4` | | | F | D | X | M | W | | |

- Option #2: During Fetch?
  - How do we do that?
![Branch Recovery](https://slidetodoc.com/presentation_image_h/312fdd52f69c6fa305e2e62590d8d45a/image-9.jpg)
Branch Recovery

- Branch recovery: what to do when branch is actually taken
  - Insns that are in F and D are wrong
  - Flush them, i.e., replace them with nops
  - (+) They haven't written permanent state yet (regfile, DMem)
  - (–) Two-cycle penalty for taken branches
![Branch Speculation and Recovery](https://slidetodoc.com/presentation_image_h/312fdd52f69c6fa305e2e62590d8d45a/image-10.jpg)
Branch Speculation and Recovery

Correct (branch not taken; `st` and `mul` are speculative):

| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| `addi r3←r1,1` | F | D | X | M | W | | | | |
| `bnez r3,targ` | | F | D | X | M | W | | | |
| `st r6→[r7+4]` | | | F | D | X | M | W | | |
| `mul r10←r8,r9` | | | | F | D | X | M | W | |

- Mis-speculation recovery: what to do on wrong guess
  - Not too painful in a short, in-order pipeline
  - Branch resolves in X
  - (+) Younger insns (in F, D) haven't changed permanent state
  - Flush insns currently in D and X (i.e., replace with nops)

Recovery (branch taken):

| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| `addi r3←r1,1` | F | D | X | M | W | | | | |
| `bnez r3,targ` | | F | D | X | M | W | | | |
| `st r6→[r7+4]` | | | F | D | -- | -- | -- | | |
| `mul r10←r8,r9` | | | | F | -- | -- | -- | -- | |
| `targ: add r4←r4,r5` | | | | | F | D | X | M | W |
![Branch Performance](https://slidetodoc.com/presentation_image_h/312fdd52f69c6fa305e2e62590d8d45a/image-11.jpg)
Branch Performance

- Back of the envelope calculation
  - Branch: 20%, load: 20%, store: 10%, other: 50%
  - Say, 75% of branches are taken
- CPI = 1 + 20% × 75% × 2 = 1 + 0.20 × 0.75 × 2 = 1.3
  - Branches cause 30% slowdown
  - Worse with deeper pipelines (higher mis-prediction penalty)
- Can we do better than assuming branch is not taken?
![Dynamic Branch Prediction](https://slidetodoc.com/presentation_image_h/312fdd52f69c6fa305e2e62590d8d45a/image-12.jpg)
Dynamic Branch Prediction

- Dynamic branch prediction: hardware guesses outcome
  - Start fetching from guessed address
  - Flush on mis-prediction
![Branch Prediction Performance](https://slidetodoc.com/presentation_image_h/312fdd52f69c6fa305e2e62590d8d45a/image-13.jpg)
Branch Prediction Performance

- Parameters
  - Branch: 20%, load: 20%, store: 10%, other: 50%
  - 75% of branches are taken
- Dynamic branch prediction
  - Branches predicted with 95% accuracy
  - Only the 5% of branches that are mis-predicted pay the 2-cycle penalty
  - CPI = 1 + 20% × 5% × 2 = 1.02
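The CPI arithmetic on the last two performance slides follows one simple model: a base CPI of 1 plus (branch fraction) × (fraction of branches that pay the penalty) × (penalty in cycles). A minimal Python sketch of that model, assuming branch penalties are the only source of stalls; the function name `cpi` is hypothetical:

```python
def cpi(branch_frac: float, penalized_frac: float, penalty_cycles: int) -> float:
    """Base CPI of 1 plus the average branch penalty per instruction.

    branch_frac    -- fraction of all insns that are branches
    penalized_frac -- fraction of those branches that pay the penalty
    penalty_cycles -- cycles lost per penalized branch
    """
    return 1 + branch_frac * penalized_frac * penalty_cycles

# Always predict not-taken: every taken branch (75%) pays the 2-cycle penalty.
print(round(cpi(0.20, 0.75, 2), 4))  # 1.3
# Dynamic prediction at 95% accuracy: only the 5% mis-predictions pay.
print(round(cpi(0.20, 0.05, 2), 4))  # 1.02
```

Only one parameter changes between the two cases: which branches pay the penalty (all taken branches under a not-taken default, only the mis-predicted ones under dynamic prediction).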
![Dynamic Branch Prediction Components](https://slidetodoc.com/presentation_image_h/312fdd52f69c6fa305e2e62590d8d45a/image-14.jpg)
Dynamic Branch Prediction Components

- Step #1: is it a branch?
  - Easy after decode...
- Step #2: is the branch taken or not taken?
  - Direction predictor (applies to conditional branches only)
  - Predicts taken/not-taken
- Step #3: if the branch is taken, where does it go?
  - Easy after decode...
![Branch Prediction Steps](https://slidetodoc.com/presentation_image_h/312fdd52f69c6fa305e2e62590d8d45a/image-15.jpg)
Branch Prediction Steps

- Is insn a branch?
  - No: next PC = PC+4
  - Yes: taken (T) or not taken (NT)?
    - Not taken: next PC = PC+4
    - Taken: next PC = predicted target
- Which insn's behavior are we trying to predict?
- Where does PC come from?
- Prediction sources: branch target buffer, direction predictor
![Branch Target Prediction](https://slidetodoc.com/presentation_image_h/312fdd52f69c6fa305e2e62590d8d45a/image-16.jpg)
BRANCH TARGET PREDICTION
![Revisiting Branch Prediction Components](https://slidetodoc.com/presentation_image_h/312fdd52f69c6fa305e2e62590d8d45a/image-17.jpg)
Revisiting Branch Prediction Components

- Step #1: is it a branch?
  - Easy after decode...; during fetch: predictor
- Step #2: is the branch taken or not taken?
  - Direction predictor (later)
- Step #3: if the branch is taken, where does it go?
  - Branch target predictor (BTB)
  - Supplies target PC if branch is taken
![Branch Target Buffer](https://slidetodoc.com/presentation_image_h/312fdd52f69c6fa305e2e62590d8d45a/image-18.jpg)
Branch Target Buffer

- Learn from past, predict the future
  - Record the past in a hardware structure
- Branch target buffer (BTB):
  - "guess" the future PC based on past behavior
  - "Last time the branch X was taken, it went to address Y"
    - "So, in the future, if address X is fetched, fetch address Y next"
- PC indexes a table of target addresses
  - Index with PC bits [9:2]; bits [1:0] and [31:10] are not part of the index
  - Essentially: branch will go to same place it went last time
- What about aliasing?
  - Two PCs with the same lower bits?
  - No problem, just a prediction!
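The PC[9:2] split on this slide can be sketched as a small direct-mapped table. This is an illustrative Python model, not a hardware description; it assumes 4-byte instructions (so PC[1:0] is always 0) and a 256-entry table (8 index bits, matching PC[9:2]). The names `btb_index` and `UntaggedBTB` are hypothetical. With no tags, two PCs that differ only above bit 9 share an entry, which is the aliasing the slide accepts as "just a prediction":

```python
from typing import List, Optional

INDEX_BITS = 8  # assumed: PC[9:2] -> 256 entries

def btb_index(pc: int) -> int:
    # Drop PC[1:0] (always 0 for 4-byte insns) and keep the next 8 bits, PC[9:2].
    return (pc >> 2) & ((1 << INDEX_BITS) - 1)

class UntaggedBTB:
    def __init__(self) -> None:
        self.targets: List[Optional[int]] = [None] * (1 << INDEX_BITS)

    def predict(self, pc: int) -> Optional[int]:
        # Returns whatever target was last recorded in this slot, by any branch.
        return self.targets[btb_index(pc)]

    def update(self, pc: int, target: int) -> None:
        self.targets[btb_index(pc)] = target

btb = UntaggedBTB()
btb.update(0x1000, 0x2000)
print(btb_index(0x1000), btb_index(0x1400))  # 0 0: they differ only in PC[31:10]
print(hex(btb.predict(0x1400)))              # 0x2000: aliased prediction
```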
![Branch Target Buffer (continued)](https://slidetodoc.com/presentation_image_h/312fdd52f69c6fa305e2e62590d8d45a/image-19.jpg)
Branch Target Buffer (continued)

- At Fetch, how does insn know it's a branch & should read BTB? It doesn't have to...
  - ...all insns access BTB in parallel with Imem Fetch
- Key idea: use BTB to predict which insns are branches
  - Implement by "tagging" each entry with its corresponding PC
- Update BTB on every taken branch insn, record target PC:
  - BTB[PC].tag = PC, BTB[PC].target = target of branch
- All insns access at Fetch in parallel with Imem
  - Check for tag match, signifies insn at that PC is a branch
  - Predicted PC = (BTB[PC].tag == PC) ? BTB[PC].target : PC+4
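The tagged-BTB rules on this slide (update `BTB[PC].tag = PC` and `BTB[PC].target` on each taken branch; predict the stored target on a tag match, else PC+4) translate almost directly into code. A minimal Python sketch, assuming the PC[9:2] indexing from the previous slide and storing the full PC as the tag exactly as the slide's pseudocode does (a hardware tag would typically keep only the PC bits not already used for the index). Class and method names are hypothetical:

```python
from dataclasses import dataclass
from typing import List, Optional

INDEX_BITS = 8  # assumed: PC[9:2] selects one of 256 entries

@dataclass
class BTBEntry:
    tag: Optional[int] = None  # PC of the branch that owns this entry
    target: int = 0            # where it went the last time it was taken

class TaggedBTB:
    def __init__(self) -> None:
        self.entries: List[BTBEntry] = [BTBEntry() for _ in range(1 << INDEX_BITS)]

    def _entry(self, pc: int) -> BTBEntry:
        return self.entries[(pc >> 2) & ((1 << INDEX_BITS) - 1)]

    def predict(self, pc: int) -> int:
        # Every insn looks up the BTB at fetch; a tag match means "this is a branch".
        e = self._entry(pc)
        return e.target if e.tag == pc else pc + 4

    def update(self, pc: int, target: int) -> None:
        # On every taken branch: BTB[PC].tag = PC, BTB[PC].target = target
        e = self._entry(pc)
        e.tag, e.target = pc, target

btb = TaggedBTB()
btb.update(0x1000, 0x2000)
print(hex(btb.predict(0x1000)))  # 0x2000: tag hit
print(hex(btb.predict(0x1400)))  # 0x1404: same entry, tag mismatch -> PC+4
```

The tag turns an aliasing insn into a plain PC+4 prediction instead of a jump to another branch's target.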
![Why Does a BTB Work?](https://slidetodoc.com/presentation_image_h/312fdd52f69c6fa305e2e62590d8d45a/image-20.jpg)
Why Does a BTB Work?

- Because most control insns use direct targets
  - Target encoded in insn itself, so the same "taken" target every time
- What about indirect targets?
  - Target held in a register, so it can be different each time
  - Two indirect call idioms
    - (+) Dynamically linked functions (DLLs): target always the same
    - Dynamically dispatched (virtual) functions: hard but uncommon
  - Also two indirect unconditional jump idioms
    - Switches: hard but uncommon
    - (–) Function returns: hard and common but...
![Return Address Stack (RAS)](https://slidetodoc.com/presentation_image_h/312fdd52f69c6fa305e2e62590d8d45a/image-21.jpg)
Return Address Stack (RAS)

- Return address stack (RAS)
  - Call instruction? RAS[TopOfStack++] = PC+4
  - Return instruction? Predicted-target = RAS[--TopOfStack]
  - Q: how can you tell if an insn is a call/return before decoding it?
    - Accessing RAS on every insn BTB-style doesn't work
  - Answer: another predictor (or put them in BTB marked as "return")
    - Or, pre-decode bits in insn mem, written when first executed
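The two RAS rules (`RAS[TopOfStack++] = PC+4` on a call, `RAS[--TopOfStack]` on a return) map onto a small stack. This Python sketch assumes a 16-entry stack that overwrites its oldest entry on overflow; the size and the overflow policy are assumptions rather than something the slide specifies, and the call/return detection question the slide raises is left out. Names are hypothetical:

```python
from typing import List

class ReturnAddressStack:
    """Circular RAS: on overflow the oldest entry is overwritten (assumed policy)."""

    def __init__(self, size: int = 16) -> None:
        self.slots: List[int] = [0] * size
        self.top = 0  # TopOfStack counter

    def on_call(self, pc: int) -> None:
        # RAS[TopOfStack++] = PC+4: remember where the callee should return to
        self.slots[self.top % len(self.slots)] = pc + 4
        self.top += 1

    def on_return(self) -> int:
        # Predicted target = RAS[--TopOfStack]
        self.top -= 1
        return self.slots[self.top % len(self.slots)]

ras = ReturnAddressStack()
ras.on_call(0x1000)  # main calls f
ras.on_call(0x2000)  # f calls g
print(hex(ras.on_return()), hex(ras.on_return()))  # 0x2004 0x1004
```

Returns come back in last-in, first-out order, which is why a stack predicts them well even though a single BTB entry per return insn would not.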
![Branch Target Prediction](https://slidetodoc.com/presentation_image_h/312fdd52f69c6fa305e2e62590d8d45a/image-22.jpg)
BRANCH TARGET PREDICTION
![Branch Direction Prediction](https://slidetodoc.com/presentation_image_h/312fdd52f69c6fa305e2e62590d8d45a/image-23.jpg)
Branch Direction Prediction

- Learn from past, predict the future
  - Record the past in a hardware structure
- Direction predictor (DIRP)
  - Map conditional-branch PC to taken/not-taken (T/N) decision
  - Individual conditional branches often biased or weakly biased
    - 90%+ one way or the other considered "biased"
    - Why? Loop back edges, checking for uncommon conditions
- Bimodal predictor: simplest predictor
  - PC indexes Branch History Table of bits (0 = N, 1 = T), no tags
  - Index with PC bits [9:2]; the selected BHT entry is the T or NT prediction
  - Essentially: branch will go same way it went last time
- What about aliasing?
  - Two PCs with the same lower bits?
  - No problem, just a prediction!
![Bimodal Branch Predictor](https://slidetodoc.com/presentation_image_h/312fdd52f69c6fa305e2e62590d8d45a/image-24.jpg)
Bimodal Branch Predictor

- Simplest direction predictor
- PC indexes table of bits (0 = N, 1 = T), no tags
- Essentially: branch will go same way it went last time
- Problem: inner loop branch below

```c
for (int i = 0; i < 100; i++)
    for (int j = 0; j < 3; j++) {
        /* whatever */
    }
```

- (–) Two "built-in" mis-predictions per execution of the inner loop
- (–) Branch predictor "changes its mind too quickly"

| Time | State | Prediction | Outcome | Result? |
|---|---|---|---|---|
| 1 | N | N | T | Wrong |
| 2 | T | T | T | Correct |
| 3 | T | T | T | Correct |
| 4 | T | T | N | Wrong |
| 5 | N | N | T | Wrong |
| 6 | T | T | T | Correct |
| 7 | T | T | T | Correct |
| 8 | T | T | N | Wrong |
| 9 | N | N | T | Wrong |
| 10 | T | T | T | Correct |
| 11 | T | T | T | Correct |
| 12 | T | T | N | Wrong |
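The bimodal trace on this slide can be regenerated in a few lines, which makes "changes its mind too quickly" concrete. This Python sketch models a single 1-bit BHT entry (no aliasing) fed the inner-loop pattern from the table: taken three times, then not taken. The function name is hypothetical:

```python
from typing import List, Tuple

def simulate_one_bit(outcomes: str, initial: str = "N") -> List[Tuple[str, str, str, bool]]:
    """Trace one 1-bit BHT entry: rows of (state, prediction, outcome, correct?)."""
    state = initial
    trace = []
    for outcome in outcomes:
        prediction = state  # predict whatever happened last time
        trace.append((state, prediction, outcome, prediction == outcome))
        state = outcome     # then remember only the latest outcome
    return trace

trace = simulate_one_bit("TTTN" * 3)  # taken 3 times, then the loop exits
print([t for t, row in enumerate(trace, start=1) if not row[3]])  # [1, 4, 5, 8, 9, 12]
```

Each exit from the inner loop costs two mispredictions: one on the not-taken exit itself, and one on the first branch of the next execution, because the single bit was flipped to N.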
![Two-Bit Saturating Counters (2bc)](https://slidetodoc.com/presentation_image_h/312fdd52f69c6fa305e2e62590d8d45a/image-25.jpg)
Two-Bit Saturating Counters (2bc)

- Two-bit saturating counters (2bc) [Smith 1981]
  - Replace each single-bit prediction
    - (0, 1, 2, 3) = (N, n, t, T)
  - Adds "hysteresis"
    - Force predictor to mis-predict twice before "changing its mind"
  - One mispredict each loop execution (rather than two)
    - (+) Fixes this pathology (which is not contrived, by the way)
- Can we do even better?

| Time | State | Prediction | Outcome | Result? |
|---|---|---|---|---|
| 1 | N | N | T | Wrong |
| 2 | n | N | T | Wrong |
| 3 | t | T | T | Correct |
| 4 | T | T | N | Wrong |
| 5 | t | T | T | Correct |
| 6 | T | T | T | Correct |
| 7 | T | T | T | Correct |
| 8 | T | T | N | Wrong |
| 9 | t | T | T | Correct |
| 10 | T | T | T | Correct |
| 11 | T | T | T | Correct |
| 12 | T | T | N | Wrong |
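Replacing the single bit with a two-bit saturating counter changes only the prediction rule and the update. This illustrative Python sketch uses the slide's encoding, (0, 1, 2, 3) = (N, n, t, T), and assumes the counter starts at 0 (strongly not-taken) as in the table; the names are hypothetical:

```python
from typing import List, Tuple

STATE_NAMES = "NntT"  # 0 = N, 1 = n, 2 = t, 3 = T

def simulate_two_bit(outcomes: str, initial: int = 0) -> List[Tuple[str, str, str, bool]]:
    """Trace one 2-bit counter: rows of (state, prediction, outcome, correct?)."""
    counter = initial
    trace = []
    for outcome in outcomes:
        prediction = "T" if counter >= 2 else "N"  # upper half predicts taken
        trace.append((STATE_NAMES[counter], prediction, outcome, prediction == outcome))
        if outcome == "T":
            counter = min(counter + 1, 3)  # saturate at 3
        else:
            counter = max(counter - 1, 0)  # saturate at 0
    return trace

trace = simulate_two_bit("TTTN" * 3)
print("".join(row[0] for row in trace))                           # NntTtTTTtTTT
print([t for t, row in enumerate(trace, start=1) if not row[3]])  # [1, 2, 4, 8, 12]
```

After warm-up the only misprediction is the loop exit: it knocks the counter from T down to t, which still predicts taken, so the next execution of the loop starts correctly.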
![Branches may be correlated](https://slidetodoc.com/presentation_image_h/312fdd52f69c6fa305e2e62590d8d45a/image-26.jpg)
Branches may be correlated

- Consider:

```c
#include <stdlib.h>  /* random() */

void example(void) {
    for (int i = 0; i < 1000000; i++) {  // Highly biased
        if (i % 3 == 0) {                 // Locally correlated
            /* … */
        }
        if (random() % 2 == 0) {          // Unpredictable
            /* … */
        }
        if (i % 3 == 0) {                 // Globally correlated
            /* … */
        }
    }
}
```
![Gshare History-Based Predictor](https://slidetodoc.com/presentation_image_h/312fdd52f69c6fa305e2e62590d8d45a/image-27.jpg)
Gshare History-Based Predictor

- Exploits observation that branch outcomes are correlated
- Maintains recent branch outcomes in Branch History Register (BHR)
  - In addition to BHT of counters (typically 2-bit sat. counters)
- How do we incorporate history into our predictions?
  - Use PC xor BHR to index into BHT. Why?
  - The indexed BHT entry gives the direction prediction (T/NT)
![Gshare History-based Predictor](https://slidetodoc.com/presentation_image_h/312fdd52f69c6fa305e2e62590d8d45a/image-28.jpg)
Gshare History-based Predictor

- Gshare working example
  - Assume program has one branch
  - BHT: 1-bit DIRP entries; the State column shows the entry selected by the current BHR
  - 3-bit BHR: last 3 branch outcomes
  - Train counter, and update BHR after each branch

| Time | State | BHR | Prediction | Outcome | Result? |
|---|---|---|---|---|---|
| 1 | N | NNN | N | T | wrong |
| 2 | N | NNT | N | T | wrong |
| 3 | N | NTT | N | T | wrong |
| 4 | N | TTT | N | N | correct |
| 5 | N | TTN | N | T | wrong |
| 6 | N | TNT | N | T | wrong |
| 7 | T | NTT | T | T | correct |
| 8 | N | TTT | N | N | correct |
| 9 | T | TTN | T | T | correct |
| 10 | T | TNT | T | T | correct |
| 11 | T | NTT | T | T | correct |
| 12 | N | TTT | N | N | correct |
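The gshare working example can be replayed in Python. This sketch assumes, as the slide does, a single branch, 1-bit BHT entries, and a 3-bit BHR updated after each branch; it also assumes an 8-entry BHT indexed by (PC >> 2) xor BHR. With only one branch, XORing in a fixed PC only permutes which entry each history uses, so the trace does not depend on the PC chosen. Names are hypothetical:

```python
from typing import List, Tuple

HISTORY_BITS = 3
BHT_SIZE = 1 << HISTORY_BITS  # assumed 8-entry BHT, one entry per 3-bit history

def gshare_trace(pc: int, outcomes: str) -> List[Tuple[str, str, str, str, bool]]:
    """Rows of (state, BHR, prediction, outcome, correct?) for 1-bit gshare."""
    bht = ["N"] * BHT_SIZE  # 1-bit entries, all start not-taken
    bhr = 0                 # 1 = taken; bit 0 holds the newest outcome
    rows = []
    for outcome in outcomes:
        index = ((pc >> 2) ^ bhr) % BHT_SIZE  # PC xor BHR selects the entry
        history = "".join("T" if (bhr >> b) & 1 else "N"
                          for b in reversed(range(HISTORY_BITS)))  # oldest first
        prediction = bht[index]
        rows.append((bht[index], history, prediction, outcome, prediction == outcome))
        bht[index] = outcome                                 # train the entry
        bhr = ((bhr << 1) | int(outcome == "T")) % BHT_SIZE  # then update the BHR
    return rows

for t, row in enumerate(gshare_trace(0x400, "TTTN" * 3), start=1):
    print(t, *row)
```

Once each of the four steady-state histories (TTN, TNT, NTT, TTT) has trained its own entry, the history alone tells the predictor where the branch is in the loop, so every later prediction is correct, including the loop exit that the bimodal and two-bit predictors always miss.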
![Hybrid Predictor](https://slidetodoc.com/presentation_image_h/312fdd52f69c6fa305e2e62590d8d45a/image-29.jpg)
Hybrid Predictor

- Hybrid (tournament) predictor [McFarling 1993]
  - Attacks correlated predictor BHT capacity problem
  - Idea: combine two predictors
    - Simple bimodal predictor for history-independent branches
    - Correlated predictor for branches that need history
    - Chooser assigns branches to one predictor or the other
    - Branches start in the simple BHT and move to the correlated predictor once they cross a mis-prediction threshold
  - (+) Correlated predictor can be made smaller, handles fewer branches
  - (+) 90–95% accuracy
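One way to sketch the hybrid idea in Python: a PC-indexed bimodal table and a gshare table, both of 2-bit counters, plus a PC-indexed chooser of 2-bit counters that starts out favoring the bimodal side and only moves toward gshare when the two disagree and gshare was right. The table sizes, the 10-bit history, the initial counter values, and the disagree-only chooser training are assumptions for illustration, not a description of McFarling's exact design; all names are hypothetical:

```python
from typing import List, Tuple

def bump(counter: int, up: bool) -> int:
    """Update a 2-bit saturating counter (range 0..3)."""
    return min(counter + 1, 3) if up else max(counter - 1, 0)

class TournamentPredictor:
    def __init__(self, index_bits: int = 10, history_bits: int = 10) -> None:
        size = 1 << index_bits
        self.mask = size - 1
        self.history_mask = (1 << history_bits) - 1
        self.bimodal: List[int] = [1] * size  # PC-indexed, start weakly not-taken
        self.gshare: List[int] = [1] * size   # (PC xor BHR)-indexed
        self.chooser: List[int] = [0] * size  # PC-indexed; 0-1 = bimodal, 2-3 = gshare
        self.bhr = 0

    def _indices(self, pc: int) -> Tuple[int, int]:
        word = pc >> 2
        return word & self.mask, (word ^ self.bhr) & self.mask

    def predict(self, pc: int) -> bool:
        i, g = self._indices(pc)
        counter = self.gshare[g] if self.chooser[i] >= 2 else self.bimodal[i]
        return counter >= 2

    def update(self, pc: int, taken: bool) -> None:
        i, g = self._indices(pc)
        bimodal_ok = (self.bimodal[i] >= 2) == taken
        gshare_ok = (self.gshare[g] >= 2) == taken
        if bimodal_ok != gshare_ok:  # train the chooser only when they disagree
            self.chooser[i] = bump(self.chooser[i], gshare_ok)
        self.bimodal[i] = bump(self.bimodal[i], taken)
        self.gshare[g] = bump(self.gshare[g], taken)
        self.bhr = ((self.bhr << 1) | int(taken)) & self.history_mask

pc = 0x400
tp = TournamentPredictor()
misses = []
for taken in [c == "T" for c in "TTTN" * 50]:  # inner-loop pattern, 200 branches
    misses.append(tp.predict(pc) != taken)
    tp.update(pc, taken)
print(sum(misses[-40:]))  # 0: the chooser has moved this branch to gshare
```

The chooser needs gshare to win a few disagreements before it switches a branch over, which plays the role of the mis-prediction threshold; branches that bimodal already handles well never move, leaving the gshare table to the branches that need history.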
![Reducing Branch Penalty](https://slidetodoc.com/presentation_image_h/312fdd52f69c6fa305e2e62590d8d45a/image-30.jpg)
REDUCING BRANCH PENALTY
![Reducing Penalty: Fast Branches](https://slidetodoc.com/presentation_image_h/312fdd52f69c6fa305e2e62590d8d45a/image-31.jpg)
Reducing Penalty: Fast Branches

- Fast branch: can decide at D, not X
  - Test must be comparison to zero or equality, no time for ALU
  - (+) New taken branch penalty is 1
  - (–) Additional insns (slt) for more complex tests, must bypass to D too

CIS 371 (Martin): Pipelining
![Reducing Branch Penalty](https://slidetodoc.com/presentation_image_h/312fdd52f69c6fa305e2e62590d8d45a/image-32.jpg)
Reducing Branch Penalty

- The approach taken in the text (P&H) is to move branch testing into the ID stage so fewer instructions are flushed on a misprediction.
![Reducing Penalty: Fast Branches](https://slidetodoc.com/presentation_image_h/312fdd52f69c6fa305e2e62590d8d45a/image-33.jpg)
Reducing Penalty: Fast Branches
• Fast branch: targets the control-hazard penalty
  • Basically, branch insns that can resolve at D, not X
  • Test must be a comparison to zero or an equality test; there is no time for the ALU
+ New taken-branch penalty is 1
– Additional comparison insns (e.g., cmplt, slt) for complex tests
– Must bypass into the decode stage now, too

| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| bnez r3,targ | F | D | X | M | W | | | | |
| st r6→[r7+4] | | F | D | -- | -- | -- | | | |
| targ: add r4←r5,r4 | | | F | D | X | M | W | | |
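As a rough check of the diagram, the sketch below computes the taken-branch penalty from the stage where the branch resolves. It assumes a 5-stage in-order pipeline that fetches one insn per cycle, fetches sequentially until the branch resolves, and squashes everything fetched in between. It then prints the branch and target rows.

```python
STAGES = ["F", "D", "X", "M", "W"]


def taken_branch_penalty(resolve_stage):
    """Wrong-path fetch slots lost when a taken branch resolves in resolve_stage."""
    return STAGES.index(resolve_stage)       # D -> 1, X -> 2


def stage_row(fetch_cycle, n_cycles=9):
    """One pipeline-diagram row for an insn that flows through every stage."""
    cells = ["."] * n_cycles
    for offset, stage in enumerate(STAGES):
        cycle = fetch_cycle + offset
        if cycle <= n_cycles:
            cells[cycle - 1] = stage
    return " ".join(cells)


for where in ("D", "X"):
    penalty = taken_branch_penalty(where)
    target_fetch = 1 + penalty + 1           # branch is fetched in cycle 1
    print(f"resolve in {where}: penalty {penalty}")
    print("  branch:", stage_row(1))
    print("  target:", stage_row(target_fetch))
```

With resolution in D, the target is fetched in cycle 3, matching the `targ: add` row above. Resolving in X pushes the fetch to cycle 4.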
![Fast Branch Performance](https://slidetodoc.com/presentation_image_h/312fdd52f69c6fa305e2e62590d8d45a/image-34.jpg)
Fast Branch Performance
• Assume: branches are 20% of insns, and 75% of branches are taken
• CPI = 1 + 20% * 75% * 1 = 1 + 0.20*0.75*1 = 1.15
  • 15% slowdown (better than the 30% from before)
• But wait, fast branches assume only simple comparisons
  • Fine for MIPS
  • But not fine for ISAs with "branch if $1 > $2" operations
• In such cases, say 25% of branches require an extra insn
  • CPI = 1 + (20% * 75% * 1) + (20% * 25% * 1) (extra insn) = 1.2
• Example of ISA and micro-architecture interaction
  • Type of branch instructions
  • Another option: "delayed branch" or "branch delay slot"
  • What about condition codes?
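The CPI arithmetic above fits a small helper. It assumes every taken branch stalls for `taken_penalty` cycles and every extra compare insn adds exactly one cycle. A penalty of 2 gives 1.30, which is consistent with the earlier 30% slowdown for branches resolved in X.

```python
def fast_branch_cpi(branch_frac, taken_frac, taken_penalty=1, extra_insn_frac=0.0):
    """CPI = 1 + taken-branch stalls + one cycle per extra compare insn."""
    taken_stalls = branch_frac * taken_frac * taken_penalty
    extra_insns = branch_frac * extra_insn_frac * 1
    return 1 + taken_stalls + extra_insns


print(fast_branch_cpi(0.20, 0.75))                        # fast branches
print(fast_branch_cpi(0.20, 0.75, extra_insn_frac=0.25))  # + extra slt insns
print(fast_branch_cpi(0.20, 0.75, taken_penalty=2))       # resolved in X
```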
![Putting It All Together](https://slidetodoc.com/presentation_image_h/312fdd52f69c6fa305e2e62590d8d45a/image-35.jpg)
Putting It All Together
• BTB & branch direction predictor during fetch
(Figure: fetch-stage logic with PC, BTB (tag compare ==, target), PC + 4, RAS, and BHT (taken/not-taken) selecting the predicted target)
• If branch prediction is correct, there is no taken-branch penalty
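A toy version of this fetch-stage logic is sketched below. The direct-mapped BTB, its per-entry `kind` field (`cond`, `jump`, `call`, `ret`), and the method names are hypothetical choices for illustration. A BTB hit supplies the target. Conditional branches consult the BHT. Calls push PC + 4 onto the RAS, and returns pop it.

```python
class FetchPredictor:
    """Toy next-PC logic: direct-mapped BTB, 2-bit BHT, and a RAS."""

    def __init__(self, entries=64):
        self.entries = entries
        self.btb = {}                  # index -> (tag, target, kind)
        self.bht = [1] * entries       # 2-bit counters, start weakly not-taken
        self.ras = []                  # return address stack

    def _index_tag(self, pc):
        word = pc >> 2                 # drop byte offset of 4-byte insns
        return word % self.entries, word // self.entries

    def install(self, pc, target, kind):
        """Record a resolved branch of the given kind in the BTB."""
        idx, tag = self._index_tag(pc)
        self.btb[idx] = (tag, target, kind)

    def update_direction(self, pc, taken):
        idx, _ = self._index_tag(pc)
        c = self.bht[idx]
        self.bht[idx] = min(c + 1, 3) if taken else max(c - 1, 0)

    def next_pc(self, pc):
        idx, tag = self._index_tag(pc)
        entry = self.btb.get(idx)
        if entry is None or entry[0] != tag:   # BTB miss or tag mismatch
            return pc + 4
        _, target, kind = entry
        if kind == "ret":                      # returns: predict via the RAS
            return self.ras.pop() if self.ras else pc + 4
        if kind == "call":                     # calls: push the return address
            self.ras.append(pc + 4)
            return target
        if kind == "jump":
            return target
        return target if self.bht[idx] >= 2 else pc + 4   # conditional: ask BHT
```

On a correct prediction, the next fetch goes straight to the target. That is the "no taken-branch penalty" case on the slide.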
![Branch Prediction Performance](https://slidetodoc.com/presentation_image_h/312fdd52f69c6fa305e2e62590d8d45a/image-36.jpg)
Branch Prediction Performance
• Dynamic branch prediction
  • 20% of instructions are branches
  • Simple predictor: branches predicted with 75% accuracy
    • CPI = 1 + (20% * 25% * 2) = 1.1
  • More advanced predictor: 95% accuracy
    • CPI = 1 + (20% * 5% * 2) = 1.02
• Branch mis-predictions are still a big problem, though
  • Pipelines are long: typical mis-prediction penalty is 10+ cycles
  • For cores that do more per cycle, mis-predictions are more costly (later)
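The same CPI formula also shows why long pipelines keep mis-predictions a problem. The sketch assumes every mis-predicted branch costs `penalty` cycles and nothing else stalls. With a 10-cycle penalty, even 95% accuracy gives CPI 1.1, the same as 75% accuracy with a 2-cycle penalty.

```python
def cpi_with_mispredicts(branch_frac, accuracy, penalty):
    """CPI = 1 + (fraction of insns that are mis-predicted branches) * penalty."""
    return 1 + branch_frac * (1 - accuracy) * penalty


print(cpi_with_mispredicts(0.20, 0.75, 2))    # simple predictor, 5-stage
print(cpi_with_mispredicts(0.20, 0.95, 2))    # advanced predictor, 5-stage
print(cpi_with_mispredicts(0.20, 0.95, 10))   # advanced predictor, long pipeline
```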
![PREDICATION](https://slidetodoc.com/presentation_image_h/312fdd52f69c6fa305e2e62590d8d45a/image-37.jpg)
PREDICATION
![Predication](https://slidetodoc.com/presentation_image_h/312fdd52f69c6fa305e2e62590d8d45a/image-38.jpg)
Predication
• Instead of predicting which way we're going, why not go both ways?
  • Compute a predicate bit indicating a condition
  • ISA includes predicated instructions
  • Predicated insns either execute as normal or as NOPs, depending on the predicate bit
• Examples
  • x86 cmov performs a conditional move into a register (the source may be a register or memory)
  • 32b ARM allows almost all insns to be predicated
  • 64b ARM has predicated reg-reg move, inc, dec, not
  • Nvidia's CUDA ISA supports predication on most insns
• Predicate bits are like LC4 NZP bits
  • x86 FLAGS, ARM condition codes
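The following is only a software analogy of if-conversion, not how the hardware implements it. The Python interpreter may still branch internally. The code computes a predicate bit, evaluates both "paths", and lets the predicate select the result, much as x86 `cmov` or A64 `csel` would. The function names are hypothetical.

```python
def branchy_max(a, b):
    if a > b:          # control dependence: hardware would need a branch here
        return a
    return b


def predicated_max(a, b):
    p = int(a > b)     # predicate bit, analogous to a condition code
    # Both candidate results are available; the predicate picks one of them.
    return p * a + (1 - p) * b
```

Both functions return the same value. The second has no control dependence for a branch predictor to guess.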
![Predication Performance](https://slidetodoc.com/presentation_image_h/312fdd52f69c6fa305e2e62590d8d45a/image-39.jpg)
Predication Performance
• Predication overhead is additional insns
  • Sometimes overhead is zero
    • For an if-then statement where the condition is true
  – Most of the time it isn't
    • if-then-else statement: only one of the paths is useful
• Calculation: for a given branch, predicate (vs. speculate) if…
  • Average number of additional insns < overall mis-prediction penalty
• For an individual branch
  • Mis-prediction penalty in a 5-stage pipeline = 2
  • Mis-prediction rate is <50%, and often <20%
  • Overall mis-prediction penalty (rate × penalty) is <1, and often <0.4
• So when is predication ever worth it?
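The decision rule above can be written as a one-line comparison. It assumes the "overall mis-prediction penalty" is the branch's mis-prediction rate times the per-mis-prediction penalty. In a 5-stage pipeline that product stays below 1, so even one extra insn makes predication lose.

```python
def should_predicate(extra_insns, mispredict_rate, penalty):
    """Predicate (instead of speculate) when the extra insns cost less than
    the expected mis-prediction cost of this branch."""
    return extra_insns < mispredict_rate * penalty


print(should_predicate(1, 0.20, 2))   # 5-stage, typical rate: 1 < 0.4 is False
print(should_predicate(1, 0.50, 2))   # 5-stage, worst case: 1 < 1.0 is False
```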
![Predication Performance](https://slidetodoc.com/presentation_image_h/312fdd52f69c6fa305e2e62590d8d45a/image-40.jpg)
Predication Performance
• What does predication actually accomplish?
  • In a scalar 5-stage pipeline (penalty = 2): nothing
  • In a 4-way superscalar 15-stage pipeline (penalty = 60): something
    • Use when mis-predictions >10% and insn overhead <6
  • In a 4-way out-of-order superscalar (penalty ~150)
    • Potentially useful in more situations
  • Still: only useful for branches that mis-predict frequently
• Other predication advantages
  • Low-power: eliminates the need for a large branch predictor
  • Real-time: predicated code performs consistently
• Predication disadvantages
  • Wasted time/energy compared to correct prediction
  • Doesn't nest well
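Running the same break-even rule in reverse shows the slide's progression. The sketch takes the three penalties quoted above and a hypothetical 6-insn overhead, then asks how often the branch must mis-predict before predication pays off. The results are never for the scalar pipeline, above 10% for the 15-stage superscalar, and above only 4% for the out-of-order core.

```python
PENALTIES = {  # mis-prediction penalties as quoted on the slide
    "scalar 5-stage": 2,
    "4-way superscalar, 15-stage": 60,
    "4-way out-of-order superscalar": 150,
}


def min_mispredict_rate(extra_insns, penalty):
    """Mis-prediction rate above which predication beats speculation."""
    return extra_insns / penalty


for name, penalty in PENALTIES.items():
    rate = min_mispredict_rate(6, penalty)
    verdict = "never (would need a rate above 100%)" if rate > 1 else f"above {rate:.0%}"
    print(f"{name}: predicate a 6-insn-overhead branch when mis-prediction rate is {verdict}")
```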
![PIPELINE DEPTH](https://slidetodoc.com/presentation_image_h/312fdd52f69c6fa305e2e62590d8d45a/image-41.jpg)
PIPELINE DEPTH
![Pipelining: Clock Frequency vs. IPC](https://slidetodoc.com/presentation_image_h/312fdd52f69c6fa305e2e62590d8d45a/image-42.jpg)
Pipelining: Clock Frequency vs. IPC
• Increase number of pipeline stages ("pipeline depth")
  • Keep cutting the datapath into finer pieces
+ Increases clock frequency (decreases clock period)
  • Register overhead & unbalanced stages cause sub-linear scaling
  • Doubling the number of stages won't quite double the frequency
– Increases CPI (decreases IPC)
  • More pipeline "hazards", higher branch penalty
  • Memory latency relatively higher (same absolute latency, more cycles)
– Result: after some point, deeper pipelining can decrease performance
  • "Optimal" pipeline depth is program- and technology-specific
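A toy model of this trade-off is sketched below. All parameter values are hypothetical. The clock period is `logic_delay / depth + latch_overhead`, so frequency scales sub-linearly. CPI grows by a fixed `stall_per_stage` for each added stage, standing in for more hazards and a larger branch penalty. Time per insn is CPI times period, and under these assumptions it bottoms out at an intermediate depth.

```python
def clock_period(depth, logic_delay=100.0, latch_overhead=5.0):
    """Stage delay: logic split evenly across stages plus fixed latch overhead."""
    return logic_delay / depth + latch_overhead


def cpi(depth, stall_per_stage=0.02):
    """Hypothetical: each extra stage adds 0.02 cycles of average stall per insn."""
    return 1 + stall_per_stage * (depth - 1)


def time_per_insn(depth):
    return cpi(depth) * clock_period(depth)


best = min(range(1, 101), key=time_per_insn)
speedup_10_to_20 = clock_period(10) / clock_period(20)
print(f"doubling 10 -> 20 stages raises frequency by {speedup_10_to_20:.2f}x")
print(f"best depth under these assumptions: {best}")
```

Changing the overhead or stall parameters moves the optimum. That is the sense in which the "optimal" depth is program- and technology-specific.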
![Pipeline Depth](https://slidetodoc.com/presentation_image_h/312fdd52f69c6fa305e2e62590d8d45a/image-43.jpg)
Pipeline Depth
(Figure: integer pipeline and floating-point pipeline depths of processors; data from http://cpudb.stanford.edu/)
![Summary](https://slidetodoc.com/presentation_image_h/312fdd52f69c6fa305e2e62590d8d45a/image-44.jpg)
Summary
(Figure: system layer diagram: apps, system software, Mem, CPU, I/O)
• Control hazards
• Branch target prediction
• Branch direction prediction