Lecture on High Performance Processor Architecture CS 05162

  • Slides: 34
Lecture on High Performance Processor Architecture (CS 05162)
Dynamic Branch Prediction Scheme
An Hong, han@ustc.edu.cn
Fall 2007
University of Science and Technology of China, Department of Computer Science and Technology
CS of USTC AN Hong

Outline
• Bimodal Branch Prediction Scheme
• Two-Level Branch Prediction Scheme
• Example of a hybrid prediction algorithm: the Alpha 21264 branch predictor
2021/3/12 CS of USTC AN Hong

Dynamic Prediction (1): Using a single branch's own history (pattern-based prediction)
• Dynamic prediction: use run-time information to make the prediction
  − Example: Branch Prediction Buffer (1-bit predictor)

Dynamic Prediction: 1-bit BHT
• Branch History Table (BHT)
  − The lower bits of the PC address index a table of 1-bit values
  − Each entry says whether or not the branch was taken last time
  − No address check, so different branches can share an entry
• Problem: in a loop, a 1-bit BHT causes two mispredictions:
  − At the end of the loop, when the branch exits instead of looping as before
  − On the first iteration the next time the loop is entered, when it predicts exit instead of looping

Dynamic Prediction: 1-bit BHT (Branch Prediction Buffer)
• Pros:
  − Small: 1 bit per entry
    • can fit lots of entries
  − Always returns a prediction
• Cons:
  − Aliasing between branches
  − One bit of state mispredicts many branches

    for (I = 0; I < 10; I++) {
        a = a + 1;
    }

  Two mispredictions per loop invocation
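The two mispredictions per loop invocation can be checked with a small simulation. This is a minimal sketch, not the lecture's own code: the function name `run_one_bit` is made up for illustration, and it assumes the loop-back branch sits at the bottom of the loop, so that one invocation produces nine taken outcomes followed by one not-taken outcome.

```python
def run_one_bit(outcomes, state=False):
    """Simulate one 1-bit BHT entry; return (mispredictions, final state)."""
    misses = 0
    for taken in outcomes:
        if state != taken:      # the prediction is simply the last outcome
            misses += 1
        state = taken           # remember only the most recent direction
    return misses, state

# Assumed branch behavior for one invocation of the 10-iteration loop:
# taken 9 times (loop back), then not taken once (exit).
LOOP = [True] * 9 + [False]

state = False
per_invocation = []
for _ in range(3):              # run the loop three times in a row
    misses, state = run_one_bit(LOOP, state)
    per_invocation.append(misses)

print(per_invocation)           # [2, 2, 2]
```

Each invocation mispredicts once on the first iteration (the entry still holds the previous exit) and once on the exit itself.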

Dynamic Prediction (2): Bimodal Branch Prediction Scheme (2-bit BHT, 2-bit saturating predictor)
• Solution: a 2-bit predictor that changes its prediction only after mispredicting twice
  − Uses extra state to reduce mispredictions at loop ends
• [Figure: state diagram of the 2-bit saturating up-down counter held in each BHT entry. States 11 and 10 predict taken (green: go); states 01 and 00 predict not taken (red: stop). T = taken moves the counter up, NT = not taken moves it down.]
• Adds hysteresis to the decision-making process

Bimodal Branch Prediction Scheme
• Strategy: predict based on the direction the branch went the last few times it was executed
  − Based on a short self-history pattern
  − Based on a counter
• Works well:
  − When each branch is strongly biased in a particular direction
  − For scientific/engineering applications where program execution is dominated by inner loops

Bimodal Branch Prediction Scheme
• Example 1: …NNNTNNN…
  − 1-bit predictor: 2 mispredictions
  − 2-bit predictor: 1 misprediction
• Example 2: TNTNTN…; a 2-bit predictor with initial state 01 mispredicts 100% of the time
• Accuracy of the BHT method
  − A misprediction occurs because either:
    • the guess for that branch was wrong, or
    • indexing the table returned the history of a different branch
  − With a 4096-entry table, misprediction rates vary by program from 1% (nasa7, tomcatv) to 18% (eqntott), with spice at 9% and gcc at 12%
  − 4096 entries is about as good as an infinite table (in the Alpha 21164)
  − 2 bits are enough: n-bit counters (n > 2) perform about the same as 2-bit counters
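Examples 1 and 2 can be reproduced with a generic saturating counter. The sketch below is illustrative only: the helper names `seq` and `mispredictions` are not from the lecture, and it assumes the usual encoding in which the upper half of the counter range predicts taken (so for 2 bits, states 10 and 11 predict taken).

```python
def seq(pattern):
    """Turn a string such as 'NNNTNNN' into a list of booleans (T = taken)."""
    return [c == "T" for c in pattern]

def mispredictions(outcomes, bits, state):
    """Count mispredictions of one saturating up-down counter of `bits` bits."""
    top = (1 << bits) - 1            # saturation value (1 for 1-bit, 3 for 2-bit)
    threshold = 1 << (bits - 1)      # states >= threshold predict taken
    misses = 0
    for taken in outcomes:
        if (state >= threshold) != taken:
            misses += 1
        # count up on taken, down on not taken, saturating at both ends
        state = min(state + 1, top) if taken else max(state - 1, 0)
    return misses

ex1_one_bit = mispredictions(seq("NNNTNNN"), bits=1, state=0)
ex1_two_bit = mispredictions(seq("NNNTNNN"), bits=2, state=0)
ex2_two_bit = mispredictions(seq("TNTNTN"), bits=2, state=0b01)

print(ex1_one_bit, ex1_two_bit, ex2_two_bit)   # 2 1 6
```

In Example 2 the counter bounces between 01 and 10 and is wrong on all six branches, which is the 100% misprediction case.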

Pros/Cons
• Pros:
  − Now mispredicts only once on each loop
  − Also good for data-dependent branches where most data points go the same way
    • e.g., checking for the termination character at the end of a string
• Cons:
  − Still has the aliasing problem between branches
  − Uses only information about the history of the current branch (self-history, or local history)
    • But sequences of branches are often correlated

Two-Level Branch Prediction (GAg)
• Use the history of recent branches

Global Branch Prediction Scheme: Global BHR / per-address PHTs
• Notation: GAp(k)
  − G: "global" BHR
  − A: adaptive
  − p: "per-address" PHT
  − k: BHR length
• [Figure: the branch address (PC, n bits, n = full address) selects one of the per-address PHTs; the k-bit global BHR (shifted left on update) selects a 2-bit entry inside it. With k = 4, each PHT has 16 entries. Figure label: a sub-history table corresponding to each path.]
• GAp(k) = GAp(4) here; also called the (k, n)-gselect predictor

Global Branch Prediction Scheme: Design Idea
Example: consider the following code fragment:

    if (aa == 2)        // branch b1
        aa = 0;
    if (bb == 2)        // branch b2
        bb = 0;
    if (aa != bb) {     // branch b3
        ……
    }

Paths leading to b3 (directions of b1-b2; 1 = taken, 0 = not taken) and what can be inferred before b3 executes:

    Path   b1-b2   Inferred before b3
    A      0-0     aa = 0,  bb = 0
    B      0-1     aa = 0,  bb ≠ 2
    C      1-0     aa ≠ 2,  bb = 0
    D      1-1     aa ≠ 2,  bb ≠ 2

The compiler translates this code into the following assembly program (assuming aa and bb are allocated to R1 and R2):

          SUBI  R3,R1,#2
          BNEZ  R3,L1       ; branch b1 (aa != 2), taken
          ADD   R1,R0,R0    ; aa = 0, executed when b1 is not taken
    L1:   SUBI  R3,R2,#2
          BNEZ  R3,L2       ; branch b2 (bb != 2), taken
          ADD   R2,R0,R0    ; bb = 0, executed when b2 is not taken
    L2:   SUB   R3,R1,R2    ; R3 = aa - bb
          BEQZ  R3,L3       ; branch b3 (aa == bb), taken
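The link between the path taken through b1 and b2 and the outcome of b3 can be enumerated directly. The following is a rough sketch under stated assumptions: `run_fragment` is a hypothetical helper that mirrors the branch senses of the assembly above (b1 and b2 are taken when the variable is not 2, b3 is taken when aa equals bb), and the input range {0, 1, 2} is chosen only for illustration.

```python
def run_fragment(aa, bb):
    """Return (b1_taken, b2_taken, b3_taken) for one pass through the fragment."""
    b1_taken = aa != 2          # BNEZ R3,L1: taken when aa != 2
    if not b1_taken:
        aa = 0                  # fall-through path sets aa = 0
    b2_taken = bb != 2          # BNEZ R3,L2: taken when bb != 2
    if not b2_taken:
        bb = 0                  # fall-through path sets bb = 0
    b3_taken = aa == bb         # BEQZ R3,L3: taken when aa == bb
    return b1_taken, b2_taken, b3_taken

# Collect every b3 outcome observed on each (b1, b2) path.
outcomes_by_path = {}
for aa in range(3):
    for bb in range(3):
        b1, b2, b3 = run_fragment(aa, bb)
        outcomes_by_path.setdefault((int(b1), int(b2)), set()).add(b3)

for path in sorted(outcomes_by_path):
    print(path, sorted(outcomes_by_path[path]))
```

On path A (0-0) both variables have just been cleared, so b3 is always taken; on the other paths the outcome still depends on the data. Knowing the recent global history therefore gives the predictor information that the history of b3 alone does not contain.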

Global Branch Prediction Scheme: Design Idea
aa' and bb' are the new values of aa and bb after b1 and b2 have executed. In the result column, c = correct and w = wrong.

    aa  bb  aa'  bb'  Path to b3  Current PHT state  b3 prediction  b3 actual  Result  Next PHT state
    0   2   0    0    C           0                  N              T          w       1
    2   2   0    0    A           1                  N              T          w       2
    2   1   0    1    B           2                  T              N          w       1
    2   0   0    0    B           1                  N              T          w       2
    2   2   0    0    A           2                  T              T          c       3
    1   0   1    0    D           3                  T              N          w       2
    1   0   1    0    D           2                  T              N          w       1
    2   0   0    0    B           1                  N              T          w       2
    0   1   0    1    D           2                  T              N          w       1
    1   1   1    1    D           1                  N              T          w       2
    1   2   1    0    C           2                  T              N          w       1
    1   2   1    0    C           1                  N              N          c       0
    2   2   0    0    A           0                  N              T          w       1
    2   0   0    0    B           1                  N              T          w       2
    0   1   0    1    D           2                  T              N          w       1
    2   2   0    0    A           1                  N              T          w       2
    0   2   0    0    C           2                  T              T          c       3
    0   1   0    1    D           3                  T              N          w       2
    1   0   1    0    D           2                  T              N          w       1
    2   2   0    0    A           1                  N              T          w       2

Global Branch Prediction Scheme: Design Idea
[Table: for each path, the directions of b1 and b2 and the actual outcome of b3]

Pros/Cons of Two-Level Branch Prediction
• Pros:
  − Predicts correlated branch behavior that breaks other predictors
    • eqntott example
  − Better overall performance than purely address-based predictors
• Cons:
  − Interference between unrelated branches with the same history
    • Example: all loop-end branches map to the same entry in the pattern history table
  − Sometimes this interference is a good thing

Global Branch Prediction Scheme
• GAp(k) is also called the (k, n)-gselect predictor
  − A (0, n)-gselect predictor is the 2-bit (bimodal) predictor
  − An (M, 0)-gselect predictor is the GAg(M) predictor
• Notation: GAp(k)
  − G: "global" BHR
  − A: adaptive
  − p: "per-address" PHT
  − k: BHR length
• [Figure: GAp(k) organization: the branch address (PC, n bits, n = full address) selects one of the per-address PHTs of 2-bit counters; the k-bit global BHR indexes within it. With k = 4, each PHT has 16 entries.]

Global Branch Prediction Scheme: Global BHR / per-set PHTs
• Notation: GAs(k, 2^n)
  − G: "global" BHR
  − A: adaptive
  − s: "per-set" PHT
  − k: BHR length
• [Figure: n = 10 bits of the branch address (PC) select one of 2^n = 1024 per-set PHTs; the k-bit global BHR (k = 4) selects one of the 16 2-bit entries in that PHT. This configuration is GAs(4, 1024).]

Global Branch Prediction Scheme: gselect two-level predictor (an alternative drawing), by S. T. Pan, 1992
• [Figure: n bits of the branch address (PC) are concatenated with the k-bit BHR to form an (n + k)-bit index into the PHT]

Global Branch Prediction Scheme: gshare two-level predictor, by Scott McFarling, 1993
• [Figure: n bits of the branch address are XORed with the k-bit BHR to form an n-bit index into the PHT]

Global Branch Prediction Scheme: Global BHR / Global PHT
• Notation: GAg(k)
  − G: "global" BHR
  − A: adaptive
  − g: "global" PHT
  − k: BHR length
• BHR = Branch History Register, k bits (shift left on update)
• PHT = Pattern History Table, 2^k entries of 2 bits each (2-bit saturating up-down counters)
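The GAg(k) organization described above can be sketched in a few lines. This is an illustrative model rather than a description of any particular hardware: the class name `GAg` and its methods are made up here, and the counters are assumed to start in state 01 (weakly not taken).

```python
class GAg:
    """k-bit global BHR indexing one global PHT of 2^k 2-bit counters."""

    def __init__(self, k):
        self.k = k
        self.bhr = 0                      # global branch history register
        self.pht = [1] * (1 << k)         # assumed initial state: 01

    def predict(self):
        return self.pht[self.bhr] >= 2    # states 10 and 11 predict taken

    def update(self, taken):
        c = self.pht[self.bhr]
        self.pht[self.bhr] = min(c + 1, 3) if taken else max(c - 1, 0)
        # shift the new outcome in from the right, keep only k bits
        self.bhr = ((self.bhr << 1) | int(taken)) & ((1 << self.k) - 1)

# Alternating pattern TNTN..., which defeats a lone 2-bit counter.
outcomes = [i % 2 == 0 for i in range(200)]
predictor = GAg(k=4)
late_misses = 0
for i, taken in enumerate(outcomes):
    if predictor.predict() != taken and i >= 100:
        late_misses += 1                  # count only after warm-up
    predictor.update(taken)

print(late_misses)                        # 0
```

After a short warm-up the history register alternates between two values, each of which has its own counter, so the alternating pattern is predicted correctly.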

Global Branch Prediction Scheme: Global BHR / Global PHT
• Problem: two code sequences may have the same bit pattern in the BHR and thus index the same entry in the PHT
• [Figure: GAg(12): a k = 12 bit global BHR holding the global branch path history (shifted left on update, e.g. 111100001111) indexes a global PHT of 4096 2-bit entries]

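Global Branch Prediction Scheme: gshare two-level predictor, by Scott McFarling, 1993
PHT index formed from the branch address and the BHR by gselect 4/4 (n = 4, k = 4, concatenated) and by gshare 8/8 (n = 8, k = 8, hashed with XOR):

    Branch address   BHR        gselect 4/4   gshare 8/8
    00000000         00000001   00000001      00000001
    00000000         00000000   00000000      00000000
    11111111         00000000   11110000      11111111
    11111111         10000000   11110000      01111111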
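The difference between the two indexing schemes is easiest to see by computing the indices. The functions below are a minimal sketch with hypothetical names (`gselect_index`, `gshare_index`); the sample address and history values are chosen for illustration, and the gshare version assumes the history is aligned with the low address bits.

```python
def gselect_index(pc, history, n, k):
    """Concatenate the low n address bits with the low k history bits."""
    return ((pc & ((1 << n) - 1)) << k) | (history & ((1 << k) - 1))

def gshare_index(pc, history, n, k):
    """XOR the low n address bits with the low k history bits (k <= n)."""
    return (pc & ((1 << n) - 1)) ^ (history & ((1 << k) - 1))

pc = 0b11111111
hist_a = 0b00000000
hist_b = 0b10000000

# gselect 4/4 sees only the low 4 history bits, so both histories collide.
sel_a = gselect_index(pc, hist_a, 4, 4)
sel_b = gselect_index(pc, hist_b, 4, 4)

# gshare 8/8 folds all 8 history bits into the same 8-bit index.
sha_a = gshare_index(pc, hist_a, 8, 8)
sha_b = gshare_index(pc, hist_b, 8, 8)

print(format(sel_a, "08b"), format(sel_b, "08b"))   # 11110000 11110000
print(format(sha_a, "08b"), format(sha_b, "08b"))   # 11111111 01111111
```

With the same 8-bit index budget, gselect cannot tell the two histories apart while gshare can. The degenerate cases also fall out of `gselect_index`: with k = 0 it is a plain address index (the bimodal predictor), and with n = 0 it is a plain history index (GAg).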

Global Branch Prediction Scheme
• Strategy
  − Based on the combined history of all recent branches
  − Based on a shift register and a counter
• Works well
  − When the directions taken by sequentially executed branches are highly correlated
  − 11% additional accuracy (compared with the 2-bit scheme) at the extra hardware cost of one shift register
• Examples:

    if (x < 1) { ... }
    if (x > 1) { ... }

    if (a == 2) a = 0;     // b1
    if (b == 2) b = 0;     // b2
    if (a != b) { … }      // b3

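Two-level adaptive predictors: Variations

                        Global PHT   per-address PHTs   per-set PHTs
    Global BHR          GAg          GAp                GAs
    per-address BHT     PAg          PAp                PAs
    per-set BHT         SAg          SAp                SAs

• Global BHR row: prediction based on the global history of multiple branches (correlation-based)
• per-address BHT and per-set BHT rows: prediction based on the local history of a single branch (pattern-based)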

Local Branch Prediction Scheme: per-address BHR / global PHT
• [Figure: PAg(4): the full branch address (PC, N bits) selects an entry of the per-address BHT; that entry's k-bit history (k = 4, e.g. 1100) indexes the global PHT of 16 2-bit entries]

Local Branch Prediction Scheme: SAg(k) two-level adaptive predictor
• [Figure: n bits of the branch address (PC) index a BHT of 2^n entries, each k bits wide; the selected k-bit history indexes a PHT of 2^k 2-bit entries]
• PHT = Pattern History Table (2-bit saturating up-down counters)
• BHT = Branch History Table (shift left on update)

Local Branch Prediction Scheme: SAg(k) two-level adaptive predictor
• [Figure: SAg(4): n = 10 address bits index a 1024-entry BHT; the selected k = 4 bit history (e.g. 1100) indexes a 16-entry PHT of 2-bit counters]

Local Branch Prediction Scheme: per-address BHR / per-address PHTs
• [Figure: PAp(4): the full branch address (PC, N bits) selects both a per-address BHT entry and a per-address PHT, so branches b1 and b2 each have their own history and their own PHT; the k = 4 bit history (e.g. 1100) indexes one of the 16 2-bit entries of that branch's PHT]

Local Branch Prediction Scheme
• Strategy: considers the history of each branch independently and takes advantage of repetitive patterns
• Works well: for branches with simple repetitive patterns
• Example:

    for (I = 1; I <= 4; I++) { }    // its pattern is (1110)^n
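A local two-level predictor learns the (1110)^n pattern completely, while a lone 2-bit counter keeps missing the loop exit. The sketch below assumes a PAg-style organization with a 4-bit per-address history; the names `PAg` and `bimodal_late_misses`, the dictionary used as the BHT, the branch address 0x400, and the initial counter state 01 are all illustrative choices, not details from the lecture.

```python
class PAg:
    """Per-address k-bit histories indexing one shared PHT of 2-bit counters."""

    def __init__(self, k):
        self.k = k
        self.bht = {}                      # history register per branch address
        self.pht = [1] * (1 << k)          # assumed initial state: 01

    def predict(self, pc):
        return self.pht[self.bht.get(pc, 0)] >= 2

    def update(self, pc, taken):
        h = self.bht.get(pc, 0)
        c = self.pht[h]
        self.pht[h] = min(c + 1, 3) if taken else max(c - 1, 0)
        self.bht[pc] = ((h << 1) | int(taken)) & ((1 << self.k) - 1)

def bimodal_late_misses(outcomes, start):
    """Mispredictions of a single 2-bit counter from index `start` onward."""
    state, misses = 1, 0
    for i, taken in enumerate(outcomes):
        if (state >= 2) != taken and i >= start:
            misses += 1
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses

outcomes = [True, True, True, False] * 100     # the pattern (1110)^n
local = PAg(k=4)
local_late = 0
for i, taken in enumerate(outcomes):
    if local.predict(0x400) != taken and i >= 200:
        local_late += 1
    local.update(0x400, taken)

bimodal_late = bimodal_late_misses(outcomes, 200)
print(local_late, bimodal_late)                # 0 50
```

Over the last 200 branches the local predictor makes no mistakes, while the 2-bit counter misses once per period of four.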

Performance comparison on SPEC89

Performance comparison on SPEC89 (continued)

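Example of a hybrid prediction algorithm: the Alpha 21264 branch predictor
• Local history prediction, SAg(10): a per-set BHT (n = 10, k = 10) indexing a global PHT
• Global history prediction, GAg(12): a global BHR with k = 12
• Selector (1 12): chooses between the local and the global prediction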
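The way a selector arbitrates between a local and a global component can be sketched as follows. This is a simplified model under loud assumptions, not the actual Alpha 21264 design: the class name `Hybrid`, the use of 2-bit counters everywhere, indexing the selector by the global history, the initial counter values, and training the selector only when the two components disagree are all choices made for illustration. Only the sizes (n = 10, local k = 10, global k = 12) come from the slide.

```python
def bump(counter, up):
    """2-bit saturating up-down counter step."""
    return min(counter + 1, 3) if up else max(counter - 1, 0)

class Hybrid:
    def __init__(self, n=10, k_local=10, k_global=12):
        self.n, self.kl, self.kg = n, k_local, k_global
        self.local_hist = [0] * (1 << n)        # per-set BHT
        self.local_pht = [1] * (1 << k_local)   # PHT shared by all sets
        self.global_pht = [1] * (1 << k_global)
        self.chooser = [1] * (1 << k_global)    # >= 2 means "use global"
        self.ghr = 0                            # global BHR

    def _components(self, pc):
        s = pc & ((1 << self.n) - 1)
        local = self.local_pht[self.local_hist[s]] >= 2
        glob = self.global_pht[self.ghr] >= 2
        return s, local, glob

    def predict(self, pc):
        _, local, glob = self._components(pc)
        return glob if self.chooser[self.ghr] >= 2 else local

    def update(self, pc, taken):
        s, local, glob = self._components(pc)
        if local != glob:                       # train selector on disagreement
            self.chooser[self.ghr] = bump(self.chooser[self.ghr], glob == taken)
        lh = self.local_hist[s]
        self.local_pht[lh] = bump(self.local_pht[lh], taken)
        self.global_pht[self.ghr] = bump(self.global_pht[self.ghr], taken)
        self.local_hist[s] = ((lh << 1) | int(taken)) & ((1 << self.kl) - 1)
        self.ghr = ((self.ghr << 1) | int(taken)) & ((1 << self.kg) - 1)

hybrid = Hybrid()
outcomes = [True, True, True, False] * 100      # one loop branch, pattern 1110
hybrid_late = 0
for i, taken in enumerate(outcomes):
    if hybrid.predict(0x400) != taken and i >= 200:
        hybrid_late += 1
    hybrid.update(0x400, taken)

print(hybrid_late)                              # 0
```

With a single repeating branch both components converge, so the selector's choice does not matter; its value shows up when some branches are better served by local history and others by global history.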