Lecture on High Performance Processor Architecture CS 05162
- Slides: 34
Lecture on High Performance Processor Architecture (CS 05162): Dynamic Branch Prediction Scheme
An Hong, han@ustc.edu.cn
Fall 2007
University of Science and Technology of China, Department of Computer Science and Technology
CS of USTC AN Hong

Outline
- Bimodal Branch Prediction Scheme
- Two-Level Branch Prediction Scheme
- Example of a hybrid prediction algorithm: the Alpha 21264 branch predictor
2021/3/12 CS of USTC AN Hong

Dynamic Prediction (1): Using a Single Branch's Own History (Pattern-Based Prediction)
- Dynamic prediction: use run-time information to make the prediction
  - Example: branch prediction buffer (1-bit predictor)

Dynamic Prediction: 1-bit BHT
- Branch History Table (BHT)
  - The lower bits of the PC address index a table of 1-bit values
  - Each entry says whether or not the branch was taken last time
  - No address check
- Problem: in a loop, a 1-bit BHT causes two mispredictions:
  - At the end of the loop, when the branch exits instead of looping as before
  - On the first iteration the next time the loop is entered, when it predicts exit instead of looping

Dynamic Prediction: 1-bit BHT (Branch Prediction Buffer)
- Pros:
  - Small: 1 bit per entry, so many entries fit
  - Always returns a prediction
- Cons:
  - Aliasing between branches
  - One bit of state mispredicts many branches
- Example: two mispredictions per loop invocation
    for (I = 0; I < 10; I++) { a = a + 1; }
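
The "two mispredictions per loop invocation" claim can be checked with a small simulation. This is a minimal sketch, not taken from the slides: it assumes the loop is compiled with a loop-closing branch that is taken to repeat and not taken to exit, and it models a single BHT entry with no aliasing. The function names are hypothetical.

```python
def loop_branch_outcomes(iterations, invocations):
    """Outcomes of a loop-closing branch: taken to repeat, not taken to exit."""
    return ([True] * (iterations - 1) + [False]) * invocations


def one_bit_mispredictions(outcomes, last_outcome=False):
    """Count mispredictions of a single 1-bit BHT entry (no aliasing modeled)."""
    misses = 0
    for taken in outcomes:
        if taken != last_outcome:   # the prediction is whatever happened last time
            misses += 1
        last_outcome = taken        # the single bit records the latest outcome
    return misses


outcomes = loop_branch_outcomes(iterations=10, invocations=5)
print(one_bit_mispredictions(outcomes))  # 10, i.e. two per invocation
```

Each invocation costs one misprediction on the first iteration (the bit still says "exit") and one on the exit itself.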

Dynamic Prediction (2): Bimodal Branch Prediction Scheme (2-bit BHT, 2-bit saturating predictor)
- Solution: a 2-bit predictor that changes its prediction only after mispredicting twice; the extra state reduces mispredictions at loop ends
- Figure: state diagram of the 2-bit saturating up-down counter held in each BHT entry (T = taken, NT = not taken)
  - States 11 and 10: predict taken (green: go, taken)
  - States 01 and 00: predict not taken (red: stop, not taken)
- Adds hysteresis to the decision-making process
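
A 2-bit saturating up-down counter can be sketched in a few lines. The class and function names here are hypothetical, and the loop model (a loop-closing branch taken 9 times and then not taken) is an assumption rather than something the slide specifies.

```python
class TwoBitCounter:
    """2-bit saturating up-down counter: 0=00, 1=01, 2=10, 3=11."""

    def __init__(self, state=0):
        self.state = state

    def predict(self):
        # States 10 and 11 predict taken; 00 and 01 predict not taken.
        return self.state >= 2

    def update(self, taken):
        # Count up on taken, down on not taken, saturating at 3 and 0.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_mispredictions(counter, outcomes):
    misses = 0
    for taken in outcomes:
        if counter.predict() != taken:
            misses += 1
        counter.update(taken)
    return misses


# Loop-closing branch of a 10-iteration loop, entered 5 times.
loop = ([True] * 9 + [False]) * 5
print(count_mispredictions(TwoBitCounter(state=3), loop))  # 5: one per invocation
```

Starting from the saturated state 11, the loop exit moves the counter only to 10, so the first iteration of the next invocation is still predicted taken.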

Bimodal Branch Prediction Scheme
- Strategy: based on the direction the branch went the last few times it was executed
  - Based on a short self-history pattern
  - Based on a counter
- Works well:
  - When each branch is strongly biased in a particular direction
  - For scientific/engineering applications where program execution is dominated by inner loops

Bimodal Branch Prediction Scheme
- Example 1: …NNNTNNN…
  - 1-bit predictor: 2 mispredictions
  - 2-bit predictor: 1 misprediction
- Example 2: TNTNTN…; a 2-bit predictor whose initial state is 01 mispredicts 100% of the time
- Accuracy of the BHT method
  - A misprediction occurs because either:
    - the guess for that branch was wrong, or
    - indexing the table returned the history of a different branch
  - With a 4096-entry table, misprediction rates vary by program from 1% (nasa7, tomcatv) to 18% (eqntott), with spice at 9% and gcc at 12%
  - 4096 entries is about as good as an infinite table (in the Alpha 21164)
  - 2 bits are enough: n-bit counters (n > 2) perform about the same as 2-bit counters
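
The two examples above can be reproduced with a generic n-bit saturating counter. This is a sketch under stated assumptions: the function name is hypothetical, the counter predicts taken when its value is in the upper half of its range, and in Example 1 the counter is assumed to already sit in its lowest state after the leading run of N outcomes.

```python
def mispredictions(pattern, bits, state):
    """Count mispredictions of one n-bit saturating counter on a T/N string."""
    top = (1 << bits) - 1          # highest counter value
    threshold = 1 << (bits - 1)    # predict taken at or above this value
    misses = 0
    for ch in pattern:
        taken = ch == "T"
        if (state >= threshold) != taken:
            misses += 1
        state = min(top, state + 1) if taken else max(0, state - 1)
    return misses


print(mispredictions("NNNTNNN", bits=1, state=0))  # 2
print(mispredictions("NNNTNNN", bits=2, state=0))  # 1
print(mispredictions("TNTNTN", bits=2, state=1))   # 6, i.e. every branch
```

With one bit the counter degenerates to "same as last time", which is why the isolated T costs two mispredictions instead of one.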

Pros/Cons
- Pros:
  - Now mispredicts only once per loop
  - Also good for data-dependent branches where most data points go the same way
    - e.g. checking for the termination character at the end of a string
- Cons:
  - Still has the aliasing problem between branches
  - Uses only the history of the current branch (self-history, or local history)
    - But sequences of branches are often correlated

Two-Level Branch Prediction (GAg)
- Use the history of recent branches

Global Branch Prediction Scheme: Global BHR / per-address PHTs
- Notation: GAp(k)
  - G: “Global” BHR
  - A: adaptive
  - p: “per-address” PHT
  - k: BHR length
- Figure: GAp(4). The n-bit branch address (PC; n = full address) selects one of the per-address PHTs. The k-bit global BHR (k = 4, shifted left on update) selects one of the 16 two-bit entries in that PHT. Annotation: a sub-history table corresponding to each path.
- GAp(k), here GAp(4), is also called the (k, n)-gselect predictor

Global Branch Prediction Scheme: Design Idea
- Example: consider the following code fragment:
    if (aa == 2)       // branch b1
        aa = 0;
    if (bb == 2)       // branch b2
        bb = 0;
    if (aa != bb) {    // branch b3
        …
    }
- Paths leading to b3 (directions of b1-b2) and what can be inferred before b3 executes:
    A: 0-0    aa = 0, bb = 0
    B: 0-1    aa = 0, bb ≠ 2
    C: 1-0    aa ≠ 2, bb = 0
    D: 1-1    aa ≠ 2, bb ≠ 2
- The compiler translates this code into the following assembly (assuming aa and bb are allocated to R1 and R2):
          SUBI R3,R1,#2
          BNEZ R3,L1       ; branch b1 (aa != 2), taken
          ADD  R1,R0,R0    ; aa = 0, executed when b1 is not taken
    L1:   SUBI R3,R2,#2
          BNEZ R3,L2       ; branch b2 (bb != 2), taken
          ADD  R2,R0,R0    ; bb = 0, executed when b2 is not taken
    L2:   SUB  R3,R1,R2    ; R3 = aa - bb
          BEQZ R3,L3       ; branch b3 (aa == bb), taken
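
Global Branch Prediction Scheme: Design Idea
- Behavior of b3 under a single 2-bit PHT entry (aa' and bb' are the new values of aa and bb after b1 and b2 execute; c = correct, w = wrong):

    aa  bb | aa' bb' | path to b3 | PHT state | b3 predicted | b3 actual | c/w | next PHT state
    0   2  | 0   0   | C          | 0         | N            | T         | w   | 1
    2   2  | 0   0   | A          | 1         | N            | T         | w   | 2
    2   1  | 0   1   | B          | 2         | T            | N         | w   | 1
    2   0  | 0   0   | B          | 1         | N            | T         | w   | 2
    2   2  | 0   0   | A          | 2         | T            | T         | c   | 3
    1   0  | 1   0   | D          | 3         | T            | N         | w   | 2
    1   0  | 1   0   | D          | 2         | T            | N         | w   | 1
    2   0  | 0   0   | B          | 1         | N            | T         | w   | 2
    0   1  | 0   1   | D          | 2         | T            | N         | w   | 1
    1   1  | 1   1   | D          | 1         | N            | T         | w   | 2
    1   2  | 1   0   | C          | 2         | T            | N         | w   | 1
    1   2  | 1   0   | C          | 1         | N            | N         | c   | 0
    2   2  | 0   0   | A          | 0         | N            | T         | w   | 1
    2   0  | 0   0   | B          | 1         | N            | T         | w   | 2
    0   1  | 0   1   | D          | 2         | T            | N         | w   | 1
    2   2  | 0   0   | A          | 1         | N            | T         | w   | 2
    0   2  | 0   0   | C          | 2         | T            | T         | c   | 3
    0   1  | 0   1   | D          | 3         | T            | N         | w   | 2
    1   0  | 1   0   | D          | 2         | T            | N         | w   | 1
    2   2  | 0   0   | A          | 1         | N            | T         | w   | 2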

Global Branch Prediction Scheme: Design Idea
- Figure: the paths, the directions of b1 and b2, and the actual outcome of b3
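
The benefit of indexing by the path through b1 and b2 can be illustrated by replaying the aa/bb example. This is a hedged sketch: the 20 input pairs are reconstructed from the table on the preceding slide and should be treated as an assumption, all counters are assumed to start in state 0, and the function names are hypothetical. It compares one 2-bit counter for b3 against one counter per path (A, B, C, D).

```python
# (aa, bb) values on entry to the code fragment; reconstructed, see lead-in.
INPUTS = [(0, 2), (2, 2), (2, 1), (2, 0), (2, 2), (1, 0), (1, 0), (2, 0),
          (0, 1), (1, 1), (1, 2), (1, 2), (2, 2), (2, 0), (0, 1), (2, 2),
          (0, 2), (0, 1), (1, 0), (2, 2)]


def run_fragment(aa, bb):
    """Return the directions of b1, b2, b3 (True = taken) for one execution."""
    b1 = aa != 2            # BNEZ taken skips "aa = 0"
    if not b1:
        aa = 0
    b2 = bb != 2            # BNEZ taken skips "bb = 0"
    if not b2:
        bb = 0
    b3 = aa == bb           # BEQZ taken when aa - bb == 0
    return b1, b2, b3


def bump(counter, taken):
    """2-bit saturating up-down update."""
    return min(3, counter + 1) if taken else max(0, counter - 1)


def simulate(inputs):
    single = 0              # one 2-bit counter for b3
    per_path = {}           # one 2-bit counter per (b1, b2) path
    single_correct = path_correct = 0
    for aa, bb in inputs:
        b1, b2, b3 = run_fragment(aa, bb)
        single_correct += (single >= 2) == b3
        single = bump(single, b3)
        counter = per_path.get((b1, b2), 0)
        path_correct += (counter >= 2) == b3
        per_path[(b1, b2)] = bump(counter, b3)
    return single_correct, path_correct


print(simulate(INPUTS))  # (3, 13) correct predictions out of 20
```

On this short trace the per-path counters are far from perfect, since only path A fully determines b3, but they still separate histories that a single counter mixes together.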

Pros/Cons of Two-Level Branch Prediction
- Pros:
  - Predicts correlated branch behavior that breaks other predictors
    - eqntott example
  - Better overall performance than purely address-based predictors
- Cons:
  - Interference between unrelated branches with the same history
    - Example: all loop-end branches map to the same entry in the pattern history table
  - Sometimes this is a good thing

Global Branch Prediction Scheme
- GAp(k) is also called the (k, n)-gselect predictor
  - The (0, n)-gselect predictor is the 2-bit (bimodal) predictor
  - The (M, 0)-gselect predictor is the GAg(M) predictor
- Notation: GAp(k)
  - G: “Global” BHR
  - A: adaptive
  - p: “per-address” PHT
  - k: BHR length
- Figure: GAp(4). The n-bit branch address (PC; n = full address) selects one of the per-address PHTs; the k-bit global BHR (k = 4) selects one of the 16 two-bit entries.
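
The two degenerate cases follow directly from how a (k, n)-gselect index is formed. The sketch below assumes the index is the n low-order address bits concatenated with the k most recent history bits; the function name and the choice of putting the address bits in the upper part of the index are assumptions.

```python
def gselect_index(pc, bhr, n, k):
    """Concatenate n low-order PC bits with k low-order BHR bits."""
    pc_bits = pc & ((1 << n) - 1)
    history_bits = bhr & ((1 << k) - 1)
    return (pc_bits << k) | history_bits


pc, bhr = 0b1011_0110, 0b0101

# (0, n)-gselect: no history bits, so the index is just the address bits,
# which is how a bimodal (2-bit BHT) predictor is indexed.
print(bin(gselect_index(pc, bhr, n=8, k=0)))  # 0b10110110

# (M, 0)-gselect: no address bits, so the index is just the global history,
# which is how GAg(M) is indexed.
print(bin(gselect_index(pc, bhr, n=0, k=4)))  # 0b101
```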

Global Branch Prediction Scheme: Global BHR / per-set PHTs
- Notation: GAs(k, 2^n)
  - G: “Global” BHR
  - A: adaptive
  - s: “per-set” PHT
  - k: BHR length
- Figure: GAs(4, 1024). The low n = 10 bits of the branch address (PC) select one of the per-set PHTs; the k-bit global BHR (k = 4) selects one of the 16 two-bit entries.

Global Branch Prediction Scheme: Gselect Two-Level Predictor (an alternative drawing), by S. T. Pan, 1992
- Figure: n bits of the branch address (PC) are concatenated with the k bits of the BHR, and the resulting n + k bits index the PHT

Global Branch Prediction Scheme: Gshare Two-Level Predictor, by Scott McFarling, 1993
- Figure: n bits of the branch address are XORed with the k bits of the BHR, and the result indexes the PHT
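
A gshare predictor can be sketched as a PHT of 2-bit counters indexed by the XOR of address bits and global history. This is an illustrative sketch, not the design from the slide: it assumes n = k (both equal to `index_bits`), counters initialized to 0, and hypothetical class and method names.

```python
class GsharePredictor:
    """Gshare sketch: PHT indexed by (PC bits XOR global history bits)."""

    def __init__(self, index_bits=8):
        self.mask = (1 << index_bits) - 1
        self.pht = [0] * (1 << index_bits)   # 2-bit saturating counters
        self.bhr = 0                         # global branch history register

    def index(self, pc):
        return (pc ^ self.bhr) & self.mask

    def predict(self, pc):
        return self.pht[self.index(pc)] >= 2

    def update(self, pc, taken):
        i = self.index(pc)
        self.pht[i] = min(3, self.pht[i] + 1) if taken else max(0, self.pht[i] - 1)
        # Shift the outcome into the history (shift left when updating).
        self.bhr = ((self.bhr << 1) | int(taken)) & self.mask


predictor = GsharePredictor(index_bits=8)
misses = 0
for step in range(200):
    taken = step % 2 == 0                    # alternating T, N, T, N, ...
    if predictor.predict(0x40) != taken:
        misses += 1
    predictor.update(0x40, taken)
print(misses)  # only a handful, all during warm-up
```

Once the history register has filled, the alternating branch sees only two distinct histories, and each maps to its own counter.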

Global Branch Prediction Scheme: Global BHR / Global PHT
- Notation: GAg(k)
  - G: “Global” BHR
  - A: adaptive
  - g: “global” PHT
  - k: BHR length
- Figure: GAg(k), the global branch predictor. The k-bit BHR indexes a PHT with 2^k two-bit entries.
  - BHR = Branch History Register (shifted left on update)
  - PHT = Pattern History Table (2-bit saturating up-down counters)

Global Branch Prediction Scheme: Global BHR / Global PHT
- Problem: two code sequences may produce the same bit pattern in the BHR and thus index the same entry in the PHT
- Figure: GAg(12). The global BHR holds the global branch path history (k = 12, shifted left on update, e.g. 111100001111) and indexes a global PHT with 4096 two-bit entries.
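
GAg(k) is small enough to sketch directly. The class below is a hypothetical illustration, assuming all counters and the BHR start at 0. It is run on a single alternating branch, the case where a plain 2-bit counter starting at 0 is wrong half the time.

```python
class GAgPredictor:
    """GAg(k) sketch: a k-bit global BHR indexes 2^k two-bit counters."""

    def __init__(self, k):
        self.mask = (1 << k) - 1
        self.pht = [0] * (1 << k)
        self.bhr = 0

    def predict(self):
        # The branch address is not used at all, which is why unrelated
        # code sequences with the same history share a PHT entry.
        return self.pht[self.bhr] >= 2

    def update(self, taken):
        c = self.pht[self.bhr]
        self.pht[self.bhr] = min(3, c + 1) if taken else max(0, c - 1)
        self.bhr = ((self.bhr << 1) | int(taken)) & self.mask


def run(predictor, outcomes):
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


alternating = [i % 2 == 0 for i in range(100)]   # T, N, T, N, ...
print(run(GAgPredictor(k=4), alternating))       # 4, all during warm-up
```

Because `predict` ignores the branch address, two different branches reached with the same 4-bit history would read and train the same counter.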

Global Branch Prediction Scheme: Gshare Two-Level Predictor, by Scott McFarling, 1993
- Figure: for sample branch address and BHR bit patterns, comparison of the PHT index formed by gselect 4/4 (n = 4 address bits concatenated with k = 4 BHR bits) with the hashed index formed by gshare 8/8 (n = 8 address bits XORed with k = 8 BHR bits)
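
The difference between the two index schemes can be seen by computing both 8-bit indices for a few address/history pairs. This is a sketch: the function names are hypothetical, and the sample bit patterns are illustrative values chosen here rather than values confirmed from the figure.

```python
def gselect_index(pc, bhr, n=4, k=4):
    """gselect n/k: low n address bits concatenated with low k history bits."""
    return ((pc & ((1 << n) - 1)) << k) | (bhr & ((1 << k) - 1))


def gshare_index(pc, bhr, bits=8):
    """gshare bits/bits: address bits XORed with history bits."""
    return (pc ^ bhr) & ((1 << bits) - 1)


# (branch address, global history), both 8 bits wide; illustrative values.
cases = [
    (0b00000000, 0b00000001),
    (0b00000000, 0b00000000),
    (0b11111111, 0b00000000),
    (0b11111111, 0b10000000),
]

print("address  history  gselect  gshare")
for pc, bhr in cases:
    print(f"{pc:08b} {bhr:08b} {gselect_index(pc, bhr):08b} {gshare_index(pc, bhr):08b}")
```

For the last two cases gselect 4/4 produces the same index, because it discards the upper four history bits, while gshare 8/8 keeps them apart with the same table size.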

Global Branch Prediction Scheme
- Strategy
  - Based on the combined history of all recent branches
  - Based on a shift register and a counter
- Works well
  - When the directions taken by sequentially executed branches are highly correlated
  - 11% additional accuracy (compared with the 2-bit scheme) at the extra hardware cost of one shift register
- Examples:
    if (x < 1) { ... }
    if (x > 1) { ... }

    if (a == 2) a = 0;    // b1
    if (b == 2) b = 0;    // b2
    if (a != b) { … }     // b3
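
Two-Level Adaptive Predictors: Variations

                       Global PHT    per-address PHTs    per-set PHTs
    Global BHR         GAg           GAp                 GAs
    per-address BHT    PAg           PAp                 PAs
    per-set BHT        SAg           SAp                 SAs

- Global BHR row: prediction based on the global history of multiple branches (correlation-based)
- Per-address and per-set BHT rows: prediction based on the local history of a single branch (pattern-based)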

Local Branch Prediction Scheme: per-address BHR / global PHT
- Figure: PAg(4). The N-bit branch address (PC; full address) selects an entry in the per-address BHT. That entry's k-bit history (k = 4, e.g. 1100) indexes a global PHT with 16 two-bit entries.

Local Branch Prediction Scheme: SAg(k) Two-Level Adaptive Predictor
- Figure: SAg(k). The low n bits of the branch address (PC) select one of 2^n entries in the BHT. That entry's k-bit history indexes a PHT with 2^k two-bit entries.
  - PHT = Pattern History Table (2-bit saturating up-down counters)
  - BHT = Branch History Table (shifted left on update)

Local Branch Prediction Scheme: SAg(k) Two-Level Adaptive Predictor
- Figure: SAg(4). The low n = 10 bits of the branch address (PC) select one of 1024 entries in the BHT. That entry's k-bit history (k = 4, e.g. 1100) indexes a PHT with 16 two-bit entries.

Local Branch Prediction Scheme: per-address BHR / per-address PHTs
- Figure: PAp(4). The N-bit branch address (PC; full address) of each branch (b1, b2) selects both its own entry in the per-address BHT and its own per-address PHT. The entry's k-bit history (k = 4, e.g. 1100) indexes one of the 16 two-bit entries in that PHT.

Local Branch Prediction Scheme
- Strategy: considers the history of each branch independently and takes advantage of repetitive patterns
- Works well: for branches with simple repetitive patterns
- Example:
    for (I = 1; I <= 4; I++) { }    // its pattern is (1110)^n
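
The (1110)^n pattern shows why local history helps. The sketch below is a hypothetical PAg(4)-style predictor, assuming a per-address history kept in a dictionary, a shared 16-entry PHT, and all state initialized to 0. It is compared with a single 2-bit counter on 100 repetitions of the loop.

```python
class PAgPredictor:
    """PAg(k) sketch: per-address k-bit histories index one shared PHT."""

    def __init__(self, k=4):
        self.mask = (1 << k) - 1
        self.histories = {}            # branch address -> k-bit local history
        self.pht = [0] * (1 << k)      # shared 2-bit saturating counters

    def predict(self, pc):
        return self.pht[self.histories.get(pc, 0)] >= 2

    def update(self, pc, taken):
        h = self.histories.get(pc, 0)
        self.pht[h] = min(3, self.pht[h] + 1) if taken else max(0, self.pht[h] - 1)
        self.histories[pc] = ((h << 1) | int(taken)) & self.mask


pattern = [True, True, True, False] * 100    # (1110) repeated 100 times

local = PAgPredictor(k=4)
local_misses = 0
for taken in pattern:
    if local.predict(0x400) != taken:
        local_misses += 1
    local.update(0x400, taken)

counter, bimodal_misses = 0, 0               # single 2-bit counter for contrast
for taken in pattern:
    if (counter >= 2) != taken:
        bimodal_misses += 1
    counter = min(3, counter + 1) if taken else max(0, counter - 1)

print(local_misses, bimodal_misses)          # 9 102
```

After warm-up the local history cycles through four values, each with a fixed next outcome, so the local predictor stops missing, while the 2-bit counter keeps missing the loop exit once per repetition.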

Performance Comparison @ SPEC89

Performance Comparison @ SPEC89 (continued)
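
Example of a Hybrid Prediction Algorithm: the Alpha 21264 Branch Predictor
- Figure: block diagram of the hybrid predictor
  - Local history prediction: SAg(10), a per-set BHT (n = 10) whose k = 10 history bits index a PHT
  - Global history prediction: GAg(12), a global BHR (k = 12) indexing a global PHT
  - Selector (1 12): chooses between the local and the global prediction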
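
The structure in the figure can be sketched as a tournament between a local and a global component. This is a simplified illustration, not the actual Alpha 21264 logic: it assumes 2-bit counters everywhere, a selector table indexed by the global BHR and trained only when the two components disagree, and low PC bits used directly as the BHT index. All names are hypothetical.

```python
def bump(counter, up):
    """2-bit saturating up-down update."""
    return min(3, counter + 1) if up else max(0, counter - 1)


class HybridPredictor:
    """Sketch: SAg(10) local + GAg(12) global + selector indexed by the BHR."""

    def __init__(self, set_bits=10, local_bits=10, global_bits=12):
        self.set_mask = (1 << set_bits) - 1
        self.local_mask = (1 << local_bits) - 1
        self.global_mask = (1 << global_bits) - 1
        self.local_bht = [0] * (1 << set_bits)      # per-set local histories
        self.local_pht = [0] * (1 << local_bits)
        self.global_pht = [0] * (1 << global_bits)
        self.selector = [0] * (1 << global_bits)    # >= 2 means "use global"
        self.global_bhr = 0

    def predict_and_update(self, pc, taken):
        s = pc & self.set_mask
        lh, gh = self.local_bht[s], self.global_bhr
        local_pred = self.local_pht[lh] >= 2
        global_pred = self.global_pht[gh] >= 2
        prediction = global_pred if self.selector[gh] >= 2 else local_pred

        if local_pred != global_pred:               # train toward the winner
            self.selector[gh] = bump(self.selector[gh], global_pred == taken)
        self.local_pht[lh] = bump(self.local_pht[lh], taken)
        self.global_pht[gh] = bump(self.global_pht[gh], taken)
        self.local_bht[s] = ((lh << 1) | int(taken)) & self.local_mask
        self.global_bhr = ((gh << 1) | int(taken)) & self.global_mask
        return prediction


hybrid = HybridPredictor()
outcomes = [True, True, True, False] * 1000
misses = sum(hybrid.predict_and_update(0x400, t) != t for t in outcomes)
print(misses)  # a small number, all during warm-up
```

The selector lets each branch context fall back on whichever component has been more accurate for it, which is the point of combining a pattern-based and a correlation-based predictor.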