Microprocessor Microarchitecture: Branch Predictor Optimization
Lynn Choi, School of Electrical Engineering
Global vs. Local History
- Global history schemes
  - Use the outcomes of the last k conditional branches encountered
  - Work well when the directions taken by sequentially executed branches are highly correlated
    - Ex) if (x > 1) then ... if (x <= 1) then ...
  - Also called correlating predictors
- Local history schemes
  - Use the last k occurrences of the same branch
  - Work well for branches with simple repetitive patterns
  - Two types of contention
    - The branch history may reflect a mix of the histories of all the branches that map to the same history entry
    - With 3 bits of history, the counters cannot distinguish the patterns 0110 and 1110: both reach history 110, but the next outcome differs
      - However, if the first pattern is executed many times and is then followed by the second pattern many times, the counters can dynamically adjust
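The second type of contention can be checked by hand. The sketch below is only an illustration: the helper name `next_outcomes` is hypothetical and not from the slides. It lists, for each repeating pattern, which outcome follows each 3-bit history window, and then collects the windows on which the two patterns disagree.

```python
def next_outcomes(pattern, history_bits):
    """Map each history window seen in a repeating pattern
    to the set of outcomes that follow that window."""
    n = len(pattern)
    table = {}
    for i in range(n):
        # Window of `history_bits` outcomes, wrapping around the pattern.
        window = "".join(pattern[(i + j) % n] for j in range(history_bits))
        following = pattern[(i + history_bits) % n]
        table.setdefault(window, set()).add(following)
    return table


a = next_outcomes("0110", 3)
b = next_outcomes("1110", 3)

# Histories that occur in both patterns but are followed by different outcomes.
conflicts = sorted(h for h in a.keys() & b.keys() if a[h] != b[h])
print(conflicts)
```

Any history in `conflicts` selects the same counter for both patterns while the correct prediction differs, which is why the counters have to retrain when execution moves from one pattern to the other.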
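Local History Structure
[Figure: the PC selects an entry in the History table; the selected history (e.g., 110) indexes the Counts array; the counter value (e.g., 11) gives "predict taken".]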
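The structure above can be sketched in a few lines. This is a minimal model under stated assumptions, not the slide author's implementation: the class name `LocalPredictor`, the table sizes, the use of low PC bits as the history-table index, and the weakly-not-taken initial counter state are all illustrative choices.

```python
class LocalPredictor:
    """Per-branch history selects a 2-bit saturating counter (sizes are assumptions)."""

    def __init__(self, history_bits=3, table_bits=10):
        self.history_mask = (1 << history_bits) - 1
        self.table_mask = (1 << table_bits) - 1
        self.histories = [0] * (1 << table_bits)   # one history per PC slot
        self.counters = [1] * (1 << history_bits)  # 1 = weakly not taken (assumed)

    def predict(self, pc):
        history = self.histories[pc & self.table_mask]
        return self.counters[history] >= 2         # 10 or 11 means predict taken

    def update(self, pc, taken):
        slot = pc & self.table_mask
        history = self.histories[slot]
        c = self.counters[history]
        self.counters[history] = min(3, c + 1) if taken else max(0, c - 1)
        # Shift the new outcome into this branch's history.
        self.histories[slot] = ((history << 1) | int(taken)) & self.history_mask


def count_hits(predictor, pc, outcomes):
    """Predict, then train, on each outcome; return the number of correct predictions."""
    hits = 0
    for taken in outcomes:
        hits += predictor.predict(pc) == taken
        predictor.update(pc, taken)
    return hits


p = LocalPredictor(history_bits=3)
pattern = [True, True, False]                    # simple repetitive pattern
count_hits(p, 0x40, pattern * 10)                # warm-up
steady = count_hits(p, 0x40, pattern * 5)        # steady-state hits out of 15
```

With 3 bits of history, every window of this taken/taken/not-taken loop is followed by a single outcome, so the counters settle and the steady-state predictions are all correct.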
Global History Structure
[Figure: the global history register (GHR) indexes an array of 2-bit counters; the counter value (e.g., 11) gives "predict taken".]
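A comparable sketch for the global scheme follows. The names `GlobalPredictor` and `correlated_stream`, the 2-bit history length, and the initial counter state are assumptions made for illustration. The trace models the correlated pair from the earlier example, `if (x > 1)` followed by `if (x <= 1)`, with x alternating between 2 and 0.

```python
class GlobalPredictor:
    """A single global history register (GHR) indexes 2-bit counters (sizes are assumptions)."""

    def __init__(self, history_bits=2):
        self.mask = (1 << history_bits) - 1
        self.ghr = 0
        self.counters = [1] * (1 << history_bits)  # 1 = weakly not taken (assumed)

    def predict(self):
        return self.counters[self.ghr] >= 2

    def update(self, taken):
        c = self.counters[self.ghr]
        self.counters[self.ghr] = min(3, c + 1) if taken else max(0, c - 1)
        # Every conditional branch shifts its outcome into the shared history.
        self.ghr = ((self.ghr << 1) | int(taken)) & self.mask


def correlated_stream(xs):
    """Outcomes of `if (x > 1)` followed by `if (x <= 1)` for each x."""
    stream = []
    for x in xs:
        stream += [x > 1, x <= 1]
    return stream


def count_hits(predictor, outcomes):
    hits = 0
    for taken in outcomes:
        hits += predictor.predict() == taken
        predictor.update(taken)
    return hits


g = GlobalPredictor(history_bits=2)
count_hits(g, correlated_stream([2, 0] * 10))          # warm-up
steady = count_hits(g, correlated_stream([2, 0] * 5))  # steady-state hits out of 20
```

This trace is deliberately regular so that the result is deterministic. With data-dependent x, the second branch still correlates with the first, but the first branch shares the same counters and can disturb them.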
Global/Local/Bimodal Performance
Global Predictors with Index Sharing
- Global predictor with index selection (gselect)
  - The counter array is indexed with a concatenation of global history bits and branch address bits
  - For small sizes, gselect parallels bimodal prediction
  - Once there are enough address bits to identify most branches, more global history bits can be used, resulting in much better performance than the global predictor
- Global predictor with index sharing (gshare)
  - The counter array is indexed with a hash (XOR) of the branch address and the global history
    - This eliminates redundancy in the counter index used by gselect
Gshare vs. Gselect
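Gshare/Gselect Structure
[Figure: the GHR and the PC feed two index paths, an XOR for gshare and a concatenation (m+n bits) for gselect; the selected counter (e.g., 11) gives "predict taken".]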
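The two index computations can be compared directly. The function names and bit widths below are illustrative assumptions, and the gshare sketch uses equal-width address and history fields for simplicity.

```python
def gselect_index(pc, ghr, addr_bits, hist_bits):
    """Concatenate the low history bits with the low address bits."""
    addr = pc & ((1 << addr_bits) - 1)
    hist = ghr & ((1 << hist_bits) - 1)
    return (hist << addr_bits) | addr


def gshare_index(pc, ghr, index_bits):
    """XOR the address with the global history, truncated to the index width."""
    return (pc ^ ghr) & ((1 << index_bits) - 1)


# Both schemes index a 16-entry counter array here.
# gselect keeps only 2 history bits, so these two histories collide:
same = gselect_index(0b0000, 0b0001, 2, 2) == gselect_index(0b0000, 0b0101, 2, 2)
# gshare folds all 4 history bits into the index, so they stay distinct:
distinct = gshare_index(0b0000, 0b0001, 4) != gshare_index(0b0000, 0b0101, 4)
```

For the same table size, gshare can use more history bits and more address bits than gselect, at the cost of occasional aliasing when different address and history pairs XOR to the same index.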
Global History with Index Sharing Performance
McFarling, Scott, 1993, Combining Branch Predictors, WRL Technical Note TN-36, Western Research Laboratory, California, USA
Combined Predictor Structure
- These are also called tournament predictors
  - Adaptively combine global and local predictors
McFarling, Scott, 1993, Combining Branch Predictors, WRL Technical Note TN-36, Western Research Laboratory, California, USA
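A minimal sketch of the combining idea follows, assuming a chooser table of 2-bit counters indexed by low PC bits that is trained only when the two components disagree. The class name, table sizes, and initial states are illustrative assumptions, not details taken from the slides.

```python
def bump(counter, up):
    """Saturating 2-bit counter update (range 0..3)."""
    return min(3, counter + 1) if up else max(0, counter - 1)


class TournamentPredictor:
    """Global and local components plus a per-branch chooser (sizes are assumptions)."""

    def __init__(self, index_bits=6, history_bits=4):
        self.idx_mask = (1 << index_bits) - 1
        self.hist_mask = (1 << history_bits) - 1
        self.ghr = 0
        self.global_ctr = [1] * (1 << history_bits)
        self.local_hist = [0] * (1 << index_bits)
        self.local_ctr = [1] * (1 << history_bits)
        self.chooser = [1] * (1 << index_bits)     # >= 2 means "use global"

    def _components(self, pc):
        slot = pc & self.idx_mask
        g = self.global_ctr[self.ghr] >= 2
        l = self.local_ctr[self.local_hist[slot]] >= 2
        return slot, g, l

    def predict(self, pc):
        slot, g, l = self._components(pc)
        return g if self.chooser[slot] >= 2 else l

    def update(self, pc, taken):
        slot, g, l = self._components(pc)
        if g != l:
            # Move the chooser toward whichever component was right.
            self.chooser[slot] = bump(self.chooser[slot], g == taken)
        self.global_ctr[self.ghr] = bump(self.global_ctr[self.ghr], taken)
        lh = self.local_hist[slot]
        self.local_ctr[lh] = bump(self.local_ctr[lh], taken)
        self.ghr = ((self.ghr << 1) | int(taken)) & self.hist_mask
        self.local_hist[slot] = ((lh << 1) | int(taken)) & self.hist_mask


def run(predictor, trace):
    """Predict, then train, on each (pc, taken) pair; return the number of hits."""
    hits = 0
    for pc, taken in trace:
        hits += predictor.predict(pc) == taken
        predictor.update(pc, taken)
    return hits


# Branch at pc=1 alternates; branch at pc=2 is always taken.
period = [(1, True), (2, True), (1, False), (2, True)]
t = TournamentPredictor()
run(t, period * 20)              # warm-up
steady = run(t, period * 5)      # steady-state hits out of 20
```

On this small trace both components converge, so the chooser's state does not matter. The chooser starts to matter when one component is systematically better for a given branch.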
Combined Predictor Performance
Scott McFarling, DEC WRL, All rights reserved
McFarling, Scott, 1993, Combining Branch Predictors, WRL Technical Note TN-36, Western Research Laboratory, California, USA
Exercises and Discussion
- Intel's XScale processor uses a bimodal predictor. What state would you initialize it to?
- Y/N questions. Explain why.
  - Branch prediction is more important for FP applications. (Y/N) Why or why not?
  - Branch prediction is more difficult for conditional branches than for indirect branches. (Y/N) Why or why not?
  - To predict branch targets, an instruction must be decoded first. (Y/N) Why or why not?
  - The RSB stores the target addresses of call instructions. (Y/N) Why or why not?
  - At the beginning of program execution, static branch prediction is more effective than dynamic branch prediction. (Y/N) Why or why not?