
TAGE-SC-L Branch Predictors
André Seznec
INRIA/IRISA

The TAGE-SC-L branch predictor
Sorry, nothing really new.
• TAGE, JILP 2006
  § Considered the state-of-the-art global history predictor
• Can be augmented with small adjunct predictors
  § Loop predictor: CBP-2 (2006)
  § Statistical Corrector + Loop Predictor, global history: CBP-3 (2011)
  § Local history: MICRO 2011

Optimized all parameters
• Number, size, width of the tables
• Types of the histories for the statistical components
All that to reduce the number of mispredictions by 3 %!!

[Figure: TAGE-SC-L block diagram. Labels: PC + global history; global, local, skeleton histories; (main) TAGE predictor; prediction + confidence; Stat. Cor.; loop predictor.]

TAGE: multiple tables, global history predictor
The set of history lengths forms a geometric series: {0, 2, 4, 8, 16, 32, 64, 128}
Capture correlation on very long histories
Most of the storage for short histories!!
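A minimal sketch of how such a geometric series of history lengths could be generated. The function name and the rounding rule are assumptions made for illustration; real predictors pick the minimum and maximum lengths per storage budget.

```python
from typing import List


def geometric_history_lengths(l_min: int, l_max: int, n_tables: int) -> List[int]:
    """Return n_tables history lengths growing geometrically from l_min to l_max.

    Hypothetical helper: the rounding (add 0.5, truncate) is an assumption.
    """
    if n_tables < 2:
        return [l_min]
    # Common ratio such that the last table uses exactly l_max bits of history.
    ratio = (l_max / l_min) ** (1.0 / (n_tables - 1))
    return [int(l_min * ratio ** i + 0.5) for i in range(n_tables)]


# The base predictor uses no history (length 0); the tagged tables use the series.
lengths = [0] + geometric_history_lengths(2, 128, 7)
print(lengths)  # [0, 2, 4, 8, 16, 32, 64, 128]
```

With equally sized tables, each short-history table gets as much storage as each long-history one, which is one way to read the remark that most of the storage goes to short histories.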

TAGE: tagged tables and prediction by the longest history matching entry
[Figure: a tagless base predictor indexed by pc, and tagged tables indexed by pc and h[0:L1], h[0:L2], h[0:L3]. Each tagged entry holds ctr, tag, u; a tag comparison (=?) per table selects the prediction.]

[Figure: example lookup with one tag comparison (=?) per table. Labels: Miss; Hit (Pred); Hit (Altpred).]

Prediction computation
• General case:
  § Longest matching component provides the prediction
• Special case:
  § Many mispredictions on newly allocated entries: weak Ctr
  § On many applications, Altpred more accurate than Pred
  § Property dynamically monitored through 4-bit counters
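The selection rule above can be sketched as follows. This is an illustration under stated assumptions, not the championship code: the names `TaggedEntry`, `tage_predict` and `use_alt_on_na` are hypothetical, counters are modeled as signed values (taken when >= 0), and a single monitoring counter stands in for the 4-bit counters.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TaggedEntry:
    ctr: int    # signed 3-bit prediction counter in [-4, 3]; >= 0 means taken (assumed encoding)
    tag: int    # partial tag
    u: int = 0  # 2-bit useful counter


def is_weak(ctr: int) -> bool:
    """A counter next to the taken/not-taken boundary, as after a fresh allocation."""
    return ctr in (-1, 0)


def tage_predict(base_taken: bool,
                 hits: List[Optional[TaggedEntry]],
                 use_alt_on_na: int) -> bool:
    """hits[i] is the tag-matching entry of tagged table i (short to long history), or None.

    use_alt_on_na is an assumed signed monitoring counter: when it is >= 0,
    Altpred has recently been more accurate than Pred on weak entries.
    """
    matching = [e for e in hits if e is not None]
    if not matching:
        return base_taken                   # no tag hit: tagless base predictor
    provider = matching[-1]                 # longest matching history
    alt_taken = matching[-2].ctr >= 0 if len(matching) > 1 else base_taken
    if is_weak(provider.ctr) and use_alt_on_na >= 0:
        return alt_taken                    # special case: trust Altpred
    return provider.ctr >= 0                # general case: trust Pred
```

For example, a weak provider entry is overridden by the base prediction only while the monitoring counter is non-negative.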

A tagged table entry
• Ctr: 3-bit prediction counter
• U: 2-bit counter
  § Was the entry recently useful?
• Tag: partial tag
[Figure: entry layout with fields U, Tag, Ctr.]

Allocate entries on mispredictions
• Allocate entries in longer history length tables
  § On tables with U unset
• Set Ctr to Weak and U to 0
• Limited storage budget:
  § Allocate 2 entries for 256 Kbits
  § Allocate 1 or 2 for 32 Kbits
• Unlimited storage budget:
  § Multiple entries allocated in different tables
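A simplified sketch of the allocation policy above. The function name, the dictionary layout of an entry and the in-order scan are assumptions for illustration; actual implementations typically randomize which longer-history table is tried first.

```python
from typing import Dict, List


def allocate_on_misprediction(entries: List[Dict[str, int]],
                              new_tags: List[int],
                              provider: int,
                              taken: bool,
                              max_alloc: int) -> List[int]:
    """Allocate up to max_alloc entries in tables using longer histories than the provider.

    entries[i] is the entry this branch maps to in tagged table i (short to long history).
    provider is the index of the table that gave the prediction, or -1 for the base predictor.
    Returns the indices of the tables where an entry was allocated.
    """
    allocated: List[int] = []
    for i in range(provider + 1, len(entries)):
        if len(allocated) == max_alloc:
            break
        if entries[i]["u"] == 0:               # only steal entries with U unset
            entries[i]["tag"] = new_tags[i]
            entries[i]["ctr"] = 0 if taken else -1  # weak counter in the outcome direction
            entries[i]["u"] = 0
            allocated.append(i)
    return allocated


# Usage: 4 tagged tables, table 1 holds a useful entry, provider was table 0.
tables = [{"tag": 7, "ctr": 2, "u": 1}, {"tag": 9, "ctr": -3, "u": 1},
          {"tag": 4, "ctr": 1, "u": 0}, {"tag": 5, "ctr": -2, "u": 0}]
print(allocate_on_misprediction(tables, [10, 11, 12, 13], 0, False, 2))  # [2, 3]
```

Here `max_alloc` plays the role of the budget-dependent limit: 2 for 256 Kbits, 1 or 2 for 32 Kbits.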

Managing the (U)seful counter
• Increment when avoids a misprediction
  § (Pred = taken) & (Alt ≠ taken)
• 256 K: Global decrement if « difficult » to allocate
• 32 K: Probabilistic decrement when conflict
• Unlimited: don't care
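One way the increment rule and the global decrement could look in code. Everything named here is hypothetical: `update_useful`, the `GlobalAger` class, its `threshold`, and the way « difficult » is measured (allocation failures minus successes) are assumptions chosen to keep the sketch short.

```python
from typing import List

U_MAX = 3  # saturation value of a 2-bit useful counter


def update_useful(u: int, pred_taken: bool, alt_taken: bool, taken: bool) -> int:
    """Increment U when the provider was right and Altpred would have been wrong."""
    if pred_taken == taken and alt_taken != taken:
        return min(u + 1, U_MAX)
    return u


class GlobalAger:
    """Assumed mechanism: decrement every U once allocations have failed too often."""

    def __init__(self, threshold: int = 1024) -> None:
        self.threshold = threshold  # hypothetical tuning parameter
        self.tick = 0

    def record(self, failures: int, successes: int, all_u: List[int]) -> bool:
        """Account for one allocation attempt; return True if a global decrement happened."""
        self.tick = max(0, self.tick + failures - successes)
        if self.tick < self.threshold:
            return False
        for i, u in enumerate(all_u):
            all_u[i] = max(0, u - 1)
        self.tick = 0
        return True
```

The probabilistic decrement used at 32 Kbits is not shown; it would replace the global sweep with a random decrement of the conflicting entry.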

Adjunct predictors
• TAGE tracks strong correlation with the global branch history
• Small adjunct predictors to capture some missed correlation:
  § Loop predictor
  § Statistical Corrector

The loop predictor
• Predict loops with a constant number of iterations:
  § 16/32 entries, less than 5 bytes per entry
  § Capture loops with long bodies and/or irregular internal branches
  § S: 1.2 %  M: 1 %  U: 0.4 %
• Good tradeoff for the Championship
• Implementation: Not that great
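A minimal sketch of a single loop-predictor entry for a branch that is taken a constant number of times and then exits. The field names, the confidence threshold `CONF_MAX` and the retraining rule are assumptions for illustration; tagging, replacement and field widths are left out.

```python
from dataclasses import dataclass
from typing import Optional

CONF_MAX = 3  # hypothetical: identical trip counts required before predicting


@dataclass
class LoopEntry:
    trip_count: int = 0   # taken iterations seen before the last exit
    current: int = 0      # taken iterations seen in the current execution
    confidence: int = 0


def loop_predict(e: LoopEntry) -> Optional[bool]:
    """Return a prediction only when the trip count has proven stable, else None."""
    if e.confidence < CONF_MAX:
        return None
    return e.current < e.trip_count  # taken inside the loop, not taken at the exit


def loop_update(e: LoopEntry, taken: bool) -> None:
    if taken:
        e.current += 1
        if e.confidence > 0 and e.current > e.trip_count:
            e.confidence = 0          # loop ran longer than learned: retrain
        return
    # Not taken: the loop exits here.
    if e.trip_count > 0 and e.current == e.trip_count:
        e.confidence = min(e.confidence + 1, CONF_MAX)
    else:
        e.trip_count = e.current      # learn the new trip count
        e.confidence = 0
    e.current = 0
```

When `loop_predict` returns None the main TAGE prediction would be used unchanged.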

The Statistical Corrector predictor
• Branches with poor correlation with global history:
  § Sometimes better predicted by a single wide PC-indexed counter than by TAGE
• More generally, track cases such that:
  § « In this case (PC, history, prediction), TAGE is likely (> 50 %) to mispredict »

Small predictor: very limited budget for the SC predictor
• Just track the statistically PC-biased branches
  § « TAGE predicts this direction on this branch, but in most cases this was wrong »
• The corrector filter:
  § A small partially tagged associative table
  § 1.5 % misp. reduction
  § Much simpler than a loop predictor

Medium predictor
« Statistically » correlated branches:
• Not strongly correlated with the global history, but exhibit a bias
• Better predicted by averaging (neural) than by tags
Branches correlated with local history, but irregular global history pattern (on other branches):
• TAGE does not learn the pattern

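Multi-GEHL Statistical Corrector predictor
[Figure: TAGE, indexed with PC and global history H, passes its prediction + ctr value to a GEHL-like Stat. Corr. whose tables are fed by PC, H, local history LH and Pred; their outputs are summed (+).]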

Why does it work
• The bias table indexed with PC + TAGE output:
  § Correct (most of the time): high counter value, dominates, not many updates
  § Wrong: other counters can be trained, correlation (if it exists) can be captured
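A sketch of a GEHL-like sum that shows why a saturated bias counter dominates. The function name, the `threshold` parameter and the `2*c + 1` centering of signed counters are assumptions borrowed from GEHL-style predictors in general; table indexing and training are omitted.

```python
from typing import List


def sc_predict(tage_taken: bool,
               bias_ctr: int,
               other_ctrs: List[int],
               threshold: int) -> bool:
    """Return the final direction after the statistical corrector.

    bias_ctr is read from the table indexed with PC + TAGE output; other_ctrs
    are read from tables indexed with PC and global or local history.
    Counters are signed; each contributes 2*c + 1 so that no counter is neutral.
    """
    total = sum(2 * c + 1 for c in [bias_ctr] + list(other_ctrs))
    sc_taken = total >= 0
    # Revert TAGE only when the corrector disagrees with enough confidence.
    if sc_taken != tage_taken and abs(total) >= threshold:
        return sc_taken
    return tage_taken


# A saturated bias counter agreeing with TAGE outweighs two weakly opposing counters.
print(sc_predict(True, 15, [-2, -1], 10))  # True
```

When TAGE is usually wrong for a given (PC, prediction) pair, the bias counter drifts the other way and the sum starts to revert the prediction.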

Multi-GEHL Statistical Corrector predictor for the Championship
+ RAS associated history
+ 2 different local histories
+ simple chooser
6.8 % misp. reduction
[Figure: same diagram as before. Labels: H; TAGE; PC; local hist.; prediction + ctr value; Stat. Corr.]

« Realistic » 256 Kbits TAGE-SC-L
« Only »
• 12 equal size TAGE tables
• + (local hist., global hist.) 4-table SC
• + loop predictor
• No history tuning
Only 2.8 % extra mispredictions

SC for unlimited predictor
• GEHL-based SC predictor:
  § Use any form of history information
  § Very long global
  § Multiple local
  § « Skeleton » global history: ignore some branches
• Recycle old ideas from the MAC-RHSP predictor (2004)

SC for unlimited predictor
• 460 predictor tables + 10 chooser tables
  § Globally about 20 % less misp. than TAGE alone
• If one removes only:
  § The bias: 1.6 % for a single table
  § All global history components: 3.7 %
  § All local history components: 3.9 %
  § The chooser: 3.2 %

Conclusion
• TAGE-SC-L fits (nearly) all storage sizes
  § 32 Kbits ≈ 64 Kbits CBP-1 champion on CBP-1 traces
  § 256 Kbits ≈ 512 Kbits CBP-3 champion on CBP-4 traces
• Unlimited predictor:
  § poTAGE-SC does better