A 256 Kbits L-TAGE branch predictor
André Seznec, Caps Team, IRISA/INRIA/HIPEAC
Directly derived from: A. Seznec and P. Michaud, "A case for (partially) tagged branch predictors", JILP, Feb. 2006
Plus two tricks:
- Loop predictor
- Kernel/user histories
TAGE: TAgged GEometric history length predictors
The genesis
Back around 2003
- 2bcgskew was state-of-the-art, but it was lagging behind neural-inspired predictors on a few benchmarks
- Just wanted to get the best of both behaviors and maintain:
  - Reasonable implementation cost:
    - Use only global history
    - Medium number of tables
  - In-time response
The basis: a multiple-length global history predictor
[Figure: tables T0 to T4, indexed with global history lengths L(0) to L(4); how to combine their predictions is left open ("?")]
GEometric History Length predictor
- The set of history lengths forms a geometric series: {0, 2, 4, 8, 16, 32, 64, 128}
- Captures correlation on very long histories
- Most of the storage is devoted to short histories
- What is important: L(i) - L(i-1) increases drastically with i
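As a rough illustration of the geometric series on this slide, the lengths can be generated from a minimum length, a maximum length and a number of tables. This is a minimal sketch: the function name `geometric_history_lengths` and the round-to-nearest step are assumptions made for the example, not details taken from the slides.

```python
def geometric_history_lengths(l_min, l_max, n_tables):
    """Return n_tables history lengths forming a geometric series.

    The first length is l_min and the last is l_max; intermediate
    lengths are rounded to the nearest integer (an assumption here).
    """
    if n_tables < 2:
        return [l_min]
    ratio = (l_max / l_min) ** (1.0 / (n_tables - 1))
    return [int(l_min * ratio ** i + 0.5) for i in range(n_tables)]


# The example series from the slide; 0 is the history length of T0.
print([0] + geometric_history_lengths(2, 128, 7))
# → [0, 2, 4, 8, 16, 32, 64, 128]
```

The same helper can be called with other bounds, for instance a minimum of 4 and a maximum of 640 over 12 tables, to see how quickly L(i) - L(i-1) grows.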
Combining multiple predictions?
- Classical solution:
  - Use a meta-predictor: "wasting" storage, and how to choose among 5 or 10 predictions?
- Neural-inspired predictors (Jimenez and Lin, 2001):
  - Use an adder tree instead of a meta-predictor
- Partial matching (Chen et al., 1996; Michaud, 2005):
  - Use tagged tables and the longest matching history
CBP-1 (2004): OGEHL
- Final computation through a sum: Prediction = Sign
- 12 components: 3.670 misp/KI
[Figure: tables T0 to T4, indexed with history lengths L(0) to L(4), feed an adder (∑)]
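The "sum then sign" computation can be sketched in a few lines. This is only an illustration: the function name `gehl_predict`, the counter range, and the choice to treat a zero sum as taken are assumptions of the sketch.

```python
def gehl_predict(counters):
    """Return True (taken) when the sum of the signed counters is non-negative.

    counters: one signed saturating counter value per table, e.g. in [-8, 7].
    Treating a zero sum as taken is an assumption of this sketch.
    """
    return sum(counters) >= 0


print(gehl_predict([3, -1, -4, 2, 1]))   # sum is 1
print(gehl_predict([-2, -1, 0, 1, -3]))  # sum is -5
```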
TAGE: geometric history length + PPM-like + optimized update policy
[Figure: a tagless base predictor indexed with the pc, and tagged tables indexed with hashes of the pc and of the global histories h[0:L1], h[0:L2], h[0:L3]; each tagged entry holds a counter (ctr), a tag and a useful counter (u); tag comparators (=?) drive the multiplexers that select the final prediction]
[Figure: tag-match example. One tagged component misses and two hit; Pred comes from the hitting component with the longest history, Altpred from the other hit]
Prediction computation
- General case:
  - The longest matching component provides the prediction
- Special case:
  - Many mispredictions occur on newly allocated entries (weak Ctr)
  - On many applications, Altpred is then more accurate than Pred
  - This property is dynamically monitored through a single 4-bit counter
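The selection between Pred and Altpred described above might be sketched as follows. The names `Entry`, `tage_predict` and `use_alt_on_weak` are hypothetical, and so are the signed counter encoding (ctr >= 0 means taken), the definition of "weak" as ctr in {-1, 0}, and the threshold on the monitoring counter; the slides only state that a single 4-bit counter monitors the property.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    tag: int
    ctr: int    # assumed signed 3-bit counter in [-4, 3]; ctr >= 0 predicts taken
    u: int = 0  # 2-bit useful counter


def tage_predict(base_taken, hits, use_alt_on_weak):
    """Return (final prediction, altpred).

    base_taken: prediction of the tagless base predictor.
    hits: entries whose tag matched, ordered from shortest to longest history.
    use_alt_on_weak: the monitoring counter, modeled as a signed value;
        >= 0 means Altpred is currently trusted more than a weak Pred.
    """
    if not hits:
        return base_taken, base_taken
    provider = hits[-1]  # longest matching history
    pred = provider.ctr >= 0
    altpred = hits[-2].ctr >= 0 if len(hits) > 1 else base_taken
    weak = provider.ctr in (-1, 0)
    if weak and use_alt_on_weak >= 0:
        return altpred, altpred
    return pred, altpred


hits = [Entry(tag=1, ctr=-3), Entry(tag=2, ctr=0)]
print(tage_predict(True, hits, use_alt_on_weak=3))   # weak provider, Altpred used
print(tage_predict(True, hits, use_alt_on_weak=-1))  # weak provider, Pred used
```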
TAGE update policy
- General principle: minimize the footprint of the prediction
  - Just update the longest-history matching component, and allocate at most one entry on a misprediction
A tagged table entry
- Ctr: 3-bit prediction counter
- U: 2-bit useful counter
  - Was the entry recently useful?
- Tag: partial tag
[Figure: entry layout with fields U, Tag, Ctr]
Updating the U counter
- If (Altpred ≠ Pred) then (taken = actual branch outcome):
  - Pred = taken: U = U + 1
  - Pred ≠ taken: U = U - 1
- Graceful aging: periodic shift of all U counters
  - Implemented through the reset of a single bit
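The U counter rule can be written down directly. In this sketch, `update_useful` and `age_useful` are hypothetical names, saturation at 0 and 3 is assumed from the 2-bit width, and the aging step is modeled as a one-bit right shift, which is one possible reading of "periodic shift".

```python
U_MAX = 3  # largest value of a 2-bit useful counter


def update_useful(u, pred, altpred, taken):
    """Update U only when Pred and Altpred disagree."""
    if pred != altpred:
        if pred == taken:
            return min(u + 1, U_MAX)  # the provider entry was useful
        return max(u - 1, 0)          # Altpred would have been right
    return u


def age_useful(counters):
    """Graceful aging, modeled as a right shift of every U counter."""
    return [u >> 1 for u in counters]


print(update_useful(1, pred=True, altpred=False, taken=True))   # → 2
print(update_useful(1, pred=True, altpred=False, taken=False))  # → 0
print(age_useful([3, 2, 1, 0]))                                 # → [1, 1, 0, 0]
```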
Allocating a new entry on a misprediction
- Find a single "useless" entry with a longer history:
  - Privilege the smallest possible history
    - To minimize footprint
  - But not too much
    - To avoid ping-pong phenomena
- Initialize Ctr as weak and U as zero
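One way to favor short histories "but not too much" is a randomized choice among the useless candidates. The sketch below is an assumption-laden illustration: `pick_victim` and `new_entry` are hypothetical names, "useless" is taken to mean U == 0, and the 1/2 probability of stopping at each candidate is a choice made for the example, not a value given on the slide.

```python
import random


def pick_victim(useful, provider, rng=random):
    """Choose the tagged table in which to allocate after a misprediction.

    useful[j]: U counter of the entry selected in tagged table j, with
        tables ordered by increasing history length.
    provider: index of the table that provided the prediction, or -1 when
        the base predictor provided it.
    Returns a table index, or None when no candidate entry has U == 0.
    """
    candidates = [j for j in range(provider + 1, len(useful)) if useful[j] == 0]
    if not candidates:
        return None
    # Favor the shortest history, but skip it half of the time (assumed ratio).
    for j in candidates[:-1]:
        if rng.random() < 0.5:
            return j
    return candidates[-1]


def new_entry(tag, taken):
    """Freshly allocated entry: weak counter in the outcome direction, U = 0."""
    return {"tag": tag, "ctr": 0 if taken else -1, "u": 0}


print(pick_victim([0, 1, 0], provider=0))  # only table 2 qualifies
print(new_entry(tag=0x2A, taken=False))
```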
Improve the global history
- Address + conditional branch history:
  - Path confusion on short histories
- Address + path:
  - Direct hashing leads to path confusion
1. Represent all branches in branch history
2. Also use path history (1 bit per branch, limited to 16 bits)
Design tradeoff for CBP-2 (1)
- 13 components:
  - Bring the best accuracy on distributed traces
    - 8 components not very far!
- History length:
  - Min = 4, Max = 640
  - Could use any Min in [2, 6] and any Max in [300, 2000]
Design tradeoff for CBP-2 (2)
- Tag width tradeoff:
  - (Destructive) false match is better tolerated on shorter histories
  - 7 bits on T1 to 15 bits on T12
- Tuning the number of table entries:
  - Smaller number for very long histories
  - Smaller number for very short histories
Adding a loop predictor
- The loop predictor captures the number of iterations of a loop
  - When it successively encounters the same number of iterations 4 times, the loop predictor provides the prediction
- Advantages:
  - Very reliable
  - Small storage budget: 256 52-bit entries
- Complexity?
  - Might be difficult to manage speculative iteration numbers on deep pipelines
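The confidence rule of the loop predictor can be sketched for a single entry. This is a simplified model under stated assumptions: the class `LoopEntry` and its field names are hypothetical, tags and the 52-bit entry layout are ignored, speculative state is not modeled, and the loop branch is assumed to be taken on every iteration except the last.

```python
CONFIDENCE_THRESHOLD = 4  # same iteration count seen 4 times in a row


class LoopEntry:
    def __init__(self):
        self.trip_count = 0  # branch executions in the last completed loop run
        self.current = 0     # branch executions so far in the ongoing run
        self.confidence = 0  # consecutive runs with the same trip count

    def predict(self):
        """Return True/False once confident, otherwise None (no prediction)."""
        if self.confidence < CONFIDENCE_THRESHOLD:
            return None
        return self.current + 1 < self.trip_count  # not taken on the last one

    def update(self, taken):
        self.current += 1
        if taken:
            return
        # Loop exit: compare this run with the recorded trip count.
        if self.current == self.trip_count:
            self.confidence = min(self.confidence + 1, CONFIDENCE_THRESHOLD)
        else:
            self.trip_count = self.current
            self.confidence = 1
        self.current = 0


entry = LoopEntry()
run = [True, True, True, True, False]  # a loop with 5 iterations
for _ in range(4):
    for outcome in run:
        entry.update(outcome)
predictions = []
for outcome in run:
    predictions.append(entry.predict())
    entry.update(outcome)
print(predictions)  # → [True, True, True, True, False]
```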
Using a kernel history and a user history
- Traces mix user and kernel activities:
  - Kernel activity after exception
    - Global history pollution
- Solution: use two separate global histories
  - User history is updated only in user mode
  - Kernel history is updated in both modes
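The two update rules can be modeled with two shift registers. In this sketch the class `DualHistory` and its method names are hypothetical, histories are stored as Python integers, and selecting the history by the current mode in `current` is an assumption about how the two registers would be used at prediction time.

```python
class DualHistory:
    def __init__(self, length=640):
        self.mask = (1 << length) - 1  # keep only the last `length` outcomes
        self.user = 0
        self.kernel = 0

    def update(self, taken, kernel_mode):
        bit = int(taken)
        # Kernel history is updated in both modes.
        self.kernel = ((self.kernel << 1) | bit) & self.mask
        # User history is updated only in user mode.
        if not kernel_mode:
            self.user = ((self.user << 1) | bit) & self.mask

    def current(self, kernel_mode):
        """History used for prediction in the given mode (assumed policy)."""
        return self.kernel if kernel_mode else self.user


h = DualHistory(length=8)
h.update(True, kernel_mode=False)
h.update(False, kernel_mode=True)
h.update(True, kernel_mode=True)
print(bin(h.user), bin(h.kernel))  # → 0b1 0b101
```

Kernel activity after an exception therefore leaves the user history untouched, which is the pollution the slide aims to avoid.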
L-TAGE submission accuracy (distributed traces)
3.314 misp/KI
Reducing L-TAGE complexity
- Included 241.5 Kbits TAGE predictor:
  - 3.368 misp/KI
  - Loop predictor beneficial only on gzip: might not be worth the extra complexity
Using fewer tables
- 8-component 256 Kbits TAGE predictor:
  - 3.446 misp/KI
TAGE prediction computation time?
- 3 successive steps:
  - Index computation
  - Table read
  - Partial match + multiplexor
- Does not fit in a single cycle:
  - But can be ahead pipelined!
Ahead pipelining a global history branch predictor (principle)
- Initiate branch prediction X+1 cycles in advance to provide the prediction in time
  - Use the information available:
    - X-block ahead instruction address
    - X-block ahead history
- To ensure accuracy:
  - Use intermediate path information
Practice
- Ahead pipelined TAGE: 4 parallel prediction computations
[Figure: blocks A, B, C]
3-branch ahead pipelined 8-component 256 Kbits TAGE
3.552 misp/KI
A final case for the Geometric History Length predictors
- Deliver state-of-the-art accuracy
- Use only global information:
  - Very long history: 300+ bits!
- Can be ahead pipelined
- Many effective design points:
  - OGEHL or TAGE
  - Number of tables, history lengths
The End