BranchTap: Improving Performance With Very Few Checkpoints
BranchTap: Improving Performance With Very Few Checkpoints Through Adaptive Speculation Control
Patrick Akl and Andreas Moshovos
AENAO Research Group, Department of Electrical and Computer Engineering, University of Toronto
June 28th, 2006
What Happens on a Branch Misprediction?
[Figure: execution timeline. Predict a branch outcome, fetch along the predicted path, misprediction discovered, recover processor state, redirect fetch, resume execution on the correct path]
• We wish to make the recovery fast
State-of-the-Art Recovery
• Existing mechanisms
  – Reorder buffer based: slow
  – Instantaneous checkpoints: faster
• Problem: can't have enough checkpoints
• State-of-the-art solution: checkpoint prediction
  – Allocate the few checkpoints judiciously
• Another degree of freedom: speculation control
  – Sometimes deeper speculation = higher recovery cost
    • Can hurt performance
  – Throttle speculation
BranchTap Results / Benefits
• No additional checkpoints are needed
• Dynamically adapts to application behavior
• Improves performance for most programs
  – Misprediction performance penalty reduced by 28% on average
• BranchTap comes "for free"
  – Very simple to implement
  – Better than more accurate checkpoint predictors
Outline
• Background
• BranchTap
• Methodology and Results
• Summary
State Recovery Example: Register Alias Table
• The RAT has one entry per architectural register (# arch. regs entries, indexed with lg(# arch. regs) bits); each entry names a physical register
• Original code
  A  add r1, r2, 100
  B  breq r1, E
  C  sub r1, r2
• Renamed code
  A  add p4, p2, 100
  B  breq p4, E
  C  sub p5, p2
• [Figure: RAT contents. The mapping for r1 changes from p1 to p4 after A and to p5 after C; r2 stays mapped to p2]
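The renaming in the example above can be sketched in a few lines of Python. This is only an illustration: the `rename` helper, the tuple encoding of instructions, and the contents of the free list are assumptions made for the sketch, and the initial mapping of r1 to p1 is read off the RAT figure.

```python
# Illustrative register-renaming sketch; names and encoding are hypothetical.
def rename(program, rat, free_list):
    """Rename (op, dest, sources) tuples; dest is None for branches."""
    renamed = []
    for op, dest, srcs in program:
        # Sources read the current mapping; immediates and labels pass through.
        new_srcs = [rat.get(s, s) for s in srcs]
        new_dest = None
        if dest is not None:
            new_dest = free_list.pop(0)  # allocate a fresh physical register
            rat[dest] = new_dest         # later readers of dest see the new name
        renamed.append((op, new_dest, new_srcs))
    return renamed

rat = {"r1": "p1", "r2": "p2"}  # assumed initial mappings
free_list = ["p4", "p5"]        # assumed free physical registers
program = [
    ("add", "r1", ["r2", 100]),   # A
    ("breq", None, ["r1", "E"]),  # B
    ("sub", "r1", ["r2"]),        # C
]
renamed = rename(program, rat, free_list)
```

After the call, `rat["r1"]` is `"p5"`, so recovering from a misprediction at B means getting the mapping of r1 back to `"p4"`.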
ROB: Slow, Fine-Grain Recovery
• Each reorder buffer (ROB) entry contains
  1. Architectural destination register
  2. Its previous RAT map
• Recovery
  1. Misprediction discovered
  2. Locate newest instruction
  3. Undo RAT updates in reverse program order
• Too slow: recovery latency is proportional to the number of instructions to squash
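A minimal sketch of this walk, assuming the ROB is a Python list ordered oldest to newest and each entry is an `(arch_dest, prev_map)` pair as listed above. The function name and the step counter are hypothetical and only illustrate why the latency grows with the number of squashed instructions.

```python
def rob_recover(rat, rob, branch_index):
    """Undo the RAT updates of every instruction younger than rob[branch_index].

    Returns the number of entries walked, a stand-in for recovery latency.
    """
    steps = 0
    while len(rob) - 1 > branch_index:
        arch_dest, prev_map = rob.pop()  # newest instruction first
        if arch_dest is not None:        # branches have no destination
            rat[arch_dest] = prev_map    # restore the older mapping
        steps += 1
    return steps

# State after renaming A (add), B (breq) and C (sub) from the RAT example.
rat = {"r1": "p5", "r2": "p2"}
rob = [("r1", "p1"), (None, None), ("r1", "p4")]
steps = rob_recover(rat, rob, branch_index=1)  # B mispredicted
```

Here one entry is walked and r1 maps to p4 again; with hundreds of in-flight instructions behind the branch, the same loop would run hundreds of times.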
Global Checkpoints: Fast, Coarse-Grain Recovery
[Figure: a checkpoint of the RAT is taken at branch B; when the misprediction is discovered, the RAT is restored from the checkpoint]
• Branch w/ a global checkpoint (GC): recovery is "instantaneous"
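For contrast with the ROB walk, checkpoint recovery can be sketched as a single snapshot restore. The class below is a hypothetical illustration, not the hardware organization from the talk; it also models the fact that only a limited number of checkpoints exist.

```python
class CheckpointedRAT:
    """RAT with a bounded number of global checkpoints (illustrative only)."""

    def __init__(self, mapping, max_checkpoints):
        self.rat = dict(mapping)
        self.max_checkpoints = max_checkpoints
        self.checkpoints = {}  # branch id -> RAT snapshot, oldest first

    def take_checkpoint(self, branch_id):
        """Snapshot the RAT at a branch; fails when no checkpoint is free."""
        if len(self.checkpoints) >= self.max_checkpoints:
            return False
        self.checkpoints[branch_id] = dict(self.rat)
        return True

    def recover(self, branch_id):
        """Restore the snapshot in one step, then free it and younger ones."""
        self.rat = dict(self.checkpoints[branch_id])
        ids = list(self.checkpoints)
        for stale in ids[ids.index(branch_id):]:
            del self.checkpoints[stale]

crat = CheckpointedRAT({"r1": "p4", "r2": "p2"}, max_checkpoints=1)
took_first = crat.take_checkpoint("B")    # checkpoint at branch B
crat.rat["r1"] = "p5"                     # instruction C renames r1
took_second = crat.take_checkpoint("B2")  # no checkpoint left for a later branch
crat.recover("B")                         # misprediction at B
```

The restore cost does not depend on how many instructions followed the branch, but the second branch could not be covered, which is the limitation the next slide discusses.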
Impact of More Checkpoints
[Figure: concept vs. actual implementation of a checkpointed RAT, with a working copy plus checkpoint copies, indexed by architectural register and holding physical register names]
• More checkpoints?
  – Power-hungry structure
  – Increased delay
• Only a few checkpoints can practically be implemented
  – Cannot always cover all branches
Intelligent Checkpointing
• State-of-the-art solution
  – Checkpoint allocation: allocate checkpoints at hard-to-predict branches
  – Checkpoint management: release checkpoints as soon as they are no longer needed
• Use the few checkpoints efficiently
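One way to identify hard-to-predict branches is a small table of confidence counters. The sketch below assumes resetting saturating counters indexed by the branch PC; the slides only state that a confidence table is used, so the counter scheme, the threshold, and all names here are assumptions.

```python
class ConfidenceTable:
    """Resetting-counter confidence estimator (assumed scheme, for illustration)."""

    def __init__(self, entries=1024, threshold=3, max_count=3):
        self.counters = [0] * entries
        self.threshold = threshold
        self.max_count = max_count

    def _index(self, pc):
        return pc % len(self.counters)

    def is_low_confidence(self, pc):
        # Few consecutive correct predictions means the branch is hard to predict.
        return self.counters[self._index(pc)] < self.threshold

    def update(self, pc, predicted_correctly):
        i = self._index(pc)
        if predicted_correctly:
            self.counters[i] = min(self.counters[i] + 1, self.max_count)
        else:
            self.counters[i] = 0  # a misprediction resets the confidence

def should_allocate_checkpoint(table, pc, free_checkpoints):
    """Spend a checkpoint only on a low-confidence branch, if one is free."""
    return free_checkpoints > 0 and table.is_low_confidence(pc)

table = ConfidenceTable()
pc = 0x400  # hypothetical branch address
before = should_allocate_checkpoint(table, pc, free_checkpoints=4)
for _ in range(3):
    table.update(pc, predicted_correctly=True)
after = should_allocate_checkpoint(table, pc, free_checkpoints=4)
```

A branch that has been predicted correctly several times in a row stops consuming checkpoints, leaving them for branches that are more likely to need a fast recovery.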
Conventional Mechanisms: Recovery Scenarios
• Misspeculation on a branch w/ a GC: direct recovery (fast)
• Misspeculation on a branch w/o a GC: indirect recovery (slow)
• With intelligent checkpointing:
  – 30% of recoveries are indirect, yet they account for 75% of the performance loss
Outline
• Background
• BranchTap
• Methodology and Results
• Summary
BranchTap Motivation
[Figure: no-wait scenario vs. wait scenario for a low-confidence branch without a checkpoint, comparing the recovery cost once the misprediction is discovered]
• Sometimes, it is better to wait if no checkpoint is available
BranchTap Concept
• Key idea: stall when speculation is likely to deteriorate performance
  – Count the number of low-confidence branches w/o a checkpoint
  – If it exceeds a threshold, stall
• Threshold selection
  – Fixed
    • Varies greatly across programs
    • Can deteriorate performance significantly
  – Adaptive
    • Robust performance
    • Minimize recovery cost while conserving good speculation opportunities
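The counting rule above can be sketched as follows. This is a simplified illustration with a threshold supplied by the caller: the class and method names are hypothetical, the adaptation policy is not modeled, and squashing younger branches on a misprediction is left out.

```python
class SpeculationThrottle:
    """Counts in-flight low-confidence branches that have no checkpoint."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.uncovered = 0

    def on_branch_fetched(self, low_confidence, has_checkpoint):
        if low_confidence and not has_checkpoint:
            self.uncovered += 1

    def on_branch_resolved(self, low_confidence, has_checkpoint):
        if low_confidence and not has_checkpoint:
            self.uncovered -= 1

    def should_stall(self):
        # Stall only once the count exceeds the threshold.
        return self.uncovered > self.threshold

throttle = SpeculationThrottle(threshold=1)
throttle.on_branch_fetched(low_confidence=True, has_checkpoint=False)
stall_after_one = throttle.should_stall()
throttle.on_branch_fetched(low_confidence=True, has_checkpoint=False)
stall_after_two = throttle.should_stall()
throttle.on_branch_resolved(low_confidence=True, has_checkpoint=False)
stall_after_resolve = throttle.should_stall()
```

Only a few counters and comparators are involved, which is consistent with the claim that the mechanism is simple to implement.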
Threshold Adaptation Policy
• BranchTap adapts across and within applications
Outline
• Background
• BranchTap
• Methodology and Results
• Summary
Results Overview
• Performance w/o checkpoints
  – BranchTap improves performance even with just an ROB
• Performance w/ 4 checkpoints
  – BranchTap improves over conventional recovery methods
• Performance w/ larger checkpoint predictors
  – BranchTap offers better performance than a 64x larger predictor
Methodology
• Simulator based on SimpleScalar
• 24 SPEC CPU 2000 benchmarks
• Reference inputs
• Processor configurations
  – 8-way OoO core
  – Up to 1K in-flight instructions
  – 1K-entry confidence table for low-confidence branch identification
• 1B committed instructions after skipping 100B
"Perfect Checkpointing" Configuration
• A checkpoint is auto-magically taken at all mispredicted branches
  – All recoveries are fast
• We report the "deterioration relative to perfect checkpointing"
Performance with No Checkpoints
[Figure: deterioration relative to "perfect checkpointing"]
• Deterioration relative to "perfect checkpointing": reduced by 39%
• BranchTap improves over conventional mechanisms
• Adaptation leads to robust performance improvements
Performance Evaluation with 4 Checkpoints
[Figure: deterioration relative to "perfect checkpointing"]
• Deterioration relative to "perfect checkpointing": reduced by 28%
• BranchTap with 4 checkpoints is better than 6 checkpoints alone
BranchTap vs. Larger Checkpoint Predictors
[Figure: deterioration vs. confidence table size, with and without BranchTap]
• BranchTap with a 1K-entry confidence table and 4 GCs:
  – Higher performance than a 64K-entry confidence table with 4 GCs
  – Lower complexity, virtually comes "for free"
Outline
• Background
• BranchTap
• Methodology and Results
• Summary
Summary
• Performance with 4 (no) checkpoints
  – ~28 (39)% of misprediction penalty removed
  – BranchTap is robust:
    • Up to 6 (13)% better and at most 1.2 (0.1)% worse than conventional mechanisms
• BranchTap is very simple to implement
  – Few counters and comparators
• BranchTap is better than other alternatives
  – BT + 1K predictor better than a 64K predictor alone
  – BT + 4 GCs better than 6 GCs alone
BranchTap: Improving Performance With Very Few Checkpoints Through Adaptive Speculation Control
Patrick Akl and Andreas Moshovos
AENAO Research Group, Department of Electrical and Computer Engineering, University of Toronto
{pakl, moshovos}@eecg.toronto.edu