Transparent Control Independence (TCI)
- Slides: 34
Transparent Control Independence (TCI)
Ahmed S. Al-Zawawi, Vimal K. Reddy, Eric Rotenberg, Haitham H. Akkary*
Dept. of Electrical & Computer Engineering, North Carolina State University, Raleigh, NC
*Digital Enterprise Group, Intel Corporation, Hillsboro, OR
Effect of branch mispredictions
- [Figure: harmonic mean IPC (0 to 18) vs. issue width (4 to 32); legend text: Perfect SS]
- Branch misprediction rates of 5%-10% are still a problem
- Each misprediction squashes 100s of instructions
- Reduces performance: limits window size
- Increases power: useless speculative work
© 2007 Ahmed S. Al-Zawawi, ISCA 34
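Control independence basics
- CD: control-dependent instructions (between a branch and its reconvergent point)
- CI: control-independent instructions (at and after the reconvergent point)
- CIDD / CIDI: CI instructions that are data-dependent / data-independent on the CD instructions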
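The terminology can be illustrated with a minimal Python sketch. It assumes a straight-line instruction list in which the branch sits at index `branch` and its reconvergent point at index `reconv`. The `Inst` type, the `classify` function, and the register names are hypothetical and do not come from the slides. As a simplification, only registers written on the predicted path are tracked; real hardware would also have to account for registers written on the other path.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Inst:
    dest: Optional[str]      # destination register, or None
    srcs: Tuple[str, ...]    # source registers


def classify(insts, branch, reconv):
    """Label every instruction after `branch` as CD, CIDI or CIDD."""
    labels = {}
    influenced = set()  # registers whose value may depend on the branch outcome
    for i in range(branch + 1, len(insts)):
        inst = insts[i]
        if i < reconv:
            labels[i] = "CD"
            tainted = True
        else:
            tainted = any(s in influenced for s in inst.srcs)
            labels[i] = "CIDD" if tainted else "CIDI"
        if inst.dest is not None:
            if tainted:
                influenced.add(inst.dest)
            else:
                influenced.discard(inst.dest)  # overwritten by a clean value
    return labels


prog = [
    Inst(None, ("r1",)),        # 0: branch on r1
    Inst("r2", ("r3",)),        # 1: CD, writes r2
    Inst("r4", ("r5",)),        # 2: reconvergent point, reads nothing influenced
    Inst("r6", ("r2",)),        # 3: reads r2, which a CD instruction wrote
    Inst("r7", ("r6", "r4")),   # 4: reads r6, which a CIDD instruction wrote
]
labels = classify(prog, branch=0, reconv=2)
```

Here `labels` maps instruction 1 to CD, instruction 2 to CIDI, and instructions 3 and 4 to CIDD.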
Four steps for exploiting CI
1. Identify reconv. point
2. Remove/insert CD inst.
3. Identify CIDD inst.
4. Repair CIDD inst.
   a) Fix data dependencies
   b) Re-execute CIDD inst.
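For step 1, a common software-level approximation of a branch's reconvergent point is its immediate post-dominator in the control-flow graph. The sketch below assumes a small hypothetical CFG given as a dict of successor lists, with a single exit node and at least one successor for every other node. It is an illustration only and is not the hardware detection mechanism used by TCI.

```python
def post_dominators(cfg, exit_node):
    """Iteratively compute the post-dominator set of every node."""
    nodes = set(cfg)
    pdom = {n: set(nodes) for n in nodes}
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in nodes - {exit_node}:
            # A node is post-dominated by itself and by whatever
            # post-dominates all of its successors.
            new = {n} | set.intersection(*(pdom[s] for s in cfg[n]))
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    return pdom


def reconvergent_point(cfg, exit_node, branch):
    """Return the immediate post-dominator of `branch`."""
    pdom = post_dominators(cfg, exit_node)
    strict = pdom[branch] - {branch}
    # The closest strict post-dominator is itself post-dominated by all the
    # others, so it has the largest post-dominator set among them.
    return max(strict, key=lambda n: len(pdom[n]))


# Hypothetical if/else diamond: B branches to T or F, both rejoin at R.
cfg = {"B": ["T", "F"], "T": ["R"], "F": ["R"], "R": ["X"], "X": []}
point = reconvergent_point(cfg, "X", "B")
```

For this diamond the result is `R`, the first block reached on both paths.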
Conventional CI misprediction recovery
- [Figure labels: wrong CD instructions, CIDD instructions, R, CD inst., CI inst., CIDI-supplied source value]
- Identify wrong CD instructions and CIDD inst.
- Squash wrong CD instructions
- Insert correct CD instructions in middle of the window: repair program order
- Re-execute CIDD inst.: re-reference values from CIDI instructions
Conventional CI limitations
1. Program order between CD & CI inst.: fine-grain retirement using a ROB requires reordering the correct CD inst. with the CI inst.
2. Dependence order between CIDD & CIDI inst.: re-executing CIDD instructions requires preserving the referenced CIDI instructions
- Goal of selective misprediction recovery: fully decouple CIDI instructions from CD & CIDD instructions
TCI misprediction recovery
- [Figure labels: R, CD inst., CI inst., recovery program, correct CD inst., duplicate CIDD inst.]
- Repair program state using a self-sufficient recovery program
- No need to identify wrong CD and CIDD instructions
- Insert correct CD instructions while relaxing program order
- Insert duplicate CIDD instructions like any new instructions
TCI misprediction recovery
- [Figure labels: R, branch checkpoint, Checkpoint 1, Checkpoint 2, CIDD instructions, recovery program, correct CD inst., duplicate CIDD inst., source value]
- Checkpoint-based retirement enables aggressive register reclamation (e.g., CPR): completed instructions free their resources
- In-order retirement is not possible when instructions are out of program order
- Exploit coarse-grain checkpoint-based retirement to relax ordering constraints
- Leverage branch checkpoint for correct CD instructions
- Checkpointed source values mimic the effect of CIDI-supplied source values
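The idea of a self-sufficient recovery program can be sketched in Python as a toy register-level model. Everything here is an assumption for illustration: the tuple encoding `(op, dest, src1, src2)`, the helper names `speculate` and `recover`, and the `cd_dests` argument, which stands for the registers written on either path of the branch. Each duplicate CIDD instruction stored in the RXB carries frozen values for its CIDI-supplied sources, so replaying it never needs the CIDI producers again.

```python
import operator

OPS = {"add": operator.add, "mul": operator.mul}


def val(regs, s):
    """A source is either a register name or an already-captured value."""
    return regs[s] if isinstance(s, str) else s


def execute(inst, regs):
    op, dest, a, b = inst
    regs[dest] = OPS[op](val(regs, a), val(regs, b))


def speculate(regs, wrong_cd, ci, cd_dests):
    """Run the predicted path; return (RXB, influenced registers)."""
    for inst in wrong_cd:
        execute(inst, regs)
    influenced = set(cd_dests)  # registers written on either CD path
    rxb = []
    for op, dest, a, b in ci:
        if a in influenced or b in influenced:
            # CIDD: keep a duplicate whose CIDI-supplied sources are frozen
            # to their current values; influenced sources stay as names.
            frozen = tuple(s if s in influenced else val(regs, s) for s in (a, b))
            rxb.append((op, dest) + frozen)
            influenced.add(dest)
        else:
            influenced.discard(dest)  # a CIDI write makes the register clean
        execute((op, dest, a, b), regs)
    return rxb, influenced


def recover(regs, checkpoint, correct_cd, rxb, influenced):
    """Run correct CD + RXB duplicates on a copy of the branch checkpoint."""
    repair = dict(checkpoint)
    for inst in list(correct_cd) + rxb:
        execute(inst, repair)
    for r in influenced:  # merge only the registers that are still influenced
        regs[r] = repair[r]


checkpoint = {"r1": 1, "r5": 10}
wrong_cd = [("add", "r2", "r1", 100)]
correct_cd = [("add", "r2", "r1", 1)]
ci = [
    ("mul", "r4", "r5", 2),      # CIDI
    ("add", "r6", "r2", "r4"),   # CIDD: r2 is influenced, r4 is CIDI-supplied
    ("add", "r4", "r4", 1),      # CIDI: overwrites r4 after r6 has read it
]

regs = dict(checkpoint)
rxb, influenced = speculate(regs, wrong_cd, ci, cd_dests={"r2"})
recover(regs, checkpoint, correct_cd, rxb, influenced)

# Reference: in-order execution of the correct path.
reference = dict(checkpoint)
for inst in correct_cd + ci:
    execute(inst, reference)
```

After recovery `regs` equals `reference`, even though `r4` was overwritten after the CIDD instruction read it. The frozen source value in the RXB entry is what makes the replay independent of the CIDI instructions.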
Transparent Control Independence
- TCI repairs program state, not program order
- TCI pipeline is recovery-free
  - Transparent recovery by fetching additional instructions with checkpointed source values
- TCI pipeline is free-flowing
  - Leverage conventional speculation to execute correct and incorrect instructions quickly and efficiently
  - Completed instructions free their resources
TCI microarchitecture
- Add repair rename map
- Add selective re-execution buffer (RXB)
Predict the branch
- Instructions execute and leave the pipeline when done
Construct recovery program
- Copy duplicates of CIDD inst. with their source values into the RXB
Insert correct CD instructions
- Load branch checkpoint into repair rename map, then fetch correct CD inst.
Repair & re-execute CIDD instructions
- Inject duplicate CIDD inst. with their checkpointed source values
Merge repair & spec. rename maps
- Copy corrected register mappings from repair map to spec. map
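The last three steps of this walkthrough can be sketched with two small Python dictionaries acting as rename maps. The merge rule used here (copy the repair mapping for every logical register still marked as influenced) is an assumption made for illustration, as are the function names, the physical register names, and the encoding of checkpointed source values as plain integers.

```python
from itertools import count


def rename(inst, rename_map, free_regs):
    """Rename one (dest, srcs) instruction against `rename_map`."""
    dest, srcs = inst
    # Sources that carry a checkpointed value (ints here) need no mapping.
    phys_srcs = tuple(rename_map[s] if isinstance(s, str) else s for s in srcs)
    rename_map[dest] = next(free_regs)
    return rename_map[dest], phys_srcs


def merge(spec_map, repair_map, influenced):
    """Copy corrected mappings for still-influenced logical registers."""
    for logical in influenced:
        spec_map[logical] = repair_map[logical]


free_regs = (f"p{i}" for i in count(10))
branch_checkpoint = {"r1": "p1", "r2": "p2", "r4": "p4", "r6": "p6"}
# Speculative map after the wrong path and the CI instructions were renamed.
spec_map = {"r1": "p1", "r2": "p7", "r4": "p9", "r6": "p8"}

repair_map = dict(branch_checkpoint)                      # load branch checkpoint
cd = rename(("r2", ("r1",)), repair_map, free_regs)       # correct CD inst.
cidd = rename(("r6", ("r2", 20)), repair_map, free_regs)  # injected CIDD duplicate
merge(spec_map, repair_map, influenced={"r2", "r6"})
```

The mapping for `r4` is left alone because a CIDI instruction produced it; only `r2` and `r6` receive the corrected physical registers.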
TCI implementation details
1. Identifying CIDD instructions:
   - Control-flow stack (CFS) detects nested reconv. points
   - Influenced register set (IRS) and branch-sets
2. RXB reconstruction:
   - CIDD inst. of multiple branches are co-mingled
   - A misprediction may require repairing the RXB
3. Renaming partial programs:
   - Re-rename recovery program despite its CIDI gaps
4. Merging repair/speculative rename maps
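One plausible reading of item 1 is that every logical register carries a branch-set: the set of unresolved branches whose outcome may influence its value. The Python sketch below follows that reading. It is not confirmed by the slides, and the tuple encoding `(dest, srcs, cd_on)` and the function name are hypothetical. As a simplification, it only sees the registers written on the predicted path.

```python
def track_branch_sets(insts):
    """For each instruction, return the set of branches it is CIDD on.

    insts: list of (dest, srcs, cd_on), where cd_on is the set of unresolved
    branches the instruction is control-dependent on.
    """
    irs = {}       # logical register -> branches that influence its value
    cidd_on = []
    for dest, srcs, cd_on in insts:
        inherited = set().union(*(irs.get(s, set()) for s in srcs))
        # Data-dependent on a branch without being control-dependent on it.
        cidd_on.append(inherited - cd_on)
        if dest is not None:
            irs[dest] = inherited | cd_on
    return cidd_on


trace = [
    ("r2", ("r1",), {"B1"}),        # CD on B1
    ("r3", ("r9",), set()),         # after R1: CIDI
    ("r4", ("r2",), set()),         # CIDD on B1
    ("r5", ("r3",), {"B2"}),        # CD on B2
    ("r6", ("r5",), set()),         # after R2: CIDD on B2
    ("r7", ("r4", "r6"), set()),    # CIDD on both B1 and B2
    ("r2", ("r9",), set()),         # CIDI write clears r2's branch-set
    ("r8", ("r2",), set()),         # reads the clean r2: CIDI
]
result = track_branch_sets(trace)
```

Instructions whose entry in `result` is non-empty are the ones that would be duplicated into the RXB, tagged with the branches they depend on.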
Example: construct the RXB
- B1 & B2 are branches
- R1 & R2 are reconvergent points
- Rectangular inst. are CIDD on B1
- Oval inst. are CIDD on B2
Example: reconstructing the RXB
- Objective of this example: reconstruct RXB for B1
- Rollback RXB tail, like complete squash
- Initiate B2 recovery program; inject RXB pointer for pre-read
- Start fetching correct CD
- Fetch correct CD: 11 and 12
- Dispatch 11; insert 11 into the RXB
- Dispatch 12; don't insert 12 into the RXB
- Meanwhile, pre-read 16 to Temp Buffer
- Instruction labels: CIDI w.r.t. / CIDD w.r.t. B1 & B2
Example: reconstructing the RXB
- Fetch correct CD: 13 and 14
- Dispatch 13; insert 13 into the RXB
- Dispatch 14; don't insert 14 into the RXB
- Reconvergence point detected: correct CD complete
- Meanwhile, pre-read 18 to Temp Buffer
- Instruction labels: CIDI w.r.t. / CIDD w.r.t. B1 & B2
Example: reconstructing the RXB
- B2 recovery program injection complete
- Begin renaming CIDD instructions from Temp Buffer
- Don't dispatch 16: not CIDD w.r.t. B2
- Don't insert 16 into the RXB: not CIDD w.r.t. B1
- Dispatch 18 and 20: CIDD w.r.t. B2
- Insert 18 and 20 into the RXB
- Meanwhile, pre-read 20 into Temp Buffer
- B1 recovery program is maintained and compressed
Simulation methodology
- Baseline:
  - Checkpoint-based superscalar processor
  - Issue width: 4
  - Perceptron branch predictor
  - Register file: 256 registers
  - Branch checkpoints: 16
  - Load store queue: 512 entries
  - L1 I & L1 D: 64 KB 4-way (hit: 1 cycle)
  - L2: 2 MB 8-way (hit: 10 cycles, miss: 200 cycles)
- Benchmarks: 11 SPEC2000 INT + 4 SPEC95 INT
  - SimPoint: 10M inst. warm-up + 100M inst. simulated
CIDD inst. re-renaming models
- Seq CIDD (TCI):
  - Only CIDD inst. are re-renamed and re-executed
- Seq CI: [Akkary et al.] [Chou et al.] [Rotenberg et al.]
  - All CI inst. are re-renamed, but only CIDD inst. re-execute
- Proxy: [Cher et al.] [Gandhi et al.]
  - Uses proxy move instructions to insulate CIDD inst. from source name changes
  - Only proxies are re-renamed
  - Both proxies and CIDD inst. re-execute by holding issue queue entries
- All models have relaxed order through checkpoint-based substrate
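The difference between the three models can be made concrete with a back-of-the-envelope count per misprediction. The Python sketch below is an assumption-laden illustration: the function name, the dictionary keys, and the input counts are hypothetical, and the model ignores everything except how many instructions are re-renamed, re-executed, and kept in the issue queue while waiting for a possible recovery.

```python
def recovery_cost(n_ci, n_cidd, n_proxy):
    """Rough per-misprediction work for each re-renaming model.

    n_ci:    control-independent instructions after the reconvergent point
    n_cidd:  how many of those are CIDD
    n_proxy: proxy move instructions inserted by the Proxy model
    """
    return {
        "Seq CIDD (TCI)": {
            "re_renamed": n_cidd,
            "re_executed": n_cidd,
            "iq_entries_held": 0,
        },
        "Seq CI": {
            "re_renamed": n_ci,   # every CI instruction passes through rename again
            "re_executed": n_cidd,
            "iq_entries_held": 0,
        },
        "Proxy": {
            "re_renamed": n_proxy,
            "re_executed": n_proxy + n_cidd,
            "iq_entries_held": n_proxy + n_cidd,  # they wait in the issue queue
        },
    }


# Hypothetical counts, chosen only to show the relative costs.
cost = recovery_cost(n_ci=100, n_cidd=20, n_proxy=8)
```

With these made-up counts, Seq CI spends five times the rename bandwidth of Seq CIDD, while Proxy saves rename bandwidth at the price of occupied issue queue entries.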
Results for 32- & 64-entry issue queues
- TCI average %IPC improvement is 16% (16%)
- TCI maximum %IPC improvement is 61% (64%)
- A further value of 6% (11%) is reported for Seq CI or Proxy; the label is not recoverable from the extracted text
- Proxy can degrade performance
Varying the issue queue size
- [Figure: harmonic mean IPC (1.6 to 2.8) vs. issue queue size (16, 32, 64, 128, 256); series: Seq CIDD (TCI), Seq CI, Proxy, Base]
- Seq CI is bandwidth inefficient
- Proxy is bandwidth efficient, but resource inefficient
- TCI is both bandwidth and resource efficient
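The figure's metric, harmonic mean IPC, can be computed with the Python standard library. The per-benchmark IPC values below are made up for illustration and are not results from the slides. Whether the reported %IPC improvements are derived from harmonic means is also an assumption of this sketch.

```python
from statistics import harmonic_mean


def pct_ipc_improvement(ipc_new, ipc_base):
    """Relative IPC gain of a configuration over the baseline, in percent."""
    return 100.0 * (ipc_new - ipc_base) / ipc_base


# Hypothetical per-benchmark IPCs for a baseline and a TCI-like configuration.
base_ipc = [1.0, 2.0, 4.0]
tci_ipc = [1.5, 2.0, 4.0]

hm_base = harmonic_mean(base_ipc)  # 3 / (1/1.0 + 1/2.0 + 1/4.0)
hm_tci = harmonic_mean(tci_ipc)    # 3 / (1/1.5 + 1/2.0 + 1/4.0)
gain = pct_ipc_improvement(hm_tci, hm_base)
```

The harmonic mean weights low-IPC benchmarks more heavily than the arithmetic mean does, so speeding up the slowest benchmark moves it the most.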
Varying the RXB size
- In Seq CI, the RXB limits window size
- TCI overcomes the problem by only buffering CIDD inst.
Conclusion
- Recover program state, not program order
  - Transparent branch misprediction recovery using fully decoupled recovery program
- Resource efficient
  - All instructions execute, drain, and free resources quickly based on conventional speculation
- Bandwidth efficient
  - TCI only re-sequences CIDD instructions
Questions