Transparent Control Independence (TCI)
Ahmed S. Al-Zawawi, Vimal K. Reddy, Eric Rotenberg, Haitham H. Akkary*
Dept. of Electrical & Computer Engineering, North Carolina State University, Raleigh, NC
*Digital Enterprise Group, Intel Corporation, Hillsboro, OR

Effect of branch mispredictions
[Figure: harmonic mean IPC (0 to 18) vs. issue width (4 to 32); series: Perfect, SS]
• Branch misprediction rate of 5%-10% is still a problem
• Each misprediction squashes 100s of inst.
• Reduces performance: limits window size
• Increases power: useless speculative work
© 2007 Ahmed S. Al-Zawawi, ISCA 34, NC STATE UNIVERSITY

Control independence basics

Four steps for exploiting CI
1. Identify the reconvergent point
2. Remove/insert control-dependent (CD) inst.
3. Identify control-independent data-dependent (CIDD) inst.
4. Repair CIDD inst.
   a) Fix data dependencies
   b) Re-execute CIDD inst.
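A minimal Python sketch of steps 1 to 3, offered as an illustration rather than as the hardware mechanism. It assumes the reconvergent point is already known (the caller passes the wrong-path CD instructions, the correct-path CD instructions, and the CI region as separate lists), tracks register dependences only, and ignores memory dependences. The names `Inst` and `classify_ci` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class Inst:
    dest: str               # destination register name
    srcs: Tuple[str, ...]   # source register names

def classify_ci(wrong_cd: List[Inst], correct_cd: List[Inst],
                ci: List[Inst]) -> List[str]:
    """Label each instruction after the reconvergent point as CIDI or CIDD."""
    # Step 1 is assumed done: the three lists are split at the reconvergent point.
    # Step 2: wrong_cd is removed and correct_cd inserted, so any register
    # written on either path may hold a different value after recovery.
    influenced = {i.dest for i in wrong_cd} | {i.dest for i in correct_cd}
    labels = []
    # Step 3: walk the control-independent region in program order.
    for inst in ci:
        if influenced.intersection(inst.srcs):
            labels.append("CIDD")           # step 4 would repair and re-execute it
            influenced.add(inst.dest)       # its result is now suspect too
        else:
            labels.append("CIDI")
            influenced.discard(inst.dest)   # a clean overwrite ends the influence
    return labels

wrong_cd = [Inst("r1", ("r0",))]
correct_cd = [Inst("r1", ("r0",)), Inst("r6", ("r0",))]
ci = [
    Inst("r2", ("r0",)),        # reads only r0: CIDI
    Inst("r3", ("r1", "r2")),   # reads r1, written on the CD paths: CIDD
    Inst("r1", ("r2",)),        # overwrites r1 from clean sources: CIDI
    Inst("r4", ("r1",)),        # r1 is clean again: CIDI
    Inst("r5", ("r3",)),        # depends on a CIDD result: CIDD
]
labels = classify_ci(wrong_cd, correct_cd, ci)
print(labels)
```

Step 4 (repair and re-execution) is left out here; the sketch only shows how data dependence on the control-dependent region propagates through the control-independent region.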

Conventional CI misprediction recovery
[Diagram labels: wrong CD instructions, CD inst., reconvergent point R, CI inst., CIDD instructions, CIDI-supplied source value]
• Identify wrong CD instructions and CIDD inst.
• Squash wrong CD instructions
• Insert correct CD instructions in middle of the window: repair program order
• Re-execute CIDD inst.: re-reference values from CIDI instructions

Conventional CI limitations
1. Program order between CD & CI inst.: fine-grain retirement using ROB requires reordering the correct CD inst. with the CI inst.
2. Dependence order between CIDD & CIDI inst.: re-executing CIDD instructions requires preserving referenced CIDI instructions
Goal of selective misprediction recovery: fully decouple CIDI instructions from CD & CIDD instructions

TCI misprediction recovery
[Diagram labels: recovery program (correct CD inst. + duplicate CIDD inst.), reconvergent point R, CI inst., CD inst.]
• Repair program state using self-sufficient recovery program
• No need to identify wrong CD and CIDD instructions
• Insert correct CD instructions while relaxing program order
• Insert duplicate CIDD instructions like any new instructions

TCI misprediction recovery
[Diagram labels: branch checkpoint, Checkpoint 1, Checkpoint 2, reconvergent point R, recovery program (correct CD inst. + duplicate CIDD instructions), CIDI-supplied source values]
• Checkpoint-based retirement enables aggressive register reclamation (e.g., CPR): completed instructions free their resources
• Leverage branch checkpoint for correct CD instructions
• In-order retirement is not possible when instructions are out of program order
• Exploit coarse-grain checkpoint-based retirement to relax ordering constraints
• Checkpointed source values: mimic the effect of [...]

Transparent Control Independence
• TCI repairs program state, not program order
• TCI pipeline is recovery-free
  - Transparent recovery by fetching additional instructions with checkpointed source values
• TCI pipeline is free-flowing
  - Leverage conventional speculation to execute correct and incorrect instructions quickly and efficiently
  - Completed instructions free their resources

TCI microarchitecture
• Add repair rename map
• Add selective re-execution buffer (RXB)

Predict the branch
Instructions execute and leave the pipeline when done

Construct recovery program
Copy duplicates of CIDD inst. with their source values into RXB

Insert correct CD instructions
Load branch checkpoint into repair rename map, then fetch correct CD inst.

Repair & re-execute CIDD instructions
Inject duplicate CIDD inst. with their checkpointed source values

Merge repair & spec. rename maps
Copy corrected register mappings from repair map to spec. map
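The recovery sequence on the preceding slides (construct the recovery program, insert correct CD instructions, re-execute CIDD instructions, merge) can be sketched on a toy register machine. This is a simplified model under stated assumptions, not the TCI hardware: it works on register values instead of rename maps, it assumes the set of registers written on either CD path is known up front, and it ignores memory. The helper names `run`, `speculate`, and `recover` are hypothetical.

```python
from typing import Callable, Dict, List, NamedTuple, Set, Tuple

class Inst(NamedTuple):
    dest: str
    srcs: Tuple[str, ...]
    fn: Callable[..., int]

def run(insts: List[Inst], regs: Dict[str, int]) -> None:
    """Execute instructions in order on a register-value dictionary."""
    for i in insts:
        regs[i.dest] = i.fn(*(regs[s] for s in i.srcs))

def speculate(ci: List[Inst], regs: Dict[str, int], may_differ: Set[str]):
    """Run the CI region speculatively and build the RXB as a side effect."""
    influenced = set(may_differ)
    rxb = []
    for inst in ci:
        if influenced.intersection(inst.srcs):
            # Duplicate the CIDD inst.: keep the register NAME for influenced
            # sources, capture the VALUE of CIDI-supplied sources right now.
            operands = tuple(s if s in influenced else regs[s] for s in inst.srcs)
            rxb.append((inst, operands))
            influenced.add(inst.dest)
        else:
            influenced.discard(inst.dest)
        regs[inst.dest] = inst.fn(*(regs[s] for s in inst.srcs))
    return rxb, influenced

def recover(checkpoint, correct_cd, rxb, spec_regs, influenced) -> None:
    repair = dict(checkpoint)        # load the branch checkpoint into repair state
    run(correct_cd, repair)          # fetch and execute the correct CD inst.
    for inst, operands in rxb:       # inject the duplicate CIDD inst.
        vals = [repair[o] if isinstance(o, str) else o for o in operands]
        repair[inst.dest] = inst.fn(*vals)
    for reg in influenced:           # merge corrected registers into spec. state
        spec_regs[reg] = repair[reg]

checkpoint = {"r0": 5, "r1": 0, "r2": 0, "r3": 0, "r4": 0}
wrong_cd = [Inst("r1", ("r0",), lambda a: a + 1)]
correct_cd = [Inst("r1", ("r0",), lambda a: a * 10)]
ci = [
    Inst("r2", ("r0",), lambda a: a + 2),            # CIDI
    Inst("r3", ("r1", "r2"), lambda a, b: a + b),    # CIDD (reads r1)
    Inst("r2", ("r2",), lambda a: a * 2),            # CIDI, overwrites r2
    Inst("r4", ("r3", "r2"), lambda a, b: a + b),    # CIDD (reads r3)
]

spec = dict(checkpoint)
run(wrong_cd, spec)                                  # mispredicted path
may_differ = {i.dest for i in wrong_cd + correct_cd}
rxb, influenced = speculate(ci, spec, may_differ)
recover(checkpoint, correct_cd, rxb, spec, influenced)

reference = dict(checkpoint)                         # full squash and refetch
run(correct_cd + ci, reference)
print(spec == reference)
```

The third CI instruction overwrites `r2` after the first CIDD instruction has read it, which is why the sketch captures source values at execution time: the recovery program never has to re-reference the CIDI instructions.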

TCI implementation details
1. Identifying CIDD instructions:
   - Control-flow stack (CFS) detects nested reconv. points
   - Influenced register set (IRS) and branch-sets
2. RXB reconstruction:
   - CIDD inst. of multiple branches are co-mingled
   - A misprediction may require repairing RXB
3. Renaming partial programs:
   - Re-rename recovery program despite its CIDI gaps
4. Merging repair/speculative rename maps

Example: construct the RXB
• B1 & B2 are branches
• R1 & R2 are reconvergent points
• Rectangular inst. are CIDD on B1
• Oval inst. are CIDD on B2
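One way to picture how CIDD instructions of several branches end up co-mingled in a single RXB is to tag every influenced register with the set of unresolved branches that influence it. The Python sketch below assumes that rule; the slides name an influenced register set and branch-sets but do not spell out their update logic, so the functions `build_rxb` and `recovery_program` and the instruction names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

CiInst = Tuple[str, str, Tuple[str, ...]]   # (name, dest register, source registers)

def build_rxb(ci: List[CiInst],
              irs: Dict[str, Set[str]]) -> List[Tuple[str, FrozenSet[str]]]:
    """Return RXB entries as (instruction name, branch-set) in program order."""
    irs = {reg: set(branches) for reg, branches in irs.items()}  # private copy
    rxb = []
    for name, dest, srcs in ci:
        # An instruction inherits the branch-sets of all of its sources.
        branch_set: Set[str] = set()
        for s in srcs:
            branch_set |= irs.get(s, set())
        if branch_set:
            rxb.append((name, frozenset(branch_set)))   # CIDD on >= 1 branch
        irs[dest] = branch_set        # an empty set marks the register as clean
    return rxb

def recovery_program(rxb, branch: str) -> List[str]:
    """Select, in order, the entries that are CIDD on the given branch."""
    return [name for name, branch_set in rxb if branch in branch_set]

# Assumed starting point: r1 is written under B1, r2 is written under B2.
irs = {"r1": {"B1"}, "r2": {"B2"}}
ci = [
    ("i1", "r3", ("r1",)),        # CIDD on B1
    ("i2", "r4", ("r2",)),        # CIDD on B2
    ("i3", "r5", ("r3", "r4")),   # CIDD on both B1 and B2
    ("i4", "r6", ("r0",)),        # CIDI, never enters the RXB
    ("i5", "r1", ("r0",)),        # clean overwrite of r1
    ("i6", "r7", ("r1",)),        # CIDI, because r1 is clean again
]
rxb = build_rxb(ci, irs)
print(recovery_program(rxb, "B1"), recovery_program(rxb, "B2"))
```

In this model a misprediction of B1 replays only the entries whose branch-set contains B1, while entries that depend solely on B2 stay in the buffer for a possible later recovery.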

Example: reconstructing the RXB
[Animation captions; legend: CIDI w.r.t. / CIDD w.r.t. B1 & B2]
• Objective of this example: reconstruct RXB for B1
• Rollback RXB tail, like complete squash
• Initiate recovery program for B2
• Start fetching correct CD
• Fetch correct CD: 11 and 12
• Dispatch 11 and 12
• Insert 11 into the RXB
• Don't insert 12 into the RXB
• Meanwhile pre-read 16 to Temp Buffer

Example: reconstructing the RXB
[Animation captions; legend: CIDI w.r.t. / CIDD w.r.t. B1 & B2]
• Fetch correct CD: 13 and 14
• Dispatch 13 and 14
• Insert 13 into the RXB
• Don't insert 14 into the RXB
• Reconvergence point detected: correct CD complete
• Meanwhile pre-read 18 to Temp Buffer

Example: reconstructing the RXB
• Begin renaming CIDD instructions from Temp Buffer
• Dispatch 16: CIDD w.r.t. B2
• Insert 16 into the RXB
• Don't dispatch 18: not CIDD w.r.t. B2
• Don't insert 18 into the RXB: not CIDD w.r.t. B1
• Meanwhile pre-read 20 into Temp Buffer
• Dispatch 20: CIDD w.r.t. B2
• Insert 20 into the RXB
• B2 recovery program injection complete
• B1 recovery program is maintained and compressed

Simulation methodology
• Baseline:
  - Checkpoint-based superscalar processor
  - Issue width: 4
  - Perceptron branch predictor
  - Register file: 256 registers
  - Branch checkpoints: 16
  - Load store queue: 512 entries
  - L1 I & L1 D: 64 KB 4-way (hit: 1 cycle)
  - L2: 2 MB 8-way (hit: 10 cycles, miss: 200 cycles)
• Benchmarks: 11 SPEC2000 INT + 4 SPEC95 INT
• SimPoint: 10M inst. warm-up + 100M inst. simulated

CIDD inst. re-renaming models
• Seq CIDD (TCI):
  - Only CIDD inst. are re-renamed and re-executed
• Seq CI: [Akkary et al.] [Chou et al.] [Rotenberg et al.]
  - All CI inst. are re-renamed, but only CIDD inst. re-execute
• Proxy: [Cher et al.] [Gandhi et al.]
  - Uses proxy move instructions to insulate CIDD inst. from source name changes
  - Only proxies are re-renamed
  - Both proxies and CIDD inst. re-execute by holding issue queue entries
• All models have relaxed order through checkpoint-based substrate
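The difference between the three models comes down to how many instructions pass through rename again and how many execute again. A small Python sketch of that bookkeeping follows. The function `recovery_cost` and the instruction counts in the usage example are hypothetical illustrations, not measured data, and the correct CD instructions, which every model must fetch, are left out.

```python
from typing import Dict, NamedTuple

class Cost(NamedTuple):
    renamed: int    # instructions that pass through rename again
    executed: int   # instructions that execute again

def recovery_cost(n_ci: int, n_cidd: int, n_proxy: int) -> Dict[str, Cost]:
    """Per-misprediction work in the CI region for each re-renaming model."""
    return {
        # Only CIDD inst. are re-renamed and re-executed.
        "Seq CIDD (TCI)": Cost(renamed=n_cidd, executed=n_cidd),
        # All CI inst. are re-renamed, but only CIDD inst. re-execute.
        "Seq CI": Cost(renamed=n_ci, executed=n_cidd),
        # Only proxies are re-renamed; proxies and CIDD inst. both re-execute.
        "Proxy": Cost(renamed=n_proxy, executed=n_proxy + n_cidd),
    }

# Hypothetical window: 100 CI inst., 20 of them CIDD, fed by 8 proxy moves.
costs = recovery_cost(n_ci=100, n_cidd=20, n_proxy=8)
for model, cost in costs.items():
    print(f"{model}: renamed={cost.renamed}, executed={cost.executed}")
```

The sketch counts rename and execute work only. It does not capture the cost the slide attributes to the Proxy model of holding issue queue entries until the branch resolves.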

Results for 32- & 64-entry issue queues
• TCI average %IPC improvement is 16% (16%)
• TCI maximum %IPC improvement is 61% (64%)
• Proxy average %IPC improvement is 6% (11%)
• Seq CI can degrade performance

Varying the issue queue size
[Figure: harmonic mean IPC (1.6 to 2.8) vs. issue queue size (16 to 256); series: Seq CIDD (TCI), Seq CI, Proxy, Base]
• Seq CI is bandwidth inefficient
• Proxy is bandwidth efficient, but resource inefficient
• TCI is both bandwidth and resource efficient

Varying the RXB size
• In Seq CI, the RXB limits window size
• TCI overcomes the problem by buffering only CIDD inst.

Conclusion
• Recover program state, not program order
  - Transparent branch misprediction recovery using fully decoupled recovery program
• Resource efficient
  - All instructions execute, drain, and free resources quickly based on conventional speculation
• Bandwidth efficient
  - TCI only re-sequences CIDD instructions

Questions
