CSE 490/590 Computer Architecture
ILP II
Steve Ko
Computer Sciences and Engineering
University at Buffalo
CSE 490/590, Spring 2011
Last time…
• Register renaming
– Overcoming the restriction caused by the number of registers
– Reorder buffer & renaming table
• Precise interrupts
– It must appear as if an interrupt has occurred between two instructions
Precise Interrupts
It must appear as if an interrupt is taken between two instructions (say I_i and I_(i+1)):
• the effect of all instructions up to and including I_i is totally complete
• no effect of any instruction after I_i has taken place
The interrupt handler either aborts the program or restarts it at I_(i+1).
Phases of Instruction Execution
[Figure: PC → I-cache → Fetch Buffer → Issue Buffer → Func. Units → Result Buffer → Arch. State]
• Fetch: Instruction bits are retrieved from the cache.
• Decode: Instructions are placed in the appropriate issue (aka "dispatch") stage buffer.
• Execute: Instructions and operands are sent to execution units. When execution completes, all results and exception flags are available.
• Commit: The instruction irrevocably updates architectural state (aka "graduation" or "completion").
In-Order Commit for Precise Exceptions
[Figure: Fetch and Decode (in-order) feed the Reorder Buffer; Execute is out-of-order; Commit is in-order. An exception detected at commit kills Fetch, Decode, and Execute and injects the handler PC.]
• Instructions are fetched and decoded into the instruction reorder buffer in order
• Execution is out of order (⇒ out-of-order completion)
• Commit (write-back to architectural state, i.e., regfile & memory) is in order
Temporary storage is needed to hold results before commit (shadow registers and store buffers).
Extensions for Precise Exceptions
[Figure: reorder buffer with fields Inst# | use | exec | op | p1 | src1 | p2 | src2 | pd | dest | data | cause; ptr2 marks the next entry to commit, ptr1 the next available entry]
• add <pd, dest, data, cause> fields in the instruction template
• commit instructions to reg file and memory in program order; because entries are allocated and retired in order, the buffers can be maintained circularly
• on exception, clear the reorder buffer by resetting ptr1 = ptr2 (stores must wait for commit before updating memory)
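A minimal sketch of the circular reorder buffer described above, assuming a toy model in which each entry carries only the dest, data, and cause fields plus a "done" flag; the class and method names (ROBEntry, ReorderBuffer, allocate, complete, commit, flush) are hypothetical, and stores and the memory side are not modeled. It shows instructions completing out of order while architectural state is updated only in order, and an exception clearing the buffer by setting ptr1 = ptr2.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ROBEntry:
    dest: Optional[str]           # architectural destination register ("dest" field)
    data: Optional[int] = None    # result, filled in when execution finishes ("data")
    cause: Optional[str] = None   # exception cause recorded at execute ("cause")
    done: bool = False            # execution has completed


class ReorderBuffer:
    """Toy circular reorder buffer: out-of-order completion, in-order commit."""

    def __init__(self, size: int):
        self.size = size
        self.slots: list[Optional[ROBEntry]] = [None] * size
        self.ptr1 = 0   # next available slot (allocation at decode)
        self.ptr2 = 0   # next instruction to commit
        self.count = 0  # occupied slots; tells full from empty when ptr1 == ptr2

    def allocate(self, dest: Optional[str]) -> int:
        """Decode: enter an instruction in program order; its slot index is its tag."""
        if self.count == self.size:
            raise RuntimeError("reorder buffer full: decode must stall")
        tag = self.ptr1
        self.slots[tag] = ROBEntry(dest)
        self.ptr1 = (self.ptr1 + 1) % self.size
        self.count += 1
        return tag

    def complete(self, tag: int, data: Optional[int] = None,
                 cause: Optional[str] = None) -> None:
        """Execute: may happen in any order; no architectural state changes yet."""
        entry = self.slots[tag]
        entry.data, entry.cause, entry.done = data, cause, True

    def commit(self, regfile: dict) -> Optional[str]:
        """Retire completed instructions strictly in program order.

        If the oldest instruction faulted, flush it and everything younger
        and return its cause: all older results are committed, none younger.
        """
        while self.count and self.slots[self.ptr2].done:
            entry = self.slots[self.ptr2]
            if entry.cause is not None:
                self.flush()
                return entry.cause
            if entry.dest is not None:
                regfile[entry.dest] = entry.data
            self.slots[self.ptr2] = None
            self.ptr2 = (self.ptr2 + 1) % self.size
            self.count -= 1
        return None

    def flush(self) -> None:
        """Clear the buffer by resetting ptr1 = ptr2."""
        self.slots = [None] * self.size
        self.ptr1 = self.ptr2
        self.count = 0
```

For example, if three instructions writing r1, r2, r3 are allocated and the third finishes first while the second faults, commit() does nothing until the first finishes; it then writes only r1 and reports the fault, and the already-computed r3 result is discarded.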
Rollback and Renaming
[Figure: Register File (now holds only committed state); reorder buffer with fields Ins# | use | exec | op | p1 | src1 | p2 | src2 | pd | dest | data and entries t1, t2, …, tn; Load Unit, FUs, Store Unit; results return as Commit <t, result>]
The register file does not contain renaming tags any more.
How does the decode stage find the tag of a source register?
Search the "dest" field in the reorder buffer.
Renaming Table
[Figure: a Rename Table (entries r1, r2, … each holding a tag t and a valid bit v) placed in front of the Register File; reorder buffer with entries t1, t2, …, tn; Load Unit, FUs, Store Unit; Commit <t, result>]
The renaming table is a cache to speed up register name lookup. It needs to be cleared after each exception taken.
When else are valid bits cleared? Control transfers.
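A small sketch of the two lookup paths, under simplifying assumptions: the reorder buffer is modeled as a Python list of (tag, dest) pairs in program order, tags grow without wrapping, and a key present in the rename-table dict stands for a set valid bit. All names (Renamer, search_rob, lookup, decode, commit_oldest, squash_after) are hypothetical. The point it illustrates is why the table can simply be cleared on an exception or control transfer: a miss falls back to searching the ROB "dest" field, which stays authoritative.

```python
from typing import Optional, Union

Operand = tuple[str, Union[int, str]]  # ("rob", tag) or ("regfile", register name)


class Renamer:
    """Toy model of decode-stage renaming with a reorder buffer and a rename table."""

    def __init__(self) -> None:
        self.rob: list[tuple[int, str]] = []  # in-flight (tag, dest), oldest first
        self.table: dict[str, int] = {}       # rename table: reg -> tag; present key = valid bit set
        self.next_tag = 0                     # tags only grow in this toy model (no wraparound)

    def search_rob(self, reg: str) -> Optional[int]:
        """Slow path: search the ROB's dest field for the youngest in-flight writer."""
        for tag, dest in reversed(self.rob):
            if dest == reg:
                return tag
        return None

    def lookup(self, reg: str) -> Operand:
        if reg in self.table:                 # fast path: valid rename-table entry
            return ("rob", self.table[reg])
        tag = self.search_rob(reg)            # table miss: fall back to the ROB search
        if tag is not None:
            return ("rob", tag)
        return ("regfile", reg)               # no in-flight writer: use the committed value

    def decode(self, dest: str, srcs: list[str]) -> tuple[int, list[Operand]]:
        operands = [self.lookup(s) for s in srcs]  # read sources before renaming dest
        tag = self.next_tag
        self.next_tag += 1
        self.rob.append((tag, dest))
        self.table[dest] = tag
        return tag, operands

    def commit_oldest(self) -> int:
        tag, dest = self.rob.pop(0)
        if self.table.get(dest) == tag:       # no younger writer has remapped dest
            del self.table[dest]              # value now lives in the register file
        return tag

    def squash_after(self, tag: int) -> None:
        """Kill every instruction younger than tag, e.g. after a mispredicted branch at tag."""
        self.rob = [(t, d) for t, d in self.rob if t <= tag]
        self.table.clear()                    # safe: the ROB search remains authoritative
```

After squash_after, an older in-flight writer is still found through search_rob, while a killed writer no longer shadows the register file.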
Control Flow Penalty
[Figure: pipeline PC → I-cache → Fetch Buffer → Issue Buffer → Func. Units → Result Buffer → Arch. State, with stages Fetch, Decode, Execute, Commit; the next fetch is started at the PC while the branch is executed in the functional units]
Modern processors may have > 10 pipeline stages between next PC calculation and branch resolution!
MIPS Branches and Jumps
Each instruction fetch depends on one or two pieces of information from the preceding instruction:
1) Is the preceding instruction a taken branch?
2) If so, what is the target address?

Instruction    Taken known?         Target known?
J              After Inst. Decode   After Inst. Decode
JR             After Inst. Decode   After Reg. Fetch
BEQZ/BNEZ      After Reg. Fetch*    After Inst. Decode

*Assuming zero detect on register read
Branch Penalties in Modern Pipelines
UltraSPARC-III instruction fetch pipeline stages (in-order issue, 4-way superscalar, 750 MHz, 2000):
A  PC Generation/Mux
P  Instruction Fetch Stage 1
F  Instruction Fetch Stage 2
B  Branch Address Calc/Begin Decode
I  Complete Decode
J  Steer Instructions to Functional units
R  Register File Read
E  Integer Execute
Remainder of execute pipeline (+ another 6 stages)
[Figure annotations: "Branch Target Address Known"; "Branch Direction & Jump Register Target Known"]
Reducing Control Flow Penalty
Software solutions
• Eliminate branches - loop unrolling
– Increases the run length
• Reduce resolution time - instruction scheduling
– Compute the branch condition as early as possible (of limited value)
Hardware solutions
• Find something else to do - delay slots
– Replaces pipeline bubbles with useful work (requires software cooperation)
• Speculate - branch prediction
– Speculative execution of instructions beyond the branch
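To make "increases the run length" concrete, here is an illustrative sketch of unrolling a summation loop by a factor of 4. Python does not expose machine branches, so the hypothetical "branches" counter simply tallies the loop-back branches a compiled version of each loop would execute; the function names are made up for this example.

```python
def sum_rolled(xs: list[int]) -> tuple[int, int]:
    """Plain loop: one loop-back branch per element."""
    total, branches = 0, 0
    for x in xs:
        total += x
        branches += 1
    return total, branches


def sum_unrolled4(xs: list[int]) -> tuple[int, int]:
    """Unrolled by 4: one loop-back branch per four elements, plus a remainder loop."""
    n = len(xs)
    total, branches, i = 0, 0, 0
    while i + 4 <= n:
        total += xs[i] + xs[i + 1] + xs[i + 2] + xs[i + 3]
        branches += 1
        i += 4
    for x in xs[i:]:          # leftover n % 4 elements
        total += x
        branches += 1
    return total, branches
```

For ten elements both functions return a sum of 45, but the unrolled version executes 4 loop-back branches instead of 10, so each straight-line run between branches is longer.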
CSE 490/590 Administrivia
• Project 1 & midterm grading mostly done
– Will distribute on Wed
– Regrading -> Jangyoung
• Project 2
– Start early!
Branch Prediction
Motivation:
Branch penalties limit the performance of deeply pipelined processors.
Modern branch predictors have high accuracy (>95%) and can reduce branch penalties significantly.
Required hardware support:
Prediction structures:
• Branch history tables, branch target buffers, etc.
Mispredict recovery mechanisms:
• Keep result computation separate from commit
• Kill instructions following the branch in the pipeline
• Restore state to the state following the branch
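One way to see why >95% accuracy matters is a simple first-order CPI model, offered here as an assumption rather than something from the slides: each mispredicted branch adds a fixed number of bubble cycles, and stalling on every branch is treated as paying that penalty on every branch. The workload numbers below (20% branches, 10-cycle penalty, base CPI of 1) are hypothetical.

```python
def effective_cpi(base_cpi: float, branch_fraction: float,
                  mispredict_rate: float, penalty_cycles: int) -> float:
    """First-order model: every mispredicted branch adds penalty_cycles bubbles."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


# Hypothetical workload: 20% branches, 10-cycle resolution penalty, base CPI of 1.
no_prediction = effective_cpi(1.0, 0.20, 1.00, 10)  # every branch waits for resolution
predicted = effective_cpi(1.0, 0.20, 0.05, 10)      # 95%-accurate predictor
print(f"stall on every branch: CPI = {no_prediction:.2f}")
print(f"95% accurate predictor: CPI = {predicted:.2f}")
```

Under these assumptions the CPI drops from 3.00 to 1.10; the model ignores effects such as cache misses and partial overlap of the penalty with useful work.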
Static Branch Prediction
Overall probability a branch is taken is ~60-70% but:
[Figure: a backward conditional branch (JZ) is taken ~90% of the time; a forward one (JZ) ~50%]
ISA can attach preferred direction semantics to branches, e.g., Motorola MC88110: bne0 (preferred taken), beq0 (not taken).
ISA can allow arbitrary choice of statically predicted direction, e.g., HP PA-RISC, Intel IA-64; typically reported as ~80% accurate.
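The backward/forward statistics above motivate a common static heuristic, backward-taken/forward-not-taken (BTFN), swapped in here as an illustration rather than taken from the slide. The sketch assumes a branch is described only by its own PC and target PC, and models an ISA-encoded direction as an optional hint argument; static_predict and accuracy are hypothetical names.

```python
from typing import Optional


def static_predict(branch_pc: int, target_pc: int,
                   hint: Optional[bool] = None) -> bool:
    """Return True to predict taken."""
    if hint is not None:          # direction encoded in the instruction by the compiler/ISA
        return hint
    return target_pc < branch_pc  # backward (typically loop-closing) branch -> taken


def accuracy(trace: list[tuple[int, int, bool]]) -> float:
    """Fraction of (branch_pc, target_pc, taken) records predicted correctly."""
    hits = sum(static_predict(pc, tgt) == taken for pc, tgt, taken in trace)
    return hits / len(trace)


# A loop-closing branch taken 9 times, then falling through once.
loop_trace = [(0x40, 0x20, True)] * 9 + [(0x40, 0x20, False)]
print(accuracy(loop_trace))  # 0.9
```

The heuristic costs no state at all, which is why it tends to be the fallback when no dynamic history is available.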
Dynamic Branch Prediction: learning based on past behavior
Temporal correlation
• The way a branch resolves may be a good predictor of the way it will resolve at the next execution
Spatial correlation
• Several branches may resolve in a highly correlated manner (a preferred path of execution)
Branch Prediction Bits
• Assume 2 BP bits per instruction
• Change the prediction after two consecutive mistakes!
[Figure: state diagram over the four BP states (take, right), (take, wrong), (¬take, right), (¬take, wrong), with transitions labeled taken / ¬taken]
BP state: (predict take/¬take) × (last prediction right/wrong)
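A sketch of the 2-bit scheme as stated on the slide: one bit holds the prediction, one records whether the last prediction was right, and the prediction flips only after two consecutive mistakes. The slide's exact transition after a flip cannot be recovered from the figure; this sketch assumes the state becomes "right" after flipping, since the new prediction agrees with the latest outcome. TwoMistakePredictor and count_mispredictions are hypothetical names.

```python
class TwoMistakePredictor:
    """2 bits per branch: (predict take/not-take) x (last prediction right/wrong)."""

    def __init__(self, predict_taken: bool = True) -> None:
        self.predict_taken = predict_taken
        self.last_right = True

    def predict(self) -> bool:
        return self.predict_taken

    def update(self, taken: bool) -> None:
        if taken == self.predict_taken:
            self.last_right = True          # correct: stay in the "right" state
        elif self.last_right:
            self.last_right = False         # first mistake: keep the prediction
        else:
            self.predict_taken = taken      # second consecutive mistake: flip
            self.last_right = True          # assumed: new prediction agrees with latest outcome


def count_mispredictions(predictor: TwoMistakePredictor, outcomes: list[bool]) -> int:
    misses = 0
    for taken in outcomes:
        misses += predictor.predict() != taken
        predictor.update(taken)
    return misses


# Loop-closing branch: taken three times, then not taken on exit, for two trips.
print(count_mispredictions(TwoMistakePredictor(), [True, True, True, False] * 2))  # 2
```

The single not-taken exit costs one misprediction per trip but does not flip the prediction, so the loop's next entry is still predicted correctly.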
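Branch History Table
[Figure: k bits of the fetch PC (above the low-order 00 bits) index a 2^k-entry BHT with 2 bits/entry, accessed in parallel with the I-Cache; the instruction's opcode tells whether it is a branch, PC + offset gives the target PC, and the BHT entry supplies Taken/¬Taken?]
4K-entry BHT, 2 bits/entry, ~80-90% correct predictions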
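A sketch of the table indexing, assuming 4-byte instructions (so the low "00" bits of the PC are dropped) and using 2-bit saturating counters per entry; saturating counters are a common choice swapped in here, not necessarily the exact state machine of the previous slide. The initial counter value and all names (BranchHistoryTable, index, predict, update) are hypothetical. With k = 12 the table has the 4K entries quoted above.

```python
class BranchHistoryTable:
    """2^k entries of 2-bit saturating counters, indexed by PC bits above the 00 offset."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.counters = [1] * (1 << k)   # 0,1 = predict not taken; 2,3 = predict taken

    def index(self, pc: int) -> int:
        return (pc >> 2) & ((1 << self.k) - 1)  # drop the 2 byte-offset bits, keep k bits

    def predict(self, pc: int) -> bool:
        return self.counters[self.index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self.index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


bht = BranchHistoryTable(k=12)          # 4K entries x 2 bits = 8 Kbit of state
bht.update(0x1004, True)
print(bht.predict(0x1004))              # True
print(bht.predict(0x1004 + (4 << 12)))  # True: aliases to the same entry (no tags)
```

Because entries carry no tags, two branches whose PCs agree in the k index bits share a counter; this aliasing is one reason a table of this size stays in the quoted ~80-90% range rather than higher.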
Acknowledgements
• These slides heavily contain material developed and copyright by
– Krste Asanovic (MIT/UCB)
– David Patterson (UCB)
• And also by:
– Arvind (MIT)
– Joel Emer (Intel/MIT)
– James Hoe (CMU)
– John Kubiatowicz (UCB)
• MIT material derived from course 6.823
• UCB material derived from course CS 252