Lecture: Out-of-order Processors
• Topics: a basic out-of-order processor with issue queue, register renaming, and reorder buffer

Problem 0
• For the code snippet below, estimate the steady-state branch prediction (bpred) accuracies for the default PC+4 prediction and for the 1-bit bimodal, 2-bit bimodal, global, and local predictors. Assume that the global and local predictors use 5-bit histories.

    do {
        for (i=0; i<4; i++) { increment something }
        for (j=0; j<8; j++) { increment something }
        k++;
    } while (k < some large number)

Problem 0 (solution)
• Each pass through the do-while body executes 4 + 8 + 1 = 13 branches (first for loop, second for loop, do-while), so every accuracy is out of 13.
  – PC+4: 2/13 = 15%
  – 1-bit bimodal: (2+6+1)/(4+8+1) = 9/13 = 69%
  – 2-bit bimodal: (3+7+1)/13 = 11/13 = 85%
  – Global: (4+7+1)/13 = 12/13 = 92% (gets confused by the history 01111 unless the branch PC is taken into account while indexing)
  – Local: (4+7+1)/13 = 12/13 = 92%

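The bimodal and local counts can be checked with a small simulation. The following is a minimal sketch, not a model of any real predictor: the class and function names are made up for illustration, every branch is assumed to have private table entries (no aliasing), and counters start at zero. The global predictor is left out because its result depends on how the history and the branch PC are combined into an index.

```python
def trace():
    """Branch outcomes for one pass through the do-while body (True = taken)."""
    out = [("for_i", taken) for taken in (True, True, True, False)]
    out += [("for_j", n < 7) for n in range(8)]
    out.append(("do_while", True))
    return out

class Static:
    """PC+4: always predict not taken."""
    def predict(self, pc):
        return False
    def update(self, pc, taken):
        pass

class Bimodal:
    """One n-bit saturating counter per branch (assumption: no aliasing)."""
    def __init__(self, bits):
        self.top = (1 << bits) - 1
        self.ctr = {}
    def predict(self, pc):
        return self.ctr.get(pc, 0) > self.top // 2
    def update(self, pc, taken):
        c = self.ctr.get(pc, 0)
        self.ctr[pc] = min(self.top, c + 1) if taken else max(0, c - 1)

class Local:
    """Per-branch history selects a 2-bit counter (assumption: tables private per branch)."""
    def __init__(self, hist_bits=5):
        self.mask = (1 << hist_bits) - 1
        self.hist = {}
        self.ctr = {}
    def predict(self, pc):
        return self.ctr.get((pc, self.hist.get(pc, 0)), 0) >= 2
    def update(self, pc, taken):
        h = self.hist.get(pc, 0)
        c = self.ctr.get((pc, h), 0)
        self.ctr[(pc, h)] = min(3, c + 1) if taken else max(0, c - 1)
        self.hist[pc] = ((h << 1) | int(taken)) & self.mask

def steady_state_correct(pred, warmup=50):
    """Train for `warmup` passes, then count correct predictions in one more pass."""
    for _ in range(warmup):
        for pc, taken in trace():
            pred.update(pc, taken)
    correct = 0
    for pc, taken in trace():
        correct += pred.predict(pc) == taken
        pred.update(pc, taken)
    return correct

for name, pred in [("PC+4", Static()), ("1-bit", Bimodal(1)),
                   ("2-bit", Bimodal(2)), ("local", Local())]:
    print(f"{name}: {steady_state_correct(pred)}/13")
```

Under these assumptions the script reports 2, 9, 11 and 12 correct predictions out of 13, in that order.
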
An Out-of-Order Processor Implementation
[Figure: branch prediction and instr fetch → Instr Fetch Queue → Decode & Rename → Issue Queue (IQ) → ALUs. Results are written to the ROB and tags are broadcast to the IQ. The register file holds R1-R32.]

    Reorder Buffer (ROB)    Original code     Renamed code
    Instr 1 (T1)            R1 ← R1+R2        T1 ← R1+R2
    Instr 2 (T2)            R2 ← R1+R3        T2 ← T1+R3
    Instr 3 (T3)            BEQZ R2           BEQZ T2
    Instr 4 (T4)            R3 ← R1+R2        T4 ← T1+T2
    Instr 5 (T5)            R1 ← R3+R2        T5 ← T4+T2
    Instr 6 (T6)

Problem 1
• Show the renamed version of the following code. Assume that you have 4 rename registers T1-T4.

    R1 ← R2+R3
    R3 ← R4+R5
    BEQZ R1
    R1 ← R1+R3
    R1 ← R1+R3
    R3 ← R1+R3

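Problem 1 (solution)
• Show the renamed version of the following code. Assume that you have 4 rename registers T1-T4.

    Original code     Renamed code
    R1 ← R2+R3        T1 ← R2+R3
    R3 ← R4+R5        T2 ← R4+R5
    BEQZ R1           BEQZ T1
    R1 ← R1+R3        T4 ← T1+T2
    R1 ← R1+R3        T1 ← T4+T2
    R3 ← R1+R3        T2 ← T1+R3

• The branch occupies the ROB entry for T3 even though it produces no result. T1 and T2 are reused only after the first two instructions commit, which is why the last instruction reads R3 from the register file.
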
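The renaming in Problem 1 can be sketched in code. This is an illustrative model only: it assumes each ROB entry owns one tag (T1-T4, assigned round-robin, branches included), that the oldest instruction commits exactly when a new one needs its entry, and that the instruction list below is the intended problem. The function name and data layout are hypothetical.

```python
from collections import deque

# (destination or None, sources): our reading of the Problem 1 code.
PROBLEM_1 = [
    ("R1", ["R2", "R3"]),
    ("R3", ["R4", "R5"]),
    (None, ["R1"]),            # BEQZ R1
    ("R1", ["R1", "R3"]),
    ("R1", ["R1", "R3"]),
    ("R3", ["R1", "R3"]),
]

def rename_with_rob(code, n_tags=4):
    """Rename sources to ROB tags; return (dest tag or None, renamed sources)."""
    latest = {}       # logical reg -> tag of its newest in-flight producer
    rob = deque()     # (tag, logical dest) in program order
    out = []
    for k, (dest, srcs) in enumerate(code):
        if len(rob) == n_tags:
            # ROB full: the oldest instruction commits and frees its entry.
            tag, old_dest = rob.popleft()
            if old_dest is not None and latest.get(old_dest) == tag:
                del latest[old_dest]    # value now lives in the register file
        tag = f"T{k % n_tags + 1}"
        renamed = [latest.get(s, s) for s in srcs]
        if dest is not None:
            latest[dest] = tag
        rob.append((tag, dest))
        out.append((tag if dest is not None else None, renamed))
    return out

for dest, srcs in rename_with_rob(PROBLEM_1):
    print(dest, srcs)
```

The fifth instruction receives T1 again and the sixth receives T2, and the sixth reads R3 unrenamed because its producer has committed by then.
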
Design Details - I
• Instructions enter the pipeline in order
• No need for branch delay slots if prediction happens in time
• Instructions leave the pipeline in order
  – all instructions that enter also get placed in the ROB
  – the process of an instruction leaving the ROB (in order) is called commit
  – an instruction commits only if it and all instructions before it have completed successfully (without an exception)
• To preserve precise exceptions, a result is written into the register file only when the instruction commits
  – until then, the result is saved in a temporary register in the ROB

Design Details - II
• Instructions get renamed and placed in the issue queue
  – some operands are available (T1-T6; R1-R32), while others are being produced by instructions in flight (T1-T6)
• As instructions finish, they write results into the ROB (T1-T6) and broadcast the operand tag (T1-T6) to the issue queue
  – instructions now know if their operands are ready
• When a ready instruction issues, it reads its operands from T1-T6 and R1-R32 and executes (out-of-order execution)
• Can you have WAW or WAR hazards? By using more names (T1-T6), name dependences can be avoided

Design Details - III
• If instr-3 raises an exception, wait until it reaches the top of the ROB
  – at this point, R1-R32 contain results for all instructions up to instr-3
  – save registers, save PC of instr-3, and service the exception
• If a branch is mispredicted, flush all instructions after the branch and start on the correct path
  – mispredicted instrs will not have updated registers (the branch cannot commit until it has completed, and the flush happens as soon as the branch completes)
• Potential problems: ?

Managing Register Names
• Temporary values are stored in the register file and not the ROB
• Logical registers: R1-R32. Physical registers: P1-P64
• At the start, R1-R32 can be found in P1-P32
• Instructions stop entering the pipeline when P64 is assigned

    Original code     Renamed code
    R1 ← R1+R2        P33 ← P1+P2
    R2 ← R1+R3        P34 ← P33+P3
    BEQZ R2           BEQZ P34
    R3 ← R1+R2        P35 ← P33+P34

• What happens on commit?

The Commit Process
• On commit, no copy is required
• The register map table is updated
  – the “committed” value of R1 is now in P33 and not P1
  – on an exception, P33 is copied to memory and not P1
• An instruction in the issue queue need not modify its input operand when the producer commits
• When instruction-1 commits, we no longer have any use for P1
  – it is put in a free pool and a new instruction can now enter the pipeline
  – so for every instr that commits, a new instr can enter the pipeline
  – so the number of in-flight instrs is a constant = number of extra (rename) registers

The Alpha 21264 Out-of-Order Implementation
[Figure: branch prediction and instr fetch → Instr Fetch Queue → Decode & Rename → Issue Queue (IQ) → ALUs. Results are written to the register file (P1-P64) and tags are broadcast to the IQ.]

    Reorder Buffer (ROB)    Original code     Renamed code
    Instr 1                 R1 ← R1+R2        P33 ← P1+P2
    Instr 2                 R2 ← R1+R3        P34 ← P33+P3
    Instr 3                 BEQZ R2           BEQZ P34
    Instr 4                 R3 ← R1+R2        P35 ← P33+P34
    Instr 5                 R1 ← R3+R2        P36 ← P35+P34
    Instr 6

    Committed Reg Map:      R1 → P1     R2 → P2
    Speculative Reg Map:    R1 → P36    R2 → P34

Problem 2
• Show the renamed version of the following code. Assume that you have 36 physical registers and 32 architected registers.

    R1 ← R2+R3
    R3 ← R4+R5
    BEQZ R1
    R1 ← R1+R3
    R1 ← R1+R3
    R3 ← R1+R3
    R4 ← R3+R1

Problem 2 (solution)
• Show the renamed version of the following code. Assume that you have 36 physical registers and 32 architected registers.

    Original code     Renamed code
    R1 ← R2+R3        P33 ← P2+P3
    R3 ← R4+R5        P34 ← P4+P5
    BEQZ R1           BEQZ P33
    R1 ← R1+R3        P35 ← P33+P34
    R1 ← R1+R3        P36 ← P35+P34
    R3 ← R1+R3        P1 ← P36+P34
    R4 ← R3+R1        P3 ← P1+P36

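The free-pool behaviour from the Commit Process slide is what produces P1 and P3 in the last two rows. Below is a minimal sketch under stated assumptions: registers are handed out from a FIFO free list, and when the list is empty the oldest in-flight instructions commit until a register is released. The names `rename_with_free_list` and `PROBLEM_2` are illustrative, and the instruction list is our reading of the problem.

```python
from collections import deque

# (destination or None, sources): our reading of the Problem 2 code.
PROBLEM_2 = [
    ("R1", ["R2", "R3"]),
    ("R3", ["R4", "R5"]),
    (None, ["R1"]),            # BEQZ R1
    ("R1", ["R1", "R3"]),
    ("R1", ["R1", "R3"]),
    ("R3", ["R1", "R3"]),
    ("R4", ["R3", "R1"]),
]

def rename_with_free_list(code, n_logical=32, n_physical=36):
    """Return (dest physical reg or None, physical source regs) per instruction."""
    reg_map = {f"R{k}": f"P{k}" for k in range(1, n_logical + 1)}
    free = deque(f"P{k}" for k in range(n_logical + 1, n_physical + 1))
    # For each in-flight instr, the mapping it overwrote (released at commit).
    in_flight = deque()
    out = []
    for dest, srcs in code:
        phys_srcs = [reg_map[s] for s in srcs]
        if dest is None:
            in_flight.append(None)
            out.append((None, phys_srcs))
            continue
        while not free:
            # Stall: commit the oldest instruction and recycle its old mapping.
            released = in_flight.popleft()
            if released is not None:
                free.append(released)
        new_reg = free.popleft()
        in_flight.append(reg_map[dest])
        reg_map[dest] = new_reg
        out.append((new_reg, phys_srcs))
    return out

for dest, srcs in rename_with_free_list(PROBLEM_2):
    print(dest, srcs)
```

The sixth instruction can only rename once the first has committed and released P1, and the seventh waits for the second to release P3.
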
Problem 3
• Show the renamed version of the following code. Assume that you have 36 physical registers and 32 architected registers. When does each instr leave the IQ?

    R1 ← R2+R3
    R1 ← R1+R5
    BEQZ R1
    R1 ← R4+R5
    R4 ← R1+R7
    R1 ← R6+R8
    R4 ← R3+R1
    R1 ← R5+R9

Problem 3 (solution)
• Show the renamed version of the following code. Assume that you have 36 physical registers and 32 architected registers. When does each instr leave the IQ?

    Original code     Renamed code       Leaves IQ
    R1 ← R2+R3        P33 ← P2+P3        cycle i
    R1 ← R1+R5        P34 ← P33+P5       i+1
    BEQZ R1           BEQZ P34           i+2
    R1 ← R4+R5        P35 ← P4+P5        i
    R4 ← R1+R7        P36 ← P35+P7       i+1
    R1 ← R6+R8        P1 ← P6+P8         j
    R4 ← R3+R1        P33 ← P3+P1        j+1
    R1 ← R5+R9        P34 ← P5+P9        j+2

• Width is assumed to be 4.
• j depends on the number of stages between issue and commit.

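The issue cycles of the first five instructions follow from tag broadcast and wakeup. Here is a rough sketch of that mechanism, assuming single-cycle producers, oldest-first selection of up to `width` ready instructions per cycle, and that all five instructions are already in the IQ at cycle i (taken as 0). The function name and the IQ encoding are illustrative only.

```python
def issue_cycles(iq, width=4):
    """iq: list of (dest tag or None, tags of in-flight producers it waits on).

    Returns the cycle (relative to i = 0) in which each entry leaves the IQ.
    """
    ready = set()                    # tags that have been broadcast so far
    cycles = [None] * len(iq)
    cycle = 0
    while any(c is None for c in cycles):
        # Select: oldest-first among entries whose awaited tags are all ready.
        picked = [k for k, (_, srcs) in enumerate(iq)
                  if cycles[k] is None and all(s in ready for s in srcs)][:width]
        if not picked:
            raise ValueError("an entry waits on a tag that is never broadcast")
        for k in picked:
            cycles[k] = cycle
        # Wakeup: tags are broadcast after selection, so dependents go next cycle.
        ready.update(iq[k][0] for k in picked if iq[k][0] is not None)
        cycle += 1
    return cycles

# First five renamed instructions; register-file operands need no wakeup.
IQ = [
    ("P33", []),         # P33 <- P2+P3
    ("P34", ["P33"]),    # P34 <- P33+P5
    (None, ["P34"]),     # BEQZ P34
    ("P35", []),         # P35 <- P4+P5
    ("P36", ["P35"]),    # P36 <- P35+P7
]
print(issue_cycles(IQ))
```

The fourth instruction issues alongside the first, ahead of the older second and third instructions, which is the out-of-order behaviour the problem is probing.
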
OOO Example
• Assume there are 36 physical registers and 32 logical registers, and width is 4
• Estimate the issue time, completion time, and commit time for the sample code

Assumptions
• Perfect branch prediction, instruction fetch, caches
• ADD dep has no stall; LD dep has one stall
• An instr is placed in the IQ at the end of its 5th stage; an instr takes 5 more stages after leaving the IQ (ld/st instrs take 6 more stages after leaving the IQ)

OOO Example

    Original code
    ADD R1, R2, R3
    LD  R2, 8(R1)
    ADD R2, R2, 8
    ST  R1, (R3)
    SUB R1, R1, R5
    LD  R1, 8(R2)
    ADD R1, R1, R2

OOO Example

    Original code       Renamed code
    ADD R1, R2, R3      ADD P33, P2, P3
    LD  R2, 8(R1)       LD  P34, 8(P33)
    ADD R2, R2, 8       ADD P35, P34, 8
    ST  R1, (R3)        ST  P33, (P3)
    SUB R1, R1, R5      SUB P36, P33, P5
    LD  R1, 8(R2)       Must wait
    ADD R1, R1, R2      Must wait

OOO Example

    Original code       Renamed code         InQ    Iss     Comp    Comm
    ADD R1, R2, R3      ADD P33, P2, P3      i      i+1     i+6     i+6
    LD  R2, 8(R1)       LD  P34, 8(P33)      i      i+2     i+8     i+8
    ADD R2, R2, 8       ADD P35, P34, 8      i      i+4     i+9     i+9
    ST  R1, (R3)        ST  P33, (P3)        i      i+2     i+8     i+9
    SUB R1, R1, R5      SUB P36, P33, P5     i+1    i+2     i+7     i+9
    LD  R1, 8(R2)       LD  P1, 8(P35)       i+7    i+8     i+14    i+14
    ADD R1, R1, R2      ADD P2, P1, P35      i+9    i+10    i+15    i+15

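The timing rules from the Assumptions slide can be turned into a short calculation. The sketch below is one possible reading, not a definitive model: it takes i = 0, assumes a register released at commit in cycle c can be used by an instruction entering the IQ in cycle c+1, assumes an instruction issues no earlier than one cycle after entering the IQ, and ignores any limit on commits per cycle. `schedule` and `CODE` are illustrative names, and immediates are left out of the source lists.

```python
import heapq

# (op, destination logical reg or None, source logical regs); immediates omitted.
CODE = [
    ("ADD", 1, [2, 3]),
    ("LD", 2, [1]),
    ("ADD", 2, [2]),
    ("ST", None, [1, 3]),
    ("SUB", 1, [1, 5]),
    ("LD", 1, [2]),
    ("ADD", 1, [1, 2]),
]

def schedule(code, n_logical=32, n_physical=36, width=4):
    """Return (op, dest physical reg, InQ, Iss, Comp, Comm) per instr, with i = 0."""
    reg_map = {r: r for r in range(1, n_logical + 1)}     # R_k starts in P_k
    # Free pool as a heap of (first cycle the register is usable, register).
    free = [(0, p) for p in range(n_logical + 1, n_physical + 1)]
    heapq.heapify(free)
    ready = {}               # physical reg -> earliest issue cycle of a consumer
    last_inq = entered = last_comm = 0
    rows = []
    for op, dest, srcs in code:
        src_regs = [reg_map[s] for s in srcs]
        usable, new_reg = 0, None
        if dest is not None:
            usable, new_reg = heapq.heappop(free)         # may force a wait
        inq = max(last_inq, usable)                       # rename in program order
        if inq == last_inq and entered == width:          # this cycle is full
            inq += 1
        entered = entered + 1 if inq == last_inq else 1
        last_inq = inq
        iss = max([inq + 1] + [ready.get(p, 0) for p in src_regs])
        comp = iss + (6 if op in ("LD", "ST") else 5)     # ld/st take 6 more stages
        comm = max(comp, last_comm)                       # commit in program order
        last_comm = comm
        if dest is not None:
            ready[new_reg] = iss + (2 if op == "LD" else 1)   # LD dep: one stall
            heapq.heappush(free, (comm + 1, reg_map[dest]))   # old mapping released
            reg_map[dest] = new_reg
        rows.append((op, new_reg, inq, iss, comp, comm))
    return rows

for row in schedule(CODE):
    print(row)
```

With these assumptions the last two instructions enter the IQ at i+7 and i+9, because they wait for P1 and P2 to be released by the commits at i+6 and i+8.
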