Lecture 8: Dynamic ILP (out-of-order processors)


Lecture 8: Dynamic ILP
• Topics: out-of-order processors (see class notes)
• HW 3 is posted, due on Tuesday


An Out-of-Order Processor Implementation
[Figure: pipeline block diagram. Branch prediction and instr fetch → Instr Fetch Queue → Decode & Rename → Issue Queue (IQ) → ALUs, with a Register File (R1-R32) and a Reorder Buffer (ROB) holding Instr 1-6 and temporaries T1-T6. Results are written to the ROB and tags are broadcast to the IQ.]
Instruction stream before and after renaming:
  Instr 1: R1 ← R1+R2    becomes  T1 ← R1+R2
  Instr 2: R2 ← R1+R3    becomes  T2 ← T1+R3
  Instr 3: BEQZ R2       becomes  BEQZ T2
  Instr 4: R3 ← R1+R2    becomes  T4 ← T1+T2
  Instr 5: R1 ← R3+R2    becomes  T5 ← T4+T2


Design Details - I
• Instructions enter the pipeline in order
• No need for branch delay slots if prediction happens in time
• Instructions leave the pipeline in order
  – all instructions that enter also get placed in the ROB
  – the process of an instruction leaving the ROB (in order) is called commit
  – an instruction commits only if it and all instructions before it have completed successfully (without an exception)
• To preserve precise exceptions, a result is written into the register file only when the instruction commits
  – until then, the result is saved in a temporary register in the ROB
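The in-order commit rule above can be sketched in a few lines of Python. This is a minimal illustration, not the hardware design: the names `RobEntry` and `commit` are made up for the example, and the ROB is modeled as a simple queue whose head is the oldest instruction.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class RobEntry:
    dest: str                # architectural destination register, e.g. "R1"
    value: int = 0           # temporary result held in the ROB until commit
    done: bool = False       # set when execution finishes
    exception: bool = False  # set if the instruction raised an exception

def commit(rob, regfile):
    """Retire completed instructions from the head of the ROB, in order."""
    committed = 0
    while rob and rob[0].done:
        if rob[0].exception:
            break  # leave the excepting instruction at the head for the handler
        entry = rob.popleft()
        regfile[entry.dest] = entry.value  # register file is written only here
        committed += 1
    return committed
```

Note that a completed instruction behind an unfinished one stays in the ROB: its result is not visible in the register file until everything before it has committed.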


Design Details - II
• Instructions get renamed and placed in the issue queue
  – some operands are available (T1-T6; R1-R32), while others are being produced by instructions in flight (T1-T6)
• As instructions finish, they write results into the ROB (T1-T6) and broadcast the operand tag (T1-T6) to the issue queue
  – instructions now know if their operands are ready
• When a ready instruction issues, it reads its operands from T1-T6 and R1-R32 and executes (out-of-order execution)
• Can you have WAW or WAR hazards? By using more names (T1-T6), name dependences can be avoided
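The tag broadcast (wakeup) step can be sketched as follows. This is a simplified model under assumed names (`IqEntry`, `dispatch`, `broadcast`, `ready` are hypothetical helpers, not part of the lecture); it tracks only which source tags each queued instruction is still waiting for.

```python
from dataclasses import dataclass, field

@dataclass
class IqEntry:
    name: str                                  # tag of the result, e.g. "T2"
    sources: tuple                             # operand names, e.g. ("T1", "R3")
    waiting: set = field(default_factory=set)  # source tags still in flight

def dispatch(name, sources, in_flight):
    """Create an issue-queue entry; only in-flight tags must be waited on."""
    return IqEntry(name, sources, {s for s in sources if s in in_flight})

def broadcast(issue_queue, tag):
    """A finishing instruction broadcasts its tag; waiting entries wake up."""
    for entry in issue_queue:
        entry.waiting.discard(tag)

def ready(issue_queue):
    """Names of entries whose operands are all available."""
    return [e.name for e in issue_queue if not e.waiting]
```

With the renamed stream T1 ← R1+R2, T2 ← T1+R3, T4 ← T1+T2, T5 ← T4+T2, only T1 is ready at first; broadcasting T1 wakes T2, and broadcasting T2 then wakes T4, while T5 keeps waiting for T4.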


Design Details - III
• If instr-3 raises an exception, wait until it reaches the top of the ROB
  – at this point, R1-R32 contain results for all instructions before instr-3
  – save registers, save PC of instr-3, and service the exception
• If a branch is mispredicted, flush all instructions after the branch and start on the correct path
  – mispredicted instrs will not have updated registers (the branch cannot commit until it has completed, and the flush happens as soon as the branch completes)
• Potential problems: ?


Managing Register Names
Temporary values are stored in the register file and not the ROB
Logical registers: R1-R32
Physical registers: P1-P64
At the start, R1-R32 can be found in P1-P32
Instructions stop entering the pipeline when P64 is assigned
Renaming example:
  R1 ← R1+R2    becomes  P33 ← P1+P2
  R2 ← R1+R3    becomes  P34 ← P33+P3
  BEQZ R2       becomes  BEQZ P34
  R3 ← R1+R2    becomes  P35 ← P33+P34
What happens on commit?
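The renaming step on this slide can be reproduced with a small map table and free list. The sketch below assumes the destination registers read off the slide's example (R1, R2, none for the branch, R3); the function name `rename` and the tuple encoding of instructions are choices made for this illustration only.

```python
def rename(program, num_logical=32, num_physical=64):
    """Rename (dest, src, ...) tuples; dest is None for a branch."""
    # At the start, R1-R32 live in P1-P32; P33-P64 are free.
    reg_map = {f"R{i}": f"P{i}" for i in range(1, num_logical + 1)}
    free = [f"P{i}" for i in range(num_logical + 1, num_physical + 1)]
    renamed = []
    for dest, *srcs in program:
        phys_srcs = [reg_map[s] for s in srcs]  # read the map before updating it
        phys_dest = None
        if dest is not None:
            if not free:
                break  # no free physical register: stop accepting instructions
            phys_dest = free.pop(0)
            reg_map[dest] = phys_dest
        renamed.append((phys_dest, *phys_srcs))
    return renamed, reg_map

program = [("R1", "R1", "R2"), ("R2", "R1", "R3"), (None, "R2"), ("R3", "R1", "R2")]
renamed, reg_map = rename(program)
```

Sources are looked up before the destination is remapped, which is why the first instruction reads P1 even though it also writes R1.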


The Commit Process
• On commit, no copy is required
• The register map table is updated
  – the “committed” value of R1 is now in P33 and not P1
  – on an exception, P33 is copied to memory and not P1
• An instruction in the issue queue need not modify its input operand when the producer commits
• When instruction-1 commits, we no longer have any use for P1
  – it is put in a free pool and a new instruction can now enter the pipeline
  → for every instr that commits, a new instr can enter the pipeline
  → number of in-flight instrs is a constant = number of extra (rename) registers
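A minimal sketch of this commit step, assuming a dictionary for the committed map and a list for the free pool (the name `commit_renamed` is hypothetical). No value is copied; only the mapping changes and the previous physical register is recycled.

```python
def commit_renamed(committed_map, free_pool, logical_dest, new_phys):
    """Commit one instruction: update the committed map, free the old register."""
    old_phys = committed_map[logical_dest]
    committed_map[logical_dest] = new_phys  # committed value now lives in new_phys
    free_pool.append(old_phys)              # the old register can be reassigned
    return old_phys

committed_map = {"R1": "P1", "R2": "P2"}
free_pool = []
freed = commit_renamed(committed_map, free_pool, "R1", "P33")
```

Each commit returns exactly one register to the free pool, which is what lets one new instruction enter the pipeline per committed instruction.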


The Alpha 21264 Out-of-Order Implementation
[Figure: pipeline block diagram. Branch prediction and instr fetch → Instr Fetch Queue → Decode & Rename → Issue Queue (IQ) → ALUs, with a Register File (P1-P64) and a Reorder Buffer (ROB) holding Instr 1-6. Results are written to the regfile and tags are broadcast to the IQ.]
Committed Reg Map: R1 → P1, R2 → P2
Speculative Reg Map: R1 → P36, R2 → P34
Instruction stream before and after renaming:
  Instr 1: R1 ← R1+R2    becomes  P33 ← P1+P2
  Instr 2: R2 ← R1+R3    becomes  P34 ← P33+P3
  Instr 3: BEQZ R2       becomes  BEQZ P34
  Instr 4: R3 ← R1+R2    becomes  P35 ← P33+P34
  Instr 5: R1 ← R3+R2    becomes  P36 ← P35+P34


Out-of-Order Loads/Stores
  Ld R1 ← [R2]
  Ld R3 ← [R4]
  St R5 → [R6]
  Ld R7 ← [R8]
  Ld R9 ← [R10]
• What if the issue queue also had load/store instructions? Can we continue executing instructions out-of-order?


Memory Dependence Checking
Instruction sequence (addresses listed where the slide gives them):
  Ld 0xabcdef
  Ld
  St
  Ld
  Ld 0xabcdef
  St 0xabcd00
  Ld 0xabc000
  Ld 0xabcd00
• The issue queue checks for register dependences and executes instructions as soon as registers are ready
• Loads/stores access memory as well, so we must check for RAW, WAW, and WAR hazards for memory as well
• Hence, first check for register dependences to compute effective addresses; then check for memory dependences


Memory Dependence Checking
Instruction sequence (addresses listed where the slide gives them):
  Ld 0xabcdef
  Ld
  St
  Ld
  Ld 0xabcdef
  St 0xabcd00
  Ld 0xabc000
  Ld 0xabcd00
• Load and store addresses are maintained in program order in the Load/Store Queue (LSQ)
• Loads can issue if they are guaranteed to not have true dependences with earlier stores
• Stores can issue only if we are ready to modify memory (cannot recover if an earlier instr raises an exception)
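The load-issue rule can be sketched as a conservative check over the LSQ. Several things here are assumptions made for illustration: entries without an address on the slide are treated as "address not yet computed" (`None`), addresses are compared for exact equality (ignoring access sizes), and store-to-load forwarding is not modeled. `MemOp` and `load_can_issue` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MemOp:
    kind: str            # "ld" or "st"
    addr: Optional[int]  # None until the effective address is computed

def load_can_issue(lsq, index):
    """A load may issue only if no earlier store could write its address."""
    load = lsq[index]
    if load.kind != "ld" or load.addr is None:
        return False
    for older in lsq[:index]:  # the LSQ is kept in program order
        if older.kind == "st" and (older.addr is None or older.addr == load.addr):
            return False       # unknown or matching store address: must wait
    return True

lsq = [
    MemOp("ld", 0xABCDEF), MemOp("ld", None), MemOp("st", None), MemOp("ld", None),
    MemOp("ld", 0xABCDEF), MemOp("st", 0xABCD00), MemOp("ld", 0xABC000), MemOp("ld", 0xABCD00),
]
```

In this model the first load can issue immediately, while the second load of 0xabcdef is blocked by the earlier store whose address is still unknown. Once that store's address resolves to a different location, the load is free to go; the final load of 0xabcd00 still waits on the store to the same address.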


The Alpha 21264 Out-of-Order Implementation
[Figure: pipeline block diagram with loads and stores. Branch prediction and instr fetch → Instr Fetch Queue → Decode & Rename → Issue Queue (IQ) and LSQ → ALUs and D-Cache, with a Register File (P1-P64) and a Reorder Buffer (ROB) holding Instr 1-7. Results are written to the regfile and tags are broadcast to the IQ.]
Committed Reg Map: R1 → P1, R2 → P2
Speculative Reg Map: R1 → P36, R2 → P34
Instruction stream before and after renaming:
  Instr 1: R1 ← R1+R2      becomes  P33 ← P1+P2
  Instr 2: R2 ← R1+R3      becomes  P34 ← P33+P3
  Instr 3: BEQZ R2         becomes  BEQZ P34
  Instr 4: R3 ← R1+R2      becomes  P35 ← P33+P34
  Instr 5: R1 ← R3+R2      becomes  P36 ← P35+P34
  Instr 6: LD R4 ← 8[R3]   becomes  P37 ← 8[P35]
  Instr 7: ST R4 → 8[R1]   becomes  P37 → 8[P36]
LSQ entries:
  P37 ← [P35 + 8]
  P37 → [P36 + 8]
