Lecture 10: Out-of-order Processors
• Topics: more OOO design details, timing, load-store queue
• Midterm on Oct 18th? (or Oct 4th)

OOO Example
• Assume there are 36 physical registers and 32 logical registers, and width is 4
• Estimate the issue time, completion time, and commit time for the sample code

Assumptions
• Perfect branch prediction, instruction fetch, caches
• ADD dep has no stall; LD dep has one stall
• An instr is placed in the IQ at the end of its 5th stage; an instr takes 5 more stages after leaving the IQ (ld/st instrs take 6 more stages after leaving the IQ)

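OOO Example

Original code      Renamed code
ADD R1, R2, R3     ADD P33, P2, P3
LD  R2, 8(R1)      LD  P34, 8(P33)
ADD R2, R2, 8      ADD P35, P34, 8
ST  R1, (R3)       ST  P33, (P3)
SUB R1, R1, R5     SUB P36, P33, P5
LD  R1, 8(R2)      (must wait)
ADD R1, R1, R2     (must wait)

The last two instructions must wait: with 36 physical and 32 logical registers, only four spare registers (P33 to P36) are available for renaming, and the first five instructions use all of them.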

OOO Example

Original code      Renamed code        InQ    Iss    Comp   Comm
ADD R1, R2, R3     ADD P33, P2, P3     i      i+1    i+6    i+6
LD  R2, 8(R1)      LD  P34, 8(P33)     i      i+2    i+8    i+8
ADD R2, R2, 8      ADD P35, P34, 8     i      i+4    i+9    i+9
ST  R1, (R3)       ST  P33, (P3)       i      i+2    i+8    i+9
SUB R1, R1, R5     SUB P36, P33, P5    i+1    i+2    i+7    i+9
LD  R1, 8(R2)      LD  P1, 8(P35)      i+7    i+8    i+14   i+14
ADD R1, R1, R2     ADD P2, P1, P35     i+9    i+10   i+15   i+15

InQ: cycle the instruction enters the issue queue; Iss: issue; Comp: completion; Comm: commit.
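
The timing example above can be reproduced with a small model. This is only a sketch under assumptions that go beyond the slides: a physical register released at commit in cycle c can be reused by an instruction entering the IQ in cycle c+1, an instruction may commit in the cycle it completes, and issue and commit bandwidth are never the bottleneck. The names `schedule`, `PROGRAM`, `WIDTH`, `N_LOGICAL`, and `N_PHYSICAL` are illustrative, not taken from the lecture.

```python
import heapq

WIDTH = 4          # instructions entering the IQ per cycle (from the slide)
N_LOGICAL = 32     # logical registers R1..R32
N_PHYSICAL = 36    # physical registers P1..P36

# (opcode, destination logical register or None, source logical registers)
PROGRAM = [
    ("ADD", 1, (2, 3)),      # ADD R1, R2, R3
    ("LD", 2, (1,)),         # LD  R2, 8(R1)
    ("ADD", 2, (2,)),        # ADD R2, R2, 8
    ("ST", None, (1, 3)),    # ST  R1, (R3)
    ("SUB", 1, (1, 5)),      # SUB R1, R1, R5
    ("LD", 1, (2,)),         # LD  R1, 8(R2)
    ("ADD", 1, (1, 2)),      # ADD R1, R1, R2
]


def schedule(program):
    """Return one (InQ, Iss, Comp, Comm) tuple per instruction, as offsets from cycle i."""
    reg_map = {r: r for r in range(1, N_LOGICAL + 1)}   # initially R_k -> P_k
    # Free list of (cycle in which the register was released, register number).
    free = [(-1, p) for p in range(N_LOGICAL + 1, N_PHYSICAL + 1)]
    heapq.heapify(free)
    producer = {}      # physical register -> (issue cycle, opcode) of its writer
    rows = []
    prev_commit = 0
    for k, (op, dest, srcs) in enumerate(program):
        inq = rows[-1][0] if rows else 0                # rename is in order
        if k >= WIDTH:
            inq = max(inq, rows[k - WIDTH][0] + 1)      # at most WIDTH per cycle
        src_phys = [reg_map[s] for s in srcs]           # read map before renaming dest
        old = None
        new = None
        if dest is not None:
            released, new = heapq.heappop(free)         # earliest-available register
            inq = max(inq, released + 1)                # assumed: reusable one cycle after release
            old, reg_map[dest] = reg_map[dest], new
        issue = inq + 1
        for p in src_phys:
            if p in producer:
                p_issue, p_op = producer[p]
                # ADD dependence: no stall; LD dependence: one stall cycle.
                issue = max(issue, p_issue + (2 if p_op == "LD" else 1))
        comp = issue + (6 if op in ("LD", "ST") else 5)
        commit = max(comp, prev_commit)                 # commit is in order
        prev_commit = commit
        if dest is not None:
            producer[new] = (issue, op)
            heapq.heappush(free, (commit, old))         # old mapping freed at commit
        rows.append((inq, issue, comp, commit))
    return rows


if __name__ == "__main__":
    for (op, _, _), row in zip(PROGRAM, schedule(PROGRAM)):
        print(op, ["i" if t == 0 else f"i+{t}" for t in row])
```

Under these assumptions the sixth instruction enters the IQ only at i+7, because the first free register (P1) is released when the first ADD commits at i+6.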

The Alpha 21264 Out-of-Order Implementation
[Figure: pipeline block diagram]
• Blocks: Branch prediction and instr fetch, Instr Fetch Queue, Decode & Rename, Issue Queue (IQ), ALU, ALU
• Fetched instructions: R1 ← R1+R2; R2 ← R1+R3; BEQZ R2; R3 ← R1+R2; R1 ← R3+R2
• Renamed instructions: P33 ← P1+P2; P34 ← P33+P3; BEQZ P34; P35 ← P33+P34; P36 ← P35+P34
• Committed Reg Map: R1 → P1, R2 → P2
• Speculative Reg Map: R1 → P36, R2 → P34
• Reorder Buffer (ROB): Instr 1 to Instr 6
• Register File: P1-P64
• Results written to regfile and tags broadcast to IQ

Additional Details
• When does the decode stage stall? When we run out of registers, ROB entries, or issue queue entries
• Issue width: the number of instructions handled by each stage in a cycle. High issue width → high peak ILP
• Window size: the number of in-flight instructions in the pipeline. Large window size → high ILP
• No more WAR and WAW hazards because of rename registers; must only worry about RAW hazards

Branch Mispredict Recovery
• On a branch mispredict, must roll back the processor state: throw away IFQ contents, ROB/IQ contents after branch
• Committed map table is correct and need not be fixed
• The speculative map table needs to go back to an earlier state
• To facilitate this spec-map-table rollback, it is checkpointed at every branch
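
The checkpoint-and-rollback idea can be sketched as follows, using the instruction sequence from the Alpha 21264 figure. This is a minimal illustration, not the 21264's actual mechanism: the class `SpeculativeRenamer` and its methods are hypothetical names, the identity initial mapping (R_k to P_k) is an assumption, and the sketch ignores commit entirely (registers are returned to the free list only on a rollback).

```python
class SpeculativeRenamer:
    """Toy speculative map table with one checkpoint per branch (hypothetical API)."""

    def __init__(self, n_logical=32, n_physical=64):
        self.spec_map = {r: r for r in range(1, n_logical + 1)}   # assumed R_k -> P_k
        self.free = list(range(n_logical + 1, n_physical + 1))
        self.alloc_log = []      # physical registers handed out, oldest first
        self.checkpoints = []    # (branch id, saved map, length of alloc_log)

    def rename(self, dest, srcs):
        """Rename one instruction; returns (dest physical reg, source physical regs)."""
        src_phys = [self.spec_map[s] for s in srcs]   # sources use the current map
        dest_phys = None
        if dest is not None:
            dest_phys = self.free.pop(0)
            self.alloc_log.append(dest_phys)
            self.spec_map[dest] = dest_phys
        return dest_phys, src_phys

    def checkpoint(self, branch_id):
        """Save a copy of the speculative map when a branch is renamed."""
        self.checkpoints.append((branch_id, dict(self.spec_map), len(self.alloc_log)))

    def rollback(self, branch_id):
        """Restore the map saved at branch_id and discard younger checkpoints."""
        while self.checkpoints:
            bid, saved_map, log_len = self.checkpoints.pop()
            if bid == branch_id:
                self.spec_map = saved_map
                squashed = self.alloc_log[log_len:]   # registers of squashed instrs
                del self.alloc_log[log_len:]
                self.free = squashed + self.free
                return
        raise KeyError(branch_id)


if __name__ == "__main__":
    r = SpeculativeRenamer()
    r.rename(1, (1, 2))          # R1 <- R1+R2  becomes  P33 <- P1+P2
    r.rename(2, (1, 3))          # R2 <- R1+R3  becomes  P34 <- P33+P3
    r.checkpoint("BEQZ")         # BEQZ R2: checkpoint the speculative map
    r.rename(3, (1, 2))          # R3 <- R1+R2  becomes  P35 <- P33+P34
    r.rename(1, (3, 2))          # R1 <- R3+R2  becomes  P36 <- P35+P34
    r.rollback("BEQZ")           # mispredict: R1 maps to P33 again
    print(r.spec_map[1], r.spec_map[2], r.spec_map[3])
```

Copying the whole map at every branch is the simplest design to reason about; the cost is one full map copy per in-flight branch.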

Waking Up a Dependent
• In an in-order pipeline, an instruction leaves the decode stage when it is known that the inputs can be correctly received, not when the inputs are computed
• Similarly, an instruction leaves the issue queue before its inputs are known, i.e., wakeup is speculative based on the expected latency of the producer instruction

Out-of-Order Loads/Stores

Ld R1 ← [R2]
Ld R3 ← [R4]
St R5 → [R6]
Ld R7 ← [R8]
Ld R9 ← [R10]

What if the issue queue also had load/store instructions? Can we continue executing instructions out-of-order?

Memory Dependence Checking
[Figure: memory operations Ld 0xabcdef, Ld, St, Ld, Ld 0xabcdef, St 0xabcd00, Ld 0xabc000, Ld 0xabcd00; three entries are shown without an address]
• The issue queue checks for register dependences and executes instructions as soon as registers are ready
• Loads/stores access memory as well, so RAW, WAW, and WAR hazards must also be checked for memory
• Hence, first check for register dependences to compute effective addresses; then check for memory dependences

Memory Dependence Checking
[Figure: memory operations Ld 0xabcdef, Ld, St, Ld, Ld 0xabcdef, St 0xabcd00, Ld 0xabc000, Ld 0xabcd00; three entries are shown without an address]
• Load and store addresses are maintained in program order in the Load/Store Queue (LSQ)
• Loads can issue if they are guaranteed to not have true dependences with earlier stores
• Stores can issue only when we are ready to modify memory, which happens at commit (we cannot recover if an earlier instr raises an exception)

The Alpha 21264 Out-of-Order Implementation
[Figure: pipeline block diagram, extended with loads and stores]
• Blocks: Branch prediction and instr fetch, Instr Fetch Queue, Decode & Rename, Issue Queue (IQ), LSQ, ALU, ALU, ALU, D-Cache
• Fetched instructions: R1 ← R1+R2; R2 ← R1+R3; BEQZ R2; R3 ← R1+R2; R1 ← R3+R2; LD R4 ← 8[R3]; ST R4 → 8[R1]
• Renamed instructions: P33 ← P1+P2; P34 ← P33+P3; BEQZ P34; P35 ← P33+P34; P36 ← P35+P34; P37 ← 8[P35]; P37 → 8[P36]
• LSQ entries: P37 ← [P35 + 8]; P37 → [P36 + 8]
• Committed Reg Map: R1 → P1, R2 → P2
• Speculative Reg Map: R1 → P36, R2 → P34
• Reorder Buffer (ROB): Instr 1 to Instr 7
• Register File: P1-P64
• Results written to regfile and tags broadcast to IQ

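Problem 1
• Consider the following LSQ and when operands are available. Estimate when the address calculation and memory accesses happen for each ld/st. Assume no memory dependence prediction.

Instr             AdOp   StOp   AdVal   AdCal   MemAcc
LD R1 ← [R2]      3      -      abcd
LD R3 ← [R4]      6      -      adde
ST R5 → [R6]      4      7      abba
LD R7 ← [R8]      2      -      abce
ST R9 → [R10]     8      3      abba
LD R11 ← [R12]    1      -      abba

AdOp: cycle the address operand is available; StOp: cycle the store data operand is available; AdVal: address value; AdCal: cycle of the address calculation; MemAcc: cycle of the memory access.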

Problem 1 (solution)
• Consider the following LSQ and when operands are available. Estimate when the address calculation and memory accesses happen for each ld/st. Assume no memory dependence prediction.

Instr             AdOp   StOp   AdVal   AdCal   MemAcc
LD R1 ← [R2]      3      -      abcd    4       5
LD R3 ← [R4]      6      -      adde    7       8
ST R5 → [R6]      4      7      abba    5       commit
LD R7 ← [R8]      2      -      abce    3       6
ST R9 → [R10]     8      3      abba    9       commit
LD R11 ← [R12]    1      -      abba    2       10

Problem 2
• Consider the following LSQ and when operands are available. Estimate when the address calculation and memory accesses happen for each ld/st. Assume no memory dependence prediction.

Instr             AdOp   StOp   AdVal   AdCal   MemAcc
LD R1 ← [R2]      3      -      abcd
LD R3 ← [R4]      6      -      adde
ST R5 → [R6]      5      7      abba
LD R7 ← [R8]      2      -      abce
ST R9 → [R10]     1      4      abba
LD R11 ← [R12]    2      -      abba

Problem 2 (solution)
• Consider the following LSQ and when operands are available. Estimate when the address calculation and memory accesses happen for each ld/st. Assume no memory dependence prediction.

Instr             AdOp   StOp   AdVal   AdCal   MemAcc
LD R1 ← [R2]      3      -      abcd    4       5
LD R3 ← [R4]      6      -      adde    7       8
ST R5 → [R6]      5      7      abba    6       commit
LD R7 ← [R8]      2      -      abce    3       7
ST R9 → [R10]     1      4      abba    2       commit
LD R11 ← [R12]    2      -      abba    3       5
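
The answers to Problems 1 and 2 follow from a few rules that can be written down as code. The rules below are inferred from the worked solutions rather than stated on the slides, so treat them as assumptions: the address is calculated one cycle after its operand is available; a load walks the earlier stores from youngest to oldest and needs each store's address until it finds a match; on a match it also needs that store's data; and the memory access happens one cycle after everything it needs is known. The function name `lsq_times` and the tuple layout are illustrative.

```python
def lsq_times(entries):
    """entries: program-ordered (kind, ad_op, st_op, addr) tuples; st_op is None for loads.

    Returns a list of (AdCal, MemAcc) pairs; MemAcc is "commit" for stores.
    """
    ad_cal = [ad_op + 1 for (_, ad_op, _, _) in entries]
    times = []
    for j, (kind, _, _, addr) in enumerate(entries):
        if kind == "ST":
            times.append((ad_cal[j], "commit"))      # stores write memory at commit
            continue
        ready = ad_cal[j]                            # the load's own address
        for k in range(j - 1, -1, -1):               # earlier entries, youngest first
            s_kind, _, s_st_op, s_addr = entries[k]
            if s_kind != "ST":
                continue
            ready = max(ready, ad_cal[k])            # store address must be known
            if s_addr == addr:
                ready = max(ready, s_st_op)          # forward data from this store
                break                                # older stores no longer matter
        times.append((ad_cal[j], ready + 1))
    return times


PROBLEM_1 = [
    ("LD", 3, None, "abcd"),
    ("LD", 6, None, "adde"),
    ("ST", 4, 7, "abba"),
    ("LD", 2, None, "abce"),
    ("ST", 8, 3, "abba"),
    ("LD", 1, None, "abba"),
]
PROBLEM_2 = [
    ("LD", 3, None, "abcd"),
    ("LD", 6, None, "adde"),
    ("ST", 5, 7, "abba"),
    ("LD", 2, None, "abce"),
    ("ST", 1, 4, "abba"),
    ("LD", 2, None, "abba"),
]

if __name__ == "__main__":
    for name, problem in (("Problem 1", PROBLEM_1), ("Problem 2", PROBLEM_2)):
        print(name, lsq_times(problem))
```

In Problem 2 the last load does not wait for the address of ST R5: the younger store ST R9 already matches, so its data (available in cycle 4) is forwarded in cycle 5.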

Problem 3
• Consider the following LSQ and when operands are available. Estimate when the address calculation and memory accesses happen for each ld/st. Assume memory dependence prediction.

Instr             AdOp   StOp   AdVal   AdCal   MemAcc
LD R1 ← [R2]      3      -      abcd
LD R3 ← [R4]      6      -      adde
ST R5 → [R6]      4      7      abba
LD R7 ← [R8]      2      -      abce
ST R9 → [R10]     8      3      abba
LD R11 ← [R12]    1      -      abba

Problem 3 (solution)
• Consider the following LSQ and when operands are available. Estimate when the address calculation and memory accesses happen for each ld/st. Assume memory dependence prediction.

Instr             AdOp   StOp   AdVal   AdCal   MemAcc
LD R1 ← [R2]      3      -      abcd    4       5
LD R3 ← [R4]      6      -      adde    7       8
ST R5 → [R6]      4      7      abba    5       commit
LD R7 ← [R8]      2      -      abce    3       4
ST R9 → [R10]     8      3      abba    9       commit
LD R11 ← [R12]    1      -      abba    2       3/10
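
A sketch of the same calculation with memory dependence prediction is shown below. It assumes a predictor that always predicts "no dependence", so each load accesses memory one cycle after its own address calculation; if a matching earlier store's address only becomes known later, the load is re-executed once that store's address and data are available. The "3/10" notation (speculative access / re-execution) is read this way from the solution above, and the handling of a store whose address is already known when the load is ready is an added assumption. The name `lsq_times_predicted` is illustrative.

```python
def lsq_times_predicted(entries):
    """entries: program-ordered (kind, ad_op, st_op, addr) tuples; st_op is None for loads.

    Returns (AdCal, MemAcc) pairs; MemAcc is a string such as "4", "3/10" or "commit".
    """
    ad_cal = [ad_op + 1 for (_, ad_op, _, _) in entries]
    times = []
    for j, (kind, _, _, addr) in enumerate(entries):
        if kind == "ST":
            times.append((ad_cal[j], "commit"))
            continue
        speculative = ad_cal[j] + 1                  # predicted independent: go at once
        mem_acc = str(speculative)
        for k in range(j - 1, -1, -1):               # youngest earlier store first
            s_kind, _, s_st_op, s_addr = entries[k]
            if s_kind == "ST" and s_addr == addr:
                resolved = max(ad_cal[k], s_st_op) + 1
                if ad_cal[k] < speculative:
                    # Assumed: the match is already visible, so the load just waits.
                    mem_acc = str(max(speculative, resolved))
                else:
                    # Misprediction: speculative access, then re-execution.
                    mem_acc = f"{speculative}/{resolved}"
                break
        times.append((ad_cal[j], mem_acc))
    return times


PROBLEM_3 = [
    ("LD", 3, None, "abcd"),
    ("LD", 6, None, "adde"),
    ("ST", 4, 7, "abba"),
    ("LD", 2, None, "abce"),
    ("ST", 8, 3, "abba"),
    ("LD", 1, None, "abba"),
]

if __name__ == "__main__":
    print(lsq_times_predicted(PROBLEM_3))
```

Compared with Problem 1, LD R7 no longer waits for the address of ST R5 (cycle 4 instead of 6), while LD R11 pays for the wrong prediction with a second access in cycle 10.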
