- Slides: 45

Sequential Execution Semantics
• Contract: how the machine appears to behave
A. Moshovos © ECE 1773, Fall ’07, ECE Toronto

Sequential Semantics: Review
• Instructions appear as if they executed:
  – in “program order”,
  – i.e., one after the other.
[Figure: timelines comparing program order, pipelining, superscalar, and out-of-order execution.]

Execution Order?
  addi r2, r1, 10
  addi r3, r2, 20
  add  r5, r4, 30
  add  r6, r5, 40

Execution Order? Dependencies
  addi r2, r1, 10
  addi r3, r2, 20
  add  r5, r4, 30
  add  r6, r5, 40
• The second instruction reads r2, written by the first; the fourth reads r5, written by the third. The two pairs are independent of each other.
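The dependences on this slide can be found mechanically by tracking the last writer of each register. The Python below is a minimal sketch, not a description of any real decoder: the tuple encoding and the helper names `raw_dependences` and `earliest_start` are hypothetical, and the earliest-start calculation assumes a 1-cycle latency and unlimited issue width so that only true dependences constrain the order.

```python
from typing import Dict, List, Tuple

# (assembly text, destination register, source registers); immediates are omitted.
Instr = Tuple[str, str, List[str]]

PROGRAM: List[Instr] = [
    ("addi r2, r1, 10", "r2", ["r1"]),
    ("addi r3, r2, 20", "r3", ["r2"]),
    ("add r5, r4, 30", "r5", ["r4"]),
    ("add r6, r5, 40", "r6", ["r5"]),
]

def raw_dependences(prog: List[Instr]) -> List[Tuple[int, int]]:
    """Return (producer, consumer) index pairs for read-after-write dependences."""
    last_writer: Dict[str, int] = {}
    deps = []
    for i, (_, dest, srcs) in enumerate(prog):
        for src in srcs:
            if src in last_writer:
                deps.append((last_writer[src], i))
        last_writer[dest] = i
    return deps

def earliest_start(prog: List[Instr]) -> List[int]:
    """Earliest start cycle per instruction, assuming 1-cycle latency and
    unlimited issue width (hypothetical machine)."""
    start = [0] * len(prog)
    # Pairs come out ordered by consumer, and producers always precede consumers,
    # so start[producer] is final by the time it is used.
    for producer, consumer in raw_dependences(prog):
        start[consumer] = max(start[consumer], start[producer] + 1)
    return start

print(raw_dependences(PROGRAM))  # [(0, 1), (2, 3)]
print(earliest_start(PROGRAM))   # [0, 1, 0, 1]
```

Under these assumptions two cycles suffice: the independent pair can overlap with the first pair, which is the opportunity that pipelining, superscalar, and out-of-order execution exploit to different degrees.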

Execution Order? Pipelined and Superscalar
[Figure: the four instructions (addi r2, r1, 10; addi r3, r2, 20; add r5, r4, 30; add r6, r5, 40) overlapped in a pipeline (Pipelining) and issued several per cycle (Superscalar).]

Execution Order? Out-of-Order
[Figure: the four instructions on a superscalar timeline, in program order, and on an out-of-order timeline.]
• Out of order, add r5, r4, 30 can execute before addi r3, r2, 20, since it does not depend on it.

Out-of-Order Execution
  do {
      sum += a[++m];
      i--;
  } while (i != 0);

  loop: add r4, r4, 1
        ld  r2, 10(r4)
        add r3, r3, r2
        sub r1, r1, 1
        bne r1, r0, loop
[Figure: fetch/decode/execute timelines for the loop body on a superscalar (in-order) machine and on an out-of-order machine.]

Sequential Semantics?
• Execution does NOT adhere to sequential semantics: while instructions are in flight, the machine state is inconsistent.
  – To be precise: eventually it may become consistent again (e.g., once bne is done).
• Simplest solution: define the problem away.
  – The IBM 360 did this.
  – Not acceptable today: e.g., virtual memory requires that an instruction that page-faults be restartable.
• Three-phase instruction execution:
  – In-Progress, Completed, and Committed.

Sequential Semantics?
[Figure: I1 to I5 (add, ld, add, sub, bne) are fetched and decoded; the machine state is inconsistent while they are in flight and consistent after bne. Instructions enter an initially empty reorder buffer in program order and retire from it in program order.]

Sequential Semantics?
[Figure: the reorder buffer holds I1 to I5 in program order, all in state F.]

Sequential Semantics?
[Figure: each reorder-buffer entry records (destination register, old value); I1 to I5 are all in state D.]

Sequential Semantics?
[Figure: I1 and I4 have completed (c), out of order; I2, I3, and I5 are still in state D.]

Sequential Semantics?
[Figure: I1 has committed and left the buffer; I2, I4, and I5 are completed (c); I3 is still in state D.]

Sequential Semantics?
[Figure: I2 and I3 have committed; only I4 (c) and I5 (c) remain.]

Sequential Semantics?
[Figure: all instructions have retired; the reorder buffer is empty and the state is consistent.]

Sequential Semantics? WHAT IF SOMEONE STOPS US HERE?
[Figure: the state after I1 commits: I2 (c), I3 (D), I4 (c), I5 (c) are in the reorder buffer.]

Sequential Semantics?
[Figure: on the interruption, the uncommitted entries I2 to I5 are discarded and their changes undone, returning the machine to the consistent state after I1; the buffer is emptied and new instructions can enter again.]

Preserving Sequential Semantics: Three Phases
• Instructions execute in three phases:
  – In-progress, Completed, Committed
  – Out of order for In-progress and Completed
  – In-order commits
• Completed (out of order): “visible only inside”
  – Results visible to subsequent instructions
  – Results not visible to outsiders
  – On interrupts, completed results are discarded
• Committed (in order): “visible to all”
  – Results visible to subsequent instructions
  – Results visible to outsiders
  – On interrupts, committed results are preserved
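The three phases can be modeled as a reorder buffer. The Python below is a minimal sketch, not a hardware design: the class and method names are hypothetical, values are plain integers, and `interrupt` squashes every uncommitted entry. The usage replays the reorder-buffer slides: I1 and I4 complete, I1 commits, I2 and I5 complete, and then an interrupt arrives. The register values are made up.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Optional

@dataclass
class RobEntry:
    name: str
    dest: Optional[str]           # None for instructions without a register target (e.g., bne)
    value: Optional[int] = None
    completed: bool = False

class ReorderBuffer:
    def __init__(self, arch_regs: Dict[str, int]) -> None:
        self.entries: Deque[RobEntry] = deque()   # program order: head = oldest
        self.arch_regs = dict(arch_regs)           # committed state, "visible to all"

    def enter(self, name: str, dest: Optional[str]) -> RobEntry:
        entry = RobEntry(name, dest)
        self.entries.append(entry)
        return entry

    def complete(self, entry: RobEntry, value: Optional[int] = None) -> None:
        # Completed results stay inside the buffer ("visible only inside").
        entry.value = value
        entry.completed = True

    def commit(self) -> List[str]:
        # Commit in program order: stop at the first entry that has not completed.
        committed = []
        while self.entries and self.entries[0].completed:
            entry = self.entries.popleft()
            if entry.dest is not None:
                self.arch_regs[entry.dest] = entry.value
            committed.append(entry.name)
        return committed

    def interrupt(self) -> List[str]:
        # Discard all uncommitted results; committed state is preserved.
        squashed = [entry.name for entry in self.entries]
        self.entries.clear()
        return squashed

rob = ReorderBuffer({"r1": 3, "r2": 0, "r3": 0, "r4": 0})
i1 = rob.enter("I1", "r4")   # add r4, r4, 1
i2 = rob.enter("I2", "r2")   # ld  r2, 10(r4)
i3 = rob.enter("I3", "r3")   # add r3, r3, r2
i4 = rob.enter("I4", "r1")   # sub r1, r1, 1
i5 = rob.enter("I5", None)   # bne r1, r0, loop
rob.complete(i1, 1)
rob.complete(i4, 2)
first_commit = rob.commit()  # I4 has completed but must wait behind I2
rob.complete(i2, 7)
rob.complete(i5)
squashed = rob.interrupt()
print(first_commit, squashed, rob.arch_regs)
```

Only I1's result reaches `arch_regs`. I4's value of r1 was visible inside the buffer but is thrown away by the interrupt, which is exactly the "completed vs. committed" distinction above.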

Commit vs. Complete
  DIV R3, _, _
  ADD R1, _, _
  ADD _, R1, _
• With out-of-order completes and in-order commits, the two ADDs can complete (and the second ADD can use R1) before the long-latency DIV does, but neither ADD commits until the DIV commits.
[Figure: timelines contrasting out-of-order completes / in-order commits with in-order completes / in-order commits, for the add, ld, add, sub, bne loop.]

Implementing Completes/Commits
• Key idea:
  – Keep enough state around to be able to roll back when necessary.
  – Roll-back: discard (aka squash) all instructions that have not committed.
• One (conceptual) solution: History File
  – Upon Complete: the instruction records the previous value of its target register.
  – Upon Discard: the instruction restores that previous value.
  – Upon Commit: nothing to do.
• Our focus: scheduling mechanisms
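A minimal Python sketch of the conceptual history file. It makes several assumptions that are not on the slide: there is a single register file that is updated on complete, old values are logged under a hypothetical program-order sequence number, and writers of the same register complete in program order (otherwise the logged "previous value" would be wrong). Discard restores the logged values youngest first.

```python
from typing import Dict, Tuple

class HistoryFile:
    def __init__(self, regs: Dict[str, int]) -> None:
        self.regs = dict(regs)                       # the only register file; updated on complete
        self.log: Dict[int, Tuple[str, int]] = {}    # seq -> (target register, previous value)

    def complete(self, seq: int, reg: str, value: int) -> None:
        # Record the previous value of the target, then overwrite it.
        self.log[seq] = (reg, self.regs[reg])
        self.regs[reg] = value

    def commit(self, seq: int) -> None:
        # Nothing to restore any more: just free the log entry.
        self.log.pop(seq, None)

    def discard(self) -> None:
        # Squash every uncommitted instruction, restoring old values youngest first
        # so that a register written twice ends up with its oldest logged value.
        for seq in sorted(self.log, reverse=True):
            reg, old = self.log[seq]
            self.regs[reg] = old
        self.log.clear()

hf = HistoryFile({"r1": 3, "r4": 0})
hf.complete(1, "r4", 1)   # I1: add r4, r4, 1
hf.complete(4, "r1", 2)   # I4: sub r1, r1, 1
hf.complete(6, "r1", 1)   # a later, hypothetical write to r1
hf.commit(1)
hf.discard()
print(hf.regs)            # {'r1': 3, 'r4': 1}
```

Commit is cheap and discard does the work. This is the opposite trade-off from the future file described next.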

Solution #2: Future File
• Two register states:
  – Architectural: updated only on commit
  – Speculative: updated on complete
• Normally use Speculative
• On exception:
  – Flush Speculative
  – Use Architectural
• “Implementing precise interrupts in pipelined processors”, J. E. Smith and A. Pleszkun
  – http://www.eecg.toronto.edu/~moshovos/ACA07/readings/smith-interrupts.pdf
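A minimal Python sketch of the two register states. The class and method names are hypothetical. It assumes that the value to commit is supplied by a reorder-buffer entry, which is not modeled here, and that writers of the same register complete in program order. An exception simply copies the architectural state over the speculative one.

```python
from typing import Dict

class FutureFile:
    def __init__(self, regs: Dict[str, int]) -> None:
        self.architectural = dict(regs)   # updated only on commit
        self.speculative = dict(regs)     # updated on complete; used for normal reads

    def read(self, reg: str) -> int:
        return self.speculative[reg]

    def complete(self, reg: str, value: int) -> None:
        self.speculative[reg] = value

    def commit(self, reg: str, value: int) -> None:
        # The committed value is assumed to come from a reorder-buffer entry.
        self.architectural[reg] = value

    def exception(self) -> None:
        # Flush the speculative state and fall back to the architectural state.
        self.speculative = dict(self.architectural)

ff = FutureFile({"r1": 3, "r4": 0})
ff.complete("r4", 1)     # I1 completes
ff.complete("r1", 2)     # I4 completes out of order
ff.commit("r4", 1)       # I1 commits; I4 has not
ff.exception()
print(ff.read("r1"), ff.read("r4"))   # 3 1
```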

Out-of-Order Execution Overview
[Figure: program form vs. processing phase. Static program → dynamic instruction stream (trace) → dispatch / dependences → execution window → instruction issue → instruction execution → completed instructions → instruction reorder & commit → committed. Phases labeled In-Progress, Completed, and Committed.]

Out-Of-Order Exec. Example
  loop: add r4, r4, 1
        ld  r2, 10(r4)      ; 4-cycle latency
        add r3, r3, r2
        sub r1, r1, 1
        bne r1, r0, loop
• Reservation station entry fields: op, src1, src2, tgt, status
  – Status: Rdy (ready), Wait, Exec (executing), Compl (completed), Cmtd (committed)

Out-Of-Order Exec. Example
• Register Availability Vector (RAV), aka scoreboard: one bit per register, r1 to r4; 1 means the value is available. Initially all registers are available.
• In each reservation-station source field, reg/1 means the operand is available and reg/0 means it is still being produced; NA marks an operand that is not a register.

Out-Of-Order Exec. Example: Cycle 0
• RAV: r1 to r4 all 1 (available); the reservation station is still empty.

Out-Of-Order Exec. Example: Cycle 0
RAV: r1=1 r2=1 r3=1 r4=0
  op   src1  src2  tgt   status
  add  r4/1  NA/1  r4/0  Rdy
• add is ready to be executed.

Cycle 1
RAV: r1=1 r2=0 r3=1 r4=1
  op   src1  src2  tgt  status
  add  r4/1  NA/1  r4   Exec
  ld   r4/1  NA/1  r2   Rdy
• add executes: r4 gets produced now, so the entries waiting for r4 are notified and ld is ready.

Cycle 2
RAV: r1=1 r2=0 r3=0 r4=1
  op   src1  src2  tgt  status
  add  r4/1  NA/1  r4   Cmtd
  ld   r4/1  NA/1  r2   Exec
  add  r3/1  r2/0  r3   Wait
• ld executes; its result is available at cycle 6.
• The second add waits for r2.

Cycle 3
RAV: r1=0 r2=0 r3=0 r4=1
  op   src1  src2  tgt  status
  add  r4/1  NA/1  r4   Cmtd
  ld   r4/1  NA/1  r2   Exec
  add  r3/1  r2/0  r3   Wait
  sub  r1/1  NA/1  r1   Rdy
• The second add still waits for r2 (ld result at cycle 6).
• sub has no dependences and is ready.

Cycle 4
RAV: r1=1 r2=0 r3=0 r4=1
  op   src1  src2  tgt  status
  add  r4/1  NA/1  r4   Cmtd
  ld   r4/1  NA/1  r2   Exec
  add  r3/1  r2/0  r3   Wait
  sub  r1/1  NA/1  r1   Exec
  bne  r1/1  r0/1  NA   Rdy
• The second add waits for r2 (ld result at cycle 6).
• sub executes: r1 is produced now, so its consumers are notified; r1 will be available next cycle.

Cycle 5
RAV: r1=1 r2=0 r3=0 r4=1
  op   src1  src2  tgt  status
  add  r4/1  NA/1  r4   Cmtd
  ld   r4/1  NA/1  r2   Exec
  add  r3/1  r2/0  r3   Wait
  sub  r1/1  NA/1  r1   Compl
  bne  r1/1  r0/1  NA   Exec
• The second add waits for r2 (ld result at cycle 6).
• sub has completed executing; bne executes.

Cycle 6
RAV: r1=1 r2=1 r3=0 r4=1
  op   src1  src2  tgt  status
  add  r4/1  NA/1  r4   Cmtd
  ld   r4/1  NA/1  r2   Exec
  add  r3/1  r2/1  r3   Rdy
  sub  r1/1  NA/1  r1   Compl
  bne  r1/1  r0/1  NA   Exec
• ld's result is available: the consumers of r2 are notified and the second add becomes ready.
• sub has completed executing.

Cycle 7
  op   src1  src2  tgt  status
  add  r4/1  NA/1  r4   Cmtd
  ld   r4/1  NA/1  r2   Cmtd
  add  r3/1  r2/1  r3   Exec
  sub  r1/1  NA/1  r1   Compl
  bne  r1/1  r0/1  NA   Compl
• The second add executes; the consumers of r3 are notified.
• sub and bne have completed but cannot commit until the older add does (in-order commit).

Cycle 8
  op   src1  src2  tgt  status
  add  r4/1  NA/1  r4   Cmtd
  ld   r4/1  NA/1  r2   Cmtd
  add  r3/1  r2/1  r3   Cmtd
  sub  r1/1  NA/1  r1   Cmtd
  bne  r1/1  r0/1  NA   Cmtd
• All five instructions have committed, in program order.
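The cycle-by-cycle walk-through above can be approximated with a small Python model of RAV-based wake-up. The model rests on several assumptions that are not on the slides:
• one instruction is dispatched per cycle, in program order;
• there are unlimited functional units;
• everything except ld (4 cycles, as on the slides) has a 1-cycle latency;
• an entry issues no earlier than the cycle after it is dispatched;
• dispatch stalls on a WAW hazard.

Commit is not modeled, and the names are hypothetical. With these timing conventions the model matches the slides for add, ld, sub, and bne, but the second add issues at cycle 6 rather than 7.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Assumed latencies: ld takes 4 cycles (as on the slides); 1 cycle is assumed otherwise.
LATENCY = {"add": 1, "sub": 1, "bne": 1, "ld": 4}

@dataclass
class Entry:
    op: str
    tgt: Optional[str]
    waiting: List[str]            # sources whose producer has not broadcast yet
    issue: Optional[int] = None   # cycle execution starts
    done: Optional[int] = None    # cycle the result is broadcast

def simulate(program: List[Tuple[str, Optional[str], List[str]]],
             max_cycles: int = 50) -> List[Entry]:
    rav: Dict[str, bool] = {}     # Register Availability Vector; a missing key means available
    rs: List[Entry] = []          # reservation station, in program order
    for cycle in range(max_cycles):
        # 1. Broadcast: finishing producers set their RAV bit and wake their consumers.
        for e in rs:
            if e.done == cycle and e.tgt is not None:
                rav[e.tgt] = True
                for consumer in rs:
                    consumer.waiting = [s for s in consumer.waiting if s != e.tgt]
        # 2. Issue: every entry with no waiting sources starts executing.
        for e in rs:
            if e.issue is None and not e.waiting:
                e.issue = cycle
                e.done = cycle + LATENCY[e.op]
        # 3. Dispatch the next instruction, stalling while its target is pending (WAW).
        if len(rs) < len(program):
            op, tgt, srcs = program[len(rs)]
            if tgt is None or rav.get(tgt, True):
                waiting = [s for s in srcs if not rav.get(s, True)]
                if tgt is not None:
                    rav[tgt] = False      # target is now being produced
                rs.append(Entry(op, tgt, waiting))
    return rs

LOOP_BODY = [
    ("add", "r4", ["r4"]),         # add r4, r4, 1
    ("ld", "r2", ["r4"]),          # ld  r2, 10(r4)
    ("add", "r3", ["r3", "r2"]),   # add r3, r3, r2
    ("sub", "r1", ["r1"]),         # sub r1, r1, 1
    ("bne", None, ["r1", "r0"]),   # bne r1, r0, loop
]
trace = simulate(LOOP_BODY)
for e in trace:
    print(f"{e.op:4} issue @ {e.issue}, result @ {e.done}")
```

As on the slides, sub and bne issue while the second add is still waiting for the load. Sources are read at dispatch in this model. Together with the WAW stall, this is what keeps a broadcast from waking the wrong consumer.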

Window vs. Scheduler
• Window
  – Distance between the oldest and youngest instructions that can co-exist inside the CPU
  – Larger window: potential for more ILP
  – Instructions enter at Fetch and exit at Commit
• Scheduler
  – Number of instructions that are waiting to be issued
  – Instructions enter at Decode and leave at writeback/complete
• Window >= Scheduler
  – Can be the same structure
  – Completed instructions are in the window but not in the scheduler

Speculative Execution
• Why?
  – 1 in every 5 instructions is a branch (control flow).
  – Today’s processors have a window of 128 instructions.
  – We can’t wait until a branch is resolved to fetch the next set of instructions.
  – Solution: speculate.
[Figure: fetch a branch, keep fetching past it, and resolve the branch later; if the speculation was wrong, squash.]
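A back-of-the-envelope check of the numbers on this slide, in Python. The window size and branch frequency come from the slide. The 95% per-branch prediction accuracy, and the assumption that predictions are independent, are hypothetical and are there only to show why a squash mechanism is essential.

```python
BRANCH_FRACTION = 1 / 5   # from the slide: 1 in every 5 instructions is a branch
WINDOW = 128              # from the slide: 128-instruction window

# Without speculation, fetch stalls at every unresolved branch,
# so only about one basic block (~5 instructions) is available at a time.
insts_between_branches = 1 / BRANCH_FRACTION

# A full window therefore spans about 25 unresolved branches.
branches_in_window = WINDOW * BRANCH_FRACTION

# Hypothetical: 95% accuracy per branch, predictions independent.
ACCURACY = 0.95
p_window_all_correct = ACCURACY ** int(branches_in_window)

print(insts_between_branches, branches_in_window, round(p_window_all_correct, 3))
```

With these made-up accuracy numbers, only about 28% of full windows would lie entirely on the correct path. Squashing wrong-path work is therefore a routine event, not a corner case.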

Beyond Simple OoO
  A: LF   F6, 34(R2)
  B: LF   F2, 45(R3)
  C: MULF F0, F2, F4
  D: SUBF F8, F2, F6
  E: ADDF F2, F7, F4
[Figure: dependence graph over A, B, C, D, and E.]
• E will wait for B, C, and D, even though it needs no value from any of them:
  – WAR with C and D
  – WAW with B
• Can we do better?

What if we had infinite registers?
  A: LF   F6, 34(R2)
  B: LF   F2, 45(R3)
  C: MULF F0, F2, F4
  D: SUBF F8, F2, F6
  E: ADDF F9, F7, F4
• No false dependences anymore: E now writes F9 instead of F2.
• Since we never reuse a name, we can’t have WAW or WAR hazards.
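A minimal Python sketch that classifies the hazards among A to E before and after E's target is renamed from F2 to F9. It compares every pair of instructions and ignores intervening writers, which is a simplification. The tuple encoding and the function names are hypothetical.

```python
from typing import List, Tuple

Instr = Tuple[str, str, List[str]]   # (label, destination, sources)

ORIGINAL: List[Instr] = [
    ("A", "F6", ["R2"]),          # LF   F6, 34(R2)
    ("B", "F2", ["R3"]),          # LF   F2, 45(R3)
    ("C", "F0", ["F2", "F4"]),    # MULF F0, F2, F4
    ("D", "F8", ["F2", "F6"]),    # SUBF F8, F2, F6
    ("E", "F2", ["F7", "F4"]),    # ADDF F2, F7, F4
]
# Same code with E's target given a fresh name (F9).
RENAMED: List[Instr] = ORIGINAL[:4] + [("E", "F9", ["F7", "F4"])]

def hazards(prog: List[Instr]) -> List[Tuple[str, str, str]]:
    """Return (kind, earlier, later) for every pair of instructions sharing a register."""
    found = []
    for i, (a, dest_a, srcs_a) in enumerate(prog):
        for b, dest_b, srcs_b in prog[i + 1:]:
            if dest_a in srcs_b:
                found.append(("RAW", a, b))   # true dependence
            if dest_b in srcs_a:
                found.append(("WAR", a, b))   # false (name) dependence
            if dest_a == dest_b:
                found.append(("WAW", a, b))   # false (name) dependence
    return found

def blockers(prog: List[Instr], label: str) -> List[str]:
    """Instructions the given one must wait for, whatever the hazard kind."""
    return sorted({a for _, a, b in hazards(prog) if b == label})

print(blockers(ORIGINAL, "E"))   # ['B', 'C', 'D']
print(blockers(RENAMED, "E"))    # []
print([h for h in hazards(RENAMED) if h[0] != "RAW"])   # []
```

After renaming, only true (RAW) dependences remain, so E is free to execute as soon as it is fetched.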

Register Renaming
• Register versions
  – Every write creates a new version.
  – Uses read the latest version.
  – A version must be kept until all of its uses have read it.
• Register renaming:
  – Architectural vs. physical registers: more physical than architectural.
  – Maintain a map from architectural to physical registers.
  – Use in-order decoding to properly identify dependences.
  – Instructions wait only for input operand availability.
  – Only the last version is written to the register file.

Register renaming example
Original code:
  A add r1, r2, 100
  B add r3, r1
  C sub r1, r2
Renamed code:
  A add p4, p2, 100
  B add p5, p4
  C sub p6, p2
[Figure: the RAT has one entry per architectural register (# arch. regs entries, indexed by lg(# arch. regs) bits), each holding a physical register; initially r1 → p1, r2 → p2, r3 → p3.]

Register renaming example
• First, rename the input operands: A’s source r2 is looked up in the RAT and becomes p2.

Register renaming example
• Then rename the destination: A’s target r1 takes p4 from the free list, and the RAT entry for r1 becomes p4. The old mapping (r1 → p1) is saved in the ROB for recovery purposes.

Register renaming example
• Rename inputs: B’s source r1 now maps to p4 in the RAT, so B reads p4 (A’s result).

Register renaming example
• Rename destination: B’s target r3 takes p5 from the free list, and the RAT entry for r3 becomes p5. The old mapping (r3 → p3) is saved in the ROB for recovery purposes.
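A minimal Python sketch of renaming with a RAT, a free list, and ROB recovery records, replaying instructions A, B, and C of the example. It assumes three architectural registers initially mapped r1 → p1, r2 → p2, r3 → p3, and free physical registers p4 to p6 handed out in order. Immediates pass through unchanged. The class and method names are hypothetical. Freeing the previous mapping at commit is not modeled; `squash` walks the ROB records youngest first to restore the RAT.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

class Renamer:
    def __init__(self, num_arch: int, num_phys: int) -> None:
        # Hypothetical initial state: r_i -> p_i; the remaining physical registers are free.
        self.rat: Dict[str, str] = {f"r{i}": f"p{i}" for i in range(1, num_arch + 1)}
        self.free: Deque[str] = deque(f"p{i}" for i in range(num_arch + 1, num_phys + 1))
        self.rob: List[Tuple[str, str]] = []   # (architectural dest, previous mapping)

    def rename(self, op: str, dest: str, srcs: List[str]) -> Tuple[str, str, List[str]]:
        # 1. Rename inputs first, so an instruction that reads and writes the same
        #    register sees the older version. Immediates are not in the RAT.
        new_srcs = [self.rat.get(s, s) for s in srcs]
        # 2. Then rename the destination with a register from the free list,
        #    recording the old mapping for recovery.
        new_dest = self.free.popleft()
        self.rob.append((dest, self.rat[dest]))
        self.rat[dest] = new_dest
        return op, new_dest, new_srcs

    def squash(self) -> None:
        # Undo renames youngest first: restore old mappings and free the new registers.
        while self.rob:
            dest, previous = self.rob.pop()
            self.free.appendleft(self.rat[dest])
            self.rat[dest] = previous

renamer = Renamer(num_arch=3, num_phys=6)
code = [("add", "r1", ["r2", "100"]),   # A add r1, r2, 100
        ("add", "r3", ["r1"]),          # B add r3, r1
        ("sub", "r1", ["r2"])]          # C sub r1, r2
renamed = [renamer.rename(*inst) for inst in code]
print(renamed)
# [('add', 'p4', ['p2', '100']), ('add', 'p5', ['p4']), ('sub', 'p6', ['p2'])]
print(renamer.rob)   # [('r1', 'p1'), ('r3', 'p3'), ('r1', 'p4')]
```

C's write to r1 gets its own register (p6), so it cannot disturb B's read of p4. Calling `renamer.squash()` afterwards restores the initial RAT and returns p4 to p6 to the free list.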

Bird’s Eye View of a Modern CPU
[Figure: block diagram with I Cache, Fetch, Decode & Rename, Dispatch, Execution Units, D Cache, and Reorder Buffer, plus in-order retirement and misspeculation handling.]