CS 5100 Advanced Computer Architecture
Hardware-Based Speculation
Prof. Chung-Ta King, Department of Computer Science, National Tsing Hua University, Taiwan
(Slides are from the textbook, Prof. Hsien-Hsin Lee, Prof. Yasun Hsu)
About This Lecture
• Goal:
- To understand the issues left unsolved by the Tomasulo algorithm
- To understand the concepts and techniques of hardware-based speculation
• Outline:
- Hardware-based speculation (Sec. 3.6)
OOO Commit in Tomasulo Algorithm
• In-order issue, but out-of-order (OOO) execution, completion, and commitment
What’s Wrong with OOO Commitment?
• OOO commit across a branch amounts to speculative execution
- What if the BNEZ after the 1st SD is evaluated as not-taken at cycle 17? The 2nd MULTD has already updated program state (F4) at cycle 16!
- Problem: program state is updated speculatively, before the branch that the instruction depends on is known to be taken
What’s Wrong with OOO Commitment?
• OOO commit under interrupt and resume
- Suppose the 1st SD causes a page fault at cycle 17
- An interrupt is raised and hardware saves its PC
- On return from the interrupt, execution restarts from SD (the PC points to it) and continues with all following instructions (LD, MULTD, SD)
- LD and MULTD are executed again, and MULTD uses the wrong value in F4: an imprecise interrupt
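The effect of re-executing after an out-of-order commit can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slide's exact code is not reproduced here, so the example assumes a hypothetical `MULTD F4, F4, F2` that both reads and writes F4, models memory as a single cell, and uses made-up register values.

```python
def sd(state):
    """SD F4, 0(R1): store F4 to memory (memory modelled as one cell)."""
    state["mem"] = state["F4"]

def multd(state):
    """Hypothetical MULTD F4, F4, F2: reads and overwrites F4."""
    state["F4"] = state["F4"] * state["F2"]

def resume_from_sd(state):
    """After the fault is serviced, re-execute SD and everything after it."""
    sd(state)
    multd(state)
    return state

# In-order commit: nothing younger than the faulting SD has changed F4.
precise = resume_from_sd({"F2": 2.0, "F4": 3.0, "mem": None})

# OOO commit: the younger MULTD already updated F4 before SD faulted.
early = {"F2": 2.0, "F4": 3.0, "mem": None}
multd(early)
imprecise = resume_from_sd(early)

print(precise)    # {'F2': 2.0, 'F4': 6.0, 'mem': 3.0}
print(imprecise)  # {'F2': 2.0, 'F4': 12.0, 'mem': 6.0}
```

In the second run both the stored value and the final F4 differ from sequential execution, which is the symptom of an imprecise interrupt.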
Desired Features
• Want speculative OOO execution beyond branches, but without any consequences
- Speculation: execute an instruction before knowing whether it should be executed, e.g., beyond a branch
- Without consequences: speculative instructions (if wrongly speculated) must not alter the program state
• Want OOO execution with precise interrupts
- All instructions before the interrupted instruction must be completed and committed
- The program state should appear as if no instruction after the interrupted instruction has been issued
- Maintain “program state” as in sequential execution: restart from the saved PC with the saved state
Hardware-Based Speculation
• Basic idea:
- Execute instructions along predicted execution paths, but commit the results only if the prediction is correct
- Allow OOO execution but commit in order
• Combine three ideas:
- Dynamic OOO instruction scheduling (Tomasulo algorithm)
- Dynamic branch prediction
- Speculative execution: execute instructions before all control dependences are resolved
• Extra hardware requirement:
- Temporary storage to buffer speculative results until commit: the reorder buffer (ROB)
Tomasulo Algorithm with Speculation
• Key idea: add a commit phase
- Issue
- Execute
- Write result: write results to the CDB and store them in a hardware buffer (the reorder buffer)
- Commit: update the register file or memory once the instruction is no longer speculative
Reorder Buffer (ROB)
• HW buffer for results of uncommitted instructions
- At least 3 fields: instruction, destination, value
- Can be an operand source, i.e., ROB entries act as virtual registers
- Use the reorder buffer # instead of the reservation station # as the tag ID on the CDB
- Supplies operands between execution complete and commit
- After an instruction commits, its result is put into the register
- Easy to undo speculated instructions on mispredicted branches or exceptions
[Figure: FP Op Queue, Reorder Buffer, FP Regs, Res Stations, FP Adder.]
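How the ROB acts as an operand source can be sketched as follows. This is a minimal Python illustration, not the hardware design: the `rename` map (register name to ROB number of its latest uncommitted producer), the dictionary layout, and the numeric values are all assumptions made for the example.

```python
def read_operand(reg, regs, rename, rob):
    """Return ("value", v) if the operand is available,
    else ("tag", rob_no) to watch for on the CDB."""
    rob_no = rename.get(reg)
    if rob_no is None:
        return ("value", regs[reg])        # committed value in the register file
    entry = rob[rob_no]
    if entry["done"]:
        return ("value", entry["value"])   # finished but not yet committed
    return ("tag", rob_no)                 # still executing: wait on this ROB #

regs = {"F4": 4.0, "F6": 6.0}              # committed values (made-up numbers)
rob = {
    1: {"dest": "F0", "value": 50.0, "done": True},    # LD F0, 10(R2)
    2: {"dest": "F10", "value": None, "done": False},  # ADDD F10, F4, F0
}
rename = {"F0": 1, "F10": 2}

addd_src = [read_operand(r, regs, rename, rob) for r in ("F4", "F0")]
divd_src = [read_operand(r, regs, rename, rob) for r in ("F10", "F6")]
print(addd_src)  # [('value', 4.0), ('value', 50.0)]
print(divd_src)  # [('tag', 2), ('value', 6.0)]
```

The ADDD gets F0 straight from ROB entry 1 even though the load has not committed, while the DIVD receives tag 2 and must wait for that ROB number to appear on the CDB.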
Speculative Tomasulo Algorithm
• Issue (IS): get an instruction from the FP op queue
- If an RS and a ROB slot are free, issue the instruction and send the operands and the ROB no. for the destination
- Operands may come from the register file or from the ROB (if not committed yet)
• Execution (EX): operate on operands
- When either operand is not ready, watch the CDB for the result (this checks RAW hazards)
- When both operands are in the RS, execute
Speculative Tomasulo Algorithm
• Write result (WB): execution complete
- Write on the CDB to all awaiting RSs and the ROB
• Commit: update registers from the reorder buffer
- When an instruction is at the head of the ROB and its result is present, update the register with the result (or store to memory)
- Remove the instruction from the ROB
- A mispredicted branch flushes the ROB
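The write-result and commit steps above can be sketched as a small Python model. It is a simplified illustration, assuming a queue-based ROB with no reservation stations, no CDB broadcast, and no memory; the class and method names, the three independent instructions, and the values are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional

@dataclass
class RobEntry:
    instr: str                     # instruction text, for tracing only
    dest: Optional[str]            # destination register (None for branch/store)
    value: Optional[float] = None
    done: bool = False             # set by the write-result step

class ReorderBuffer:
    def __init__(self, size):
        self.size = size
        self.entries = deque()     # left end is the head (oldest)

    def issue(self, instr, dest):
        """Allocate an entry at the tail; return None (stall) when full."""
        if len(self.entries) == self.size:
            return None
        entry = RobEntry(instr, dest)
        self.entries.append(entry)
        return entry

    def write_result(self, entry, value):
        """Buffer the result in the ROB; the register file is untouched."""
        entry.value = value
        entry.done = True

    def commit(self, regs):
        """Retire finished entries from the head only, in program order."""
        committed = []
        while self.entries and self.entries[0].done:
            e = self.entries.popleft()
            if e.dest is not None:
                regs[e.dest] = e.value
            committed.append(e.instr)
        return committed

regs = {"F0": 0.0, "F2": 0.0, "F8": 0.0}
rob = ReorderBuffer(size=4)
ld = rob.issue("LD F0, 0(R1)", "F0")
add = rob.issue("ADDD F2, F4, F6", "F2")
mul = rob.issue("MULTD F8, F4, F6", "F8")

rob.write_result(add, 10.0)        # ADDD completes first, out of order
first = rob.commit(regs)           # head (LD) not done, so nothing retires
rob.write_result(ld, 1.5)
second = rob.commit(regs)          # LD and ADDD retire in program order
print(first, second, regs)
```

Although the ADDD finishes before the load, F2 is only updated once the load at the head has committed, so the register file always reflects a prefix of the program.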
Commit Step
• New step for making instruction execution “visible” to the outside world
- It “commits” the changes to the architectural state
[Figure: ROB holding instructions A, B, C, D, E, F, G, H, J, K between head and tail; entries commit from the head into the ARF (architecture register file). The outside world “sees”: A executed, B executed, C executed, D executed, E executed.]
• Instructions are executed out of program order, but the outside world still “believes” execution is in order
Handling Incorrect Speculation
• Instructions following a mispredicted branch, i.e., those in the decode/issue buffers and RSs, are invalidated
• ROB entries of these instructions are deallocated
• Fetch restarts at the correct branch successor
[Figure: pipeline with PC, Fetch, Decode, Reorder Buffer, Execute, Complete, Commit, Branch Prediction, and Branch Resolution; labels: Inject correct PC, update, Kill.]
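The recovery steps above can be sketched in Python using the ROB contents from the walkthrough example later in the deck. This is a simplified sketch under assumptions: ROB numbers are taken to grow with instruction age with no wrap-around (a real ROB is circular), the `stations` list and its `ST` entry are illustrative, and redirecting fetch is left out.

```python
def flush_after_branch(rob, stations, branch_no):
    """Squash everything younger than ROB entry `branch_no` (a mispredicted branch).

    rob:      dict mapping ROB number -> instruction text
    stations: list of reservation-station/buffer entries tagged with "rob_no"
    """
    squashed = [no for no in rob if no > branch_no]
    for no in squashed:
        del rob[no]                                   # deallocate ROB entries
    # Invalidate waiting instructions that belong to the wrong path.
    stations[:] = [rs for rs in stations if rs["rob_no"] <= branch_no]
    return squashed

rob = {
    2: "ADDD F10, F4, F0",
    3: "DIVD F2, F10, F6",
    4: "BNE F0, <...>",        # mispredicted branch
    5: "LD F4, 0(R3)",
    6: "ADDD F0, F4, F6",
    7: "ST 0(R3), F4",
}
stations = [{"rob_no": 3, "op": "DIVD"}, {"rob_no": 7, "op": "ST"}]

squashed = flush_after_branch(rob, stations, branch_no=4)
print(squashed)      # [5, 6, 7]
print(sorted(rob))   # [2, 3, 4]
print(stations)      # [{'rob_no': 3, 'op': 'DIVD'}]
```

Because none of the squashed entries had committed, the register file and memory need no repair; only the buffered state is discarded.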
Handling Precise Exception/Interrupt
• Exceptions/interrupts must be handled in program order for precise interrupts
• Idea: take care of exceptions at commit time
- If an instruction raises an exception, wait until it reaches the head of the ROB, then take the interrupt and flush all other pending instructions
- Instructions behind it are re-executed
- Because instructions commit in order, this yields a precise exception
Handling Precise Exception/Interrupt
[Figure: in-order Fetch and Decode, out-of-order Execute, in-order Commit through the Reorder Buffer; labels: Exception?, Kill, Inject handler PC.]
• Instructions are fetched and decoded into the ROB in order
• Execution is out of order, hence OOO completion
• Commit (write-back to architectural state, i.e., register file and memory) is in order
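Taking exceptions at commit time can be sketched as follows, using the instruction sequence from the precise-interrupt walkthrough later in the deck. This is a minimal Python sketch under assumptions: the ROB is a plain list with the head at index 0, each entry carries a hypothetical `exc` flag, all entries have finished executing, and the initial value R1 = 1 is assumed.

```python
def commit_until_exception(rob, arf):
    """Commit from the head; if the head raised an exception, flush and return its PC."""
    while rob:
        head = rob[0]
        if not head["done"]:
            return None                 # head still executing: wait
        if head["exc"]:
            pc = head["pc"]
            rob.clear()                 # flush the excepting instr. and all younger ones
            return pc                   # the handler resumes (or aborts) from this PC
        arf[head["dst"]] = head["value"]
        rob.pop(0)
    return None

arf = {"R1": 1, "R2": 2, "R3": 3, "R4": 4}
rob = [
    {"pc": 0xA000, "dst": "R1", "value": 11, "done": True, "exc": False},    # R1=R1+10
    {"pc": 0xA004, "dst": "R2", "value": 4, "done": True, "exc": False},     # R2=R2*2
    {"pc": 0xA008, "dst": "FR1", "value": None, "done": True, "exc": True},  # FR1=FR2/0.0
    {"pc": 0xA00C, "dst": "R3", "value": 4, "done": True, "exc": False},     # R3=R3+1
    {"pc": 0xA010, "dst": "R4", "value": 8, "done": True, "exc": False},     # R4=R4*2
]

restart_pc = commit_until_exception(rob, arf)
print(hex(restart_pc), arf)  # 0xa008 {'R1': 11, 'R2': 4, 'R3': 3, 'R4': 4}
```

The results 4 and 8 computed for R3 and R4 never reach the architectural register file, so the state seen by the handler is exactly that of sequential execution up to the excepting instruction.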
Tomasulo without Speculation
[Figure: Tomasulo datapath without speculation; surviving labels: Data, addr, Addr.]
Tomasulo with Speculation via ROB
[Figure: Tomasulo datapath extended with a reorder buffer; surviving label: Load addr.]
Speculative Tomasulo with ROB
Entry  Dest  Value  Instn              Done?
ROB1   F0    -      LD F0, 10(R2)      N   (oldest and newest)
Load buffers: ROB1: 10+R2
Speculative Tomasulo with ROB
Entry  Dest  Value  Instn              Done?
ROB2   F10   -      ADDD F10, F4, F0   N   (newest)
ROB1   F0    -      LD F0, 10(R2)      N   (oldest)
FP adders: ROB2: ADDD R(F4), LD1
Load buffers: ROB1: 10+R2
Speculative Tomasulo with ROB
Entry  Dest  Value  Instn              Done?
ROB3   F2    -      DIVD F2, F10, F6   N   (newest)
ROB2   F10   -      ADDD F10, F4, F0   N
ROB1   F0    -      LD F0, 10(R2)      N   (oldest)
FP adders: ROB2: ADDD R(F4), LD1
FP multipliers: ROB3: DIVD ADD1, R(F6)
Load buffers: ROB1: 10+R2
Speculative Tomasulo with ROB
Entry  Dest  Value  Instn              Done?
ROB6   F0    -      ADDD F0, F4, F6    N   (newest)
ROB5   F4    -      LD F4, 0(R3)       N
ROB4   -     -      BNE F0, <…>        N
ROB3   F2    -      DIVD F2, F10, F6   N
ROB2   F10   -      ADDD F10, F4, F0   N
ROB1   F0    -      LD F0, 10(R2)      N   (oldest)
FP adders: ROB2: ADDD R(F4), LD1; ROB6: ADDD LD2, R(F6)
FP multipliers: ROB3: DIVD ADD1, R(F6)
Load buffers: ROB1: 10+R2; ROB5: 0+R3
Speculative Tomasulo with ROB
Entry  Dest  Value  Instn              Done?
ROB7   -     -      ST 0(R3), F4       N   (newest)
ROB6   F0    -      ADDD F0, F4, F6    N
ROB5   F4    -      LD F4, 0(R3)       N
ROB4   -     -      BNE F0, <…>        N
ROB3   F2    -      DIVD F2, F10, F6   N
ROB2   F10   -      ADDD F10, F4, F0   N
ROB1   F0    -      LD F0, 10(R2)      N   (oldest)
FP adders: ROB2: ADDD R(F4), LD1; ROB6: ADDD LD2, R(F6)
FP multipliers: ROB3: DIVD ADD1, R(F6)
Load buffers: ROB1: 10+R2; ROB5: 0+R3
Speculative Tomasulo with ROB
Entry  Dest  Value  Instn              Done?
ROB7   -     -      ST 0(R3), F4       N   (newest)
ROB6   F0    -      ADDD F0, F4, F6    N
ROB5   F4    M[10]  LD F4, 0(R3)       Y
ROB4   -     -      BNE F0, <…>        N
ROB3   F2    -      DIVD F2, F10, F6   N
ROB2   F10   -      ADDD F10, F4, F0   N
ROB1   F0    -      LD F0, 10(R2)      N   (oldest)
FP adders: ROB2: ADDD R(F4), LD1; ROB6: ADDD M[10], R(F6)
FP multipliers: ROB3: DIVD ADD1, R(F6)
Load buffers: ROB1: 10+R2
Speculative Tomasulo with ROB
Entry  Dest  Value  Instn              Done?
ROB7   -     -      ST 0(R3), F4       N   (newest)
ROB6   F0    -      ADDD F0, F4, F6    Ex
ROB5   F4    M[10]  LD F4, 0(R3)       Y
ROB4   -     -      BNE F0, <…>        N
ROB3   F2    -      DIVD F2, F10, F6   N
ROB2   F10   -      ADDD F10, F4, F0   N
ROB1   F0    M[50]  LD F0, 10(R2)      Y   (oldest)
FP adders: ROB2: ADDD R(F4), M[50]
FP multipliers: ROB3: DIVD ADD1, R(F6)
Speculative Tomasulo with ROB
ROB1 has committed. Program state changed: register F0 = M[50]
Entry  Dest  Value   Instn              Done?
ROB7   -     -       ST 0(R3), F4       N   (newest)
ROB6   F0    <val2>  ADDD F0, F4, F6    Y
ROB5   F4    M[10]   LD F4, 0(R3)       Y
ROB4   -     -       BNE F0, <…>        N
ROB3   F2    -       DIVD F2, F10, F6   N
ROB2   F10   -       ADDD F10, F4, F0   Ex  (oldest)
FP multipliers: ROB3: DIVD ADD1, R(F6)
Speculative Tomasulo with ROB
BNE mispredicted
Entry  Dest  Value   Instn              Done?
ROB7   -     -       ST 0(R3), F4       N   (newest)
ROB6   F0    <val2>  ADDD F0, F4, F6    Y
ROB5   F4    M[10]   LD F4, 0(R3)       Y
ROB4   -     -       BNE F0, <…>        N
ROB3   F2    -       DIVD F2, F10, F6   N
ROB2   F10   <val1>  ADDD F10, F4, F0   Y   (oldest)
Registers: F0 = M[50]
FP multipliers: ROB3: DIVD <val1>, R(F6)
Speculative Tomasulo with ROB
All speculative instructions are flushed!
Entry  Dest  Value   Instn              Done?
ROB4   -     -       BNE F0, <…>        Y   (newest)
ROB3   F2    -       DIVD F2, F10, F6   N
ROB2   F10   <val1>  ADDD F10, F4, F0   Y   (oldest)
Registers: F0 = M[50]
FP multipliers: ROB3: DIVD <val1>, R(F6)
Speculative Tomasulo with ROB
New instructions fetched from correct path
Entry  Dest  Value  Instn              Done?
ROB5   F5    -      ADDD F5, F6, F2    N   (newest)
ROB4   -     -      BNE F0, <…>        Y
ROB3   F2    -      DIVD F2, F10, F6   Ex  (oldest)
Registers: F0 = M[50], F10 = <val1>
FP adders: ROB5: ADDD R(F6), MULT1
Handling Precise Interrupts
[Figure: ROB table with columns V, Spec?, Done?, PC, Exp event, Reg. Dst, Data (physical register), with Head and Tail pointers, next to the ARF.]
PC     Instn          Reg. Dst  Data
xA000  R1=R1+10       R1        11
xA004  R2=R2*2        R2
xA008  FR1=FR2/0.0    FR1
ARF: R1 = 11, R2 = 2, R3 = 3, R4 = 4, ..., R31
Handling Precise Interrupts
PC     Instn          Reg. Dst  Data
xA004  R2=R2*2        R2
xA008  FR1=FR2/0.0    FR1
xA00C  R3=R3+1        R3
ARF: R1 = 11, R2 = 2, R3 = 3, R4 = 4, ..., R31
Handling Precise Interrupts
PC     Instn          Reg. Dst  Data
xA004  R2=R2*2        R2        4
xA008  FR1=FR2/0.0    FR1
xA00C  R3=R3+1        R3
xA010  R4=R4*2        R4
ARF: R1 = 11, R2 = 2, R3 = 3, R4 = 4, ..., R31
Handling Precise Interrupts
PC     Instn          Reg. Dst  Data  Exp event
xA004  R2=R2*2        R2        4     0000
xA008  FR1=FR2/0.0    FR1
xA00C  R3=R3+1        R3        4
xA010  R4=R4*2        R4        8     0000
xA014  LD FR4, M[50]  FR4             0101  (Exception raised.)
ARF: R1 = 11, R2 = 2, R3 = 3, R4 = 4, ..., R31
Handling Precise Interrupts
PC     Instn          Reg. Dst  Data  Exp event
xA004  R2=R2*2        R2        4     0000
xA008  FR1=FR2/0.0    FR1             0010  (Exception raised.)
xA00C  R3=R3+1        R3        4
xA010  R4=R4*2        R4        8     0000
xA014  LD FR4, M[50]  FR4             0101
ARF: R1 = 11, R2 = 4, R3 = 3, R4 = 4, ..., R31
Handling Precise Interrupts
PC     Instn          Reg. Dst  Data  Exp event
xA008  FR1=FR2/0.0    FR1             0010  (head)
xA00C  R3=R3+1        R3        4
xA010  R4=R4*2        R4        8     0000
xA014  LD FR4, M[50]  FR4             0101
ARF: R1 = 11, R2 = 4, R3 = 3, R4 = 4, ..., R31
• Exception detected: push the “PC” and the current RF onto the stack
• The values 4 and 8 were not committed into the RF; they are flushed
• Depending on the exception, the process either aborts or resumes from the excepting instruction
Handling Precise Interrupts
PC     Instn          Reg. Dst  Exp event
xA008  FR1=FR2/0.0    FR1       0000
xA00C  R3=R3+1        R3
xA010  R4=R4*2        R4        0000
xA014  LD FR4, M[50]  FR4
ARF: R1 = 11, R2 = 4, R3 = 3, R4 = 4, ..., R31
• After the exception, the “PC” and RF are popped back, and the excepting instruction and all following instructions are executed again
Recap
• Problems with dynamic scheduling
- OOO commit breaks speculative execution
- OOO commit breaks precise interrupts
• Hardware-based speculation
- Execute instructions before knowing whether they should be executed
- OOO execution and completion, in-order commit through the ROB, dynamic scheduling, branch prediction
- Solves both precise interrupts and speculative execution at the same time
Program State: Basic Idea
• Suppose initially i = 0 and x = 0.5
- Code: i = i+1; x = 1.5
- Program state as a finite-state machine: S1 (i=0, x=0.5) --[i=i+1]--> S2 (i=1, x=0.5) --[x=1.5]--> S3 (i=1, x=1.5)
- A program can be interrupted and resumed as long as its state is preserved
• Machine code can be viewed similarly
1000 ld R1, 0(Ri)
1004 add R1, #1
1008 sd R1, 0(Ri)
1016 ld F2, 0(Rx)
...
- Architectural state includes registers and control registers, e.g., PC and status register
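The finite-state-machine view can be made concrete with a short Python sketch. It is an illustration only: the dictionary-based state, the `pc` field standing in for the program counter, and the `run` helper with its `max_steps` parameter are assumptions made for this example.

```python
def step(state):
    """Execute the statement selected by pc, then advance pc."""
    if state["pc"] == 0:
        state["i"] = state["i"] + 1      # i = i+1
    elif state["pc"] == 1:
        state["x"] = 1.5                 # x = 1.5
    state["pc"] += 1

def run(state, max_steps=None):
    """Run to completion, or for at most max_steps statements (an 'interrupt')."""
    steps = 0
    while state["pc"] < 2 and (max_steps is None or steps < max_steps):
        step(state)
        steps += 1
    return state

uninterrupted = run({"pc": 0, "i": 0, "x": 0.5})

# Interrupt after one statement, save the state (S2), then resume from a copy.
saved = dict(run({"pc": 0, "i": 0, "x": 0.5}, max_steps=1))
resumed = run(dict(saved))

print(saved)                      # {'pc': 1, 'i': 1, 'x': 0.5}
print(resumed == uninterrupted)   # True
```

Resuming from the saved copy reaches the same final state S3 as the uninterrupted run, which is the property a precise interrupt must preserve at the machine level.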