Recap (Scoreboarding)
Dynamic Scheduling
• Dynamic scheduling by hardware
– Allows out-of-order execution and out-of-order completion
– Even when an instruction is stalled, later instructions that have no data dependence on the stalled instruction (or on the instruction causing the stall) can proceed
– Makes efficient use of a processor with multiple functional units
Dynamic Pipeline Scheduling: The Concept
• Instructions are allowed to start executing out of order as soon as their operands are available.
• Example:
DIVD F0, F2, F4
ADDD F10, F0, F8
SUBD F12, F8, F14
– With in-order execution, SUBD must wait for DIVD to complete (DIVD stalls ADDD) before it can start executing.
– With out-of-order execution, SUBD can start as soon as the values of its operands F8 and F14 are available.
• This implies allowing out-of-order instruction commit (completion).
Dynamic Pipeline Scheduling
• Dynamic instruction scheduling is accomplished by:
– Dividing the Instruction Decode (ID) stage into two stages:
  • Issue: Decode instructions, check for structural hazards.
  • Read operands: Wait until data hazard conditions, if any, are resolved, then read the operands when available.
• All instructions pass through the issue stage in order, but they can be stalled or pass each other in the read operands stage.
Scoreboard Implications
• Out-of-order execution ==> WAR, WAW hazards?
DIVD F0, F2, F4
ADDD F10, F0, F8
SUBD F8, F8, F14
• WAR hazard: if the pipeline executes SUBD before ADDD has read F8, ADDD sees the wrong value and execution is incorrect.
DIVD F0, F2, F4
ADDD F10, F0, F8
SUBD F10, F8, F14
• WAW hazard: ADDD and SUBD both write F10. We must detect the hazard and stall until the other instruction completes.
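The WAR and WAW cases above can be checked mechanically. The following is a minimal sketch, not part of the original slides: `classify_hazards` is a hypothetical helper, each instruction is reduced to a (destination, sources) pair, and the operand lists are assumed to be ADDD reading F0 and F8 and SUBD reading F8 and F14.

```python
def classify_hazards(first, second):
    """Return the hazard types between `first` and a later instruction `second`.

    Each instruction is a (dest, sources) pair; `first` comes earlier in
    program order. This only compares one pair of instructions.
    """
    dest1, srcs1 = first
    dest2, srcs2 = second
    found = set()
    if dest1 in srcs2:      # later instruction reads what the earlier one writes
        found.add("RAW")
    if dest2 in srcs1:      # later instruction overwrites a source of the earlier one
        found.add("WAR")
    if dest1 == dest2:      # both write the same register
        found.add("WAW")
    return found


# Operand lists are assumptions made for this illustration.
divd = ("F0", ("F2", "F4"))
addd = ("F10", ("F0", "F8"))
subd_war = ("F8", ("F8", "F14"))    # first example: SUBD writes F8
subd_waw = ("F10", ("F8", "F14"))   # second example: SUBD writes F10

print(classify_hazards(divd, addd))      # ADDD depends on DIVD through F0
print(classify_hazards(addd, subd_war))  # SUBD must not overwrite F8 too early
print(classify_hazards(addd, subd_waw))  # SUBD and ADDD both write F10
```

An in-order pipeline never exposes the WAR and WAW cases; they only become hazards once a later instruction is allowed to run ahead.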
Scoreboard Specifics
• Several functional units
– several floating-point units, integer units, and memory reference units
• Data dependencies (hazards) are detected when an instruction reaches the scoreboard
– this corresponds to instruction issue, replacing part of the ID stage
• The scoreboard determines
– when the instruction is ready for execution, based on when its operands and functional unit become available
– where results are written
[Figure: The basic structure of a MIPS processor with a scoreboard]
Three Parts of the Scoreboard
1. Instruction status: which of the 4 steps the instruction is in.
2. Functional unit status: indicates the state of the functional unit (FU). Nine fields for each functional unit:
– Busy: indicates whether the unit is busy or not
– Op: operation to perform in the unit (e.g., + or –)
– Fi: destination register
– Fj, Fk: source-register numbers
– Qj, Qk: functional units producing source registers Fj, Fk
– Rj, Rk: flags indicating when Fj, Fk are ready (set to Yes after the operand is available to read)
3. Register result status: indicates which functional unit will write to each register, if one exists. Blank when no pending instruction will write that register.
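The functional unit status and register result status can be pictured as two small data structures. The sketch below is an illustration under stated assumptions, not the slides' implementation: `FUStatus` and `issue` are hypothetical names, the register result status is a plain dictionary, and the issue bookkeeping follows the field descriptions above (Qj/Qk name the producing unit, Rj/Rk are Yes when the operand is available).

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FUStatus:
    """The nine functional unit status fields."""
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None    # destination register
    fj: Optional[str] = None    # source registers
    fk: Optional[str] = None
    qj: Optional[str] = None    # units producing Fj, Fk (None if no producer)
    qk: Optional[str] = None
    rj: Optional[bool] = None   # True once Fj / Fk is available to read
    rk: Optional[bool] = None


def issue(units, result, name, op, fi, fj, fk):
    """Fill in the status of unit `name`; return False on a structural or WAW hazard."""
    if units[name].busy or fi in result:
        return False
    qj = result.get(fj) if fj else None
    qk = result.get(fk) if fk else None
    units[name] = FUStatus(True, op, fi, fj, fk, qj, qk,
                           rj=(qj is None) if fj else None,
                           rk=(qk is None) if fk else None)
    result[fi] = name           # register result status: who will write Fi
    return True


units = {n: FUStatus() for n in ("Integer", "Mult1", "Mult2", "Add", "Divide")}
result: Dict[str, str] = {}

issue(units, result, "Integer", "Load", "F2", None, "R3")   # L.D   F2, 45(R3)
issue(units, result, "Mult1", "Mult", "F0", "F2", "F4")     # MUL.D F0, F2, F4
print(units["Mult1"])
print(result)
```

After these two calls, Mult1 records that F2 will come from the Integer unit (Qj = Integer, Rj = No) while F4 is ready, which is the situation the multiply is in just after it issues behind a pending load of F2.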
A Scoreboard Example
The following code is run on the MIPS with a scoreboard given earlier, with:

Functional Unit (FU)      # of FUs   EX Latency
Integer                   1          0
Floating Point Multiply   2          10
Floating Point Add        1          2
Floating Point Divide     1          40

None of the functional units is pipelined.

L.D   F6, 34(R2)
L.D   F2, 45(R3)
MUL.D F0, F2, F4
SUB.D F8, F6, F2
DIV.D F10, F0, F6
ADD.D F6, F8, F2
Dependency Graph For Example Code
1  L.D   F6, 34(R2)
2  L.D   F2, 45(R3)
3  MUL.D F0, F2, F4
4  SUB.D F8, F6, F2
5  DIV.D F10, F0, F6
6  ADD.D F6, F8, F2
• Data dependence (RAW): (1, 4) (1, 5) (2, 3) (2, 4) (2, 6) (3, 5) (4, 6)
• Output dependence (WAW): (1, 6)
• Anti-dependence (WAR): (5, 6)
[Figure: dependency graph over instructions 1 to 6; edge types: real data dependence (RAW), anti-dependence (WAR), output dependence (WAW)]
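The dependence lists can be recomputed from the register names alone. This is a minimal sketch, assuming each instruction is reduced to a destination and a tuple of sources (with DIV.D reading F0 and F6); `dependences` is a hypothetical helper, and the pairwise test ignores intervening redefinitions, which is sufficient for this six-instruction example. Note that a purely pairwise check also reports (4, 6) as an anti-dependence, because SUB.D reads F6 before ADD.D writes it; the slide lists only (5, 6).

```python
from itertools import combinations

# (opcode, destination, sources); instruction numbers start at 1.
PROGRAM = [
    ("L.D",   "F6",  ("R2",)),
    ("L.D",   "F2",  ("R3",)),
    ("MUL.D", "F0",  ("F2", "F4")),
    ("SUB.D", "F8",  ("F6", "F2")),
    ("DIV.D", "F10", ("F0", "F6")),
    ("ADD.D", "F6",  ("F8", "F2")),
]


def dependences(program):
    """Return (raw, war, waw) lists of (earlier, later) instruction numbers."""
    raw, war, waw = [], [], []
    for (i, a), (j, b) in combinations(enumerate(program, start=1), 2):
        _, dest_a, srcs_a = a
        _, dest_b, srcs_b = b
        if dest_a in srcs_b:    # later reads what earlier writes
            raw.append((i, j))
        if dest_b in srcs_a:    # later writes what earlier reads
            war.append((i, j))
        if dest_a == dest_b:    # both write the same register
            waw.append((i, j))
    return raw, war, waw


raw, war, waw = dependences(PROGRAM)
print("RAW:", raw)
print("WAR:", war)
print("WAW:", waw)
```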
Scoreboard Example: Cycle 1
FP Latency: Add = 2 cycles, Multiply = 10, Divide = 40

Instruction status
Instruction          Issue  Read operands  Execution complete  Write result
L.D   F6, 34(R2)     1      -              -                   -
L.D   F2, 45(R3)     -      -              -                   -
MUL.D F0, F2, F4     -      -              -                   -
SUB.D F8, F6, F2     -      -              -                   -
DIV.D F10, F0, F6    -      -              -                   -
ADD.D F6, F8, F2     -      -              -                   -

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  Yes   Load  F6   -    R2   -        -        -    Yes
-     Mult1    No
-     Mult2    No
-     Add      No
-     Divide   No

Register result status
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
1      -      -        -   Integer  -    -       -         -
Scoreboard Example: Cycle 2
FP Latency: Add = 2 cycles, Multiply = 10, Divide = 40

Instruction status
Instruction          Issue  Read operands  Execution complete  Write result
L.D   F6, 34(R2)     1      2              -                   -
L.D   F2, 45(R3)     -      -              -                   -
MUL.D F0, F2, F4     -      -              -                   -
SUB.D F8, F6, F2     -      -              -                   -
DIV.D F10, F0, F6    -      -              -                   -
ADD.D F6, F8, F2     -      -              -                   -

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  Yes   Load  F6   -    R2   -        -        -    Yes
-     Mult1    No
-     Mult2    No
-     Add      No
-     Divide   No

Register result status
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
2      -      -        -   Integer  -    -       -         -

• Issue second L.D? No, stall on structural hazard (the Integer unit is busy).
Scoreboard Example: Cycle 3

Instruction status
Instruction          Issue  Read operands  Execution complete  Write result
L.D   F6, 34(R2)     1      2              3                   -
L.D   F2, 45(R3)     -      -              -                   -
MUL.D F0, F2, F4     -      -              -                   -
SUB.D F8, F6, F2     -      -              -                   -
DIV.D F10, F0, F6    -      -              -                   -
ADD.D F6, F8, F2     -      -              -                   -

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  Yes   Load  F6   -    R2   -        -        -    Yes
-     Mult1    No
-     Mult2    No
-     Add      No
-     Divide   No

Register result status
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
3      -      -        -   Integer  -    -       -         -

• Issue MUL.D? No: issue is in order, and the second L.D has not issued yet.
Scoreboard Example: Cycle 4

Instruction status
Instruction          Issue  Read operands  Execution complete  Write result
L.D   F6, 34(R2)     1      2              3                   4
L.D   F2, 45(R3)     -      -              -                   -
MUL.D F0, F2, F4     -      -              -                   -
SUB.D F8, F6, F2     -      -              -                   -
DIV.D F10, F0, F6    -      -              -                   -
ADD.D F6, F8, F2     -      -              -                   -

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  Yes   Load  F6   -    R2   -        -        -    Yes
-     Mult1    No
-     Mult2    No
-     Add      No
-     Divide   No

Register result status
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
4      -      -        -   Integer  -    -       -         -
Scoreboard Example: Cycle 5

Instruction status
Instruction          Issue  Read operands  Execution complete  Write result
L.D   F6, 34(R2)     1      2              3                   4
L.D   F2, 45(R3)     5      -              -                   -
MUL.D F0, F2, F4     -      -              -                   -
SUB.D F8, F6, F2     -      -              -                   -
DIV.D F10, F0, F6    -      -              -                   -
ADD.D F6, F8, F2     -      -              -                   -

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  Yes   Load  F2   -    R3   -        -        -    Yes
-     Mult1    No
-     Mult2    No
-     Add      No
-     Divide   No

Register result status
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
5      -      Integer  -   -        -    -       -         -
Scoreboard Example: Cycle 6

Instruction status
Instruction          Issue  Read operands  Execution complete  Write result
L.D   F6, 34(R2)     1      2              3                   4
L.D   F2, 45(R3)     5      6              -                   -
MUL.D F0, F2, F4     6      -              -                   -
SUB.D F8, F6, F2     -      -              -                   -
DIV.D F10, F0, F6    -      -              -                   -
ADD.D F6, F8, F2     -      -              -                   -

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  Yes   Load  F2   -    R3   -        -        -    Yes
-     Mult1    Yes   Mult  F0   F2   F4   Integer  -        No   Yes
-     Mult2    No
-     Add      No
-     Divide   No

Register result status
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
6      Mult1  Integer  -   -        -    -       -         -
Scoreboard Example: Cycle 7

Instruction status
Instruction          Issue  Read operands  Execution complete  Write result
L.D   F6, 34(R2)     1      2              3                   4
L.D   F2, 45(R3)     5      6              7                   -
MUL.D F0, F2, F4     6      -              -                   -
SUB.D F8, F6, F2     7      -              -                   -
DIV.D F10, F0, F6    -      -              -                   -
ADD.D F6, F8, F2     -      -              -                   -

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  Yes   Load  F2   -    R3   -        -        -    Yes
-     Mult1    Yes   Mult  F0   F2   F4   Integer  -        No   Yes
-     Mult2    No
-     Add      Yes   Sub   F8   F6   F2   -        Integer  Yes  No
-     Divide   No

Register result status
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
7      Mult1  Integer  -   -        Add  -       -         -

• Read multiply operands? No: F2 has not yet been written by the Integer unit.
Scoreboard Example: Cycle 8a (First half of cycle 8)

Instruction status
Instruction          Issue  Read operands  Execution complete  Write result
L.D   F6, 34(R2)     1      2              3                   4
L.D   F2, 45(R3)     5      6              7                   -
MUL.D F0, F2, F4     6      -              -                   -
SUB.D F8, F6, F2     7      -              -                   -
DIV.D F10, F0, F6    8      -              -                   -
ADD.D F6, F8, F2     -      -              -                   -

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  Yes   Load  F2   -    R3   -        -        -    Yes
-     Mult1    Yes   Mult  F0   F2   F4   Integer  -        No   Yes
-     Mult2    No
-     Add      Yes   Sub   F8   F6   F2   -        Integer  Yes  No
-     Divide   Yes   Div   F10  F0   F6   Mult1    -        No   Yes

Register result status
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
8      Mult1  Integer  -   -        Add  Divide  -         -
Scoreboard Example: Cycle 8b (Second half of cycle 8)

Instruction status
Instruction          Issue  Read operands  Execution complete  Write result
L.D   F6, 34(R2)     1      2              3                   4
L.D   F2, 45(R3)     5      6              7                   8
MUL.D F0, F2, F4     6      -              -                   -
SUB.D F8, F6, F2     7      -              -                   -
DIV.D F10, F0, F6    8      -              -                   -
ADD.D F6, F8, F2     -      -              -                   -

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  No
-     Mult1    Yes   Mult  F0   F2   F4   -        -        Yes  Yes
-     Mult2    No
-     Add      Yes   Sub   F8   F6   F2   -        -        Yes  Yes
-     Divide   Yes   Div   F10  F0   F6   Mult1    -        No   Yes

Register result status
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
8      Mult1  -        -   -        Add  Divide  -         -
Scoreboard Example: Cycle 9
FP Latency: Add = 2 cycles, Multiply = 10, Divide = 40

Instruction status
Instruction          Issue  Read operands  Execution complete  Write result
L.D   F6, 34(R2)     1      2              3                   4
L.D   F2, 45(R3)     5      6              7                   8
MUL.D F0, F2, F4     6      9              -                   -
SUB.D F8, F6, F2     7      9              -                   -
DIV.D F10, F0, F6    8      -              -                   -
ADD.D F6, F8, F2     -      -              -                   -

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  No
10    Mult1    Yes   Mult  F0   F2   F4   -        -        Yes  Yes
-     Mult2    No
2     Add      Yes   Sub   F8   F6   F2   -        -        Yes  Yes
-     Divide   Yes   Div   F10  F0   F6   Mult1    -        No   Yes

Register result status
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
9      Mult1  -        -   -        Add  Divide  -         -

• Read operands for MUL.D & SUB.D? Yes, both read their operands in cycle 9.
• Issue ADD.D? No: the Add unit is still busy with SUB.D (structural hazard).
Scoreboard Example: Cycle 11

Instruction status
Instruction          Issue  Read operands  Execution complete  Write result
L.D   F6, 34(R2)     1      2              3                   4
L.D   F2, 45(R3)     5      6              7                   8
MUL.D F0, F2, F4     6      9              -                   -
SUB.D F8, F6, F2     7      9              11                  -
DIV.D F10, F0, F6    8      -              -                   -
ADD.D F6, F8, F2     -      -              -                   -

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  No
8     Mult1    Yes   Mult  F0   F2   F4   -        -        Yes  Yes
-     Mult2    No
0     Add      Yes   Sub   F8   F6   F2   -        -        Yes  Yes
-     Divide   Yes   Div   F10  F0   F6   Mult1    -        No   Yes

Register result status
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
11     Mult1  -        -   -        Add  Divide  -         -
Scoreboard Example: Cycle 12

Instruction status
Instruction          Issue  Read operands  Execution complete  Write result
L.D   F6, 34(R2)     1      2              3                   4
L.D   F2, 45(R3)     5      6              7                   8
MUL.D F0, F2, F4     6      9              -                   -
SUB.D F8, F6, F2     7      9              11                  12
DIV.D F10, F0, F6    8      -              -                   -
ADD.D F6, F8, F2     -      -              -                   -

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  No
7     Mult1    Yes   Mult  F0   F2   F4   -        -        Yes  Yes
-     Mult2    No
-     Add      No
-     Divide   Yes   Div   F10  F0   F6   Mult1    -        No   Yes

Register result status
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
12     Mult1  -        -   -        -    Divide  -         -

• Read operands for DIV.D? No: F0 is still to be produced by Mult1.
Scoreboard Example: Cycle 13

Instruction status
Instruction          Issue  Read operands  Execution complete  Write result
L.D   F6, 34(R2)     1      2              3                   4
L.D   F2, 45(R3)     5      6              7                   8
MUL.D F0, F2, F4     6      9              -                   -
SUB.D F8, F6, F2     7      9              11                  12
DIV.D F10, F0, F6    8      -              -                   -
ADD.D F6, F8, F2     13     -              -                   -

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  No
6     Mult1    Yes   Mult  F0   F2   F4   -        -        Yes  Yes
-     Mult2    No
-     Add      Yes   Add   F6   F8   F2   -        -        Yes  Yes
-     Divide   Yes   Div   F10  F0   F6   Mult1    -        No   Yes

Register result status
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
13     Mult1  -        -   Add      -    Divide  -         -
Scoreboard Example: Cycle 17

Instruction status
Instruction          Issue  Read operands  Execution complete  Write result
L.D   F6, 34(R2)     1      2              3                   4
L.D   F2, 45(R3)     5      6              7                   8
MUL.D F0, F2, F4     6      9              -                   -
SUB.D F8, F6, F2     7      9              11                  12
DIV.D F10, F0, F6    8      -              -                   -
ADD.D F6, F8, F2     13     14             16                  -

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  No
2     Mult1    Yes   Mult  F0   F2   F4   -        -        Yes  Yes
-     Mult2    No
-     Add      Yes   Add   F6   F8   F2   -        -        Yes  Yes
-     Divide   Yes   Div   F10  F0   F6   Mult1    -        No   Yes

Register result status
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
17     Mult1  -        -   Add      -    Divide  -         -

• Write result of ADD.D? No, WAR hazard: DIV.D has not yet read F6.
Scoreboard Example: Cycle 20

Instruction status
Instruction          Issue  Read operands  Execution complete  Write result
L.D   F6, 34(R2)     1      2              3                   4
L.D   F2, 45(R3)     5      6              7                   8
MUL.D F0, F2, F4     6      9              19                  20
SUB.D F8, F6, F2     7      9              11                  12
DIV.D F10, F0, F6    8      -              -                   -
ADD.D F6, F8, F2     13     14             16                  -

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  No
-     Mult1    No
-     Mult2    No
-     Add      Yes   Add   F6   F8   F2   -        -        Yes  Yes
-     Divide   Yes   Div   F10  F0   F6   -        -        Yes  Yes

Register result status
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
20     -      -        -   Add      -    Divide  -         -
Scoreboard Example: Cycle 21

Instruction status
Instruction          Issue  Read operands  Execution complete  Write result
L.D   F6, 34(R2)     1      2              3                   4
L.D   F2, 45(R3)     5      6              7                   8
MUL.D F0, F2, F4     6      9              19                  20
SUB.D F8, F6, F2     7      9              11                  12
DIV.D F10, F0, F6    8      21             -                   -
ADD.D F6, F8, F2     13     14             16                  -

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  No
-     Mult1    No
-     Mult2    No
-     Add      Yes   Add   F6   F8   F2   -        -        Yes  Yes
-     Divide   Yes   Div   F10  F0   F6   -        -        Yes  Yes

Register result status
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
21     -      -        -   Add      -    Divide  -         -
Scoreboard Example: Cycle 22

Instruction status
Instruction          Issue  Read operands  Execution complete  Write result
L.D   F6, 34(R2)     1      2              3                   4
L.D   F2, 45(R3)     5      6              7                   8
MUL.D F0, F2, F4     6      9              19                  20
SUB.D F8, F6, F2     7      9              11                  12
DIV.D F10, F0, F6    8      21             -                   -
ADD.D F6, F8, F2     13     14             16                  22

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  No
-     Mult1    No
-     Mult2    No
-     Add      No
40    Divide   Yes   Div   F10  F0   F6   -        -        Yes  Yes

Register result status
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
22     -      -        -   -        -    Divide  -         -
Scoreboard Example: Cycle 61

Instruction status
Instruction          Issue  Read operands  Execution complete  Write result
L.D   F6, 34(R2)     1      2              3                   4
L.D   F2, 45(R3)     5      6              7                   8
MUL.D F0, F2, F4     6      9              19                  20
SUB.D F8, F6, F2     7      9              11                  12
DIV.D F10, F0, F6    8      21             61                  -
ADD.D F6, F8, F2     13     14             16                  22

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  No
-     Mult1    No
-     Mult2    No
-     Add      No
0     Divide   Yes   Div   F10  F0   F6   -        -        Yes  Yes

Register result status
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
61     -      -        -   -        -    Divide  -         -
Scoreboard Example: Cycle 62

Instruction status
Instruction          Issue  Read operands  Execution complete  Write result
L.D   F6, 34(R2)     1      2              3                   4
L.D   F2, 45(R3)     5      6              7                   8
MUL.D F0, F2, F4     6      9              19                  20
SUB.D F8, F6, F2     7      9              11                  12
DIV.D F10, F0, F6    8      21             61                  62
ADD.D F6, F8, F2     13     14             16                  22

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  No
-     Mult1    No
-     Mult2    No
-     Add      No
0     Divide   No

Register result status
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
62     -      -        -   -        -    -       -         -

Instruction block done.
• We have:
– In-order issue
– Out-of-order execute and commit
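The final instruction status table can be reproduced with a small cycle-by-cycle model. The sketch below is an illustration under stated assumptions rather than the slides' own algorithm: `Instr`, `happened` and `simulate` are hypothetical names; every condition only sees events from strictly earlier cycles (so a result written in cycle n can be read in cycle n+1, and a unit freed in cycle n can be reused in cycle n+1); and the integer unit is charged one execution cycle, since the trace shows L.D completing execution one cycle after reading its operands even though the latency table lists 0.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

UNITS = {"int": 1, "mult": 2, "add": 1, "div": 1}       # units per type
LATENCY = {"int": 1, "mult": 10, "add": 2, "div": 40}   # int = 1 is an assumption


@dataclass
class Instr:
    text: str
    unit: str
    dest: str
    srcs: Tuple[str, ...]
    issue: Optional[int] = None
    read: Optional[int] = None
    done: Optional[int] = None    # execution complete
    write: Optional[int] = None


def happened(t: Optional[int], cycle: int) -> bool:
    """True if the event with timestamp t took place in a strictly earlier cycle."""
    return t is not None and t < cycle


def simulate(prog: List[Instr], max_cycles: int = 1000) -> None:
    for cycle in range(1, max_cycles + 1):
        for i, ins in enumerate(prog):
            earlier = prog[:i]
            if ins.issue is None:
                if i > 0 and not happened(prog[i - 1].issue, cycle):
                    continue  # in-order issue, at most one instruction per cycle
                active = [e for e in earlier if not happened(e.write, cycle)]
                unit_full = sum(e.unit == ins.unit for e in active) >= UNITS[ins.unit]
                waw = any(e.dest == ins.dest for e in active)
                if not unit_full and not waw:      # no structural or WAW hazard
                    ins.issue = cycle
            elif ins.read is None:
                raw = any(e.dest in ins.srcs and not happened(e.write, cycle)
                          for e in earlier)
                if happened(ins.issue, cycle) and not raw:
                    ins.read = cycle
                    ins.done = cycle + LATENCY[ins.unit]
            elif ins.write is None:
                war = any(ins.dest in e.srcs and not happened(e.read, cycle)
                          for e in earlier)
                if happened(ins.done, cycle) and not war:
                    ins.write = cycle
        if all(ins.write is not None for ins in prog):
            return


prog = [
    Instr("L.D   F6, 34(R2)", "int", "F6", ("R2",)),
    Instr("L.D   F2, 45(R3)", "int", "F2", ("R3",)),
    Instr("MUL.D F0, F2, F4", "mult", "F0", ("F2", "F4")),
    Instr("SUB.D F8, F6, F2", "add", "F8", ("F6", "F2")),
    Instr("DIV.D F10, F0, F6", "div", "F10", ("F0", "F6")),
    Instr("ADD.D F6, F8, F2", "add", "F6", ("F8", "F2")),
]
simulate(prog)
for ins in prog:
    print(f"{ins.text:<18} {ins.issue:>3} {ins.read:>3} {ins.done:>3} {ins.write:>3}")
```

Under these assumptions the model stalls the second L.D until cycle 5 (structural hazard), holds ADD.D's write until cycle 22 (WAR hazard on F6), and finishes DIV.D in cycle 62, matching the cycle-62 instruction status.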
Where have all the transistors gone?
• Superscalar (multiple instructions per clock cycle)
• 3 levels of cache
• Branch prediction (predict outcome of decisions)
• Out-of-order execution (executing instructions in a different order than the programmer wrote them)
[Figure: Intel Pentium III die photo (10 M transistors); labeled blocks include execution, bus interface, D cache, TLB, out-of-order, branch, Icache, SS]