CS 152 Computer Architecture and Engineering Lecture 16

  • Slides: 40
CS 152 Computer Architecture and Engineering Lecture 16: Compiler Optimizations (Cont), Dynamic Scheduling with Scoreboards
CS 152 Lec 16.1

The Big Picture: Where are We Now?
° The Five Classic Components of a Computer: Processor (Control, Datapath), Memory, Input, Output
° Today's Topics:
  • Recap last lecture
  • Hardware loop unrolling with Tomasulo algorithm
  • Administrivia
  • Speculation, branch prediction
  • Reorder buffers

Scoreboard: a bookkeeping technique
° Out-of-order execution divides ID stage:
  1. Issue: decode instructions, check for structural hazards
  2. Read operands: wait until no data hazards, then read operands
° Scoreboards date to CDC 6600 in 1963
° Instructions execute whenever not dependent on previous instructions and no hazards.
° CDC 6600: In-order issue, out-of-order execution, out-of-order commit (or completion)
  • No forwarding!
  • Imprecise interrupt/exception model for now

Scoreboard Architecture (CDC 6600)
[Figure: block diagram showing Registers, Memory, the Functional Units (FP Mult, FP Divide, FP Add, Integer), and the SCOREBOARD]

Scoreboard Implications
° Out-of-order completion => WAR, WAW hazards?
° Solutions for WAR:
  • Stall writeback until registers have been read
  • Read registers only during Read Operands stage
° Solution for WAW:
  • Detect hazard and stall issue of new instruction until other instruction completes
° No register renaming!
° Need to have multiple instructions in execution phase => multiple execution units or pipelined execution units
° Scoreboard keeps track of dependencies between instructions that have already issued.
° Scoreboard replaces ID, EX, WB with 4 stages

Four Stages of Scoreboard Control
° Issue: decode instructions & check for structural hazards (ID1)
  • Instructions issued in program order (for hazard checking)
  • Don't issue if structural hazard
  • Don't issue if instruction is output dependent on any previously issued but uncompleted instruction (no WAW hazards)
° Read operands: wait until no data hazards, then read operands (ID2)
  • All real dependencies (RAW hazards) resolved in this stage, since we wait for instructions to write back data.
  • No forwarding of data in this model!
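The two checks above can be sketched in a few lines of Python. This is a minimal illustration, not the CDC 6600 logic itself: the function names `can_issue` and `can_read_operands`, and the `pending_writer` dictionary (register name mapped to the unit that will write it), are hypothetical names introduced here.

```python
def can_issue(fu_busy, dest, pending_writer):
    """Issue check (ID1): the unit must be free (no structural hazard) and
    no earlier, uncompleted instruction may be writing `dest` (no WAW)."""
    return (not fu_busy) and pending_writer.get(dest) is None


def can_read_operands(sources, pending_writer):
    """Read-operands check (ID2): every source must have no pending writer.
    `pending_writer` here is the state seen when the instruction issued,
    so the instruction's own destination is not counted (RAW check only)."""
    return all(pending_writer.get(s) is None for s in sources)


# Assumed situation: a divide that will write F0 is still in flight.
pending_writer = {"F0": "Divide"}

ok_new_dest = can_issue(False, "F10", pending_writer)        # True
waw_blocked = can_issue(False, "F0", pending_writer)         # False: WAW on F0
unit_blocked = can_issue(True, "F10", pending_writer)        # False: unit busy
raw_blocked = can_read_operands(["F0", "F8"], pending_writer)  # False: RAW on F0
```

With no forwarding, the RAW wait only ends once the producer has written the register file, which is why the check looks at pending writers rather than at execution completion.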

Four Stages of Scoreboard Control
° Execution: operate on operands (EX)
  • The functional unit begins execution upon receiving operands. When the result is ready, it notifies the scoreboard that it has completed execution.
° Write result: finish execution (WB)
  • Stall until no WAR hazards with previous instructions:
    Example:
      DIVD F0, F2, F4
      ADDD F10, F0, F8
      SUBD F8, F8, F14
    CDC 6600 scoreboard would stall SUBD until ADDD reads operands
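The WAR stall in this example can be sketched as a small check. The operand lists used below (DIVD F0,F2,F4; ADDD F10,F0,F8; SUBD F8,F8,F14) are the usual form of this example and are an assumption here; the `Waiting` class and `can_write_result` function are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass
class Waiting:
    """An earlier instruction that has issued but may not have read operands."""
    name: str
    fj: str   # first source register
    fk: str   # second source register
    rj: bool  # True: Fj is ready and has NOT been read yet
    rk: bool  # True: Fk is ready and has NOT been read yet


def can_write_result(dest, others):
    """A result may be written only if no other instruction still has to
    read the old value of `dest` (no WAR hazard)."""
    return all((o.fj != dest or not o.rj) and (o.fk != dest or not o.rk)
               for o in others)


# ADDD waits for F0 from DIVD; its F8 operand is ready but unread.
addd = Waiting("ADDD", fj="F0", fk="F8", rj=False, rk=True)

stalled = not can_write_result("F8", [addd])   # True: SUBD must not overwrite F8

# Once ADDD has read both operands, its ready flags are cleared.
addd.rj = addd.rk = False
released = can_write_result("F8", [addd])      # True: SUBD may now write F8
```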

Three Parts of the Scoreboard
° Instruction status: Which of 4 steps the instruction is in
° Functional unit status: Indicates the state of the functional unit (FU). 9 fields for each functional unit:
    Busy:    Indicates whether the unit is busy or not
    Op:      Operation to perform in the unit (e.g., + or –)
    Fi:      Destination register
    Fj, Fk:  Source-register numbers
    Qj, Qk:  Functional units producing source registers Fj, Fk
    Rj, Rk:  Flags indicating when Fj, Fk are ready
° Register result status: Indicates which functional unit will write each register, if one exists. Blank when no pending instructions will write that register
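One possible way to hold these three tables in software is sketched below. The field names follow the slide; the unit names, the register list, and the use of `None` for "blank" are assumptions made for illustration.

```python
from dataclasses import dataclass, fields
from typing import Dict, Optional


@dataclass
class FUStatus:
    """Functional unit status: the 9 fields listed on the slide."""
    busy: bool = False         # Busy
    op: Optional[str] = None   # Op
    fi: Optional[str] = None   # Fi: destination register
    fj: Optional[str] = None   # Fj: first source register
    fk: Optional[str] = None   # Fk: second source register
    qj: Optional[str] = None   # Qj: unit producing Fj (None if no producer)
    qk: Optional[str] = None   # Qk: unit producing Fk (None if no producer)
    rj: bool = False           # Rj: Fj ready and not yet read
    rk: bool = False           # Rk: Fk ready and not yet read


# Instruction status: which of the 4 steps each instruction is in.
instruction_status: Dict[str, str] = {}   # e.g. {"DIVD": "Issue"}

# Functional unit status: one entry per unit (unit names are hypothetical).
fu_status: Dict[str, FUStatus] = {
    name: FUStatus() for name in ("Integer", "Mult", "Add", "Divide")
}

# Register result status: which unit will write each register (None = blank).
register_result: Dict[str, Optional[str]] = {f"F{i}": None for i in range(0, 32, 2)}

field_count = len(fields(FUStatus))   # 9
```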

Scoreboard Example

Detailed Scoreboard Pipeline Control

Instruction status: Issue
  Wait until:  Not Busy(FU) and not Result(D)
  Bookkeeping: Busy(FU) ← yes; Op(FU) ← op; Fi(FU) ← 'D'; Fj(FU) ← 'S1'; Fk(FU) ← 'S2'; Qj ← Result('S1'); Qk ← Result('S2'); Rj ← not Qj; Rk ← not Qk; Result('D') ← FU

Instruction status: Read operands
  Wait until:  Rj and Rk
  Bookkeeping: Rj ← No; Rk ← No

Instruction status: Execution complete
  Wait until:  Functional unit done
  Bookkeeping: (none)

Instruction status: Write result
  Wait until:  ∀f ((Fj(f) ≠ Fi(FU) or Rj(f) = No) & (Fk(f) ≠ Fi(FU) or Rk(f) = No))
  Bookkeeping: ∀f (if Qj(f) = FU then Rj(f) ← Yes); ∀f (if Qk(f) = FU then Rk(f) ← Yes); Result(Fi(FU)) ← 0; Busy(FU) ← No
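The control rules above can be exercised with a small Python model. This is a sketch under stated assumptions, not the CDC 6600 hardware: the class and method names are hypothetical, timing and the execution stage are omitted, the unit names in the usage lines are made up, and clearing Qj/Qk on write-back is an addition that the table does not spell out.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FU:
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None
    fj: Optional[str] = None
    fk: Optional[str] = None
    qj: Optional[str] = None
    qk: Optional[str] = None
    rj: bool = False
    rk: bool = False


class Scoreboard:
    def __init__(self, unit_names):
        self.fu: Dict[str, FU] = {n: FU() for n in unit_names}
        self.result: Dict[str, str] = {}   # register -> unit that will write it

    def issue(self, unit, op, d, s1, s2):
        # Wait until: not Busy(FU) and not Result(D)
        if self.fu[unit].busy or d in self.result:
            return False
        qj, qk = self.result.get(s1), self.result.get(s2)
        self.fu[unit] = FU(True, op, d, s1, s2, qj, qk, qj is None, qk is None)
        self.result[d] = unit
        return True

    def read_operands(self, unit):
        u = self.fu[unit]
        if not (u.rj and u.rk):            # Wait until: Rj and Rk
            return False
        u.rj = u.rk = False                # Bookkeeping: Rj <- No; Rk <- No
        return True

    def write_result(self, unit):
        u = self.fu[unit]
        if not u.busy:
            return False
        for f in self.fu.values():         # WAR check against every other unit
            if f is not u and ((f.fj == u.fi and f.rj) or (f.fk == u.fi and f.rk)):
                return False
        for f in self.fu.values():         # wake consumers waiting on this unit
            if f.qj == unit:
                f.qj, f.rj = None, True    # clearing Qj is an added assumption
            if f.qk == unit:
                f.qk, f.rk = None, True
        del self.result[u.fi]              # Result(Fi(FU)) <- 0
        self.fu[unit] = FU()               # Busy(FU) <- No
        return True


# Illustrative three-instruction sequence with hypothetical unit names.
sb = Scoreboard(["Divide", "Add1", "Add2"])
sb.issue("Divide", "DIVD", "F0", "F2", "F4")
sb.issue("Add1", "ADDD", "F10", "F0", "F8")
sb.issue("Add2", "SUBD", "F8", "F8", "F14")
sb.read_operands("Add2")
war_stall = not sb.write_result("Add2")    # True: ADDD has not read F8 yet
raw_stall = not sb.read_operands("Add1")   # True: F0 is still pending
sb.read_operands("Divide")
sb.write_result("Divide")                  # wakes ADDD's F0 operand
sb.read_operands("Add1")
subd_done = sb.write_result("Add2")        # True: WAR hazard is gone
```

Keeping each transition as a method that returns `False` on a stall mirrors the "Wait until" column: the caller simply retries the same step on the next cycle.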

Scoreboard Example: Cycle 1

Scoreboard Example: Cycle 2
• Issue 2nd LD?

Scoreboard Example: Cycle 3
• Issue MULT?

Scoreboard Example: Cycle 4

Scoreboard Example: Cycle 5

Scoreboard Example: Cycle 6

Scoreboard Example: Cycle 7
• Read multiply operands?

Scoreboard Example: Cycle 8a (First half of clock cycle)

Scoreboard Example: Cycle 8b (Second half of clock cycle)

Scoreboard Example: Cycle 9
• Note Remaining
• Read operands for MULT & SUB? Issue ADDD?

Scoreboard Example: Cycle 10

Scoreboard Example: Cycle 11

Scoreboard Example: Cycle 12
• Read operands for DIVD?

Scoreboard Example: Cycle 13

Scoreboard Example: Cycle 14

Scoreboard Example: Cycle 15

Scoreboard Example: Cycle 16

Scoreboard Example: Cycle 17
WAR Hazard!
• Why not write result of ADD???

Scoreboard Example: Cycle 18

Scoreboard Example: Cycle 19

Scoreboard Example: Cycle 20

Scoreboard Example: Cycle 21
• WAR Hazard is now gone...

Scoreboard Example: Cycle 22

Faster than light computation (skip a couple of cycles)

Scoreboard Example: Cycle 61

Scoreboard Example: Cycle 62

Review: Scoreboard Example: Cycle 62
• In-order issue; out-of-order execute & commit

CDC 6600 Scoreboard
° Speedup 1.7 from compiler; 2.5 by hand BUT slow memory (no cache) limits benefit
° Limitations of 6600 scoreboard:
  • No forwarding hardware
  • Limited to instructions in basic block (small window)
  • Small number of functional units (structural hazards), especially integer/load store units
  • Do not issue on structural hazards
  • Wait for WAR hazards
  • Prevent WAW hazards

Summary #1/2: Compiler techniques for parallelism
° Loop unrolling: multiple iterations of loop in software
  • Amortizes loop overhead over several iterations
  • Gives more opportunity for scheduling around stalls
° Software pipelining: take one instruction from each of several iterations of the loop
  • Software overlapping of loop iterations
  • Today will show hardware overlapping of loop iterations
° Very Long Instruction Word machines (VLIW): multiple operations coded in single, long instruction
  • Requires sophisticated compiler to decide which operations can be done in parallel
  • Trace scheduling: find common path and schedule code as if branches didn't exist (+ add "fixup code")
° All of these require additional registers
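Loop unrolling can be illustrated at source level. The sketch below is only an analogy in Python (a compiler would do this on machine code, where the extra registers matter); the function names and the unroll factor of 4 are choices made for this example.

```python
def add_scalar(x, s):
    """Rolled loop: one element, one index update, one loop test per iteration."""
    for i in range(len(x)):
        x[i] += s


def add_scalar_unrolled4(x, s):
    """Unrolled by 4: the index update and loop test are paid once per
    four elements, and the four independent updates can be reordered."""
    n = len(x)
    i = 0
    while i + 4 <= n:
        x[i] += s
        x[i + 1] += s
        x[i + 2] += s
        x[i + 3] += s
        i += 4
    while i < n:          # cleanup loop for the leftover 0 to 3 elements
        x[i] += s
        i += 1


a = list(range(10))
b = list(range(10))
add_scalar(a, 3)
add_scalar_unrolled4(b, 3)
same = (a == b)           # True: both versions compute the same result
```

The cleanup loop is needed whenever the trip count is not a multiple of the unroll factor.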

Summary #2/2
° HW exploiting ILP
  • Works when can't know dependence at compile time.
  • Code for one machine runs well on another
° Key idea of Scoreboard: Allow instructions behind stall to proceed (Decode => Issue instr & read operands)
  • Enables out-of-order execution => out-of-order completion
  • ID stage checked both for structural & data dependencies
  • Original version didn't handle forwarding.
  • No automatic register renaming