EECS 470 Lecture 5 Intro to Dynamic Scheduling

EECS 470 Lecture 5
Intro to Dynamic Scheduling (Scoreboarding)
Winter 2021
Jon Beaumont
http://www.eecs.umich.edu/courses/eecs470
Many thanks to Profs. Martin and Roth of University of Pennsylvania for most of these slides. Portions developed in part by Profs. Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Shen, Smith, Sohi, Tyson, Vijaykumar, and Wenisch of Carnegie Mellon University, Purdue University, University of Michigan, and University of Wisconsin.
EECS 470 Lecture 5 Slide 1

Announcements
• Reminder
  • Lab #2 due Friday at 12:30 pm
    • Get checked off during GSI/IA OH
  • Verilog assignment #2 due Wed 2/10
    • Submit to autograder by 11:59 pm
  • HW #1 due Thursday
    • Submit to Gradescope by 11:59 pm
• We'll take the last few minutes of class to discuss the recent news in the department
  • Involves discussion of sexual assault
  • No one is obligated to stay
EECS 470 Lecture 5 Slide 2

Last Time • Hazards • Detection • Resolution • Software (avoidance) • Hardware (stalling, forwarding) EECS 470 Lecture 5 Slide 3

Lingering Questions
• Remember, you can submit lingering questions to cover next lecture at: https://bit.ly/3oSr5FD
EECS 470 Lecture 5 Slide 4

Today
• ILP and limits of scalar pipelines
• Introduce dynamic scheduling (i.e. out-of-order execution)
• Register renaming (high level)
• Case study: scoreboard scheduling
EECS 470 Lecture 5 Slide 5

Readings
For Today:
• H&P Chapter C.5-C.7, Chapter 3
For Monday:
• D. Sima, "Design Space of Register Renaming Techniques"
  • Can access online from umich IP address
EECS 470 Lecture 5 Slide 6

Limitations of Scalar Pipelines
• Upper bound on scalar pipeline throughput
  • Limited by IPC = 1 ("Flynn Bottleneck")
• Inefficient unification into single pipeline
  • Long latency for each instruction
• Performance lost due to rigid in-order pipeline
  • Unnecessary stalls
EECS 470 Lecture 5 Slide 7

Architectures for Instruction-Level Parallelism EECS 470 Lecture 5 Slide 8

Superscalar Machine EECS 470 Lecture 5 Slide 9

What is the real problem?
• CPI of in-order pipelines degrades very sharply if the machine parallelism is increased beyond a certain point, i.e., when NxM approaches the average distance between dependent instructions
• Forwarding is no longer effective
• Pipeline may never be full due to frequent dependency stalls!
EECS 470 Lecture 5 Slide 10

ILP: Instruction-Level Parallelism
• ILP is a measure of the amount of inter-dependencies between instructions
• Average ILP = number of instructions / number of cycles required
  • code 1: ILP = 1, i.e. must execute serially
  • code 2: ILP = 3, i.e. can execute at the same time

code 1:               code 2:
  r1 ← r2 + 1           r1 ← r2 + 1
  r3 ← r1 / 17          r3 ← r9 / 17
  r4 ← r0 - r3          r4 ← r0 - r10
EECS 470 Lecture 5 Slide 11
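The two ILP figures on this slide can be checked with a small sketch. It assumes unit latency for every operation, unlimited functional units, and that only true (RAW) dependences constrain the schedule; the function name `average_ilp` and the tuple encoding of instructions are illustrative choices, not part of the lecture.

```python
def average_ilp(insns):
    """insns: list of (dest, [sources]) in program order.

    Assumes unit latency, unlimited functional units, and that only
    true (RAW) dependences delay an instruction.
    """
    exec_cycle = {}  # register -> cycle in which its latest producer executes
    last_cycle = 0
    for dest, srcs in insns:
        # An instruction executes one cycle after its latest producer;
        # registers with no producer in this snippet count as cycle 0.
        cycle = 1 + max((exec_cycle.get(s, 0) for s in srcs), default=0)
        exec_cycle[dest] = cycle
        last_cycle = max(last_cycle, cycle)
    return len(insns) / last_cycle


# code 1: each instruction depends on the previous one
code1 = [("r1", ["r2"]), ("r3", ["r1"]), ("r4", ["r0", "r3"])]
# code 2: no instruction reads a register written by another
code2 = [("r1", ["r2"]), ("r3", ["r9"]), ("r4", ["r0", "r10"])]

print(average_ilp(code1))  # 3 instructions / 3 cycles
print(average_ilp(code2))  # 3 instructions / 1 cycle
```

Under these assumptions code 1 needs three cycles and code 2 needs one, which matches the ILP values of 1 and 3 quoted on the slide.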

The Problem With In-Order Pipelines

                 1   2   3   4   5   6   7   8   9   10
addf f0,f1,f2    F   D   E+  E+  E+  W
mulf f2,f3,f2        F   D   d*  d*  E*  E*  E*  W
subf f0,f1,f4            F   p*  p*  D   E+  E+  E+  W

• What's happening in cycle 4?
  • mulf stalls due to RAW hazard
    • OK, this is a fundamental problem
  • subf stalls due to pipeline hazard (aka structural hazard)
    • Why? subf can't proceed into D because mulf is there
    • That is the only reason, and it isn't a fundamental one
• Why can't subf go into D in cycle 4 and E+ in cycle 5?
• This tends to be a bigger problem with long latency instructions
  • E.g. loads w/ cache misses or FP arithmetic
EECS 470 Lecture 5 Slide 13

New Concepts
• Two (somewhat) independent techniques, although very often employed together:
  • Dynamic scheduling (a.k.a. out-of-order processing)
  • Register renaming
EECS 470 Lecture 5 Slide 14

Concept 1: Dynamic Scheduling
[Figure: pipeline with I$, BP, D, insn buffer, S, regfile, D$. The insn buffer holds:]
  add p2,p3,p4
  mul p2,p4,p5
  sub p2,p5,p6
  div p4,4,p7
• Instructions are fetched/decoded into the Instruction Buffer
  • Also called "instruction window" or "instruction scheduler"
• Each cycle, hardware checks the source registers of each instruction to see if it is ready to execute
• Instructions can leave the buffer when ready, in arbitrary order
  • E.g. if "mul" takes a long time to execute, "div" can execute before "sub"
EECS 470 Lecture 5 Slide 15

New Hazards
• Problem!
  • Out of order introduces new types of hazards!
• 3 types of hazards:
  • Read-after-write (RAW): we're familiar with this
  • Write-after-read (WAR): "Anti dependencies"
  • Write-after-write (WAW): "Output dependencies"
  • WAR and WAW also called "name" hazards
• Last two relevant to out-of-order processing
• Is RAR a thing?
EECS 470 Lecture 5 Slide 16

New Hazards
Poll: Why are RAR hazards not a thing?
  a) Programs never read the same value multiple times
  b) These are already fixed with forwarding
  c) Hazards only occur when a value changes

  add r2,r3,r1
  sub r2,r1,r3
  mul r2,r3,r3
  div r1,4,r1

• "Write-after-write" (WAW)
  • Occurs in out-of-order processors when two instructions write to the same register
  • If "div" writes its result back before "add", r1 will have the wrong value!
• "Write-after-read" (WAR)
  • Occurs in out-of-order processors when an instruction overwrites a value read by an earlier instruction
  • If "div" writes its result back before "sub" reads r1, sub will execute incorrectly
Poll: Which could solve WAR and WAW? (select all)
  a) Having a bunch more registers
  b) Restricting the order of instruction executions
  c) Predict when there are hazards
EECS 470 Lecture 5 Slide 17
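The hazard definitions above can be sketched as a pairwise check between an older and a younger instruction. This is a minimal illustration, not hardware: the function name `hazards` and the `(name, sources, dest)` tuples are assumptions, the destination is written last as in the slide's `add r2,r3,r1`, and the check ignores any intervening writes between the two instructions.

```python
def hazards(older, younger):
    """Return the hazard types the younger insn has with respect to the older.

    Each insn is (name, [source registers], dest register).
    Simplification: intervening writes between the two are ignored.
    """
    _, old_srcs, old_dst = older
    _, young_srcs, young_dst = younger
    found = set()
    if old_dst in young_srcs:
        found.add("RAW")  # younger reads what older writes (true dependence)
    if young_dst in old_srcs:
        found.add("WAR")  # younger overwrites what older still has to read
    if young_dst == old_dst:
        found.add("WAW")  # both write the same register
    return found


# The four instructions from the slide (immediates omitted from the sources).
add = ("add", ["r2", "r3"], "r1")
sub = ("sub", ["r2", "r1"], "r3")
mul = ("mul", ["r2", "r3"], "r3")
div = ("div", ["r1"], "r1")

print(hazards(add, div))  # div vs. add: RAW and WAW on r1
print(hazards(sub, div))  # div vs. sub: WAR on r1
```

Note that two reads of the same register never add an entry: nothing changes, so no ordering between them has to be preserved.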

Concept 2: Register Renaming
• We could just stall anytime there is a WAR or WAW hazard
• BUT, anti (WAR) and output (WAW) dependencies are "false"
  • The dependence is on name/location rather than data
  • Given infinite registers, WAR/WAW can always be eliminated
• Idea: increase number of physical registers (not visible to programmer)
  • Dynamically rename instrs to use new registers; removes WAR and WAW, but leaves RAW intact
• Example
  • Names: r1, r2, r3
  • Locations: p1, p2, p3, p4, p5, p6, p7
  • Original mapping: r1→p1, r2→p2, r3→p3; p4–p7 are "free"

  MapTable (before insn)
  r1   r2   r3     FreeList        Orig. insns     Renamed insns
  p1   p2   p3     p4,p5,p6,p7     add r2,r3,r1    add p2,p3,p4
  p4   p2   p3     p5,p6,p7        sub r2,r1,r3    sub p2,p4,p5
  p4   p2   p5     p6,p7           mul r2,r3,r3    mul p2,p5,p6
  p4   p2   p6     p7              div r1,4,r1     div p4,4,p7
  (time runs downward)
EECS 470 Lecture 5 Slide 18
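The renaming example can be reproduced with a short sketch of a map table and free list. It is a simplification under stated assumptions: physical registers are never returned to the free list, the free list never runs empty, and anything not found in the map table (such as the immediate `4`) passes through unchanged. The function name `rename` and the data layout are illustrative, not from the lecture.

```python
def rename(insns, map_table, free_list):
    """Rename (op, [sources], dest) instructions in program order.

    Assumptions: registers are never freed and the free list is long enough.
    Returns the renamed instructions and the final map table.
    """
    map_table = dict(map_table)   # work on copies, leave the inputs untouched
    free_list = list(free_list)
    renamed = []
    for op, srcs, dest in insns:
        # Sources use the current mapping, which keeps RAW dependences intact.
        new_srcs = [map_table.get(s, s) for s in srcs]
        # Every destination gets a fresh physical register, removing WAR/WAW.
        new_dest = free_list.pop(0)
        map_table[dest] = new_dest
        renamed.append((op, new_srcs, new_dest))
    return renamed, map_table


program = [
    ("add", ["r2", "r3"], "r1"),
    ("sub", ["r2", "r1"], "r3"),
    ("mul", ["r2", "r3"], "r3"),
    ("div", ["r1", "4"], "r1"),
]
renamed, final_map = rename(
    program,
    {"r1": "p1", "r2": "p2", "r3": "p3"},
    ["p4", "p5", "p6", "p7"],
)
for op, srcs, dest in renamed:
    print(op, ",".join(srcs + [dest]))
```

After renaming, `div` writes p7 while `add` wrote p4 and `sub` read p4, so the WAW and WAR hazards on r1 are gone and only the RAW chains remain.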

Dynamic Scheduling – Full Picture
• Dynamic scheduling
  • Totally in the hardware
  • Also called "out-of-order execution" (OoO)
• Fetch many instructions into instruction window
  • Use branch prediction to speculate past (multiple) branches
  • Flush pipeline on branch misprediction
• Optional: rename to avoid false dependencies (WAW and WAR)
• Execute instructions as soon as possible
  • Register dependencies are known
  • Handling memory dependencies more tricky (much more later)
• Commit instructions in order
  • Why? To discuss later
• Current machines: 100+ instruction scheduling window
EECS 470 Lecture 5 Slide 19

Going Forward: What's Next
• We'll build this up in steps over the next few weeks
  • "Scoreboarding": first OoO, no register renaming
  • "Tomasulo's algorithm": adds register renaming
  • Handling precise state and speculation
    • P6-style execution (Intel Pentium Pro)
    • R10k-style execution (MIPS R10k)
  • Handling memory dependencies
• Let's get started!
EECS 470 Lecture 5 Slide 20

New Pipeline Diagram

  Insn              D     X      W
  ldf X(r1),f1      c1    c2     c3
  mulf f0,f1,f2     c3    c4+    c7
  stf f2,Z(r1)      c7    c8     c9
  addi r1,4,r1      c8    c9     c10
  ldf X(r1),f1      c10   c11    c12
  mulf f0,f1,f2     c12   c13+   c16
  stf f2,Z(r1)      c16   c17    c18

• Alternative pipeline diagram
  • Down: insns
  • Across: pipeline stages (Decode, eXecute, Writeback)
  • In boxes: cycles
  • '+' means takes multiple cycles
• Why? Convenient for out-of-order
EECS 470 Lecture 5 Slide 22

Instruction Buffer
[Figure: pipeline with I$, BP, D1, insn buffer, D2, regfile, D$]
• Trick: insn buffer (many names for this buffer)
  • Basically: a bunch of latches for holding insns
• Split D(ecode) into two pieces
  • Accumulate decoded insns in buffer in-order
  • Buffer sends insns down rest of pipeline out-of-order
EECS 470 Lecture 5 Slide 24

Dispatch and Issue
[Figure: pipeline with I$, BP, D, insn buffer, S, regfile, D$]
• Dispatch (D): first part of decode
  • Allocate slot in insn buffer
  – New kind of structural hazard (insn buffer is full)
  • In order: stall back-propagates to younger insns
• Issue (S): second part of decode
  • Send insns from insn buffer to execution units
  + Out-of-order: wait doesn't back-propagate to younger insns
EECS 470 Lecture 5 Slide 25

Dispatch and Issue with Floating-Point
[Figure: pipeline with I$, BP, D, insn buffer, S, regfile, F-regfile, D$, and separate execution pipelines E*, E+, E/]
• We often have different "functional units" to execute different types of insns
EECS 470 Lecture 5 Slide 26

Scheduling Algorithm I: Scoreboard
• Centralized control scheme: insn status explicitly tracked
  • Insn buffer: Functional Unit Status Table (FUST)
  • No register renaming
• First implementation: CDC 6600 [1964]
  • 16 separate non-pipelined functional units (7 int, 4 FP, 5 mem)
  • No bypassing
• Our example: "Simple Scoreboard"
  • 5 FU: 1 ALU, 1 load, 1 store, 2 FP (3-cycle, pipelined)
EECS 470 Lecture 5 Slide 27

Scoreboard Data Structures
• FU Status Table
  • FU, busy, op, R, R1, R2: destination/source register names
  • T: destination register tag (FU producing the value)
  • T1, T2: source register tags (FU producing the values)
• Register Status Table
  • T: tag (FU that will write this register)
• Tags interpreted as ready-bits
  • Tag == 0 → Value is ready in register file
  • Tag != 0 → Value is not ready, will be supplied by T
• Insn status table
  • S, X bits for all active insns
EECS 470 Lecture 5 Slide 28

Simple Scoreboard Data Structures
[Figure: scoreboard datapath. Fetched insns feed Insn status (S, X), Reg Status (T), and FU Status (op, R, R1, R2, T, T1, T2) with tag-match (==) CAMs, plus Regfile (value) and the FUs]
• Insn fields and status bits
• Tags
• Values
EECS 470 Lecture 5 Slide 29

Scoreboard Dispatch (D)
[Figure: scoreboard datapath, dispatch stage highlighted]
• Stall for WAW hazards
• Allocate scoreboard entry
• Populate T1 and T2 using Reg Status
• Update Reg Status of destination register
EECS 470 Lecture 5 Slide 30
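The four dispatch steps can be sketched in software as follows. This is only an illustration of the bookkeeping, not the hardware: the dictionary layout, the `dispatch` function, and the use of `None` for "tag == 0 (value ready)" are assumptions made for this example. The structural-hazard check on a busy FU comes from the later pipeline-summary slide.

```python
def dispatch(insn, fu_status, reg_status):
    """Try to dispatch one insn; returns a string describing what happened.

    insn: dict with keys fu, op, R (dest or None), R1, R2 (sources or None).
    fu_status: FU name -> entry dict, or missing/None when the FU is free.
    reg_status: register -> tag (FU that will write it), None/missing = ready.
    """
    fu, dest = insn["fu"], insn["R"]
    if fu_status.get(fu) is not None:
        return "stall: structural hazard"       # scoreboard entry is busy
    if dest is not None and reg_status.get(dest) is not None:
        return "stall: WAW hazard"              # an older insn will write dest
    # Allocate the entry; source tags come from Reg Status *before* it is
    # updated, so an insn like addi r1,4,r1 does not wait on itself.
    fu_status[fu] = {
        "op": insn["op"], "R": dest, "R1": insn["R1"], "R2": insn["R2"],
        "T1": reg_status.get(insn["R1"]),
        "T2": reg_status.get(insn["R2"]),
    }
    if dest is not None:
        reg_status[dest] = fu                   # this FU will produce dest
    return "dispatched"


fu_status, reg_status = {}, {}
ldf = {"fu": "LD", "op": "ldf", "R": "f1", "R1": None, "R2": "r1"}
mulf = {"fu": "FP1", "op": "mulf", "R": "f2", "R1": "f0", "R2": "f1"}
mulf_again = dict(mulf, fu="FP2")

print(dispatch(ldf, fu_status, reg_status))         # allocates LD
print(dispatch(mulf, fu_status, reg_status))        # allocates FP1, T2 = LD
print(dispatch(mulf_again, fu_status, reg_status))  # f2 is still tagged FP1
```

The third call stalls for the same reason the second `mulf` stalls in cycle 6 of the walkthrough: Reg Status for f2 is non-empty.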

Scoreboard Issue (S)
[Figure: scoreboard datapath, issue stage highlighted]
• Wait for RAW register hazards
• Read registers
EECS 470 Lecture 5 Slide 31

Issue Policy and Issue Logic
• Issue
  • If multiple insns ready, which one to choose? Issue policy
    • Oldest first? Safe
    • Longest latency first? May yield better performance
  • Select logic: implements issue policy
    • W→1 priority encoder
    • W: window size (number of scoreboard entries)
EECS 470 Lecture 5 Slide 32
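An oldest-first issue policy amounts to a priority encoder over the window. The sketch below models that in software, assuming the window is kept in age order and that each entry carries source tags `T1`/`T2` (with `None` meaning ready) and an `issued` flag; these field names are illustrative assumptions.

```python
def select_oldest_ready(window):
    """W->1 select: index of the oldest ready, not-yet-issued entry, or None.

    window: list of entries ordered oldest first. An entry is ready when
    both source tags are clear (None), i.e. no RAW hazard remains.
    """
    for index, entry in enumerate(window):
        ready = entry["T1"] is None and entry["T2"] is None
        if ready and not entry["issued"]:
            return index      # first match wins, as in a priority encoder
    return None               # nothing can issue this cycle


window = [
    {"op": "stf",  "T1": "FP1", "T2": None,  "issued": False},  # waits on FP1
    {"op": "addi", "T1": None,  "T2": None,  "issued": False},  # ready
    {"op": "ldf",  "T1": None,  "T2": "ALU", "issued": False},  # waits on ALU
]
print(select_oldest_ready(window))
```

A longest-latency-first policy would replace the linear scan with a choice among all ready entries ranked by expected latency; the readiness test stays the same.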

Scoreboard Execute (X)
[Figure: scoreboard datapath, execute stage highlighted]
• Execute insn
EECS 470 Lecture 5 Slide 33

Scoreboard Writeback (W)
[Figure: scoreboard datapath, writeback stage highlighted]
• Wait for WAR hazard
• Write value into regfile, clear Reg Status entry
• Compare tag to waiting insns' input tags, match ? clear input tag
• Free scoreboard entry
EECS 470 Lecture 5 Slide 34
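The writeback steps can be sketched in the same software style. The WAR test follows the later hint "untagged r1 in FuStatus": another active entry names the destination as a source with a clear tag, meaning it still expects the old value from the register file. The `issued` flag is an assumption standing in for the S bit of the insn status table, and the function and field names are illustrative.

```python
def try_writeback(fu, value, fu_status, reg_status, regfile):
    """Attempt writeback for the insn occupying `fu`; False means "wait"."""
    dest = fu_status[fu]["R"]
    if dest is not None:
        for other_fu, entry in fu_status.items():
            if entry is None or other_fu == fu or entry["issued"]:
                continue  # free entry, ourselves, or registers already read
            needs_old_r1 = entry["R1"] == dest and entry["T1"] is None
            needs_old_r2 = entry["R2"] == dest and entry["T2"] is None
            if needs_old_r1 or needs_old_r2:
                return False                 # WAR hazard: keep waiting
        regfile[dest] = value                # write value into regfile
        if reg_status.get(dest) == fu:
            reg_status[dest] = None          # clear Reg Status entry
    for entry in fu_status.values():         # wake up waiting insns
        if entry is not None:
            if entry["T1"] == fu:
                entry["T1"] = None
            if entry["T2"] == fu:
                entry["T2"] = None
    fu_status[fu] = None                     # free scoreboard entry
    return True


def entry(op, R, R1, R2, T1, T2, issued):
    return {"op": op, "R": R, "R1": R1, "R2": R2,
            "T1": T1, "T2": T2, "issued": issued}

# State resembling cycle 7 of the walkthrough: addi is done, stf has not issued.
fu_status = {
    "ALU": entry("addi", "r1", "r1", None, None, None, True),
    "LD":  entry("ldf", "f1", None, "r1", None, "ALU", False),
    "ST":  entry("stf", None, "f2", "r1", "FP1", None, False),
}
reg_status = {"f1": "LD", "f2": "FP1", "r1": "ALU"}
regfile = {"r1": 100}

print(try_writeback("ALU", 104, fu_status, reg_status, regfile))  # stf blocks it
fu_status["ST"]["issued"] = True   # stf has now read its registers
print(try_writeback("ALU", 104, fu_status, reg_status, regfile))  # proceeds
```

The second `ldf` does not block the write because its r1 source is tagged ALU: it is waiting for the new value, and the wake-up loop clears that tag.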

Scoreboard Pipeline
• New pipeline structure: F, D, S, X, W
• F (fetch)
  • Same as it ever was
• D (dispatch)
  • Structural or WAW hazard ? stall : allocate scoreboard entry
• S (issue)
  • RAW hazard ? wait : read registers, go to execute
• X (execute)
  • Execute operation, notify scoreboard when done
• W (writeback)
  • WAR hazard ? wait : write register, free scoreboard entry
  • W and RAW-dependent S in same cycle
  • W and structural-dependent D in same cycle
Poll: Which stages would register renaming directly reduce the number of stalls in?
EECS 470 Lecture 5 Slide 35

Scoreboard Data Structures
Insn Status
  Insn              D     S     X     W
  ldf X(r1),f1
  mulf f0,f1,f2
  stf f2,Z(r1)
  addi r1,4,r1
  ldf X(r1),f1
  mulf f0,f1,f2
  stf f2,Z(r1)
Reg Status
  Reg   f0    f1    f2    r1
  T
FU Status
  FU    busy  op    R     R1    R2    T1    T2
  ALU   no
  LD    no
  ST    no
  FP1   no
  FP2   no
EECS 470 Lecture 5 Slide 36

Scoreboard: Cycle 1
Insn Status
  Insn              D     S     X     W
  ldf X(r1),f1      c1
  mulf f0,f1,f2
  stf f2,Z(r1)
  addi r1,4,r1
  ldf X(r1),f1
  mulf f0,f1,f2
  stf f2,Z(r1)
Reg Status
  Reg   f0    f1    f2    r1
  T           LD
FU Status
  FU    busy  op    R     R1    R2    T1    T2
  ALU   no
  LD    yes   ldf   f1    -     r1    -     -      (allocate)
  ST    no
  FP1   no
  FP2   no
EECS 470 Lecture 5 Slide 37

Scoreboard: Cycle 2
Insn Status
  Insn              D     S     X     W
  ldf X(r1),f1      c1    c2
  mulf f0,f1,f2     c2
  stf f2,Z(r1)
  addi r1,4,r1
  ldf X(r1),f1
  mulf f0,f1,f2
  stf f2,Z(r1)
Reg Status
  Reg   f0    f1    f2    r1
  T           LD    FP1
FU Status
  FU    busy  op    R     R1    R2    T1    T2
  ALU   no
  LD    yes   ldf   f1    -     r1    -     -
  ST    no
  FP1   yes   mulf  f2    f0    f1    -     LD     (allocate)
  FP2   no
EECS 470 Lecture 5 Slide 38

Scoreboard: Cycle 3
Insn Status
  Insn              D     S     X     W
  ldf X(r1),f1      c1    c2    c3
  mulf f0,f1,f2     c2
  stf f2,Z(r1)      c3
  addi r1,4,r1
  ldf X(r1),f1
  mulf f0,f1,f2
  stf f2,Z(r1)
Reg Status
  Reg   f0    f1    f2    r1
  T           LD    FP1
FU Status
  FU    busy  op    R     R1    R2    T1    T2
  ALU   no
  LD    yes   ldf   f1    -     r1    -     -
  ST    yes   stf   -     f2    r1    FP1   -      (allocate)
  FP1   yes   mulf  f2    f0    f1    -     LD
  FP2   no
EECS 470 Lecture 5 Slide 39

Scoreboard: Cycle 4
Insn Status
  Insn              D     S     X     W
  ldf X(r1),f1      c1    c2    c3    c4
  mulf f0,f1,f2     c2    c4
  stf f2,Z(r1)      c3
  addi r1,4,r1      c4
  ldf X(r1),f1
  mulf f0,f1,f2
  stf f2,Z(r1)
Reg Status
  Reg   f0    f1    f2    r1
  T           -     FP1   ALU        (f1 written → clear LD tag)
FU Status
  FU    busy  op    R     R1    R2    T1    T2
  ALU   yes   addi  r1    r1    -     -     -      (allocate)
  LD    no                                         (free)
  ST    yes   stf   -     f2    r1    FP1   -
  FP1   yes   mulf  f2    f0    f1    -     -      (f1 (LD) is ready → issue mulf)
  FP2   no
EECS 470 Lecture 5 Slide 40

Scoreboard: Cycle 5
Insn Status
  Insn              D     S     X     W
  ldf X(r1),f1      c1    c2    c3    c4
  mulf f0,f1,f2     c2    c4    c5
  stf f2,Z(r1)      c3
  addi r1,4,r1      c4    c5
  ldf X(r1),f1      c5
  mulf f0,f1,f2
  stf f2,Z(r1)
Reg Status
  Reg   f0    f1    f2    r1
  T           LD    FP1   ALU
FU Status
  FU    busy  op    R     R1    R2    T1    T2
  ALU   yes   addi  r1    r1    -     -     -
  LD    yes   ldf   f1    -     r1    -     ALU    (allocate)
  ST    yes   stf   -     f2    r1    FP1   -
  FP1   yes   mulf  f2    f0    f1    -     -
  FP2   no
EECS 470 Lecture 5 Slide 41

Scoreboard: Cycle 6
Insn Status
  Insn              D     S     X     W
  ldf X(r1),f1      c1    c2    c3    c4
  mulf f0,f1,f2     c2    c4    c5+
  stf f2,Z(r1)      c3
  addi r1,4,r1      c4    c5    c6
  ldf X(r1),f1      c5
  mulf f0,f1,f2
  stf f2,Z(r1)
Reg Status
  Reg   f0    f1    f2    r1
  T           LD    FP1   ALU
FU Status
  FU    busy  op    R     R1    R2    T1    T2
  ALU   yes   addi  r1    r1    -     -     -
  LD    yes   ldf   f1    -     r1    -     ALU
  ST    yes   stf   -     f2    r1    FP1   -
  FP1   yes   mulf  f2    f0    f1    -     -
  FP2   no
• D stall (second mulf): WAW hazard w/ first mulf (f2)
  • How to tell? RegStatus[f2] non-empty
EECS 470 Lecture 5 Slide 42

Scoreboard: Cycle 7
Insn Status
  Insn              D     S     X     W
  ldf X(r1),f1      c1    c2    c3    c4
  mulf f0,f1,f2     c2    c4    c5+
  stf f2,Z(r1)      c3
  addi r1,4,r1      c4    c5    c6
  ldf X(r1),f1      c5
  mulf f0,f1,f2
  stf f2,Z(r1)
Reg Status
  Reg   f0    f1    f2    r1
  T           LD    FP1   ALU
FU Status
  FU    busy  op    R     R1    R2    T1    T2
  ALU   yes   addi  r1    r1    -     -     -
  LD    yes   ldf   f1    -     r1    -     ALU
  ST    yes   stf   -     f2    r1    FP1   -
  FP1   yes   mulf  f2    f0    f1    -     -
  FP2   no
• W wait (addi): WAR hazard w/ stf (r1)
  • How to tell? Untagged r1 in FuStatus
  • Requires CAM
EECS 470 Lecture 5 Slide 43

Scoreboard: Cycle 8
Insn Status
  Insn              D     S     X     W
  ldf X(r1),f1      c1    c2    c3    c4
  mulf f0,f1,f2     c2    c4    c5+   c8
  stf f2,Z(r1)      c3    c8
  addi r1,4,r1      c4    c5    c6
  ldf X(r1),f1      c5
  mulf f0,f1,f2     c8
  stf f2,Z(r1)
Reg Status
  Reg   f0    f1    f2    r1
  T           LD    FP2   ALU        (f2: FP1 cleared, then set to FP2)
FU Status
  FU    busy  op    R     R1    R2    T1    T2
  ALU   yes   addi  r1    r1    -     -     -      (W wait)
  LD    yes   ldf   f1    -     r1    -     ALU
  ST    yes   stf   -     f2    r1    -     -      (f2 (FP1) is ready → issue stf)
  FP1   no                                         (first mulf done → free)
  FP2   yes   mulf  f2    f0    f1    -     LD     (allocate)
EECS 470 Lecture 5 Slide 44

Scoreboard: Cycle 9
Insn Status
  Insn              D     S     X     W
  ldf X(r1),f1      c1    c2    c3    c4
  mulf f0,f1,f2     c2    c4    c5+   c8
  stf f2,Z(r1)      c3    c8    c9
  addi r1,4,r1      c4    c5    c6    c9
  ldf X(r1),f1      c5    c9
  mulf f0,f1,f2     c8
  stf f2,Z(r1)
Reg Status
  Reg   f0    f1    f2    r1
  T           LD    FP2   -          (r1 written → clear ALU tag)
FU Status
  FU    busy  op    R     R1    R2    T1    T2
  ALU   no                                         (free)
  LD    yes   ldf   f1    -     r1    -     -      (r1 (ALU) is ready → issue ldf)
  ST    yes   stf   -     f2    r1    -     -
  FP1   no
  FP2   yes   mulf  f2    f0    f1    -     LD
• D stall (second stf): structural hazard on FuStatus[ST]
EECS 470 Lecture 5 Slide 45

Scoreboard: Cycle 10
Insn Status
  Insn              D     S     X     W
  ldf X(r1),f1      c1    c2    c3    c4
  mulf f0,f1,f2     c2    c4    c5+   c8
  stf f2,Z(r1)      c3    c8    c9    c10
  addi r1,4,r1      c4    c5    c6    c9
  ldf X(r1),f1      c5    c9    c10
  mulf f0,f1,f2     c8
  stf f2,Z(r1)      c10
Reg Status
  Reg   f0    f1    f2    r1
  T           LD    FP2
FU Status
  FU    busy  op    R     R1    R2    T1    T2
  ALU   no
  LD    yes   ldf   f1    -     r1    -     -
  ST    yes   stf   -     f2    r1    FP2   -      (free, then allocate)
  FP1   no
  FP2   yes   mulf  f2    f0    f1    -     LD
• W & structural-dependent D in same cycle
EECS 470 Lecture 5 Slide 46

In-Order vs. Scoreboard

                    In-Order             Scoreboard
  Insn              D     X      W       D     S     X      W
  ldf X(r1),f1      c1    c2     c3      c1    c2    c3     c4
  mulf f0,f1,f2     c3    c4+    c7      c2    c4    c5+    c8
  stf f2,Z(r1)      c7    c8     c9      c3    c8    c9     c10
  addi r1,4,r1      c8    c9     c10     c4    c5    c6     c9
  ldf X(r1),f1      c10   c11    c12     c5    c9    c10    c11
  mulf f0,f1,f2     c12   c13+   c16     c8    c11   c12+   c15
  stf f2,Z(r1)      c16   c17    c18     c10   c15   c16    c17

• Big speedup?
  – Only 1 cycle advantage for scoreboard
• Why? addi WAR hazard
  • Scoreboard issued addi earlier (c8 → c5)
  • But WAR hazard delayed W until c9
  • Delayed issue of second iteration
EECS 470 Lecture 5 Slide 47

In-Order vs. Scoreboard II: Cache Miss

                    In-Order             Scoreboard
  Insn              D     X      W       D     S     X      W
  ldf X(r1),f1      c1    c2+    c7      c1    c2    c3+    c8
  mulf f0,f1,f2     c7    c8+    c11     c2    c8    c9+    c12
  stf f2,Z(r1)      c11   c12    c13     c3    c12   c13    c14
  addi r1,4,r1      c12   c13    c14     c4    c5    c6     c13
  ldf X(r1),f1      c14   c15    c16     c5    c13   c14    c15
  mulf f0,f1,f2     c16   c17+   c20     c6    c15   c16+   c19
  stf f2,Z(r1)      c20   c21    c22     c7    c19   c20    c21

• Assume
  • 5 cycle cache miss on first ldf
  • Ignore FUST structural hazards
– Little relative advantage
  • addi WAR hazard (c7 → c13) stalls second iteration
EECS 470 Lecture 5 Slide 48

Scoreboard Redux
• The good
  + Cheap hardware
    • InsnStatus + FuStatus + RegStatus ~ 1 FP unit in area
  + Pretty good performance
    • 1.7X for FORTRAN (scientific array) programs
• The less good
  – No bypassing
    • Is this a fundamental problem?
  – Limited scheduling scope
    • Structural/WAW hazards delay dispatch
  – Slow issue of truly-dependent (RAW) insns
    • WAR hazards delay writeback
    • Fix with hardware register renaming
EECS 470 Lecture 5 Slide 49

Next Time
• Add register renaming with Tomasulo's Algorithm
• Lingering questions / feedback? I'll include an anonymous form at the end of every lecture: https://bit.ly/3oXr4Ah
• That's all for today, let's take 5 and chat about the recent news for those who want to stay
  • We will be discussing sexual assault
EECS 470 Lecture 5 Slide 50

In the News • Professor Peter Chen has been arrested pending charges of (essentially) child rape • Professor Chen regularly teaches Eng 100, EECS 482 (Operating Systems), is the chief advisor for the CS-Engineering undergrad program, and was interim chair of the department multiple times • We don't know many of the details yet • He has been placed on administrative leave pending more information • My discussion here is not meant to confirm or deny these allegations • This is the most recent of multiple allegations of sexual misconduct by CS faculty in the past few years EECS 470 51 Lecture 5 Slide 51

Impact • However these events resolve, they can have a profoundly destructive impact on all of us • Survivors of sexual assault (or their loved ones) still feel trauma • Trust between faculty, staff and students is strained or broken • Take care of yourselves and each other • Feel free to reach out to me any time EECS 470 52 Lecture 5 Slide 52

Resources • CSE information on Reporting Misconduct (guidelines for reporting concerns and misconduct, including anonymously, includes a noncomplete list of “responsible employees" (i. e. mandatory reporters, must report any misconduct)) • Sexual Assault Prevention and Awareness Center (SAPAC) · (Support for survivors of sexual assault, 24/7 crisis line) • U-M Counseling and Psychological Services (CAPS) (Provides info on counseling, 24/7 crisis line) • Office of Institutional Equity • Other services: College of Engineering C. A. R. E. Center, Campus Mind Works, Depression Center, Services for Students with Disabilities, UHS, UM Psychiatric Emergency Services EECS 470 53 Lecture 5 Slide 53