EECS 470 Control Hazards and ILP Lecture 3


EECS 470 Control Hazards and ILP
Lecture 3 – Winter 2014
Slides developed in part by Profs. Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar, and Wenisch of Carnegie Mellon University, Purdue University, University of Michigan, University of Pennsylvania, and University of Wisconsin.

Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar, Wenisch

Announcements
• HW 1 due today
• Project 1 due Tuesday @ 9 pm
  – Note: you can submit multiple times; only the last submission counts.
• HW 2 posted

Getting help
Instructor
• Dr. Mark Brehob, brehob
  – Office hours (4632 Beyster unless noted):
    • Monday 10 am-noon
    • Tuesday 3:30 pm-5:00 – EECS 2334, which is the 373 lab; you'll need to grab me and let me know you have an EECS 470 question.
    • Thursday 10:30 am-noon
    • I’ll also be around after class for 10-15 minutes most days.
GSIs/IA
• Jonathan Beaumont, jbbeau
• William Cunningham, wcunning
  – Monday 5:00 pm-6:30 (Will)
  – Tuesday 5:00 pm-6:30 (Jon)
  – Wednesday 5:00 pm-6:30 (Will)
  – Thursday 5:00 pm-6:30 (Jon)
  – Friday 12:30 pm-2:30 (varies)
  – Saturday 2:00 pm-5:00 (Jon)
  – Sunday 6:00 pm-9:00 (Will)
Don’t forget Piazza. If you don’t have access, let us know ASAP.

Readings
– H&P Chapter C, 3.1, 3.4-3.5
  • We will read all of Chapter 3 at some point, so you may want to read it all now…
– Book note:
  • Book is on-line so you don’t need to buy it.
  • But exams are open book and open notes…
    – Won’t need the text, but…

Today
• Control hazards (again)
• Costs and Power
• Instruction Level Parallelism (ILP) and Dynamic Execution

Pipelining & Control Hazards

Handling Control Hazards
Avoidance (static)
– No branches?
– Convert branches to predication
  • Control dependence becomes data dependence
Detect and Stall (dynamic)
– Stop fetch until branch resolves
Speculate and squash (dynamic)
– Keep going past branch, throw away instructions if wrong

Avoidance Via Predication
Source code:
  if (a == b) { x++; y = n / d; }
With a branch:
  sub t1 ← a, b
  jnz t1, PC+2
  add x ← x, #1
  div y ← n, d
With predicated instructions:
  sub t1 ← a, b
  add(t1) x ← x, #1
  div(t1) y ← n, d
With conditional moves:
  sub t1 ← a, b
  add t2 ← x, #1
  div t3 ← n, d
  cmov(t1) x ← t2
  cmov(t1) y ← t3
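The conditional-move variant can be sketched in Python to show how the control dependence turns into a data dependence. This is only an illustration, not the course's code: the function names are hypothetical, integer division stands in for `div`, and Python's conditional expression stands in for the hardware select done by `cmov`.

```python
def branch_version(a, b, x, y, n, d):
    """Mirrors the jnz sequence: the add and div run only when a == b."""
    if a == b:
        x = x + 1
        y = n // d
    return x, y


def cmov_version(a, b, x, y, n, d):
    """Mirrors the cmov sequence: compute both results, then select."""
    t1 = a - b                  # sub t1 <- a, b (zero means a == b)
    t2 = x + 1                  # add t2 <- x, #1 (always executes)
    t3 = n // d                 # div t3 <- n, d (always executes, so d must be nonzero)
    x = t2 if t1 == 0 else x    # cmov(t1) x <- t2
    y = t3 if t1 == 0 else y    # cmov(t1) y <- t3
    return x, y
```

Both functions return the same `(x, y)` whenever `d` is nonzero. The sketch also exposes one cost of this conversion: the divide now runs even when `a != b`, so it does wasted work and can fault on `d == 0` in cases where the branch version would have skipped it.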

Handling Control Hazards: Detect & Stall
Detection
– In decode, check if opcode is branch or jump
Stall
– Hold next instruction in Fetch
– Pass noop to Decode

Handling Control Hazards: Speculate & Squash
Speculate “Not-Taken”
– Assume branch is not taken
Squash
– Overwrite opcodes in Fetch, Decode, Execute with noop
– Pass target to Fetch

Problems with Speculate & Squash
Always assumes branch is not taken
Can we do better? Yes.
– Predict branch direction and target!
– Why possible? Program behavior repeats.
More on branch prediction to come...
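A first-order CPI model makes the comparison between the schemes concrete. The sketch below is an assumption-laden illustration: the function names and all numeric inputs are hypothetical, the penalty is treated as a fixed number of lost cycles per stalled or mispredicted branch, and every other source of stalls is ignored.

```python
def cpi_detect_and_stall(base_cpi, branch_fraction, penalty_cycles):
    """Every branch pays the full penalty because fetch waits for it to resolve."""
    return base_cpi + branch_fraction * penalty_cycles


def cpi_speculate(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """Only mispredicted branches pay the penalty (squashed instructions)."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles
```

With made-up inputs of 20% branches and a 3-cycle penalty, stalling gives a CPI of about 1.6. Speculating not-taken when 60% of branches are taken (so 60% are mispredicted) gives about 1.36, and a predictor that is wrong 10% of the time gives about 1.06, which is the motivation for predicting direction and target.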

The cost of computing… $ and Watts

Digital System Cost
Cost is also a key design constraint
– Architecture is about trade-offs
– Cost plays a major role
Huge difference between Cost & Price
E.g.,
– Higher Price → Lower Volume → Higher Cost → Higher Price
– Direct Cost
– List vs. Selling Price
Price also depends on the customer
– College student vs. US Government
[Figure: price scale from $ to $$$$$$: Embedded, Portables, Desktops, Servers, Supercomputer]

Direct Cost
Cost distribution for a Personal Computer
– Processor board 37%
  • CPU, memory, …
– I/O devices 37%
  • Hard disk, DVD, monitor, …
– Software 20%
– Tower/cabinet 6%
Integrated systems account for a substantial fraction of cost

IC Cost Equation
IC cost = (Die cost + Test cost + Packaging cost) / Final test yield
Die cost = Wafer cost / (Dies/wafer × Die yield)
Die yield = f(defect density, die area)
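The cost equations translate directly into two small functions. This is a minimal sketch: the function names and the example numbers are hypothetical, and die yield is taken as an input because the slide only states that it is some function of defect density and die area.

```python
def die_cost(wafer_cost, dies_per_wafer, die_yield):
    """Die cost = wafer cost / (dies per wafer x die yield)."""
    return wafer_cost / (dies_per_wafer * die_yield)


def ic_cost(die, test, packaging, final_test_yield):
    """IC cost = (die + test + packaging cost) / final test yield."""
    return (die + test + packaging) / final_test_yield


# Made-up example: a $5000 wafer with 200 dies, half of which are good.
example_die = die_cost(5000, 200, 0.5)          # 50.0 dollars per good die
example_ic = ic_cost(example_die, 5, 5, 0.75)   # 80.0 dollars per shipped part
```

Both yields appear in a denominator, so the cost of every failed die or failed packaged part is spread over the good ones. Halving the die yield doubles the die cost.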

Why is power a problem in a μP?
• Power used by the μP, vs. system power
• Dissipating Heat
  – Melting (very bad)
  – Packaging (to cool, $)
  – Heat leads to poorer performance.
• Providing Power
  – Battery
  – Cost of electricity

Where does the juice go in laptops?
• Others have measured ~55% processor increase under max load in laptops [Hsu+Kremer, 2002]

Why is power a problem?
Why worry about power dissipation?
– Battery life
– Thermal issues: affect cooling, packaging, reliability, timing
– Environment

Power-Aware Computing Applications
– Temperature/Current-Constrained
– Energy-Constrained Computing

More Performance Optimization
Exploiting Instruction Level Parallelism

Limitations of Scalar Pipelines
Upper Bound on Scalar Pipeline Throughput
– Limited by IPC = 1 (“Flynn Bottleneck”)
Performance Lost Due to Rigid In-order Pipeline
– Unnecessary stalls

Terms
• Instruction parallelism
  – Number of instructions being worked on
• Operation Latency
  – The time (in cycles) until the result of an instruction is available for use as an operand in a subsequent instruction. For example, if the result of an Add instruction can be used as an operand of an instruction that is issued in the cycle after the Add is issued, we say that the Add has an operation latency of one.
• Peak IPC
  – The maximum sustainable number of instructions that can be executed per clock.
Source: Performance Modeling for Computer Architects, C. M. Krishna

Architectures for Exploiting Instruction-Level Parallelism

Superscalar Machine

What is the real problem?
CPI of in-order pipelines degrades very sharply if the machine parallelism is increased beyond a certain point,
i.e., when N×M approaches the average distance between dependent instructions
– Forwarding is no longer effective
– Pipeline may never be full due to frequent dependency stalls!

Missed Speedup in In-Order Pipelines

                 1   2   3   4   5   6   7   8   9   10
addf f0,f1,f2    F   D   E+  E+  E+  W
mulf f2,f3,f2        F   D   d*  d*  E*  E*  E*  W
subf f0,f1,f4            F   p*  p*  D   E+  E+  E+  W

What’s happening in cycle 4?
– mulf stalls due to RAW hazard
  • OK, this is a fundamental problem
– subf stalls due to pipeline hazard
  • Why? subf can’t proceed into D because mulf is there
  • That is the only reason, and it isn’t a fundamental one
Why can’t subf go into D in cycle 4 and E+ in cycle 5?

The Problem With In-Order Pipelines
[Figure: pipeline diagram showing I$, BP, regfile, and D$]
• In-order pipeline
  – Structural hazard: 1 insn register (latch) per stage
    • 1 instruction per stage per cycle (unless pipeline is replicated)
    • Younger instr. can’t “pass” older instr. without “clobbering” it
• Out-of-order pipeline
  – Implement “passing” functionality by removing structural hazard

New Pipeline Terminology
[Figure: pipeline diagram showing I$, BP, regfile, and D$]
• In-order pipeline
  – Often written as F, D, X, W (multi-cycle X includes M)
  – Variable latency
    • 1-cycle integer (including mem)
    • 3-cycle pipelined FP

ILP: Instruction-Level Parallelism
ILP is a measure of the amount of inter-dependencies between instructions
Average ILP = no. of instructions / no. of cycles required
code 1: ILP = 1, i.e., must execute serially
  r1 ← r2 + 1
  r3 ← r1 / 17
  r4 ← r0 - r3
code 2: ILP = 3, i.e., can execute at the same time
  r1 ← r2 + 1
  r3 ← r9 / 17
  r4 ← r0 - r10
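The average-ILP figures for code 1 and code 2 can be reproduced with a short dataflow calculation. The sketch below rests on assumptions that are not on the slide: every instruction takes one cycle, there are unlimited functional units, and only true (RAW) register dependencies delay an instruction. The function name and the tuple encoding are hypothetical.

```python
def average_ilp(insns):
    """insns: list of (dest, [source registers]) in program order."""
    ready_cycle = {}   # register -> cycle in which its latest value is produced
    last_cycle = 0
    for dest, sources in insns:
        # An instruction executes one cycle after its latest producer (RAW only).
        cycle = 1 + max((ready_cycle.get(s, 0) for s in sources), default=0)
        ready_cycle[dest] = cycle
        last_cycle = max(last_cycle, cycle)
    return len(insns) / last_cycle


code1 = [("r1", ["r2"]), ("r3", ["r1"]), ("r4", ["r0", "r3"])]
code2 = [("r1", ["r2"]), ("r3", ["r9"]), ("r4", ["r0", "r10"])]
```

Here `average_ilp(code1)` is 1.0 because the three instructions form a single dependence chain, while `average_ilp(code2)` is 3.0 because none of them reads a value produced by another.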

Purported Limits on ILP
Studies: Weiss and Smith [1984]; Sohi and Vajapeyam [1987]; Tjaden and Flynn [1970]; Tjaden and Flynn [1973]; Uht [1986]; Smith et al. [1989]; Jouppi and Wall [1988]; Johnson [1991]; Acosta et al. [1986]; Wedig [1982]; Butler et al. [1991]; Melvin and Patt [1991]; Wall [1991]; Kuck et al. [1972]; Riseman and Foster [1972]; Nicolau and Fisher [1984]
Reported ILP limits, in ascending order: 1.58, 1.81, 1.86, 1.96, 2.00, 2.40, 2.50, 2.79, 3.00, 5.8, 6, 7, 8, 51, 90

Scope of ILP Analysis
  r1 ← r2 + 1
  r3 ← r1 / 17
  r4 ← r0 - r3
  r11 ← r12 + 1
  r13 ← r11 / 17
  r14 ← r13 - r20
– First three instructions alone: ILP = 1
– All six instructions together: ILP = 2
Out-of-order execution exposes more ILP

How Large Must the “Window” Be?

Dynamic Scheduling – OoO Execution
• Dynamic scheduling
  – Totally in the hardware
  – Also called “out-of-order execution” (OoO)
• Fetch many instructions into instruction window
  – Use branch prediction to speculate past (multiple) branches
  – Flush pipeline on branch misprediction
• Rename to avoid false dependencies (WAW and WAR)
• Execute instructions as soon as possible
  – Register dependencies are known
  – Handling memory dependencies is more tricky (much more later)
• Commit instructions in order
  – If anything strange happens before commit, just flush the pipeline
• Current machines: 100+ instruction scheduling window

Motivation for Dynamic Scheduling
• Dynamic scheduling (out-of-order execution)
  – Execute instructions in non-sequential order…
    + Reduce RAW stalls
    + Increase pipeline and functional unit (FU) utilization
      – Original motivation was to increase FP unit utilization
    + Expose more opportunities for parallel issue (ILP)
      – Not in-order → can be in parallel
  – …but make it appear like sequential execution
    • Important
      – But difficult
    • Next few lectures

Dynamic Scheduling: The Big Picture
[Figure: pipeline with I$, BP, D, insn buffer, S, regfile, and D$, plus a Ready Table with one entry per physical register P2–P7]
Renamed instructions in the insn buffer:
  add p2, p3, p4
  sub p2, p4, p5
  mul p2, p5, p6
  div p4, 4, p7
• Instructions are fetched/decoded/renamed into the Instruction Buffer
  – Also called “instruction window” or “instruction scheduler”
• Instructions (conceptually) check ready bits every cycle
  – Execute when ready
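The "check ready bits every cycle, execute when ready" idea can be sketched as a small wakeup-and-select loop. This is a simplified model under stated assumptions, not the hardware design covered later in the course: every instruction has a one-cycle latency, selection is oldest-first, the issue width of 2 is arbitrary, and p2 and p3 are assumed ready at the start. The function name and tuple layout are hypothetical.

```python
def schedule(insn_buffer, ready, issue_width=2):
    """Return, per cycle, the ops issued from a buffer of renamed instructions.

    insn_buffer: list of (op, source_registers, dest_register) in program order.
    ready: physical registers whose values are already available.
    """
    waiting = list(insn_buffer)
    ready = set(ready)
    timeline = []
    while waiting:
        # Select: oldest-first among instructions whose sources are all ready.
        issued = [i for i in waiting if all(s in ready for s in i[1])][:issue_width]
        if not issued:
            raise ValueError("remaining instructions can never become ready")
        for insn in issued:
            waiting.remove(insn)
        # Wakeup: destinations become ready for the following cycle.
        ready.update(insn[2] for insn in issued)
        timeline.append([insn[0] for insn in issued])
    return timeline


buffer = [
    ("add", ("p2", "p3"), "p4"),
    ("sub", ("p2", "p4"), "p5"),
    ("mul", ("p2", "p5"), "p6"),
    ("div", ("p4",), "p7"),      # the immediate operand 4 needs no ready bit
]
```

Calling `schedule(buffer, {"p2", "p3"})` issues `add` first, then `sub` and `div` together, then `mul`: the `div` passes the older `mul` because its only register source, p4, is ready a cycle earlier than p5.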

Going Forward: What’s Next
• We’ll build this up in steps over the next few weeks
  – Register renaming to eliminate “false” dependencies
  – “Tomasulo’s algorithm” to implement OoO execution
  – Handling precise state and speculation
  – Handling memory dependencies
• Let’s get started!

Dependency vs. Hazard
• A dependency exists independent of the hardware.
  – So if Inst #1’s result is needed for Inst #1000 there is a dependency
  – It is only a hazard if the hardware has to deal with it.
    • So in our pipelined machine we only worried if there wasn’t a “buffer” of two instructions between the dependent instructions.

True Data Dependencies
• True data dependency
  – RAW – Read after Write
      R1=R2+R3
      R4=R1+R5
• True dependencies prevent reordering
  – (Mostly) unavoidable

False Data Dependencies
• False or Name dependencies
  – WAW – Write after Write
      R1=R2+R3
      R1=R4+R5
  – WAR – Write after Read
      R2=R1+R3
      R1=R4+R5
• False dependencies prevent reordering
  – Can they be eliminated? (Yes, with renaming!)

Data Dependency Graph: Simple example
  R1=MEM[R2+0]    // A
  R2=R2+4         // B
  R3=R1+R4        // C
  MEM[R2+0]=R3    // D
[Figure: dependency graph over A–D; edge types: RAW, WAR]
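The edges of such a graph can be found mechanically by comparing the registers each pair of instructions reads and writes. The following is a minimal sketch with a hypothetical function name and encoding. It considers register dependencies only (memory dependencies are ignored) and drops a pair when an intervening instruction rewrites the register in question.

```python
def find_dependencies(insns):
    """insns: list of (label, dest_or_None, [source registers]) in program order.

    Returns (kind, earlier, later, register) tuples.
    """
    deps = []
    for j, (later, dest_j, srcs_j) in enumerate(insns):
        for i in range(j):
            earlier, dest_i, srcs_i = insns[i]

            def overwritten(reg):
                # True if some instruction strictly between i and j writes reg.
                return any(insns[k][1] == reg for k in range(i + 1, j))

            if dest_i is not None and dest_i in srcs_j and not overwritten(dest_i):
                deps.append(("RAW", earlier, later, dest_i))
            if dest_j is not None and dest_j in srcs_i and not overwritten(dest_j):
                deps.append(("WAR", earlier, later, dest_j))
            if dest_i is not None and dest_i == dest_j and not overwritten(dest_i):
                deps.append(("WAW", earlier, later, dest_i))
    return deps


simple = [
    ("A", "R1", ["R2"]),         # R1=MEM[R2+0]
    ("B", "R2", ["R2"]),         # R2=R2+4
    ("C", "R3", ["R1", "R4"]),   # R3=R1+R4
    ("D", None, ["R2", "R3"]),   # MEM[R2+0]=R3 (a store writes no register)
]
```

For the four-instruction example, `find_dependencies(simple)` reports one WAR (A then B, on R2) and three RAWs (A to C on R1, B to D on R2, C to D on R3).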

Data Dependency Graph: More complex example
  R1=MEM[R3+4]     // A
  R2=MEM[R3+8]     // B
  R1=R1*R2         // C
  MEM[R3+4]=R1     // D
  MEM[R3+8]=R1     // E
  R1=MEM[R3+12]    // F
  R2=MEM[R3+16]    // G
  R1=R1*R2         // H
  MEM[R3+12]=R1    // I
  MEM[R3+16]=R1    // J
[Figure: dependency graph over A–J; edge types: RAW, WAR]

Eliminating False Dependencies
  R1=MEM[R3+4]     // A
  R2=MEM[R3+8]     // B
  R1=R1*R2         // C
  MEM[R3+4]=R1     // D
  MEM[R3+8]=R1     // E
  R1=MEM[R3+12]    // F
  R2=MEM[R3+16]    // G
  R1=R1*R2         // H
  MEM[R3+12]=R1    // I
  MEM[R3+16]=R1    // J
• Well, logically there is no reason for F-J to be dependent on A-E. So…
  • ABFG
  • CH
  • DEIJ
  – Should be possible.
• But that would cause either C or H to have the wrong reg inputs
• How do we fix this?
  – Remember, the dependency is really on the name of the register
  – So… change the register names!

Register Renaming Concept
– The register names are arbitrary
– The register name only needs to be consistent between writes.
    R1 = ….
    …. = R1 ….
    …. = …
    R1 = ….
The value in R1 is “alive” from when the value is written until the last read of that value.

So after renaming, what happens to the dependencies?
  P1=MEM[R3+4]     // A
  P2=MEM[R3+8]     // B
  P3=P1*P2         // C
  MEM[R3+4]=P3     // D
  MEM[R3+8]=P3     // E
  P4=MEM[R3+12]    // F
  P5=MEM[R3+16]    // G
  P6=P4*P5         // H
  MEM[R3+12]=P6    // I
  MEM[R3+16]=P6    // J
[Figure: dependency graph over A–J; edge types: RAW, WAR]

Register Renaming Approach
• Every time an architected register is written we assign it to a physical register
  – Until the architected register is written again, we continue to translate it to the physical register number
  – Leaves RAW dependencies intact
• It is really simple, let’s look at an example:
  – Names: r1, r2, r3
  – Locations: p1, p2, p3, p4, p5, p6, p7
  – Original mapping: r1→p1, r2→p2, r3→p3; p4–p7 are “free”

MapTable (r1 r2 r3)   FreeList          Orig. insn        Renamed insn
p1 p2 p3              p4, p5, p6, p7    add r2, r3, r1    add p2, p3, p4
p4 p2 p3              p5, p6, p7        sub r2, r1, r3    sub p2, p4, p5
p4 p2 p5              p6, p7            mul r2, r3, r3    mul p2, p5, p6
p4 p2 p6              p7                div r1, 4, r1     div p4, 4, p7

Each row shows the MapTable and FreeList before the instruction on that row is renamed.
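The map-table and free-list mechanics can be sketched in a few lines of Python. This is an illustrative model, not the course's implementation: the function name and tuple layout are hypothetical, the destination is listed last as on the slide, and the destination of `mul` is taken to be r3, which is inferred from the renamed form `mul p2, p5, p6`. Physical registers are never returned to the free list here, so the sketch says nothing about when a register can be freed.

```python
def rename(insns, map_table, free_list):
    """Rename architected registers to physical ones.

    insns: list of (op, [sources], dest) using architected names.
    Returns (renamed instructions, final map table, remaining free list).
    """
    map_table = dict(map_table)
    free_list = list(free_list)
    renamed = []
    for op, sources, dest in insns:
        # Sources read the current mapping; immediates pass through unchanged.
        new_sources = [map_table.get(s, s) for s in sources]
        # Each write gets a fresh physical register (IndexError if none is free).
        new_dest = free_list.pop(0)
        map_table[dest] = new_dest
        renamed.append((op, new_sources, new_dest))
    return renamed, map_table, free_list


program = [
    ("add", ["r2", "r3"], "r1"),
    ("sub", ["r2", "r1"], "r3"),
    ("mul", ["r2", "r3"], "r3"),   # destination r3 is an inferred assumption
    ("div", ["r1", 4], "r1"),
]
renamed, final_map, remaining = rename(
    program, {"r1": "p1", "r2": "p2", "r3": "p3"}, ["p4", "p5", "p6", "p7"]
)
```

The renamed sequence reads `add p2, p3, p4`, `sub p2, p4, p5`, `mul p2, p5, p6`, `div p4, 4, p7`. Because sources are translated before the destination is remapped, every consumer still names the register its producer wrote, so RAW dependencies survive while the reuse of r1 and r3 no longer creates WAW or WAR conflicts.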

Renaming example: architected code and renamed code
  R1=MEM[R3+4]     // A    P1=MEM[R3+4]
  R2=MEM[R3+8]     // B    P2=MEM[R3+8]
  R1=R1*R2         // C    P3=P1*P2
  MEM[R3+4]=R1     // D    MEM[R3+4]=P3
  MEM[R3+8]=R1     // E    MEM[R3+8]=P3
  R1=MEM[R3+12]    // F    P4=MEM[R3+12]
  R2=MEM[R3+16]    // G    P5=MEM[R3+16]
  R1=R1*R2         // H    P6=P4*P5
  MEM[R3+12]=R1    // I    MEM[R3+12]=P6
  MEM[R3+16]=R1    // J    MEM[R3+16]=P6

Arch   V?   Physical
1      1
2      1
3      1

Terminology
• There are a lot of terms and ideas in out-of-order processors. And because a lot of the work was done in parallel, there isn’t a standard set of names for things.
  – Here we’ve called the table that maps the architected register to a physical register the “map table”. That is probably the […]
  – I generally use Intel’s term “Register Alias Table” or RAT.
  – “Rename table” isn’t an uncommon term for it either.
• I try to use a mix of terminology in this class so that you can understand others when they are describing something…
  – It’s not as bad as it sounds, but it is annoying at first.

Register Renaming Hardware
• Really simple table (rename table)
  – Every time an instruction which writes a register is encountered, assign it a new physical register number
• But there is some complexity
  – When do you free physical registers?