Lecture 5: Overview of Superscalar Techniques
CprE 581 Computer Systems Architecture, Fall 2009
Zhao Zhang
Reading: Textbook, Ch. 2.1; “Complexity-Effective Superscalar Processors”, Ph.D. thesis by Subbarao Palacharla, Ch. 1

Sequential Execution Model
• Any program execution is “correct” if the final architectural state (registers and memory contents) is the same as that produced by sequential execution
• A single-cycle implementation is intuitively correct
• If instructions are not executed sequentially, what is “correct” execution?

Out-of-order Execution
Compared with a sequential execution, an out-of-order execution may
1. Fetch and execute instructions that should not be executed
2. Execute instructions in a different order

Sequential Execution Model
A program execution is correct if
1. The same set of instructions writes to user-visible registers and memory;
2. Each instruction receives the same operands as in the sequential execution; and
3. Any register or memory word receives the value of the last write as in the sequential execution

Dependences and Correctness
Three types of dependences between instructions:
• Control dependence
• Data dependence
• Name dependence
Why do we care about dependences?
• Processor hardware can observe those dependences
• By correctly handling the dependences, the three correctness conditions will hold

Data Dependence
    LD    F2, 0(R3)
    MULTI F0, F2, F4
    LD    F6, 0(R2)
    SUBD  F8, F6, F2
    DIVD  F10, F6
    ADD   F12, F8, F2
[Figure: nodes LD1, LD2, MULTI, SUBD, DIVD, ADD]
Note: no branch in this code

Data Dependences
Instruction J is dependent on I if
• I’s output is used by J, or
• J is dependent on K, and K is dependent on I
    Loop: L.D    F0, 0(R1)
          ADD.D  F4, F0, F2
          S.D    F4, 0(R1)
          DADDUI R1, #-8
          BNE    R1, R2, LOOP
[Figure: Data Dependence Graph with nodes L.D, ADD.D, S.D, DADDUI, BNE]
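The two-part definition above (direct use of a result, plus transitivity) can be sketched in a few lines of Python. This is only an illustration for one iteration of the loop. The tuple encoding and the function names are hypothetical, and the sketch assumes DADDUI has the form DADDUI R1, R1, #-8, since the slide text shows only two operands.

```python
from itertools import product

# One loop iteration: (name, destination registers, source registers).
# Assumption: DADDUI both reads and writes R1 (DADDUI R1, R1, #-8).
LOOP = [
    ("L.D",    ["F0"], ["R1"]),
    ("ADD.D",  ["F4"], ["F0", "F2"]),
    ("S.D",    [],     ["F4", "R1"]),
    ("DADDUI", ["R1"], ["R1"]),
    ("BNE",    [],     ["R1", "R2"]),
]

def direct_raw(instrs):
    """Return edges (i, j) where j reads a register last written by i."""
    last_writer = {}
    edges = set()
    for j, (_, dests, srcs) in enumerate(instrs):
        for reg in srcs:
            if reg in last_writer:
                edges.add((last_writer[reg], j))
        for reg in dests:
            last_writer[reg] = j
    return edges

def transitive_closure(edges, n):
    """Add (i, j) whenever i -> k and k -> j already hold (Warshall's algorithm)."""
    closure = set(edges)
    for k, i, j in product(range(n), repeat=3):  # k is the outermost index
        if (i, k) in closure and (k, j) in closure:
            closure.add((i, j))
    return closure

direct = direct_raw(LOOP)
closure = transitive_closure(direct, len(LOOP))
```

Under these assumptions, `direct` holds the edges L.D to ADD.D, ADD.D to S.D, and DADDUI to BNE, and `closure` adds the indirect edge L.D to S.D.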

Data Dependences
Dependence through registers:
    ADD r8, r9, r10
    BEQ r8, r11, loop
[Figure labels: regfile; Load, ALU, Br, Store]
Dependence through memory:
    SW r8, 100(r9)
    LW r10, 100(r9)
[Figure labels: SW, Memory, LW]

Name Dependences
Antidependence (WAR): one instruction overwrites a register or memory location that a prior instruction reads
    LW  R1, 100(R2)
    ADD R2, R3, R4
Output dependence (WAW): two instructions write the same register or memory location
    LW  R1, 100(R2)
    ADD R2, R1, R2
    ADD R1, R3, R4
Those dependences can be removed
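As a rough illustration of the two definitions above, the following Python sketch lists every WAR and WAW pair in the three-instruction example. The tuple encoding and the function name `name_dependences` are hypothetical, and only register dependences are considered.

```python
# (name, destination registers, source registers) for the WAW example above.
CODE = [
    ("LW",  ["R1"], ["R2"]),
    ("ADD", ["R2"], ["R1", "R2"]),
    ("ADD", ["R1"], ["R3", "R4"]),
]

def name_dependences(instrs):
    """Return (war, waw) sets of (earlier index, later index, register)."""
    war, waw = set(), set()
    for j, (_, dests_j, _) in enumerate(instrs):
        for i in range(j):
            _, dests_i, srcs_i = instrs[i]
            for reg in dests_j:
                if reg in srcs_i:
                    war.add((i, j, reg))   # j overwrites what i reads
                if reg in dests_i:
                    waw.add((i, j, reg))   # i and j write the same register
    return war, waw

war, waw = name_dependences(CODE)
```

Here `waw` contains the R1 pair between the LW and the last ADD, and `war` contains the R2 pair (LW, first ADD) and the R1 pair (first ADD, last ADD).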

Dependences vs Hazards
• Dependences are properties of programs
• Hazards are properties of pipelines
• Dependences indicate the potential for hazards
• Pipeline implementations determine the actual hazards and the length of any stall
• What hazards are exposed by the MIPS 5-stage pipeline?

Dynamic Scheduling
General idea: when an instruction stalls, look for independent instructions following it
    DIV.D F0, F2, F4
    ADD.D F10, F8
    SUB.D F12, F8, F14
[Figure labels: DIV.D, SUB.D, ADD.D]
• Instruction window: how far to look ahead
• Out-of-order execution
• Respect data dependences
• What hazards would be exposed?
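The idea of looking past a stalled instruction can be sketched as a toy scheduler. Everything here is an assumption made for illustration: the latencies are invented, one instruction starts per cycle, only RAW dependences are tracked, and ADD.D is assumed to also read F0 (the slide text lists only two operands for it), which is what makes it wait for DIV.D.

```python
# (name, destination register, source registers); ADD.D reading F0 is an assumption.
PROGRAM = [
    ("DIV.D", "F0",  ["F2", "F4"]),
    ("ADD.D", "F10", ["F0", "F8"]),
    ("SUB.D", "F12", ["F8", "F14"]),
]
LATENCY = {"DIV.D": 10, "ADD.D": 2, "SUB.D": 2}  # hypothetical FU delays in cycles

def producers(instrs):
    """For each instruction, the indices of earlier instructions it reads from."""
    last, deps = {}, []
    for idx, (_, dest, srcs) in enumerate(instrs):
        deps.append({last[r] for r in srcs if r in last})
        last[dest] = idx
    return deps

def schedule(instrs, latency, window):
    """Return {index: start cycle}, picking the oldest ready instruction in the window."""
    deps = producers(instrs)
    start, finish = {}, {}
    cycle = 1
    while len(start) < len(instrs):
        waiting = [i for i in range(len(instrs)) if i not in start][:window]
        for i in waiting:
            if all(p in finish and finish[p] < cycle for p in deps[i]):
                start[i] = cycle
                finish[i] = cycle + latency[instrs[i][0]] - 1
                break  # at most one instruction starts per cycle
        cycle += 1
    return start

out_of_order = schedule(PROGRAM, LATENCY, window=3)
in_order = schedule(PROGRAM, LATENCY, window=1)
```

With a window of 3, SUB.D starts in cycle 2 while ADD.D waits for DIV.D until cycle 11. With a window of 1 the scheduler degenerates to in-order issue and SUB.D is held back until cycle 12.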

Data Dependence between Operations
ALU to ALU
    SUBD F8, F6, F2
    ADD  F6, F8, F2

    SUBD   IF  ID  EX  WB
    ADD        IF  ID  -   EX  WB
Load and other insts
    LD    F2, 0(R3)
    MULTI F0, F2, F4

    LD     IF  ID  EX  MEM WB
    MULTI      IF  ID  -   -   EX  WB

Dependences between Operations
Store to load
    S.D F6, 100(R3)    // R3+100 == R4?
    L.D F2, 0(R4)

    S.D   IF  ID  EX  MEM WB
    L.D       IF  ID  EX  -?  MEM
• Register dependences can be detected by matching register indices
• Detecting memory dependence is more difficult
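The difficulty with the store-to-load case is that the register names say nothing by themselves: the dependence exists only if the two effective addresses overlap at run time. A minimal sketch, assuming 8-byte accesses and a hypothetical `regs` dictionary of current register values:

```python
def effective_address(regs, base, offset):
    """Address computed in EX: base register value plus immediate offset."""
    return regs[base] + offset

def may_alias(regs, store, load, size=8):
    """store and load are (base register, offset); size is the access width in bytes."""
    a = effective_address(regs, *store)
    b = effective_address(regs, *load)
    return abs(a - b) < size  # the two byte ranges overlap

STORE = ("R3", 100)  # S.D F6, 100(R3)
LOAD = ("R4", 0)     # L.D F2, 0(R4)

same = may_alias({"R3": 1000, "R4": 1100}, STORE, LOAD)
different = may_alias({"R3": 1000, "R4": 2000}, STORE, LOAD)
```

The same instruction pair is dependent when R4 equals R3+100 and independent otherwise, which is why the check cannot be done by comparing register indices at decode time.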

Dynamic Scheduling
    L.D   F2, 0(R3)
    MULTI F0, F2, F4
    L.D   F6, 0(R2)
    SUB.D F8, F6, F2
    DIV.D F10, F6
    ADD.D F12, F8, F2
[Figure: nodes LD1, LD2, MULTI, SUBD, DIVD, ADD]
How to schedule pipeline operations?

Is This Working?
    Inst    IF  ID  Schd   EXE     MEM  WB
    L.D     1   2   3      4       5    6
    MULT    1   2   3-5    6-11    -    12
    L.D     2   3   4      5       6    7
    SUB.D   2   3   4-6    7-8     9    10
    DIV.D   3   4   5-11   12-31   32   33
    ADD.D   3   4   5-8    9-10    11   12
Assume (1) two-way issue; (2) FU delay as implied

Dynamic Scheduling Implementation
[Figure labels: Wakeup; I1, I2, I3, …, I_k; SELECT; To FUs]
Adapted from UCB CS252 S98, Copyright 1998 UCB
• Scoreboarding: 1966, in the CDC 6600
• Tomasulo: three years later, in the IBM 360/91
  • Introduced register renaming
  • Uses tag-based instruction wakeup
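Tag-based wakeup followed by select can be sketched as follows. This is a simplified software model, not the hardware structure from the slide: the `Entry` class, the tag names, and the oldest-first selection policy are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    name: str
    dest_tag: str
    waiting: set  # source tags whose values have not been produced yet

def wakeup(queue, tag):
    """Broadcast a result tag; every entry waiting on it clears that operand."""
    for entry in queue:
        entry.waiting.discard(tag)

def select(queue, width):
    """Pick up to `width` ready entries, oldest first, and remove them from the queue."""
    ready = [e for e in queue if not e.waiting][:width]
    for entry in ready:
        queue.remove(entry)
    return ready

queue = [
    Entry("MULT", "P1", {"P0"}),
    Entry("SUB",  "P2", {"P0", "P3"}),
    Entry("ADD",  "P4", set()),
]
first = [e.name for e in select(queue, width=2)]
wakeup(queue, "P0")
second = [e.name for e in select(queue, width=2)]
wakeup(queue, "P3")
third = [e.name for e in select(queue, width=2)]
```

In this run only ADD is ready at first; broadcasting P0 wakes MULT, and SUB becomes ready only after P3 is also broadcast.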

Name Dependences and Register Renaming
Original code:
    ADD R3, R1, R2
    SUB R4, R4, R3
    ADD R3, R6, R7
    SUB R3, R5, R4
What prevents parallelism?
Renamed code: R3, R4, R3, R3 are renamed to P6, P7, P8, P9 in order
    ADD P6, R1, R2
    SUB P7, R4, P6
    ADD P8, R6, R7
    SUB P9, R5, P7
Finally R3 <= P9, R4 <= P7
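The renaming step can be reproduced with a small map-table sketch. The function name `rename` and the free-list representation are hypothetical. The source operands of the two SUB instructions are taken from the renamed code on the slide (SUB P7, R4, P6 and SUB P9, R5, P7), since the original listing shows them with only two operands.

```python
# (opcode, destination, sources); SUB operands inferred from the renamed code.
ORIGINAL = [
    ("ADD", "R3", ["R1", "R2"]),
    ("SUB", "R4", ["R4", "R3"]),
    ("ADD", "R3", ["R6", "R7"]),
    ("SUB", "R3", ["R5", "R4"]),
]

def rename(instrs, free_regs):
    """Give every destination a fresh physical register and redirect later readers."""
    mapping = {}             # architectural register -> latest physical register
    free = list(free_regs)
    renamed = []
    for op, dest, srcs in instrs:
        # Sources are looked up before the destination mapping is updated.
        new_srcs = [mapping.get(r, r) for r in srcs]
        phys = free.pop(0)
        mapping[dest] = phys
        renamed.append((op, phys, new_srcs))
    return renamed, mapping

renamed, final_map = rename(ORIGINAL, ["P6", "P7", "P8", "P9"])
```

After renaming, only the true (RAW) dependences through P6 and P7 remain, and the final map matches the slide: R3 maps to P9 and R4 maps to P7.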

Register Renaming and Correctness
1. The same set of instructions writes to user-visible registers and memory;
2. Each instruction receives the same operands as in the sequential execution; and
3. Any register or memory word receives the value of the last write as in the sequential execution

Renaming Implementation
• First proposed by Tomasulo (1969)
  • Uses a register status table
  • Registers are renamed to reservation stations
• In other processors (e.g. Alpha 21264, Intel P4)
  • Use a register mapping table
  • No separate architectural/physical registers; no copy-back
• In P-III
  • Uses a register alias table
  • Architectural registers are renamed to physical registers
  • Data is copied back to the architectural register
[Figure: Rd, Rs, Rt -> Renaming -> Pd, Ps, Pt]

Branch Prediction and Speculative Execution
Modern processors must speculate!
• Branch prediction: SPEC2k INT has one branch per seven instructions!
• Precise interrupt
• Memory disambiguation
• More performance-oriented speculations
Two distinct but connected issues:
1. How to make the best prediction
2. What to do when the speculation is wrong

Branch Prediction and Speculative Execution
Review the three conditions of correctness:
1. The processor commits the same set of instructions as executed in a sequential processor
2. Any committed instruction receives the same operands (from its parents) as in the sequential execution
3. Any register or memory word receives the value of the last write from the committed instructions, as in the sequential execution
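The first condition (commit the same set of instructions as a sequential processor) is commonly enforced with in-order commit from a reorder buffer. The sketch below is a loose illustration under that assumption. The dictionary fields `done` and `mispredicted` and the function name are hypothetical, and recovery is reduced to discarding every younger entry.

```python
from collections import deque

def commit_in_order(rob):
    """Commit finished instructions from the head; squash everything after a mispredicted branch."""
    committed = []
    while rob and rob[0]["done"]:
        entry = rob.popleft()
        committed.append(entry["name"])
        if entry.get("mispredicted"):
            rob.clear()  # younger entries came from the wrong path
    return committed

rob = deque([
    {"name": "ADD", "done": True},
    {"name": "BEQ", "done": True, "mispredicted": True},
    {"name": "SUB", "done": True},   # wrong-path instruction, already executed
    {"name": "LW",  "done": False},  # wrong-path instruction, still in flight
])
committed = commit_in_order(rob)
```

SUB has finished executing but never commits, so its result never reaches the architectural state.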

Branch Prediction and Speculative Execution
Branch prediction: control speculation
• Must predict on branches
• What to predict
  • Branch direction
  • Branch target address
• What info can be used
  • PC value
  • Previous branch outcomes
  • Complex branch predictors also use branch patterns
• What building blocks are needed
  • Branch history table (BHT), branch target buffer (BTB), pattern registers, and some logic
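A minimal sketch of how the BHT and BTB could work together, indexed by the PC value. The 2-bit saturating counters, the table size, and the class and method names are assumptions chosen for illustration, and the slide does not specify any of them. Pattern registers are left out.

```python
class BranchPredictor:
    def __init__(self, entries=1024):
        self.entries = entries
        self.bht = [1] * entries  # assumed 2-bit counters, initialized weakly not-taken
        self.btb = {}             # pc -> most recent taken target

    def _index(self, pc):
        return (pc >> 2) % self.entries  # drop the byte offset of 4-byte instructions

    def predict(self, pc):
        """Return (predicted direction, predicted target or None)."""
        taken = self.bht[self._index(pc)] >= 2
        return taken, (self.btb.get(pc) if taken else None)

    def update(self, pc, taken, target):
        """Train with the resolved branch outcome."""
        i = self._index(pc)
        if taken:
            self.bht[i] = min(3, self.bht[i] + 1)
            self.btb[pc] = target
        else:
            self.bht[i] = max(0, self.bht[i] - 1)

bp = BranchPredictor()
before = bp.predict(0x40)
bp.update(0x40, taken=True, target=0x10)
after = bp.predict(0x40)
```

The predictor starts out predicting not-taken for the branch at 0x40 and switches to taken, with target 0x10, after one taken outcome.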

Generic Superscalar Processor Models
[Figure: issue queue based model. Labels: Fetch, Rename, Wakeup/select, Regfile, bypass, FU, FU, D-cache; phases: schedule, execute, commit]
[Figure: reservation based model. Labels: Fetch, Rename, Reg, ROB, Wakeup/select, bypass, FU, FU, D-cache; phases: schedule, execute, commit]
Revised from Palacharla, Ph.D. thesis, 1998