CS 152 Computer Architecture and Engineering Lecture 13

Introduction to Pipelining: Datapath and Control
March 3rd, 2004
John Kubiatowicz (www.cs.berkeley.edu/~kubitron)
Lecture slides: http://inst.eecs.berkeley.edu/~cs152/
3/17/03 ©UCB Spring 2003 CS 152 / Kubiatowicz Lec 13.1

Recap: Exceptions
[Figure: an exception interrupts the user program's normal control flow and transfers to the System Exception Handler; "return from exception" resumes the user program]
° Normal control flow: sequential, jumps, branches, calls, returns
° Exception = unprogrammed control transfer
  • system takes action to handle the exception
    - must record the address of the offending instruction
    - record any other information necessary to return afterwards
  • returns control to user
  • must save & restore user state
° Allows construction of a “user virtual machine”

Recap: Precise Exceptions
° Precise state of the machine is preserved as if program executed up to the offending instruction
  • All previous instructions completed
  • Offending instruction and all following instructions act as if they have not even started
  • Same system code will work on different implementations
  • Position clearly established by IBM
  • Difficult in the presence of pipelining, out-of-order execution, ...
  • MIPS takes this position
° Imprecise => system software has to figure out what is where and put it all back together
° Performance goals often lead designers to forsake precise interrupts
  • system software developers, users, markets, etc. usually wish they had not done this
° Modern techniques for out-of-order execution and branch prediction help implement precise interrupts

Recap: Sequential Laundry
[Figure: timeline from 6 PM to midnight; loads A, B, C, D run one after another, each taking 30 (wash) + 40 (dry) + 20 (fold) minutes]
° Sequential laundry takes 6 hours for 4 loads
° If they learned pipelining, how long would laundry take?

Recap: Pipelining Lessons
[Figure: pipelined laundry timeline starting at 6 PM; loads A to D overlap, with time segments of 30, 40, 40, 40, 40, 20 minutes]
° Pipelining doesn’t help latency of a single task; it helps throughput of the entire workload
° Pipeline rate limited by slowest pipeline stage
° Multiple tasks operating simultaneously using different resources
° Potential speedup = number of pipe stages
° Unbalanced lengths of pipe stages reduce speedup
° Time to “fill” pipeline and time to “drain” it reduces speedup
° Stall for dependences
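The 6-hour sequential figure and the pipelined answer can be checked with a short Python sketch. It assumes one washer (30 min), one dryer (40 min) and one folder (20 min), loads handled in order, and that a finished load can wait for the next machine without blocking the one it just left. The function names are illustrative.

```python
def sequential_time(stage_minutes, loads):
    """Each load runs every stage before the next load starts."""
    return loads * sum(stage_minutes)


def pipelined_time(stage_minutes, loads):
    """Finish time when loads overlap, with one machine per stage and loads kept in order.

    finish[j] holds when the previous load left stage j. A load can enter
    stage j only after it has left stage j-1 AND the machine for stage j
    is free (the previous load has left it).
    """
    finish = [0] * len(stage_minutes)
    for _ in range(loads):
        ready = 0  # time this load finished its previous stage
        for j, minutes in enumerate(stage_minutes):
            start = max(ready, finish[j])
            finish[j] = start + minutes
            ready = finish[j]
    return finish[-1]


wash_dry_fold = [30, 40, 20]
print(sequential_time(wash_dry_fold, 4))  # 360 (minutes, i.e. 6 hours)
print(pipelined_time(wash_dry_fold, 4))   # 210 (minutes, i.e. 3.5 hours)
```

Under these assumptions the speedup is 360/210 ≈ 1.7, not 3. The 40-minute dryer sets the rate, and the first wash and the last fold are the fill and drain costs.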

Recap: Ideal Pipelining
Assume instructions are completely independent!
[Figure: successive instructions overlapped in stages IF, DCD, EX, MEM, WB, each starting one cycle after the previous one]
$$\text{Maximum speedup} \le \text{number of stages}$$
$$\text{Speedup} \le \frac{\text{time for unpipelined operation}}{\text{time for longest stage}}$$
Example: 40 ns datapath, 5 stages, longest stage is 10 ns, so $\text{Speedup} \le 40/10 = 4$
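The following sketch shows where the ≤ 4 bound comes from. It assumes a 40 ns unpipelined datapath split into 5 stages, a 10 ns clock set by the longest stage, and no stalls. The function names are hypothetical.

```python
def unpipelined_ns(n_instructions, datapath_ns):
    # Every instruction walks the whole datapath before the next one begins.
    return n_instructions * datapath_ns


def pipelined_ns(n_instructions, stages, longest_stage_ns):
    # The clock period is set by the slowest stage. The first instruction needs
    # `stages` cycles to fill the pipe; after that, one instruction completes per cycle.
    return (stages + n_instructions - 1) * longest_stage_ns


for n in (1, 10, 1000):
    speedup = unpipelined_ns(n, 40) / pipelined_ns(n, 5, 10)
    print(n, round(speedup, 3))
```

The sketch prints a speedup of 0.8 for a single instruction, because its latency grows from 40 ns to 50 ns. The speedup is about 2.857 for 10 instructions and about 3.984 for 1000, approaching 4 but never reaching it.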

The Big Picture: Where are We Now?
° The Five Classic Components of a Computer: Processor (Control, Datapath), Memory, Input, Output
° Today’s Topics:
  • Recap last lecture/finish datapath
  • Pipelined Control/Do it yourself Pipelined Control
  • Administrivia
  • Hazards/Forwarding
  • Exceptions
  • Review MIPS R3000 pipeline

Can pipelining get us into trouble?
° Yes: Pipeline Hazards
  • structural hazards: attempt to use the same resource two different ways at the same time
    - E.g., a combined washer/dryer would be a structural hazard, or the folder is busy doing something else (watching TV)
  • data hazards: attempt to use an item before it is ready
    - E.g., one sock of a pair is in the dryer and the other is in the washer; you can’t fold until that sock gets from the washer through the dryer
    - instruction depends on result of prior instruction still in the pipeline
  • control hazards: attempt to make a decision before the condition is evaluated
    - E.g., when washing football uniforms you need the proper detergent level, so you must see the result after the dryer before putting the next load in
    - branch instructions
° Can always resolve hazards by waiting
  • pipeline control must detect the hazard
  • take action (or delay action) to resolve hazards

Single Memory is a Structural Hazard
[Figure: pipeline diagram, time in clock cycles across and instruction order (Load, Instr 1, Instr 2, Instr 3, Instr 4) down; every instruction uses the single memory for its fetch and loads use it again for data, so Load’s data access and Instr 3’s fetch need the memory in the same cycle]
° Detection is easy in this case!
° Fix: stall Instr 3 fetch.

Structural Hazards limit performance
° Example: if there are 1.3 memory accesses per instruction and only one memory access per cycle, then
  • average CPI ≥ 1.3
  • otherwise the resource is more than 100% utilized
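The bound is a throughput argument. The memory must serve 1.3 requests per instruction but can serve only one per cycle. A minimal sketch, assuming accesses spread perfectly evenly over cycles (real stalls only add to this). `min_cpi` and `mem_ports` are illustrative names.

```python
def min_cpi(mem_accesses_per_instr, mem_ports=1, base_cpi=1.0):
    """Lower bound on average CPI when memory is a shared resource.

    Over N instructions the memory must serve N * mem_accesses_per_instr
    requests but can serve at most mem_ports per cycle, so the run takes at
    least N * mem_accesses_per_instr / mem_ports cycles.
    """
    return max(base_cpi, mem_accesses_per_instr / mem_ports)


print(min_cpi(1.3))               # 1.3: single memory, one access per cycle
print(min_cpi(1.3, mem_ports=2))  # 1.0: a hypothetical two-port memory removes the bound
```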

Control Hazard Solution #1: Stall
[Figure: pipeline diagram of Add, Beq, Load; Load is not fetched until the branch decision is known, leaving empty slots marked “Lost potential”]
° Stall: wait until decision is clear
° Impact: 2 lost cycles (i.e., 3 clock cycles per branch instruction) => slow
° Move decision to end of decode
  • save 1 cycle per branch

Control Hazard Solution #2: Predict
[Figure: pipeline diagram of Add, Beq, Load; Load is fetched in the cycle right after Beq on the predicted path]
° Predict: guess one direction, then back up if wrong
° Impact: 0 lost cycles per branch instruction if right, 1 if wrong (right 50% of time)
  • Need to “squash” and restart the following instruction if wrong
  • Produces a CPI on branches of (1 × 0.5 + 2 × 0.5) = 1.5
  • Total CPI might then be: 1.5 × 0.2 + 1 × 0.8 = 1.1 (20% branches)
° More dynamic scheme: history of 1 branch (~90%)
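The branch arithmetic can be reproduced in a few lines. This sketch uses the numbers on these slides: 20% branches, CPI 1 for everything else, and a 1-cycle penalty on a wrong guess (decision at end of decode). It treats the ~90% figure for the 1-bit history scheme as a prediction accuracy, which is an assumption.

```python
def branch_cpi(penalty_if_wrong, accuracy, base=1.0):
    """Average cycles per branch when a wrong guess costs penalty_if_wrong extra cycles."""
    return base + (1.0 - accuracy) * penalty_if_wrong


def total_cpi(branch_fraction, cpi_branch, cpi_other=1.0):
    """Weighted average CPI over branches and all other instructions."""
    return branch_fraction * cpi_branch + (1.0 - branch_fraction) * cpi_other


schemes = [
    ("stall", 3.0),                          # Solution #1: always 2 lost cycles
    ("stall, decide in decode", 2.0),        # 1 cycle saved per branch
    ("predict 50%", branch_cpi(1, 0.5)),     # Solution #2 as on the slide
    ("1-bit history ~90%", branch_cpi(1, 0.9)),
]
for name, cpi_b in schemes:
    print(f"{name}: branch CPI {cpi_b:.2f}, total CPI {total_cpi(0.2, cpi_b):.2f}")
```

The printed total CPIs are 1.40, 1.20, 1.10 and 1.02. Prediction is worth the most when the wrong-guess penalty is small and the accuracy is high.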

Control Hazard Solution #3: Delayed Branch
[Figure: pipeline diagram of Add, Beq, Misc, Load; Misc sits in the slot after Beq and executes regardless of the branch outcome]
° Delayed Branch: redefine branch behavior (takes place after the next instruction)
° Impact: 0 clock cycles per branch instruction if the compiler can find an instruction to put in the “slot” (~50% of the time)
° As we launch more instructions per clock cycle, it becomes less useful
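A delayed branch changes what a branch *means*, not just its timing. The toy interpreter below is written for a hypothetical two-instruction mini-ISA (`addi` and `beq` on named registers). It shows the instruction in the slot after a taken branch executing under delayed semantics and being skipped without them.

```python
def run(program, regs, delayed=True, max_steps=100):
    """Execute a tiny instruction list and return the indices executed, in order.

    Hypothetical mini-ISA:
      ("addi", rd, rs, imm)     rd <- rs + imm
      ("beq",  rs, rt, target)  go to index `target` if rs == rt
    With delayed=True, a taken branch redirects control only after the
    instruction in the following slot has executed.
    """
    trace = []
    pc = 0
    pending_target = None  # set by a taken branch, consumed one instruction later
    while pc < len(program) and len(trace) < max_steps:
        op = program[pc]
        trace.append(pc)
        next_pc = pc + 1
        if pending_target is not None:  # this instruction was the delay slot
            next_pc, pending_target = pending_target, None
        if op[0] == "addi":
            _, rd, rs, imm = op
            regs[rd] = regs[rs] + imm
        elif op[0] == "beq":
            _, rs, rt, target = op
            if regs[rs] == regs[rt]:
                if delayed:
                    pending_target = target
                else:
                    next_pc = target
        pc = next_pc
    return trace


prog = [
    ("beq", "r0", "r0", 3),     # always taken
    ("addi", "r1", "r1", 1),    # delay slot
    ("addi", "r2", "r2", 1),    # skipped either way
    ("addi", "r3", "r3", 1),    # branch target
]
names = ["r0", "r1", "r2", "r3"]
print(run(prog, dict.fromkeys(names, 0)))                 # [0, 1, 3]
print(run(prog, dict.fromkeys(names, 0), delayed=False))  # [0, 3]
```

The compiler's job is to put a useful instruction (here the `addi r1`) in the slot. When it cannot find one, it fills the slot with a no-op, which costs the cycle anyway.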

Data Hazard on r1: Read after write hazard (RAW)
  add r1, r2, r3
  sub r4, r1, r3
  and r6, r1, r7
  or  r8, r1, r9
  xor r10, r1, r11

Data Hazard on r1: Read after write hazard (RAW)
• Dependencies backwards in time are hazards
[Figure: the add/sub/and/or/xor sequence from the previous slide in stages IF, ID/RF, EX, MEM, WB; add writes r1 in WB, but sub and and read r1 in ID/RF in earlier cycles]

Data Hazard Solution: Forwarding
• “Forward” result from one stage to another
[Figure: the same add/sub/and/or/xor sequence; arrows carry add’s result from the end of its EX and MEM stages directly to the ALU inputs of sub and and]
• “or” is OK if we define register read/write properly: the register file is written in the first half of the cycle and read in the second half
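The forwarding decision for one ALU operand can be sketched as a priority check on the pipeline registers. This follows the usual textbook formulation: compare the EX-stage source register against the destination held in EX/MEM, then in MEM/WB, and ignore r0. The `Producer` type and the returned strings are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Producer:
    """Destination info latched in a later pipeline register (EX/MEM or MEM/WB)."""
    reg_write: bool
    rd: int


def forward_select(src_reg: int, ex_mem: Optional[Producer], mem_wb: Optional[Producer]) -> str:
    """Choose where an ALU operand comes from for the instruction now in EX.

    The most recent producer wins, so EX/MEM (one instruction ahead) is
    checked before MEM/WB (two ahead). Register 0 is hard-wired to zero in
    MIPS, so a "write" to it never forwards.
    """
    if ex_mem and ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == src_reg:
        return "EX/MEM"
    if mem_wb and mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == src_reg:
        return "MEM/WB"
    return "RegFile"


add_r1 = Producer(reg_write=True, rd=1)
sub_r4 = Producer(reg_write=True, rd=4)
and_r6 = Producer(reg_write=True, rd=6)
print(forward_select(1, ex_mem=add_r1, mem_wb=None))    # EX/MEM  (sub reads r1)
print(forward_select(1, ex_mem=sub_r4, mem_wb=add_r1))  # MEM/WB  (and reads r1)
print(forward_select(1, ex_mem=and_r6, mem_wb=sub_r4))  # RegFile (or reads r1 after add's write-back)
```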

Forwarding (or Bypassing): What about Loads?
• Dependencies backwards in time are hazards
[Figure: lw r1, 0(r2) followed by sub r4, r1, r3; the loaded value exists only at the end of lw’s MEM stage, but sub needs it at the start of its EX stage in that same cycle]
• Can’t solve with forwarding:
• Must delay/stall instruction dependent on loads

Forwarding (or Bypassing): What about Loads?
• Dependencies backwards in time are hazards
[Figure: the same lw/sub pair with a one-cycle stall before sub’s EX stage, so the loaded value can be forwarded from the end of lw’s MEM stage]
• Can’t solve with forwarding:
• Must delay/stall instruction dependent on loads
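Detecting the load-use case takes one comparison in decode. A sketch, assuming a one-cycle load delay (the value is ready at the end of MEM and forwarded from there) and a hypothetical `Instr` record. It counts the bubbles a straight-line sequence would need.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Instr:
    op: str
    dest: Optional[int]     # register written, or None
    srcs: Tuple[int, ...]   # registers read


def needs_load_stall(in_ex: Optional[Instr], in_id: Instr) -> bool:
    """True if the instruction in decode must wait one cycle for a load now in EX.

    The load's value appears only at the end of MEM, one cycle too late to
    forward into the very next instruction's EX stage. Later consumers are
    covered by forwarding.
    """
    return (in_ex is not None and in_ex.op == "lw"
            and in_ex.dest not in (None, 0) and in_ex.dest in in_id.srcs)


def count_load_use_stalls(program):
    """Count one-cycle bubbles for a straight-line program (no branches)."""
    return sum(needs_load_stall(prev, cur) for prev, cur in zip(program, program[1:]))


lw = Instr("lw", 1, (2,))          # lw  r1, 0(r2)
sub = Instr("sub", 4, (1, 3))      # sub r4, r1, r3
other = Instr("add", 5, (6, 7))    # independent instruction
print(count_load_use_stalls([lw, sub]))         # 1
print(count_load_use_stalls([lw, other, sub]))  # 0: scheduling hides the delay
```

The second case is why compilers try to place an independent instruction right after each load.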

Recap: Data Hazards
[Figure: pipeline timing diagrams (IF, DCD, EX, Mem, WB) illustrating a structural hazard, a control hazard (Jump), a RAW (read after write) data hazard, a WAW (write after write) data hazard, and a WAR (write after read) data hazard]
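The three data-hazard names describe which access order must be preserved between two instructions. Here is a small classifier that assumes each instruction is given as a hypothetical (destination, sources) pair of register numbers.

```python
def classify(earlier, later):
    """Name the data dependences from `earlier` to `later` in program order.

    Each instruction is (dest, srcs): dest is the register written (or None),
    srcs the registers read.
      RAW: later reads what earlier writes   (true dependence)
      WAR: later writes what earlier reads   (anti-dependence)
      WAW: later writes what earlier writes  (output dependence)
    """
    e_dest, e_srcs = earlier
    l_dest, l_srcs = later
    kinds = set()
    if e_dest is not None and e_dest in l_srcs:
        kinds.add("RAW")
    if l_dest is not None and l_dest in e_srcs:
        kinds.add("WAR")
    if e_dest is not None and e_dest == l_dest:
        kinds.add("WAW")
    return kinds


print(classify((1, (2, 3)), (4, (1, 3))))  # {'RAW'}: add r1,r2,r3 then sub r4,r1,r3
print(classify((4, (1, 3)), (1, (2, 3))))  # {'WAR'}: sub reads r1, then add writes it
print(classify((1, (2, 3)), (1, (5,))))    # {'WAW'}: both write r1
```

Whether a dependence actually becomes a hazard depends on the pipeline. In the 5-stage design, operands are read early and write-backs happen in order, so only RAW can bite.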

Designing a Pipelined Processor
° Go back and examine your datapath and control diagram
° Associate resources with states
° Ensure that flows do not conflict, or figure out how to resolve conflicts
° Assert control in the appropriate stage

Control and Datapath: Split state diagram into 5 pieces
° IFetch: IR <– Mem[PC]; PC <– PC+4
° Reg/Dec: A <– R[rs]; B <– R[rt]
° Exec: S <– A + B (R-type); S <– A or ZX (ori); S <– A + SX (lw, sw); if Cond PC <– PC+SX (beq)
° Mem: M <– Mem[S] (lw); Mem[S] <– B (sw)
° Wr: R[rd] <– S (R-type); R[rt] <– S (ori); R[rt] <– M (lw)
[Figure: datapath cut into five stages (Inst. Mem with PC and Next PC; Reg File with Equal; Exec; Mem Access with Data Mem; Reg File write) separated by the registers IR, A, B, S, M, D]
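The sketch below makes the five pieces concrete by walking one instruction at a time through the same RTL steps, in order. It is a behavioral model only, with no overlap between instructions. The tuple encoding, the byte-offset branch, and the `execute` helper are all assumptions, not the course's datapath.

```python
def execute(instr_mem, data_mem, regs, pc):
    """Run one instruction through the five RTL pieces and return the next PC.

    Hypothetical tuple encoding (not real MIPS bit fields):
      ("add", rd, rs, rt)  ("ori", rt, rs, imm)  ("lw", rt, rs, imm)
      ("sw", rt, rs, imm)  ("beq", rs, rt, offset)
    """
    IR = instr_mem[pc]                 # IFetch: IR <- Mem[PC]
    pc += 4                            #         PC <- PC + 4
    op = IR[0]
    if op == "add":
        _, rd, rs, rt = IR
    elif op == "beq":
        _, rs, rt, imm = IR
    else:                              # ori, lw, sw use (rt, rs, imm)
        _, rt, rs, imm = IR
    A, B = regs[rs], regs[rt]          # Reg/Dec: A <- R[rs]; B <- R[rt]

    if op == "beq":                    # Exec: if Cond PC <- PC + SX
        if A == B:
            pc += imm                  # offset treated as a byte offset (assumption)
        return pc
    if op == "add":
        S = A + B                      # Exec: S <- A + B
    elif op == "ori":
        S = A | (imm & 0xFFFF)         # Exec: S <- A or ZX (zero-extended immediate)
    else:
        S = A + imm                    # Exec: S <- A + SX (lw, sw)

    if op == "sw":
        data_mem[S] = B                # Mem: Mem[S] <- B
        return pc
    if op == "lw":
        M = data_mem[S]                # Mem: M <- Mem[S]
        regs[rt] = M                   # Wr:  R[rt] <- M
    elif op == "ori":
        regs[rt] = S                   # Wr:  R[rt] <- S
    else:
        regs[rd] = S                   # Wr:  R[rd] <- S
    return pc


regs = [0] * 8
regs[2] = 100
data_mem = {135: 7}
imem = {0: ("lw", 1, 2, 35), 4: ("add", 3, 1, 2), 8: ("ori", 4, 0, 0x12),
        12: ("sw", 3, 2, 4), 16: ("beq", 1, 1, 8)}
pc = 0
for _ in range(5):
    pc = execute(imem, data_mem, regs, pc)
print(regs[:5], data_mem, pc)  # [0, 7, 100, 107, 18] {135: 7, 104: 107} 28
```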

Pipelined Processor (almost) for slides
[Figure: datapath with pipeline registers IR, IRex, IRmem, IRwb, A, B, S, M, D; per-stage control blocks Dcd Ctrl, Exec, Mem Ctrl, WB Ctrl; Next PC and Equal logic; a Valid bit]
° What happens if we start a new instruction every cycle?

Pipelined Datapath (as in book); hard to read
[Figure: pipelined datapath from the textbook]

Administrivia
° Midterm I on Wednesday 3/10
  • 5:30–8:30 in 306 Soda Hall
  • Bring a calculator!
  • One 8 1/2 by 11 page (both sides) of notes
  • Make-up exam: Tuesday 5:30–8:30 in 606 Soda Hall
° Meet at LaVal’s afterwards for pizza
  • North side LaVal’s!
  • I’ll buy!
° Materials through Chapter 5, some of Chapter 6, Appendix A, B & C
  • Should understand single-cycle processor
  • multicycle processor, including exceptions
° Get moving on Lab 3!
  • This is a hard lab, since there are so many new things
° Homework 4 out later today

Pipelining the Load Instruction
[Figure: 1st, 2nd and 3rd lw issued in consecutive cycles across Cycles 1–7; each passes through Ifetch, Reg/Dec, Exec, Mem, Wr]
° The five independent functional units in the pipeline datapath are:
  • Instruction Memory for the Ifetch stage
  • Register File’s Read ports (bus A and bus B) for the Reg/Dec stage
  • ALU for the Exec stage
  • Data Memory for the Mem stage
  • Register File’s Write port (bus W) for the Wr stage

The Four Stages of R-type
[Figure: R-type passes through Ifetch (Cycle 1), Reg/Dec (Cycle 2), Exec (Cycle 3), Wr (Cycle 4)]
° Ifetch: Instruction Fetch
  • Fetch the instruction from the Instruction Memory
° Reg/Dec: Registers Fetch and Instruction Decode
° Exec:
  • ALU operates on the two register operands
  • Update PC
° Wr: Write the ALU output back to the register file

Pipelining the R-type and Load Instruction
[Figure: across Cycles 1–9, R-type, R-type, Load, R-type, R-type issued one per cycle; R-types write in their 4th stage and the Load in its 5th, so the Load and the R-type issued after it reach Wr in the same cycle. “Oops! We have a problem!”]
° We have a pipeline conflict, or structural hazard:
  • Two instructions try to write to the register file at the same time!
  • Only one write port

Important Observation
° Each functional unit can only be used once per instruction
° Each functional unit must be used at the same stage for all instructions:
  • Load uses Register File’s Write Port during its 5th stage
    Load: 1 Ifetch, 2 Reg/Dec, 3 Exec, 4 Mem, 5 Wr
  • R-type uses Register File’s Write Port during its 4th stage
    R-type: 1 Ifetch, 2 Reg/Dec, 3 Exec, 4 Wr
° 2 ways to solve this pipeline hazard.

Solution 1: Insert “Bubble” into the Pipeline
[Figure: Load and R-type instructions across Cycles 1–9 with a pipeline bubble inserted so that no two instructions reach Wr in the same cycle]
° Insert a “bubble” into the pipeline to prevent 2 writes in the same cycle
  • The control logic can be complex.
  • Lose instruction fetch and issue opportunity.
° No instruction is started in Cycle 6!

Solution 2: Delay R-type’s Write by One Cycle
° Delay R-type’s register write by one cycle:
  • Now R-type instructions also use Reg File’s write port at Stage 5
  • Mem stage is a NOOP stage: nothing is being done.
    R-type: 1 Ifetch, 2 Reg/Dec, 3 Exec, 4 Mem, 5 Wr
[Figure: across Cycles 1–9, R-type and Load instructions issued one per cycle, each taking the five stages Ifetch, Reg/Dec, Exec, Mem, Wr, so their Wr stages fall in successive cycles]
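You can check the write-port conflict, and why Solution 2 removes it, by computing the cycle in which each instruction uses the register file's write port. This sketch assumes one instruction fetched per cycle starting in Cycle 1 and the stage numbers listed above. The table names are hypothetical.

```python
WRITE_STAGE = {"R-type": 4, "Load": 5}          # original: R-type writes in stage 4
WRITE_STAGE_DELAYED = {"R-type": 5, "Load": 5}  # Solution 2: every write in stage 5


def write_port_conflicts(sequence, write_stage):
    """Return {cycle: [instruction indices]} for cycles with more than one register write.

    Instruction i (0-based) is fetched in cycle i + 1, so it reaches its
    stage k in cycle i + k.
    """
    writers = {}
    for i, kind in enumerate(sequence):
        cycle = i + write_stage[kind]
        writers.setdefault(cycle, []).append(i)
    return {c: ids for c, ids in writers.items() if len(ids) > 1}


seq = ["R-type", "R-type", "Load", "R-type", "R-type"]
print(write_port_conflicts(seq, WRITE_STAGE))          # {7: [2, 3]}
print(write_port_conflicts(seq, WRITE_STAGE_DELAYED))  # {}
```

With R-type writing in stage 4, the Load fetched in Cycle 3 and the R-type fetched in Cycle 4 both write in Cycle 7. Moving every write to stage 5 spreads the writes over Cycles 5 through 9.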

Modified Control & Datapath
° IFetch: IR <– Mem[PC]; PC <– PC+4
° Reg/Dec: A <– R[rs]; B <– R[rt]
° Exec: S <– A + B (R-type); S <– A or ZX (ori); S <– A + SX (lw, sw); if Cond PC <– PC+SX (beq)
° Mem: M <– S (R-type); M <– S (ori); M <– Mem[S] (lw); Mem[S] <– B (sw)
° Wr: R[rd] <– M (R-type); R[rt] <– M (ori); R[rt] <– M (lw)
[Figure: the five-stage datapath (Inst. Mem, Reg File, Exec, Mem Access with Data Mem, Reg File write) with registers IR, A, B, S, M, D; every result now passes through M before the write stage]

The Four Stages of Store
[Figure: Store passes through Ifetch (Cycle 1), Reg/Dec (Cycle 2), Exec (Cycle 3), Mem (Cycle 4); its Wr stage is unused]
° Ifetch: Instruction Fetch
  • Fetch the instruction from the Instruction Memory
° Reg/Dec: Registers Fetch and Instruction Decode
° Exec: Calculate the memory address
° Mem: Write the data into the Data Memory

The Three Stages of Beq
[Figure: Beq passes through Ifetch (Cycle 1), Reg/Dec (Cycle 2), Exec (Cycle 3); its Mem and Wr stages are unused]
° Ifetch: Instruction Fetch
  • Fetch the instruction from the Instruction Memory
° Reg/Dec:
  • Registers Fetch and Instruction Decode
° Exec:
  • compares the two register operands
  • selects the correct branch target address
  • latches it into the PC

Control Diagram
° IFetch: IR <– Mem[PC]; PC <– PC+4
° Reg/Dec: A <– R[rs]; B <– R[rt]
° Exec: S <– A + B (R-type); S <– A or ZX (ori); S <– A + SX (lw, sw); if Cond PC <– PC+SX (beq)
° Mem: M <– S (R-type, ori); M <– Mem[S] (lw); Mem[S] <– B (sw)
° Wr: R[rd] <– S (R-type); R[rt] <– S (ori); R[rt] <– M (lw)
[Figure: the five-stage datapath (Inst. Mem, Reg File, Exec, Mem Access with Data Mem, Reg File write) with registers IR, A, B, S, M, D]

Recall: Single-cycle control!
[Figure: single-cycle datapath. Control takes the Instruction and Conditions and produces Control Signals; the PC (with Next Address logic) addresses an Ideal Instruction Memory that supplies Rd, Rs, Rt (5 bits each) to a 32 × 32-bit register file (Rw, Ra, Rb; buses A and B); the ALU produces the Data Address for an Ideal Data Memory with Data In and Data Out; all clocked by Clk]

Data Stationary Control
° The Main Control generates the control signals during Reg/Dec
  • Control signals for Exec (ExtOp, ALUSrc, ...) are used 1 cycle later
  • Control signals for Mem (MemWr, Branch) are used 2 cycles later
  • Control signals for Wr (MemtoReg, RegWr) are used 3 cycles later
[Figure: Main Control in Reg/Dec loads ExtOp, ALUSrc, ALUOp, RegDst, MemWr, Branch, MemtoReg, RegWr into the ID/Ex register; Exec uses ExtOp, ALUSrc, ALUOp, RegDst and passes MemWr, Branch, MemtoReg, RegWr to the Ex/Mem register; Mem uses MemWr, Branch and passes MemtoReg, RegWr to the Mem/Wr register; Wr uses MemtoReg, RegWr]
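Data stationary control can be modeled as control bundles that ride along in the pipeline registers and shrink as each stage consumes its share. In this sketch the decode table covers only three opcodes, and its values are illustrative assumptions, not the lab's actual encoding.

```python
# Control bits grouped by the stage that consumes them (names follow the slide).
EX_BITS = ("ExtOp", "ALUSrc", "ALUOp", "RegDst")
MEM_BITS = ("MemWr", "Branch")
WB_BITS = ("MemtoReg", "RegWr")

# Hypothetical decode table; "x" marks a don't-care.
MAIN_CONTROL = {
    "lw":  dict(ExtOp="sign", ALUSrc="imm", ALUOp="add", RegDst="rt",
                MemWr=0, Branch=0, MemtoReg=1, RegWr=1),
    "sw":  dict(ExtOp="sign", ALUSrc="imm", ALUOp="add", RegDst="x",
                MemWr=1, Branch=0, MemtoReg=0, RegWr=0),
    "add": dict(ExtOp="x", ALUSrc="reg", ALUOp="add", RegDst="rd",
                MemWr=0, Branch=0, MemtoReg=0, RegWr=1),
}


def keep(bits, names):
    """Pass only the named control bits on to the next pipeline register."""
    return {k: bits[k] for k in names}


class DataStationaryControl:
    """Control is generated once in Reg/Dec, then carried along with the data."""

    def __init__(self):
        self.id_ex = None   # EX + MEM + WB bits
        self.ex_mem = None  # MEM + WB bits
        self.mem_wb = None  # WB bits

    def clock(self, decoded_op):
        # Update from the back so each register receives the previous stage's old bits.
        self.mem_wb = keep(self.ex_mem, WB_BITS) if self.ex_mem else None
        self.ex_mem = keep(self.id_ex, MEM_BITS + WB_BITS) if self.id_ex else None
        self.id_ex = dict(MAIN_CONTROL[decoded_op]) if decoded_op else None


ctl = DataStationaryControl()
for op in ("lw", "add", None):  # decode lw, then add, then nothing
    ctl.clock(op)
print(ctl.mem_wb)  # lw's Wr bits, three clocks after it was decoded
print(ctl.ex_mem)  # add's Mem and Wr bits
```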

Datapath + Data Stationary Control
[Figure: pipelined datapath in which Decode splits IR into op, rs, rt, im, fun and the pipeline registers carry control fields beside the data: v, rw, wb, me, ex into Exec; v, rw, wb, me into Mem Access; v, rw, wb into the write-back stage (WB Ctrl)]

Let’s Try it Out
  10   lw   r1, r2(35)
  14   addI r2, 3
  20   sub  r3, r4, r5
  24   beq  r6, r7, 100
  30   ori  r8, r9, 17
  34   add  r10, r11, r12
  100  and  r13, r14, 15
(these addresses are octal)
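The stage occupancy in the walkthrough can be generated mechanically. This sketch assumes octal addresses, one instruction fetched per cycle, no stalls, and a taken `beq` at 24 with one delay slot, so `ori` at 30 is fetched before the target at 100. The program does not list instructions past 100, so only their addresses are tracked.

```python
STAGES = ("IF", "ID", "EX", "MEM", "WB")


def fetch_order(start, taken_branches, count):
    """Fetch addresses with one delay slot.

    A taken branch at address b redirects fetch only after the instruction
    at b + 4 (the delay slot) has been fetched.
    """
    addrs, pc, redirect = [], start, None
    for _ in range(count):
        addrs.append(pc)
        if redirect is not None:
            pc, redirect = redirect, None
        else:
            if pc in taken_branches:
                redirect = taken_branches[pc]
            pc += 4
    return addrs


def pipeline_snapshot(cycle, fetched):
    """Map each stage to the fetch address it holds during `cycle` (1-based)."""
    snap = {}
    for depth, stage in enumerate(STAGES):
        i = cycle - 1 - depth  # index into the fetch order
        if 0 <= i < len(fetched):
            snap[stage] = fetched[i]
    return snap


order = fetch_order(0o10, {0o24: 0o100}, 9)
for cycle in range(1, 10):
    snap = pipeline_snapshot(cycle, order)
    print(cycle, ", ".join(f"{s} {a:o}" for s, a in snap.items()))
```

In Cycle 6 the sketch shows IF 100, ID 30, EX 24, MEM 20, WB 14. In Cycle 9 it shows IF 114, ID 110, EX 104, MEM 100, WB 30. Addresses print in octal, so 100 + 4 is 104 and 104 + 4 is 110.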

Start: Fetch 10
[Figure: pipelined datapath. IF: 10 lw r1, r2(35); Decode, Exec, Mem Access and WB hold no valid instruction (marked n); PC = 10]

Fetch 14, Decode 10
[Figure: IF: 14 addI r2, 3; ID: 10 lw r1, r2(35), reading r2; Exec, Mem Access and WB empty (n); PC = 14]

Fetch 20, Decode 14, Exec 10
[Figure: IF: 20 sub r3, r4, r5; ID: 14 addI r2, 3, reading r2; EX: 10 lw r1 with operands r2 and 35; Mem Access and WB empty (n); PC = 20]

Fetch 24, Decode 20, Exec 14, Mem 10
[Figure: IF: 24 beq r6, r7, 100; ID: 20 sub r3, r4, r5, reading r4 and r5; EX: 14 addI r2 with operands r2 and 3; MEM: 10 lw r1 accessing address r2+35; WB empty; PC = 24]

Fetch 30, Dcd 24, Ex 20, Mem 14, WB 10
[Figure: IF: 30 ori r8, r9, 17; ID: 24 beq r6, r7, 100, reading r6 and r7; EX: 20 sub r3 with operands r4 and r5; MEM: 14 addI r2 carrying r2+3; WB: 10 lw r1 writing M[r2+35]; PC = 30]
° Note delayed branch: always execute ori after beq

Fetch 100, Dcd 30, Ex 24, Mem 20, WB 14
[Figure: IF: 100 and r13, r14, 15; ID: 30 ori r8, r9, 17, reading r9; EX: 24 beq comparing r6 and r7, target 100; MEM: 20 sub r3 carrying r4-r5; WB: 14 addI writing r2+3; r1 = M[r2+35] already written; PC = 100]

Fetch 104, Dcd 100, Ex 30, Mem 24, WB 20
[Figure: pipelined datapath with the stage contents left as “?”]
° Fill it in yourself!

Fetch 110, Dcd 104, Ex 100, Mem 30, WB 24
[Figure: pipelined datapath with the stage contents left as “?”]
° Fill it in yourself!

Fetch 114, Dcd 110, Ex 104, Mem 100, WB 30
[Figure: pipelined datapath with the stage contents left as “?”]
° Fill it in yourself!

Pipelined Processor Bubbles
[Figure: pipelined datapath with per-stage control (Dcd Ctrl, Exec, Mem Ctrl, WB Ctrl), instruction registers IR, IRex, IRmem, IRwb, a Valid bit, and Stall signals feeding back to earlier stages]
° Separate control at each stage
° Stalls propagate backwards to freeze previous stages
° Bubbles in pipeline introduced by placing “Noops” into local stage, stall previous stages.

Recap: Data Hazards
° Avoid some “by design”
  • eliminate WAR by always fetching operands early (DCD) in pipe
  • eliminate WAW by doing all WBs in order (last stage, static)
° Detect and resolve remaining ones
  • stall or forward (if possible)
[Figure: pipeline timing diagrams illustrating RAW, WAW and WAR data hazards]

Is CPI = 1 for our pipeline?
° Remember that CPI is an “average # cycles/inst”
[Figure: three instructions overlapped in stages IFetch, Dcd, Exec, Mem, WB, one starting each cycle]
° CPI here is 1, since the average throughput is 1 instruction every cycle.
° What if there are stalls or multi-cycle execution?
° Usually CPI > 1. How close can we get to 1?
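A CPI of 1 is the steady-state rate. Counting the cycles needed to fill the pipe, plus any stall cycles, gives a measured CPI above 1 for any finite run. This sketch assumes a 5-stage pipe and straight-line code. `measured_cpi` is an illustrative name.

```python
def measured_cpi(n_instructions, stall_cycles=0, depth=5):
    """Cycles per instruction for one straight-line run on a `depth`-stage pipe.

    The first instruction needs `depth` cycles, each later one adds 1 cycle,
    and every stall cycle (bubble) adds 1 more.
    """
    cycles = depth + (n_instructions - 1) + stall_cycles
    return cycles / n_instructions


print(measured_cpi(3))                                # 7 cycles / 3 instructions, about 2.33
print(measured_cpi(1_000_000))                        # about 1.000004: fill cost amortized away
print(measured_cpi(1_000_000, stall_cycles=100_000))  # about 1.1: stalls dominate
```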

Summary
° What makes it easy
  • all instructions are the same length
  • just a few instruction formats
  • memory operands appear only in loads and stores
° Hazards limit performance
  • Structural: need more HW resources
  • Data: need forwarding, compiler scheduling
  • Control: early evaluation & PC, delayed branch, prediction
° Data hazards must be handled carefully:
  • RAW data hazards handled by forwarding
  • WAW and WAR hazards don’t exist in 5-stage pipeline
° MIPS I instruction set architecture made pipeline visible (delayed branch, delayed load)

Summary: Where this class is going
° We’ll build a simple pipeline and look at these issues
  • Lab 4: Pipelined Processor
  • Lab 5: With caches
° We’ll talk about modern processors and what’s really hard:
  • Branches (control hazards) are really hard!
  • Exception handling
  • Trying to improve performance with out-of-order execution, etc.
  • Trying to get CPI < 1 (superscalar execution)