ECE 313 Computer Organization Lecture 18 Pipelined Processor


ECE 313 - Computer Organization
Lecture 18 - Pipelined Processor Design 2
Fall 2004
Reading: 6.3-6.6, 6.8
Prof. John Nestor
ECE Department, Lafayette College, Easton, Pennsylvania 18042
nestorj@lafayette.edu

Portions of these slides are derived from:
• Textbook figures © 1998 Morgan Kaufmann Publishers, all rights reserved
• Tod Amon's COD 2e Slides © 1998 Morgan Kaufmann Publishers, all rights reserved
• Dave Patterson's CS 152 Slides - Fall 1997 © UCB
• Rob Rutenbar's 18-347 Slides - Fall 1999, CMU
• other sources as noted

ECE 313 Fall 2004, Lecture 18 - Pipelining 2

Pipelined Datapath with Control Signals

Control for Pipelined Datapath
• Control signals: RegDst, ALUOp[1:0], ALUSrc, MemRead, MemWrite, Branch, RegWrite, MemtoReg
• Source: Book Fig. 6.29, p. 469

Control for Pipelined Datapath
• Source: Book Fig. 6.25, p. 401
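
The control values for the four instruction classes, grouped by the pipeline stage that uses them, can be written down as data. This is a minimal Python sketch: the dictionary layout and the `stage_signals` helper are illustrative names of our own, `None` stands for a don't-care (X), and the values are the standard MIPS main-control settings that the pipelined design reuses.

```python
# Illustrative layout: control values per instruction class, grouped by the
# pipeline stage that consumes them. None marks a don't-care (X).
CONTROL = {
    "R-format": {"EX": {"RegDst": 1, "ALUOp": 0b10, "ALUSrc": 0},
                 "M": {"Branch": 0, "MemRead": 0, "MemWrite": 0},
                 "WB": {"RegWrite": 1, "MemtoReg": 0}},
    "lw": {"EX": {"RegDst": 0, "ALUOp": 0b00, "ALUSrc": 1},
           "M": {"Branch": 0, "MemRead": 1, "MemWrite": 0},
           "WB": {"RegWrite": 1, "MemtoReg": 1}},
    "sw": {"EX": {"RegDst": None, "ALUOp": 0b00, "ALUSrc": 1},
           "M": {"Branch": 0, "MemRead": 0, "MemWrite": 1},
           "WB": {"RegWrite": 0, "MemtoReg": None}},
    "beq": {"EX": {"RegDst": None, "ALUOp": 0b01, "ALUSrc": 0},
            "M": {"Branch": 1, "MemRead": 0, "MemWrite": 0},
            "WB": {"RegWrite": 0, "MemtoReg": None}},
}


def stage_signals(instr: str, stage: str) -> dict:
    """Return the control fields of `instr` that are used in `stage` (EX, M or WB)."""
    return CONTROL[instr][stage]


for name, groups in CONTROL.items():
    print(f"{name:9}", groups["EX"], groups["M"], groups["WB"])
```

Grouping by stage mirrors how the values are carried: the whole set is produced in ID, and each later stage consumes its own group.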

Datapath and Control Unit

Tracking Control Signals - Cycle 1
[Figure: LW in the pipeline]

Tracking Control Signals - Cycle 2
[Figure: SW and LW in the pipeline]

Tracking Control Signals - Cycle 3
[Figure: ADD, SW and LW in the pipeline, with their control values]

Tracking Control Signals - Cycle 4
[Figure: SUB, ADD, SW and LW in the pipeline, with their control values]

Tracking Control Signals - Cycle 5
[Figure: SUB, ADD, SW and LW in the pipeline, with their control values]
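
The cycle-by-cycle snapshots can be reproduced with a short Python sketch. It assumes the four instructions are issued in the order LW, SW, ADD, SUB (the order in which they appear in the figures), one per cycle, with no stalls; `occupancy` and `GROUPS_ENTERING` are illustrative names, not part of the textbook design.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

# Control groups held in the pipeline register that feeds each stage.
# All groups are generated in ID; each stage uses its own group and
# passes the remaining ones on to the next pipeline register.
GROUPS_ENTERING = {
    "EX": ["EX", "M", "WB"],  # ID/EX register
    "MEM": ["M", "WB"],       # EX/MEM register
    "WB": ["WB"],             # MEM/WB register
}


def occupancy(program, cycle):
    """Map each stage to the instruction in it during `cycle` (1-based),
    assuming one instruction issues per cycle and there are no stalls."""
    result = {}
    for i, instr in enumerate(program):
        stage_index = cycle - 1 - i
        if 0 <= stage_index < len(STAGES):
            result[STAGES[stage_index]] = instr
    return result


program = ["LW", "SW", "ADD", "SUB"]
for cycle in range(1, 6):
    occ = occupancy(program, cycle)
    cells = []
    for stage in STAGES:
        if stage in occ:
            groups = "+".join(GROUPS_ENTERING.get(stage, [])) or "-"
            cells.append(f"{stage}: {occ[stage]} [{groups}]")
    print(f"cycle {cycle}: " + ", ".join(cells))
```

In cycle 5, for example, LW reaches WB carrying only its WB group, while ADD in EX still carries all three groups.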

Data Hazards Revisited…
• Data hazards occur when data is used before it is stored in the register file (Fig. 6.28)
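
To make the definition concrete, here is a hedged Python sketch that lists the read-after-write dependences close enough to be hazards in the five-stage pipeline. It assumes the register file is written in the first half of a cycle and read in the second half, so only the two instructions immediately after a producer can read a stale value; the `Instr` tuple and the example sequence are illustrative.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str
    dest: Optional[int]     # register written, or None
    srcs: Tuple[int, ...]   # registers read


def find_data_hazards(program: List[Instr], window: int = 2):
    """Return (producer, consumer) index pairs in which the consumer reads a
    register written by one of the `window` instructions just before it."""
    hazards = []
    for j, consumer in enumerate(program):
        for i in range(max(0, j - window), j):
            dest = program[i].dest
            if dest not in (None, 0) and dest in consumer.srcs:  # $0 never changes
                hazards.append((i, j))
    return hazards


program = [
    Instr("sub $2, $1, $3", 2, (1, 3)),
    Instr("and $12, $2, $5", 12, (2, 5)),
    Instr("or $13, $6, $2", 13, (6, 2)),
    Instr("add $14, $2, $2", 14, (2, 2)),
    Instr("sw $15, 100($2)", None, (2, 15)),
]
for i, j in find_data_hazards(program):
    print(f"{program[j].text!r} needs the result of {program[i].text!r}")
```

Only the `and` and `or` are flagged; the `add` reads $2 in the same cycle that `sub` writes it, which the split-cycle register file handles.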

Data Hazard Solution: Forwarding
• Key idea: connect data internally before it's stored (Fig. 6.29)

Data Hazard Solution: Forwarding
• Add hardware to feed back ALU and MEM results to both ALU inputs (Fig. 6.32)

Controlling Forwarding
• Need to test when register numbers match in the rs, rt, and rd fields stored in pipeline registers
• "EX" hazard:
  - EX/MEM: test whether the instruction writes the register file and examine its rd register
  - ID/EX: test whether the instruction reads the rs or rt register and whether it matches the rd register in EX/MEM
• "MEM" hazard:
  - MEM/WB: test whether the instruction writes the register file and examine its rd (rt) register
  - ID/EX: test whether the instruction reads the rs or rt register and whether it matches the rd (rt) register in MEM/WB

Forwarding Unit Detail - EX Hazard
if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
    and (EX/MEM.RegisterRd = ID/EX.RegisterRs)) ForwardA = 10
if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
    and (EX/MEM.RegisterRd = ID/EX.RegisterRt)) ForwardB = 10

Forwarding Unit Detail - MEM Hazard
if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
    and (MEM/WB.RegisterRd = ID/EX.RegisterRs)) ForwardA = 01
if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
    and (MEM/WB.RegisterRd = ID/EX.RegisterRt)) ForwardB = 01

EX Hazard Complication
• What if a register is changed more than once?
    add $1, $1, $2
    add $1, $1, $3
    add $1, $1, $4
• Answer: forward the most recent result (from the instruction in the MEM stage)

Forwarding Unit Detail - MEM Hazard Revised
• Forward from MEM/WB only when there is no EX hazard on the same register:
if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
    and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
             and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
    and (MEM/WB.RegisterRd = ID/EX.RegisterRs)) ForwardA = 01
if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
    and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
             and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
    and (MEM/WB.RegisterRd = ID/EX.RegisterRt)) ForwardB = 01
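
The EX and MEM conditions can be combined into a single forwarding unit. In this Python sketch the EX hazard is checked first, so MEM/WB is forwarded only when EX/MEM is not already supplying the same register; that priority is what makes the repeated `add $1, $1, ...` sequence pick up the most recent result. The `PipeRegs` field names are our own transliteration of the slide notation.

```python
from dataclasses import dataclass


@dataclass
class PipeRegs:
    ex_mem_reg_write: bool   # EX/MEM.RegWrite
    ex_mem_rd: int           # EX/MEM.RegisterRd
    mem_wb_reg_write: bool   # MEM/WB.RegWrite
    mem_wb_rd: int           # MEM/WB.RegisterRd
    id_ex_rs: int            # ID/EX.RegisterRs
    id_ex_rt: int            # ID/EX.RegisterRt


def forward_select(p: PipeRegs, src: int) -> int:
    """Mux select for one ALU operand that reads register `src`:
    0b00 = register file value from ID/EX, 0b10 = EX/MEM, 0b01 = MEM/WB."""
    # EX hazard: the instruction one ahead produces src.
    if p.ex_mem_reg_write and p.ex_mem_rd != 0 and p.ex_mem_rd == src:
        return 0b10
    # MEM hazard: only reached when there is no EX hazard on src.
    if p.mem_wb_reg_write and p.mem_wb_rd != 0 and p.mem_wb_rd == src:
        return 0b01
    return 0b00


def forwarding_unit(p: PipeRegs):
    """Return (ForwardA, ForwardB)."""
    return forward_select(p, p.id_ex_rs), forward_select(p, p.id_ex_rt)


# Third add of "add $1,$1,$2; add $1,$1,$3; add $1,$1,$4" is in EX:
# EX/MEM holds the second add, MEM/WB the first; both write $1.
regs = PipeRegs(True, 1, True, 1, id_ex_rs=1, id_ex_rt=4)
print(forwarding_unit(regs))  # (2, 0): ForwardA = 10, ForwardB = 00
```

Because the EX check includes EX/MEM.RegWrite, an instruction in MEM that does not write the register file (such as `sw`) never blocks a valid MEM/WB forward.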

Forwarding Elaboration
• Extra 2-1 mux needed for immediate instructions
[Figure: added mux (Fig. 6.33)]
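
The added mux sits after the forwarding mux on the ALU's second input. A minimal Python sketch of that path, assuming the same 2-bit ForwardB encoding used by the forwarding unit (00 register file, 10 EX/MEM, 01 MEM/WB); the function and argument names are illustrative.

```python
def alu_operand_b(forward_b: int, id_ex_rt_value: int, ex_mem_alu_result: int,
                  mem_wb_write_data: int, alu_src: int, sign_ext_imm: int) -> int:
    """Value presented to the ALU's second input."""
    # Forwarding mux: pick the freshest copy of register rt.
    if forward_b == 0b10:
        rt_value = ex_mem_alu_result
    elif forward_b == 0b01:
        rt_value = mem_wb_write_data
    else:
        rt_value = id_ex_rt_value
    # Added 2-1 mux (ALUSrc): immediate instructions use the offset instead.
    return sign_ext_imm if alu_src else rt_value


print(alu_operand_b(0b10, 7, 42, 99, alu_src=0, sign_ext_imm=-4))  # 42
print(alu_operand_b(0b10, 7, 42, 99, alu_src=1, sign_ext_imm=-4))  # -4
```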

Data Hazards and Stalls
• We still have to stall when a register is loaded from memory and used in the following instruction (Fig. 6.34)

Data Hazards and Stalls
• Add a hazard detection unit to detect this condition and stall (Fig. 6.35)
• Typo in the figure: should read AND

Pipelined Processor with Hazard Detection (Fig. 6.36)

Data Transfer Instructions
• Binary representation:
    op     | rs     | rt     | offset
    6 bits | 5 bits | 5 bits | 16 bits
• Used for load and store instructions
  - op: basic operation of the instruction (opcode)
  - rs: first register source operand (address base)
  - rt: second register operand (source for sw, destination for lw)
  - offset: 16-bit signed address offset (-32,768 to +32,767)
• Also called "I-Format" or "I-Type" instructions
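
The field layout can be checked with a small Python sketch that packs and unpacks I-format words. The helper names are illustrative; the opcodes 35 (`lw`) and 43 (`sw`) and the register numbers in the example ($s3 = 19, $t0 = 8, $sp = 29, $ra = 31) are the standard MIPS assignments.

```python
def encode_i(op: int, rs: int, rt: int, offset: int) -> int:
    """Pack an I-format word; offset must fit in 16-bit two's complement."""
    if not -32768 <= offset <= 32767:
        raise ValueError("offset out of range")
    return (op << 26) | (rs << 21) | (rt << 16) | (offset & 0xFFFF)


def decode_i(word: int):
    """Unpack (op, rs, rt, offset), sign-extending the 16-bit offset."""
    op = (word >> 26) & 0x3F
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    offset = word & 0xFFFF
    if offset & 0x8000:
        offset -= 0x10000
    return op, rs, rt, offset


LW, SW = 35, 43
word = encode_i(LW, rs=19, rt=8, offset=32)   # lw $t0, 32($s3)
print(hex(word))                              # 0x8e680020
print(decode_i(encode_i(SW, 29, 31, -4)))     # (43, 29, 31, -4)  sw $ra, -4($sp)
```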

Hazard Detection Unit - Control Detail
if (ID/EX.MemRead and
    ((ID/EX.RegisterRt = IF/ID.RegisterRs) or
     (ID/EX.RegisterRt = IF/ID.RegisterRt)))
  stall the pipeline
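
The stall condition translates almost directly into code. This Python sketch takes the four pipeline-register fields as plain arguments (the names are our own); like the slide's condition, it is conservative and may stall even when the instruction in ID does not actually read its rt field.

```python
def load_use_stall(id_ex_mem_read: bool, id_ex_rt: int,
                   if_id_rs: int, if_id_rt: int) -> bool:
    """True when a load in EX writes a register read by the instruction in ID."""
    return id_ex_mem_read and (id_ex_rt == if_id_rs or id_ex_rt == if_id_rt)


# lw $2, 20($1) in EX, and $4, $2, $5 in ID -> stall
print(load_use_stall(True, 2, 2, 5))   # True
# instruction in EX is not a load -> forwarding suffices, no stall
print(load_use_stall(False, 2, 2, 5))  # False
```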

Hazard Detection Unit - What Happens
• A mux zeros out the control signals for the instruction in ID
  - "squashes" the instruction
  - the resulting "no-op" propagates through the following stages
• IF/ID holds the stalled instruction until the next clock cycle
• PC holds its current value until the next clock cycle (the instruction in IF is fetched again)
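
The cost of one squashed instruction shows up in a rough cycle count. This Python sketch assumes full forwarding, so the only stall is the single bubble inserted when a load's result is used by the very next instruction; the `Instr` tuple and the example sequence are illustrative.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str
    is_load: bool
    dest: Optional[int]
    srcs: Tuple[int, ...]


def total_cycles(program: List[Instr]) -> int:
    """Cycles for a 5-stage pipeline to finish `program`, counting one
    bubble for each load immediately followed by a use of its result."""
    bubbles = 0
    for prev, cur in zip(program, program[1:]):
        if prev.is_load and prev.dest is not None and prev.dest in cur.srcs:
            bubbles += 1  # PC and IF/ID hold; a no-op goes down the pipeline
    # The first instruction takes 5 cycles; each later one adds 1 cycle.
    return 5 + (len(program) - 1) + bubbles


program = [
    Instr("lw $2, 20($1)", True, 2, (1,)),
    Instr("and $4, $2, $5", False, 4, (2, 5)),
    Instr("or $8, $2, $6", False, 8, (2, 6)),
    Instr("add $9, $4, $2", False, 9, (4, 2)),
    Instr("slt $1, $6, $7", False, 1, (6, 7)),
]
print(total_cycles(program))  # 10 (9 without the bubble)
```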

Branch Hazards
• Just stalling for each branch is not practical
• Common assumption: branch not taken
• When the assumption fails: flush three instructions (Fig. 6.37)

Reducing Branch Delay
• Key idea: move the branch logic to the ID stage of the pipeline
  - New adder calculates the branch target: PC + 4 + (sign-extend(IMM) << 2)
  - New hardware tests rs == rt after the register read
  - Add a flush signal to squash the instruction in the IF/ID register
• Reduced penalty (1 cycle) when the branch is taken
• Example: Figure 6.38, p. 420
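
The ID-stage branch logic amounts to an adder and an equality comparator. A Python sketch under those assumptions (function names are illustrative); the example uses a `beq` at address 40 with an immediate of 7, which targets 40 + 4 + 7 × 4 = 72.

```python
def branch_target(pc: int, imm16: int) -> int:
    """PC + 4 + (sign-extend(imm16) << 2), wrapped to 32 bits."""
    imm16 &= 0xFFFF
    imm = imm16 - 0x10000 if imm16 & 0x8000 else imm16  # sign-extend
    return (pc + 4 + (imm << 2)) & 0xFFFFFFFF


def beq_taken(rs_value: int, rt_value: int) -> bool:
    """Equality test performed right after the register read in ID."""
    return rs_value == rt_value


print(branch_target(40, 7))        # 72
print(branch_target(40, 0xFFFF))   # 40 (offset of -1 word)
```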

Pipelined Processor Branch Hardware in ID (Old Fig. 6.51)

Pipelining Outline
• Introduction
• Pipelined Processor Design
  - Datapath
  - Control
  - Dealing with Hazards & Forwarding
  - Branch Prediction
  - Exceptions
  - Performance
• Advanced Pipelining
  - Superscalar
  - Dynamic Pipelining
• Examples

Branch Prediction
• Key idea: instead of always assuming the branch is not taken, make a prediction based on previous history
• Branch history table: small memory
  - indexed using the lower bits of the instruction address
  - saves "what happened" on the last execution: branch taken OR branch not taken
• Use the history to make the prediction
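
A 1-bit branch history table fits in a few lines of Python. This is a sketch under two assumptions of ours: the table has 16 entries, and bits 1:0 of the address are skipped because MIPS instructions are word-aligned. The class and method names are illustrative.

```python
class BranchHistoryTable:
    """1-bit branch history table indexed by low-order address bits."""

    def __init__(self, index_bits: int = 4):
        self.mask = (1 << index_bits) - 1
        self.taken = [False] * (1 << index_bits)  # start as "not taken"

    def _index(self, pc: int) -> int:
        # Skip bits 1:0 (word-aligned instructions), keep the next index_bits.
        return (pc >> 2) & self.mask

    def predict(self, pc: int) -> bool:
        return self.taken[self._index(pc)]

    def update(self, pc: int, taken: bool) -> None:
        # Save only what happened on the last execution.
        self.taken[self._index(pc)] = taken


bht = BranchHistoryTable()
print(bht.predict(0x40))  # False
bht.update(0x40, True)
print(bht.predict(0x40))  # True
print(bht.predict(0x80))  # True: 0x80 maps to the same entry (no tags)
```

Because the table stores no address tags, two branches whose low address bits match share an entry and can disturb each other's predictions.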

More about Branch Prediction
• Consider nested loops:
    for (i=1; i<M; i++) {
      for (j=1; j<N; j++) {
        ...
      }
    }
  compiled as:
    oloop: ...
    iloop: ...
           bne $1, $2, iloop
           ...
           bne $3, $4, oloop
• The inner-loop branch is mispredicted on its first and last execution each time the inner loop runs
• More history can improve performance
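
The two mispredictions per inner loop can be counted directly. This Python sketch models only the inner-loop `bne` with a single 1-bit history entry that starts at "not taken"; the trip counts (5 outer, 10 inner iterations) are illustrative.

```python
def inner_branch_outcomes(outer_iters: int, inner_iters: int):
    """Outcomes of the inner-loop bne: taken except on the last inner iteration."""
    for _ in range(outer_iters):
        for j in range(inner_iters):
            yield j < inner_iters - 1


def count_mispredicts_1bit(outcomes) -> int:
    last = False  # assumed initial history: not taken
    misses = 0
    for taken in outcomes:
        if taken != last:  # prediction = last outcome
            misses += 1
        last = taken
    return misses


print(count_mispredicts_1bit(inner_branch_outcomes(5, 10)))  # 10
```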

Branch Prediction with 2-Bit History
• Key idea: the prediction must be wrong twice before it changes
[Figure: four-state diagram with two "Predict taken" and two "Predict not taken" states, transitions labeled Taken / Not taken (Fig. 6.39)]
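
A 2-bit saturating counter is one common way to implement the "wrong twice" rule; the exact transitions in Fig. 6.39 may differ slightly. This Python sketch reruns the same illustrative inner-loop pattern (5 outer, 10 inner iterations) and, starting from "strongly not taken", mispredicts 7 times, versus 10 for a 1-bit history under the same starting assumption.

```python
def inner_branch_outcomes(outer_iters: int, inner_iters: int):
    """Outcomes of the inner-loop bne: taken except on the last inner iteration."""
    for _ in range(outer_iters):
        for j in range(inner_iters):
            yield j < inner_iters - 1


def count_mispredicts_2bit(outcomes, start: int = 0) -> int:
    """Saturating counter: states 0-1 predict not taken, 2-3 predict taken."""
    state = start
    misses = 0
    for taken in outcomes:
        if (state >= 2) != taken:
            misses += 1
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses


print(count_mispredicts_2bit(inner_branch_outcomes(5, 10)))  # 7
```

After the first pass, the single not-taken exit only moves the counter from 3 to 2, so the next pass still starts with a "taken" prediction and misses only once.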


Pipelining and Exceptions
• Exceptions require suspension of execution
• Complicating factors
  - Several instructions are in the pipeline
  - An exception may occur before the instruction is complete
  - Must flush the pipeline to suspend execution, but may lose information about the exception

Pipelining and Exceptions (cont’d) (Fig. 6.42, old 6.55)

Performance of the Pipelined Implementation
• Use the "gcc" instruction mix to calculate CPI:

    Instruction   Frequency   Cycles
    lw            25%         1 (2 when load-use hazard)
    sw            10%         1
    R-type        52%         1
    branch        11%         1 (2 when prediction wrong)
    jump           2%         2

• Assumptions:
  - 50% of load instructions are followed by an immediate use
  - 25% of branch predictions are wrong
• Calculating CPI:
  - CPI = (1.5 cycles * 0.25) + (1 cycle * 0.10) + (1 cycle * 0.52) + (1.25 cycles * 0.11) + (2 cycles * 0.02)
  - CPI = 0.375 + 0.10 + 0.52 + 0.1375 + 0.04 = 1.1725 ≈ 1.17 cycles per instruction
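
The same arithmetic as a short Python sketch. The mix, the hazard fractions, and the per-class cycle counts are the slide's; the only modeling step is averaging each hazard into its class (1 cycle plus the fraction of cases that take a second cycle), as the CPI formula on this slide does.

```python
# Instruction mix and hazard assumptions from the slide.
mix = {"lw": 0.25, "sw": 0.10, "R-type": 0.52, "branch": 0.11, "jump": 0.02}
load_use_fraction = 0.50    # loads immediately followed by a use
mispredict_fraction = 0.25  # wrong branch predictions

# Average cycles per class: 1 cycle, plus 1 extra in the stalled cases.
cycles = {
    "lw": 1 + load_use_fraction,
    "sw": 1,
    "R-type": 1,
    "branch": 1 + mispredict_fraction,
    "jump": 2,
}

cpi = sum(mix[k] * cycles[k] for k in mix)
print(round(cpi, 4))  # 1.1725
```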

Performance of the Pipelined Implementation (cont’d)
• Calculate the average execution time per instruction:
    Pipelined:     1.17 CPI * 200 ps/clock = 234 ps
    Single-cycle:  1 CPI * 600 ps/clock    = 600 ps
    Multicycle:    4.12 CPI * 200 ps/clock = 824 ps
• Speedup of the pipelined implementation:
  - 2.56X faster than single-cycle (600 / 234)
  - 3.52X faster than multicycle (824 / 234)
• CPI may differ as the instruction mix changes, i.e., depending on the performance benchmarks
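
The comparison can be checked with a few lines of Python; the CPI values and clock periods are the slide's, and the speedup is the ratio of average time per instruction.

```python
cpi = {"pipelined": 1.17, "single-cycle": 1.0, "multicycle": 4.12}
clock_ps = {"pipelined": 200, "single-cycle": 600, "multicycle": 200}

# Average execution time per instruction = CPI * clock period.
time_ps = {k: cpi[k] * clock_ps[k] for k in cpi}

for k in ("single-cycle", "multicycle"):
    speedup = time_ps[k] / time_ps["pipelined"]
    print(f"pipelined is {speedup:.2f}x faster than {k}")
```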