
CpE 442: Designing a Pipeline Processor (Lecture II)
CPE 442 hazards. Introduction to Computer Architecture

Outline of Today’s Lecture
° Recap and Introduction (5 minutes)
° Introduction to Hazards (15 minutes)
° Forwarding (25 minutes)
° 1-cycle Load Delay (5 minutes)
° 1-cycle Branch Delay (15 minutes)
° What makes pipelining hard
° Summary (5 minutes)

Review: Single Cycle, Multiple Cycle, vs. Pipeline
[Figure: timing diagram over clock cycles 1 to 10 comparing three implementations.
  Single Cycle Implementation: Load, Store, Waste.
  Multiple Cycle Implementation: Load (Ifetch, Reg, Exec, Mem, Wr), Store (Ifetch, Reg, Exec, Mem), R-type (Ifetch, ...).
  Pipeline Implementation: Load, Store, and R-type overlap, each starting one cycle after the previous one and passing through Ifetch, Reg, Exec, Mem, Wr.]
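As a rough sketch of the comparison in the diagram, the cycle counts for the three-instruction sequence can be tallied as below. The step counts (5 for Load, 4 for Store and R-type) are read off the figure labels, the function names are illustrative only, and the counts say nothing about cycle length, which differs between the designs.

```python
# Steps per instruction class in the multiple-cycle design, as read off the
# diagram (Load uses all five steps; Store and R-type use four).
MULTICYCLE_STEPS = {"load": 5, "store": 4, "rtype": 4}


def multicycle_cycles(program):
    """Total clock cycles when each instruction finishes before the next starts."""
    return sum(MULTICYCLE_STEPS[kind] for kind in program)


def pipelined_cycles(program, stages=5):
    """Total clock cycles in an ideal pipeline (no hazards assumed):
    fill the pipe once, then one instruction completes per cycle."""
    if not program:
        return 0
    return stages + len(program) - 1


program = ["load", "store", "rtype"]
print(multicycle_cycles(program))  # 5 + 4 + 4 = 13
print(pipelined_cycles(program))   # 5 + 3 - 1 = 7
```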

Review: A Pipelined Datapath
[Figure: five-stage datapath (Ifetch, Reg/Dec, Exec, Mem, Wr) separated by the IF/ID, ID/Ex, Ex/Mem, and Mem/Wr registers. Components: IUnit with PC, RFile (Ra, Rb, Rw, Di; busA, busB), Exec Unit with Zero output, Data Mem (RA, WA, Di, Do), and multiplexors. Control signals: ExtOp, ALUSrc, ALUOp, RegDst, MemWr, Branch, MemtoReg, RegWr. PC+4, Imm16, Rs, Rt, and Rd are carried along in the pipeline registers.]

Review: Pipeline Control “Data Stationary Control”
° The Main Control generates the control signals during Reg/Dec
  • Control signals for Exec (ExtOp, ALUSrc, ...) are used 1 cycle later
  • Control signals for Mem (MemWr, Branch) are used 2 cycles later
  • Control signals for Wr (MemtoReg, RegWr) are used 3 cycles later
[Figure: Main Control sits in the Reg/Dec stage. ExtOp, ALUSrc, ALUOp, and RegDst are consumed in Exec; MemWr and Branch travel through the ID/Ex and Ex/Mem registers to Mem; MemtoReg and RegWr travel through ID/Ex, Ex/Mem, and Mem/Wr to Wr.]
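The idea that control signals travel with their instruction can be sketched as follows. This is a minimal model, not the lecture's hardware: the grouping of signals by stage follows the slide, while the function name `clock`, the dictionary layout, and the control values for the load are assumptions made for illustration.

```python
# Control signals grouped by the stage that consumes them (names from the slide).
EXEC_SIGNALS = ("ExtOp", "ALUSrc", "ALUOp", "RegDst")
MEM_SIGNALS = ("MemWr", "Branch")
WR_SIGNALS = ("MemtoReg", "RegWr")


def clock(pipe, decoded):
    """Advance one clock edge.

    `pipe` maps each pipeline register to the control word it holds (or None).
    `decoded` is the control word Main Control produced in Reg/Dec this cycle.
    Each register keeps only the signals still needed downstream.
    """
    def keep(word, names):
        return None if word is None else {n: word[n] for n in names}

    return {
        "Mem/Wr": keep(pipe["Ex/Mem"], WR_SIGNALS),
        "Ex/Mem": keep(pipe["ID/Ex"], MEM_SIGNALS + WR_SIGNALS),
        "ID/Ex": keep(decoded, EXEC_SIGNALS + MEM_SIGNALS + WR_SIGNALS),
    }


# Hypothetical control word for a load; the values are illustrative only.
load_ctrl = {"ExtOp": 1, "ALUSrc": 1, "ALUOp": "add", "RegDst": 0,
             "MemWr": 0, "Branch": 0, "MemtoReg": 1, "RegWr": 1}

pipe = {"ID/Ex": None, "Ex/Mem": None, "Mem/Wr": None}
pipe = clock(pipe, load_ctrl)  # 1 cycle after decode: Exec signals available
pipe = clock(pipe, None)       # 2 cycles after decode: Mem signals available
pipe = clock(pipe, None)       # 3 cycles after decode: Wr signals available
print(pipe["Mem/Wr"])          # {'MemtoReg': 1, 'RegWr': 1}
```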

Review: Pipeline Summary
° Pipeline Processor:
  • Natural enhancement of the multiple clock cycle processor
  • Each functional unit can only be used once per instruction
  • If an instruction is going to use a functional unit:
    - it must use it at the same stage as all other instructions
  • Pipeline Control:
    - Each stage’s control signal depends ONLY on the instruction that is currently in that stage


Introduction to Hazards
° Limits to pipelining: Hazards prevent the next instruction from executing during its designated clock cycle
  • structural hazards: HW cannot support this combination of instructions
  • data hazards: instruction depends on result of prior instruction still in the pipeline
  • control hazards: pipelining of branches & other instructions that change the PC
° Common solution is to stall the pipeline until the hazard is resolved, inserting one or more “bubbles” in the pipeline

A Single Memory is a Structural Hazard
[Figure: pipeline diagram, time (clock cycles) across and instruction order down: Load, Instr 1, Instr 2, Instr 3, Instr 4, each passing through Mem, Reg, ALU, Mem, Reg. With one memory, Load's data access and a later instruction's fetch need the memory in the same clock cycle.]

Option 1: Stall to resolve Memory Structural Hazard
[Figure: pipeline diagram, time (clock cycles) across and instruction order down: Load, Instr 1, Instr 2, Instr 3 (stall), Instr 4. A bubble is inserted at Instr 3 so that no instruction fetch coincides with Load's data memory access.]
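A small sketch of this stall rule, under the assumption of a five-stage pipeline in which an instruction fetched in cycle c reaches its memory stage in cycle c + 3. The function name `fetch_cycles` and the opcode strings are illustrative, not part of the lecture.

```python
def fetch_cycles(program, mem_ops=("lw", "sw")):
    """Return the clock cycle in which each instruction is fetched.

    Assumes ONE memory shared by instruction fetch and data access: an
    instruction fetched in cycle c uses the memory again in cycle c + 3
    if it is a load or store.
    """
    busy = set()      # cycles in which the memory is taken by a data access
    cycles = []
    cycle = 0
    for op in program:
        cycle += 1
        while cycle in busy:   # memory busy: the fetch waits, a bubble enters the pipe
            cycle += 1
        cycles.append(cycle)
        if op in mem_ops:
            busy.add(cycle + 3)
    return cycles


# Load followed by four instructions that do not touch data memory.
print(fetch_cycles(["lw", "add", "sub", "and", "or"]))  # [1, 2, 3, 5, 6]
```

The fourth instruction is fetched in cycle 5 instead of cycle 4, which is the single bubble shown in the diagram.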

Option 2: Duplicate to Resolve Structural Hazard
• Separate Instruction Cache (Im) & Data Cache (Dm)
[Figure: pipeline diagram, time (clock cycles) across and instruction order down: Load, Instr 1, Instr 2, Instr 3, Instr 4, each passing through Im, Reg, ALU, Dm, Reg.]

Data Hazard on r1
  add r1, r2, r3
  sub r4, r1, r3
  and r6, r1, r7
  or  r8, r1, r9
  xor r10, r1, r11

Data Hazard on r1: (Figure 6.30, page 397, P&H)
• Dependencies backwards in time are hazards
[Figure: pipeline diagram, time (clock cycles) across and instruction order down, stages IF, ID/RF, EX, MEM, WB, for the add / sub / and / or / xor sequence.]
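The dependences in this sequence can be found mechanically. The sketch below is illustrative only: the tuple encoding and the name `raw_hazards` are assumptions, and the xor is assumed to read r1 like the other instructions. `window=3` models a pipeline where a result can be read only after write back; if a register read during a write returns the new value, the window shrinks to 2.

```python
def raw_hazards(program, window=3):
    """List (producer_index, consumer_index, register) for every
    read-after-write dependence that falls within `window` instructions.

    Each instruction is (op, dest, sources).
    """
    hazards = []
    for i, (_, dest, _) in enumerate(program):
        for j in range(i + 1, min(i + 1 + window, len(program))):
            if dest in program[j][2]:
                hazards.append((i, j, dest))
            if program[j][1] == dest:
                break   # a later write to the same register hides the earlier one
    return hazards


program = [
    ("add", "r1", ("r2", "r3")),
    ("sub", "r4", ("r1", "r3")),
    ("and", "r6", ("r1", "r7")),
    ("or",  "r8", ("r1", "r9")),
    ("xor", "r10", ("r1", "r11")),   # assumed to read r1, like the others
]
print(raw_hazards(program))            # [(0, 1, 'r1'), (0, 2, 'r1'), (0, 3, 'r1')]
print(raw_hazards(program, window=2))  # [(0, 1, 'r1'), (0, 2, 'r1')]
```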

Option 1: HW Stalls to Resolve Data Hazard
• Dependencies backwards in time are hazards
[Figure: pipeline diagram (IF, ID/RF, EX, MEM, WB) for the add / sub / and / or / xor sequence, with a bubble delaying the instructions that depend on r1.]

But recall use of “Data Stationary Control”
° The Main Control generates the control signals during Reg/Dec
  • Control signals for Exec (ExtOp, ALUSrc, ...) are used 1 cycle later
  • Control signals for Mem (MemWr, Branch) are used 2 cycles later
  • Control signals for Wr (MemtoReg, RegWr) are used 3 cycles later

Option 1: How HW really stalls pipeline
• HW doesn’t change PC => keeps fetching same instruction & sets control signals to benign values (0)
[Figure: pipeline diagram (IF, ID/RF, EX, MEM, WB): add r1, r2, r3, then three stall rows filled with bubbles, then sub r4, r1, r3 and and r6, r1, r7.]

Option 2: SW inserts independent instructions
• Worst case inserts NOP instructions
[Figure: pipeline diagram (IF, ID/RF, EX, MEM, WB): add r1, r2, r3; nop; nop; nop; sub r4, r1, r3; and r6, r1, r7.]
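The worst case on this slide, padding with NOPs, can be sketched as a simple pass over the instruction list. The tuple encoding and the name `insert_nops` are assumptions for illustration; `gap=3` matches the three nops drawn in the diagram, and a real compiler would first try to move independent instructions into the gap.

```python
NOP = ("nop", None, ())


def insert_nops(program, gap=3):
    """Return a copy of `program` padded with NOPs so that at least `gap`
    instructions separate a register write from any later read of it.

    Each instruction is (op, dest, sources).
    """
    out = []
    for instr in program:
        needed = 0
        recent = out[-gap:] if gap > 0 else []
        # Look back over the last `gap` emitted instructions for a producer.
        for distance, prev in enumerate(reversed(recent), start=1):
            if prev[1] is not None and prev[1] in instr[2]:
                needed = max(needed, gap - distance + 1)
        out.extend([NOP] * needed)
        out.append(instr)
    return out


program = [
    ("add", "r1", ("r2", "r3")),
    ("sub", "r4", ("r1", "r3")),
    ("and", "r6", ("r1", "r7")),
]
for op, *_ in insert_nops(program):
    print(op)   # add, nop, nop, nop, sub, and (one per line)
```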


Option 3 Insight: Data is available! (Figure 6.35, page 415, P&H)
• Pipeline registers already contain needed data
[Figure: pipeline diagram (IF, ID/RF, EX, MEM, WB) for the add / sub / and / or / xor sequence.]

HW Change for “Forwarding” (Bypassing):
• Increase multiplexors to add paths from pipeline registers
• Assumes register read during write gets new value (otherwise more results to be forwarded)
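The select logic for one of the added multiplexors might look like the sketch below. It follows the usual textbook formulation rather than anything shown on the slide: the function name, the dictionary fields `RegWr` and `Rd`, and the returned labels are assumptions, and register 0 is excluded on the assumption that it is hardwired to zero as in MIPS.

```python
def forward_select(src_reg, ex_mem, mem_wr):
    """Choose where one ALU operand should come from.

    src_reg : register number the instruction in Exec wants to read
    ex_mem, mem_wr : dicts with 'RegWr' (bool) and 'Rd' (destination
        register number) for the instructions one and two stages ahead

    Returns 'Ex/Mem', 'Mem/Wr', or 'REGFILE'.
    """
    # Newest result first: the instruction immediately ahead wins.
    if ex_mem["RegWr"] and ex_mem["Rd"] != 0 and ex_mem["Rd"] == src_reg:
        return "Ex/Mem"
    if mem_wr["RegWr"] and mem_wr["Rd"] != 0 and mem_wr["Rd"] == src_reg:
        return "Mem/Wr"
    # Older results are already visible, given the assumption that a
    # register read during a write returns the new value.
    return "REGFILE"


# sub r4, r1, r3 is in Exec while add r1, r2, r3 sits one stage ahead.
ex_mem = {"RegWr": True, "Rd": 1}
mem_wr = {"RegWr": False, "Rd": 0}
print(forward_select(1, ex_mem, mem_wr))  # Ex/Mem
print(forward_select(3, ex_mem, mem_wr))  # REGFILE
```

Checking the Ex/Mem register before the Mem/Wr register matters when both hold a write to the same register: the more recent value is the one the program order requires.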

Complete Data Path with Hazard Detection and Forwarding
[Figure 6.41 in the text]


From Last Lecture: The Delay Load Phenomenon
[Figure: clock cycles 1 to 8. I0: Load passes through Ifetch, Reg/Dec, Exec, Mem, Wr in Cycles 1 to 5; the following instructions, Plus 1 to Plus 4, each start one cycle later.]
° Although Load is fetched during Cycle 1:
  • The data is NOT written into the Reg File until the end of Cycle 5
  • We cannot read this value from the Reg File until Cycle 6
  • 3-instruction delay before the load takes effect

Forwarding reduces Data Hazard to 1 cycle: (Figure 6.47, page 420, P&H)
[Figure: pipeline diagram (IF, ID/RF, EX, MEM, WB): lw r1, 0(r2); sub r4, r1, r6; and r6, r1, r7; or r8, r1, r9.]

Option 1: HW Stalls to Resolve Data Hazard
• “Interlock”: checks for hazard & stalls
[Figure: pipeline diagram (IF, ID/RF, EX, MEM, WB): lw r1, 0(r2); stall (bubble); sub r4, r1, r3; and r6, r1, r7; or r8, r1, r9.]
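A sketch of the interlock check, assuming forwarding handles every other case so that only a load followed immediately by a use needs a stall. The field and signal names (`MemRead`, `PCWrite`, `IF/ID_Write`, `ZeroControl`) are assumptions in the style of the textbook, not names taken from these slides.

```python
def load_use_stall(id_ex, if_id):
    """True when the instruction in Reg/Dec must wait one cycle.

    id_ex : dict with 'MemRead' (bool) and 'Rt' (load destination) for the
            instruction now in Exec
    if_id : dict with 'Rs' and 'Rt', the source registers being decoded
    """
    return id_ex["MemRead"] and id_ex["Rt"] in (if_id["Rs"], if_id["Rt"])


def stall_actions(stall):
    """Control outputs for the cycle: on a stall, hold the PC and the fetched
    instruction, and zero the control signals so the bubble does nothing."""
    return {"PCWrite": not stall, "IF/ID_Write": not stall, "ZeroControl": stall}


# lw r1, 0(r2) is in Exec while sub r4, r1, r3 is being decoded.
stall = load_use_stall({"MemRead": True, "Rt": 1}, {"Rs": 1, "Rt": 3})
print(stall)                 # True
print(stall_actions(stall))  # {'PCWrite': False, 'IF/ID_Write': False, 'ZeroControl': True}
```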

Option 2: SW inserts independent instructions
• Worst case inserts NOP instructions
• MIPS I solution: No HW checking
[Figure: pipeline diagram (IF, ID/RF, EX, MEM, WB): lw r1, 0(r2); nop; sub r4, r1, r3; and r6, r1, r7; or r8, r1, r9.]

Software Scheduling to Avoid Load Hazards
Try producing fast code for
  a = b + c;
  d = e - f;
assuming a, b, c, d, e, and f in memory.

Slow code:
  LW  Rb, b
  LW  Rc, c
  ADD Ra, Rb, Rc
  SW  a, Ra
  LW  Re, e
  LW  Rf, f
  SUB Rd, Re, Rf
  SW  d, Rd

  Slow code:            Fast code:
  LW  Rb, b             LW  Rb, b
  LW  Rc, c             LW  Rc, c
  ADD Ra, Rb, Rc        LW  Re, e
  SW  a, Ra             ADD Ra, Rb, Rc
  LW  Re, e             LW  Rf, f
  LW  Rf, f             SW  a, Ra
  SUB Rd, Re, Rf        SUB Rd, Re, Rf
  SW  d, Rd             SW  d, Rd
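The benefit of the reordering can be checked by counting load-use stalls in each version. This is a sketch under the 1-cycle load delay discussed above; the tuple encoding and the name `load_stalls` are assumptions, and the fast sequence used here is the ordering with LW Re moved above the ADD and LW Rf moved above the first SW.

```python
def load_stalls(program):
    """Count one-cycle stalls caused by using a loaded register in the very
    next instruction (1-cycle load delay, forwarding assumed for the rest).

    Each instruction is (op, dest, sources); SW has no destination register.
    """
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        if prev[0] == "LW" and prev[1] in cur[2]:
            stalls += 1
    return stalls


slow = [
    ("LW", "Rb", ()), ("LW", "Rc", ()), ("ADD", "Ra", ("Rb", "Rc")),
    ("SW", None, ("Ra",)), ("LW", "Re", ()), ("LW", "Rf", ()),
    ("SUB", "Rd", ("Re", "Rf")), ("SW", None, ("Rd",)),
]
fast = [
    ("LW", "Rb", ()), ("LW", "Rc", ()), ("LW", "Re", ()),
    ("ADD", "Ra", ("Rb", "Rc")), ("LW", "Rf", ()), ("SW", None, ("Ra",)),
    ("SUB", "Rd", ("Re", "Rf")), ("SW", None, ("Rd",)),
]
print(load_stalls(slow), load_stalls(fast))  # 2 0
```

In the slow code, ADD follows LW Rc and SUB follows LW Rf directly, so each waits one cycle; in the fast code an independent instruction sits after every load.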


From Last Lecture: The Delay Branch Phenomenon
[Figure: clock cycles 4 to 11. 12: Beq (target is 1000) is fetched in Cycle 4; the R-type instructions at addresses 16, 20, and 24 are fetched in Cycles 5 to 7; 1000: Target of Br is fetched in Cycle 8. Each instruction passes through Ifetch, Reg/Dec, Exec, Mem, Wr.]
° Although Beq is fetched during Cycle 4:
  • Target address is NOT written into the PC until the end of Cycle 7
  • Branch’s target is NOT fetched until Cycle 8
  • 3-instruction delay before the branch takes effect

Control Hazard on Branches: 3 stage stall

Branch Stall Impact
° If CPI = 1, 30% branch, Stall 3 cycles => new CPI = 1.9!
° 2 part solution:
  • Determine branch taken or not sooner, AND
  • Compute taken branch address earlier
° Solution Option 1:
  • Move Zero test to ID/RF stage
  • Adder to calculate new PC in ID/RF stage
  • 1 clock cycle penalty for branch vs. 3
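The arithmetic behind the 1.9 figure, and the effect of cutting the penalty to one cycle, can be reproduced with a one-line model. The function name is illustrative, and the model assumes every branch pays the full penalty.

```python
def cpi_with_branch_stalls(base_cpi, branch_fraction, stall_cycles):
    """Average CPI when every branch costs `stall_cycles` extra cycles."""
    return base_cpi + branch_fraction * stall_cycles


print(round(cpi_with_branch_stalls(1, 0.30, 3), 2))  # 1.9
print(round(cpi_with_branch_stalls(1, 0.30, 1), 2))  # 1.3
```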

Option 1: move HW forward to reduce branch delay
Data Path before change
[Figure: datapath with stages Instruction Fetch; Instr. Decode / Reg. Fetch; Execute / Addr. Calc.; Memory Access; Write Back.]

Branch Delay now 1 clock cycle
Data Path after change
[Figure: datapath with stages Instruction Fetch; Instr. Decode / Reg. Fetch; Execute / Addr. Calc.; Memory Access; Write Back.]

Option 2: No Stalls, Define Branch as Delayed
Insert an instruction after the branch and allow it to execute always.
° Worst case, SW inserts NOP into the branch delay slot if no instruction can be found
° Where to get instructions to fill branch delay slot?
  • Before branch instruction, example:
      sw r1, 0(r2); beqd r0, r2, T
    change to
      beqd r0, r2, T; sw r1, 0(r2)
  • From the target address: only valuable when the branch is taken
  • From fall through: only valuable when the branch is not taken
° Compiler effectiveness for single branch delay slot:
  • Fills about 60% of branch delay slots
  • About 80% of instructions executed in branch delay slots useful in computation
  • About 50% (60% x 80%) of slots usefully filled
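The last bullet is a product of the two rates above it. The sketch below reproduces it and then, as an illustration only, estimates the resulting CPI; the 30% branch frequency is borrowed from the branch-stall slide, and charging one wasted cycle per slot that is not usefully filled is an assumption, not a figure from the lecture.

```python
fill_rate = 0.60     # fraction of delay slots the compiler fills
useful_rate = 0.80   # fraction of filled slots that do useful work

usefully_filled = fill_rate * useful_rate
print(round(usefully_filled, 2))   # 0.48, i.e. about 50%

# Assumptions for illustration: 30% branches, and one wasted cycle for every
# delay slot that is not usefully filled.
branch_fraction = 0.30
cpi = 1 + branch_fraction * (1 - usefully_filled)
print(round(cpi, 3))               # 1.156
```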

Complete Data Path with Hazard Detection and Forwarding
[Figure 6.41 in the text]

Example: Text Figure 6.52


When is pipelining hard?
° Interrupts: 5 instructions executing in 5 stage pipeline
  • How to stop the pipeline?
  • Restart?
  • Who caused the interrupt?

  Stage   Problem interrupts occurring
  IF      Page fault on instruction fetch; misaligned memory access; memory-protection violation
  ID      Undefined or illegal opcode
  EX      Arithmetic interrupt
  MEM     Page fault on data fetch; misaligned memory access; memory-protection violation

° Load with data page fault, Add with instruction page fault?
° Solution 1: interrupt vector/instruction
° Solution 2: interrupt ASAP, restart everything incomplete
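The Load/Add question comes down to two different orderings of the same faults. The sketch below is an interpretation, not the lecture's design: it reads Solution 1 as recording each instruction's fault and acting on it in program order, and the record layout and function names are assumptions.

```python
def first_in_time(faults):
    """Fault the hardware sees first: the one raised in the earliest cycle."""
    return min(faults, key=lambda f: f["cycle"])


def first_in_program_order(faults):
    """Fault that must be handled first for a clean restart: the one that
    belongs to the earliest instruction in program order."""
    return min(faults, key=lambda f: f["instr"])


# Load is instruction 0 (fetched in cycle 1), Add is instruction 1 (cycle 2).
faults = [
    {"instr": 0, "op": "load", "stage": "MEM", "cycle": 4,
     "cause": "data page fault"},
    {"instr": 1, "op": "add", "stage": "IF", "cycle": 2,
     "cause": "instruction page fault"},
]
print(first_in_time(faults)["op"])           # add
print(first_in_program_order(faults)["op"])  # load
```

The Add's fault is raised two cycles before the Load's even though the Load comes first in the program, which is why the pipeline cannot simply act on whichever fault appears first.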

Data Path with Exception Handling, Text Figure 6.55
Add a Cause register, an Exception PC, and the constant address of the exception handling routine.

Review: Summary of Pipelining Basics
° Speed Up ≤ Pipeline Depth (number of stages); if ideal CPI is 1, then:
    Speedup = Pipeline Depth / (1 + Pipeline stall CPI) x (Clock Cycle Unpipelined / Clock Cycle Pipelined)
° Hazards limit performance on computers:
  • structural: need more HW resources
  • data: need forwarding, compiler scheduling
  • control: early evaluation & PC, delayed branch, prediction
° Increasing length of pipe increases impact of hazards since pipelining helps instruction bandwidth, not latency
° Compilers key to reducing cost of data and control hazards
  • load delay slots
  • branch delay slots
° Exceptions, Instruction Set, FP make pipelining harder
° Longer pipelines => Branch prediction, more instruction parallelism?