Appendix C
• Pipeline implementation
• Pipeline hazards, detection and forwarding
• Multiple-cycle operations
• MIPS R4000
CDA 5155 Fall 2014, Peir / University of Florida

Limits of Pipelining
• Increasing the number of pipeline stages in a given logic block by a factor of n generally allows increasing clock speed and throughput by a factor of almost n.
  – Usually less than n because of overheads such as latch delay and imperfectly balanced stage delays.
• But pipelining has a natural limit:
  – At least 1 layer of logic gates per pipeline stage!
  – The practical minimum is usually several gates (2-10).
  – Commercial designs are approaching this point!
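
The "almost n" claim can be illustrated with a rough model. This is only a sketch: the function name, the perfectly balanced stages, and the fixed per-stage latch overhead are assumptions made for illustration, not figures from the slides.

```python
def clock_speedup(total_logic_delay_ns, n_stages, latch_overhead_ns):
    """Clock-rate speedup of an n-stage pipeline over the unpipelined block.

    Assumes the logic splits into perfectly balanced stages and that every
    stage (including the single unpipelined one) pays the same latch overhead.
    """
    if n_stages < 1:
        raise ValueError("n_stages must be at least 1")
    unpipelined_cycle = total_logic_delay_ns + latch_overhead_ns
    pipelined_cycle = total_logic_delay_ns / n_stages + latch_overhead_ns
    return unpipelined_cycle / pipelined_cycle


if __name__ == "__main__":
    # Hypothetical numbers: 10 ns of logic, 0.5 ns of latch overhead per stage.
    for n in (1, 5, 10, 20):
        print(n, round(clock_speedup(10.0, n, 0.5), 2))
```

As the stage count grows, the fixed latch overhead takes up a larger share of each cycle, so the speedup falls further below n.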

Simple RISC Datapath [figure]

Basic RISC Pipelining
• Basic idea:
  – Each instruction spends 1 clock cycle in each of the 5 execution stages.
  – During 1 clock cycle, the pipeline can be processing (different stages of) 5 different instructions.
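
A small sketch of the resulting timing diagram, assuming an ideal pipeline with no stalls. The stage names IF/ID/EX/MEM/WB are the usual 5-stage labels; the function name is made up for this example.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(n_instructions):
    """Return one row per instruction; column c holds the stage occupied in cycle c.

    Assumes an ideal pipeline: instruction i enters IF in cycle i and
    advances one stage per cycle with no stalls.
    """
    n_cycles = n_instructions + len(STAGES) - 1
    rows = []
    for i in range(n_instructions):
        row = [""] * n_cycles
        for s, name in enumerate(STAGES):
            row[i + s] = name
        rows.append(row)
    return rows


if __name__ == "__main__":
    for i, row in enumerate(pipeline_diagram(5)):
        print(f"i{i}: " + " ".join(f"{cell:>3}" for cell in row))
```

With five instructions, the fifth column of the diagram shows all five stages busy at once, each working on a different instruction.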

Adding Pipeline Registers [figure]

Pipeline Hazards
• Hazards are circumstances which may lead to stalls (delays, “bubbles”) in the pipeline if not addressed.
• Three major types:
  – Structural hazards
    • Lack of HW resources to keep all instructions moving.
  – Data hazards
    • Data results of earlier instructions not yet available when needed.
  – Control hazards
    • Control decisions resulting from earlier instructions (branches) not yet made; don’t know which new instructions to execute.

Structural Hazard Example
Suppose you had a combined instruction+data memory with only 1 read port. [figure]
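
A minimal sketch of where the conflicts would fall, assuming the classic 5-stage timing (instruction i is fetched in cycle i and reaches MEM in cycle i + 3) and no other stalls. The function name and the input encoding are hypothetical.

```python
def memory_port_conflicts(is_load):
    """List (cycle, load_index, fetch_index) where a load's MEM stage and a
    later instruction's IF stage both need the single read port.

    is_load[i] is True when instruction i reads data memory.
    Assumes instruction i is in IF in cycle i and in MEM in cycle i + 3.
    """
    conflicts = []
    for i, load in enumerate(is_load):
        fetch_index = i + 3  # instruction being fetched while i is in MEM
        if load and fetch_index < len(is_load):
            conflicts.append((i + 3, i, fetch_index))
    return conflicts


if __name__ == "__main__":
    # A load followed by four other instructions: the fourth instruction's
    # fetch collides with the load's data access.
    print(memory_port_conflicts([True, False, False, False, False]))
```

Each reported conflict is a cycle in which the fetch would have to stall for one cycle, inserting a bubble.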

Hazards Produce “Bubbles” [figure]

Another View [figure]

Example Data Hazard [figure]

Forwarding for Data Hazards [figure]

Another Forwarding Example [figure]

Three Types of Data Hazards
• Let i be an earlier instruction, j a later one.
• RAW (read after write)
  – j tries to read a value before i writes it.
• WAW (write after write)
  – i and j write to the same place, but in the wrong order.
  – Only occurs if >1 pipeline stage can write.
• WAR (write after read)
  – j writes a new value to a location before i has read the old one.
  – Only occurs if writes can happen before reads in the pipeline.
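
The three definitions can be sketched as a register-name comparison between i and j. This only finds the dependences that could become hazards; whether one actually causes a stall depends on the pipeline. The function name and argument layout are assumptions for illustration.

```python
def classify_dependences(i_dest, i_srcs, j_dest, j_srcs):
    """Return the potential hazard types between earlier instruction i and later j.

    i_dest / j_dest: destination register name, or None if nothing is written.
    i_srcs / j_srcs: iterable of source register names.
    """
    found = []
    if i_dest is not None and i_dest in j_srcs:
        found.append("RAW")  # j reads what i writes
    if i_dest is not None and i_dest == j_dest:
        found.append("WAW")  # both write the same register
    if j_dest is not None and j_dest in i_srcs:
        found.append("WAR")  # j overwrites what i still has to read
    return found


if __name__ == "__main__":
    # lw r2, 0(r4) followed by add r1, r2, r3
    print(classify_dependences("r2", ["r4"], "r1", ["r2", "r3"]))
```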

An Unavoidable Stall - Load [figure]

Stalling for Load Dependent [figure]

Data Hazard Prevention
• A clever compiler can often reschedule instructions (code motion) to avoid a stall.
  – A simple example:
    • Original code:
        lw  r2, 0(r4)
        add r1, r2, r3    <- Note: stall happens here!
        lw  r5, 4(r4)
    • Transformed code:
        lw  r2, 0(r4)
        lw  r5, 4(r4)
        add r1, r2, r3    <- No stall needed!
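
A rough check of the example, assuming a 5-stage pipeline with full forwarding, where the only remaining data stall is one cycle when an instruction immediately follows a load and reads its result. The tuple encoding of instructions is invented for this sketch.

```python
def count_load_use_stalls(program):
    """Count one-cycle load-use stalls in a straight-line program.

    program: list of (opcode, dest_register, source_registers).
    Assumes full forwarding, so a stall is needed only when the instruction
    directly after a lw reads the register that lw loads.
    """
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_srcs = cur
        if prev_op == "lw" and prev_dest in cur_srcs:
            stalls += 1
    return stalls


original = [
    ("lw", "r2", ["r4"]),
    ("add", "r1", ["r2", "r3"]),
    ("lw", "r5", ["r4"]),
]
# Code motion: move the independent second load between the first load and its use.
transformed = [original[0], original[2], original[1]]

if __name__ == "__main__":
    print(count_load_use_stalls(original), count_load_use_stalls(transformed))
```

The reordering is legal here because the second lw neither reads nor writes any register the add uses.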

MIPS Instruction Format [figure]

5-Stage Pipeline [figure]

Operations of Pipe Stages [figure]

Data Hazard Detection [figure]

Hazard Detection Logic for Load
• Note: the right-hand part of the equation should be IF/ID.IR (Fig. C.25).
• Example: detecting whether an instruction that has just been fetched needs to be stalled because of a dependence on a preceding load.
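
A sketch of the load interlock check, written with the pipeline-register field names that appear on the hazard-hardware slide (ID/EX.MemRead, ID/EX.RegisterRt). The register-0 exclusion and the conservative treatment of the rt field are assumptions added here, not taken from the slides.

```python
def must_stall_for_load(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Return True if the instruction in IF/ID must stall one cycle.

    id_ex_mem_read: True when the instruction in ID/EX is a load.
    id_ex_rt:       register number the load will write.
    if_id_rs/rt:    source register fields of the instruction just fetched.

    Conservative: it checks rt even for instructions that do not read rt.
    Register 0 is skipped because it always reads as zero in MIPS.
    """
    return bool(
        id_ex_mem_read
        and id_ex_rt != 0
        and id_ex_rt in (if_id_rs, if_id_rt)
    )


if __name__ == "__main__":
    # lw r2, 0(r4) in ID/EX; add r1, r2, r3 in IF/ID
    print(must_stall_for_load(True, 2, 2, 3))
```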

Forwarding Situations in MIPS
Same as Figure C.26

Forwarding to the ALU
Provide multiple paths to the input of the ALU. [figure]

Datapath with Forwarding Hardware
[figure: 5-stage datapath; the Forward Unit takes ID/EX.RegisterRs, ID/EX.RegisterRt, EX/MEM.RegisterRd and MEM/WB.RegisterRd as inputs]
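
The selection made by the forward unit for one ALU operand might look like the following sketch. The register fields are the ones labelled in the figure; the RegWrite flags and the string return values are assumptions for illustration, and a real design would drive mux select lines instead.

```python
def forward_select(src_reg, ex_mem_reg_write, ex_mem_rd, mem_wb_reg_write, mem_wb_rd):
    """Choose where one ALU operand should come from.

    src_reg: ID/EX.RegisterRs or ID/EX.RegisterRt of the instruction in EX.
    ex_mem_reg_write / mem_wb_reg_write: assumed flags saying whether the
    older instruction writes a register at all.

    The EX/MEM result is newer than the MEM/WB result, so it is checked first.
    Register 0 is never forwarded because it always reads as zero in MIPS.
    """
    if ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == src_reg:
        return "EX/MEM"
    if mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == src_reg:
        return "MEM/WB"
    return "REGFILE"


if __name__ == "__main__":
    # Both older instructions wrote r1: the most recent value wins.
    print(forward_select(1, True, 1, True, 1))
```

The same function would be evaluated twice per cycle, once for the Rs operand and once for the Rt operand.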

Adding the Hazard Hardware
[figure: datapath with a Hazard Unit added; its inputs include ID/EX.MemRead and ID/EX.RegisterRt]

Branch Hazard
• Suppose the new PC value is not computed until the MEM stage.
• Then we must stall 3 clocks after every branch!
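
To get a feel for the cost, here is a back-of-the-envelope CPI sketch. The 3-cycle stall is the figure from the slide; the 20% branch frequency, the base CPI of 1, and the function name are assumptions chosen only for illustration.

```python
def cpi_with_branch_stalls(branch_fraction, stall_cycles, base_cpi=1.0):
    """Average CPI when every branch costs a fixed number of stall cycles."""
    if not 0.0 <= branch_fraction <= 1.0:
        raise ValueError("branch_fraction must be between 0 and 1")
    return base_cpi + branch_fraction * stall_cycles


if __name__ == "__main__":
    # Hypothetical workload in which 20% of instructions are branches.
    print(cpi_with_branch_stalls(0.20, 3))  # branch resolved in MEM
    print(cpi_with_branch_stalls(0.20, 1))  # branch resolved in ID
```

Under these assumed numbers, the 3-cycle stall raises CPI to about 1.6, while resolving the branch in ID (a 1-cycle stall) gives about 1.2.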

Early Branch Resolution
Branch resolution at ID stage [figure]

Predict-Not-Taken (Branch resolves in ID)
Same as Fig. C.12

Delayed Branches
• Machine code sequence:
  – Branch instruction
  – Delay slot instruction(s)
  – (Branch is taken, if taken, at this point)
  – Post-branch instructions
• Same as Fig. C.13

Filling the Branch-Delay Slot
• For cases (b) and (c), there must be no side effects!
• Note: dynamic branch prediction will be covered in Chap. 3.

Multi-Cycle Execution
Figure C.33: The MIPS pipeline with three additional unpipelined, floating-point, functional units.

Latency & Initiation Interval
• Latency:
  – Extra delay cycles before result is available.
• Initiation interval:
  – Minimum number of cycles before a new input can be given to that functional unit.
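
The two definitions can be sketched as follows. The function names are made up, and the example values (latency 3 / interval 1 for a pipelined FP add, latency 24 / interval 25 for an unpipelined FP divide) are assumed typical values, not numbers stated on the slide.

```python
def earliest_start_cycles(request_cycles, initiation_interval):
    """Given the cycles at which operations would like to enter one functional
    unit, return the cycles at which they can actually start."""
    starts = []
    next_free = 0
    for want in request_cycles:
        start = max(want, next_free)
        starts.append(start)
        next_free = start + initiation_interval
    return starts


def first_dependent_cycle(start_cycle, latency):
    """Earliest cycle a dependent operation can start without stalling.

    Latency 0 means the very next cycle is fine; each extra cycle of
    latency pushes the dependent operation back by one.
    """
    return start_cycle + 1 + latency


if __name__ == "__main__":
    print(earliest_start_cycles([0, 1, 2], 1))   # pipelined unit
    print(earliest_start_cycles([0, 1, 2], 25))  # unpipelined divider
    print(first_dependent_cycle(0, 3))
```

A fully pipelined unit has an initiation interval of 1 regardless of its latency, while an unpipelined unit is busy for the whole operation.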

Pipelined Multiple-FP Operations
Figure C.35: A pipeline that supports multiple outstanding FP operations.

Pipelining FP Instructions
• Notice instructions may complete out-of-order:
  – MULTD:  IF ID M1 M2 M3 M4 M5 M6 M7 ME WB
  – ADDD:   IF ID A1 A2 A3 A4 ME WB
  – LD, SD: IF ID EX ME WB
• Raises the possibility of WAW hazards, and structural hazards in MEM & WB stages.
• Structural hazards may occur especially often with non-pipelined DIV unit.
• Out-of-order completion impacts exception handling.
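
A sketch of the out-of-order completion, counting stages from the listing above and assuming one instruction is fetched per cycle with no stalls. The dictionary and function names are invented for this example.

```python
# Number of pipeline stages each instruction passes through, from IF to WB.
STAGE_COUNT = {"MULTD": 11, "ADDD": 8, "LD": 5, "SD": 5}


def writeback_cycles(ops):
    """Return (op, WB cycle) pairs for ops fetched in consecutive cycles.

    Assumes the k-th op (0-based) is in IF during cycle k + 1 and never
    stalls, so it reaches WB in cycle k + STAGE_COUNT[op].
    """
    return [(op, k + STAGE_COUNT[op]) for k, op in enumerate(ops)]


def completion_order(ops):
    """Ops sorted by the cycle in which they reach WB."""
    return [op for op, _ in sorted(writeback_cycles(ops), key=lambda p: p[1])]


if __name__ == "__main__":
    program = ["MULTD", "ADDD", "LD"]
    print(writeback_cycles(program))
    print(completion_order(program))
```

For this three-instruction sequence the load finishes first and the multiply last, the reverse of program order.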

Issues in Multi-Cycle Operations
• Stall for RAW is longer and more frequent (Fig. C.37)
• WAW is possible; WAR is not (why?)
• Structural hazard possible for non-pipelined unit
• Multiple WBs are likely (Fig. C.38)
• Handling hazards
  – At Issue (ID) stage:
    • Check structural hazards: functional unit, WB port
    • Check RAW hazards: issue with forwarding
    • Check WAW hazards: do not issue, to make sure writes happen in order
  – Detect and stall instruction before MEM and WB stages
• More uniform handling given in Chapter 3.
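
The write-back port check could be sketched as below. It assumes a single register write port shared by all instructions, one fetch per cycle, and no stalls; the stage counts follow the MULTD/ADDD/LD listing from the earlier slide, and all names are hypothetical.

```python
from collections import defaultdict

# Stages from IF to WB for each instruction type.
STAGES_TO_WB = {"MULTD": 11, "ADDD": 8, "LD": 5}


def wb_port_conflicts(ops):
    """Map each cycle in which more than one instruction reaches WB to the
    list of instructions competing for the write port.

    Assumes the k-th op (0-based) reaches WB in cycle k + STAGES_TO_WB[op]
    if nothing stalls.
    """
    by_cycle = defaultdict(list)
    for k, op in enumerate(ops):
        by_cycle[k + STAGES_TO_WB[op]].append(op)
    return {cycle: names for cycle, names in by_cycle.items() if len(names) > 1}


if __name__ == "__main__":
    # ADDD issued three cycles after MULTD reaches WB in the same cycle.
    print(wb_port_conflicts(["MULTD", "LD", "LD", "ADDD"]))
```

An issue-stage check along these lines would hold the later instruction in ID for a cycle whenever its projected WB cycle is already claimed.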

Maintaining Precise Exception
• Settle for imprecise exception
• Buffer and complete in order
  – Require large buffers and comparators
  – History file, future file approaches
• Software trap handling when exception occurs
• Hybrid scheme: issue only when certain that no earlier instruction will raise an exception
  – All instructions before can be completed
  – No instructions after can be completed

Real MIPS R4000 Pipeline
• IF, IS - Instruction cache fetch, First & Second halves.
• RF - Inst. decode, Register Fetch, hazard check…
• EX - Execution (EA calc, ALU op, target calc…)
• DF, DS - Data cache access, First & Second halves.
• TC - Tag Check, did cache access hit? Note, use data before resolving hit/miss.
• WB - Write-Back for loads & register-register ops.
Read through C.43 – C.51

2-Cycle Load Delay [figure]

Branch Delay [figure]