The Processor
Lecture 3.4: Pipelining Datapath and Control

Learning Objectives
- Name the five stages of the pipelined processor
- Explain what each stage does
- Calculate the total CPU times for single-cycle implementation and pipelined implementation
- Specify how the datapath components and control signals are distributed among 5 pipeline stages
- Understand that the instruction a pipeline stage works on is decided by the content of the pipeline register in front of the stage
- Calculate the total length (i.e., the number of bits) of each pipeline register
- Determine the content of a pipeline register

Coverage
- Chapters 4.5 & 4.6

Introduction to Pipelining Design
- Chapter 4.5

Instruction Critical Paths
- What is the clock cycle time, assuming negligible delays for muxes, control unit, sign extension, PC access, shift left 2, wires, and setup and hold times, except for:
  - Instruction Memory and Data Memory (200 ps)
  - ALU and adders (200 ps)
  - Register File access (reads or writes) (100 ps)

All values in ps:

Instr.   I Mem  Reg Rd  ALU Op  D Mem  Reg Wr  Total
R-type   200    100     200            100     600
load     200    100     200     200    100     800
store    200    100     200     200            700
beq      200    100     200                    500
jump     200                                   200
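The totals in the critical-path table can be reproduced by summing the delays of the units each instruction class uses. The following is a minimal sketch, not part of the original slides; the names DELAY_PS, PATHS, and latency_ps are illustrative, and the unit list for each class is assumed from the slide's table.

```python
# Component delays in picoseconds, as stated on the slide.
DELAY_PS = {"IMem": 200, "RegRd": 100, "ALU": 200, "DMem": 200, "RegWr": 100}

# Units on the critical path of each instruction class (assumed from the table).
PATHS = {
    "R-type": ["IMem", "RegRd", "ALU", "RegWr"],
    "load":   ["IMem", "RegRd", "ALU", "DMem", "RegWr"],
    "store":  ["IMem", "RegRd", "ALU", "DMem"],
    "beq":    ["IMem", "RegRd", "ALU"],
    "jump":   ["IMem"],
}

def latency_ps(instr):
    """Sum the delays of every unit the instruction passes through."""
    return sum(DELAY_PS[unit] for unit in PATHS[instr])

latencies = {instr: latency_ps(instr) for instr in PATHS}

# A single-cycle design must fit the slowest instruction in one clock cycle.
single_cycle_cc_ps = max(latencies.values())

print(latencies)
print(single_cycle_cc_ps)  # 800 (set by load)
```

The clock cycle is set by load, the only class that uses all five units.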

Single Cycle Disadvantages & Advantages
- Uses the clock cycle inefficiently: the clock cycle must be timed to accommodate the slowest instruction
  - especially problematic for more complex instructions like floating-point multiplication
- Is slow, but
- Is simple and easy to understand
[Figure: timing diagram with Clk, lw in Cycle 1 and sw in Cycle 2, with a region labeled Waste]

How Can We Make It Faster?
- Start fetching and executing the next instruction before the current one has completed
  - Pipelining: modern processors are pipelined for performance
  - Remember the performance equation: CPU time = IC × CPI × CC
- Under ideal conditions and with a large number of instructions, the speedup from pipelining is approximately equal to the number of pipe stages
  - A five-stage pipeline is nearly five times faster because the CC is nearly five times shorter
    - CPI = 1 for single-cycle implementation
    - CPI ≈ 1 for pipelined implementation
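The claim that the speedup approaches the number of stages can be checked numerically. This is a rough sketch under two assumptions that the slide does not state explicitly: the stages are perfectly balanced (so the single-cycle CC equals k stage times), and a k-stage pipeline needs n + k - 1 cycles to run n instructions. The function names are illustrative.

```python
def cpu_time(ic, cpi, cc):
    """Performance equation: CPU time = IC x CPI x CC."""
    return ic * cpi * cc

def ideal_speedup(n, k):
    """Speedup of a balanced k-stage pipeline over single-cycle for n instructions.

    Time is measured in units of one pipeline stage time.
    """
    # Single-cycle: CPI = 1, and one clock cycle spans all k stages.
    single = cpu_time(n, 1, k)
    # Pipelined: n + k - 1 cycles of one stage time each (assumed fill model).
    pipelined = (n + k - 1) * 1
    return single / pipelined

for n in (10, 1_000, 1_000_000):
    print(n, ideal_speedup(n, 5))
```

For small n the fill time dominates and the speedup is well below 5; as n grows the ratio n·k / (n + k - 1) tends to k.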

Analogy: Assembly Line vs. Mechanic Shop
- Mechanic Shop
  - The mechanic needs to do everything
  - It takes hours to fix just one car
- Car assembly line
  - Many workers work together
    - Each worker just puts one or a few components into the car
  - One assembly line can produce hundreds or thousands of cars per day

The Five Stages of Executing Instruction
[Figure: lw across Cycle 1 to Cycle 5: IFetch, Dec, Exec, Mem, WB]
- IFetch: Instruction Fetch and Update PC
- Dec: Registers Fetch and Instruction Decode
- Exec: Execute R-type; calculate memory address; etc.
- Mem: Read/write the data from/to the Data Memory
- WB: Write the result data into the register file

Why Pipeline? For Performance!
[Figure: Inst 1 to Inst 5 in instruction order against time (clock cycles), each passing through IM, Reg, ALU, DM, Reg; the initial cycles are marked as the time to fill the pipeline]
- Once the pipeline is full, one instruction is completed every cycle, so CPI = 1

A Pipelined MIPS Processor
- Start the next instruction before the current one has completed
  - improves throughput: total amount of work done in a given time
  - instruction latency (execution time, delay time, response time from the start of an instruction to its completion) is not reduced
    - clock cycle (pipeline stage time) is limited by the slowest stage
    - some stages don't need the whole clock cycle (e.g., WB)
[Figure: lw, sw, and R-type overlapped across Cycle 1 to Cycle 8, each going through IFetch, Dec, Exec, Mem, WB]

Single Cycle versus Pipeline
[Figure: Single Cycle Implementation (CC = 800 ps): Clk with lw in Cycle 1 and sw in Cycle 2. Pipeline Implementation (CC = 200 ps): lw, sw, and R-type overlapped, each going through IFetch, Dec, Exec, Mem, WB. Labels include Waste and 400 ps.]
- To complete an entire instruction in the pipelined case takes 1000 ps (as compared to 800 ps for the single cycle case). Why?
- How long does each take to complete 1,000 instrs?

Ideal CPU Time of Pipelined Execution
CPU time = Number of clock cycles × Clock period
Number of clock cycles = N + K - 1
- N: total number of instructions
- K: number of pipeline stages
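Applying this formula to the question on the previous slide (how long 1,000 instructions take) might look like the sketch below. The 800 ps and 200 ps clock cycles and the five stages come from the slides; the function and constant names are illustrative, and hazards and stalls are ignored.

```python
SINGLE_CYCLE_CC_PS = 800  # single-cycle clock cycle from the slides
PIPELINE_CC_PS = 200      # pipelined clock cycle from the slides
STAGES = 5                # K

def pipelined_cycles(n, k=STAGES):
    """Ideal cycle count: K cycles for the first instruction, then one per instruction."""
    return n + k - 1

def single_cycle_time_ps(n):
    """Single-cycle: one 800 ps cycle per instruction (CPI = 1)."""
    return n * SINGLE_CYCLE_CC_PS

def pipelined_time_ps(n, k=STAGES):
    """CPU time = number of clock cycles x clock period."""
    return pipelined_cycles(n, k) * PIPELINE_CC_PS

n = 1000
t_single = single_cycle_time_ps(n)
t_pipe = pipelined_time_ps(n)
print(t_single)           # 800000 ps
print(t_pipe)             # 200800 ps
print(t_single / t_pipe)  # a little under 4
```

With n = 1 the pipelined time is 5 × 200 = 1000 ps, which is the per-instruction latency quoted on the previous slide. The speedup here tends to 4 rather than 5 because the stages are not balanced: 800 ps / 200 ps = 4.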

Pipelining the MIPS ISA
- What makes it easy
  - all instructions are the same length (32 bits)
    - can fetch in the 1st stage and decode in the 2nd stage
  - few instruction formats (three)
  - memory operations occur only in loads and stores
    - can use the execute stage to calculate memory addresses
  - each instruction writes at most one result (i.e., changes the machine state) and does it in the last few pipeline stages (MEM or WB)
  - operands must be aligned in memory so a single data transfer takes only one data memory access
- Only cover the following 8 instructions as an example
  - lw, sw, add, sub, and, or, slt, beq

Pipelined Datapath
- Chapter 4.6, page 286

MIPS Pipelined Datapath
[Figure: MIPS pipelined datapath]

Pipeline Registers
- Need registers between stages
  - Hold information produced in previous cycle

Single-clock-cycle diagram
- Cycle-by-cycle flow of instructions through the pipelined datapath
- "Single-clock-cycle" pipeline diagram
  - Show pipeline usage in a single cycle
  - Highlight resources used in each cycle
- We will look at "single-clock-cycle" diagrams for load & store instructions

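IF for Load & Store
[Figure: single-clock-cycle diagram]

ID for Load & Store
[Figure: single-clock-cycle diagram]

EX for Load & Store
[Figure: single-clock-cycle diagram]

MEM for Load
[Figure: single-clock-cycle diagram]

MEM for Store
[Figure: single-clock-cycle diagram]

WB for Load
[Figure: single-clock-cycle diagram, annotated "Wrong register number"]

Corrected Pipelined Datapath
[Figure: corrected pipelined datapath]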

MIPS Pipeline Datapath
- State registers between each pipeline stage to isolate them
[Figure: five-stage datapath (IF: IFetch, ID: Dec, EX: Execute, MEM: MemAccess, WB: WriteBack) separated by the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers. Components shown: PC, Instruction Memory, adders, Shift left 2, Register File, Sign Extend (16 to 32 bits), ALU, Data Memory, System Clock.]
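One of the learning objectives is to calculate the length of each pipeline register. The sketch below sums only the data fields each register latches. The field list is an assumption based on the usual textbook five-stage MIPS datapath, since the slide's figure is not reproduced here; control bits and the destination register number carried in the corrected datapath are deliberately left out, so the real registers are wider.

```python
# Data fields latched by each pipeline register, in bits.
# Assumed field list; control signals and register numbers are NOT included.
PIPELINE_REGISTER_FIELDS = {
    "IF/ID":  {"PC+4": 32, "instruction": 32},
    "ID/EX":  {"PC+4": 32, "read_data_1": 32, "read_data_2": 32, "sign_ext_imm": 32},
    "EX/MEM": {"branch_target": 32, "zero": 1, "alu_result": 32, "read_data_2": 32},
    "MEM/WB": {"mem_read_data": 32, "alu_result": 32},
}

def register_width(name):
    """Total number of data bits held by the named pipeline register."""
    return sum(PIPELINE_REGISTER_FIELDS[name].values())

widths = {name: register_width(name) for name in PIPELINE_REGISTER_FIELDS}
print(widths)
```

Under these assumptions the data widths come out to 64, 128, 97, and 64 bits for IF/ID, ID/EX, EX/MEM, and MEM/WB respectively.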

Graphically Representing MIPS Pipeline
[Figure: one instruction drawn as IM, Reg, ALU, DM, Reg]
- Can help with answering questions like:
  - How many cycles does it take to execute this code?
  - What is the ALU doing during cycle 4?
  - Is there a hazard, why does it occur, and how can it be fixed?

Multi-Cycle Pipeline Diagram
- Showing the resource usage

Multi-Cycle Pipeline Diagram
- Traditional form
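A traditional-form diagram labels each cell with the stage name and shifts every instruction one cycle to the right. The sketch below generates such a diagram as text; it assumes an ideal pipeline with no stalls, and the function names and the three-instruction example are illustrative rather than taken from the slide's figure.

```python
STAGES = ["IFetch", "Dec", "Exec", "Mem", "WB"]

def pipeline_diagram(instrs):
    """Return (name, cells) rows; instruction i enters IFetch in cycle i + 1."""
    total_cycles = len(instrs) + len(STAGES) - 1
    rows = []
    for i, name in enumerate(instrs):
        cells = [""] * total_cycles
        for s, stage in enumerate(STAGES):
            cells[i + s] = stage  # stage s of instruction i runs in cycle i + s + 1
        rows.append((name, cells))
    return rows

def render(rows):
    """Format the rows as fixed-width text with a CC1, CC2, ... header."""
    n_cycles = len(rows[0][1])
    header = " " * 8 + "".join(f"CC{c + 1}".ljust(8) for c in range(n_cycles))
    lines = [header.rstrip()]
    for name, cells in rows:
        lines.append((name.ljust(8) + "".join(c.ljust(8) for c in cells)).rstrip())
    return "\n".join(lines)

rows = pipeline_diagram(["lw", "sw", "add"])
print(render(rows))
```

Reading down a column answers questions such as what the ALU is doing in a given cycle: in cycle 4 of this example, the Exec stage belongs to sw.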

Pipeline Control
- Chapter 4.6, page 300

Pipelined Control
[Figure: pipelined datapath with control signals]

Pipelined Control Signals
- Control signals derived from instructions
  - As in single-cycle implementation

Pipeline Control
- IF Stage: read Instr Memory (always asserted) and write PC (on System Clock)
- ID Stage: no control signals to set

        EX Stage                        MEM Stage                WB Stage
Instr.  RegDst  ALUOp1  ALUOp0  ALUSrc  Brch  MemRead  MemWrite  RegWrite  MemtoReg
R       1       1       0       0       0     0        0         1         0
lw      0       0       0       1       0     1        0         1         1
sw      X       0       0       1       0     0        1         0         X
beq     X       0       1       0       1     0        0         0         X
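Because each group of control signals is consumed in a later stage, the signals travel through the pipeline registers and are dropped once used. The sketch below encodes the control table as a lookup and counts the control bits each pipeline register still carries. The signal values are the standard textbook settings for these four instruction classes and should be treated as an assumption where the slide's table is unreadable; all names are illustrative.

```python
# Control signals grouped by the stage that consumes them.
STAGE_SIGNALS = {
    "EX":  ("RegDst", "ALUOp1", "ALUOp0", "ALUSrc"),
    "MEM": ("Branch", "MemRead", "MemWrite"),
    "WB":  ("RegWrite", "MemtoReg"),
}

# One character per signal, in the order above; "X" means don't care.
CONTROL = {
    "R-type": {"EX": "1100", "MEM": "000", "WB": "10"},
    "lw":     {"EX": "0001", "MEM": "010", "WB": "11"},
    "sw":     {"EX": "X001", "MEM": "001", "WB": "0X"},
    "beq":    {"EX": "X010", "MEM": "100", "WB": "0X"},
}

# Stages whose signals are still pending when an instruction sits in each register.
PENDING = {
    "ID/EX":  ("EX", "MEM", "WB"),
    "EX/MEM": ("MEM", "WB"),
    "MEM/WB": ("WB",),
}

def signals_for(instr, stage):
    """Map each signal name of a stage to its value for the instruction."""
    return dict(zip(STAGE_SIGNALS[stage], CONTROL[instr][stage]))

def control_bits(register):
    """Number of control bits a pipeline register must carry."""
    return sum(len(STAGE_SIGNALS[stage]) for stage in PENDING[register])

print(signals_for("lw", "MEM"))
print({reg: control_bits(reg) for reg in PENDING})
```

Under this grouping ID/EX carries 9 control bits, EX/MEM carries 5, and MEM/WB carries 2.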

MIPS Pipeline Control Path Modifications
[Figure: pipelined datapath with the control path modifications]