The Processor Lecture 3.4: Pipelining Datapath and Control

  • Slides: 34
The Processor Lecture 3.4: Pipelining Datapath and Control

Learning Objectives
• Name the five stages of the pipelined processor
• Explain what each stage does
• Calculate the total CPU times for single-cycle implementation and pipelined implementation
• Specify how the datapath components and control signals are distributed among the 5 pipeline stages
• Understand that the instruction a pipeline stage works on is decided by the content of the pipeline register in front of the stage
• Calculate the total length (i.e., the number of bits) of each pipeline register
• Determine the content of a pipeline register

Coverage
• Chapters 4.5 & 4.6

• Introduction to Pipelining Design
  - Chapter 4.5

Instruction Critical Paths
• What is the clock cycle time assuming negligible delays for muxes, control unit, sign extension, PC access, shift left 2, wires, setup and hold times except:
  - Instruction Memory and Data Memory (200 ps)
  - ALU and adders (200 ps)
  - Register File access (reads or writes) (100 ps)

Instr.   I Mem   Reg Rd   ALU Op   D Mem   Reg Wr   Total
R-type   200     100      200              100      600
load     200     100      200      200     100      800
store    200     100      200      200              700
beq      200     100      200                       500
jump     200                                        200
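
The per-instruction totals on this slide can be checked with a short sketch. It sums the delays of the functional units each instruction class uses and takes the maximum as the single-cycle clock period. The dictionary and function names are illustrative, not from the lecture.

```python
# Functional-unit delays from the slide, in picoseconds.
DELAY_PS = {"I Mem": 200, "Reg Rd": 100, "ALU Op": 200, "D Mem": 200, "Reg Wr": 100}

# Units each instruction class passes through, as listed in the slide's table.
UNITS_USED = {
    "R-type": ["I Mem", "Reg Rd", "ALU Op", "Reg Wr"],
    "load":   ["I Mem", "Reg Rd", "ALU Op", "D Mem", "Reg Wr"],
    "store":  ["I Mem", "Reg Rd", "ALU Op", "D Mem"],
    "beq":    ["I Mem", "Reg Rd", "ALU Op"],
    "jump":   ["I Mem"],
}


def critical_path_ps(instr):
    """Sum the delays of every unit the instruction class uses."""
    return sum(DELAY_PS[unit] for unit in UNITS_USED[instr])


totals = {instr: critical_path_ps(instr) for instr in UNITS_USED}

# A single-cycle design must fit the slowest instruction in one clock cycle.
single_cycle_cc_ps = max(totals.values())

for instr, total in totals.items():
    print(f"{instr:7s} {total} ps")
print("single-cycle clock period:", single_cycle_cc_ps, "ps")
```

The maximum is set by the load instruction, which is the only class that uses all five units.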

Single Cycle Disadvantages & Advantages
• Uses the clock cycle inefficiently: the clock cycle must be timed to accommodate the slowest instruction
  - especially problematic for more complex instructions like floating-point multiplication
  - Is slow
[Figure: Clk over Cycle 1 and Cycle 2; lw fills its cycle, sw leaves part of its cycle marked "Waste"]
but
• Is simple and easy to understand

How Can We Make It Faster?
• Start fetching and executing the next instruction before the current one has completed
  - Pipelining: modern processors are pipelined for performance
  - Remember the performance equation: CPU time = IC × CPI × CC
• Under ideal conditions and with a large number of instructions, the speedup from pipelining is approximately equal to the number of pipe stages
  - A five stage pipeline is nearly five times faster because the CC is nearly five times shorter
    - CPI = 1 for single-cycle implementation
    - CPI ≈ 1 for pipelined implementation
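
A minimal sketch of why the speedup only approaches the number of stages for large instruction counts. It assumes perfectly balanced stages (so the single-cycle clock period is K times the pipelined one), no hazards or stalls, and K - 1 extra cycles to fill the pipeline. The function name is illustrative.

```python
def ideal_speedup(n_instructions, k_stages):
    """Speedup of a K-stage pipeline over a single-cycle design.

    ASSUMPTIONS (not from the slide): balanced stages, no stalls.
    Times are measured in units of one pipelined clock cycle.
    """
    single_cycle_time = n_instructions * k_stages    # N cycles, each K units long
    pipelined_time = n_instructions + k_stages - 1   # K - 1 cycles to fill, then 1 per instruction
    return single_cycle_time / pipelined_time


for n in (1, 10, 1000, 1_000_000):
    print(n, round(ideal_speedup(n, 5), 4))
```

With a single instruction there is no gain at all; the ratio climbs toward 5 as the fill time becomes negligible next to the instruction count.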

Analogy: Assembly Line vs. Mechanic Shop
• Mechanic Shop
  - The mechanic needs to do everything
  - It takes hours to fix just one car
• Car assembly line
  - Many workers work together
    - Each worker just puts one or a few components into the car
  - One assembly line can produce hundreds or thousands of cars per day

The Five Stages of Executing an Instruction

      Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5
lw    IFetch   Dec      Exec     Mem      WB

• IFetch: Instruction Fetch and Update PC
• Dec: Registers Fetch and Instruction Decode
• Exec: Execute R-type; calculate memory address; etc.
• Mem: Read/write the data from/to the Data Memory
• WB: Write the result data into the register file

Why Pipeline? For Performance!
[Figure: Inst 1 to Inst 5 (instruction order) plotted against time (clock cycles), each using IM, Reg, ALU, DM, Reg; annotation: "Time to fill the pipeline"]
• Once the pipeline is full, one instruction is completed every cycle, so CPI = 1

A Pipelined MIPS Processor
• Start the next instruction before the current one has completed
  - improves throughput: total amount of work done in a given time
  - instruction latency (execution time, delay time, response time from the start of an instruction to its completion) is not reduced

         Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7  Cycle 8
lw       IFetch   Dec      Exec     Mem      WB
sw                IFetch   Dec      Exec     Mem      WB
R-type                     IFetch   Dec      Exec     Mem      WB

  - clock cycle (pipeline stage time) is limited by the slowest stage
    - some stages don't need the whole clock cycle (e.g., WB)

Single Cycle versus Pipeline
Single Cycle Implementation (CC = 800 ps):
[Figure: Clk over Cycle 1 and Cycle 2; lw, then sw followed by "Waste"; label "400 ps"]

Pipeline Implementation (CC = 200 ps):
lw       IFetch   Dec      Exec     Mem      WB
sw                IFetch   Dec      Exec     Mem      WB
R-type                     IFetch   Dec      Exec     Mem      WB

• To complete an entire instruction in the pipelined case takes 1000 ps (as compared to 800 ps for the single cycle case). Why?
• How long does each take to complete 1,000 instrs?
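
One way to work through the first question is the sketch below. It assumes each stage's delay equals the delay of its main functional unit from the critical-path slide (IFetch is the instruction memory, Dec the register read, and so on); that mapping is an assumption made for illustration.

```python
# ASSUMED per-stage delays in ps, taken from the functional-unit delays
# on the critical-path slide (memory and ALU 200 ps, register file 100 ps).
STAGE_PS = {"IFetch": 200, "Dec": 100, "Exec": 200, "Mem": 200, "WB": 100}

# Every stage gets the same clock period, so the slowest stage sets it.
pipelined_cc_ps = max(STAGE_PS.values())

# An instruction spends one full clock cycle in each of the five stages.
pipelined_latency_ps = pipelined_cc_ps * len(STAGE_PS)

# In the single-cycle design a load only needs the sum of the unit delays.
single_cycle_latency_ps = sum(STAGE_PS.values())

# Time each stage sits idle inside its clock cycle.
idle_ps = {stage: pipelined_cc_ps - t for stage, t in STAGE_PS.items()}

print(pipelined_cc_ps, pipelined_latency_ps, single_cycle_latency_ps)
print(idle_ps)
```

Under these assumptions the extra 200 ps of latency comes entirely from the Dec and WB stages, each of which is padded from 100 ps to the full 200 ps cycle.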

Ideal CPU Time of Pipelined Execution
CPU time = Number of clock cycles × Clock period
• N: total number of instructions
• K: pipeline stages
Number of clock cycles = N + K - 1
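
Applying the formula to the lecture's numbers (800 ps single-cycle clock, 200 ps pipelined clock, 5 stages, 1,000 instructions) gives the sketch below. The function names are illustrative, and the pipelined figure is the ideal case with no stalls.

```python
def pipelined_cycles(n_instructions, k_stages):
    """Ideal cycle count: K - 1 cycles to fill the pipeline, then one per instruction."""
    return n_instructions + k_stages - 1


def cpu_time_ps(clock_cycles, clock_period_ps):
    """CPU time = number of clock cycles x clock period."""
    return clock_cycles * clock_period_ps


N, K = 1000, 5

# Single-cycle: one 800 ps cycle per instruction (CPI = 1).
single_cycle_ps = cpu_time_ps(N, 800)

# Pipelined: N + K - 1 cycles of 200 ps each.
pipelined_ps = cpu_time_ps(pipelined_cycles(N, K), 200)

speedup = single_cycle_ps / pipelined_ps
print(single_cycle_ps, pipelined_ps, round(speedup, 3))
```

The speedup lands just under 4 rather than 5 because the stages are not balanced: the clock period shrinks from 800 ps to 200 ps, a factor of four.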

Pipelining the MIPS ISA
• What makes it easy
  - all instructions are the same length (32 bits)
    - can fetch in the 1st stage and decode in the 2nd stage
  - few instruction formats (three)
  - memory operations occur only in loads and stores
    - can use the execute stage to calculate memory addresses
  - each instruction writes at most one result (i.e., changes the machine state) and does it in the last few pipeline stages (MEM or WB)
  - operands must be aligned in memory so a single data transfer takes only one data memory access
• Only cover the following 8 instructions as an example
  - lw, sw, add, sub, and, or, slt, beq

• Pipelined Datapath
  - Chapter 4.6, page 286

MIPS Pipelined Datapath

Pipeline Registers
• Need registers between stages
  - Hold information produced in previous cycle

Single-clock-cycle diagram
• Cycle-by-cycle flow of instructions through the pipelined datapath
• "Single-clock-cycle" pipeline diagram
  - Show pipeline usage in a single cycle
  - Highlight resources used in each cycle
• We will look at "single-clock-cycle" diagrams for load & store instructions

IF for Load & Store

ID for Load & Store

EX for Load & Store

MEM for Load

MEM for Store

WB for Load
• Wrong register number: the write register number is taken from the IF/ID pipeline register, which by this cycle holds a later instruction

Corrected Pipelined Datapath

MIPS Pipeline Datapath
• State registers between each pipeline stage to isolate them
[Figure: pipelined datapath with stages IF: IFetch, ID: Dec, EX: Execute, MEM: MemAccess, WB: WriteBack, separated by the pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB; components shown: PC, Instruction Memory, Register File, Sign Extend (16 to 32 bits), Shift left 2, adders, ALU, Data Memory, System Clock]
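
The learning objective about pipeline register lengths can be approached by listing what each register has to hold and adding up the widths. The field lists below are an assumption: they follow the usual textbook five-stage MIPS datapath with the destination register number and control bits carried forward, and the lecture's own figure may differ in detail.

```python
# Field widths in bits for each pipeline register.
# ASSUMPTION: fields follow the common textbook five-stage MIPS datapath,
# including the carried write-register number and the control bits.
PIPELINE_REGISTERS = {
    "IF/ID": {"PC+4": 32, "instruction": 32},
    "ID/EX": {
        "PC+4": 32, "read data 1": 32, "read data 2": 32,
        "sign-extended immediate": 32, "rt": 5, "rd": 5,
        "control (EX+MEM+WB)": 9,
    },
    "EX/MEM": {
        "branch target": 32, "Zero": 1, "ALU result": 32,
        "read data 2": 32, "write register": 5, "control (MEM+WB)": 5,
    },
    "MEM/WB": {
        "memory read data": 32, "ALU result": 32,
        "write register": 5, "control (WB)": 2,
    },
}


def register_bits(name):
    """Total length of one pipeline register in bits."""
    return sum(PIPELINE_REGISTERS[name].values())


for name in PIPELINE_REGISTERS:
    print(name, register_bits(name), "bits")
```

Dropping the register-number and control fields from the dictionaries gives the data-only widths of the datapath before control is added.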

Graphically Representing MIPS Pipeline
[Figure: one instruction drawn as IM, Reg, ALU, DM, Reg]
• Can help with answering questions like:
  - How many cycles does it take to execute this code?
  - What is the ALU doing during cycle 4?
  - Is there a hazard, why does it occur, and how can it be fixed?
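
The first two questions can be answered mechanically for an ideal pipeline. The sketch below assumes no hazards or stalls, so instruction i simply enters the pipeline in cycle i + 1; the three-instruction program and all function names are illustrative.

```python
STAGES = ["IFetch", "Dec", "Exec", "Mem", "WB"]


def stage_in_cycle(instr_index, cycle):
    """Stage occupied by instruction instr_index (0-based) in cycle (1-based), or None."""
    position = cycle - 1 - instr_index
    return STAGES[position] if 0 <= position < len(STAGES) else None


def total_cycles(n_instructions):
    """Cycles needed to finish all instructions, assuming no stalls."""
    return n_instructions + len(STAGES) - 1


def instruction_in_stage(program, stage, cycle):
    """Name of the instruction occupying the given stage in the given cycle, or None."""
    for i, name in enumerate(program):
        if stage_in_cycle(i, cycle) == stage:
            return name
    return None


def diagram(program):
    """Rows of a multi-cycle pipeline diagram, one row per instruction."""
    rows = []
    for i, name in enumerate(program):
        cells = [(stage_in_cycle(i, c) or "").ljust(7)
                 for c in range(1, total_cycles(len(program)) + 1)]
        rows.append(name.ljust(7) + "".join(cells).rstrip())
    return rows


program = ["lw", "sw", "add"]  # hypothetical example program
print("\n".join(diagram(program)))
print("cycles:", total_cycles(len(program)))
print("ALU (Exec) in cycle 4:", instruction_in_stage(program, "Exec", 4))
```

The third question, about hazards, needs the register and memory dependences between instructions, which this sketch deliberately leaves out.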

Multi-Cycle Pipeline Diagram
• Showing the resource usage

Multi-Cycle Pipeline Diagram
• Traditional form

• Pipeline Control
  - Chapter 4.6, page 300

Pipelined Control

Pipelined Control Signals
• Control signals derived from instructions
  - As in single-cycle implementation

Pipeline Control
• IF Stage: read Instr Memory (always asserted) and write PC (on System Clock)
• ID Stage: no control signals to set

        EX Stage                          MEM Stage                 WB Stage
Instr   RegDst  ALUOp1  ALUOp0  ALUSrc    Brch  MemRead  MemWrite   RegWrite  MemtoReg
R       1       1       0       0         0     0        0          1         0
lw      0       0       0       1         0     1        0          1         1
sw      X       0       0       1         0     0        1          0         X
beq     X       0       1       0         1     0        0          0         X
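
The grouping of control signals by stage can be sketched as data. The bit values below follow the standard textbook control table for these four instruction classes and should be treated as an assumption wherever the slide text is incomplete; the helper name is illustrative. Each pipeline register carries only the control bits that later stages still need.

```python
# Signal order within each group, as grouped on the slide.
EX_SIGNALS = ("RegDst", "ALUOp1", "ALUOp0", "ALUSrc")
MEM_SIGNALS = ("Branch", "MemRead", "MemWrite")
WB_SIGNALS = ("RegWrite", "MemtoReg")

# ASSUMED values from the standard textbook table; "X" means don't care.
CONTROL = {
    "R-format": {"EX": "1100", "MEM": "000", "WB": "10"},
    "lw":       {"EX": "0001", "MEM": "010", "WB": "11"},
    "sw":       {"EX": "X001", "MEM": "001", "WB": "0X"},
    "beq":      {"EX": "X010", "MEM": "100", "WB": "0X"},
}

# Control groups still needed after each pipeline register.
GROUPS_CARRIED = {
    "ID/EX": ("EX", "MEM", "WB"),
    "EX/MEM": ("MEM", "WB"),
    "MEM/WB": ("WB",),
}


def control_in_register(instr, register):
    """Control bits an instruction still carries in the given pipeline register."""
    return "".join(CONTROL[instr][group] for group in GROUPS_CARRIED[register])


for register in GROUPS_CARRIED:
    print(register, control_in_register("lw", register))
```

The control field shrinks from 9 bits in ID/EX to 5 in EX/MEM and 2 in MEM/WB, because each stage consumes its own group and passes the rest along.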

MIPS Pipeline Control Path Modifications