The Processor Lecture 3.4: Pipelining Datapath and Control
Learning Objectives
- Name the five stages of the pipelined processor
- Explain what each stage does
- Calculate the total CPU times for the single-cycle implementation and the pipelined implementation
- Specify how the datapath components and control signals are distributed among the 5 pipeline stages
- Understand that the instruction a pipeline stage works on is decided by the content of the pipeline register in front of the stage
- Calculate the total length (i.e., the number of bits) of each pipeline register
- Determine the content of a pipeline register
Coverage
- Chapters 4.5 & 4.6
Introduction to Pipelining Design
- Chapter 4.5
Instruction Critical Paths
- What is the clock cycle time, assuming negligible delays for muxes, control unit, sign extension, PC access, shift left 2, wires, setup and hold times, except:
  - Instruction Memory and Data Memory (200 ps)
  - ALU and adders (200 ps)
  - Register File access (reads or writes) (100 ps)
Delay per instruction class (all values in ps, "-" means the unit is not used):
Instr.   I Mem   Reg Rd   ALU Op   D Mem   Reg Wr   Total
R-type   200     100      200      -       100      600
load     200     100      200      200     100      800
store    200     100      200      200     -        700
beq      200     100      200      -       -        500
jump     200     -        -        -       -        200
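The totals in the table can be reproduced with a short script. This is a minimal sketch: the delays are the ones given on the slide, while the names `DELAYS_PS`, `UNITS_USED`, and `critical_path_ps` are illustrative assumptions and not part of the lecture.

```python
# Component delays from the slide, in picoseconds.
DELAYS_PS = {"I Mem": 200, "Reg Rd": 100, "ALU Op": 200, "D Mem": 200, "Reg Wr": 100}

# Units on the critical path of each instruction class (assumed from the table).
UNITS_USED = {
    "R-type": ["I Mem", "Reg Rd", "ALU Op", "Reg Wr"],
    "load":   ["I Mem", "Reg Rd", "ALU Op", "D Mem", "Reg Wr"],
    "store":  ["I Mem", "Reg Rd", "ALU Op", "D Mem"],
    "beq":    ["I Mem", "Reg Rd", "ALU Op"],
    "jump":   ["I Mem"],
}


def critical_path_ps(instr):
    """Sum the delays of the units an instruction class passes through."""
    return sum(DELAYS_PS[unit] for unit in UNITS_USED[instr])


totals = {instr: critical_path_ps(instr) for instr in UNITS_USED}

# A single-cycle design must fit the slowest instruction in one cycle.
single_cycle_cc_ps = max(totals.values())

print(totals)
print("single-cycle clock cycle:", single_cycle_cc_ps, "ps")
```

Under these assumptions the slowest class is the load, so the single-cycle clock period comes out at 800 ps.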
Single Cycle Disadvantages & Advantages
- Uses the clock cycle inefficiently: the clock cycle must be timed to accommodate the slowest instruction
  - especially problematic for more complex instructions like floating-point multiplication
[Figure: Clk waveform with lw in Cycle 1 and sw in Cycle 2; the unused time is labeled "Waste"]
- Is slow
but
- Is simple and easy to understand
How Can We Make It Faster?
- Start fetching and executing the next instruction before the current one has completed
  - Pipelining: modern processors are pipelined for performance
  - Remember the performance equation: CPU time = IC × CPI × CC
- Under ideal conditions and with a large number of instructions, the speedup from pipelining is approximately equal to the number of pipe stages
  - A five-stage pipeline is nearly five times faster because the CC is nearly five times shorter
    - CPI = 1 for single-cycle implementation
    - CPI ≈ 1 for pipelined implementation
Analogy: Assembly Line vs. Mechanic Shop
- Mechanic shop
  - The mechanic needs to do everything
  - It takes hours to fix just one car
- Car assembly line
  - Many workers work together
    - Each worker just puts one or a few components into the car
  - One assembly line can produce hundreds or thousands of cars per day
The Five Stages of Executing an Instruction
      Cycle 1   Cycle 2   Cycle 3   Cycle 4   Cycle 5
lw    IFetch    Dec       Exec      Mem       WB
- IFetch: Instruction fetch and update PC
- Dec: Register fetch and instruction decode
- Exec: Execute R-type; calculate memory address; etc.
- Mem: Read/write the data from/to the Data Memory
- WB: Write the result data into the register file
Why Pipeline? For Performance!
[Figure: instruction order (Inst 1 to Inst 5) plotted against time in clock cycles; each instruction passes through IM, Reg, ALU, DM, Reg, and the first cycles are labeled "Time to fill the pipeline"]
- Once the pipeline is full, one instruction is completed every cycle, so CPI = 1
A Pipelined MIPS Processor
- Start the next instruction before the current one has completed
  - improves throughput: the total amount of work done in a given time
  - instruction latency (execution time, delay time, response time: the time from the start of an instruction to its completion) is not reduced
  - clock cycle (pipeline stage time) is limited by the slowest stage
    - some stages don't need the whole clock cycle (e.g., WB)
[Figure: lw, sw, and an R-type instruction on a Cycle 1 to Cycle 8 time axis; each goes through IFetch, Dec, Exec, Mem, WB, offset by one cycle from the previous instruction]
Single Cycle versus Pipeline
- Single-cycle implementation (CC = 800 ps)
[Figure: Clk waveform with lw in Cycle 1 and sw in Cycle 2, unused time labeled "Waste"; the figure also carries a "400 ps" label]
- Pipelined implementation (CC = 200 ps)
[Figure: lw, sw, and an R-type instruction each go through IFetch, Dec, Exec, Mem, WB, offset by one cycle]
- To complete an entire instruction in the pipelined case takes 1000 ps (as compared to 800 ps for the single-cycle case). Why?
- How long does each take to complete 1,000 instrs?
Ideal CPU Time of Pipelined Execution
CPU time = Number of clock cycles × Clock period
Number of clock cycles = N + K - 1
- N: total number of instructions
- K: number of pipeline stages
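The formula can be applied to the earlier question about 1,000 instructions. The following is a minimal sketch that assumes the clock periods from the previous slides (800 ps single-cycle, 200 ps pipelined) and an ideal pipeline with no stalls; the function names are illustrative.

```python
def single_cycle_time_ps(n_instr, cc_ps=800):
    """Single-cycle CPU time: CPI = 1, so one clock cycle per instruction."""
    return n_instr * cc_ps


def pipelined_time_ps(n_instr, stages=5, cc_ps=200):
    """Ideal pipelined CPU time: (N + K - 1) cycles, assuming no stalls."""
    return (n_instr + stages - 1) * cc_ps


n = 1000
t_single = single_cycle_time_ps(n)
t_pipe = pipelined_time_ps(n)
speedup = t_single / t_pipe

print(t_single, t_pipe, round(speedup, 2))  # prints: 800000 200800 3.98
```

With these numbers the speedup approaches 800 / 200 = 4 rather than 5, because the stage delays are not balanced: the 200 ps clock is set by the slowest stage. For a single instruction (N = 1) the same formula gives 5 × 200 = 1000 ps, which matches the latency quoted on the previous slide.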
Pipelining the MIPS ISA
- What makes it easy
  - all instructions are the same length (32 bits)
    - can fetch in the 1st stage and decode in the 2nd stage
  - few instruction formats (three)
  - memory operations occur only in loads and stores
    - can use the execute stage to calculate memory addresses
  - each instruction writes at most one result (i.e., changes the machine state) and does it in the last few pipeline stages (MEM or WB)
  - operands must be aligned in memory so a single data transfer takes only one data memory access
- Only cover the following 8 instructions as an example
  - lw, sw, add, sub, and, or, slt, beq
Pipelined Datapath
- Chapter 4.6, page 286
MIPS Pipelined Datapath
[Figure: the MIPS pipelined datapath]
Pipeline Registers
- Need registers between stages
  - Hold information produced in previous cycle
[Figure: datapath with pipeline registers between the stages]
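One of the learning objectives is to calculate the number of bits in each pipeline register. The sketch below does this under explicit assumptions: it uses the field list of the basic textbook datapath, before the destination register number and the control bits are added, so the totals for the full design are larger. The dictionary `PIPELINE_REGS` and its field names are illustrative.

```python
# Fields latched by each pipeline register (field name -> width in bits).
# Assumption: basic datapath only, with no control bits and no
# destination register number carried along.
PIPELINE_REGS = {
    "IF/ID": {"PC+4": 32, "instruction": 32},
    "ID/EX": {
        "PC+4": 32,
        "read data 1": 32,
        "read data 2": 32,
        "sign-extended immediate": 32,
    },
    "EX/MEM": {
        "branch target": 32,
        "Zero": 1,
        "ALU result": 32,
        "read data 2": 32,
    },
    "MEM/WB": {"memory read data": 32, "ALU result": 32},
}

# Total width of a register is the sum of the widths of its fields.
widths = {name: sum(fields.values()) for name, fields in PIPELINE_REGS.items()}

for name, bits in widths.items():
    print(f"{name}: {bits} bits")
```

Under these assumptions the widths are 64, 128, 97, and 64 bits. Adding a field, such as a 5-bit write register number, only requires a new entry in the corresponding dictionaries.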
Single-Clock-Cycle Diagram
- Cycle-by-cycle flow of instructions through the pipelined datapath
- "Single-clock-cycle" pipeline diagram
  - Shows pipeline usage in a single cycle
  - Highlights resources used in each cycle
- We will look at "single-clock-cycle" diagrams for load & store instructions
IF for Load & Store [figure]
ID for Load & Store [figure]
EX for Load & Store [figure]
MEM for Load [figure]
MEM for Store [figure]
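WB for Load [figure]
- Wrong register number: when the load reaches WB, the write register number still comes from the IF/ID register, which by then holds a later instruction
Corrected Pipelined Datapath [figure]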
MIPS Pipeline Datapath
- State registers between each pipeline stage to isolate them
[Figure: five-stage datapath with IF (IFetch), ID (Dec), EX (Execute), MEM (MemAccess), and WB (WriteBack), separated by the IF/ID, ID/EX, EX/MEM, and MEM/WB registers; components shown: PC, Instruction Memory, adders, Register File, Sign Extend (16 to 32 bits), Shift left 2, ALU, Data Memory, System Clock]
Graphically Representing MIPS Pipeline
[Figure: one instruction drawn as IM, Reg, ALU, DM, Reg]
- Can help with answering questions like:
  - How many cycles does it take to execute this code?
  - What is the ALU doing during cycle 4?
  - Is there a hazard, why does it occur, and how can it be fixed?
Multi-Cycle Pipeline Diagram
- Showing the resource usage [figure]
Multi-Cycle Pipeline Diagram
- Traditional form [figure]
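A traditional-form diagram lists one instruction per row and one clock cycle per column. The following minimal sketch builds such a diagram for an ideal pipeline with no hazards or stalls; the stage abbreviations and the helper names `pipeline_diagram` and `render` are assumptions made for illustration.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(instrs):
    """Return (name, cells) rows; cells[c] is the stage occupied in cycle c + 1."""
    n_cycles = len(instrs) + len(STAGES) - 1  # N + K - 1
    rows = []
    for i, name in enumerate(instrs):
        cells = [""] * n_cycles
        # Instruction i enters IF in cycle i + 1 and advances one stage per cycle.
        for s, stage in enumerate(STAGES):
            cells[i + s] = stage
        rows.append((name, cells))
    return rows


def render(rows):
    """Format the rows as fixed-width plain text with a cycle header."""
    n_cycles = len(rows[0][1])
    header = f"{'':<8}" + "".join(f"CC{c + 1:<4}" for c in range(n_cycles))
    lines = [header]
    for name, cells in rows:
        lines.append(f"{name:<8}" + "".join(f"{cell:<6}" for cell in cells))
    return "\n".join(lines)


rows = pipeline_diagram(["lw", "sw", "add"])
print(render(rows))
```

Reading down a column shows what every stage is doing in that cycle. For these three instructions, the column for cycle 4 has sw in EX, so the ALU is working on the store in that cycle.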
Pipeline Control
- Chapter 4.6, page 300
Pipelined Control [figure]
Pipelined Control Signals
- Control signals derived from instructions
  - As in single-cycle implementation
Pipeline Control
- IF stage: read Instr Memory (always asserted) and write PC (on System Clock)
- ID stage: no control signals to set
Control signals per stage (X = don't care):
         EX Stage                          MEM Stage                   WB Stage
Instr.   RegDst  ALUOp1  ALUOp0  ALUSrc    Brch  MemRead  MemWrite     RegWrite  MemtoReg
R-type   1       1       0       0         0     0        0            1         0
lw       0       0       0       1         0     1        0            1         1
sw       X       0       0       1         0     0        1            0         X
beq      X       0       1       0         1     0        0            0         X
MIPS Pipeline Control Path Modifications [figure]
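Because the control signals are grouped by the stage that consumes them, each pipeline register only has to carry the groups that are still needed downstream. The sketch below encodes the standard control values for the four instruction classes and counts the control bits per pipeline register; the names `CONTROL`, `CARRIED`, and `control_bits` are illustrative, and `None` stands for a don't-care (X).

```python
X = None  # don't-care value

# Control signals per instruction class, grouped by the stage that uses them.
CONTROL = {
    "R-type": {
        "EX": {"RegDst": 1, "ALUOp1": 1, "ALUOp0": 0, "ALUSrc": 0},
        "MEM": {"Branch": 0, "MemRead": 0, "MemWrite": 0},
        "WB": {"RegWrite": 1, "MemtoReg": 0},
    },
    "lw": {
        "EX": {"RegDst": 0, "ALUOp1": 0, "ALUOp0": 0, "ALUSrc": 1},
        "MEM": {"Branch": 0, "MemRead": 1, "MemWrite": 0},
        "WB": {"RegWrite": 1, "MemtoReg": 1},
    },
    "sw": {
        "EX": {"RegDst": X, "ALUOp1": 0, "ALUOp0": 0, "ALUSrc": 1},
        "MEM": {"Branch": 0, "MemRead": 0, "MemWrite": 1},
        "WB": {"RegWrite": 0, "MemtoReg": X},
    },
    "beq": {
        "EX": {"RegDst": X, "ALUOp1": 0, "ALUOp0": 1, "ALUSrc": 0},
        "MEM": {"Branch": 1, "MemRead": 0, "MemWrite": 0},
        "WB": {"RegWrite": 0, "MemtoReg": X},
    },
}

# Signal groups that each pipeline register still has to carry forward.
CARRIED = {"ID/EX": ["EX", "MEM", "WB"], "EX/MEM": ["MEM", "WB"], "MEM/WB": ["WB"]}


def control_bits(reg, instr="lw"):
    """Count the control bits stored in a pipeline register."""
    return sum(len(CONTROL[instr][group]) for group in CARRIED[reg])


for reg in CARRIED:
    print(reg, control_bits(reg), "control bits")
```

Under this grouping, ID/EX carries 9 control bits, EX/MEM carries 5, and MEM/WB carries 2, since each stage drops its own group after using it.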