Lecture 7 An Introduction to pipelining Pipelining Its

Pipelining: Its Natural! • Laundry Example • Ann, Brian, Cathy, Dave each have one

Sequential Laundry 6 PM 7 8 9 10 11 Midnight Time 30 40 20

Pipelined Laundry Start work ASAP 6 PM 7 8 9 10 11 Time 30

Pipelining Lessons T a s k O r d e r • Pipelining doesn’t

Definitions • • • Pipe stage or pipe segment Pipeline depth Machine cycle Latency

Design Issues • Balance the length of each pipeline stage Throughput = Depth of

DLX Implementation • Integer subset of DLX – load/store word – branch – integer

Instruction Formats I opcode 0 R 5 6 opcode 0 J rs 1 5

1 st and 2 nd Instruction cycles • Instruction fetch (IF) IR NPC Mem[PC];

3 rd Instruction cycle • Execution & effective address (EX) – Memory reference •

4 th Instruction cycle • Memory access & branch completion (MEM) – Memory reference

5 th Instruction cycle • Write-back (WB) – Register - register ALU instruction •

Datapath M ux Zero? Add Cond NPC 4 M ux A PC Instr. Cache

Control Step 1 Step 2 RR ALU Imm Store Load Step 3 Step 4

Basic Pipeline Instr # i i +1 i +2 i +3 i +4 Clock

Pipeline Resources IM Reg IM ALU Reg IM DM ALU Reg IM Reg DM

Pipelined Datapath IF/ID 4 Add ID/EX M ux EX/MEM MEM/WB Zero? M ux PC

Performance limitations • Imbalance among pipe stages – limits cycle time to slowest stage

Slides: 19

Download presentation

Lecture 7 An Introduction to pipelining

Pipelining: Its Natural! • Laundry Example • Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold • Washer takes 30 minutes • Dryer takes 40 minutes • “Folder” takes 20 minutes A B C D

Sequential Laundry 6 PM 7 8 9 10 11 Midnight Time 30 40 20 T a s k O r d e r A B C D • Sequential laundry takes 6 hours for 4 loads • If they learned pipelining, how long would laundry take?

Pipelined Laundry Start work ASAP 6 PM 7 8 9 10 11 Time 30 40 T a s k O r d e r 40 40 40 20 A B C D • Pipelined laundry takes 3. 5 hours for 4 loads Midnight

Pipelining Lessons T a s k O r d e r • Pipelining doesn’t help latency of single task, it helps throughput of 6 PM 7 8 9 entire workload Time • Pipeline rate limited by slowest pipeline stage 30 40 40 20 • Multiple tasks operating simultaneously A • Potential speedup = Number pipe stages B • Unbalanced lengths of pipe stages reduces C speedup • Time to “fill” pipeline D and time to “drain” it reduces speedup

Definitions • • • Pipe stage or pipe segment Pipeline depth Machine cycle Latency Throughput

Design Issues • Balance the length of each pipeline stage Throughput = Depth of the pipeline Time per instruction on unpipelined machine • Problems – Usually, stages are not balanced – Pipelining overhead – Hazards (conflicts) • Performance (throughput – Decrease of the CPI – Decrease of cycle time CPU performance equation)

DLX Implementation • Integer subset of DLX – load/store word – branch – integer ALU – NO jumps, NO FP • Unpipelined implementation – maximum five cycles per instruction

Instruction Formats I opcode 0 R 5 6 opcode 0 J rs 1 5 6 opcode 0 rd 10 11 rs 1 immediate 15 16 rs 2 10 11 31 rd 15 16 function 20 21 31 name 5 6 31 Fixed-field decoding

1 st and 2 nd Instruction cycles • Instruction fetch (IF) IR NPC Mem[PC]; PC + 4 • Instruction decode & register fetch (ID) A Regs[IR 6. . 10]; B Regs[IR 11. . 15]; Imm ((IR 16)16 # # IR 16. . 31)

3 rd Instruction cycle • Execution & effective address (EX) – Memory reference • ALUOutput A + Imm – Register - Register ALU instruction • ALUOutput A func B – Register - Immediate ALU instruction • ALUOutput A op Imm – Branch • ALUOutput NPC + Imm; Cond (A op 0)

4 th Instruction cycle • Memory access & branch completion (MEM) – Memory reference • PC NPC • LMD Mem[ALUOutput] • Mem[ALUOutput] B (load) (store) – Branch • if (cond) PC ALUOutput; else PC NPC

5 th Instruction cycle • Write-back (WB) – Register - register ALU instruction • Regs[IR 16. . 20] ALUOutput – Register - immediate ALU instruction • Regs[IR 11. . 15] ALUOutput – Load instruction • Regs[IR 11. . 15] LMD

Datapath M ux Zero? Add Cond NPC 4 M ux A PC Instr. Cache IR Regs B Sign extend IF ALU ID M ux ALU Output LMD M ux Data Cache Imm EX MEM WB

Control Step 1 Step 2 RR ALU Imm Store Load Step 3 Step 4 Step 5

Basic Pipeline Instr # i i +1 i +2 i +3 i +4 Clock number 4 5 6 1 2 3 7 8 IF ID EX MEM WB IF ID EX MEM 9 WB

Pipeline Resources IM Reg IM ALU Reg IM DM ALU Reg IM Reg DM ALU Reg DM Reg ALU DM Reg

Pipelined Datapath IF/ID 4 Add ID/EX M ux EX/MEM MEM/WB Zero? M ux PC Instr. Cache ALU Regs M ux Sign extend Data Cache

Performance limitations • Imbalance among pipe stages – limits cycle time to slowest stage • Pipelining overhead – Pipeline register delay – Clock skew • Clock cycle > clock skew + latch overhead