
CENG 450 Computer Systems and Architecture
Lecture 5
Amirali Baniasadi
amirali@ece.uvic.ca

Overview of Today’s Lecture: MIPS et al
• Pipelining
• MIPS ISA
• More MIPS

What is pipelining?
• Implementation technique in which multiple instructions are overlapped in execution
• Real-life pipelining examples?
  – Laundry
  – Factory production lines
  – Traffic?

Pipelining Example: Laundry
• You have 4 loads of clothes to wash: A, B, C, D
• Steps (stages) required:
  – Wash
  – Dry
  – Fold
  – Store clothes into drawers
• Each stage needs 30 minutes
• We can’t start the next step until the previous step is finished

Pipelining Example: Laundry
• There are 2 approaches to do this job:
  – Sequential (non-pipelined): wait until the first load is put away before starting the next load
  – Pipelined (ASAP): as soon as the washer is empty, put in the next load while the first load goes into the dryer

Pipelining Example: Laundry
• Sequential Laundry
  – Needs 8 hours for 4 loads
[Figure: timeline from 6 PM to 2 AM, task order A, B, C, D; each load finishes all four stages before the next load starts]

Pipelining Example: Laundry
• Pipelined Laundry:
  – Start work ASAP
  – Needs only 3.5 hours for 4 loads!
[Figure: timeline starting at 6 PM, task order A, B, C, D; each load starts as soon as the washer is free, so the loads overlap]
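The 8-hour and 3.5-hour figures can be checked with a short calculation. This is a minimal sketch, assuming every stage takes the same 30 minutes stated earlier; the function and constant names are illustrative and do not come from the lecture.

```python
STAGE_MINUTES = 30  # each stage (wash, dry, fold, store) takes 30 minutes
NUM_STAGES = 4
NUM_LOADS = 4       # loads A, B, C, D


def sequential_minutes(loads, stages, stage_minutes):
    """Each load finishes every stage before the next load starts."""
    return loads * stages * stage_minutes


def pipelined_minutes(loads, stages, stage_minutes):
    """The first load needs `stages` steps; each later load adds one step."""
    return (stages + loads - 1) * stage_minutes


seq = sequential_minutes(NUM_LOADS, NUM_STAGES, STAGE_MINUTES)
pipe = pipelined_minutes(NUM_LOADS, NUM_STAGES, STAGE_MINUTES)
print(seq / 60, pipe / 60)  # hours: 8.0 3.5
```

The pipelined total is (stages + loads - 1) steps because the loads after the first one each finish one step behind the previous load.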

Pipelining Example: Laundry
• Pipelined Laundry Observations:
  – At some point, all stages of washing will be operating concurrently
  – Pipelining doesn’t reduce the number of stages
    – doesn’t help latency of a single task
    – helps throughput of the entire workload
  – As long as we have separate resources, we can pipeline the tasks
  – Multiple tasks operating simultaneously use different resources

Pipelining Example: Laundry
• Pipelined Laundry Observations:
  – Speedup due to pipelining depends on the number of stages in the pipeline
  – Pipeline rate is limited by the slowest pipeline stage
    – If the dryer needs 45 min, the time for all stages has to be 45 min to accommodate it
    – Unbalanced lengths of pipe stages reduce speedup
  – Time to “fill” the pipeline and time to “drain” it reduce speedup
  – If one load depends on another, we will have to wait (Delay/Stall for Dependencies)
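The effect of the 45-minute dryer can be illustrated numerically. The sketch below assumes a lockstep pipeline in which every stage advances at the pace of the slowest one, as the observation above describes; the helper names are hypothetical.

```python
def sequential_minutes(stage_minutes, loads):
    """Non-pipelined: every load runs through all stages on its own."""
    return sum(stage_minutes) * loads


def pipelined_minutes(stage_minutes, loads):
    """Pipelined: every step lasts as long as the slowest stage."""
    step = max(stage_minutes)
    return (len(stage_minutes) + loads - 1) * step


def speedup(stage_minutes, loads):
    return sequential_minutes(stage_minutes, loads) / pipelined_minutes(stage_minutes, loads)


balanced = [30, 30, 30, 30]    # wash, dry, fold, store
unbalanced = [30, 45, 30, 30]  # dryer now needs 45 minutes

print(round(speedup(balanced, 4), 2))    # 2.29
print(round(speedup(unbalanced, 4), 2))  # 1.71
```

With only 4 loads, fill and drain time already keeps the balanced speedup well below the stage count of 4; the slower dryer lowers it further.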

CPU Pipelining
• Review: 5 stages of a MIPS instruction
  – Fetch instruction from instruction memory
  – Read registers while decoding instruction
  – Execute operation or calculate address, depending on the instruction type
  – Access an operand from data memory
  – Write result into a register
• We can reduce the cycles to fit the stages.

Load:  Cycle 1: Ifetch | Cycle 2: Reg/Dec | Cycle 3: Exec | Cycle 4: Mem | Cycle 5: Wr

CPU Pipelining
• Example: Resources for Load Instruction
  – Fetch instruction from instruction memory (Ifetch): instruction memory (IM)
  – Read registers while decoding instruction (Reg/Dec): register file & decoder (Reg)
  – Execute operation or calculate address, depending on the instruction type (Exec): ALU
  – Access an operand from data memory (Mem): data memory (DM)
  – Write result into a register (Wr): register file (Reg)

CPU Pipelining
• Note that accessing source & destination registers is performed in two different parts of the cycle
• We need to decide in which part of the cycle reading from and writing to the register file take place.
[Figure: instruction order (Inst 0 to Inst 4) against time in clock cycles; each instruction passes through Im, Reg, ALU, Dm, Reg, with the register read and the register write marked in different halves of the cycle; "fill time" at the start and "sink time" at the end]
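A small script can reproduce the shape of such a pipeline diagram. This is an illustrative sketch only, assuming an ideal pipeline with one instruction issued per cycle and no stalls; `pipeline_rows` is a hypothetical helper.

```python
STAGES = ["Im", "Reg", "ALU", "Dm", "Reg"]  # resources used in each of the 5 stages


def pipeline_rows(num_instructions, stages=STAGES):
    """Return one row per instruction; entry c is the resource used in cycle c+1."""
    total_cycles = len(stages) + num_instructions - 1
    rows = []
    for i in range(num_instructions):
        row = ["."] * total_cycles
        for s, name in enumerate(stages):
            row[i + s] = name  # instruction i enters stage s in cycle i + s + 1
        rows.append(row)
    return rows


for i, row in enumerate(pipeline_rows(5)):
    print(f"Inst {i}: " + " ".join(f"{cell:>3}" for cell in row))
```

In cycle 5 of the printed diagram, Inst 0 writes the register file while Inst 3 reads it, which is why reads and writes have to be assigned to different parts of the cycle.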

CPU Pipelining: Example
• Single-cycle, non-pipelined execution
• Total time for 3 instructions: 24 ns
[Figure: program execution order against time for lw $1, 100($0); lw $2, 200($0); lw $3, 300($0). Each instruction goes through Instruction fetch, Reg, ALU, Data access, Reg and takes 8 ns; the next instruction starts only after the previous one finishes]

CPU Pipelining: Example
• Single-cycle, pipelined execution
  – Improve performance by increasing instruction throughput
  – Total time for 3 instructions = 14 ns
  – Each instruction adds 2 ns to total execution time
  – Stage time limited by slowest resource (2 ns)
  – Assumptions:
    • Write to register occurs in 1st half of clock
    • Read from register occurs in 2nd half of clock
[Figure: program execution order against time for lw $1, 100($0); lw $2, 200($0); lw $3, 300($0). Each instruction goes through Instruction fetch, Reg, ALU, Data access, Reg in 2 ns stages; a new instruction starts every 2 ns and the last one finishes at 14 ns]

CPU Pipelining: Example
• Assumptions:
  – Only consider the following instructions: lw, sw, add, sub, and, or, slt, beq
  – Operation times for instruction classes are:
    – Memory access: 2 ns
    – ALU operation: 2 ns
    – Register file read or write: 1 ns
  – Use a single-cycle (not multi-cycle) model
  – Clock cycle must accommodate the slowest instruction when not pipelined (8 ns for lw) and the slowest stage when pipelined (2 ns)
  – Both pipelined & non-pipelined approaches use the same HW components
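The 8 ns and 2 ns clock periods follow from the operation times above. The sketch below assumes the usual textbook breakdown of which operations each instruction class uses (for example, that sw skips the register write and beq skips both data memory and the register write); that breakdown is an assumption here, not something stated on the slide.

```python
# Operation times from the assumptions above, in ns.
OP_NS = {"ifetch": 2, "reg_read": 1, "alu": 2, "mem": 2, "reg_write": 1}

# Assumed operation usage per instruction class (standard textbook breakdown).
STEPS = {
    "lw": ["ifetch", "reg_read", "alu", "mem", "reg_write"],
    "sw": ["ifetch", "reg_read", "alu", "mem"],
    "R-format": ["ifetch", "reg_read", "alu", "reg_write"],  # add, sub, and, or, slt
    "beq": ["ifetch", "reg_read", "alu"],
}

latency_ns = {name: sum(OP_NS[op] for op in ops) for name, ops in STEPS.items()}

single_cycle_clock_ns = max(latency_ns.values())  # slowest instruction (lw)
pipelined_clock_ns = max(OP_NS.values())          # slowest stage

print(latency_ns)
print(single_cycle_clock_ns, pipelined_clock_ns)  # 8 2
```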

CPU Pipelining
• Review: Datapath resources
[Figure: single-cycle MIPS datapath with PC, instruction memory, register file, sign extend, ALU and ALU control, data memory, the PC+4 and branch adders, the jump address path, and the control unit with signals RegDst, Jump, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite]

CPU Pipelining Example:
• Theoretically:
  – Speedup should be equal to the number of stages (n tasks, k stages, p latency)
  – Speedup = (n × p) / ((p / k) × (n − 1) + p) ≈ k (for large n)
• Practically:
  – Stages are imperfectly balanced
  – Pipelining needs overhead
  – Speedup is less than the number of stages
• If we have 3 consecutive instructions
  – Non-pipelined needs 8 × 3 = 24 ns
  – Pipelined needs 14 ns
  => Speedup = 24 / 14 ≈ 1.7
• If we have 1003 consecutive instructions
  – Add 1000 more instructions to the previous example (1003 instructions in total)
    – Non-pipelined total time = 1000 × 8 + 24 = 8024 ns
    – Pipelined total time = 1000 × 2 + 14 = 2014 ns
  => Speedup ≈ 3.98 ≈ 8 ns / 2 ns, near-perfect speedup
  => Performance increases for a larger number of instructions (throughput)
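The numbers above can be reproduced directly. This is a minimal sketch using the example's values (8 ns per non-pipelined instruction, 5 stages of 2 ns each); the function names are illustrative.

```python
def ideal_speedup(n, p, k):
    """Theoretical speedup: n tasks, k perfectly balanced stages, latency p."""
    return (n * p) / ((p / k) * (n - 1) + p)


def nonpipelined_ns(n, instr_ns=8):
    """Each instruction takes a full 8 ns clock cycle."""
    return n * instr_ns


def pipelined_ns(n, stages=5, stage_ns=2):
    """The first instruction needs all stages; each later one adds one stage time."""
    return (stages + n - 1) * stage_ns


for n in (3, 1003):
    t_seq, t_pipe = nonpipelined_ns(n), pipelined_ns(n)
    print(n, t_seq, t_pipe, round(t_seq / t_pipe, 2))

print(round(ideal_speedup(1_000_000, p=10, k=5), 3))  # approaches k = 5
```

The measured speedup tends towards 8 ns / 2 ns = 4 rather than the stage count of 5 because the stages are not perfectly balanced: the pipelined latency of one instruction is 5 × 2 = 10 ns, not 8 ns.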

Pipelining MIPS Instruction Set
• MIPS was designed with pipelining in mind => Pipelining is easy in MIPS:
  – All instructions are the same length
  – Limited instruction format
  – Memory operands appear only in lw & sw instructions
  – Operands must be aligned in memory
• 1. All MIPS instructions are the same length
  – Fetch instruction in 1st pipeline stage
  – Decode instruction in 2nd stage
  – If instruction length varies (e.g. 80x86), pipelining will be more challenging

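MIPS Addressing Modes/Inst. Formats
• All instructions 32 bits wide
  – Register (direct): op | rs | rt | rd; operand is in a register
  – Immediate: op | rs | rt | immed; operand is the immediate field
  – Base+index: op | rs | rt | immed; register + immed gives the memory address
  – PC-relative: op | rs | rt | immed; PC + immed gives the memory address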

CPU Pipelining: MIPS (Fetch & Decode)
[Figure: single-cycle MIPS datapath with the fetch and decode path highlighted; Instruction[31–26] = opcode feeds the control unit]

Pipelining MIPS Instruction Set
• 2. MIPS has limited instruction format
  – Source registers are in the same place for each instruction (symmetric)
  – 2nd stage can begin reading registers at the same time as decoding
  – If the instruction format weren’t symmetric, stage 2 would have to be split into 2 distinct stages
  => Total stages = 6 (instead of 5)

CPU Pipelining: MIPS
• Fast Decode
[Figure: single-cycle MIPS datapath with the register-read path highlighted; Instruction[25–21] = rs, Instruction[20–16] = rt, Instruction[15–0] = immediate]
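Because the fields sit at fixed bit positions, they can all be extracted with shifts and masks before the opcode is even interpreted. The sketch below illustrates this in software; `decode_fields` is a hypothetical helper, and the example word is the standard encoding of `lw $1, 100($0)` (opcode 0x23).

```python
def decode_fields(word):
    """Slice a 32-bit MIPS instruction word into its fixed-position fields."""
    return {
        "opcode": (word >> 26) & 0x3F,   # Instruction[31-26]
        "rs": (word >> 21) & 0x1F,       # Instruction[25-21]
        "rt": (word >> 16) & 0x1F,       # Instruction[20-16]
        "rd": (word >> 11) & 0x1F,       # Instruction[15-11], meaningful for R-format
        "immediate": word & 0xFFFF,      # Instruction[15-0], meaningful for I-format
    }


LW_EXAMPLE = 0x8C010064  # lw $1, 100($0)
fields = decode_fields(LW_EXAMPLE)
print(fields["opcode"], fields["rs"], fields["rt"], fields["immediate"])  # 35 0 1 100
```

Every field is extracted unconditionally; which ones are actually used is decided later by the control signals, which mirrors how the register file can be read while decoding is still in progress.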

Pipelining MIPS Instruction Set
• 3. Memory operands appear only in lw & sw instructions
  – We can use the execute stage to calculate the memory address
  – Access memory in the next stage
  – If we needed to operate on operands in memory (e.g. 80x86), stages 3 & 4 would expand to
    – Address calculation
    – Memory access
    – Execute
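The address calculation done in the execute stage for lw and sw is a single addition of the base register and the sign-extended 16-bit offset. A minimal sketch, assuming 32-bit addresses; the function names are illustrative.

```python
def sign_extend_16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def effective_address(base, offset16):
    """Address used by lw/sw: base register + sign-extended offset, wrapped to 32 bits."""
    return (base + sign_extend_16(offset16)) & 0xFFFFFFFF


print(effective_address(0, 100))               # lw $1, 100($0) reads address 100
print(hex(effective_address(0x1000, 0xFFFC)))  # offset -4 gives 0xffc
```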

CPU Pipelining: MIPS
• Fast Execution
[Figure: single-cycle MIPS datapath with the execution path through the ALU and data memory highlighted]

Pipelining MIPS Instruction Set
• 4. Operands must be aligned in memory
  – Transfer of more than one data operand can be done in a single stage with no conflicts
  – Need not worry about a single data transfer instruction requiring 2 data memory accesses
  – Requested data can be transferred between the CPU & memory in a single pipeline stage
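The benefit of alignment can be seen by counting how many memory words a 4-byte access touches. This sketch assumes a memory that is 4 bytes (one word) wide; the helper names are hypothetical.

```python
WORD_BYTES = 4  # assumed memory width: one 32-bit word


def is_aligned(address, size=WORD_BYTES):
    """A word access is aligned when its address is a multiple of its size."""
    return address % size == 0


def words_touched(address, size=WORD_BYTES):
    """Number of memory words covered by an access of `size` bytes at `address`."""
    first = address // WORD_BYTES
    last = (address + size - 1) // WORD_BYTES
    return last - first + 1


print(is_aligned(100), words_touched(100))  # True 1
print(is_aligned(102), words_touched(102))  # False 2
```

An unaligned word would straddle two memory words and need two accesses, which would not fit in a single memory stage.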