Computer Organization CS 224 Fall 2012 Lesson 28

§4.5 An Overview of Pipelining
Pipelining Analogy
- Pipelined laundry: overlapping execution
  - Parallelism improves performance
- Four loads:
  - $\text{Speedup} = 8/3.5 \approx 2.3$
- Non-stop (n loads take 2n hours one at a time, but 0.5n + 1.5 hours pipelined):
  - $\text{Speedup} = \frac{2n}{0.5n + 1.5} \approx 4$ = number of stages
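
The two speedup figures can be checked numerically. This Python sketch assumes four stages of 0.5 hours each, which is the timing implied by 8/3.5 and 2n/(0.5n + 1.5); the function names are illustrative only.

```python
# Laundry analogy: assumed 4 stages, each taking 0.5 hours.
STAGES = 4
STAGE_HOURS = 0.5

def sequential_hours(n):
    """Each load runs through all four stages before the next load starts."""
    return n * STAGES * STAGE_HOURS

def pipelined_hours(n):
    """The first load needs all 4 stages; after that one load finishes every 0.5 h."""
    return (STAGES + n - 1) * STAGE_HOURS

def speedup(n):
    return sequential_hours(n) / pipelined_hours(n)

print(round(speedup(4), 1))          # four loads, 8 / 3.5 → 2.3
print(round(speedup(1_000_000), 2))  # non-stop, approaches the stage count → 4.0
```

Since (4 + n - 1) * 0.5 = 0.5n + 1.5, `pipelined_hours` is exactly the denominator of the non-stop formula.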

MIPS Pipeline
- Five stages, one step per stage
  1. IF: Instruction fetch from memory
  2. ID: Instruction decode & register read
  3. EX: Execute operation or calculate address
  4. MEM: Access memory operand
  5. WB: Write result back to register
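
The overlap of the five stages is easiest to see as a pipeline chart. The hypothetical helper `pipeline_chart` below is a sketch that assumes an ideal pipeline with no hazards: instruction i enters IF in cycle i and advances one stage per cycle.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_chart(instructions):
    """Stage occupied by each instruction in each clock cycle (ideal pipeline, no stalls)."""
    n_cycles = len(instructions) + len(STAGES) - 1
    chart = []
    for i, name in enumerate(instructions):
        row = []
        for cycle in range(n_cycles):
            k = cycle - i  # instruction i enters IF in cycle i
            row.append(STAGES[k] if 0 <= k < len(STAGES) else "")
        chart.append((name, row))
    return chart

for name, row in pipeline_chart(["lw", "sw", "add"]):
    print(f"{name:<4}" + "".join(f"{s:>5}" for s in row))
```

Three instructions occupy 3 + 5 - 1 = 7 cycles; in cycle 2 (0-based), for example, lw is in EX, sw in ID, and add in IF.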

Pipeline Performance
- Assume time for stages is
  - 100 ps for register read or write
  - 200 ps for other stages
- Compare pipelined datapath with single-cycle datapath

Instr     Instr fetch  Register read  ALU op  Memory access  Register write  Total time
lw        200 ps       100 ps         200 ps  200 ps         100 ps          800 ps
sw        200 ps       100 ps         200 ps  200 ps                         700 ps
R-format  200 ps       100 ps         200 ps                 100 ps          600 ps
beq       200 ps       100 ps         200 ps                                 500 ps
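
The totals and both clock periods follow directly from the stage times. This sketch encodes the table as data (the names `STAGE_PS` and `USES` are illustrative, not from the slides): the single-cycle clock must fit the slowest instruction, while the pipelined clock only has to fit the slowest stage.

```python
# Stage latencies in picoseconds, taken from the table.
STAGE_PS = {"IF": 200, "REG_READ": 100, "ALU": 200, "MEM": 200, "REG_WRITE": 100}

# Stages each instruction class actually uses.
USES = {
    "lw":       ["IF", "REG_READ", "ALU", "MEM", "REG_WRITE"],
    "sw":       ["IF", "REG_READ", "ALU", "MEM"],
    "R-format": ["IF", "REG_READ", "ALU", "REG_WRITE"],
    "beq":      ["IF", "REG_READ", "ALU"],
}

def total_ps(instr):
    """Time one instruction needs if it runs alone through its stages."""
    return sum(STAGE_PS[stage] for stage in USES[instr])

single_cycle_tc = max(total_ps(i) for i in USES)  # clock must fit the slowest instruction
pipelined_tc = max(STAGE_PS.values())             # clock must fit the slowest stage

for instr in USES:
    print(instr, total_ps(instr), "ps")
print("single-cycle Tc =", single_cycle_tc, "ps; pipelined Tc =", pipelined_tc, "ps")
```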

[Figure: Pipeline Performance, single-cycle execution (Tc = 800 ps) vs. pipelined execution (Tc = 200 ps)]

Pipeline Speedup
- If all stages are balanced
  - i.e., all take the same time
  - $\text{Time between instructions}_{\text{pipelined}} = \frac{\text{Time between instructions}_{\text{nonpipelined}}}{\text{Number of stages}}$
- If not balanced, speedup is less
- Speedup due to increased throughput
  - Latency (time for each instruction) does not decrease
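
As a worked check of the throughput and latency claims, this sketch reuses the clock periods from the performance example (800 ps single-cycle, 200 ps pipelined, five stages). The function names are illustrative. If the 800 ps of work could instead be split into five balanced 160 ps stages, the same reasoning would give the ideal speedup of 5.

```python
SINGLE_CYCLE_TC = 800  # ps, set by lw, the slowest instruction
PIPELINED_TC = 200     # ps, set by the slowest stage
N_STAGES = 5

def single_cycle_time(n):
    """n instructions, one per (long) clock cycle."""
    return n * SINGLE_CYCLE_TC

def pipelined_time(n):
    """Fill the pipeline (5 cycles for the first instruction), then one finishes per cycle."""
    return (N_STAGES + n - 1) * PIPELINED_TC

# Latency of one instruction does not improve: 1000 ps pipelined vs 800 ps single-cycle.
print(pipelined_time(1), single_cycle_time(1))
# Throughput: for long runs the speedup tends to 800 / 200 = 4, not 5, because the
# 100 ps register stages still occupy a full 200 ps pipeline cycle (unbalanced stages).
print(single_cycle_time(1_000_000) / pipelined_time(1_000_000))
```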

Pipelining and ISA Design
- MIPS ISA designed for pipelining
  - All instructions are 32 bits
    - Easier to fetch and decode in one cycle
    - cf. x86: 1- to 15-byte instructions
  - Few instruction formats, very regular
    - Can decode and read registers in one step
  - Load/store addressing
    - Can calculate address in 3rd stage, access memory in 4th stage
  - Alignment of memory operands
    - Memory access takes only one cycle
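
The regularity of the fixed 32-bit formats means every field can be pulled out with a fixed shift and mask before the opcode is fully decoded, so the register numbers are available for the register read in the same stage. This sketch uses the standard MIPS field layout; the helper name `fields` is illustrative, and the two words are the encodings of `add $t0, $s1, $s2` and `lw $t0, 32($s3)` (register numbers $t0 = 8, $s1 = 17, $s2 = 18, $s3 = 19).

```python
def fields(word):
    """Split a 32-bit MIPS instruction word into its fixed-position bit fields."""
    return {
        "op":    (word >> 26) & 0x3F,  # bits 31-26 in every format
        "rs":    (word >> 21) & 0x1F,  # bits 25-21: source register (R- and I-format)
        "rt":    (word >> 16) & 0x1F,  # bits 20-16: source, or destination of lw
        "rd":    (word >> 11) & 0x1F,  # bits 15-11: R-format destination
        "shamt": (word >> 6) & 0x1F,   # bits 10-6: shift amount
        "funct": word & 0x3F,          # bits 5-0: R-format function code
        "imm":   word & 0xFFFF,        # bits 15-0: I-format immediate/offset
    }

r_type = fields(0x02324020)  # add $t0, $s1, $s2
i_type = fields(0x8E680020)  # lw  $t0, 32($s3)
print(r_type["rs"], r_type["rt"], r_type["rd"], r_type["funct"])  # → 17 18 8 32
print(i_type["op"], i_type["rs"], i_type["rt"], i_type["imm"])    # → 35 19 8 32
```

Note that rs and rt sit in the same bit positions in both formats, which is what allows the register file to be read speculatively during decode.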

Hazards
- Situations that prevent starting the next instruction in the next cycle
- Structure hazards
  - A required resource is busy
- Data hazard
  - Need to wait for previous instruction to complete its data read/write
- Control hazard
  - Deciding on control action depends on previous instruction

Structure Hazards
- Conflict for use of a resource
- In MIPS pipeline with a single memory
  - Load/store requires data access
  - Instruction fetch would have to stall for that cycle
    - Would cause a pipeline “bubble”
- Hence, pipelined datapaths require separate instruction/data memories
  - Or separate instruction/data caches
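
The single-memory conflict can be located mechanically. This is a simplified sketch (the helper `memory_conflicts` is hypothetical): it assumes the ideal schedule in which instruction i is fetched in cycle i, and reports each cycle in which a load or store's MEM access would collide with a fetch. It does not model how the resulting stall would shift later instructions.

```python
MEM_STAGE = 3  # 0-based stage index: IF=0, ID=1, EX=2, MEM=3, WB=4

def memory_conflicts(program):
    """With ONE memory for instructions and data, list (cycle, data_instr, fetched_instr)
    collisions, assuming instruction i is fetched in cycle i (no stalls)."""
    conflicts = []
    for i, op in enumerate(program):
        if op in ("lw", "sw"):
            cycle = i + MEM_STAGE     # cycle in which instruction i accesses data memory
            if cycle < len(program):  # instruction number `cycle` is fetched in that cycle too
                conflicts.append((cycle, i, cycle))
    return conflicts

print(memory_conflicts(["lw", "add", "sub", "or", "and"]))  # → [(3, 0, 3)]
```

Here the fourth instruction's fetch would have to wait a cycle; separate instruction and data memories (or caches) remove the shared resource entirely.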

Data Hazards
- An instruction depends on completion of data access by a previous instruction
    add  $s0, $t0, $t1
    sub  $t2, $s0, $t3
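
The dependence in this example (sub reads $s0, which add writes) is a read-after-write dependence, and it can be found by scanning backward for the most recent writer of each source register. This sketch is limited to R-format text of the form `op rd, rs, rt`; the helpers `parse` and `raw_dependences` are hypothetical.

```python
def parse(instr):
    """Split 'add $s0, $t0, $t1' into ('add', ['$s0', '$t0', '$t1'])."""
    op, _, rest = instr.partition(" ")
    return op, [r.strip() for r in rest.split(",")]

def raw_dependences(program):
    """List (producer, consumer, register) read-after-write dependences.
    Assumes R-format 'op rd, rs, rt': the first operand is written, the others are read."""
    deps = []
    for j, instr in enumerate(program):
        _, operands = parse(instr)
        for src in operands[1:]:
            for i in range(j - 1, -1, -1):  # most recent earlier writer of src
                if parse(program[i])[1][0] == src:
                    deps.append((i, j, src))
                    break
    return deps

print(raw_dependences(["add $s0, $t0, $t1", "sub $t2, $s0, $t3"]))  # → [(0, 1, '$s0')]
```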

Forwarding (aka Bypassing)
- Use result when it is computed
  - Don’t wait for it to be stored in a register
  - Requires extra connections in the datapath
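
One way to picture the extra datapath connections is as a selector in front of each ALU input that chooses between the register-file value and a result still sitting in a pipeline register. The sketch below is a simplified model in the spirit of a forwarding unit; the function `forward_select` and the (reg_write, rd) tuples are assumptions made for illustration, not a specific datapath's signal names.

```python
def forward_select(src, ex_mem, mem_wb):
    """Pick the source of an ALU operand that reads register number `src`.

    ex_mem, mem_wb: (reg_write, rd) for the instructions one and two ahead,
    i.e. sitting in the EX/MEM and MEM/WB pipeline registers.
    Returns "EX/MEM", "MEM/WB", or "REGFILE" (no forwarding needed).
    """
    ex_write, ex_rd = ex_mem
    wb_write, wb_rd = mem_wb
    # Register 0 ($zero) is never forwarded; the newer EX/MEM value wins if both match.
    if ex_write and ex_rd != 0 and ex_rd == src:
        return "EX/MEM"
    if wb_write and wb_rd != 0 and wb_rd == src:
        return "MEM/WB"
    return "REGFILE"

# add $s0, ... is in EX/MEM while sub $t2, $s0, ... is in EX ($s0 is register 16)
print(forward_select(16, (True, 16), (False, 0)))  # → EX/MEM
```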

Load-Use Data Hazard
- Can’t always avoid stalls by forwarding
  - If value not computed when needed
  - Can’t forward backward in time!
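
The "backward in time" limit can be made concrete by comparing the cycle in which a value exists with the cycle in which it is needed. This sketch assumes full forwarding in the five-stage pipeline: an ALU result is available at the end of the producer's EX stage, a loaded value only at the end of its MEM stage, and the consumer needs the value when it begins its own EX stage. The function name is hypothetical.

```python
# 0-based stage numbers in the five-stage pipeline: IF=0, ID=1, EX=2, MEM=3, WB=4
EX, MEM = 2, 3

def stalls_with_forwarding(producer_op, distance):
    """Bubbles needed when the instruction `distance` slots after producer_op uses its result."""
    ready_stage = MEM if producer_op == "lw" else EX
    # Producer enters IF in cycle 0, so the value exists at the end of cycle `ready_stage`.
    # The consumer enters IF in cycle `distance` and starts EX in cycle `distance + EX`.
    return max(0, ready_stage + 1 - (distance + EX))

print(stalls_with_forwarding("add", 1))  # → 0
print(stalls_with_forwarding("lw", 1))   # → 1
print(stalls_with_forwarding("lw", 2))   # → 0
```

An ALU result can always be forwarded to the next instruction, but a load result needed by the very next instruction costs one bubble even with forwarding.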

Code Scheduling to Avoid Stalls
- Reorder code to avoid use of load result in the next instruction
- C code for A = B + E; C = B + F;

  Original order (13 cycles):
      lw   $t1, 0($t0)
      lw   $t2, 4($t0)
      add  $t3, $t1, $t2    # stall
      sw   $t3, 12($t0)
      lw   $t4, 8($t0)
      add  $t5, $t1, $t4    # stall
      sw   $t5, 16($t0)

  Reordered (11 cycles):
      lw   $t1, 0($t0)
      lw   $t2, 4($t0)
      lw   $t4, 8($t0)
      add  $t3, $t1, $t2
      sw   $t3, 12($t0)
      add  $t5, $t1, $t4
      sw   $t5, 16($t0)
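
The 13- and 11-cycle counts can be reproduced with a small cycle counter. This sketch assumes full forwarding, so the only stall is one bubble when an instruction reads the register loaded by the immediately preceding lw; total cycles are then the instruction count, plus 4 cycles to fill the five-stage pipeline, plus the bubbles. The helpers `decode` and `cycles` are hypothetical.

```python
import re

N_STAGES = 5
LOAD_USE_BUBBLES = 1  # with forwarding, a load result is one cycle too late for the next instruction

def decode(instr):
    """Return (opcode, written register or None, registers read)."""
    op, _, rest = instr.partition(" ")
    regs = re.findall(r"\$\w+", rest)
    if op == "lw":    # lw rt, offset(rs): writes rt, reads rs
        return op, regs[0], regs[1:]
    if op == "sw":    # sw rt, offset(rs): reads rt and rs
        return op, None, regs
    return op, regs[0], regs[1:]  # R-format: op rd, rs, rt

def cycles(program):
    """Total cycles = instructions + pipeline fill (N_STAGES - 1) + load-use bubbles."""
    bubbles = 0
    for prev, cur in zip(program, program[1:]):
        p_op, p_dest, _ = decode(prev)
        if p_op == "lw" and p_dest in decode(cur)[2]:
            bubbles += LOAD_USE_BUBBLES
    return len(program) + N_STAGES - 1 + bubbles

ORIGINAL = ["lw $t1, 0($t0)", "lw $t2, 4($t0)", "add $t3, $t1, $t2", "sw $t3, 12($t0)",
            "lw $t4, 8($t0)", "add $t5, $t1, $t4", "sw $t5, 16($t0)"]
SCHEDULED = ["lw $t1, 0($t0)", "lw $t2, 4($t0)", "lw $t4, 8($t0)", "add $t3, $t1, $t2",
             "sw $t3, 12($t0)", "add $t5, $t1, $t4", "sw $t5, 16($t0)"]

print(cycles(ORIGINAL), cycles(SCHEDULED))  # → 13 11
```

Both versions execute the same seven instructions; moving `lw $t4` up removes both load-use bubbles, which accounts for the two-cycle difference.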