10/11 Lecture Topics: Execution cycle, Introduction to pipelining





















10/11: Lecture Topics
• Execution cycle
• Introduction to pipelining
• Data hazards

Office Hours
• Changing Th 8:30-9:30 to Mo 2-3
• New office hours are
– Mo 10:30-11:30
– Mo 2:00-3:00
– Tu 2:30-3:30

Execution Cycle
IF ID EX MEM WB
• Five steps to executing an instruction:
1. Fetch
• Get the next instruction to execute from memory onto the chip
2. Decode
• Figure out what the instruction says to do
• Get values from registers
3. Execute
• Do what the instruction says; for example,
– On a memory reference, add up base and offset
– On an arithmetic instruction, do the math

More Execution Cycle
IF ID EX MEM WB
4. Memory Access
• If it’s a load or store, access memory
• If it’s a branch, replace the PC with the destination address
• Otherwise do nothing
5. Write back
• Place the result of the operation in the appropriate register

add $s0, $s1, $s2
• IF: get instruction at PC from memory
– it’s 000000 10001 10010 10000 00000 100000
• ID: determine what 000000 … 100000 is
– 000000 … 100000 is add
– get contents of $s1 and $s2 ($s1=7, $s2=12)
• EX: add 7 and 12 = 19
• MEM: do nothing
• WB: store 19 in $s0
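The decode step for this example can be sketched in a few lines of Python. This is only an illustration, assuming the standard MIPS R-type field layout (6-bit opcode, three 5-bit register numbers, 5-bit shift amount, 6-bit function code); the function name `decode_r_type` and the dictionary used as a register file are hypothetical, not part of the lecture.

```python
def decode_r_type(word: int) -> dict:
    """Split a 32-bit MIPS R-type instruction word into its six fields."""
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31..26
        "rs": (word >> 21) & 0x1F,      # first source register
        "rt": (word >> 16) & 0x1F,      # second source register
        "rd": (word >> 11) & 0x1F,      # destination register
        "shamt": (word >> 6) & 0x1F,    # shift amount (unused by add)
        "funct": word & 0x3F,           # selects the ALU operation
    }

# add $s0, $s1, $s2 ($s0 = register 16, $s1 = 17, $s2 = 18)
ADD_WORD = 0b000000_10001_10010_10000_00000_100000

fields = decode_r_type(ADD_WORD)          # ID: figure out what the word says

# Hypothetical register file holding the example values $s1 = 7, $s2 = 12.
regs = {17: 7, 18: 12}

if fields["opcode"] == 0 and fields["funct"] == 0x20:   # R-type add
    result = regs[fields["rs"]] + regs[fields["rt"]]    # EX: do the math
    regs[fields["rd"]] = result                         # WB: store in $s0

print(fields)
print("$s0 =", regs[16])
```

The MEM step does not appear because an add has nothing to do there.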

lw $t2, 16($s0)
• IF: get instruction at PC from memory
– it’s 100011 10000 01010 0000000000010000
• ID: determine what 100011 is
– 100011 is lw
– get contents of $s0 and $t2 (at this point we don’t yet know that we don’t care about $t2): $s0=0x200D1C00, $t2=77763
• EX: add 16 to 0x200D1C00 = 0x200D1C10
• MEM: load the word stored at 0x200D1C10
• WB: store loaded value in $t2
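The same walk-through for the load can be sketched as below. The sketch assumes the standard MIPS I-type layout (opcode, base register, target register, 16-bit signed offset) with lw as opcode 0x23; the helper names, the dictionary-based memory, and the value 42 stored at the target address are made up for illustration.

```python
def sign_extend_16(imm: int) -> int:
    """Interpret a 16-bit field as a signed two's-complement offset."""
    return imm - 0x10000 if imm & 0x8000 else imm

def run_lw(word: int, regs: dict, memory: dict) -> int:
    """Carry a lw instruction through ID, EX, MEM and WB; return the address used."""
    # ID: pull the fields out of the instruction word
    base_reg = (word >> 21) & 0x1F
    dest_reg = (word >> 16) & 0x1F
    offset = sign_extend_16(word & 0xFFFF)
    # EX: add base and offset
    address = regs[base_reg] + offset
    # MEM: read the word at that address
    value = memory[address]
    # WB: place the loaded value in the destination register
    regs[dest_reg] = value
    return address

# lw $t2, 16($s0) ($s0 = register 16, $t2 = register 10)
LW_WORD = (0x23 << 26) | (16 << 21) | (10 << 16) | 16

regs = {16: 0x200D1C00, 10: 77763}   # values from the example above
memory = {0x200D1C10: 42}            # hypothetical memory contents

addr = run_lw(LW_WORD, regs, memory)
print(hex(addr), regs[10])
```

The old value of $t2 (77763) is simply overwritten, which is why reading it during decode was wasted work.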

Latency & Throughput
cycle   1     2     3     4     5     6     7     8     9     10
inst 1  IF    ID    EX    MEM   WB
inst 2                                IF    ID    EX    MEM   WB
• Latency: the time it takes for an individual instruction to execute
– What’s the latency for this implementation?
• Throughput: the number of instructions that execute per unit of time
– What’s the throughput of this implementation?
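One way to check answers to these two questions is to compute them from the schedule. The sketch below assumes the unpipelined design in the diagram, where an instruction occupies all five stages before the next one starts and every stage takes one cycle; `unpipelined_schedule` is a hypothetical helper name.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def unpipelined_schedule(num_instructions: int, stages=STAGES) -> dict:
    """Map each instruction index to its (first cycle, last cycle), counting from 1."""
    k = len(stages)
    return {i: (i * k + 1, (i + 1) * k) for i in range(num_instructions)}

sched = unpipelined_schedule(2)

# Latency: cycles from the start of IF to the end of WB for one instruction.
first, last = sched[0]
latency = last - first + 1

# Throughput: instructions completed divided by the cycles it took.
total_cycles = sched[1][1]
throughput = 2 / total_cycles

print(sched)
print("latency:", latency, "cycles")
print("throughput:", throughput, "instructions per cycle")
```

Throughput is expressed per clock cycle here; multiplying by the clock rate would convert it to instructions per second.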

A case for pipelining
• The functional units are underutilized
– the instruction fetcher is used only once every five clock cycles
– why not have it fetch a new instruction every clock cycle?
• Pipelining overlaps the stages of execution so that every stage has something to do each cycle
• A pipeline with N stages could speed up execution by up to N times, but
– each stage must take the same amount of time
– each stage must always have work to do
• Also, the latency of each instruction may go up, but why don’t we care?

Unpipelined Assembly Line
• What is the latency of this assembly line, i.e. for how many cycles is the plane on the assembly line?
• What is the throughput of this assembly line, i.e. how many planes are manufactured each cycle?

Pipelined Assembly Line
• The assembly line has 5 stages
• If a plane isn’t ready to go to the next stage then the pipeline stalls
– that stage and all stages before it freeze
• The gap in the assembly line is known as a bubble

Pipelined Analysis
• What is the latency?
• What is the throughput?
• What is the speed up? (Speed up = Old Time / New Time)

Pipeline Example
Cycles 1 to 10; each instruction passes through IF ID EX MEM WB:
add $s0, $s1, $s2
sub $s3, $s2, $s3
lw $s2, 20($t0)
sw $s0, 16($s1)
and $t1, $t2, $t3
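The chart for this example can be generated rather than drawn by hand. The sketch below assumes an ideal pipeline in which each instruction enters IF one cycle after the previous one and nothing stalls; it deliberately ignores hazards, and `pipeline_chart` is a hypothetical helper name.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_chart(instructions):
    """Return one row per instruction; row[c] is the stage occupied in cycle c + 1."""
    total_cycles = len(instructions) + len(STAGES) - 1
    chart = []
    for i, _ in enumerate(instructions):
        row = [""] * total_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage      # instruction i starts i cycles after the first
        chart.append(row)
    return chart

program = [
    "add $s0, $s1, $s2",
    "sub $s3, $s2, $s3",
    "lw  $s2, 20($t0)",
    "sw  $s0, 16($s1)",
    "and $t1, $t2, $t3",
]

chart = pipeline_chart(program)

# Print a header row of cycle numbers, then one staggered row per instruction.
print(" " * 20 + "".join(f"{c:<5}" for c in range(1, len(chart[0]) + 1)))
for text, row in zip(program, chart):
    print(f"{text:<20}" + "".join(f"{stage:<5}" for stage in row))
```

Five instructions finish in nine cycles under these assumptions, so the tenth column of the slide's grid stays empty.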

Pipelined Xput and Latency
cycle   1     2     3     4     5     6     7     8     9
inst 1  IF    ID    EX    MEM   WB
inst 2        IF    ID    EX    MEM   WB
inst 3              IF    ID    EX    MEM   WB
inst 4                    IF    ID    EX    MEM   WB
inst 5                          IF    ID    EX    MEM   WB
• What’s the throughput of this implementation?
• What’s the latency of this implementation?
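The speed-up definition from the Pipelined Analysis slide (Speed up = Old Time / New Time) can be applied directly to cycle counts. The sketch below assumes one cycle per stage, no stalls, and the same clock period in both designs; the function names are hypothetical.

```python
def unpipelined_cycles(n: int, k: int = 5) -> int:
    """Each of n instructions uses all k stages before the next one starts."""
    return n * k

def pipelined_cycles(n: int, k: int = 5) -> int:
    """The first instruction takes k cycles; each later one finishes a cycle after it."""
    return k + (n - 1) if n > 0 else 0

def speedup(n: int, k: int = 5) -> float:
    """Old time divided by new time, measured in cycles."""
    return unpipelined_cycles(n, k) / pipelined_cycles(n, k)

for n in (1, 5, 100, 1_000_000):
    print(n, unpipelined_cycles(n), pipelined_cycles(n), round(speedup(n), 3))
```

For the five instructions in the diagram this gives 25 cycles against 9. The ratio approaches the number of stages only as the instruction count grows, which is why the N-times figure is an upper bound.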

Data Hazards
• What happens in the following code?
cycle               1     2     3     4     5     6
add $s0, $s1, $s2   IF    ID    EX    MEM   WB
add $s4, $s3, $s0         IF    ID    EX    MEM   WB
– $s0 is read here: the second instruction’s ID stage (cycle 3)
– $s0 is written here: the first instruction’s WB stage (cycle 5)
• This is called a data dependency
• When it causes a pipeline stall it is called a data hazard
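Finding such dependencies is mechanical once each instruction lists the register it writes and the registers it reads. The sketch below is one minimal way to do it; the `Instr` record and `raw_dependencies` function are hypothetical names, and only read-after-write dependencies on the most recent writer are reported.

```python
from typing import List, NamedTuple, Optional, Tuple

class Instr(NamedTuple):
    text: str                 # assembly text, for display only
    dest: Optional[str]       # register written, or None (e.g. for sw)
    srcs: Tuple[str, ...]     # registers read

def raw_dependencies(program: List[Instr]) -> List[Tuple[int, int, str]]:
    """Return (producer index, consumer index, register) for each read of an earlier write."""
    deps = []
    last_writer = {}          # register name -> index of its most recent writer
    for i, ins in enumerate(program):
        for reg in ins.srcs:
            if reg in last_writer:
                deps.append((last_writer[reg], i, reg))
        if ins.dest is not None:
            last_writer[ins.dest] = i
    return deps

program = [
    Instr("add $s0, $s1, $s2", "$s0", ("$s1", "$s2")),
    Instr("add $s4, $s3, $s0", "$s4", ("$s3", "$s0")),
]

deps = raw_dependencies(program)
for producer, consumer, reg in deps:
    print(f"{program[consumer].text!r} reads {reg} written by {program[producer].text!r}")
```

A dependency found this way only becomes a hazard when the two instructions are close enough in the pipeline that the read would happen before the write.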

Solution: Stall
• Stall the pipeline until the result is available
cycle               1     2     3     4     5     6     7     8     9
add $s0, $s1, $s2   IF    ID    EX    MEM   WB
add $s4, $s3, $s0         IF    stall stall stall ID    EX    MEM   WB
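
Solution: Read & Write in same Cycle
• Write the register in the first part of the clock cycle
• Read it in the second part of the clock cycle
cycle               1     2     3     4     5     6     7     8
add $s0, $s1, $s2   IF    ID    EX    MEM   WB
add $s4, $s3, $s0         IF    stall stall ID    EX    MEM   WB
– write $s0: the first instruction’s WB stage (first part of cycle 5)
– read $s0: the second instruction’s ID stage (second part of cycle 5)
• A stall of two cycles is still required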

Solution: Forwarding
• The value of $s0 is known after cycle 3 (after the first instruction’s EX stage)
• The value of $s0 isn’t needed until cycle 4 (before the second instruction’s EX stage)
• If we forward the result there isn’t a stall
cycle               1     2     3     4     5     6
add $s0, $s1, $s2   IF    ID    EX    MEM   WB
add $s4, $s3, $s0         IF    ID    EX    MEM   WB
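The three approaches to this hazard (waiting for write back, reading and writing the register file in the same cycle, and forwarding) differ only in when the consumer may pick up the value. The sketch below counts stall cycles for an ALU producer under each one, assuming the five-stage pipeline with one cycle per stage; the policy names and the function are hypothetical labels, not terms from the lecture.

```python
# Cycle offsets of each stage relative to the producer's IF stage.
STAGE_OFFSET = {"IF": 0, "ID": 1, "EX": 2, "MEM": 3, "WB": 4}

def stalls_for_alu_result(distance: int, policy: str) -> int:
    """Stall cycles for a consumer issued `distance` instructions after an ALU producer."""
    if policy == "wait_for_writeback":
        # The consumer's ID must come in a cycle after the producer's WB.
        earliest = STAGE_OFFSET["WB"] + 1
        wanted = distance + STAGE_OFFSET["ID"]
    elif policy == "split_cycle":
        # Write in the first half, read in the second: ID may share the WB cycle.
        earliest = STAGE_OFFSET["WB"]
        wanted = distance + STAGE_OFFSET["ID"]
    elif policy == "forwarding":
        # The result is known after EX and is needed at the start of the consumer's EX.
        earliest = STAGE_OFFSET["EX"] + 1
        wanted = distance + STAGE_OFFSET["EX"]
    else:
        raise ValueError(f"unknown policy: {policy}")
    return max(0, earliest - wanted)

for policy in ("wait_for_writeback", "split_cycle", "forwarding"):
    print(policy, [stalls_for_alu_result(d, policy) for d in (1, 2, 3, 4)])
```

With `distance=1` (back-to-back instructions, as in the two adds) the split-cycle case yields the two-cycle stall and forwarding yields none. Larger distances show how unrelated instructions in between absorb the wait.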

Another data hazard
• What if the first instruction is lw?
cycle               1     2     3     4     5     6
lw $s0, 0($s2)      IF    ID    EX    MEM   WB
add $s4, $s3, $s0         IF    ID    EX    MEM   WB
• $s0 isn’t known until after the MEM stage
• We can’t forward back into the past
• Either stall or reorder instructions

Solutions to the lw hazard
• We can stall for one cycle, but we hate to stall
cycle               1     2     3     4     5     6     7
lw $s0, 0($s2)      IF    ID    EX    MEM   WB
add $s4, $s3, $s0         IF    ID    stall EX    MEM   WB
• Try to execute an unrelated instruction between the two instructions
cycle               1     2     3     4     5     6     7
lw $s0, 0($s2)      IF    ID    EX    MEM   WB
sub $t4, $t2, $t3         IF    ID    EX    MEM   WB
add $s4, $s3, $s0               IF    ID    EX    MEM   WB
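The effect of moving the sub can be checked by counting load-use stalls before and after. This sketch assumes forwarding is in place, so the only remaining stall is the single cycle needed when an instruction reads a register loaded by the lw immediately before it; the `Instr` record, the function name, and the "before" ordering (lw, add, sub) are assumptions made for illustration.

```python
from typing import List, NamedTuple, Optional, Tuple

class Instr(NamedTuple):
    op: str                   # mnemonic, e.g. "lw" or "add"
    dest: Optional[str]       # register written, or None
    srcs: Tuple[str, ...]     # registers read

def load_use_stalls(program: List[Instr]) -> int:
    """Count one stall for every instruction that reads the result of the lw just before it."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        if prev.op == "lw" and prev.dest in cur.srcs:
            stalls += 1
    return stalls

lw = Instr("lw", "$s0", ("$s2",))            # lw  $s0, 0($s2)
add = Instr("add", "$s4", ("$s3", "$s0"))    # add $s4, $s3, $s0
sub = Instr("sub", "$t4", ("$t2", "$t3"))    # sub $t4, $t2, $t3

before = [lw, add, sub]    # assumed original order: add directly follows the load
after = [lw, sub, add]     # the unrelated sub fills the slot after the load

print("stalls before reordering:", load_use_stalls(before))
print("stalls after reordering:", load_use_stalls(after))
```

The swap is safe here because sub neither reads nor writes any register used by add. A real scheduler would have to verify that before moving anything.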

Reordering Instructions
• Reordering instructions is a common technique for avoiding pipeline stalls
• Sometimes the compiler does the reordering statically
• Almost all modern processors do this reordering dynamically
– they can see several instructions ahead and execute any one that has no unresolved dependency
– this is known as out-of-order execution and is very complicated to implement
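
Control Hazards
• Branch instructions cause control hazards because we don’t know which instruction to execute next
      bne $s0, $s1, next
      add $s4, $s3, $s0
      ...
next: sub $s4, $s3, $s0
cycle               1     2     3     4     5     6
bne $s0, $s1, next  IF    ID    EX    MEM   WB
(add or sub?)             IF    ID    EX    MEM   WB
– at the second IF (cycle 2): do we fetch add or sub?
– we don’t know until a later stage of the bne (in this execution cycle, a branch replaces the PC in its MEM stage)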
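The cost of this uncertainty can be estimated with a small calculation. The sketch below assumes the simplest policy, which the lecture does not prescribe: fetch waits until the branch outcome is known, and the outcome becomes available at the end of a chosen stage. The function names are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def next_fetch_cycle(branch_if_cycle: int, resolve_stage: str) -> int:
    """First cycle in which the correct next instruction can be fetched."""
    # The branch reaches resolve_stage this many cycles after its own IF,
    # and the fetch can start in the cycle after that.
    return branch_if_cycle + STAGES.index(resolve_stage) + 1

def branch_bubbles(resolve_stage: str) -> int:
    """Fetch slots lost per branch if fetch simply waits for the outcome."""
    # Without the hazard, the next fetch would happen one cycle after the branch's IF.
    return next_fetch_cycle(1, resolve_stage) - 2

for stage in ("ID", "EX", "MEM"):
    print(stage, "-> next fetch in cycle", next_fetch_cycle(1, stage),
          "| bubbles:", branch_bubbles(stage))
```

With the branch resolved in MEM, as in the five-step execution cycle described earlier, each bne would cost three fetch slots under this policy. Resolving the branch in an earlier stage shrinks that penalty.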