
Introduction to Computer Organization and Architecture
Lecture 12
By Juthawut Chantharamalee
http://dusithost.dusit.ac.th/~juthawut_cha/home.htm

Outline
- Pipelining
  - Basics
  - Throughput
  - Execution Time
  - Pipeline Registers
  - Program Execution
- Hazards
  - Structural
  - Data
  - Control

ILP: Instruction Level Parallelism
- Single-cycle and multi-cycle datapaths execute one instruction at a time.
- How can we get better performance?
- Answer: Execute multiple instructions at a time:
  - Pipelining: Enhance a multi-cycle datapath to fetch one instruction every cycle.
  - Parallelism: Fetch multiple instructions every cycle.

Automobile Team Assembly
- 1 car assembled every four hours
- 6 cars per day
- 180 cars per month
- 2,040 cars per year

Automobile Assembly Line
- Task 1 (1 hour): Mechanical
- Task 2 (1 hour): Electrical
- Task 3 (1 hour): Painting
- Task 4 (1 hour): Testing
- First car assembled in 4 hours (pipeline latency), thereafter 1 car per hour
- 21 cars on first day, thereafter 24 cars per day
- 717 cars per month
- 8,637 cars per year

Throughput: Team Assembly
[Timing diagram: the red car passes through Mechanical, Electrical, Painting, and Testing from start to completion; the blue car is started when the red car is completed.]
- Time to assemble one car = n hours, where n is the number of nearly equal subtasks, each requiring 1 unit of time
- Throughput = 1/n cars per unit time

Throughput: Assembly Line
[Timing diagram: Cars 1 to 4 start one time unit apart; each passes through Mechanical, Electrical, Painting, and Testing, so Car 1 completes first and Car 2 completes one time unit later.]
- Time to complete first car = n time units (latency)
- Cars completed in time T = T - n + 1
- Throughput = (T - n + 1)/T = 1 - (n - 1)/T cars per unit time

\[
\frac{\text{Throughput (assembly line)}}{\text{Throughput (team assembly)}}
= \frac{1 - (n-1)/T}{1/n}
= n - \frac{n(n-1)}{T} \;\to\; n \quad \text{as } T \to \infty
\]
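The throughput formulas above can be checked numerically with a minimal sketch. The function names are illustrative and not part of the lecture; the model assumes n equal stages of one time unit each.

```python
def cars_completed(T: int, n: int) -> int:
    """Cars finished by an n-stage assembly line after T time units."""
    # The first car needs n time units; one more car finishes each unit after that.
    return max(0, T - n + 1)


def throughput_ratio(T: int, n: int) -> float:
    """Assembly-line throughput divided by team-assembly throughput."""
    line = cars_completed(T, n) / T   # cars per unit time on the line
    team = 1 / n                      # one car every n time units
    return line / team


if __name__ == "__main__":
    # First day of the 4-task car example: 24 hours, 4 stages.
    print(cars_completed(24, 4))
    # The ratio approaches n = 4 as T grows.
    for T in (24, 240, 24000):
        print(T, throughput_ratio(T, 4))
```

With T = 24 and n = 4 the count agrees with the 21 cars quoted for the first day of the assembly line.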

Some Features of Assembly Line
[Diagram: Task 1 Mechanical, Task 2 Electrical, Task 3 Painting, Task 4 Testing, 1 hour each.]
- Electrical parts delivered just in time (JIT)
- Defect found: stall assembly line to fix the cause of defect
- 3 cars in the assembly line are suspects, to be removed (flush pipeline)

Pipelining in a Computer
- Divide datapath into nearly equal tasks, to be performed serially and requiring non-overlapping resources.
- Insert registers at task boundaries in the datapath; registers pass the output data from one task as input data to the next task.
- Synchronize tasks with a clock having a cycle time that just exceeds the time required by the longest task.
- Break each instruction down into a fixed number of tasks so that instructions can be executed in a staggered fashion.

Single-Cycle Datapath

Instruction class                    IF     ID     EX     MEM    WB     Total time
lw                                   2 ns   1 ns   2 ns   2 ns   1 ns   8 ns
sw                                   2 ns   1 ns   2 ns   2 ns   idle   8 ns
R-format (add, sub, and, or, slt)    2 ns   1 ns   2 ns   idle   1 ns   8 ns
B-format (beq)                       2 ns   1 ns   2 ns   idle   idle   8 ns

IF: instr. fetch; ID: instr. decode (also reg. file read); EX: execution (ALU operation); MEM: data access; WB: write back (reg. file write).
idle: no operation on data; idle time equalizes instruction length to a fixed clock period.

Execution Time: Single-Cycle
[Timing diagram, time axis in ns: lw $1, 100($0), lw $2, 200($0), and lw $3, 300($0) execute one after another; each goes through IF, ID, EX, MEM, WB within a single clock cycle.]
- Clock cycle time = 8 ns
- Total time for executing three lw instructions = 24 ns

Pipelined Datapath
- Instruction classes: lw; sw; R-format (add, sub, and, or, slt); B-format (beq).
- Every class passes through all five stages: IF (instr. fetch, 2 ns), ID (instr. decode, also reg. file read, 1 ns), EX (execution, ALU operation, 2 ns), MEM (data access, 2 ns), WB (write back, reg. file write).
- Total time = 10 ns for every instruction class.
- Where there is no operation on data, idle time is inserted to equalize instruction lengths.

Execution Time: Pipeline
[Timing diagram, time axis 0 to 16 ns: lw $1, 100($0), lw $2, 200($0), and lw $3, 300($0) start 2 ns apart; each passes through IF, ID, EX, MEM, and RW (register write), one stage per clock cycle.]
- Clock cycle time = 2 ns, four times faster than single-cycle clock
- Total time for executing three lw instructions = 14 ns
- Performance ratio = Single-cycle time / Pipeline time = 24/14 = 1.7

Pipeline Performance
- Clock cycle time = 2 ns
- 1,003 lw instructions:
  - Total time for executing 1,003 lw instructions = 2,014 ns
  - Performance ratio = Single-cycle time / Pipeline time = 8,024/2,014 = 3.98
- 10,003 lw instructions:
  - Performance ratio = 80,024/20,014 = 3.998 → Clock cycle ratio (4)
- Pipeline performance approaches clock-cycle ratio for long programs.
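The figures above follow from a simple timing model, sketched below under the lecture's numbers (8 ns single-cycle clock, 2 ns pipeline clock, five stages, no hazards). The function names are illustrative only.

```python
def single_cycle_time_ns(n_instr: int, clock_ns: int = 8) -> int:
    """Each instruction occupies one full (long) clock cycle."""
    return n_instr * clock_ns


def pipeline_time_ns(n_instr: int, n_stages: int = 5, clock_ns: int = 2) -> int:
    """First instruction needs n_stages cycles; each later one completes one cycle after it."""
    return (n_stages + n_instr - 1) * clock_ns


def performance_ratio(n_instr: int) -> float:
    """Single-cycle time divided by pipeline time for n_instr instructions."""
    return single_cycle_time_ns(n_instr) / pipeline_time_ns(n_instr)


if __name__ == "__main__":
    for n in (3, 1003, 10003):
        print(n, single_cycle_time_ns(n), pipeline_time_ns(n),
              round(performance_ratio(n), 3))
```

For long instruction streams the fill time of the pipeline becomes negligible, which is why the ratio tends toward the clock-cycle ratio of 4.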

Single-Cycle Datapath
[Datapath diagram: PC, instruction memory, register file, sign extension, shift left 2, ALUs, data memory, and multiplexers, divided into five parts: IF (instr. fetch), ID (instr. decode, reg. file read), EX (execute, address calc.), MEM (mem. access), WB (writeback). CONTROL takes the opcode (bits 26-31) and generates RegDst, ALUSrc, ALUOp, Branch, MemRead, MemWrite, MemtoReg, and RegWrite; ALU Cont. takes bits 0-5. Other instruction fields shown: bits 21-25, 16-20, 11-15, and 0-15.]

Pipelining of RISC Instructions
- Steps: Fetch Instruction → Examine Opcode → Fetch Operands → Perform Operation → Store Result
- Stages:
  - IF: Instruction Fetch
  - ID: Decode instruction and Fetch operands
  - EX: Execute
  - MEM: Memory Operation
  - WB: Write Back to Reg file
- Although an instruction takes five clock cycles, one instruction is completed every cycle.

Pipeline Registers
This requires a CONTROL not too different from single-cycle.
[Datapath diagram: the single-cycle datapath with pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB inserted between stages; CONTROL and ALU Cont. generate RegDst, ALUSrc, ALUOp, Branch, MemRead, MemWrite, MemtoReg, and RegWrite.]

Pipeline Register Functions
Four pipeline registers are added:

Register name    Data held
IF/ID            PC+4, Instruction word (IW)
ID/EX            PC+4, R1, R2, IW(0-15) sign ext., IW(11-15)
EX/MEM           PC+4, zero, ALUResult, R2, IW(11-15) or IW(16-20)
MEM/WB           M[ALUResult], ALUResult, IW(11-15) or IW(16-20)

Pipelined Datapath
[Datapath diagram: PC, instruction memory, register file, sign extension, shift left 2, ALU, data memory, and multiplexers, with pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB. The destination register field is bits 11-15 for R-type and bits 16-20 for I-type lw.]

Five-Cycle Pipeline
- CC1: IM (instruction memory), result held in IF/ID
- CC2: ID, reg. file read, result held in ID/EX
- CC3: ALU, result held in EX/MEM
- CC4: DM (data memory), result held in MEM/WB
- CC5: reg. file write

Add Instruction
add $t0, $s1, $s2
- Machine instruction word: 000000 10001 10010 01000 00000 100000
  (fields: opcode, $s1, $s2, $t0, shift amount, function)
- CC1, IF: fetch instruction (IM)
- CC2, ID: read $s1, read $s2 (reg. file read)
- CC3, EX: add $s1 + $s2 (ALU)
- CC4, MEM: no data memory operation
- CC5, WB: write $t0 (reg. file write)
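The R-type instruction word for this add can be rebuilt field by field. The sketch below is only an illustration: the register table lists just the registers used in this lecture, and the function name is hypothetical.

```python
# Register numbers from the standard MIPS register convention (subset only).
REG = {"$t0": 8, "$t1": 9, "$s1": 17, "$s2": 18}


def encode_r_type(rd: str, rs: str, rt: str, funct: int,
                  shamt: int = 0, opcode: int = 0) -> str:
    """Return the 32-bit R-type word as space-separated binary fields."""
    fields = [
        (opcode, 6),    # bits 26-31
        (REG[rs], 5),   # bits 21-25, first source register
        (REG[rt], 5),   # bits 16-20, second source register
        (REG[rd], 5),   # bits 11-15, destination register
        (shamt, 5),     # bits 6-10, shift amount
        (funct, 6),     # bits 0-5, function code
    ]
    return " ".join(format(value, f"0{width}b") for value, width in fields)


if __name__ == "__main__":
    # add $t0, $s1, $s2 uses function code 100000.
    print(encode_r_type("$t0", "$s1", "$s2", funct=0b100000))
```

The destination register sits in bits 11-15, which is the field the pipelined datapath selects for R-type instructions.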

Pipelined Datapath Executing add
[Datapath diagram: bits 21-25 (s1) and 16-20 (s2) address the register file, which outputs $s1 and $s2 to the ALU; the ALU result passes through EX/MEM and MEM/WB and is written back to t0, selected by bits 11-15 (R-type destination).]

Load Instruction
lw $t0, 1200($t1)
- Machine instruction word: 100011 01001 01000 0000 0100 1011 0000
  (fields: opcode, $t1, $t0, 1200)
- CC1, IF: fetch instruction (IM)
- CC2, ID: read $t1, sign ext. 1200 (reg. file read)
- CC3, EX: add $t1 + 1200 (ALU)
- CC4, MEM: read M[addr] (DM)
- CC5, WB: write $t0 (reg. file write)
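As a cross-check on the I-type format used by lw and sw, the following sketch assembles the fields for a base register, a target register, and a 16-bit offset. It assumes the offset 1200 is decimal, as in ordinary MIPS assembly; the helper name and the small register table are illustrative.

```python
# Register numbers from the standard MIPS register convention (subset only).
REG = {"$t0": 8, "$t1": 9}

LW_OPCODE = 0b100011
SW_OPCODE = 0b101011


def encode_i_type(opcode: int, rt: str, rs: str, imm: int) -> str:
    """Return the 32-bit I-type word as 'opcode rs rt immediate' in binary."""
    imm16 = imm & 0xFFFF  # two's-complement 16-bit immediate
    fields = [
        (opcode, 6),    # bits 26-31
        (REG[rs], 5),   # bits 21-25, base register
        (REG[rt], 5),   # bits 16-20, loaded or stored register
        (imm16, 16),    # bits 0-15, offset
    ]
    return " ".join(format(value, f"0{width}b") for value, width in fields)


if __name__ == "__main__":
    print(encode_i_type(LW_OPCODE, "$t0", "$t1", 1200))  # lw $t0, 1200($t1)
    print(encode_i_type(SW_OPCODE, "$t0", "$t1", 1200))  # sw $t0, 1200($t1)
```

For lw, bits 16-20 name the register to be written, which is why the datapath selects that field as the destination for I-type loads.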

Pipelined Datapath Executing lw
[Datapath diagram: bits 21-25 (t1) address the register file, which outputs $t1; bits 0-15 (1200) are sign extended; the ALU computes the address for data memory; the data read passes through MEM/WB and is written to t0, selected by bits 16-20 (I-type lw destination).]

Store Instruction
sw $t0, 1200($t1)
- Machine instruction word: 101011 01001 01000 0000 0100 1011 0000
  (fields: opcode, $t1, $t0, 1200)
- CC1, IF: fetch instruction (IM)
- CC2, ID: read $t1, sign ext. 1200 (reg. file read)
- CC3, EX: add $t1 + 1200 (ALU)
- CC4, MEM: write M[addr] ← $t0 (DM)
- CC5, WB: no register write

Pipelined Datapath Executing sw
[Datapath diagram: bits 21-25 (t1) and 16-20 (t0) address the register file, which outputs $t1 and $t0; bits 0-15 (1200) are sign extended; the ALU computes the address and $t0 is written to data memory at that address.]

Executing a Program
Consider a five-instruction segment:
    lw  $10, 20($1)
    sub $11, $2, $3
    add $12, $3, $4
    lw  $13, 24($1)
    add $14, $5, $6

Program Execution
[Pipeline diagram: program instructions (vertical) against time (horizontal, CC1 to CC5 marked). lw $10, 20($1), sub $11, $2, $3, add $12, $3, $4, lw $13, 24($1), and add $14, $5, $6 each pass through IM, IF/ID, ID (reg. file read), ID/EX, ALU, EX/MEM, DM, MEM/WB, and reg. file write, each starting one clock cycle after the previous instruction.]

CC5
[Pipelined datapath snapshot in clock cycle 5:]
- IF: add $14, $5, $6
- ID: lw $13, 24($1)
- EX: add $12, $3, $4
- MEM: sub $11, $2, $3
- WB: lw $10, 20($1)
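The stage occupancy in any clock cycle can be derived mechanically for an ideal pipeline. The sketch below assumes no hazards or stalls and one instruction fetched per cycle; the function name is illustrative.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

PROGRAM = [
    "lw $10, 20($1)",
    "sub $11, $2, $3",
    "add $12, $3, $4",
    "lw $13, 24($1)",
    "add $14, $5, $6",
]


def stage_occupancy(program, cycle):
    """Map each busy stage to the instruction it holds in clock cycle `cycle` (1-based)."""
    occupancy = {}
    for index, instr in enumerate(program):
        # Instruction `index` is fetched in cycle index + 1 and then advances one stage per cycle.
        stage_index = cycle - 1 - index
        if 0 <= stage_index < len(STAGES):
            occupancy[STAGES[stage_index]] = instr
    return occupancy


if __name__ == "__main__":
    for cc in range(1, len(PROGRAM) + len(STAGES)):
        print(f"CC{cc}:", stage_occupancy(PROGRAM, cc))
```

Calling `stage_occupancy(PROGRAM, 5)` reproduces the CC5 snapshot, with all five stages busy.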

Advantages of Pipeline
- After the fifth cycle (CC5), one instruction is completed each cycle; CPI ≈ 1, neglecting the initial pipeline latency of 5 cycles.
  - Pipeline latency is defined as the number of stages in the pipeline, or
  - the number of clock cycles after which the first instruction is completed.
- The clock cycle time is about four times shorter than that of single-cycle datapath and about the same as that of multicycle datapath.
- For multicycle datapath, CPI = 3.….
- So, pipelined execution is faster, but...

Pipeline Hazards
- Definition: A hazard in a pipeline is a situation in which the next instruction cannot complete execution one clock cycle after completion of the present instruction.
- Three types of hazards:
  - Structural hazard (resource conflict)
  - Data hazard
  - Control hazard

Structural Hazard
- Two instructions cannot execute due to a resource conflict.
- Example: Consider a computer with a common data and instruction memory. The fourth cycle of a lw instruction requires memory access (memory read), and at the same time the first cycle of the fourth instruction requires instruction fetch (also a memory read). This causes a memory resource conflict.

Example of Structural Hazard
[Pipeline diagram with a common data and instruction memory (IM/DM): program instructions lw $10, 20($1), sub $11, $2, $3, add $12, $3, $4, and lw $13, 24($1) against time (CC1 to CC5). In CC4 the memory is needed by two instructions: the data access of lw $10, 20($1) and the fetch of lw $13, 24($1).]
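The conflict in this example can be found by comparing when each lw or sw reaches its memory-access cycle with when later instructions are fetched. This is a sketch under the assumption of a single shared memory port and an otherwise ideal five-stage pipeline; the function name is illustrative.

```python
PROGRAM = [
    "lw $10, 20($1)",
    "sub $11, $2, $3",
    "add $12, $3, $4",
    "lw $13, 24($1)",
    "add $14, $5, $6",
]


def memory_conflicts(program):
    """List (cycle, data_access_instr, fetched_instr) for each clash on the shared memory."""
    conflicts = []
    for index, instr in enumerate(program):
        opcode = instr.split()[0]
        if opcode in ("lw", "sw"):
            # Fetched in cycle index + 1, so the MEM stage falls in cycle index + 4.
            mem_cycle = index + 4
            # The instruction fetched in cycle c has list index c - 1.
            fetch_index = mem_cycle - 1
            if fetch_index < len(program):
                conflicts.append((mem_cycle, instr, program[fetch_index]))
    return conflicts


if __name__ == "__main__":
    for cycle, data_instr, fetch_instr in memory_conflicts(PROGRAM):
        print(f"CC{cycle}: '{data_instr}' (MEM) conflicts with fetch of '{fetch_instr}'")
```

With separate instruction and data memories, as in the pipelined datapath shown earlier in the lecture, no such conflict arises.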

Possible Remedies for Structural Hazards
- Provide duplicate hardware resources in datapath.
- Control unit or compiler can insert delays (no-op cycles) between instructions. This is known as pipeline stall or bubble.

Stall (Bubble) for Structural Hazard
[Pipeline diagram with a common instruction and data memory (IM/DM): lw $10, 20($1), sub $11, $2, $3, and add $12, $3, $4 proceed normally; a stall (bubble) is inserted before lw $13, 24($1) so that its fetch no longer coincides with the data access of the first lw.]

Data Hazard
- Data hazard means that an instruction cannot be completed because the needed data, being generated by another instruction in the pipeline, is not available.
- Example: consider two instructions:
    add $s0, $t0, $t1
    sub $t2, $s0, $t3    # needs $s0

Example of Data Hazard
[Pipeline diagram: the add writes s0 in CC5 (reg. file write); sub $t2, $s0, $t3, fetched one cycle later, reads s0 and t3 in CC3.]
- We need to read s0 from reg file in cycle 3
- But s0 will not be written in reg file until cycle 5
- However, s0 will only be used in cycle 4
- And it is available at the end of cycle 3

Forwarding or Bypassing
- Output of a resource used by an instruction is forwarded to the input of some resource being used by another instruction.
- Forwarding can eliminate some, but not all, data hazards.

Forwarding for Data Hazard
[Pipeline diagram: the add writes s0 in CC5, and sub $t2, $s0, $t3 reads s0 and t3 in CC3. A forwarding path carries the add's ALU result, available at the end of CC3, to the ALU input of the sub in CC4.]

Forwarding Unit Hardware
[Block diagram: two forwarding multiplexers (FORW. MUX) at the ALU inputs choose among the ID/EX register values, the EX/MEM result, and the MEM/WB data going to the reg. file. The Forwarding Unit compares the source reg. IDs from the instruction with the destination registers held in EX/MEM and MEM/WB.]
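The selection made by each forwarding multiplexer can be sketched as a small decision function. The conditions below follow the commonly used textbook formulation (compare the source register with the destination registers in EX/MEM and MEM/WB, ignoring register 0) and are an assumption, not something spelled out on the slide; the parameter names are hypothetical.

```python
def forward_select(src_reg: int,
                   ex_mem_rd: int, ex_mem_reg_write: bool,
                   mem_wb_rd: int, mem_wb_reg_write: bool) -> str:
    """Choose the source for one ALU input: 'EX/MEM', 'MEM/WB', or 'ID/EX'."""
    # Prefer the most recent producer: the instruction one stage ahead (in EX/MEM).
    if ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == src_reg:
        return "EX/MEM"
    # Otherwise use the result about to be written back (in MEM/WB).
    if mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == src_reg:
        return "MEM/WB"
    # No hazard: use the value read from the register file in ID.
    return "ID/EX"


if __name__ == "__main__":
    S0, T3 = 16, 11  # MIPS register numbers for $s0 and $t3
    # sub $t2, $s0, $t3 directly after an add that writes $s0:
    print(forward_select(S0, ex_mem_rd=S0, ex_mem_reg_write=True,
                         mem_wb_rd=0, mem_wb_reg_write=False))
    print(forward_select(T3, ex_mem_rd=S0, ex_mem_reg_write=True,
                         mem_wb_rd=0, mem_wb_reg_write=False))
```

Checking EX/MEM before MEM/WB matters when both stages hold a write to the same register, since the younger result is the one the consumer should see.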

Forwarding Alone May Not Work
[Pipeline diagram: lw $s0, 20($s1) writes s0 in CC5; sub $t2, $s0, $t3, fetched one cycle later, reads s0 and t3 in CC3.]
- Data needed by sub (data hazard)
- Data available from memory only at the end of cycle 4

Use Bubble and Forwarding
[Pipeline diagram: lw $s0, 20($s1) writes s0 in CC5. A stall (bubble) delays sub $t2, $s0, $t3 by one cycle, and the loaded data is then forwarded to the ALU input of the sub.]
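The two forwarding cases in this lecture (ALU result ready at the end of cycle 3, loaded data ready at the end of cycle 4) can be folded into one small calculation. This is a sketch assuming full forwarding to the ALU input and cycle numbers counted from the producer's fetch in CC1; the function name is illustrative.

```python
def stalls_with_forwarding(producer_is_load: bool, distance: int) -> int:
    """Bubbles needed before a consumer `distance` instructions after the producer."""
    # Result exists at the end of CC3 (ALU operation) or CC4 (load from data memory).
    ready_end_of_cycle = 4 if producer_is_load else 3
    # The consumer is fetched in CC(1 + distance), so its EX stage is CC(3 + distance).
    consumer_ex_cycle = 3 + distance
    # The value must exist before the consumer's EX cycle begins.
    return max(0, ready_end_of_cycle + 1 - consumer_ex_cycle)


if __name__ == "__main__":
    print(stalls_with_forwarding(False, 1))  # add followed immediately by sub
    print(stalls_with_forwarding(True, 1))   # lw followed immediately by sub
    print(stalls_with_forwarding(True, 2))   # one independent instruction in between
```

The last case shows why placing an unrelated instruction between a load and its use removes the bubble.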

Hazard Detection Unit Hardware
[Block diagram: the Hazard Detection Unit examines the source reg. IDs of the instruction in IF/ID. On a hazard it disables writes to the PC and IF/ID and uses the NOP MUX to replace the control signals entering ID/EX with 0. The Forwarding Unit and the forwarding multiplexers (FORW. MUX) at the ALU inputs, fed from EX/MEM and MEM/WB, are as before.]

Resolving Hazards
- Hazards are resolved by hazard detection and forwarding units.
- Compiler's understanding of how these units work can improve performance.

Avoiding Stall by Code Reorder
C code:
    A = B + E;
    C = B + F;
MIPS code:
    lw  $t1, 0($t0)      # $t1 written
    lw  $t2, 4($t0)      # $t2 written
    add $t3, $t1, $t2    # $t1, $t2 needed
    sw  $t3, 12($t0)
    lw  $t4, 8($t0)      # $t4 written
    add $t5, $t1, $t4    # $t4 needed
    sw  $t5, 16($t0)

Reordered Code
C code:
    A = B + E;
    C = B + F;
MIPS code:
    lw  $t1, 0($t0)
    lw  $t2, 4($t0)
    lw  $t4, 8($t0)
    add $t3, $t1, $t2    # no hazard
    sw  $t3, 12($t0)
    add $t5, $t1, $t4    # no hazard
    sw  $t5, 16($t0)
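The benefit of the reordering can be checked by counting load-use pairs, that is, a lw immediately followed by an instruction that reads the loaded register. The sketch assumes forwarding handles every other dependence, and the parsing covers only the instruction forms in this example; the function names are illustrative.

```python
import re

ORIGINAL = [
    "lw $t1, 0($t0)", "lw $t2, 4($t0)", "add $t3, $t1, $t2", "sw $t3, 12($t0)",
    "lw $t4, 8($t0)", "add $t5, $t1, $t4", "sw $t5, 16($t0)",
]
REORDERED = [
    "lw $t1, 0($t0)", "lw $t2, 4($t0)", "lw $t4, 8($t0)", "add $t3, $t1, $t2",
    "sw $t3, 12($t0)", "add $t5, $t1, $t4", "sw $t5, 16($t0)",
]


def parse(instr):
    """Return (opcode, destination register or None, list of source registers)."""
    opcode, operands = instr.split(None, 1)
    regs = re.findall(r"\$\w+", operands)
    if opcode == "sw":
        return opcode, None, regs      # sw reads both of its registers
    return opcode, regs[0], regs[1:]   # first register is the destination


def load_use_stalls(program):
    """Count lw instructions whose result is needed by the very next instruction."""
    stalls = 0
    for previous, current in zip(program, program[1:]):
        prev_op, prev_dest, _ = parse(previous)
        _, _, cur_sources = parse(current)
        if prev_op == "lw" and prev_dest in cur_sources:
            stalls += 1
    return stalls


if __name__ == "__main__":
    print("original :", load_use_stalls(ORIGINAL))
    print("reordered:", load_use_stalls(REORDERED))
```

Moving the third lw ahead of the first add separates both loads from their uses, so the reordered sequence needs no bubbles under these assumptions.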

Control Hazard
- Instruction to be fetched is not known!
- Example: Instruction being executed is branch-type, which will determine the next instruction:
        add $4, $5, $6
        beq $1, $2, 40
        next instruction
        . . .
    40: and $7, $8, $9

Stall on Branch
[Pipeline diagram: program instructions against time (CC1 to CC5). add $4, $5, $6 and beq $1, $2, 40 proceed normally; a stall (bubble) follows the beq, after which either the next instruction or and $7, $8, $9 is fetched.]

Why Only One Stall?
- Extra hardware in ID phase:
  - Additional ALU to compute branch address
  - Comparator to generate zero signal
  - Hazard detection unit writes the branch address in PC

Ways to Handle Branch
- Stall or bubble
- Branch prediction:
  - Heuristics
    - Next instruction
    - Prediction based on statistics (dynamic)
    - Hardware decision (dynamic)
  - Prediction error: pipeline flush
- Delayed branch
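To compare stalling on every branch with prediction, a rough cost model can help. The sketch below assumes an ideal CPI of 1 and a one-cycle penalty per stall or flush (the branch being resolved in the ID phase); the branch fraction and misprediction rate are hypothetical inputs, not figures from the lecture.

```python
def cpi_stall_on_branch(branch_fraction: float, penalty_cycles: int = 1) -> float:
    """Average CPI when every branch inserts a bubble."""
    return 1 + branch_fraction * penalty_cycles


def cpi_with_prediction(branch_fraction: float, mispredict_rate: float,
                        penalty_cycles: int = 1) -> float:
    """Average CPI when only mispredicted branches flush the pipeline."""
    return 1 + branch_fraction * mispredict_rate * penalty_cycles


if __name__ == "__main__":
    # Hypothetical workload: 20% branches, 10% of them mispredicted.
    print(cpi_stall_on_branch(0.20))
    print(cpi_with_prediction(0.20, 0.10))
```

Under this model prediction never costs more than stalling, and it pays off in proportion to how often the prediction is right.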

Delayed Branch Example
- Stall on branch
          add $4, $5, $6
          beq $1, $2, skip
          next instruction
          . . .
    skip: or $7, $8, $9
- Delayed branch
          beq $1, $2, skip
          add $4, $5, $6       # instruction executed irrespective of branch decision
          next instruction
          . . .
    skip: or $7, $8, $9

Delayed Branch
[Pipeline diagram: program instructions against time (CC1 to CC5). beq $1, $2, skip is fetched in CC1, add $4, $5, $6 in CC2, and in CC3 either the next instruction or skip: or $7, $8, $9 is fetched, with no bubble.]

Summary: Hazards
- Structural hazards
  - Cause: resource conflict
  - Remedies: (i) hardware resources, (ii) stall (bubble)
- Data hazards
  - Cause: data unavailability
  - Remedies: (i) forwarding, (ii) stall (bubble), (iii) code reordering
- Control hazards
  - Cause: out-of-sequence execution (branch or jump)
  - Remedies: (i) stall (bubble), (ii) branch prediction/pipeline flush, (iii) delayed branch/pipeline flush

The End Lecture 12