Computer Architecture: Micro-architectures
Virendra Singh
Associate Professor
Computer Architecture and Dependable Systems Lab
Department of Electrical Engineering
Indian Institute of Technology Bombay
http://www.ee.iitb.ac.in/~viren/
E-mail: viren@ee.iitb.ac.in
EE-717/453 Advanced Computing for Electrical Engineers
Lecture 17

Instruction Set Design
• How complex should an instruction be?
• Church’s thesis: a very primitive computer can compute anything that a fancy computer can compute. You need only arithmetic/logical functions, memory reads and writes, and data-dependent decisions.
• Therefore, an ISA is selected for practical reasons:
  – Performance and cost, not computability
• Regularity tends to improve both
  – E.g., hardware to handle an arbitrary number of operands is complex, slow, and unnecessary
28 Aug 2012 EE-717/EE-453@IITB

Instruction Set
• Should be complete
  – One should be able to construct a machine-level program to evaluate any function
• Should be efficient
  – Frequently required functions can be completed quickly using relatively few instructions
• Should be regular
  – Should contain expected opcodes and addressing modes
• Should be compatible with existing machines

How to Choose ISA
• Minimize what?
  – Instrs/prog × cycles/instr × sec/cycle
• In 1985-1995 technology, simple modes like those of MIPS were great
  – As technology changes, computer design options change
• If memory is limited, dense instructions are important [CISC case]
• For high speed, pipelining and ease of pipelining are important [RISC case]
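The product on this slide can be evaluated directly. The following is a minimal sketch; the function name and the two design points are hypothetical, chosen only to show how a denser program can still lose to a simpler ISA with a lower CPI.

```python
def cpu_time_ns(instr_per_prog, cycles_per_instr, ns_per_cycle):
    """Execution time = instructions/program x cycles/instruction x time/cycle."""
    return instr_per_prog * cycles_per_instr * ns_per_cycle


# Hypothetical design points (illustrative numbers only, not from the lecture).
dense = cpu_time_ns(instr_per_prog=800, cycles_per_instr=4.0, ns_per_cycle=2.0)
simple = cpu_time_ns(instr_per_prog=1000, cycles_per_instr=1.5, ns_per_cycle=2.0)
print(dense, simple)  # 6400.0 3000.0
```

With these assumed numbers the simpler ISA executes more instructions yet finishes sooner, because the lower cycles-per-instruction term dominates.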

Processor Architecture
[Figure: processor block diagram with the labels From memory, PI, Controller, Control Signals, Datapath, Status Signals, PO, To memory.]

Instruction Format
Fields: Op-code, Operands
Addressing:
• Register Specification
• Effective Address
• Implicit Reference

Micro-coded Implementation
[Figure: block diagram. Controller: Clock-Phase Generator, Control Store, Bus Controller, Instruction Decoder, State Sequencer, Control Word Decoder. Datapath: Program Counter, Registers R0, R1, ..., Rn, Shifter, ALU.]

Random Logic Implementation
[Figure: block diagram. Controller: Clock-Phase Generator, Bus Controller, Random Logic. Datapath: Program Counter, Registers R0, R1, ..., Rn, Shifter, ALU.]

Instruction ADD R1, D2(B2)
Format: op-code ‘5A’ in bits 0-7, R1 in bits 8-11, B2 in bits 12-15, D2 in bits 16-31
• The second operand is added to the first
• The sum is placed in the first operand location
• The operands and the sum are treated as 16-bit signed binary integers
• The first operand is in the register specified by the R1 field
• The second operand is in memory; its address is calculated by adding the displacement specified by the D2 field to the content of the base register specified by the B2 field
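The field layout and the address calculation above can be sketched in a few lines. This is an illustration only: it assumes bit 0 is the most significant bit of a 32-bit word, that D2 is an unsigned displacement, and that memory can be modelled as a dictionary keyed by address. The helper names and the sample register and memory contents are hypothetical.

```python
def decode_add(word):
    """Split a 32-bit instruction word into the fields shown on the slide.

    Assumes bit 0 is the most significant bit, so the op-code occupies
    bits 0-7, R1 bits 8-11, B2 bits 12-15 and D2 bits 16-31.
    """
    opcode = (word >> 24) & 0xFF
    r1 = (word >> 20) & 0xF
    b2 = (word >> 16) & 0xF
    d2 = word & 0xFFFF
    return opcode, r1, b2, d2


def to_signed16(value):
    """Interpret the low 16 bits of value as a signed two's-complement integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def execute_add(word, regs, memory):
    """Add the memory operand at D2 + regs[B2] into register R1."""
    opcode, r1, b2, d2 = decode_add(word)
    assert opcode == 0x5A, "not the ADD op-code from the slide"
    address = regs[b2] + d2  # effective address = base + displacement
    regs[r1] = to_signed16(regs[r1] + memory[address])
    return address


regs = [0] * 16
regs[1], regs[2] = 7, 0x100  # hypothetical register contents
memory = {0x110: 5}          # hypothetical memory, a dict keyed by address
addr = execute_add(0x5A120010, regs, memory)
print(hex(addr), regs[1])  # 0x110 12
```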

Execution Steps for ADD instruction
1. Fetch the first half word
2. Find ADD control word sequence
3. Fetch the remaining instruction word
4. Calculate the operand address
5. Fetch the operand
6. Add
7. Store the result

Microcoded Processor Design
[Figure: block diagram. Control side: Clock-Phase Generator, Reset & Power-On Logic, Next State, Control Store, Interrupt Logic, Bus Controller, IR Decoder, Branch Control unit, Encoded Control Word Fields, Instruction Prefetch Register, Control Word Decoder, Decoded Datapath Control. Datapath: Address Out Reg., PC, R0, R1, ..., Rn, Shifter, Data Reg., ALU, Internal A Bus, Internal B Bus.]

RISC ISA: DLX Example
• Simple instructions, all 32 bits wide
• Arithmetic, Control Flow, Load/Store
• Load/Store Architecture
• Very structured, no unnecessary baggage
• Only three instruction formats:
  – R: op, rs1, rs2, rd, funct
  – I: op, rs1, rd, 16-bit address
  – J: op, 26-bit address
• Rely on compiler to achieve performance


Single-Cycle/Multi-Cycle Datapath
Instruction classes: Load, Store, Arithmetic and Logical, Branch/Jump
Stage times:
  – Instr. fetch (IF): 2 ns
  – Instr. decode, also reg. file read (ID): 1 ns
  – Execution, ALU operation (EX): 2 ns
  – Data access (MEM): 2 ns
  – Write back, reg. file write (WB): 1 ns
Total time: 8 ns

ILP: Instruction Level Parallelism
• Single-cycle and multi-cycle datapaths execute one instruction at a time.
• How can we get better performance?
• Answer: execute multiple instructions at a time:
  – Pipelining: enhance a multi-cycle datapath to fetch one instruction every cycle.
  – Parallelism: fetch multiple instructions every cycle.

Automobile Team Assembly
• 1 car assembled every four hours
• 6 cars per day
• 180 cars per month
• 2,160 cars per year

Automobile Assembly Line
• Task 1: Mechanical, 1 hour
• Task 2: Electrical, 1 hour
• Task 3: Painting, 1 hour
• Task 4: Testing, 1 hour
First car assembled in 4 hours (pipeline latency), thereafter 1 car per hour
• 21 cars on first day, thereafter 24 cars per day
• 717 cars per month
• 8,637 cars per year
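The assembly-line counts above follow from "first car after 4 hours, then one per hour". The sketch below reproduces them under the assumption (inferred from the figures, not stated on the slide) that a month is 30 days and a year is 360 days; the function names are illustrative.

```python
HOURS_PER_DAY = 24


def team_cars(hours, hours_per_car=4):
    """Cars finished by a single team that builds one car at a time."""
    return hours // hours_per_car


def line_cars(hours, stages=4):
    """Cars finished by an assembly line with one-hour stages."""
    return max(0, hours - stages + 1)


day = HOURS_PER_DAY
month = 30 * HOURS_PER_DAY   # assumed 30-day month
year = 360 * HOURS_PER_DAY   # assumed 360-day year

print(team_cars(day), team_cars(month))                   # 6 180
print(line_cars(day), line_cars(month), line_cars(year))  # 21 717 8637
```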

Throughput: Team Assembly
[Figure: timeline. The red car passes through Mechanical, Electrical, Painting and Testing from "red car started" to "red car completed"; the blue car starts only after the red car is completed.]
Time of assembling one car = n hours, where n is the number of nearly equal subtasks, each requiring 1 unit of time
Throughput = 1/n cars per unit time

Throughput: Assembly Line
[Figure: timeline. Cars 1 to 4 each pass through Mechanical, Electrical, Painting and Testing, each car starting one time unit after the previous one; car 1 completes first and car 2 completes one time unit later.]
Time to complete first car = n time units (latency)
Cars completed in time T = T – n + 1
\[ \text{Throughput} = \frac{T - n + 1}{T} = 1 - \frac{n - 1}{T} \ \text{cars per unit time} \]
\[ \frac{\text{Throughput (assembly line)}}{\text{Throughput (team assembly)}} = \frac{1 - (n - 1)/T}{1/n} = n - \frac{n(n - 1)}{T} \to n \ \text{as } T \to \infty \]

Some Features of Assembly Line
[Figure: four-stage assembly line (Task 1 Mechanical, Task 2 Electrical, Task 3 Painting, Task 4 Testing, 1 hour each) with the annotations below.]
• Electrical parts delivered (JIT)
• Defect found
• Stall assembly line to fix the cause of defect
• 3 cars in the assembly line are suspects, to be removed (flush pipeline)

Pipelining in a Computer
• Divide datapath into nearly equal tasks, to be performed serially and requiring non-overlapping resources.
• Insert registers at task boundaries in the datapath; registers pass the output data from one task as input data to the next task.
• Synchronize tasks with a clock having a cycle time that just exceeds the time required by the longest task.
• Break each instruction down into a fixed number of tasks so that instructions can be executed in a staggered fashion.

Ideal Pipelining
• Bandwidth increases linearly with pipeline depth
• Latency increases by latch delays

Pipelining Idealisms
• Uniform subcomputations
  – Can pipeline into stages with equal delay
  – Balance pipeline stages
• Identical computations
  – Can fill pipeline with identical work
  – Unify instruction types
• Independent computations
  – No relationships between work units
• Are these practical?
  – No, but can get close enough to get significant speedup


Execution Time: Single-Cycle
[Figure: timing diagram, time axis 0 to 16 ns and beyond. lw $1, 100($0), lw $2, 200($0) and lw $3, 300($0) execute one after another, each occupying IF, ID, EX, MEM and WB.]
Clock cycle time = 8 ns
Total time for executing three lw instructions = 24 ns

Pipelined Datapath
Instruction classes: Load, Store, Arithmetic and Logical, Branch/Jump
Stage times:
  – Instr. fetch (IF): 2 ns
  – Instr. decode, also reg. file read (ID): 1 ns
  – Execution, ALU operation (EX): 2 ns
  – Data access (MEM): 2 ns
  – Write back, reg. file write (WB): 1 ns
Total time: 10 ns

Execution Time: Pipeline
[Figure: timing diagram, time axis 0 to 16 ns and beyond. lw $1, 100($0), lw $2, 200($0) and lw $3, 300($0) each pass through IF, ID, EX, MEM and RW, with each instruction starting 2 ns after the previous one.]
Clock cycle time = 2 ns, four times faster than single-cycle clock
Total time for executing three lw instructions = 14 ns
Performance ratio = Single-cycle time / Pipeline time = 24 / 14 = 1.7

Pipeline Performance
Clock cycle time = 2 ns
1,003 lw instructions:
  Total time for executing 1,003 lw instructions = 2,014 ns
  Performance ratio = Single-cycle time / Pipeline time = 8,024 / 2,014 = 3.98
10,003 lw instructions:
  Performance ratio = 80,024 / 20,014 = 3.998 → clock-cycle ratio (4)
Pipeline performance approaches the clock-cycle ratio for long programs.
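The totals on this slide can be reproduced with a small calculation: in a five-stage pipeline the first instruction finishes after five cycles and every later instruction adds one more cycle. The sketch below assumes no stalls and uses the 8 ns and 2 ns clock periods from the lecture; the function names are illustrative.

```python
def single_cycle_time(n_instr, cycle_ns=8):
    """Total time when every instruction takes one long clock cycle."""
    return n_instr * cycle_ns


def pipeline_time(n_instr, stages=5, cycle_ns=2):
    """The first instruction needs `stages` cycles; each later one adds a cycle."""
    return (stages + n_instr - 1) * cycle_ns


for n in (3, 1003, 10003):
    ratio = single_cycle_time(n) / pipeline_time(n)
    print(n, single_cycle_time(n), pipeline_time(n), round(ratio, 3))
# 3 24 14 1.714
# 1003 8024 2014 3.984
# 10003 80024 20014 3.998
```

As the instruction count grows, the four fill cycles matter less and the ratio tends to the clock-cycle ratio of 4.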

Pipelining of RISC Instructions
Steps: Fetch Instruction, Examine Opcode, Fetch Operands, Perform Operation, Store Result
• IF: Instruction Fetch
• ID: Instruction Decode and Fetch operands
• EX: Execute
• MEM: Memory Operation
• WB: Write Back to Reg file
Although an instruction takes five clock cycles, one instruction is completed every cycle.

Five-Cycle Pipeline
[Figure: five-cycle pipeline. CC 1: IM; CC 2: ID, reg. file read; CC 3: ALU; CC 4: DM; CC 5: reg. file write. Pipeline registers IF/ID, ID/EX, EX/MEM and MEM/WB separate the stages.]

Executing a Program
Consider a five-instruction segment:
    lw   $10, 20($1)
    sub  $11, $2, $3
    add  $12, $3, $4
    lw   $13, 24($1)
    add  $14, $5, $6

Program Execution
[Figure: pipeline diagram, program instructions versus time (CC 1 to CC 5). lw $10, 20($1); sub $11, $2, $3; add $12, $3, $4; lw $13, 24($1); add $14, $5, $6 enter the pipeline in successive cycles, each passing through IM, IF/ID, ID with reg. file read, ID/EX, ALU, EX/MEM, DM, MEM/WB and reg. file write.]

Advantages of Pipeline
• After the fifth cycle (CC 5), one instruction is completed each cycle; CPI ≈ 1, neglecting the initial pipeline latency of 5 cycles.
  – Pipeline latency is defined as the number of stages in the pipeline, or
  – The number of clock cycles after which the first instruction is completed.
• The clock cycle time is about four times shorter than that of the single-cycle datapath and about the same as that of the multicycle datapath. For the multicycle datapath, CPI = 3.…
• So, pipelined execution is faster, but...

"Science is always wrong. It never solves a problem without creating ten more."
George Bernard Shaw

Pipeline Hazards
• Definition: a hazard in a pipeline is a situation in which the next instruction cannot complete execution one clock cycle after completion of the present instruction.
• Three types of hazards:
  – Structural hazard (resource conflict)
  – Data hazard
  – Control hazard

Structural Hazard
• Two instructions cannot execute at the same time due to a resource conflict.
• Example: consider a computer with a common data and instruction memory. The fourth cycle of a lw instruction requires memory access (memory read), and at the same time the first cycle of the fourth instruction requires instruction fetch (memory read). This causes a memory resource conflict.
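The conflict in this example can be found mechanically. The sketch below assumes a five-stage pipeline with no stalls, one shared memory, and that only lw and sw access data memory; the function names are hypothetical, and the four instructions are the first four of the lecture's program segment.

```python
def uses_data_memory(instr):
    """Assume only loads and stores touch data memory."""
    return instr.split()[0] in ("lw", "sw")


def memory_conflicts(program):
    """Return (cycle, i, j) where instruction i is in MEM while j is in IF.

    Instruction k (0-based) enters IF in cycle k + 1 when there are no
    stalls, so it reaches MEM in cycle k + 4. With one shared memory, a data
    access in that cycle collides with the fetch of instruction k + 3.
    """
    conflicts = []
    for i, instr in enumerate(program):
        j = i + 3
        if uses_data_memory(instr) and j < len(program):
            conflicts.append((i + 4, i, j))
    return conflicts


program = [
    "lw $10, 20($1)",
    "sub $11, $2, $3",
    "add $12, $3, $4",
    "lw $13, 24($1)",
]
print(memory_conflicts(program))  # [(4, 0, 3)]
```

The single result says that in cycle 4 the first lw reads data while the fourth instruction is being fetched, which matches the slide's description.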

Example of Structural Hazard
[Figure: pipeline diagram, program instructions versus time (CC 1 to CC 5), with a common data and instruction memory (IM/DM). Instructions: lw $10, 20($1); sub $11, $2, $3; add $12, $3, $4; lw $13, 24($1). The shared memory is needed by two instructions in the same cycle.]

Possible Remedies for Structural Hazards
• Provide duplicate hardware resources in the datapath.
• The control unit or compiler can insert delays (no-op cycles) between instructions. This is known as a pipeline stall or bubble.

Stall (Bubble) for Structural Hazard
[Figure: pipeline diagram, program instructions versus time (CC 1 to CC 5), with a common memory (IM/DM). Instructions: lw $10, 20($1); sub $11, $2, $3; add $12, $3, $4; stall (bubble); lw $13, 24($1).]

Data Hazard
• Data hazard means that an instruction cannot be completed because the needed data, to be generated by another instruction in the pipeline, is not available.
• Example: consider two instructions:
    add  $s0, $t0, $t1
    sub  $t2, $s0, $t3    # needs $s0

Example of Data Hazard
[Figure: pipeline diagram, program instructions versus time (CC 1 to CC 5). The add writes s0 in CC 5; the sub reads s0 and t3 in CC 3.]
• We need to read s0 from the reg file in cycle 3
• But s0 will not be written to the reg file until cycle 5
• However, s0 will only be used in cycle 4
• And it is available at the end of cycle 3

Forwarding or Bypassing
• Output of a resource used by an instruction is forwarded to the input of some resource being used by another instruction.
• Forwarding can eliminate some, but not all, data hazards.

Forwarding for Data Hazard
[Figure: pipeline diagram, program instructions versus time (CC 1 to CC 5). The add writes s0 in CC 5; the sub reads s0 and t3 in CC 3; a forwarding path carries the add's ALU result to the sub's ALU input.]

Forwarding Alone May Not Work
[Figure: pipeline diagram, program instructions versus time (CC 1 to CC 5), for lw $s0, 20($s1) followed by sub $t2, $s0, $t3. The lw writes s0 in CC 5; the sub reads s0 and t3 in CC 3.]
• Data needed by sub (data hazard)
• Data available from memory only at the end of cycle 4

Use Bubble and Forwarding
[Figure: pipeline diagram, program instructions versus time (CC 1 to CC 5). lw $s0, 20($s1) writes s0 in CC 5; a stall (bubble) delays sub $t2, $s0, $t3 by one cycle, and forwarding then supplies the loaded value to the sub.]

Resolving Hazards
• Hazards are resolved by hazard detection and forwarding units.
• Compiler’s understanding of how these units work can improve performance.

Avoiding Stall by Code Reorder
C code:
    A = B + E;
    C = B + F;
MIPS code:
    lw   $t1, 0($t0)       # $t1 written
    lw   $t2, 4($t0)       # $t2 written
    add  $t3, $t1, $t2     # $t1, $t2 needed
    sw   $t3, 12($t0)
    lw   $t4, 8($t0)       # $t4 written
    add  $t5, $t1, $t4     # $t4 needed
    sw   $t5, 16($t0)

Reordered Code
C code:
    A = B + E;
    C = B + F;
MIPS code:
    lw   $t1, 0($t0)
    lw   $t2, 4($t0)
    lw   $t4, 8($t0)
    add  $t3, $t1, $t2     # no hazard
    sw   $t3, 12($t0)
    add  $t5, $t1, $t4     # no hazard
    sw   $t5, 16($t0)
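The benefit of reordering can be checked by counting load-use stalls in both versions. The sketch below is a simplified model, assuming a five-stage pipeline with full forwarding in which the only remaining stall is a lw whose result is needed by the very next instruction. The parser handles only the lw, sw and three-register forms used here, and all names are illustrative.

```python
import re


def parse(instr):
    """Return (op, destination register or None, list of source registers)."""
    op, rest = instr.split(None, 1)
    regs = re.findall(r"\$\w+", rest)
    if op == "sw":
        return op, None, regs      # sw reads both of its registers
    return op, regs[0], regs[1:]   # lw and ALU ops write their first register


def load_use_stalls(program):
    """Count bubbles when a lw result is needed by the very next instruction."""
    stalls = 0
    for prev, curr in zip(program, program[1:]):
        prev_op, prev_dest, _ = parse(prev)
        _, _, curr_srcs = parse(curr)
        if prev_op == "lw" and prev_dest in curr_srcs:
            stalls += 1
    return stalls


original = [
    "lw $t1, 0($t0)", "lw $t2, 4($t0)", "add $t3, $t1, $t2", "sw $t3, 12($t0)",
    "lw $t4, 8($t0)", "add $t5, $t1, $t4", "sw $t5, 16($t0)",
]
reordered = [
    "lw $t1, 0($t0)", "lw $t2, 4($t0)", "lw $t4, 8($t0)", "add $t3, $t1, $t2",
    "sw $t3, 12($t0)", "add $t5, $t1, $t4", "sw $t5, 16($t0)",
]
print(load_use_stalls(original), load_use_stalls(reordered))  # 2 0
```

Under these assumptions the original order stalls twice (after the loads of $t2 and $t4), while the reordered sequence has no load immediately followed by its consumer.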

Control Hazard
• Instruction to be fetched is not known!
• Example: the instruction being executed is a branch, which will determine the next instruction:
        add  $4, $5, $6
        beq  $1, $2, 40
        next instruction
        . . .
    40  and  $7, $8, $9

Stall on Branch
[Figure: pipeline diagram, program instructions versus time (CC 1 to CC 5). add $4, $5, $6 and beq $1, $2, 40 enter the pipeline in successive cycles; a stall (bubble) follows the beq before either the next instruction or and $7, $8, $9 is fetched.]

Why Only One Stall?
• Extra hardware in ID phase:
  – Additional ALU to compute branch address
  – Comparator to generate zero signal
  – Hazard detection unit writes the branch address in PC

Ways to Handle Branch
• Stall or bubble
• Delayed branch
• Branch prediction:
  – Heuristics
    • Next instruction
    • Prediction based on statistics (dynamic)
    • Hardware decision (dynamic)
  – Prediction error: pipeline flush
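The cost of these options can be compared with a simple average-CPI estimate. The sketch below is not from the lecture: the function name, the 20% branch fraction and the 10% misprediction rate are hypothetical, and it assumes a one-cycle penalty for every stalled or mispredicted branch.

```python
def effective_cpi(branch_fraction, penalty_cycles, mispredict_rate=1.0, base_cpi=1.0):
    """Average CPI when only stalled or mispredicted branches cost extra cycles."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


# Always stalling is modelled as mispredict_rate = 1.0 (every branch pays).
always_stall = effective_cpi(branch_fraction=0.2, penalty_cycles=1)
# With prediction, only the mispredicted branches pay the flush penalty.
with_prediction = effective_cpi(branch_fraction=0.2, penalty_cycles=1,
                                mispredict_rate=0.1)
print(always_stall, with_prediction)
```

With these assumed numbers, stalling on every branch adds about 0.2 cycles per instruction, while a predictor that is right 90% of the time adds about 0.02.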

Delayed Branch Example
• Stall on branch:
          add  $4, $5, $6
          beq  $1, $2, skip
          next instruction
          . . .
    skip  or   $7, $8, $9
• Delayed branch:
          beq  $1, $2, skip
          add  $4, $5, $6      # instruction executed irrespective of branch decision
          next instruction
          . . .
    skip  or   $7, $8, $9

Delayed Branch
[Figure: pipeline diagram, program instructions versus time (CC 1 to CC 5). beq $1, $2, skip is followed by add $4, $5, $6 with no bubble; the third instruction is either the next instruction or skip: or $7, $8, $9.]

Summary: Hazards
• Structural hazards
  – Cause: resource conflict
  – Remedies: (i) duplicate hardware resources, (ii) stall (bubble)
• Data hazards
  – Cause: data unavailability
  – Remedies: (i) forwarding, (ii) stall (bubble), (iii) code reordering
• Control hazards
  – Cause: out-of-sequence execution (branch or jump)
  – Remedies: (i) stall (bubble), (ii) branch prediction/pipeline flush, (iii) delayed branch/pipeline flush

Thank You