ELEC 5200-001/6200-001 Computer Architecture and Design


ELEC 5200-001/6200-001 Computer Architecture and Design, Spring 2016
Pipeline Control and Performance (Chapter 6)
Vishwani D. Agrawal, James J. Danaher Professor
Department of Electrical and Computer Engineering, Auburn University, Auburn, AL 36849
http://www.eng.auburn.edu/~vagrawal
vagrawal@eng.auburn.edu
Spr 2016, Mar 9 . . . ELEC 5200-001/6200-001 Lecture 7

Pipelined Datapath (without Jump)
[Figure: five-stage pipelined datapath. PC and instruction memory feed IF/ID; the register file and sign extension feed ID/EX; the ALU and the branch-address adder (with shift left 2) feed EX/MEM; data memory feeds MEM/WB. Instruction bits 26-31 are the opcode, bits 21-25 and 16-20 select the registers to read, bits 0-15 go to sign extension, and the write register comes from bits 16-20 for I-type lw or bits 11-15 for R-type.]

Mem. and Reg. File Need Controls
[Figure: the pipelined datapath with control inputs added: MemWrite and MemRead on the data memory, RegWrite on the register file.]

Multiplexers Need Controls
[Figure: the pipelined datapath with multiplexer controls added: PCSrc (formed from Branch and the ALU zero output), ALUSrc, RegDst and MemtoReg, alongside RegWrite, MemRead and MemWrite.]

ALU Needs a Control
[Figure: the pipelined datapath with an ALU control block (ALU cont.) added in the EX stage; its inputs are ALUOp and instruction bits 0-5.]

Compare with Single-Cycle
Control signals are the same as those needed for a single-cycle datapath.
Control signals are generated from the opcode in the ID (instruction decode) cycle and then distributed to the other cycles.
Let us reexamine the implementation of the single-cycle control (slides 19-21 of Lecture 5).

Hardwired CU: Single-Cycle
Implemented by combinational logic.
[Figure: the 6-bit opcode enters the control logic, which sends control signals to the datapath and a 2-bit ALUOp to the ALU control; the ALU control combines ALUOp with the 6-bit funct. code and sends a 3-bit code to the ALU.]

Single-cycle datapath
[Figure: single-cycle datapath with PC, instruction memory, register file, sign extension, ALU, ALU control (ALU Cont., bits 0-5) and data memory. The CONTROL block takes the opcode (bits 26-31) and produces RegDst, Jump, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc and RegWrite; the jump address is formed from bits 0-25 shifted left 2.]

Single-Cycle Control Logic
[Table: one row per instruction type (R, lw, sw, beq, J). Inputs: the opcode, instruction bits 31-26 (for example lw = 100011, beq = 000100). Outputs: RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp1, ALUOp0 and Jump. X denotes a don't-care entry.]
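
The control logic table can be read as a lookup from opcode to output values. The following is a minimal sketch, not the lecture's implementation: it assumes the standard MIPS opcodes and the usual textbook output values (the J row in particular is an assumption), and the names SIGNALS, CONTROL and decode are hypothetical.

```python
# Don't-care (X) outputs are represented as None.
X = None

SIGNALS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite", "MemRead",
           "MemWrite", "Branch", "ALUOp1", "ALUOp0", "Jump")

# opcode (bits 31-26) -> (instruction type, output values in SIGNALS order).
# Assumed textbook values, not read off the slide.
CONTROL = {
    0b000000: ("R",   (1, 0, 0, 1, 0, 0, 0, 1, 0, 0)),
    0b100011: ("lw",  (0, 1, 1, 1, 1, 0, 0, 0, 0, 0)),
    0b101011: ("sw",  (X, 1, X, 0, 0, 1, 0, 0, 0, 0)),
    0b000100: ("beq", (X, 0, X, 0, 0, 0, 1, 0, 1, 0)),
    0b000010: ("J",   (X, X, X, 0, 0, 0, X, X, X, 1)),  # assumed row
}

def decode(instruction):
    """Return (type, {signal: value}) for a 32-bit instruction word."""
    opcode = (instruction >> 26) & 0x3F
    kind, values = CONTROL[opcode]
    return kind, dict(zip(SIGNALS, values))
```

For example, decode(0x8C220014) handles the word for lw $2, 20($1) and returns the lw row, with MemRead and RegWrite set to 1.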

Single-Cycle Control Circuit
[Figure: gate-level control circuit. Inputs Op5-Op0 are decoded into the instruction types R, lw, sw, beq and J, which drive the outputs RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp1, ALUOp0 and Jump.]

ALU Control Logic
              Inputs                                   Outputs to ALU
Instr. type   From CU           Funct. code from IR    3-bit code   Operation
              ALUOp1  ALUOp0    (bits 0-5) F5...F0
lw, sw        0       0         XXXXXX                 010          Add
B             0       1         XXXXXX                 110          Subtract
R             1       0         XX0000                 010          Add
R             1       0         XX0010                 110          Subtract
R             1       0         XX0100                 000          AND
R             1       0         XX0101                 001          OR
R             1       0         XX1010                 111          slt

ALU Control
[Figure: the ALU control block takes ALUOp1 and ALUOp0 from the control circuit and funct. bits F3-F0, and sends a 3-bit operation select to the ALU, which produces result, zero and overflow.]
Operation select   ALU function
000                AND
001                OR
010                Add
110                Subtract
111                Set on less than
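
The ALU control logic can be sketched as a small function. This is an illustrative sketch only: the 3-bit codes come from the slides, the funct patterns are the standard MIPS ones (assumed here), and the function name alu_control is hypothetical.

```python
def alu_control(alu_op, funct):
    """Map the 2-bit ALUOp and 6-bit funct code to the 3-bit operation select."""
    if alu_op == 0b00:       # lw, sw: address calculation
        return 0b010         # Add
    if alu_op == 0b01:       # beq: compare by subtracting
        return 0b110         # Subtract
    # R-type: only F3-F0 matter (F5 and F4 are don't-cares).
    r_type = {
        0b0000: 0b010,       # add
        0b0010: 0b110,       # sub
        0b0100: 0b000,       # and
        0b0101: 0b001,       # or
        0b1010: 0b111,       # slt
    }
    return r_type[funct & 0xF]
```

With these assumptions, alu_control(0b10, 0b100010) selects Subtract (110) for an R-type sub.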

Returning to Pipelined Control
Opcode input to control is supplied by the pipeline register IF/ID in the ID (instruction decode) cycle.
Nine control signals are generated in the ID cycle, but none is used there. They are saved in the pipeline register ID/EX.
ALUSrc, RegDst and ALUOp (2 bits) are used in the EX (execute) cycle. The remaining 5 control signals are saved in the pipeline register EX/MEM.
Branch, MemWrite and MemRead are used in the MEM (memory access) cycle. The remaining 2 control signals are saved in the pipeline register MEM/WB.
MemtoReg and RegWrite are used in the WB (write back) cycle.
Pipelined control is shown without Jump.
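
The way the nine control signals thin out as they move down the pipeline can be sketched as follows. This is a minimal illustration, not the hardware: the helper advance is hypothetical, and the lw control values are the usual textbook ones, assumed here.

```python
EX_SIGNALS = ("ALUSrc", "RegDst", "ALUOp1", "ALUOp0")
MEM_SIGNALS = ("Branch", "MemWrite", "MemRead")
WB_SIGNALS = ("MemtoReg", "RegWrite")

def advance(signals, used):
    """Drop the signals consumed in this stage; the rest go to the next register."""
    return {name: value for name, value in signals.items() if name not in used}

# Control word for lw as generated in ID and saved in ID/EX (assumed values).
id_ex = {"ALUSrc": 1, "RegDst": 0, "ALUOp1": 0, "ALUOp0": 0,
         "Branch": 0, "MemWrite": 0, "MemRead": 1,
         "MemtoReg": 1, "RegWrite": 1}

ex_mem = advance(id_ex, EX_SIGNALS)    # 5 signals remain after EX
mem_wb = advance(ex_mem, MEM_SIGNALS)  # 2 signals remain after MEM
```

Each pipeline register only needs to be wide enough for the signals still to be used, which is why the stored control shrinks from 9 to 5 to 2 bits.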

Placing Control in Pipelined Datapath
[Figure: the pipelined datapath with the CONTROL block in the ID stage. Its outputs pass through ID/EX (ALUSrc, ALUOp and RegDst are used in EX), EX/MEM (Branch, MemRead and MemWrite are used in MEM, where PCSrc is formed) and MEM/WB (MemtoReg and RegWrite are used in WB).]

Highlighted Pipelined Control
[Figure: the same pipelined datapath with the control paths highlighted.]

Single-Cycle Performance
Assume
– 200 ps for memory access
– 100 ps for ALU operation
– 50 ps for register file read or write
Cycle time is set according to the longest instruction:
lw ≡ IF + ID/RegRead + ALU + MEM + RegWrite = 200 + 50 + 100 + 200 + 50 = 600 ps
Av. instruction execution time = clock cycle time = 600 ps

Multicycle Performance
Consider SPECINT2000* instruction mix:
25% lw           5 cycles
10% sw           4 cycles
11% branch       3 cycles
 2% jump         3 cycles
52% ALU instr.   4 cycles
Av. CPI = 0.25×5 + 0.10×4 + 0.11×3 + 0.02×3 + 0.52×4 = 4.12
Clock cycle time determined from longest operation (memory access) = 200 ps
Av. instruction execution time = 4.12 × 200 = 824 ps
*Set of benchmark programs used for performance evaluation, to be discussed in a later lecture.

Pipeline Performance
Neglect initial latency (reasonable for long programs).
One instruction is completed every clock cycle unless delayed by a hazard.
Average CPI:
lw       1.5 cycles    (2 cycles in 50% of cases due to hazard)
sw       1 cycle
ALU      1 cycle
branch   1.25 cycles   (2 cycles in 25% of cases due to hazard)
jump     2 cycles
For SPECINT2000, Av. CPI = 0.25×1.5 + 0.10×1 + 0.11×1.25 + 0.02×2.0 + 0.52×1 = 1.17
Clock cycle time (longest operation: memory access) = 200 ps
Av. instruction execution time = 1.17 × 200 = 234 ps

Comparing Alternatives
Type of datapath   Clock cycle   Average   Av. instruction
and control        time          CPI       execution time
Single cycle       600 ps        1.00      600 ps
Multicycle         200 ps        4.12      824 ps
Pipelined          200 ps        1.17      234 ps
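
The three figures in the comparison can be recomputed from the stated delays, instruction mix and per-instruction cycle counts. This is a small sketch of that arithmetic; the variable and function names are hypothetical.

```python
# SPECINT2000 instruction mix from the slides.
MIX = {"lw": 0.25, "sw": 0.10, "branch": 0.11, "jump": 0.02, "alu": 0.52}

def average_cpi(cycles):
    """Weighted average of cycles per instruction over the mix."""
    return sum(MIX[kind] * cycles[kind] for kind in MIX)

# Single cycle: the clock must fit lw = IF + RegRead + ALU + MEM + RegWrite.
single_cycle_ps = 200 + 50 + 100 + 200 + 50

CLOCK_PS = 200  # multicycle and pipelined clock: one memory access

multicycle_cpi = average_cpi(
    {"lw": 5, "sw": 4, "branch": 3, "jump": 3, "alu": 4})
pipelined_cpi = average_cpi(
    {"lw": 1.5, "sw": 1, "branch": 1.25, "jump": 2, "alu": 1})

multicycle_ps = multicycle_cpi * CLOCK_PS
pipelined_ps = pipelined_cpi * CLOCK_PS
```

The unrounded pipelined CPI is 1.1725, which gives about 234.5 ps; the slides round the CPI to 1.17 first and so report 234 ps.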

Other Controls for Pipeline
Forwarding
Stall
Branch hazard and branch prediction
Instruction flush
Exceptions

Forwarding
Consider a data hazard:
sub $2, $1, $3     # computes result in CC3, writes in $2 in CC5
and $12, $2, $5    # reads $2 in CC3, executes in CC4
[Figure: pipeline diagram over CC1-CC8. sub: IF (IM) in CC1, ID (register file read) in CC2, EX (ALU) in CC3, MEM (DM) in CC4, WB (register write) in CC5. and: IF in CC2, ID in CC3, EX in CC4.]
CC3: ALU saves new data in EX/MEM, to be written to $2 in CC5.
CC3: and reads $2 to ID/EX, but the correct data is in EX/MEM.
CC4: forwarding allows execution of “and” with correct data.

Understanding Forwarding
Let’s ask the following questions:
Q: Why is there a hazard?
A: Source register for the present instruction is the same as the destination register of the previous instruction.
Q: When is the source register data needed?
A: In the execute cycle (CC4).
Q: Is source register data available in CC4?
A: Yes – use forwarding. No – use stall.
Q: Where is the required data in CC4?
A: In the pipeline register EX/MEM as ALU output.

Forwarding Hardware
A forwarding unit is added to the execute (ALU) cycle hardware.
Functions of forwarding unit:
– Hazard detection
– Forward correct data to ALU
Inputs to forwarding unit:
– Source registers of the instruction in EX
– Destination registers of instructions in DM and WB
Outputs of forwarding unit: multiplexer controls to route correct data to the ALU.
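
The decision made by the forwarding unit for one ALU operand can be sketched as below. This is a hedged sketch rather than the lecture's circuit: qualifying the comparison with RegWrite and excluding register $0 are the usual textbook refinements and are assumptions here, and the name forward_select and its return labels are hypothetical.

```python
def forward_select(src, ex_mem_rd, ex_mem_regwrite, mem_wb_rd, mem_wb_regwrite):
    """Return the multiplexer choice for one ALU source register.

    "EX/MEM": use the ALU output of the instruction now in the DM stage
    "MEM/WB": use the data of the instruction now in the WB stage
    "REG":    no hazard, use the value read from the register file
    """
    # The most recent producer (in EX/MEM) takes priority over the older one.
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == src:
        return "EX/MEM"
    if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == src:
        return "MEM/WB"
    return "REG"
```

For sub $2, $1, $3 followed by and $12, $2, $5, the call forward_select(2, 2, True, 0, False) chooses "EX/MEM" for the first operand of and, while forward_select(5, 2, True, 0, False) leaves the second operand on the register file value.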

Recall Register Definitions
R-type instruction (add, sub, and, or, . . . )
opcode | Rs | Rt | Rd | shamt | funct
I-type instruction (beq, lw, sw, addi, . . . )
opcode | Rs | Rt | constant_or_address
J-type instruction (j, jal)
opcode | address
where
Rs is the first source register
Rt is the second source register
Rd is the destination register

Forwarding Implemented
[Figure: the pipelined datapath with a forwarding unit in the EX stage. Its inputs are Rs (bits 21-25) and Rt (bits 16-20) from ID/EX and the destination registers Rd from EX/MEM and MEM/WB; its outputs control multiplexers (MUX) at the two ALU inputs.]

Stall
Delay next instruction by sending nop through pipeline.
Necessary when hazard is not resolved by forwarding.
lw $2, 20($1)
and $4, $2, $5
[Figure: pipeline diagram over CC1-CC6. lw: IM in CC1, ID and register file read in CC2, ALU in CC3, DM in CC4, register file write in CC5. and: IM in CC2, ID and register file read in CC3, ALU in CC4, DM in CC5, register file write in CC6.]
CC4: new data in MEM/WB, to be written to $2.
CC4: execution of and is impossible; correct data unavailable until end of CC4.

Detecting Hazard Requiring Stall
Consider the instruction in IF/ID being decoded:
If
– the previous instruction (lw) activated MemRead, and
– the instruction being decoded has a source register (Rs or Rt) that is the same as the destination register (Rt for lw) of the previous instruction,
Then, stall the pipeline:
– Force all control outputs to 0
– Prevent PC from changing
– Prevent IF/ID from changing
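
The stall condition and its three actions can be sketched as two small functions. This is an illustrative sketch under the conditions listed on the slide; the function names and the key zero_controls are hypothetical, while PCWrite and IF/IDWrite follow the signal names in the stall implementation figure.

```python
def must_stall(id_ex_memread, id_ex_rt, if_id_rs, if_id_rt):
    """True when the instruction in IF/ID reads the register a lw in EX will load."""
    return bool(id_ex_memread) and id_ex_rt in (if_id_rs, if_id_rt)

def stall_outputs(stall):
    """Actions of the hazard detection unit for the current cycle."""
    return {
        "PCWrite": not stall,     # 0 prevents PC from changing
        "IF/IDWrite": not stall,  # 0 prevents IF/ID from changing
        "zero_controls": stall,   # force all control outputs to 0 (bubble)
    }
```

For lw $2, 20($1) followed by and $4, $2, $5, must_stall(True, 2, 2, 5) is True, so PC and IF/ID are held and a bubble enters ID/EX.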

Stall Implementation
[Figure: the forwarding datapath with a hazard detection unit added in the ID stage. Its inputs are MemRead and Rt from ID/EX and Rs and Rt from IF/ID; its outputs are PCWrite, IF/IDWrite and the select of a MUX that replaces the control outputs with 0.]

Stall
Execution with stall and forwarding:
lw $2, 20($1)
and $4, $2, $5
next
[Figure: pipeline diagram over CC1-CC7 showing lw, the stalled and (decoded twice), the bubble (nop) and next (fetched twice).]
State of IF/ID is frozen in CC3.
next is fetched twice since PC was frozen.
CC4: new data in MEM/WB, to be written to $2.

Branch Hazard
Consider the heuristic “branch not taken”: continue fetching instructions in sequence following the branch instruction.
If the branch is taken (indicated by the zero output of the ALU):
– Control generates the Branch signal in the ID cycle.
– Branch activates the PCSource signal (labeled PCSrc in the datapath figures) in the MEM cycle to load PC with the new branch address.
– Three instructions in the pipeline must be flushed if the branch is taken. Can this penalty be reduced?

Branch Hazard
[Figure: the pipelined datapath with control, with a beq instruction shown in the pipeline; Branch and the ALU zero output determine PCSrc in the MEM stage.]

Branch Not Taken
Instruction sequence: Branch on condition to Z, A, B, C, D, . . . , Z
          cycle b    cycle b+1   cycle b+2   cycle b+3          cycle b+4
Branch    fetched    decoded     decision    PC keeps D
                                             (br. not taken)
A                    fetched     decoded     executed           continues
B                                fetched     decoded            executed
C                                            fetched            decoded
D                                                               fetched

Branch Taken
Instruction sequence: Branch on condition to Z, A, B, C, D, . . . , Z
          cycle b    cycle b+1   cycle b+2   cycle b+3          cycle b+4
Branch    fetched    decoded     decision    PC gets Z
                                             (br. taken)
A                    fetched     decoded     executed           Nop
B                                fetched     decoded            Nop
C                                            fetched            Nop
Z                                                               fetched
Three-cycle penalty: three instructions are flushed if the branch is taken.

Branch Penalty Reduction
[Figure: the pipelined datapath with control, modified to reduce the branch penalty, with a beq instruction shown in the pipeline.]

Branch Taken
Instruction sequence: Branch to Z, A, B, C, D, . . . , Z
          cycle b    cycle b+1     cycle b+2   cycle b+3   cycle b+4
Branch    fetched    decision,
                     PC gets Z
A                    fetched       flushed     Nop
Z                                  fetched     decoded     executed
One-cycle penalty: one instruction is flushed if the branch is taken.

Pipeline Flush
If the branch is taken (as indicated by zero), then control does the following:
– Change all control signals to 0, similar to the case of stall for data hazard, i.e., insert a bubble in the pipeline.
– Generate a signal IF.Flush that changes the instruction in the pipeline register IF/ID to 0 (nop).
Penalty of branch hazard is reduced by
– Adding branch detection and address generation hardware in the decode cycle: only one bubble is needed, and next-address generation logic in the decode stage writes PC+4, the branch address, or the jump address into PC.
– Using branch prediction.
– Unrolling loops.

Branch Prediction
Useful for program loops.
A one-bit prediction scheme: a one-bit buffer carries a “history bit” that tells what happened on the last branch instruction.
– History bit = 1, branch was taken
– History bit = 0, branch was not taken
[Figure: two-state diagram. State 1 (predict branch taken) stays in 1 when the branch is taken and moves to 0 when it is not taken; state 0 (predict branch not taken) stays in 0 when the branch is not taken and moves to 1 when it is taken.]

Branch Prediction
[Figure: branch prediction buffer. Low-order bits of PC are used as an index into a table holding the addresses of recent branch instructions, their target addresses and history bit(s); a comparator (=) and prediction logic select the next PC between PC+4 and the stored target address.]

Branch Prediction for a Loop
Execution of instruction d
Flowchart:
a: I = 0
b: I = I + 1
c: X = X + R(I)
d: I - 10 = 0? (N: go to b, Y: go to e)
e: Store X in memory
Execution   Old hist.   Pred. next         Act. next   New hist.   Prediction
seq.        bit         instr.        I    instr.      bit
1           0           e             1    b           1           Bad
2           1           b             2    b           1           Good
3           1           b             3    b           1           Good
4           1           b             4    b           1           Good
5           1           b             5    b           1           Good
6           1           b             6    b           1           Good
7           1           b             7    b           1           Good
8           1           b             8    b           1           Good
9           1           b             9    b           1           Good
10          1           b             10   e           0           Bad
h.bit = 0 branch not taken, h.bit = 1 branch taken.

Prediction Accuracy
One-bit predictor:
– 2 errors out of 10 predictions
– Prediction accuracy = 80%
To improve prediction accuracy, use a two-bit predictor:
– A prediction must be wrong twice before it is changed
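
The 80% figure can be reproduced with a short simulation of the one-bit scheme on the loop branch. This is a minimal sketch; the function name is hypothetical, and "taken" is assumed to mean the branch at d going back to b.

```python
def one_bit_mispredictions(outcomes, history=0):
    """Count wrong predictions; history bit 1 means the last branch was taken."""
    wrong = 0
    for taken in outcomes:
        if bool(history) != taken:   # prediction is simply the history bit
            wrong += 1
        history = int(taken)         # remember what actually happened
    return wrong

# Branch d is taken (back to b) for I = 1..9 and not taken for I = 10.
loop = [True] * 9 + [False]
errors = one_bit_mispredictions(loop, history=0)
accuracy = 1 - errors / len(loop)
```

Starting from history bit 0, the predictor is wrong on the first and the last execution of d, which gives the 2 errors out of 10.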

Two-Bit Prediction Buffer
Implemented as a two-bit counter.
Can improve correct prediction statistics.
[Figure: four-state diagram with states 11 and 10 (predict branch taken) and 01 and 00 (predict branch not taken), connected by “taken” and “not taken” transitions.]

Branch Prediction for a Loop
Execution of instruction 4
Flowchart:
1: I = 0
2: I = I + 1
3: X = X + R(I)
4: I - 10 = 0? (N: go to 2, Y: go to 5)
5: Store X in memory
Execution   Old pred.   Pred. next         Act. next   New pred.   Prediction
seq.        buf         instr.        I    instr.      buf
1           10          2             1    2           11          Good
2           11          2             2    2           11          Good
3           11          2             3    2           11          Good
4           11          2             4    2           11          Good
5           11          2             5    2           11          Good
6           11          2             6    2           11          Good
7           11          2             7    2           11          Good
8           11          2             8    2           11          Good
9           11          2             9    2           11          Good
10          11          2             10   5           10          Bad
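
The table can be checked with a short simulation of a two-bit prediction buffer. This is a sketch under an assumption: the buffer is modeled as a saturating counter (count up on taken, down on not taken), which matches the table rows but may differ from the exact transitions of the state diagram. The function name is hypothetical.

```python
def two_bit_mispredictions(outcomes, counter=0b10):
    """Count wrong predictions of a 2-bit counter; states 11 and 10 predict taken."""
    wrong = 0
    for taken in outcomes:
        predicted_taken = counter >= 0b10
        if predicted_taken != taken:
            wrong += 1
        # Assumed saturating update: up on taken, down on not taken.
        counter = min(counter + 1, 0b11) if taken else max(counter - 1, 0b00)
    return wrong, counter

# Branch 4 is taken (back to 2) for I = 1..9 and not taken for I = 10.
loop = [True] * 9 + [False]
errors, final_state = two_bit_mispredictions(loop, counter=0b10)
```

Starting from state 10, only the final loop exit is mispredicted, and the buffer ends in state 10, so the branch is still predicted taken the next time the loop is entered.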

Exceptions
A typical exception occurs when the ALU produces an overflow signal.
Control asserts the following actions on an exception:
– Change the PC address to 4000 0040 hex. This is the location of the exception routine. This is done by adding an additional input to the PC input multiplexer.
– Overflow is detected in the EX cycle. Similar to data hazard and pipeline flush:
   Set IF/ID to 0 (nop).
   Generate ID.Flush and EX.Flush signals to set all control signals to 0 in the ID/EX and EX/MEM registers. This also prevents the ALU result (presumed contaminated) from being written in the WB cycle.