COMP 206 Computer Architecture and Implementation Montek Singh

COMP 206: Computer Architecture and Implementation
Montek Singh
Wed., Sep 24, 2003
Topic: Pipelining -- Intermediate Concepts (Multicycle Operations; Exceptions)

Outline
• Multi-cycle operations
  - Floating-point operations
  - Structural and data hazards
• Interrupts, faults and exceptions
  - Precise exceptions
  - Complications in pipelines
READING: Appendix A

Pipelining Multicycle Operations
• Assume a five-stage pipeline
• The third stage (execution) has two functional units, E1 and E2
  - An instruction goes through either E1 or E2, but not both
  - E1 and E2 are not pipelined
  - Stage delay of E1 = 2 cycles
  - Stage delay of E2 = 4 cycles
  - No buffering on the inputs of E1 and E2
• Stage delay of every other stage = 1 cycle
• Consider a sequence of five instructions
  - Instructions 1, 3, 5 need E1
  - Instructions 2, 4 need E2

Space-Time Diagram: Multicycle Operations
• Out-of-order completion
  - 3 finishes before 2, and 5 finishes before 4
• Instructions may be delayed after entering the pipeline because of structural hazards
  - Instructions 2 and 4 both want to use the E2 unit at the same time
  - Instruction 4 stalls in the ID unit
  - This causes instruction 5 to stall in the IF unit
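
The space-time behaviour described above can be reproduced with a small scheduling sketch. It makes three assumptions. First, IF accepts a new instruction in the cycle the previous one moves into ID. Second, instructions move from ID into E1/E2 in order. Third, MEM and WB never conflict, which holds for this five-instruction example. The names `schedule` and `Timing` are hypothetical.

```python
from dataclasses import dataclass

# Stage delays from the slide: E1 = 2 cycles, E2 = 4 cycles, all other stages = 1 cycle.
EX_DELAY = {"E1": 2, "E2": 4}


@dataclass
class Timing:
    instr: int   # 1-based instruction number
    if_at: int   # cycle the instruction enters IF
    id_at: int   # cycle it enters ID
    ex_at: int   # cycle it enters E1 or E2
    wb_at: int   # cycle it is in WB (MEM is the cycle before)

    def if_stall(self) -> int:
        return self.id_at - self.if_at - 1  # extra cycles held in IF

    def id_stall(self) -> int:
        return self.ex_at - self.id_at - 1  # extra cycles held in ID


def schedule(units):
    """units[i] is "E1" or "E2" for instruction i + 1."""
    unit_free = {u: 1 for u in EX_DELAY}  # first cycle each non-pipelined unit is free
    if_at, prev_ex_at = 1, 0
    timings = []
    for n, unit in enumerate(units, start=1):
        # ID frees up in the cycle the previous instruction moves into E1/E2.
        id_at = max(if_at + 1, prev_ex_at)
        # No input buffer: wait in ID until the chosen unit can accept a new instruction.
        ex_at = max(id_at + 1, unit_free[unit])
        unit_free[unit] = ex_at + EX_DELAY[unit]
        mem_at = ex_at + EX_DELAY[unit]
        timings.append(Timing(n, if_at, id_at, ex_at, mem_at + 1))
        # IF frees up in the cycle this instruction moves into ID.
        if_at, prev_ex_at = id_at, ex_at
    return timings


timings = schedule(["E1", "E2", "E1", "E2", "E1"])
for t in timings:
    print(t, "IF stall:", t.if_stall(), "ID stall:", t.id_stall())
print("completion order:", [t.instr for t in sorted(timings, key=lambda t: t.wb_at)])
```

The sketch puts instructions 1 to 5 in WB in cycles 6, 9, 8, 13 and 12. That gives a completion order of 1, 3, 2, 5, 4. Instruction 4 spends two extra cycles in ID, and instruction 5 spends two extra cycles in IF.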

Floating-Point Operations in MIPS
[Figure: IF → ID → one of {EX (integer unit); M1-M7 (multiplier, 7 stages); A1-A4 (adder, 4 stages); DIV (divider, 25 cycles)} → MEM → WB]
• WAW hazards possible; WAR hazards not possible
• Out-of-order completion; has ramifications for exceptions
• Longer operation latency implies more frequent stalls for RAW hazards
• Structural hazard: the divider is not fully pipelined
• Structural hazard: instructions have varying running times, so several can reach MEM/WB in the same cycle
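
The stage counts in the figure determine each unit's latency and initiation interval. The sketch below derives both. It assumes full forwarding and a consumer that needs the operand at the start of its own execute stage. The unit labels and function names are illustrative.

```python
# Functional units of the MIPS FP pipeline on this slide, as read from the figure:
# (number of execute cycles, whether a new operation can start every cycle).
UNITS = {
    "integer (EX)": (1, True),
    "FP adder (A1-A4)": (4, True),
    "multiplier (M1-M7)": (7, True),
    "divider (DIV)": (25, False),
}


def latency(unit: str) -> int:
    # Cycles a dependent instruction must wait, assuming full forwarding.
    cycles, _ = UNITS[unit]
    return cycles - 1


def initiation_interval(unit: str) -> int:
    # Cycles between issuing two operations to the same unit.
    cycles, pipelined = UNITS[unit]
    return 1 if pipelined else cycles


def raw_stalls(unit: str, distance: int) -> int:
    # Stall cycles for a consumer `distance` instructions after the producer (1 = next).
    return max(0, latency(unit) - (distance - 1))


for name in UNITS:
    print(f"{name:20} latency={latency(name):2}  initiation interval={initiation_interval(name)}")
```

Two consequences follow. An instruction that uses a multiply result immediately waits 6 cycles. One placed four instructions after an add does not wait at all.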

Structural Hazard on WB Unit
• Worst case: several instructions can try to use WB in the same cycle, yet the maximum steady-state number of write ports needed is 1
  - So don't replicate resources; detect and serialize access as needed
• Early resolution
  - Track use of WB in the ID stage with a shift register (the reservation register), and stall instructions there
  - Simplifies pipeline control: all stalls occur in ID
  - Cost: adds the shift register and write-conflict logic
• Late resolution
  - Stall instructions at entry to the MEM or WB stage
  - Complicates pipeline control (two stall locations)
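
Early resolution can be sketched as a reservation shift register checked in ID. The cycle counts from ID to WB are assumptions read off the FP pipeline figure: 9 for a multiply, 6 for an FP add, 3 for an integer op. The class and function names are hypothetical.

```python
class WriteBackReservation:
    """Early resolution of the WB structural hazard, tracked in ID.

    Bit k of `bits` set means the single register-file write port is already
    claimed k cycles from now. The register shifts one bit per clock.
    """

    def __init__(self) -> None:
        self.bits = 0

    def try_reserve(self, cycles_to_wb: int) -> bool:
        mask = 1 << cycles_to_wb
        if self.bits & mask:
            return False  # write-port conflict: the instruction stalls in ID
        self.bits |= mask
        return True

    def tick(self) -> None:
        self.bits >>= 1  # one clock cycle passes


def issue_with_wb_check(cycles_to_wb):
    """In-order issue from ID, at most one instruction per cycle.
    cycles_to_wb[i] is the distance from ID to WB for instruction i."""
    reservation = WriteBackReservation()
    cycle, issued = 1, []
    for offset in cycles_to_wb:
        while not reservation.try_reserve(offset):
            reservation.tick()  # stall one cycle in ID
            cycle += 1
        issued.append(cycle)
        reservation.tick()      # the next instruction reaches ID next cycle
        cycle += 1
    return issued


# Hypothetical sequence: MUL.D, two integer ops, ADD.D.
print(issue_with_wb_check([9, 3, 3, 6]))  # [1, 2, 3, 5]
```

Issued in cycle 4, the ADD.D would reach WB in cycle 10, the same cycle as the MUL.D. It therefore waits one cycle in ID. All of the stall logic stays in ID, as the early-resolution bullet describes.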

WAW Hazards
• A WAW hazard between ADD.D and a later L.D that writes the same register (F2 in the example) arises only when no instruction between them uses the result computed by ADD.D
  - Adding an instruction like "ADD.D F8, F2, F4" before L.D would stall the pipeline long enough (RAW hazard on F2) to avoid the WAW hazard
  - Can happen through a branch/trap (example in HP3, Section A.9)
  - Rare situation, but must still be handled correctly
• Hazard resolution (either option works)
  - Delay the issue of L.D until ADD.D enters MEM
  - Cancel the write of ADD.D
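
Both resolution options can be checked by applying register writes in WB order. The sketch assumes ID-to-WB distances of 6 cycles for ADD.D and 3 for L.D, taken from the FP pipeline figure. The issue cycles and all names are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import List, Optional

# Assumed cycles from ID to WB: ADD.D = A1-A4 + MEM + WB, L.D = EX + MEM + WB.
WB_OFFSET = {"ADD.D": 6, "L.D": 3}


@dataclass(frozen=True)
class Instr:
    op: str
    dest: str
    id_cycle: int              # cycle the instruction issues from ID
    write_enabled: bool = True

    @property
    def wb_cycle(self) -> int:
        return self.id_cycle + WB_OFFSET[self.op]


def waw_hazard(earlier: Instr, later: Instr) -> bool:
    # The later instruction's write would land first (or in the same cycle).
    return earlier.dest == later.dest and later.wb_cycle <= earlier.wb_cycle


def delay_issue(earlier: Instr, later: Instr) -> Instr:
    # Option 1: hold `later` in ID until `earlier` enters MEM (the cycle before its WB).
    return replace(later, id_cycle=max(later.id_cycle, earlier.wb_cycle - 1))


def cancel_write(earlier: Instr) -> Instr:
    # Option 2: let `earlier` finish, but suppress its register-file write.
    return replace(earlier, write_enabled=False)


def last_writer(instrs: List[Instr]) -> Optional[str]:
    # Apply writes in WB order; the last enabled write determines the register's value.
    winner = None
    for ins in sorted(instrs, key=lambda i: i.wb_cycle):
        if ins.write_enabled:
            winner = ins.op
    return winner


add = Instr("ADD.D", "F2", id_cycle=1)   # WB in cycle 7
ld = Instr("L.D", "F2", id_cycle=3)      # WB in cycle 6
print(waw_hazard(add, ld), last_writer([add, ld]))   # True ADD.D (stale value survives)
print(last_writer([add, delay_issue(add, ld)]))      # L.D
print(last_writer([cancel_write(add), ld]))          # L.D
```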

RAW Hazards
• Longer delays of FP operations increase the number of stalls in response to RAW hazards
• Two methods for reducing stalls
  - The compiler could have moved instruction D between instructions M and A, which would allow D to complete earlier; or the hardware could detect this possibility and issue instruction D out of order
  - The ID stage is a bottleneck because instructions wait there for their operands to become available; we could add buffers (reservation stations) to the functional units and let instructions await their operands there
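
The benefit of moving D can be seen with a simple in-order issue model. The model makes these assumptions:
- Multiply latency is 6 and add latency is 3, from the figure's stage counts.
- Forwarding is full.
- Structural hazards are ignored.

The register names for M, A and D are hypothetical stand-ins for the slide's example.

```python
# Assumed latencies (stall cycles before a dependent instruction can issue).
LATENCY = {"MUL.D": 6, "ADD.D": 3}


def in_order_issue(program):
    """program: list of (label, op, dest, sources). In-order issue, at most one
    instruction per cycle; an instruction waits in ID until all its sources are ready."""
    ready = {}   # register -> earliest cycle a consumer of it may issue
    cycle = 0
    issued = {}
    for label, op, dest, srcs in program:
        cycle = max([cycle + 1] + [ready.get(r, 0) for r in srcs])
        issued[label] = cycle
        ready[dest] = cycle + LATENCY[op] + 1
    return issued


# Hypothetical stand-ins: A needs M's result, D is independent of both.
M = ("M", "MUL.D", "F0", ("F2", "F4"))
A = ("A", "ADD.D", "F6", ("F0", "F8"))
D = ("D", "ADD.D", "F10", ("F12", "F14"))

print(in_order_issue([M, A, D]))  # {'M': 1, 'A': 8, 'D': 9}
print(in_order_issue([M, D, A]))  # {'M': 1, 'D': 2, 'A': 8}
```

In program order, D is stuck behind A, which waits in ID for M's result. After reordering, D issues in cycle 2, and A still issues in cycle 8.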

Responsibilities of ID (all stalls in ID)
• Three sets of checks
  - Structural hazards
    * Check for availability of the FP unit
    * Ensure the WB unit will be available when needed
  - RAW hazards
    * Stall the current instruction until none of its source registers is listed as a pending destination in a pipeline register whose result will not be available when the current instruction needs it
  - WAW hazards
    * If any instruction in the adder, divider, or multiplier has the same destination register as the current instruction, stall the current instruction
• Hazards between FP and integer instructions
  - Integer and FP instructions use disjoint sets of registers, except for FP-integer register moves
  - FP load-stores can conflict with integer load-stores in the MEM stage
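
The three sets of checks can be written as one function evaluated each cycle for the instruction in ID. This is a sketch under simplified assumptions. Pending results carry a precomputed "ready" cycle, and the divider's availability and the reserved WB cycles are passed in. The unit names and all identifiers are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class InFlight:
    dest: str
    ready_cycle: int   # earliest cycle a consumer may issue from ID (forwarding assumed)


@dataclass
class Candidate:
    dest: str
    srcs: Tuple[str, ...]
    unit: str          # "ADD", "MUL", or "DIV" (hypothetical unit names)
    wb_cycle: int      # cycle it would reach WB if it issued now


def id_check(cand: Candidate, now: int, in_flight: List[InFlight],
             divider_free_at: int, wb_reserved: Set[int]) -> Optional[str]:
    """Return why the instruction in ID must stall this cycle, or None if it can issue."""
    # 1. Structural hazards: the unpipelined divider, and the single WB write port.
    if cand.unit == "DIV" and divider_free_at > now + 1:
        return "structural: divider busy"
    if cand.wb_cycle in wb_reserved:
        return "structural: WB port reserved"
    # 2. RAW: a source is a pending destination whose value is not ready in time.
    for f in in_flight:
        if f.dest in cand.srcs and f.ready_cycle > now:
            return f"RAW on {f.dest}"
    # 3. WAW: an instruction already in the adder, multiplier, or divider writes the same register.
    for f in in_flight:
        if f.dest == cand.dest:
            return f"WAW on {f.dest}"
    return None


pending = [InFlight("F0", ready_cycle=8)]   # e.g. a multiply issued in cycle 1
print(id_check(Candidate("F6", ("F0", "F8"), "ADD", 8), 2, pending, 0, set()))   # RAW on F0
print(id_check(Candidate("F0", ("F2", "F4"), "ADD", 8), 2, pending, 0, set()))   # WAW on F0
print(id_check(Candidate("F10", ("F12",), "ADD", 8), 2, pending, 0, {8}))        # structural: WB port reserved
print(id_check(Candidate("F10", ("F12",), "ADD", 8), 2, pending, 0, set()))      # None
```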

MIPS R4000 Floating-Point Pipeline
[Figure: stage usage for Multiply; Add, Subtract; Divide]

Instruction Mixes in FP Pipeline: Adds Only
[Figure: Add, Subtract reservation table, annotated "Can't initiate another add on cycle 2: conflict here" and "Can't initiate another add on cycle 3: conflict here"]
• Forbidden latencies: 1 and 2
• Steady-state utilization (cycles 4 through 18) = (5*7)/(8*15) = 35/120 = 29.17%
• Total utilization (cycles 1 through 19) = (5+5*7+2)/(8*19) = 42/152 = 27.63%
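
Forbidden latencies come straight from a reservation table. The per-cycle stage usage of an R4000 add below is an assumption: U, then S+A, A+R, R+S, taken from the R4000 description in HP3 Appendix A, since the slide's figure is not reproduced here. It yields the slide's forbidden latencies and 7 stage-uses per add. A greedy start of six adds then reproduces the 19-cycle run.

```python
# Assumed reservation table for one R4000 FP add: the stages it uses in each cycle.
ADD = ["U", "SA", "AR", "RS"]
NUM_STAGES = 8  # the slide's utilization formulas divide by 8 stages


def forbidden_latencies(table):
    """Latency k is forbidden if some stage is used in both cycle i and cycle i + k."""
    cycles_by_stage = {}
    for cycle, stages in enumerate(table):
        for s in stages:
            cycles_by_stage.setdefault(s, []).append(cycle)
    return sorted({b - a for cs in cycles_by_stage.values() for a in cs for b in cs if b > a})


def greedy_starts(table, count):
    """Start `count` identical operations, each as early as possible without a stage conflict."""
    busy, starts, t = {}, [], 0
    for _ in range(count):
        while any(set(st) & busy.get(t + k, set()) for k, st in enumerate(table)):
            t += 1
        for k, st in enumerate(table):
            busy.setdefault(t + k, set()).update(st)
        starts.append(t)
        t += 1
    return starts, busy


uses = sum(len(st) for st in ADD)            # stage-uses per add
starts, busy = greedy_starts(ADD, 6)
cycles = max(busy) + 1
interval = starts[1] - starts[0]
print(forbidden_latencies(ADD))              # [1, 2]
print(starts, cycles)                        # [0, 3, 6, 9, 12, 15] 19
print(f"steady state: {uses / (NUM_STAGES * interval):.2%}")       # 29.17%
print(f"total: {len(starts) * uses / (NUM_STAGES * cycles):.2%}")  # 27.63%
```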

FP Pipeline: Multiplies Only
[Figure: Multiply reservation table and its collision vector]
• Collision vector: 1 indicates a forbidden latency, 0 indicates an allowed latency
• Steady-state utilization (cycles 5-24) = (5*10)/(8*20) = 50/160 = 31.25%
• Total utilization (cycles 1-28) = (5+5*10+5)/(8*28) = 60/224 = 26.79%
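
A collision vector can be computed by checking, for each latency k, whether any stage is reused k cycles later. The multiply table below is an assumption: U, E+M, M, M, M, N, N+A, R, from the R4000 description in HP3 Appendix A. It gives the slide's 10 stage-uses per multiply. The vector is listed from latency 1 upward, which may not match the bit order drawn on the slide.

```python
# Assumed reservation table for one R4000 FP multiply (stages used per cycle).
MUL = ["U", "EM", "M", "M", "M", "N", "NA", "R"]
NUM_STAGES = 8


def collision_vector(table):
    """Entry k-1 is 1 if latency k is forbidden (some stage reused k cycles later), else 0."""
    n = len(table)
    return [
        int(any(set(table[i]) & set(table[i + k]) for i in range(n - k)))
        for k in range(1, n)
    ]


cv = collision_vector(MUL)
print(cv)                                    # [1, 1, 1, 0, 0, 0, 0]
# Latency 4 is allowed and only 1-3 are forbidden, so a constant interval of 4 never collides.
interval = cv.index(0) + 1
uses = sum(len(s) for s in MUL)              # 10 stage-uses per multiply
print(f"steady state: {uses / (NUM_STAGES * interval):.2%}")   # 31.25%
```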

FP Pipeline: Adds and Multiplies
[Figure: interleaved Add, Subtract and Multiply reservation tables]
• Note out-of-order completion
• Steady-state utilization (cycles 6-21) = (4*17)/(8*16) = 68/128 = 53.13%
• Total utilization = (12+4*17+22)/(8*28) = 102/224 = 45.54%
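
Mixing operation types means overlaying different reservation tables. The sketch starts each operation greedily, in program order, at the earliest cycle with no stage conflict. It uses the same assumed R4000 tables as the two previous slides. The issue order (six multiplies and six adds, alternating, multiply first) is also an assumption, because the slide's figure is not reproduced.

```python
# Assumed R4000 reservation tables (stages used per cycle).
TABLES = {
    "ADD": ["U", "SA", "AR", "RS"],
    "MUL": ["U", "EM", "M", "M", "M", "N", "NA", "R"],
}


def greedy_schedule(ops):
    """Start each operation, in program order, at the earliest cycle after the previous
    start in which none of its stages collides with an operation already in flight."""
    busy, starts, t = {}, [], 0
    for op in ops:
        table = TABLES[op]
        while any(set(st) & busy.get(t + k, set()) for k, st in enumerate(table)):
            t += 1
        for k, st in enumerate(table):
            busy.setdefault(t + k, set()).update(st)
        starts.append(t)
        t += 1
    return starts, busy


ops = ["MUL", "ADD"] * 6                     # hypothetical order
starts, busy = greedy_schedule(ops)
cycles = max(busy) + 1
stage_uses = sum(len(s) for s in busy.values())
print(starts)                                # [0, 1, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22]
print(cycles, stage_uses)                    # 28 102
print(f"total utilization: {stage_uses / (8 * cycles):.2%}")   # 45.54%
ends = [s + len(TABLES[op]) for s, op in zip(starts, ops)]
print(ends[1] < ends[0])                     # True: the first add finishes before the first multiply
```

With this assumed order, the run takes 28 cycles and 102 stage-uses. These totals match the slide's total-utilization expression: 12 + 4*17 + 22 = 102 stage-uses over 8*28 stage-cycles. The run also shows out-of-order completion.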

Interrupts, Faults, or Exceptions
Type | Sync/Async | User request/Coerced | Between/Within instructions | Resume/Terminate
I/O request | Async | Coerced | Between instr. | Resume
OS call | Sync | User request | Between instr. | Resume
Breakpoint | Sync | User request | Between instr. | Resume
Power failure | Async | Coerced | Within instr. | Terminate
• Synchronous, coerced interrupts that occur within instructions and after which execution must resume are the hardest to implement
• See Figure A.27 in HP3

Precise Interrupts (Sequential Processor)
• When an interrupt occurs, the state of the interrupted process is saved, including the PC (= u), registers, and memory
• The interrupt is precise if the following three conditions hold
  - All instructions preceding u have been executed and have modified the state correctly
  - All instructions following u are unexecuted and have not modified the state
  - If the interrupt was caused by an instruction, it was caused by instruction u, which is either completely executed (overflow) or completely unexecuted (VM page fault)
• Precise interrupts are desirable if software is to fix up the error that caused the interrupt and execution has to be resumed
  - Easy for external interrupts; could be complex and costly for internal ones
  - Imperative for some interrupts (VM page faults, IEEE FP standard)
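
The three conditions can be stated as a predicate over per-instruction execution states. The sketch models each instruction as "done", "partial", or "none". That three-valued encoding is a simplification introduced for illustration.

```python
def is_precise(states, u):
    """Check the three conditions for a precise interrupt.

    states[i] is "done", "partial", or "none" for instruction i in program order;
    u is the index of the instruction whose PC was saved.
    """
    preceding_done = all(s == "done" for s in states[:u])
    following_untouched = all(s == "none" for s in states[u + 1:])
    u_all_or_nothing = states[u] in ("done", "none")
    return preceding_done and following_untouched and u_all_or_nothing


print(is_precise(["done", "done", "none", "none"], 2))   # True: page fault, u unexecuted
print(is_precise(["done", "done", "done", "none"], 2))   # True: overflow, u completed
print(is_precise(["done", "partial", "none"], 1))        # False: u only partly executed
print(is_precise(["done", "none", "done"], 1))           # False: a later instruction already changed state
```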

Problems on Sequential Processors
• An instruction modifies state early, then causes an interrupt
  - The state change must be undone
  - Example: the first operand of a VAX instruction uses the autodecrement addressing mode, which writes a register. Accessing the second operand causes a page fault. Since the instruction cannot complete, we must restore the register written by autodecrement to its original value
• Long-running instructions
  - Being able to restore state is not enough; the instruction must make progress from interrupt to interrupt
  - Example: MVC on IBM 360 copies 256 bytes
    * No virtual memory, so interrupts are not allowed to stop MVC
  - Example: MVC on IBM 370 copies 256 bytes
    * Has virtual memory, so it first accesses all pages involved; after that, no interrupts are allowed
  - Example: MVCL on IBM 370 copies up to 2^24 bytes
    * Has VM; the two addresses and the length are in registers
    * Registers are saved and restored on interrupts (making progress)
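
The MVCL approach keeps its progress in registers, so re-executing the instruction after an interrupt resumes the copy instead of restarting it. The sketch below is heavily simplified: no padding byte, no overlap rules, and one byte per step. The `interrupt_after` parameter and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MvclRegs:
    src: int      # source address register
    dst: int      # destination address register
    length: int   # bytes still to copy


def mvcl(mem: bytearray, regs: MvclRegs, interrupt_after=None) -> bool:
    """Simplified MVCL-style copy. Progress lives in the registers, so an
    interrupted copy is restarted by re-executing with the saved registers.
    Returns True when the copy is complete, False if it was interrupted."""
    copied = 0
    while regs.length > 0:
        if interrupt_after is not None and copied == interrupt_after:
            return False                  # interrupt: registers record how far we got
        mem[regs.dst] = mem[regs.src]
        regs.src += 1
        regs.dst += 1
        regs.length -= 1
        copied += 1
    return True


mem = bytearray(range(16)) + bytearray(16)
regs = MvclRegs(src=0, dst=16, length=16)
print(mvcl(mem, regs, interrupt_after=5), regs.length)   # False 11
print(mvcl(mem, regs), mem[16:] == mem[:16])             # True True
```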

Interrupts in MIPS Pipeline
• How do we stop and restart execution on an interrupt to keep it precise?
• What problems do delayed branches cause?
• What happens if multiple exceptions occur in the pipeline? Can exceptions occur out of order?
• What problems do multi-cycle instructions cause?

MIPS Integer Pipeline, Single Interrupt
• Force a TRAP instruction into the pipeline on the next IF
• Turn off all writes for the faulting instruction and subsequent instructions
• After the exception-handling routine in the OS receives control, it saves the PC of the faulting instruction
• When the exception has been handled, the RFE instruction reloads the PC and restarts sequential instruction execution
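
The net effect of these four steps is the order in which instructions update state. The sketch assumes the following:
- Older instructions drain normally.
- The faulting instruction and everything after it are squashed.
- A handler runs; the handler instructions "H1" and "H2" are hypothetical.
- RFE restarts at the saved PC.

```python
def commit_order(program, faulting, handler=("H1", "H2")):
    """Order in which instructions update state when `faulting` raises one exception."""
    epc = program.index(faulting)        # saved PC of the faulting instruction
    older = program[:epc]                # allowed to complete normally
    restarted = program[epc:]            # writes turned off, then re-executed after RFE
    return older + list(handler) + restarted


print(commit_order(["I1", "I2", "I3", "I4"], "I3"))
# ['I1', 'I2', 'H1', 'H2', 'I3', 'I4']
```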

Complications with Delayed Branches
• Suppose instruction 2, in the branch delay slot, causes an exception (e.g., a page fault) after the taken branch completes (determining that the branch outcome is true)
  - Instruction 2 cannot complete
  - Neither can instruction u, the branch target
• On restart, we do not have sequential execution
  - We must remember two PC values: 2 and u
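
Saving two PCs matters because restart must fetch the delay-slot instruction and then the branch target, not the next sequential address. The addresses and the function name are hypothetical, and a 4-byte instruction size is assumed.

```python
def resume_fetch(saved_pcs, count, inst_size=4):
    """PCs fetched after the handler returns: the saved PCs in order,
    then sequential fetch after the last one."""
    pcs = list(saved_pcs)
    while len(pcs) < count:
        pcs.append(pcs[-1] + inst_size)
    return pcs[:count]


delay_slot, target = 0x104, 0x200            # hypothetical addresses for 2 and u
print([hex(p) for p in resume_fetch([delay_slot, target], 4)])
# ['0x104', '0x200', '0x204', '0x208']  (branch still taken)
print([hex(p) for p in resume_fetch([delay_slot], 4)])
# ['0x104', '0x108', '0x10c', '0x110']  (one saved PC loses the branch)
```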

Complications with Multiple Exceptions
• In the same cycle, LW takes a data page fault and ADD takes an arithmetic exception
• On an unpipelined machine, LW's exception would occur first
  - Handle the page fault
  - Restart execution
  - ADD will cause the arithmetic exception to reoccur; handle it then

Complications with Out-of-order Exceptions
• LW takes a data page fault, ADD takes an instruction page fault
• Relative timing differs between unpipelined and pipelined machines
  - In the pipeline, ADD's fault (raised in IF) can occur before LW's (raised in MEM), even though LW comes first in program order
  - To maintain precise interrupts, we need to consider both when exceptions occur and which instructions caused them
  - Post exceptions in an exception status vector, turn off state modifications, and check the vector in the WB unit
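
The status-vector scheme can be sketched on an ideal five-stage pipeline with no stalls. Each exception is posted to its instruction's entry when detected, but it is acted on only when that instruction reaches WB, so exceptions are handled in program order. The function name and fault encoding are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def first_exception_handled(faults, n_instr):
    """faults maps an instruction index (program order) to (stage, cause).
    Instruction i is in STAGES[c - 1 - i] during cycle c (no stalls assumed).
    Returns (index, cause, cycle detected, cycle handled), or None."""
    status = {}                                   # exception status vector, per instruction
    for cycle in range(1, n_instr + len(STAGES)):
        for i in range(n_instr):
            k = cycle - 1 - i
            if not 0 <= k < len(STAGES):
                continue
            stage = STAGES[k]
            if faults.get(i, (None,))[0] == stage:
                status[i] = (faults[i][1], cycle)  # post the exception; state writes now off
            if stage == "WB" and i in status:      # vector is checked only in WB
                cause, detected = status[i]
                return i, cause, detected, cycle
    return None


# LW is instruction 0, ADD is instruction 1.
print(first_exception_handled({0: ("MEM", "data page fault"), 1: ("IF", "instruction page fault")}, 2))
# (0, 'data page fault', 4, 5): ADD's fault was posted in cycle 2, but LW's is handled first
```

The same function covers the simultaneous case on the previous slide. If ADD's arithmetic exception is posted in EX in cycle 4, alongside LW's page fault, LW is still the first instruction to reach WB.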

Complications with Multicycle Operations
• The instructions are independent (no hazards) and therefore issue immediately
• Differences in running times cause out-of-order termination
• DIVF throws an arithmetic exception late in its execution
• At that point, ADDF and SUBF have both completed execution and destroyed one of their operands
• Can we maintain precise interrupts under these conditions?
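
The problem can be made concrete by listing the younger instructions that finished before DIVF raised its exception and that overwrote one of their own sources. Those instructions cannot simply be re-executed. The register operands, the timing (25-cycle divider, 4-cycle adder, one issue per cycle) and the assumption that DIVF signals in its last execution cycle are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Op:
    text: str
    dest: str
    srcs: tuple
    start: int        # first execution cycle (one issue per cycle, no hazards)
    exec_cycles: int

    @property
    def finish(self) -> int:
        return self.start + self.exec_cycles - 1


def destroyed_operands(ops, faulting):
    """Younger instructions that completed before `faulting` signals its exception
    and overwrote one of their own sources."""
    younger = ops[ops.index(faulting) + 1:]
    return [o.text for o in younger if o.finish < faulting.finish and o.dest in o.srcs]


div = Op("DIVF F0,F2,F4", "F0", ("F2", "F4"), start=1, exec_cycles=25)
add = Op("ADDF F10,F10,F8", "F10", ("F10", "F8"), start=2, exec_cycles=4)
sub = Op("SUBF F12,F12,F14", "F12", ("F12", "F14"), start=3, exec_cycles=4)
print(destroyed_operands([div, add, sub], div))
# ['ADDF F10,F10,F8', 'SUBF F12,F12,F14']
```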

FP Pipeline Exceptions: Solutions 1 and 2
• Settle for imprecise interrupts (CRAY, with checkpointing)
  - Done on Alpha 21064 and 21164, IBM Power-1 and Power-2, and MIPS R8000 by supporting a fast imprecise mode and a slow precise mode
  - Not an option if you have to support virtual memory or the IEEE floating-point standard
• Software finishes certain instructions (SPARC)
  - Keep enough state around for the trap handler to create a precise sequence for the exception and to finish the work of some instruction stages
  - Only FP instructions cause this problem

FP Pipeline Exceptions: Solutions 3 and 4
• Stalling (MIPS R2000/3000, MIPS R4000, Pentium)
  - An instruction is allowed to issue only if it is certain that all instructions before it will complete without causing an exception
  - To prevent excessive stalling, FP units must decide on the possibility of an exception early in the pipeline
• General methods (PowerPC 620, MIPS R10000)
  - Reorder buffer, history file, future file
  - An instruction is allowed to finalize its writes only when all previously issued instructions are complete
  - More naturally used in connection with ILP (Chapter 4)
  - Significant complexity (to be discussed later)
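
The reorder-buffer idea can be sketched in a few lines, applied to the DIVF/ADDF/SUBF case from the multicycle slide. Results may arrive in any order, but architectural registers are written only from the head, in program order. A faulting head flushes itself and everything younger. This is a minimal illustration with hypothetical names. It does not model history files, future files, or forwarding from the buffer.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    name: str
    dest: str
    value: Optional[float] = None
    done: bool = False
    exception: bool = False


class ReorderBuffer:
    def __init__(self) -> None:
        self.entries = deque()
        self.regs = {}                    # architectural register file

    def issue(self, name: str, dest: str) -> Entry:
        entry = Entry(name, dest)
        self.entries.append(entry)
        return entry

    @staticmethod
    def complete(entry: Entry, value=None, exception=False) -> None:
        entry.value, entry.done, entry.exception = value, True, exception

    def commit(self) -> Optional[str]:
        """Retire finished entries from the head. If the head faulted, flush it and
        everything younger (state stays precise) and return its name."""
        while self.entries and self.entries[0].done:
            head = self.entries.popleft()
            if head.exception:
                self.entries.clear()
                return head.name
            self.regs[head.dest] = head.value
        return None


rob = ReorderBuffer()
div = rob.issue("DIVF", "F0")
add = rob.issue("ADDF", "F10")
sub = rob.issue("SUBF", "F12")
rob.complete(add, 1.0)
rob.complete(sub, 2.0)
print(rob.commit(), rob.regs)    # None {}  (ADDF and SUBF wait behind DIVF)
rob.complete(div, exception=True)
print(rob.commit(), rob.regs)    # DIVF {}  (younger results discarded)
```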