Chapter 13 Reduced Instruction Set Computers (RISC) Pipelining

Pipelining Review
Pipelining:
• Break the instruction cycle into n phases (one stage per phase)
  – e.g. Fetch, Decode, ReadOps, Execute1, Execute2, WriteBack
• Fetch a new instruction each phase
• Maximum speed gain is n
• Hazards reduce the ability to achieve a gain of n
  – Types of hazards
    + Resource
      o Hazard occurs when an instruction needs a resource being used by another instruction
    + Data
      o RAW (hazard if the read can occur before the write has finished)
      o WAR (hazard if the write can occur before the read is finished)
      o WAW (hazard if the writes occur in an unintended order)
    + Control
      o Hazard occurs when a wrong fetch decision at a branch results in an extra instruction fetch and a pipeline flush
• Stalling can always “fix” a hazard
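The claim that the maximum speed gain is n can be checked with a short calculation. The following is a rough sketch, assuming an idealized pipeline in which every phase takes one cycle; the function names and the `stall_cycles` parameter are illustrative and do not come from the slides.

```python
def unpipelined_cycles(num_instructions: int, n_stages: int) -> int:
    """Cycles when each instruction finishes all n phases before the next starts."""
    return num_instructions * n_stages


def pipelined_cycles(num_instructions: int, n_stages: int, stall_cycles: int = 0) -> int:
    """Cycles for an n-stage pipeline: n to fill it, then one per further instruction.

    stall_cycles is an assumed lump sum of cycles lost to hazards.
    """
    if num_instructions < 1:
        raise ValueError("need at least one instruction")
    return n_stages + (num_instructions - 1) + stall_cycles


def speedup(num_instructions: int, n_stages: int, stall_cycles: int = 0) -> float:
    """Ratio of unpipelined to pipelined cycle counts."""
    return unpipelined_cycles(num_instructions, n_stages) / pipelined_cycles(
        num_instructions, n_stages, stall_cycles
    )


if __name__ == "__main__":
    for count in (1, 10, 1000):
        print(count, round(speedup(count, n_stages=6), 3))
    # Stalls pull the gain further below n.
    print(round(speedup(1000, n_stages=6, stall_cycles=200), 3))
```

With a long instruction stream the ratio approaches n but never reaches it, and any stall cycles lower it further.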

Data Hazards
• Read after Write (RAW): true dependency
  – A hazard occurs if the read occurs before the write is complete
  – e.g. Reg1 <- Reg1 + Reg2 {write occurs after execution}
         Reg3 <- Reg1 - Reg3 {read occurs before execution}
• Write after Read (WAR): anti-dependency
  – A hazard occurs if the write occurs before the read happens
  – e.g. Reg <- M(ptr) {2 memory accesses: long read}
         M(pc) <- Reg {1 memory access: short write}
         {M(ptr) & M(pc) are the same location}
• Write after Write (WAW): output dependency
  – A hazard occurs if the two writes occur in the reverse of the intended order
  – e.g. RegA <- M(PTR) {2 memory accesses: long write}
         RegA <- RegB {0 memory accesses: short write}
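The three dependency types can be told apart by comparing what each instruction reads and writes. Below is a minimal sketch, assuming each instruction is summarized by a set of locations read and a set of locations written; the `Instr` class, the `dependencies` function and the register names are hypothetical. It reports dependencies only: whether one becomes an actual hazard depends on the pipeline timing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction summary: locations read and locations written."""
    text: str
    reads: frozenset
    writes: frozenset


def dependencies(first: Instr, second: Instr) -> set:
    """Dependency types between two instructions given in program order."""
    found = set()
    if first.writes & second.reads:
        found.add("RAW")  # second reads what first writes (true dependency)
    if first.reads & second.writes:
        found.add("WAR")  # second writes what first reads (anti-dependency)
    if first.writes & second.writes:
        found.add("WAW")  # both write the same location (output dependency)
    return found


raw = (
    Instr("R1 <- R2 + R3", frozenset({"R2", "R3"}), frozenset({"R1"})),
    Instr("R4 <- R1 - R5", frozenset({"R1", "R5"}), frozenset({"R4"})),
)
war = (
    Instr("R1 <- M[x]", frozenset({"M[x]"}), frozenset({"R1"})),
    Instr("M[x] <- R2", frozenset({"R2"}), frozenset({"M[x]"})),
)
waw = (
    Instr("RA <- M[y]", frozenset({"M[y]"}), frozenset({"RA"})),
    Instr("RA <- RB", frozenset({"RB"}), frozenset({"RA"})),
)

if __name__ == "__main__":
    for pair in (raw, war, waw):
        print(pair[0].text, "|", pair[1].text, "->", dependencies(*pair))
```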

Control Hazards
Control hazards occur when a wrong fetch decision results in a new instruction fetch and the pipeline being flushed.
Solutions include:
• Multiple pipeline streams
• Prefetching the branch target
• Using a loop buffer
• Branch prediction
• Delayed branch
• Reordering of instructions
• Multiple copies of registers (backups)

Recall Key Features of RISC
• Limited and simple instruction set
• Memory access instructions limited to memory <-> registers
• Operations are register to register
• Large number of general purpose registers (and use of compiler technology to optimize register use)
• Emphasis on optimising the instruction pipeline (& memory management)
• Hardwired for speed (no microcode)

Supporting Pipelining with Registers
• Software contribution
  – Require the compiler to allocate registers
    + Allocate based on the most used variables in a given time
    + Requires sophisticated program analysis
• Hardware contribution
  – Have more registers
    + Thus more variables will be in registers

Register uses
• Store local scalar variables in registers
  – Reduces memory accesses
• Every procedure (function) call changes locality (typically lots of procedure calls are encountered)
  – Parameters must be passed
  – Results must be returned
  – Variables from the calling program must be restored
  – The call and the return each amount to a partial context switch
• Store global variables in registers?

Using “Register Windows”
Observations:
• Typically only a few local & passed parameters
• Typically a limited range of depth of calls
Implications, if we partition the register set:
• We can use multiple small sets of registers, one per context
• Let calls switch to a new set of registers
• Let returns switch back to the previously used set of registers

Using “Register Windows”
• Partition the register set into:
  – Parameter registers (passed parameters)
  – Local registers (includes local variables)
  – Temporary registers (passing parameters)
• Then:
  – Temporary registers from one set overlap parameter registers from the next
• And:
  – This provides parameter passing without moving data (just move one pointer)
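The overlap can be made concrete by mapping each window's logical register numbers onto one shared physical register file. This is only a sketch: the group sizes, the number of windows and the `physical_register` function are assumed values and names chosen for illustration, not taken from the slides.

```python
PARAMS = 6        # assumed size of the parameter group (temporaries are the same size)
LOCALS = 10       # assumed size of the local group
NUM_WINDOWS = 8   # assumed number of windows in the register file

STRIDE = PARAMS + LOCALS                 # physical registers a new window adds
FILE_SIZE = NUM_WINDOWS * STRIDE         # total windowed physical registers
WINDOW_SIZE = PARAMS + LOCALS + PARAMS   # parameters + locals + temporaries


def physical_register(window: int, logical: int) -> int:
    """Map a logical register of a window to an index in the physical file.

    Logical 0..PARAMS-1 are parameters, then the locals, then the temporaries.
    """
    if not 0 <= logical < WINDOW_SIZE:
        raise ValueError("logical register out of range")
    return (window * STRIDE + logical) % FILE_SIZE


if __name__ == "__main__":
    caller, callee = 2, 3
    first_temp = PARAMS + LOCALS
    for k in range(PARAMS):
        # Caller's temporary k and callee's parameter k are the same physical register.
        print(physical_register(caller, first_temp + k), physical_register(callee, k))
```

Because the caller's temporaries and the callee's parameters resolve to the same physical registers, a call passes parameters by changing only the window number.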

Overlapping “Register Windows” Picture of Calls & Returns:

Circular Buffer diagram of Overlapping “Register Windows”

Operation of Circular Buffer
• When a call is made, a current window pointer is moved to show the currently active register window
• If all windows are in use, an interrupt is generated and the oldest window (the one furthest back in the call nesting) is saved to memory
• A saved window pointer indicates where the next saved window should be restored
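The call and return behaviour described here can be sketched as a small simulation. It is a simplified model under stated assumptions: the `WindowFile` class and its fields are hypothetical, register contents are not modelled, and a count of resident windows stands in for an explicit saved window pointer.

```python
class WindowFile:
    """Toy model of a circular buffer of register windows (names are illustrative)."""

    def __init__(self, capacity: int):
        self.capacity = capacity  # windows that fit in the register file at once
        self.cwp = 0              # current window pointer (slot in the circular buffer)
        self.resident = 1         # windows currently held in registers
        self.memory = []          # slots of windows saved to memory, oldest first
        self.overflows = 0
        self.underflows = 0

    def call(self) -> None:
        if self.resident == self.capacity:
            # All windows in use: the "interrupt" saves the oldest window to memory.
            oldest_slot = (self.cwp - (self.resident - 1)) % self.capacity
            self.memory.append(oldest_slot)
            self.resident -= 1
            self.overflows += 1
        self.cwp = (self.cwp + 1) % self.capacity
        self.resident += 1

    def ret(self) -> None:
        if self.resident == 1:
            if not self.memory:
                raise RuntimeError("return without a matching call")
            # The caller's window was saved earlier: restore it from memory.
            self.memory.pop()
            self.resident += 1
            self.underflows += 1
        self.cwp = (self.cwp - 1) % self.capacity
        self.resident -= 1


if __name__ == "__main__":
    windows = WindowFile(capacity=4)
    for _ in range(5):
        windows.call()
    print("after calls:", windows.cwp, windows.memory, windows.overflows)
    for _ in range(5):
        windows.ret()
    print("after returns:", windows.cwp, windows.memory, windows.underflows)
```

As long as the call depth stays within the number of windows, no memory traffic occurs; only deeper nesting triggers saves and restores.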

Global Variables
How should we accommodate global variables?
• Have the compiler allocate them to memory?
• Have a static set of registers for global variables?
• Put them in cache?

Registers v Cache: which is better?
Large Register File                             | Cache
All local scalars                               | Recently-used local scalars
Individual variables                            | Blocks of memory
Compiler-assigned global variables              | Recently-used global variables
Save/Restore based on procedure nesting depth   | Save/Restore based on cache replacement algorithm
Register addressing                             | Memory addressing

Referencing a Scalar - Window Based Register File

Referencing a Scalar - Cache

Compiler Based Register Optimization
Basis:
• Assume a relatively small number of registers (16-32)
• Optimizing their use is left to the compiler
• HLL programs have no explicit references to registers
Then:
• Assign a symbolic, or virtual, register to each candidate variable
• Map the (unlimited) symbolic registers to the (limited) real registers
• Symbolic registers that are not used at the same time can share real registers
• If you run out of real registers, some variables will use memory

Graph Coloring Algorithm for Register Assignment
Given:
• A graph of nodes and edges
• Nodes represent symbolic registers
• Two symbolic registers that are in use in the same program fragment are joined by an edge
Then:
• Assign a color to each node
• Adjacent nodes (those connected by an edge) must have different colors
• Use the minimum number of colors
And then:
• Try to color the graph with n colors, where n is the number of real registers
• Nodes that cannot be colored must be placed in memory
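The procedure above can be approximated with a simple greedy pass. This is a sketch and not necessarily the algorithm the slides have in mind: greedy coloring does not guarantee the minimum number of colors, and the `color_registers` function, the degree-first ordering and the example graph are all assumptions made for illustration.

```python
def color_registers(interference: dict, num_real: int):
    """Greedy register assignment.

    interference maps each symbolic register to the set of symbolic registers
    it shares an edge with. Returns (assignment, spilled): assignment maps a
    symbolic register to a real register number, and spilled lists the nodes
    that could not be colored and must live in memory.
    """
    assignment = {}
    spilled = []
    # Visit the most constrained nodes first (an assumed heuristic).
    order = sorted(interference, key=lambda node: (-len(interference[node]), node))
    for node in order:
        taken = {assignment[nb] for nb in interference[node] if nb in assignment}
        free = [color for color in range(num_real) if color not in taken]
        if free:
            assignment[node] = free[0]
        else:
            spilled.append(node)
    return assignment, spilled


# Hypothetical interference graph: A, B and C are all in use together; D overlaps only C.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}

if __name__ == "__main__":
    print(color_registers(graph, num_real=3))
    print(color_registers(graph, num_real=2))
```

With three real registers every node in this example gets a register, and D shares one with A because they are never in use together; with two, one of the three mutually adjacent nodes has to go to memory.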

Graph Coloring Algorithm Example

RISC Features Again
• Key features
  – Large number of general purpose registers (and use of compiler technology to optimize register use)
  – Limited and simple instruction set
  – Memory access instructions: memory <-> registers
  – Operations are register to register
  – Emphasis on optimising the instruction pipeline & memory management
  – Hardwired for speed (no microcode)

Memory to Memory vs Register to Memory Operations
(RISC uses only Register to Memory)
Note: the numbers in the figure are bits, not bytes

RISC Pipelining Basics
• Define two phases of execution for register based instructions
  – I: Instruction fetch
  – E: Execute (ALU operation with register input and output)
• For load and store there will be three
  – I: Instruction fetch
  – E: Execute (calculate memory address)
  – D: Memory (register to memory or memory to register operation)
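As a quick check on the phase counts, the time for strictly sequential (unpipelined) execution is the sum of the phases of each instruction. The following is a minimal sketch, assuming every phase takes one time unit; the `PHASES` table and the sample program are illustrative.

```python
# Phases per instruction kind, following the I / E / D breakdown above.
PHASES = {
    "register": ("I", "E"),       # ALU operation with register input and output
    "load": ("I", "E", "D"),      # E calculates the address, D moves the data
    "store": ("I", "E", "D"),
}


def sequential_phases(kinds) -> int:
    """Total phases when each instruction completes before the next is fetched."""
    return sum(len(PHASES[kind]) for kind in kinds)


# Hypothetical program: two loads, one register-to-register add, one store.
program = ["load", "load", "register", "store"]

if __name__ == "__main__":
    print(sequential_phases(program))  # 3 + 3 + 2 + 3 = 11
```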

Effects of RISC Pipelining
• (2 stage, since E and D are effectively one stage)
• (Allows 2 memory accesses per stage)
• (E1: register read; E2: execute & register write. Particularly beneficial if the E phase is long)

Optimization of RISC Pipelining
• Delayed branch
  – Uses a branch that does not take effect until after the following instruction has executed
  – The following instruction occupies the delay slot
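The effect of a delay slot can be illustrated with a tiny trace generator. This is a sketch under stated assumptions: the `run_delayed` function, the instruction labels and both sample programs are hypothetical, instructions are only recorded and never actually executed, and a branch inside a delay slot is not handled.

```python
def run_delayed(program, max_steps: int = 100):
    """Return the labels executed under delayed-branch semantics.

    program is a list of (label, target) pairs; target is an index for a jump
    and None otherwise. A jump takes effect only after the instruction that
    follows it (the delay slot) has executed.
    """
    trace = []
    pc = 0
    pending_target = None  # jump target waiting for the delay slot to finish
    for _ in range(max_steps):
        if pc >= len(program):
            break
        label, target = program[pc]
        trace.append(label)
        next_pc = pc + 1
        if pending_target is not None:
            # This instruction was the delay slot, so the jump takes effect now.
            next_pc = pending_target
            pending_target = None
        elif target is not None:
            pending_target = target
        pc = next_pc
    return trace


# Delay slot filled with a NOOP.
with_noop = [
    ("LOAD X, rA", None),
    ("ADD 1, rA", None),
    ("JUMP 6", 6),
    ("NOOP", None),
    ("ADD rA, rB", None),
    ("SUB rC, rB", None),
    ("STORE rA, Z", None),
]

# Same work, with the ADD moved into the delay slot instead of a NOOP.
reordered = [
    ("LOAD X, rA", None),
    ("JUMP 5", 5),
    ("ADD 1, rA", None),
    ("ADD rA, rB", None),
    ("SUB rC, rB", None),
    ("STORE rA, Z", None),
]

if __name__ == "__main__":
    print(run_delayed(with_noop))
    print(run_delayed(reordered))
```

Moving an instruction that does not affect the branch into the delay slot does the same useful work in one fewer instruction than padding the slot with a NOOP.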

Normal vs Delayed Branch (Text diagram is wrong)