Chapter 13 Reduced Instruction Set Computers (RISC) Pipelining

Pipelining Review
Pipelining:
• Break the instruction cycle into n phases (one stage per phase)
  – e.g. Fetch, Decode, Read Operands, Execute 1, Execute 2, Write Back
• Fetch a new instruction each phase
• Maximum speed gain is n
• Hazards reduce the ability to achieve a gain of n
  – Types of hazards
    + Resource
      o Hazard occurs when an instruction needs a resource being used by another instruction
    + Data
      o RAW (hazard if the read can occur before the write has finished)
      o WAR (hazard if the write can occur before the read is finished)
      o WAW (hazard if the writes occur in an unintended order)
    + Control
      o Hazard occurs when a wrong fetch decision at a branch results in an extra instruction fetch and a pipeline flush
• Stalling can always “fix” a hazard
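The claim that the maximum speed gain is n can be checked with a small calculation. The sketch below assumes an idealized pipeline in which every stage takes one cycle; the function names and the `stall_cycles` parameter are illustrative and do not come from the slides.

```python
def pipeline_cycles(num_instructions: int, num_stages: int, stall_cycles: int = 0) -> int:
    """Cycles needed on an idealized pipeline, plus any cycles lost to stalls."""
    if num_instructions < 1 or num_stages < 1:
        raise ValueError("need at least one instruction and one stage")
    # The first instruction takes num_stages cycles to pass through the pipeline;
    # each later instruction completes one cycle after its predecessor.
    return num_stages + (num_instructions - 1) + stall_cycles


def speedup(num_instructions: int, num_stages: int, stall_cycles: int = 0) -> float:
    """Speedup over running every instruction through all stages one at a time."""
    unpipelined = num_instructions * num_stages
    return unpipelined / pipeline_cycles(num_instructions, num_stages, stall_cycles)


print(speedup(1000, 6))       # approaches 6 but never reaches it
print(speedup(1000, 6, 500))  # stalls pull the gain well below 6
```

With a long instruction stream the gain approaches the stage count n, and every stall cycle inserted to resolve a hazard lowers it.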

Data Hazards
• Read after Write (RAW) – true dependency
  – A hazard occurs if the read occurs before the write is complete
  – e.g.
      Reg1 + Reg2 {write occurs after execution}
      Reg3 ← Reg1 - Reg3 {read occurs before execution}
• Write after Read (WAR) – anti-dependency
  – A hazard occurs if the write occurs before the read happens
  – e.g.
      Reg ← M(ptr) {2 memory accesses – long read}
      M(pc) ← Reg {1 memory access – short write}
      {M(ptr) & M(pc) are the same location}
• Write after Write (WAW) – output dependency
  – A hazard occurs if the two writes occur in the reverse of the intended order
  – e.g.
      RegA ← M(PTR) {2 memory accesses – long write}
      RegA ← RegB {0 memory accesses – short write}
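The three dependency types can be detected by comparing the read and write sets of two instructions in program order. The following is a minimal sketch; the `Instr` type and the example instructions are hypothetical and are not taken from the slides. Whether a detected dependency becomes an actual hazard depends on the pipeline timing, which this sketch does not model.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    text: str
    writes: frozenset  # locations the instruction writes
    reads: frozenset   # locations the instruction reads


def data_dependencies(first: Instr, second: Instr) -> list[str]:
    """Classify the dependencies of `second` on `first` (program order)."""
    kinds = []
    if first.writes & second.reads:
        kinds.append("RAW")  # second reads what first writes
    if first.reads & second.writes:
        kinds.append("WAR")  # second overwrites what first still has to read
    if first.writes & second.writes:
        kinds.append("WAW")  # both write the same location
    return kinds


# Hypothetical instruction pair: the second reads r1, which the first writes.
add = Instr("r1 <- r2 + r3", frozenset({"r1"}), frozenset({"r2", "r3"}))
sub = Instr("r4 <- r1 - r5", frozenset({"r4"}), frozenset({"r1", "r5"}))
print(data_dependencies(add, sub))
```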

Control Hazards
Control hazards occur when a wrong fetch decision results in a new instruction fetch and the pipeline being flushed.
Solutions include:
• Multiple pipeline streams
• Prefetching the branch target
• Using a loop buffer
• Branch prediction
• Delayed branch
• Reordering of instructions
• Multiple copies of registers (backups)
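As one illustration of branch prediction, the sketch below uses a 2-bit saturating counter. The slides do not say which prediction scheme is meant, so this particular scheme, the class name, and the example branch history are all assumptions.

```python
class TwoBitPredictor:
    """2-bit saturating counter: 0 and 1 predict not taken, 2 and 3 predict taken."""

    def __init__(self) -> None:
        self.counter = 0

    def predict(self) -> bool:
        return self.counter >= 2

    def update(self, taken: bool) -> None:
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)


def count_mispredictions(outcomes) -> int:
    """Count wrong fetch decisions for a sequence of actual branch outcomes."""
    predictor = TwoBitPredictor()
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1  # each miss would cost a pipeline flush
        predictor.update(taken)
    return misses


# Hypothetical loop-closing branch: taken 9 times, then falls through; run 3 times.
loop_branch = [True] * 9 + [False]
print(count_mispredictions(loop_branch * 3))
```

After the counter has warmed up, the only miss per pass through the loop is the final fall-through, which is why a loop-closing branch suits this kind of predictor.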

Recall Key Features of RISC
• Limited and simple instruction set
• Memory access instructions limited to memory <-> registers
• Operations are register to register
• Large number of general purpose registers (and use of compiler technology to optimize register use)
• Emphasis on optimising the instruction pipeline (& memory management)
• Hardwired for speed (no microcode)

Supporting Pipelining with Registers
• Software contribution
  – Require the compiler to allocate registers
    + Allocate based on the most used variables in a given time
    + Requires sophisticated program analysis
• Hardware contribution
  – Have more registers
    + Thus more variables will be in registers

Register uses
• Store local scalar variables in registers
  – Reduces memory accesses
• Every procedure (function) call changes locality (typically lots of procedure calls are encountered)
  – Parameters must be passed
  – Results must be returned
  – Variables from the calling program must be restored
  – Calls and returns each cause a partial context switch
• Store global variables in registers?

Using “Register Windows”
Observations:
• Typically only a few local variables and passed parameters
• Typically a limited range of call depth
Implications: if we partition the register set
• We can use multiple small sets of registers, one per context
• Let calls switch to a new set of registers
• Let returns switch back to the previously used set of registers

Using “Register Windows”
• Partition the register set into:
  – Parameter registers (passed parameters)
  – Local registers (includes local variables)
  – Temporary registers (passing parameters)
• Then:
  – Temporary registers from one set overlap parameter registers from the next
• And:
  – This provides parameter passing without moving data (just move one pointer)
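The overlap can be sketched as an address calculation over one physical register file. The window sizes, the number of windows, and the name `physical_index` are assumptions chosen for illustration; the slides do not give concrete sizes.

```python
# Hypothetical sizes; the overlap requires TEMPS == PARAMS.
PARAMS = 6
LOCALS = 10
TEMPS = 6
NUM_WINDOWS = 8
STRIDE = PARAMS + LOCALS          # distance between consecutive window bases
PHYSICAL = NUM_WINDOWS * STRIDE   # total physical registers, used circularly


def physical_index(window: int, logical: int) -> int:
    """Map a logical register of a window to a physical register number."""
    if not 0 <= logical < PARAMS + LOCALS + TEMPS:
        raise ValueError("logical register out of range")
    return (window * STRIDE + logical) % PHYSICAL


registers = [0] * PHYSICAL
current_window = 0

# The caller places an outgoing argument in its first temporary register.
registers[physical_index(current_window, PARAMS + LOCALS)] = 42

# The call only moves the window pointer; no register contents are copied.
current_window = (current_window + 1) % NUM_WINDOWS

# The callee finds the argument in its first parameter register.
print(registers[physical_index(current_window, 0)])
```

Because the temporaries of window w and the parameters of window w+1 map to the same physical registers, passing a parameter costs one pointer update.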

Overlapping “Register Windows” Picture of Calls & Returns:

Circular Buffer diagram of Overlapping “Register Windows”

Operation of Circular Buffer
• When a call is made, a current window pointer is moved to show the currently active register window
• If all windows are in use, an interrupt is generated and the oldest window (the one furthest back in the call nesting) is saved to memory
• A saved window pointer indicates where the next saved window should be restored
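The overflow and restore behaviour can be sketched at the level of whole windows. This is a simplified model under stated assumptions: each window holds exactly one procedure activation, the overlap between windows is ignored, and the class and attribute names are made up for the example.

```python
from collections import deque


class WindowBuffer:
    """Toy model of a circular buffer of register windows."""

    def __init__(self, num_windows: int) -> None:
        self.num_windows = num_windows
        self.resident = deque()  # activations held in windows, oldest first
        self.saved = []          # activations spilled to memory, oldest first
        self.overflows = 0
        self.underflows = 0

    def call(self, name: str) -> None:
        if len(self.resident) == self.num_windows:
            # All windows in use: the "interrupt" saves the oldest window to memory.
            self.saved.append(self.resident.popleft())
            self.overflows += 1
        self.resident.append(name)

    def ret(self) -> str:
        finished = self.resident.pop()
        if not self.resident and self.saved:
            # The caller's window was spilled earlier: restore the most recent save.
            self.resident.append(self.saved.pop())
            self.underflows += 1
        return finished


buffer = WindowBuffer(num_windows=4)
for depth in range(6):
    buffer.call(f"proc{depth}")
print(buffer.overflows, list(buffer.resident), buffer.saved)
returned = [buffer.ret() for _ in range(6)]
print(returned, buffer.underflows)
```

As long as the call depth stays within the number of windows, calls and returns touch no memory; only the two deepest calls in this run trigger a save.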

Global Variables
How should we accommodate global variables?
• Have the compiler allocate them to memory?
• Have a static set of registers for global variables?
• Put them in cache?

Registers v Cache – which is better?

Large Register File                            | Cache
-----------------------------------------------+-------------------------------------------------
All local scalars                              | Recently-used local scalars
Individual variables                           | Blocks of memory
Compiler-assigned global variables             | Recently-used global variables
Save/Restore based on procedure nesting depth  | Save/Restore based on cache replacement algorithm
Register addressing                            | Memory addressing

Referencing a Scalar Window Based Register File

Referencing a Scalar - Cache

Compiler Based Register Optimization
Basis:
• Assume a relatively small number of registers (16-32)
• Optimizing their use is left to the compiler
• HLL programs have no explicit references to registers
Then:
• Assign a symbolic, or virtual, register to each candidate variable
• Map (unlimited) symbolic registers to (limited) real registers
• Symbolic registers that are not used at the same time can share real registers
• If you run out of real registers, some variables will use memory
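Whether two symbolic registers are "used at the same time" can be sketched by comparing the ranges over which they are live. The sketch assumes each symbolic register is live over a single interval of instruction positions, which is a simplification; the names and the example ranges are hypothetical.

```python
def build_interference(live_ranges: dict[str, tuple[int, int]]) -> dict[str, set[str]]:
    """Return, for each symbolic register, the registers it cannot share with."""
    graph = {name: set() for name in live_ranges}
    names = sorted(live_ranges)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            a_start, a_end = live_ranges[a]
            b_start, b_end = live_ranges[b]
            # Two closed intervals overlap when each starts before the other ends.
            if a_start <= b_end and b_start <= a_end:
                graph[a].add(b)
                graph[b].add(a)
    return graph


# Hypothetical live ranges, given as (first use, last use) instruction positions.
ranges = {"v1": (0, 4), "v2": (2, 6), "v3": (5, 9)}
print(build_interference(ranges))
```

Here v1 and v3 are never live together, so they could share one real register, while v2 conflicts with both.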

Graph Coloring Algorithm for Register Assignment
Given:
• A graph of nodes and edges
• Nodes represent symbolic registers
• Two symbolic registers that are used (live) in the same program fragment are joined by an edge
Then:
• Assign a color to each node
• Adjacent nodes (those connected by an edge) must have different colors
• Use the minimum number of colors
And then:
• Try to color the graph with n colors, where n is the number of real registers
• Nodes that cannot be colored must be placed in memory
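The steps above can be sketched with a simple greedy heuristic. This is not guaranteed to find a minimum coloring (finding one is NP-hard in general), and the highest-degree-first ordering, the function name, and the example graph are assumptions rather than the method shown in the slides.

```python
def color_registers(interference: dict[str, set[str]], num_real: int):
    """Greedily assign real registers 0..num_real-1; return (assignment, spilled)."""
    assignment: dict[str, int] = {}
    spilled: list[str] = []
    # Visit the most constrained nodes first (highest degree, ties by name).
    order = sorted(interference, key=lambda n: (-len(interference[n]), n))
    for node in order:
        used = {assignment[nb] for nb in interference[node] if nb in assignment}
        free = [color for color in range(num_real) if color not in used]
        if free:
            assignment[node] = free[0]
        else:
            spilled.append(node)  # no real register left: keep this one in memory
    return assignment, spilled


# Hypothetical interference graph over symbolic registers A to E.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}
print(color_registers(graph, 3))
print(color_registers(graph, 2))
```

With three real registers every node gets a color; with two, the triangle A, B, C cannot be colored and one of its nodes is placed in memory.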

Graph Coloring Algorithm Example

RISC Features Again
• Key features
  – Large number of general purpose registers (and use of compiler technology to optimize register use)
  – Limited and simple instruction set
  – Memory access instructions: memory <-> registers
  – Operations are register to register
  – Emphasis on optimising the instruction pipeline & memory management
  – Hardwired for speed (no microcode)

Memory to Memory vs Register to Memory Operations
(RISC uses only register to memory)
Note: the numbers shown in the figure are bits, not bytes

RISC Pipelining Basics
• Define two phases of execution for register-based instructions
  – I: Instruction fetch
  – E: Execute (ALU operation with register input and output)
• For load and store there will be three
  – I: Instruction fetch
  – E: Execute (calculate memory address)
  – D: Memory (register to memory or memory to register operation)
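The phase definitions give a baseline for unpipelined execution. The sketch below assumes every phase takes one time unit and uses a made-up four-instruction program; it only lays the phases end to end and does not model any overlap.

```python
# Phases per instruction kind, as defined on the slide.
PHASES = {
    "register": ["I", "E"],     # register-based (ALU) instruction
    "load": ["I", "E", "D"],
    "store": ["I", "E", "D"],
}


def sequential_timeline(program: list[str]) -> list[str]:
    """Phases executed one after another, with no overlap between instructions."""
    timeline = []
    for kind in program:
        timeline.extend(PHASES[kind])
    return timeline


# Hypothetical program: two loads, one register operation, one store.
program = ["load", "load", "register", "store"]
timeline = sequential_timeline(program)
print(" ".join(timeline))
print(len(timeline), "time units")
```

Pipelining aims to shorten this sequential total by overlapping the I phase of one instruction with the E or D phase of the previous one.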

Effects of RISC Pipelining
• Two-stage pipeline: 2 stages, since E and D are effectively one stage
• Three-stage pipeline: allows 2 memory accesses per stage
• Four-stage pipeline: E1 register read, E2 execute & register write; particularly beneficial if the E phase is long

Optimization of RISC Pipelining
• Delayed branch
  – Leverages a branch that does not take effect until after execution of the following instruction
  – The following instruction occupies the delay slot
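Filling a delay slot can be sketched as a small reordering pass. The sketch only considers moving the instruction immediately before a branch into a NOP delay slot, and only when the branch does not depend on it. The `Op` type, the register names, and the example program are hypothetical, and branch targets are treated as symbolic labels that need no address fix-up.

```python
from typing import NamedTuple


class Op(NamedTuple):
    text: str
    writes: frozenset = frozenset()
    reads: frozenset = frozenset()
    is_branch: bool = False


NOP = Op("NOP")


def can_move(out: list, branch: Op) -> bool:
    """True if the last emitted instruction may move into the branch's delay slot."""
    if not out:
        return False
    prev = out[-1]
    if prev.is_branch or prev == NOP:
        return False
    if len(out) >= 2 and out[-2].is_branch:
        return False  # prev already sits in an earlier branch's delay slot
    if prev.writes & branch.reads:
        return False  # the branch decision depends on prev
    if branch.writes & (prev.reads | prev.writes):
        return False  # the branch would change what prev sees or produces
    return True


def fill_delay_slots(program: list) -> list:
    out = []
    i = 0
    while i < len(program):
        op = program[i]
        has_nop_slot = op.is_branch and i + 1 < len(program) and program[i + 1] == NOP
        if has_nop_slot and can_move(out, op):
            prev = out.pop()
            out += [op, prev]  # prev now occupies the delay slot
            i += 2             # skip the NOP it replaced
        else:
            out.append(op)
            i += 1
    return out


program = [
    Op("LOAD rA, X", frozenset({"rA"})),
    Op("ADD rA, 1", frozenset({"rA"}), frozenset({"rA"})),
    Op("JUMP L1", is_branch=True),
    NOP,
    Op("ADD rB, rA", frozenset({"rB"}), frozenset({"rA", "rB"})),
    Op("L1: STORE Z, rA", frozenset({"Z"}), frozenset({"rA"})),
]
for op in fill_delay_slots(program):
    print(op.text)
```

The reordered program is one instruction shorter because useful work replaces the NOP, while the instruction in the delay slot still executes before the jump takes effect.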

Normal vs Delayed Branch (Text diagram is wrong)