Lecture 4: Instruction Set Design/Pipelining
• Instruction set design (Sections 2.9-2.12)
  - control instructions
  - instruction encoding
• Basic pipelining implementation (Section A.1)

Control Transfer Instructions
• Conditional branches (75% Int, 82% FP)
• Jumps (6% Int, 10% FP)
• Procedure calls/returns (19% Int, 8% FP)
• Design issues:
  - How do you specify the target address?
  - How do you specify the condition?
  - What happens on a procedure call/return?

Specifying the Target Address
• PC-relative: needs fewer bits to encode, is independent of how/where the compiled code is linked, and is used for branches and jumps; typically, the displacement needs 4-8 bits
• Register-indirect jumps: the address is not known at compile time and has to be computed at run time (note: can use any other addressing mode too)
  - procedure returns
  - case statements
  - virtual functions
  - function pointers
  - dynamically shared libraries
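The PC-relative scheme can be illustrated with a small sketch. It assumes 4-byte instructions and a displacement that counts instructions from the instruction following the branch; both are choices made for this example, and the function names are hypothetical.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value


def fits_in(displacement: int, bits: int) -> bool:
    """True if a signed displacement is representable in `bits` bits."""
    return -(1 << (bits - 1)) <= displacement < (1 << (bits - 1))


def pc_relative_target(pc: int, disp_field: int, bits: int, instr_bytes: int = 4) -> int:
    """Target = address of the next instruction + displacement in instructions.

    Assumed convention for this sketch only; real ISAs differ in the details.
    """
    return pc + instr_bytes + sign_extend(disp_field, bits) * instr_bytes


# The same encoded displacement gives the same relative jump wherever the
# code is loaded, which is why it is independent of linking.
offset_a = pc_relative_target(0x1000, 0xFD, 8) - 0x1000
offset_b = pc_relative_target(0x5000, 0xFD, 8) - 0x5000
```

Here `0xFD` in an 8-bit field is the displacement -3, so both offsets are -8 bytes regardless of the load address.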

Specifying the Condition
Name | Examples | How condition is tested | Advantages | Disadvantages
Condition code (CC) | 80x86, ARM, PowerPC, SPARC | Tests special bits set by ALU ops | Sometimes condition is set for free | CC is extra state; instructions cannot be re-ordered
Condition register | Alpha, MIPS | Comparison sets register and this is tested | Simple | Register pressure
Compare and branch | PA-RISC, VAX | Comparison is part of the branch | One instruction instead of two | Complex pipelines

Procedure Call/Returns
• Need to maintain a stack of return addresses (in memory or in hardware)
• All registers can be copied and saved together, or this can be done selectively
• Who is responsible for saving registers?
  - Caller saving: raises correctness issues (a register holding a global has to be made available to other procedures); the caller saves only the values that it cares about
  - Callee saving: the callee saves only as many registers as it needs (provided it doesn't call other procedures)
  - A combination of both is typically employed

Instruction Set Encoding
• Operations are easy to encode efficiently; the key issues are the number of operands and their addressing modes
• Few addressing modes: low complexity in decoding and pipelining, but greater code size
• Fixed instruction lengths: low complexity in decoding, but greater code size
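To see why fixed instruction lengths keep decoding simple, consider a minimal sketch of a 32-bit format with a 6-bit opcode, two 5-bit register fields, and a 16-bit immediate. The layout is an assumption for illustration (it resembles a MIPS-style immediate format), and the function names are hypothetical.

```python
def encode_i_type(opcode: int, rs: int, rt: int, imm: int) -> int:
    """Pack one 32-bit word: opcode(6) | rs(5) | rt(5) | immediate(16)."""
    assert 0 <= opcode < 64 and 0 <= rs < 32 and 0 <= rt < 32
    assert -(1 << 15) <= imm < (1 << 15)
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def decode_i_type(word: int):
    """Every field sits at a fixed bit position, so decoding is shifts and masks."""
    opcode = (word >> 26) & 0x3F
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    imm = word & 0xFFFF
    if imm & 0x8000:  # sign-extend the 16-bit immediate
        imm -= 1 << 16
    return opcode, rs, rt, imm


word = encode_i_type(35, 2, 8, -4)  # field values chosen arbitrarily
fields = decode_i_type(word)
```

The decoder never has to inspect one field to find where the next begins. The cost is that an instruction needing few bits still occupies a full word, which is the code-size penalty the slide mentions.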

Instruction Lengths [figure]

Dealing with Code Size in RISC
• Some hybrid versions allow for 16-bit and 32-bit instructions (40% reduction in code size); useful for embedded applications
• IBM PowerPC stores 32-bit instructions in compressed form in memory; this adds hardware complexity on an I-cache miss (an uncompressed address must be translated to a compressed address, in addition to the virtual-to-physical translation)
• Reducing the register file size can also reduce the instruction length
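The last point is simple arithmetic: each register specifier needs enough bits to name any register. A rough sketch, assuming three register operands per instruction (the helper name is hypothetical):

```python
def register_specifier_bits(num_registers: int, operands: int = 3) -> int:
    """Bits spent naming registers in one instruction with `operands` register fields."""
    bits_per_specifier = (num_registers - 1).bit_length()
    return operands * bits_per_specifier


bits_32 = register_specifier_bits(32)  # 5 bits per specifier
bits_16 = register_specifier_bits(16)  # 4 bits per specifier
bits_8 = register_specifier_bits(8)    # 3 bits per specifier
```

Going from 32 to 8 registers frees 6 bits per three-operand instruction, which helps an instruction fit in a 16-bit format, at the price of more register pressure.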

Compiler Optimizations
• The phase-ordering problem: early phases have to assume that register allocation will find a register; otherwise, optimizations such as common subexpression elimination may increase memory traffic

Register Allocation Issues
• Graph coloring: determine when variables are live and avoid allocating the same register to variables that are simultaneously live
• Stack variables (typically local to a procedure): easy to allocate registers for
• Global data: can be accessed from multiple places (aliasing), difficult to allocate to registers
• Heap data: dynamically created objects, accessed with pointers, difficult to allocate to registers because of aliasing
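The graph-coloring idea can be sketched as follows. This is a simple greedy coloring, not a full production allocator: live ranges are assumed to be single intervals over instruction indices, and a variable that cannot be colored is just marked `None` (a spill candidate). All names are hypothetical.

```python
def build_interference(live_ranges):
    """live_ranges maps name -> (start, end), end exclusive.

    Two variables interfere when their live ranges overlap.
    """
    graph = {v: set() for v in live_ranges}
    names = list(live_ranges)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            (s1, e1), (s2, e2) = live_ranges[a], live_ranges[b]
            if s1 < e2 and s2 < e1:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def greedy_color(graph, num_registers):
    """Give each variable the lowest register not used by a colored neighbor."""
    assignment = {}
    # Visit high-degree nodes first (a common heuristic, assumed here).
    for v in sorted(graph, key=lambda n: (-len(graph[n]), n)):
        taken = {assignment[n] for n in graph[v] if assignment.get(n) is not None}
        free = [r for r in range(num_registers) if r not in taken]
        assignment[v] = free[0] if free else None  # None: would be spilled
    return assignment


ranges = {"a": (0, 4), "b": (2, 6), "c": (5, 8)}
graph = build_interference(ranges)
registers = greedy_color(graph, 2)
```

In this example `a` and `c` are never live at the same time, so they can share a register, while `b` overlaps both and needs its own.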

Case Study: The MIPS ISA
• Load-store architecture
• Focus on pipelining, decoding, and compiler efficiency
• In other words, RISC

Registers
• 32 GPRs (general-purpose/integer registers) and 32 FPRs
• 64-bit registers; two single-precision FP values can fit in one register
• Register R0 is hardwired to zero; with displacement addressing mode, we can also accomplish absolute addressing. Other uses for R0?
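A toy model of the R0 convention, assuming a register file where reads of register 0 return zero and writes to it are discarded. The class and function names are hypothetical and only illustrate the bullet above.

```python
class RegisterFile:
    """32 integer registers where register 0 always reads as zero."""

    def __init__(self):
        self._regs = [0] * 32

    def read(self, n: int) -> int:
        return 0 if n == 0 else self._regs[n]

    def write(self, n: int, value: int) -> None:
        if n != 0:  # writes to R0 are silently discarded
            self._regs[n] = value


def effective_address(regs: RegisterFile, base: int, displacement: int) -> int:
    """Displacement addressing: contents of the base register + displacement."""
    return regs.read(base) + displacement


regs = RegisterFile()
regs.write(5, 40)
relative = effective_address(regs, 5, 8)       # base register + displacement
absolute = effective_address(regs, 0, 0x2000)  # R0 as base: the displacement itself
```

With R0 as the base, the displacement is the whole address, which is how absolute addressing falls out of displacement mode. Two other commonly cited uses are copying a register by adding R0 to it and loading a small constant by adding an immediate to R0.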

Instruction Format [figure]

Control Instructions
• Comparisons with zero can happen as part of the branch
• The results of comparisons between registers are placed in other registers, which are then tested by branches
• Jump-and-link places the return address in register R31
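These three behaviors can be modeled in a few lines. The sketch assumes 4-byte instructions and ignores branch delay slots, so the return address here is simply the next instruction; the function names are hypothetical, not MIPS mnemonics.

```python
def set_less_than(regs, rd, rs, rt):
    """A comparison between two registers writes 1 or 0 into a third register."""
    regs[rd] = 1 if regs[rs] < regs[rt] else 0


def branch_if_nonzero(regs, pc, rs, target):
    """The branch compares one register with zero and returns the next PC."""
    return target if regs[rs] != 0 else pc + 4


def jump_and_link(regs, pc, target):
    """Save the address of the following instruction in R31, then jump."""
    regs[31] = pc + 4  # assumption: no delay slot is modeled
    return target


regs = [0] * 32
regs[1], regs[2] = 3, 7
set_less_than(regs, 3, 1, 2)                        # R3 = (R1 < R2)
taken_pc = branch_if_nonzero(regs, 0x100, 3, 0x200)
callee_pc = jump_and_link(regs, 0x400, 0x800)
```

A general compare-and-branch thus takes two instructions, one to set a register and one to test it, while a comparison against zero needs only the branch.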

Instruction Frequencies [figure]

Summary
• In the 1960s, stack architectures were considered a good match for high-level languages
• In the 1970s, software costs were a concern; ISAs were enriched to make the compiler's job easier (CISC)
• In the 1980s, there was a push for simpler architectures with high clock speed and high parallelism (RISC)
• ISAs designed in 1980 are still around!

The Assembly Line
• Unpipelined: start and finish a job before moving to the next
• Pipelined: break the job into smaller stages
[figure: jobs vs. time; in the pipelined case, stages A, B, and C of successive jobs overlap]

Performance Improvements?
• Does it take longer to finish each individual job?
• Does it take less time to finish a series of jobs?
• What assumptions were made while answering these questions?
• Is a 10-stage pipeline better than a 5-stage pipeline?
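One way to explore these questions is a back-of-the-envelope timing model. The sketch below assumes perfectly balanced stages, no stalls, one job entering per cycle, and a fixed per-stage latch overhead; these are idealizations for illustration, and the function names are hypothetical.

```python
def unpipelined_time(num_jobs, job_time):
    """Each job starts only after the previous one finishes."""
    return num_jobs * job_time


def pipelined_time(num_jobs, job_time, stages, latch_overhead=0.0):
    """Fill the pipeline once, then complete one job per cycle."""
    cycle = job_time / stages + latch_overhead
    return (stages + num_jobs - 1) * cycle


def job_latency(job_time, stages, latch_overhead=0.0):
    """Time for one job to pass through every stage."""
    return stages * (job_time / stages + latch_overhead)


base = unpipelined_time(100, 10.0)
five_stage = pipelined_time(100, 10.0, 5, latch_overhead=0.5)
ten_stage = pipelined_time(100, 10.0, 10, latch_overhead=0.5)
latency_5 = job_latency(10.0, 5, latch_overhead=0.5)
```

Under these assumptions, each individual job takes slightly longer (12.5 versus 10 time units with 5 stages) while the series of 100 jobs finishes much sooner. The deeper pipeline wins in this model, but the latch overhead takes up a growing share of each cycle, and the model ignores stalls and unbalanced stages, which is what the last two questions on the slide probe.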
