Appendix A: Instruction Set Principles and Examples

• Classifying instruction set architectures
• Memory addressing modes
• Operations in the instruction set
• Control flow instructions
• Instruction format
• Structure of recent compilers
• MMX technology
• MIPS instruction set

Introduction
• An instruction set architecture is a specification of a standardized programmer-visible interface to hardware, comprising:
  – A set of instructions (really, instruction types)
    • With associated argument fields, assembly syntax, and machine encoding.
  – A set of named storage locations
    • Registers, memory, … Programmer-accessible caches?
  – A set of addressing modes (ways to name locations)
  – Often an I/O interface (usually memory-mapped)

Classifying Architectures
• One important classification scheme is by the type of addressing modes supported.
  – Stack architecture: operands are implicitly on top of a stack. (Early machines.)
  – Accumulator architecture: one operand is implicitly an accumulator (a special register). (Early machines.)
  – General-purpose register architecture: operands may be any of a large number (typically 10s-100s) of registers.
    • Register-memory architectures: one operand may be in memory.
    • Load-store architectures: all operands are registers, except in special load and store instructions.

Four Architecture Classes
[Figure: assembly code for C = A + B in each of the four architecture classes]
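
As a rough illustration of the four classes, the sketch below runs C = A + B on a toy interpreter in stack, accumulator, register-memory, and load-store style. The mnemonics (`push`, `add_acc`, `add_rm`, and so on), the register names, and the interpreter itself are assumptions made for this example; they are not taken from the slide's figure or from any real instruction set.

```python
# Toy interpreter: four ways to compute C = A + B.
# All opcode and register names here are hypothetical.

def run(program, memory):
    """Execute a list of (opcode, *operands) tuples and return memory."""
    stack, acc, regs = [], 0, {}
    for op, *args in program:
        if op == "push":          # stack: push a memory operand
            stack.append(memory[args[0]])
        elif op == "add_stack":   # stack: both operands are implicit
            stack.append(stack.pop() + stack.pop())
        elif op == "pop":         # stack: pop the top into memory
            memory[args[0]] = stack.pop()
        elif op == "load_acc":    # accumulator: acc = mem
            acc = memory[args[0]]
        elif op == "add_acc":     # accumulator: acc is the implicit operand
            acc += memory[args[0]]
        elif op == "store_acc":
            memory[args[0]] = acc
        elif op == "load":        # load Rd, addr
            regs[args[0]] = memory[args[1]]
        elif op == "store":       # store Rs, addr
            memory[args[1]] = regs[args[0]]
        elif op == "add_rm":      # register-memory: Rd = Rs + mem[addr]
            regs[args[0]] = regs[args[1]] + memory[args[2]]
        elif op == "add_rr":      # load-store: Rd = Rs1 + Rs2
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return memory

PROGRAMS = {
    "stack": [("push", "A"), ("push", "B"), ("add_stack",), ("pop", "C")],
    "accumulator": [("load_acc", "A"), ("add_acc", "B"), ("store_acc", "C")],
    "register-memory": [("load", "R1", "A"), ("add_rm", "R3", "R1", "B"),
                        ("store", "R3", "C")],
    "load-store": [("load", "R1", "A"), ("load", "R2", "B"),
                   ("add_rr", "R3", "R1", "R2"), ("store", "R3", "C")],
}

# Run every style on the same inputs and collect the value stored in C.
RESULTS = {name: run(prog, {"A": 2, "B": 3})["C"]
           for name, prog in PROGRAMS.items()}
print(RESULTS)
```

All four programs store the same value in C; they differ only in how many operands each instruction names explicitly and in how many instructions are needed.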

Number of Operands
• A further classification is by the maximum number of operands and the number of them that can be in memory, e.g.:
  – 2-operand (e.g., a += b)
    • src/dest(reg), src(reg)
    • src/dest(reg), src(mem): IBM 360, x86, 68k
    • src/dest(mem), src(mem): VAX
  – 3-operand (e.g., a = b + c)
    • dest(reg), src1(reg), src2(reg): MIPS, PPC, SPARC, etc.
    • dest(reg), src1(reg), src2(mem): IBM 370
    • dest(mem), src1(mem), src2(mem): IBM 370, VAX

Further Classification

# of Memory Operands   # of Operands   Type of Architecture   Examples
0                      3               Register-register      Alpha, ARM, MIPS, PowerPC, SPARC, etc.
1                      2               Register-memory        IBM 360/370, Intel 80x86, Motorola 68000, TI C54x
2                      2               Memory-memory          VAX
3                      3               Memory-memory          VAX

Comparison of Architecture Types

Type                Instruction Encoding   Code Generation   # of Clock Cycles/Inst.   Code Size
Register-register   Fixed-length           Simple            Similar                   Large
Register-memory     Easy                   Moderate          Different                 Medium
Memory-memory       Variable-length        Complex           Large variation           Compact

[Legend in the original figure: each cell is marked as an advantage or a disadvantage]

Endians & Alignment
[Figure: memory bytes 0 through 7 in order of increasing byte address]
• Word-aligned word at byte address 4.
• Halfword-aligned word at byte address 2.
• Byte-aligned (non-aligned) word at byte address 1.
• Little-endian byte order: least-significant byte “first” (at the lowest byte address).
• Big-endian byte order: most-significant byte “first” (at the lowest byte address).
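
The byte orders and alignment cases above can be checked with a short sketch. It uses Python's standard `struct` module; the helper names `word_bytes` and `is_aligned` and the sample value `0x0A0B0C0D` are assumptions chosen for this illustration.

```python
import struct

def word_bytes(value, byteorder):
    """Return the 4 bytes of a 32-bit word in increasing address order."""
    fmt = "<I" if byteorder == "little" else ">I"
    return list(struct.pack(fmt, value))

def is_aligned(address, size):
    """An access of `size` bytes is aligned if the address is a multiple of size."""
    return address % size == 0

WORD = 0x0A0B0C0D  # most-significant byte 0x0A, least-significant byte 0x0D

little = word_bytes(WORD, "little")  # LSB is stored at the lowest address
big = word_bytes(WORD, "big")        # MSB is stored at the lowest address

print([hex(b) for b in little])
print([hex(b) for b in big])

# The three 4-byte words from the slide, at byte addresses 4, 2, and 1.
for address in (4, 2, 1):
    print(address,
          "word-aligned" if is_aligned(address, 4)
          else "halfword-aligned" if is_aligned(address, 2)
          else "byte-aligned only")
```

A word at address 2 passes the halfword check but not the word check, which is the middle case listed on the slide.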

Addressing Modes
[Table: addressing modes, each with an example in assembly syntax and its meaning in RTL]
• In the example assembly syntax in the middle column, ( ) indicates memory access. (A typical syntax.)
• In the RTL syntax on the right, [ ] denotes accessing a member of an array, Register or Memory.

Addressing Mode Usage
[Figure: 3 SPEC89 programs on VAX]

Displacement Distribution
[Figure: SPEC CPU2000 on Alpha; sign bit is not counted]

Use of Immediate Operand
[Figure]

Distribution of Immediate
[Figure: SPEC CPU2000 on Alpha; sign bit is not counted]

Instruction Type
[Figure]

Instruction Distribution
[Figure: 5 SPECint92 programs]

Control Flow Instructions
• Four basic types:
  – (Conditional) branches
  – (Unconditional) jumps
  – Procedure calls
  – Procedure returns
• Control flow addressing modes:
  – Often PC-relative (PC + displacement). Relocatable.
  – Also useful: register indirect jumps (the register holds the address). Uses:
    • Procedure returns
    • Case / switch statements
    • Virtual functions / methods (abstract class method calls)
    • Higher-order functions / function pointers
    • Dynamically shared libraries
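
To make the "relocatable" point concrete, the sketch below computes a PC-relative target at two different load addresses and contrasts it with a register-indirect jump through a small jump table. The function names, addresses, and table contents are hypothetical, and the target is taken as plain PC + displacement as on the slide; real ISAs often measure the displacement from the next instruction and scale it.

```python
def pc_relative_target(pc, displacement):
    """Target of a PC-relative branch: PC + displacement."""
    return pc + displacement

def register_indirect_target(registers, reg):
    """Target of a register-indirect jump: the address held in the register."""
    return registers[reg]

def target_offset_in_module(load_base, branch_offset, displacement):
    """Offset of the branch target from the start of the loaded code."""
    pc = load_base + branch_offset
    return pc_relative_target(pc, displacement) - load_base

# The same branch (at offset 0x10, displacement +0x20) lands on the same
# instruction no matter where the code is loaded.
offset_a = target_offset_in_module(0x1000, 0x10, 0x20)
offset_b = target_offset_in_module(0x400000, 0x10, 0x20)
print(hex(offset_a), hex(offset_b))

# A switch statement can load the target from a table into a register
# and then jump through that register.
JUMP_TABLE = {0: 0x2000, 1: 0x2040, 2: 0x2080}  # case value -> code address
registers = {"R1": JUMP_TABLE[1]}
print(hex(register_indirect_target(registers, "R1")))
```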

Conditional Branch Options
• Condition Code (CC) Register
  – E.g.: x86, ARM, PPC, SPARC, …
  – ALU ops set condition code flags in the CCR
  – Branch just checks the flag
• Condition register
  – E.g.: Alpha, MIPS
  – Comparison instruction puts its result in a GPR
  – Branch instruction checks the register
• Compare & Branch
  – E.g.: PA-RISC, VAX
  – Compare & branch in 1 instruction.

Procedure Calling Conventions
• Two major calling conventions:
  – Caller saves:
    • Before the call, the caller saves the registers that it will need later, even if the callee does not use them
  – Callee saves:
    • Inside the call, the called procedure saves the registers that it will overwrite
    • Can be more efficient if there are many small procedures
• Many architectures use a combination of schemes:
  – E.g., MIPS: some registers are caller-saved, some are callee-saved

Three Classes of Control Instructions
[Figure: SPEC CPU2000 on Alpha]

Branch Distance Distribution
[Figure: SPEC CPU2000 on Alpha]

Branch Comparison Types
[Figure: SPEC CPU2000 on Alpha]

Encoding An Instruction Set
[Figure]

Compiler Structure
[Figure]

Compiler Optimizations
[Figure]

Compiler Optimizations (cont.)
[Figure]

Effect of Optimization
[Figure]

Architectural Support for Compiler
• Provide regularity
  – Orthogonality (independence) of:
    • Registers used
    • Addressing modes
    • Operations used
• Provide primitives, not solutions
  – Don’t directly support specific kernels or languages
• Simplify trade-offs among alternatives
  – Make it easy to tell the fastest code sequence at compile time
• Don’t interpret values known at compile time
  – Allow compile-time constants to be provided in immediates

MIPS Architecture
• RISC, load-store architecture, simple addressing
• 32-bit instructions, fixed format
• 32 64-bit GPRs, R0-R31
  – Really, only 31: R0 is just a constant 0.
• 32 64-bit FPRs, F0-F31
  – Can hold 32-bit floats also (with the other ½ unused).
  – “SIMD” extensions operate on more floats in 1 FPR
• A few special registers
  – Floating-point status register
• Load/store 8-, 16-, 32-, 64-bit integers
  – All sign-extended to fill a 64-bit GPR (the unsigned load variants zero-extend instead)
  – Also 32-bit floats and 64-bit doubles

MIPS Addressing Modes
• Register (arith./logical ops only)
• Immediate (arith./logical only) & Displacement (load/stores only)
  – 16-bit immediate / offset field
  – Register indirect: use 0 as the displacement offset
  – Direct (absolute): use R0 as the displacement base
• Byte-addressed memory, 64-bit address
• Software-settable big-endian/little-endian flag
• Alignment required
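
The way one displacement mode covers register indirect and absolute addressing can be sketched as follows. This is a minimal model, assuming a register file held in a Python list and a sign-extended 16-bit offset; the function names are invented for the example.

```python
MASK64 = (1 << 64) - 1

def sign_extend16(field):
    """Interpret the low 16 bits of `field` as a signed two's-complement value."""
    field &= 0xFFFF
    return field - 0x10000 if field & 0x8000 else field

def effective_address(regs, base, offset_field):
    """Displacement addressing: Regs[base] + sign-extended 16-bit offset."""
    base_value = 0 if base == 0 else regs[base]  # R0 always reads as 0
    return (base_value + sign_extend16(offset_field)) & MASK64

regs = [0] * 32
regs[2] = 0x1000

print(hex(effective_address(regs, 2, 8)))       # displacement: 8(R2)
print(hex(effective_address(regs, 2, 0xFFFC)))  # negative offset: -4(R2)
print(hex(effective_address(regs, 2, 0)))       # register indirect: 0(R2)
print(hex(effective_address(regs, 0, 0x0100)))  # absolute: 0x100(R0)
```

Register indirect and absolute addressing fall out as special cases, so the hardware needs only the one displacement calculation.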

Inst. Format: I-type Instructions
[Figure]

Inst. Format: R-type Instructions
[Figure]

Inst. Format: J-type Instructions
[Figure]
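
Since the format figures are not reproduced here, the sketch below packs the three formats using the standard MIPS field layout: a 6-bit opcode in every format, followed by rs/rt/16-bit immediate (I-type), rs/rt/rd/shamt/funct (R-type), or a 26-bit target (J-type). The helper names are assumptions, and the sample opcode and funct values are the MIPS32 encodings of `lw`, `add`, and `j`, used only as test data.

```python
def encode_i(opcode, rs, rt, imm):
    """I-type: opcode(6) | rs(5) | rt(5) | immediate(16)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

def encode_r(opcode, rs, rt, rd, shamt, funct):
    """R-type: opcode(6) | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6)."""
    return ((opcode << 26) | (rs << 21) | (rt << 16)
            | (rd << 11) | (shamt << 6) | funct)

def encode_j(opcode, target):
    """J-type: opcode(6) | target(26)."""
    return (opcode << 26) | (target & 0x3FFFFFF)

def decode_i(word):
    """Split a 32-bit word back into I-type fields."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "imm": word & 0xFFFF,
    }

lw_word = encode_i(0x23, 1, 2, 8)          # lw  r2, 8(r1)
add_word = encode_r(0, 1, 2, 3, 0, 0x20)   # add r3, r1, r2
j_word = encode_j(0x02, 0x100)             # j   (26-bit target field 0x100)

print(f"{lw_word:08x} {add_word:08x} {j_word:08x}")
print(decode_i(lw_word))
```

Keeping the opcode and register fields in the same bit positions across formats is what lets a decoder read registers before it knows the instruction type.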

MIPS Instruction Set
• Go through Figures A.23-A.25 in the textbook:
  – Loads and stores in MIPS, Figure A.23
  – Arithmetic and logical instructions, Figure A.24
  – Control flow instructions, Figure A.25
• More in Appendix A: Figures A.26-A.30.

MIPS Dynamic Instr. Frequencies
[Figure: integer benchmarks and FP benchmarks]

Multimedia Extensions
• Graphics displays work on pixels: 8, 16, or 32 bits per pixel to define pixel colors
• Audio samples of 16 or 24 bits
• Exploit subword parallelism using existing 64/128-bit registers and ALUs
• Intel i860 was the first (1989) to operate on eight 8-bit, four 16-bit, or two 32-bit operands on 64-bit ALUs
• Almost all microprocessors have media extensions
• Intel uses the term SIMD to describe the MMX extensions; the only limit is the width of the registers, e.g., 64 bits

Intel MMX Technology
• MMX registers: 64-bit MM0 to MM7, shared with the FP registers R0 to R7; using them has side effects on FPU state; used only to hold operands
• Four MMX data types, each filling a 64-bit MMX register (bits 63 to 0):
  – Packed byte: 8 x 8 bits
  – Packed word: 4 x 16 bits
  – Packed doubleword: 2 x 32 bits
  – Quadword: 64 bits
• 64-bit / 32-bit access mode from memory to MMX registers
• SIMD techniques for arithmetic/logical operations on bytes, words, doublewords from/to 64-bit registers

MMX Instruction Set
• The MMX instruction set consists of 57 instructions, grouped into 7 categories (see Intel Architecture Software Developer’s Manual, Vol. 1: Basic Architecture (order #143190); Vol. 2: Instruction Set Reference (order #243191); Vol. 3: System Programming Guide (order #243192), at http://developer.intel.com/design/archives/processors/mmx/index.htm):
  – Arithmetic instructions
  – Data transfer instructions
  – Comparison instructions
  – Conversion instructions
  – Logical instructions
  – Shift instructions
  – Empty MMX state instruction (EMMS)

SIMD – Parallel Operations
• Conventional scalar operations vs. SIMD (PADDW)
[Figure: scalar code needs four separate adds to form A1+B1, A2+B2, A3+B3, and A4+B4; PADDW adds the packed words A4 A3 A2 A1 and B4 B3 B2 B1 in one instruction, giving A4+B4 A3+B3 A2+B2 A1+B1]
• Up to 4 times faster, but requires moving data in and out of the MMX registers
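
A software model of the packed add may help show what "subword parallelism" means. The sketch treats a Python integer as a 64-bit register holding four 16-bit words and adds lane by lane with wraparound, which is the behavior of PADDW; the helper names are assumptions made for this example.

```python
MASK64 = (1 << 64) - 1

def unpack_words(reg64):
    """Split a 64-bit value into four 16-bit words, least significant first."""
    return [(reg64 >> (16 * i)) & 0xFFFF for i in range(4)]

def pack_words(words):
    """Pack four 16-bit words (least significant first) into a 64-bit value."""
    reg = 0
    for i, word in enumerate(words):
        reg |= (word & 0xFFFF) << (16 * i)
    return reg

def paddw(a, b):
    """Packed add of words: each 16-bit lane wraps around independently."""
    sums = [(x + y) & 0xFFFF
            for x, y in zip(unpack_words(a), unpack_words(b))]
    return pack_words(sums)

a = pack_words([0xFFFF, 2, 3, 4])
b = pack_words([1, 20, 30, 40])

packed = unpack_words(paddw(a, b))           # no carry between lanes
scalar = unpack_words((a + b) & MASK64)      # one 64-bit add: carry leaks
print(packed)
print(scalar)
```

The comparison with a plain 64-bit add shows the one thing the hardware has to change: the carry chain is cut at every lane boundary.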

Packed Multiply Add
• 4 multiplications and 2 adds in one PMADDWD instruction
[Figure: Source 1 holds the words A3 A2 A1 A0 and Source 2 holds B3 B2 B1 B0; the intermediate products are A3xB3, A2xB2, A1xB1, and A0xB0; the destination holds the two doubleword results A3xB3 + A2xB2 and A1xB1 + A0xB0]
• PMADDWD produces 2 doubleword (32-bit) results
  – Useful instruction for many media and signal applications
  – Inputs and outputs must be arranged and packed into, or unpacked from, MMX registers, which adds programming complexity and performance overhead
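
The data flow in the figure can be modeled in a few lines. The sketch below takes the four words of each source as Python lists (index 0 is the least-significant word), multiplies them as signed 16-bit values, and sums adjacent products into two signed 32-bit results; the function names and the list representation are assumptions made for this example.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def pmaddwd(a_words, b_words):
    """Multiply four signed word pairs, then add adjacent products.

    Returns [A1*B1 + A0*B0, A3*B3 + A2*B2] as signed 32-bit values.
    """
    products = [to_signed(a, 16) * to_signed(b, 16)
                for a, b in zip(a_words, b_words)]
    low = to_signed(products[0] + products[1], 32)
    high = to_signed(products[2] + products[3], 32)
    return [low, high]

# Dot product of two 4-element vectors: one PMADDWD plus one final add.
partial = pmaddwd([1, 2, 3, 4], [5, 6, 7, 8])
dot = sum(partial)
print(partial, dot)
```

Dot products like this one sit at the core of filters and transforms, which is why the instruction suits media and signal code.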

Data Move Instructions
• MOVD m32, mm
  [Figure: the low 32 bits (A3 A2 A1 A0) of the 64-bit MMX register are stored to memory location m32; the upper 32 bits (xx) are not transferred]
• MOVD mm, r32
  [Figure: the 32-bit register value (A3 A2 A1 A0) is copied into the low 32 bits of the MMX register; the upper 32 bits are filled with 00]
• Move data between MMX registers and memory or regular registers for SIMD instructions
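
The two MOVD directions shown above reduce to a mask in each case. The following sketch models registers as Python integers; the function names are invented for the illustration.

```python
MASK32 = 0xFFFFFFFF

def movd_store(mm):
    """MOVD m32, mm: the value written to memory is the low doubleword of mm."""
    return mm & MASK32

def movd_load(r32):
    """MOVD mm, r32: low doubleword = r32, upper doubleword = 0."""
    return r32 & MASK32

stored = movd_store(0x1122334455667788)
loaded = movd_load(0xA3A2A1A0)
print(f"{stored:08x}")
print(f"{loaded:016x}")
```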