CS 704 Advanced Computer Architecture
Lecture 6: Instruction Set Principles (ISA Performance Analysis, Fallacies and Pitfalls)
Prof. Dr. M. Ashraf Chughtai
MAC/VU-Advanced Computer Architecture Lecture 6 - Instruction Set Principles (3)

Today’s Topics
- Recap: Lecture 5
- DSP Media Operations
- ISA Performance
- Putting it All Together
- Summary

Recap: Lecture 5
- Instruction encoding
  - Essential elements of a computer instruction word:
    - Type of operands
    - Places of source and destination
    - Place of the next instruction
- Instruction word length
  - Variable length
  - Fixed length
  - Hybrid (variable and fixed)
- Categories of hybrid length: 4-, 3-, 2-, 1- and 0-address formats

Recap: Lecture 5 (Cont’d)
- Comparison of hybrid instruction word formats
  - The minimum number of memory bytes is required by the 1-address (accumulator) format
  - The maximum is required by the 4-address format
- MIPS instruction word format
  - MIPS is a RISC, 64-bit load/store architecture with fixed-length (32-bit) instructions
  - It supports:
    - 8-, 16-, 32- and 64-bit operands
    - R-type, I-type and J-type instruction formats
    - Arithmetic and logic operations
    - Data transfer operations
    - Control flow operations

Media and Signal Processing Operands
- Graphics applications deal with 2D and 3D images
- The 3D data type is called a vertex
- A vertex structure has four components:
  - x-coordinate
  - y-coordinate
  - z-coordinate
  - w-coordinate
- Three vertices specify a graphics primitive such as a triangle; the fourth component (w) helps with color and hidden surfaces
- Vertex values are usually 32-bit floating-point values
- DSPs add fixed point to the data types: the binary point sits just to the right of the sign bit
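
The fixed-point format in the last bullet can be sketched as follows. This is a minimal illustration that assumes a 16-bit word (one sign bit followed by 15 fraction bits, often called Q15), so representable values lie in [-1, 1). The 16-bit width and the helper names `to_q15` and `from_q15` are assumptions made for the example, not details from the lecture.

```python
Q15_SCALE = 1 << 15          # 15 fraction bits to the right of the sign bit
Q15_MAX = Q15_SCALE - 1      # largest raw value:  0x7FFF
Q15_MIN = -Q15_SCALE         # smallest raw value: -0x8000


def to_q15(x):
    """Quantize a real number to a 16-bit fixed-point integer (assumed Q15)."""
    raw = int(round(x * Q15_SCALE))
    # Clamp values that fall outside [-1, 1) to the nearest representable one.
    return max(Q15_MIN, min(Q15_MAX, raw))


def from_q15(raw):
    """Convert a raw Q15 integer back to a Python float."""
    return raw / Q15_SCALE


if __name__ == "__main__":
    for value in (0.5, 0.25, -1.0, 1.0):
        raw = to_q15(value)
        print(value, "->", raw, "->", from_q15(raw))
```

Note that 1.0 itself is not representable in this format; the sketch clamps it to the largest value just below 1.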

3D Data Type
- A triangle is visible when it is depicted as filled with pixels
- Pixels are typically 32 bits, usually consisting of four 8-bit channels:
  - R: red
  - G: green
  - B: blue
  - A: transparency of the pixel when it is depicted
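
As a rough illustration of the 32-bit pixel layout, the sketch below packs four 8-bit channels into one integer and unpacks them again. The channel order (R in the most significant byte, A in the least) and the function names are assumptions for the example; real graphics formats differ in byte order.

```python
def pack_rgba(r, g, b, a):
    """Pack four 8-bit channels into one 32-bit pixel (assumed order: R, G, B, A)."""
    for channel in (r, g, b, a):
        if not 0 <= channel <= 0xFF:
            raise ValueError("each channel must fit in 8 bits")
    return (r << 24) | (g << 16) | (b << 8) | a


def unpack_rgba(pixel):
    """Split a 32-bit pixel back into its (R, G, B, A) channels."""
    return (
        (pixel >> 24) & 0xFF,
        (pixel >> 16) & 0xFF,
        (pixel >> 8) & 0xFF,
        pixel & 0xFF,
    )


if __name__ == "__main__":
    pixel = pack_rgba(0x12, 0x34, 0x56, 0x78)
    print(hex(pixel), unpack_rgba(pixel))
```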

Media and Signal Processing Operations
- Data for multimedia operations is usually much narrower than the 64-bit data word of modern processors
- Thus, a 64-bit word may be partitioned into four 16-bit data values so that the 64-bit ALU performs four 16-bit operations (say, four additions) in a single clock cycle

Media and Signal Processing Operations
- Here, extra hardware is added to prevent the carry from propagating between the four 16-bit partitions of the 64-bit ALU
- These operations are called Single-Instruction Multiple-Data (SIMD) or vector operations
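
The effect of blocking the carry between partitions can be imitated in software. The sketch below is an illustration only, not how the hardware is built: it adds the low 15 bits of each 16-bit lane separately and then restores the top bit of each lane with XOR, so no carry crosses a lane boundary. The lane layout (lane 0 in the least significant bits) and all names are assumptions made for the example.

```python
LANE_BITS = 16
LANES = 4
LANE_MASK = (1 << LANE_BITS) - 1
HIGH_BITS = 0x8000800080008000   # top bit of every 16-bit lane
LOW_BITS = 0x7FFF7FFF7FFF7FFF    # remaining 15 bits of every lane


def pack_lanes(values):
    """Place four 16-bit values side by side in one 64-bit word."""
    word = 0
    for i, value in enumerate(values):
        word |= (value & LANE_MASK) << (i * LANE_BITS)
    return word


def unpack_lanes(word):
    """Extract the four 16-bit lanes of a 64-bit word."""
    return [(word >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def partitioned_add(x, y):
    """Add four 16-bit lanes at once; each lane wraps around independently."""
    # Adding only the low 15 bits can carry into bit 15 of the same lane
    # but never into the next lane.
    low_sum = (x & LOW_BITS) + (y & LOW_BITS)
    # XOR folds in the top bits of both operands without generating a carry.
    return low_sum ^ ((x ^ y) & HIGH_BITS)


if __name__ == "__main__":
    a = pack_lanes([0xFFFF, 1, 0x8000, 0x1234])
    b = pack_lanes([0x0001, 2, 0x8000, 0x0001])
    print([hex(v) for v in unpack_lanes(partitioned_add(a, b))])
    # A plain 64-bit add lets carries leak into neighbouring lanes:
    print([hex(v) for v in unpack_lanes((a + b) & (2**64 - 1))])
```

In the example, lane 0 overflows (0xFFFF + 1). With the partitioned add the overflow stays inside lane 0, whereas the plain 64-bit add carries it into lane 1.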

Multimedia Operations
- Most graphics multimedia applications use 32-bit floating-point operations, so some computers allow a single instruction to launch two 32-bit operations on operands found side by side in a double-precision register
- The table shown here summarizes the SIMD instructions found in recent computers

Summary of SIMD instructions in recent computers
[Table: see Fig. 2.17, page 110]

Multimedia Operations
- Note that there is very little in common across the five architectures
- All are fixed-width operations, performing multiple narrow operations on either a 64-bit or a 128-bit ALU
- The narrow operations are shown as B (byte), H (half word) and W (word); 8B denotes eight bytes, i.e. a 64-bit double word

Digital Signal Processing Issues
- Saturating add/subtract: results that are too large (overflow)
- Result rounding: choose from the IEEE 754 rounding modes
- Multiply-accumulate: vector and matrix dot-product operations

DSP Operations
- Saturating add/subtract
  - A DSP cannot ignore the result of an overflow, otherwise it may miss an event; therefore it uses saturating arithmetic
  - Here, if the result is too large to be represented, it is set to the largest representable number, depending on the sign of the result
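
A minimal sketch of saturating addition, contrasted with ordinary wrap-around (two's complement) addition. The 16-bit default width and the function names are assumptions for the example.

```python
def saturating_add(a, b, bits=16):
    """Add two signed integers, clamping the result to the representable range."""
    largest = (1 << (bits - 1)) - 1      # e.g. +32767 for 16 bits
    smallest = -(1 << (bits - 1))        # e.g. -32768 for 16 bits
    return max(smallest, min(largest, a + b))


def wrapping_add(a, b, bits=16):
    """Add two signed integers with two's complement wrap-around, for comparison."""
    mask = (1 << bits) - 1
    total = (a + b) & mask
    # Reinterpret the top bit as the sign bit.
    return total - (1 << bits) if total >> (bits - 1) else total


if __name__ == "__main__":
    print(saturating_add(30000, 10000), wrapping_add(30000, 10000))
    print(saturating_add(-30000, -10000), wrapping_add(-30000, -10000))
```

With wrap-around, a large positive sum turns negative (and vice versa), which is the kind of error a signal-processing loop cannot tolerate; the saturating version pins the result at the nearest limit instead.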

DSP Operations
- Result rounding
  - IEEE 754 has several rounding modes for rounding a wider accumulator into a narrower one; DSPs select the appropriate mode to round the result
- Multiply-accumulate (MAC)
  - MAC operations are the key to the dot-product operations of vector and matrix multiplies, which need to accumulate a series of products
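
The multiply-accumulate pattern can be sketched as a dot product in which every loop iteration performs one multiply and one add into a running accumulator. Python integers never overflow, so the wider accumulator a DSP would need is implicit here; the function name is an assumption for the example.

```python
def dot_product_mac(xs, ys):
    """Compute a dot product as a chain of multiply-accumulate steps."""
    if len(xs) != len(ys):
        raise ValueError("vectors must have the same length")
    accumulator = 0
    for x, y in zip(xs, ys):
        # One MAC step: multiply two operands and add the product to the accumulator.
        accumulator += x * y
    return accumulator


if __name__ == "__main__":
    print(dot_product_mac([1, 2, 3], [4, 5, 6]))
```

A matrix multiply repeats the same loop once per output element, which is why a single-instruction MAC pays off in DSP code.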

ISA Performance
- Role of the compiler
  - The interaction of compilers and high-level languages significantly affects how programs use an ISA
  - Optimizations performed by compilers can be classified as follows:

Classification of Performance Optimizations
- High-level optimization: often done on the source, with the output fed to the later optimization passes
- Local optimization: done within a straight-line code fragment (basic block)
- Global optimization: extends the optimizations across branches
- Register allocation: associates registers with operands
- Processor-dependent optimization: takes advantage of the specific architecture

Impact of Compiler Technology
- The interaction of compilers and high-level languages affects how a program uses an ISA
- Here, two important questions are:
  1: How are variables allocated?
  2: How many registers are needed to allocate variables appropriately?
- These questions are addressed by considering the three areas in which high-level languages allocate data

Three areas of data allocation
1: Local variable area: stack
- It is used to allocate local variables; it grows or shrinks on procedure call or return
- Objects on the stack are primarily scalars (single variables) rather than arrays, and are addressed relative to the stack pointer
- Register allocation is much more effective for stack-allocated objects

Three areas of data allocation (Cont’d)
2: Global data area
- It is used to allocate statically declared objects such as global variables and constants
- These objects are mostly arrays and other aggregate data structures
- Register allocation is relatively less effective for global variables
- Global variables may be aliased: there are multiple ways to refer to their address, which makes it illegal to keep them in registers

Three areas of data allocation (Cont’d)
3: Dynamic object allocation: heap
- It is used to allocate objects that do not adhere to a stack discipline
- Objects in the heap are accessed with pointers and are typically not scalars
- Most heap variables are aliased, so register allocation is almost impossible for the heap

ISA Performance (Cont’d)
- MIPS floating-point operations
  - The instructions manipulate the floating-point registers
  - They indicate whether the operation is to be performed in single precision or double precision
  - MOV.S copies a single-precision register to another of the same type
  - MOV.D copies a double-precision register to another of the same type

MIPS Floating-point Operations (Cont’d)
- To get greater performance for graphics routines, MIPS64 offers paired-single instructions
- These instructions perform two 32-bit floating-point operations, one on each half of the 64-bit floating-point register
- Examples: ADD.PS, SUB.PS, MUL.PS, DIV.PS
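
The paired-single idea can be modelled in software by treating a 64-bit integer as a register that holds two single-precision values. The sketch below is only an illustration of the data layout, not of the MIPS64 hardware; the helper names and the choice of which value sits in the upper half are assumptions for the example.

```python
import struct


def pack_ps(upper, lower):
    """Store two single-precision floats side by side in one 64-bit value."""
    # Little-endian packing puts `lower` in bits 0-31 and `upper` in bits 32-63.
    return struct.unpack("<Q", struct.pack("<ff", lower, upper))[0]


def unpack_ps(register):
    """Return the (upper, lower) single-precision halves of a 64-bit value."""
    lower, upper = struct.unpack("<ff", struct.pack("<Q", register))
    return upper, lower


def add_ps(a, b):
    """Model of a paired-single add: two independent 32-bit float additions."""
    a_upper, a_lower = unpack_ps(a)
    b_upper, b_lower = unpack_ps(b)
    return pack_ps(a_upper + b_upper, a_lower + b_lower)


if __name__ == "__main__":
    x = pack_ps(1.5, 2.25)
    y = pack_ps(0.5, 0.75)
    print(unpack_ps(add_ps(x, y)))
```

Each half is rounded to single precision when it is packed, so the two results do not influence each other.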

Putting it All Together
- The earliest architectures were limited in their instruction sets by the hardware technology of the time
- In the 1960s, stack architectures became popular, viewed as a good match for high-level languages
- In the 1970s, the main concern of architects was to reduce software cost, which produced high-level architectures such as the VAX

Putting it All Together (Cont’d)
- In the 1980s, a return to simpler architectures took place, enabled by sophisticated compiler technology
- In the 1990s, new architectural features were introduced; these include:

Putting it All Together (Cont’d)
1990s Architectures
1: Address size doubles: 32-bit to 64-bit
2: Optimization of conditional branches via conditional execution, e.g. conditional move
3: Optimization of cache performance via prefetch, which increased the role of the memory hierarchy in the performance of computers
4: Multimedia support
5: Faster floating-point instructions
6: Long instruction word

Concluding the Instruction Set Principles
- Three pillars of computer architecture: hardware, software and instruction set
- Instruction set: the interface between hardware and software
- Taxonomy of instruction sets: stack, accumulator and general-purpose register
- Types and sizes of operands
  - Types: integer, FP and character
  - Sizes: half word, double word
- Classification of operations: arithmetic, data transfer, control and support

Concluding the Instruction Set Principles (Cont’d)
- Operand addressing modes: immediate, register, direct (absolute) and indirect
- Classification of indirect addressing: register, indexed, relative (i.e. with displacement) and memory
- Special addressing modes: auto-increment, auto-decrement and scaled
- Control instruction addressing modes: branch, jump and procedure call/return

Concluding the Instruction Set Principles (Cont’d)
- Instruction encoding
  - Essential elements of computer instructions: type of operands, places of source and destination, and place of the next instruction
- Instruction word length: variable, fixed and hybrid
- Hybrid length taxonomy: 4-, 3-, 2-, 1- and 0-address formats
- Comparison of hybrid instruction word formats: the minimum number of memory bytes is required by the 1-address (accumulator) format and the maximum by the 4-address format

Concluding the Instruction Set Principles (Cont’d)
- MIPS instruction word format
  - MIPS is a RISC, 64-bit load/store architecture with fixed-length (32-bit) instructions
  - It supports:
    - 8-, 16-, 32- and 64-bit operands
    - R-type, I-type and J-type instruction formats
    - Arithmetic and logic operations
    - Data transfer operations
    - Control flow operations

Concluding the Instruction Set Principles (Cont’d)
- Multimedia and digital signal processing operands
  - Graphics applications deal with 2D and 3D images
  - DSPs add fixed point to the data types: the binary point sits just to the right of the sign bit
- Multimedia and digital signal processing operations
  - All are fixed-width operations, performing multiple narrow operations on either a 64-bit or a 128-bit ALU
  - The narrow operations are B (byte), H (half word) and W (word); 8B denotes eight bytes, i.e. a 64-bit double word
- Multimedia and digital signal processing issues
  - Saturating add/subtract
  - Result rounding
  - Multiply-accumulate

Concluding the Instruction Set Principles (Cont’d)
- ISA performance
  - Role of the compiler: the interaction of compilers and high-level languages significantly affects how programs use an ISA

Allah Hafiz and Assalam-u-Alaikum

Practice Problems
- Quantitative Principles [Lectures 2-3]
- Instruction Set Principles [Lectures 4-5]

Practice Problems: Quantitative Principles [Lectures 2-3]
1: Computer hardware is designed using an ISA with three types of instructions (Type A, B and C). The clock cycles per instruction (CPI) for each type are as follows:
  Type A: 2 CPI
  Type B: 3 CPI
  Type C: 4 CPI
A compiler writer has written two different code sequences, with different instruction counts, to evaluate an expression, as given below.
  Instruction count for each instruction type:
  Code sequence    A    B    C
  1                2    4    1
  2                1    1    4
a) What is the instruction count of each sequence?
b) Which of the sequences is faster?
c) What is the average CPI for each sequence?

Solution to Practice Problem 1
a) The instruction count of each sequence:
  Sequence 1 = 2 + 4 + 1 = 7
  Sequence 2 = 1 + 1 + 4 = 6
  Result: Sequence 2 executes fewer instructions
b) To find which sequence is faster, we have to find the CPU clock cycles for each sequence:
  CPU clock cycles for sequence 1 = 2 x 2 + 3 x 4 + 4 x 1 = 20 cycles
  CPU clock cycles for sequence 2 = 2 x 1 + 3 x 1 + 4 x 4 = 21 cycles
  Result: Sequence 1 is faster
c) To find the CPI (CPU cycles / instruction count) of each sequence:
  CPI for sequence 1 = 20/7 = 2.86
  CPI for sequence 2 = 21/6 = 3.5
  Result: Sequence 2, which has fewer instructions, has the higher CPI and is slower
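
The arithmetic in this solution can be checked with a short script. It is a sketch that takes the per-type CPI values from the problem statement and the per-type instruction counts listed in part (a) of the solution (2, 4, 1 for sequence 1 and 1, 1, 4 for sequence 2); the dictionary layout and the function name `analyze` are assumptions for the example.

```python
# Clock cycles per instruction for each instruction type (from the problem).
CPI_BY_TYPE = {"A": 2, "B": 3, "C": 4}


def analyze(counts):
    """Return (instruction count, CPU clock cycles, average CPI) for one sequence."""
    instructions = sum(counts.values())
    cycles = sum(CPI_BY_TYPE[kind] * n for kind, n in counts.items())
    return instructions, cycles, cycles / instructions


if __name__ == "__main__":
    sequences = {
        1: {"A": 2, "B": 4, "C": 1},   # counts as listed in part (a)
        2: {"A": 1, "B": 1, "C": 4},
    }
    for name, counts in sequences.items():
        instructions, cycles, cpi = analyze(counts)
        print(f"sequence {name}: {instructions} instructions, "
              f"{cycles} cycles, CPI = {cpi:.2f}")
```

Because both sequences run on the same clock, comparing total cycles is enough to decide which one is faster; the instruction count alone is not.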

Practice Problems: Instruction Set Principles [Lectures 4-5]