CDA 3101 Spring 2020 Introduction to Computer Organization

Alternative Architectures, Arithmetic/Logic Operations
04, 06 February 2020

Concept Review
• Variable allocation
  – Stack frames (caller/callee), static, heap
• Pointers
  – Memory addresses
  – Necessary (to pass arguments: arrays, structures)
  – Efficient (pointer arithmetic)
  – Problems
    • Referencing the wrong memory location (segmentation fault)
    • Memory leaks

Overview
• ISA design alternatives
• Design principles (tradeoffs)
• CPU performance equation
• RISC vs. CISC
• Historical perspective
• PowerPC and 80x86

Computer Architectures
• Accumulator
  – Hardware expensive => only one register
  – Accumulator: one of the operands and the result
  – Memory-based operand addressing mode
• Stack
  – No registers (simple compilers, compact encoding)
• Special-purpose registers (e.g., 8086)
• General-purpose registers
  – Register-memory
  – Register-register (load-store)
• HLL computer architecture

Instruction Set Architectures
A = B + C;

Accumulator:
    Load  AddressB
    Add   AddressC
    Store AddressA

Stack:
    Push  AddressC
    Push  AddressB
    Add
    Pop   AddressA

Load-store:
    Load  R1, AddressB
    Load  R2, AddressC
    Add   R3, R1, R2
    Store R3, AddressA

Memory-memory:
    Add   AddressA, AddressB, AddressC

CPUtime = IC * CPI * Cycle_time
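The stack sequence for A = B + C can be traced with a tiny interpreter. This is a minimal sketch rather than any real ISA: the `run_stack` helper, the tuple encoding of instructions, and the sample values for B and C are all assumptions made for illustration.

```python
def run_stack(program, memory):
    """Execute (opcode, operand) pairs on a hypothetical stack machine."""
    stack = []
    for op, arg in program:
        if op == "push":        # push Mem[arg] onto the stack
            stack.append(memory[arg])
        elif op == "add":       # pop two operands, push their sum
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(lhs + rhs)
        elif op == "pop":       # pop the top of the stack into Mem[arg]
            memory[arg] = stack.pop()
        else:
            raise ValueError(f"unknown opcode: {op}")
    return memory

memory = {"A": 0, "B": 4, "C": 9}   # sample values, chosen arbitrarily
program = [("push", "C"), ("push", "B"), ("add", None), ("pop", "A")]
run_stack(program, memory)
print(memory["A"])  # 13
```

Four instructions are executed here, against three for the accumulator form and one for memory-memory, which is the instruction-count (IC) term of the CPU time equation; CPI and cycle time are not modeled.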

ISA Trends
• Hardware and compiler technology trends
  – Hardware / software boundary swings back and forth
[Figure: EDSAC, CISC, RISC, Post-RISC (FISC), EPIC, and Multi-Core placed between "Compiler support" and "Hardware support"]

RISC Architecture
• Reduced Instruction Set Computer
• Design philosophy
  – Load-store
  – Fixed-length instructions
  – Three-address architecture
  – Plenty of registers
  – Simple addressing modes
  – Instruction pipelining
• Many ideas used in modern computers were taken from the CDC 6600 (1963)

PowerPC
• Similar to MIPS: 32 registers, 32-bit instructions, RISC
• Differences (tradeoffs: simplicity vs. common case)
  – Indexed addressing
    • Example: lw $t1, $a0+$s3    # $t1 = Memory[$a0+$s3]
    • MIPS: add $t0, $a0, $s3; lw $t1, 0($t0)
  – Update addressing
    • Update a register as part of a load (for marching through arrays)
    • Example: lwu $t0, 4($s3)    # lw $t0, 4($s3); addi $s3, $s3, 4
  – Unique instructions
    • Load multiple/store multiple: up to 32 words in a single instruction
    • Special counter register
      – bc Loop, $ctr != 0    # decrement counter; if not 0, go to Loop
      – MIPS: addi $t0, $t0, -1; bne $t0, $zero, Loop

80x86 Milestones
• 1978: 8086, 16-bit architecture (64 KB), no GPRs
• 1980: 8087 FP coprocessor, 60+ instructions, 80-bit stack, no GPRs
• 1982: 80286, 24-bit address space, protection model
• 1985: 80386, 32 bits, new addressing modes, 8 GPRs
• 1989-1995: The 80486, Pentium Pro add a few instructions (designed for higher performance)
• 1997: MMX, +57 instructions (SIMD)
• 1999: PIII, +70 multimedia instructions
• 2000: P4, +144 multimedia instructions
Golden handcuffs of upward compatibility => architecture is difficult to explain and impossible to love

x86 Architecture
• Two-address architecture
  – The destination is also one of the sources
        add $s1, $s0    # s0 = s0 + s1 (C: a += b;)
  – Benefit: smaller instructions => smaller code => faster
• Register-memory architecture
  – One operand can be in memory; the other operand is a register
        add 12(%gp), %s0    # s0 = s0 + Mem[12+gp]
  – Benefit: fewer instructions => smaller code
• Variable-length instructions (1 to 17 bytes)
  – Small code size (30% smaller)
  – Better instruction cache hit rates
  – Instructions can include 8- or 32-bit immediates

x86 Features
• Operating modes: real (8088), virtual, and protected
• Four protection levels
• Memory
  – Address space: 16,384 segments (4 GB)
  – Little endian
• Eight 32-bit registers (16-bit 8086 names with an "e" prefix):
  – eax, ecx, edx, ebx, esp, ebp, esi, edi
• Data types
  – Signed/unsigned integers (8, 16, and 32 bits)
  – Binary coded decimal integers
  – Floating point (32 and 64 bits)
• Floating point uses a separate stack

x86 Registers
• eax: main arithmetic register
• ebx: pointers (memory addresses)
• ecx: loops
• edx: multiplication and division
• esi: pointer to source string
• edi: pointer to destination string
• ebp: base of the current stack frame ($fp)
• esp: stack pointer
• Segment registers: support for the 8088 attempt to address 2^20 bytes using 16-bit addresses
• eip: program counter
• eflags: processor state word

x86 Instruction Formats
• Highly complex and irregular
• Six variable-length fields
• Five fields are optional

Examples of x86 Instruction Formats

Integer Instructions
• Control
  – JNZ, JZ
  – JMP
  – CALL
  – RET
  – LOOP
• Data Transfer
  – MOV
  – PUSH, POP
  – LES
• Arithmetic
  – ADD, SUB
  – CMP
  – SHL, SHR, RCR
  – CBW
  – TEST
  – INC, DEC
  – OR, XOR
• String
  – MOVS
  – LODS

Examples of x86 Instructions
• leal (load effective address)
  – Calculates an address like a load, but loads the address itself into the register
  – Load 32-bit address:
        leal -4000000(%ebp), %esi    # esi = ebp – 4000000
• Memory stack is part of the instruction set
  – call label (esp -= 4; M[esp] = eip + 5; eip = label)
  – push places a value onto the stack, decrements esp
  – pop gets a value from the stack, increments esp
• incl, decl (increment, decrement)
        incl %edx    # edx = edx + 1
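The stack behaviour of call, push, and pop can be modeled in a few lines. On x86 the stack grows toward lower addresses, so a push moves esp down by 4 and a pop moves it back up. The `Stack32` class, its sparse dictionary memory, and the sample addresses are hypothetical; the 5-byte instruction length is taken from the `eip+5` in the call description.

```python
class Stack32:
    """Hypothetical model of a 32-bit x86-style stack (grows toward lower addresses)."""

    def __init__(self, esp):
        self.esp = esp
        self.mem = {}                 # sparse memory: address -> 32-bit word

    def push(self, value):
        self.esp -= 4                 # make room first
        self.mem[self.esp] = value & 0xFFFFFFFF

    def pop(self):
        value = self.mem[self.esp]
        self.esp += 4                 # release the slot
        return value

    def call(self, eip, target, length=5):
        self.push(eip + length)       # return address = address of the next instruction
        return target                 # value loaded into eip

s = Stack32(esp=0x1000)
new_eip = s.call(eip=0x400, target=0x800)
print(hex(s.esp), hex(s.mem[s.esp]), hex(new_eip))  # 0xffc 0x405 0x800
```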

Addressing Modes Encoding
• Highly irregular, non-orthogonal addressing modes
• Instruction in 16-bit or 32-bit mode?
• Not all modes apply to all instructions
• Not all registers can be used in all modes

Addressing Modes
• Base reg + offset (like MIPS)
  – movl -8000044(%ebp), %eax
• Base reg + index reg (2 regs form addr.)
  – movl (%eax, %ebx), %edi    # edi = Mem[ebx + eax]
• Scaled reg + index (one register shifted left by 1, 2, or 3 bits, i.e. scaled by 2, 4, or 8)
  – movl (%eax, %edx, 4), %ebx    # ebx = Mem[edx*4 + eax]
• Scaled reg + index + offset
  – movl 12(%eax, %edx, 4), %ebx    # ebx = Mem[edx*4 + eax + 12]
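All four modes are special cases of one address formula, base + index*scale + displacement. The sketch below computes it with 32-bit wraparound; `effective_address` and the sample register values are illustrative names, not part of any toolchain.

```python
def effective_address(base=0, index=0, scale=1, disp=0):
    """Return (base + index*scale + disp) wrapped to 32 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 scale factor must be 1, 2, 4, or 8")
    return (base + index * scale + disp) & 0xFFFFFFFF

eax, edx = 0x1000, 3   # sample register values
# Address used by: movl 12(%eax, %edx, 4), %ebx
print(hex(effective_address(base=eax, index=edx, scale=4, disp=12)))  # 0x1018
```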

Branch Support
• Rather than compare registers, x86 uses special 1-bit registers called “condition codes” that are set as a side effect of ALU operations
  – S - Sign bit
  – Z - Zero (result is all 0)
  – C - Carry out
  – P - Parity: set to 1 if there is an even number of ones in the rightmost 8 bits of the result
• Conditional branch instructions then use the condition flags for all comparisons: <, <=, >, >=, ==, !=
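To make the four flags concrete, the following sketch computes them for an addition. It assumes only the definitions listed above (real x86 sets further flags, such as overflow, that are not modeled); `add_flags` is a hypothetical helper.

```python
def add_flags(a, b, bits=32):
    """Add two bit patterns and return (result, flags) for S, Z, C, and P."""
    mask = (1 << bits) - 1
    total = (a & mask) + (b & mask)
    result = total & mask
    flags = {
        "S": (result >> (bits - 1)) & 1,                     # sign bit of the result
        "Z": int(result == 0),                               # result is all zeros
        "C": int(total > mask),                              # carry out of the top bit
        "P": int(bin(result & 0xFF).count("1") % 2 == 0),    # even parity, low 8 bits
    }
    return result, flags

result, flags = add_flags(0xFFFFFFFF, 1)
print(hex(result), flags)  # 0x0 {'S': 0, 'Z': 1, 'C': 1, 'P': 1}
```

A conditional branch such as JZ would then test only `flags["Z"]`, without re-reading the operands.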

While Loop
    while (save[i] == k)
        i = i + j;

MIPS (i, j, k => $s3, $s4, $s5; base of save in $s6)
    Loop: sll  $t1, $s3, 2
          add  $t1, $t1, $s6
          lw   $t0, 0($t1)
          bne  $t0, $s5, Exit
          add  $s3, $s3, $s4
          j    Loop
    Exit:

x86 (i, j, k => %edx, %esi, %ebx)
           leal -400(%ebp), %eax
    .Loop: cmpl %ebx, (%eax, %edx, 4)
           jne  .Exit
           addl %esi, %edx
           jmp  .Loop
    .Exit:

PIII, P4, and AMD
• PC World magazine, Nov. 20, 2000
  – WorldBench 2000 benchmark (business applications)
  – P4 score @ 1.5 GHz: 164 (higher is better)
  – PIII score @ 1.0 GHz: 167
  – AMD Athlon @ 1.2 GHz: 180
  – (Media applications do better on P4 vs. PIII)
• Why? => CPU performance equation
  – Time = Instruction count x CPI x 1/Clock rate
  – Instruction count is the same for x86
  – Clock rates: P4 > Athlon > PIII
  – How can P4 be slower?
  – Average CPI of P4 must be worse than Athlon, PIII
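The CPI gap implied by these numbers can be estimated from the performance equation. This is a rough sketch under two assumptions: the instruction count is identical across the three processors (as the slide states), and the benchmark score is inversely proportional to execution time (not stated in the source). `relative_cpi` is an illustrative helper.

```python
# Clock rate in GHz and WorldBench 2000 score, as quoted above.
systems = {
    "P4":     {"ghz": 1.5, "score": 164},
    "PIII":   {"ghz": 1.0, "score": 167},
    "Athlon": {"ghz": 1.2, "score": 180},
}

def relative_cpi(name, baseline="PIII"):
    """CPI of `name` relative to `baseline`, assuming time ~ 1/score and equal IC."""
    # Time = IC * CPI / clock_rate  =>  CPI ~ clock_rate * time ~ clock_rate / score
    cpi = systems[name]["ghz"] / systems[name]["score"]
    base = systems[baseline]["ghz"] / systems[baseline]["score"]
    return cpi / base

for name in systems:
    print(name, round(relative_cpi(name), 2))
```

Under those assumptions the P4 needs roughly 1.53 times as many cycles per instruction as the PIII, and the Athlon about 1.11 times as many, which is how a higher clock rate can still produce a lower score.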

Summary
• Instruction complexity is only one variable
  – Lower instruction count vs. higher CPI / lower clock rate
• Design principles:
  – Simplicity favors regularity
  – Smaller is faster
  – Good design demands compromise
  – Make the common case fast
• Instruction set architecture
  – A very important abstraction!

New Topic – Arithmetic/Logic Ops
• Arithmetic and logic unit (ALU)
  – Core of a computer
  – Performs arithmetic and logical operations on data
• Computer arithmetic issues
  – Number representation
    • Integers and floating point
    • Finite precision (overflow / underflow)
  – Algorithms used for the basic operations
• Properties of number representation
  – One zero
  – As many positive numbers as negative numbers
  – Efficient hardware implementation of algorithms
• 2’s complement: invert every bit of the positive number and add one

Review (8-bit representations)

N decimal   (+N) Positive   (-N) Sign/magnitude   (-N) 1’s complement   (-N) 2’s complement
    0         00000000          10000000              11111111              00000000
    1         00000001          10000001              11111110              11111111
    2         00000010          10000010              11111101              11111110
    3         00000011          10000011              11111100              11111101
    4         00000100          10000100              11111011              11111100
    5         00000101          10000101              11111010              11111011
    6         00000110          10000110              11111001              11111010
    7         00000111          10000111              11111000              11111001
    8         00001000          10001000              11110111              11111000
    9         00001001          10001001              11110110              11110111
   10         00001010          10001010              11110101              11110110
   20         00010100          10010100              11101011              11101100
   50         00110010          10110010              11001101              11001110
  100         01100100          11100100              10011011              10011100
  127         01111111          11111111              10000000              10000001
  128         NA                NA                    NA                    10000000
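The three negative encodings in the review table follow from simple bit rules: set the top bit (sign/magnitude), invert every bit (1's complement), or invert and add one (2's complement). A short sketch that generates them; `encodings` is a hypothetical helper and "NA" marks values that do not fit in the given width.

```python
def encodings(n, bits=8):
    """Return (sign/magnitude, 1's complement, 2's complement) bit strings for -n."""
    max_mag = (1 << (bits - 1)) - 1            # 127 for 8 bits
    mask = (1 << bits) - 1
    fmt = f"0{bits}b"
    sign_mag = format((1 << (bits - 1)) | n, fmt) if n <= max_mag else "NA"
    ones = format(~n & mask, fmt) if n <= max_mag else "NA"
    twos = format(-n & mask, fmt) if n <= max_mag + 1 else "NA"   # -128 fits only here
    return sign_mag, ones, twos

for n in (0, 1, 10, 100, 127, 128):
    print(n, *encodings(n))
```

The rows for 0 and 128 show the properties discussed earlier: sign/magnitude and 1's complement each have a second pattern for zero, while 2's complement has a single zero and one extra negative value.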

Overview
[Figure: labels "I-instruction" and "32-bit memory address"]

Addition
• 5_ten + 6_ten

      0000 0000 0101   ( 5_ten)
    + 0000 0000 0110   ( 6_ten)
    = 0000 0000 1011   (11_ten)

    Carry in:  ...  (0)  (1)  (0)  (0)  (0)
    5_ten:     ...   0    0    1    0    1
    6_ten:     ...   0    0    1    1    0
    Sum:       ...   0    1    0    1    1

Subtraction
• 12_ten - 5_ten

      0000 0000 1100   (12_ten)
    - 0000 0000 0101   ( 5_ten)
    = 0000 0000 0111   ( 7_ten)

• 12_ten - 5_ten = 12_ten + (-5_ten)

      0000 0000 1100   (12_ten)
    + 1111 1111 1011   (-5_ten)
    = 0000 0000 0111   ( 7_ten)

Overflow
• Computer arithmetic is not closed w.r.t. + - * /
• Overflow
  – The result cannot be expressed with 32 bits
• Overflow cannot occur
  – Addition: if the operands have different signs
  – Subtraction: if the operands have the same sign
• Overflow detection
  – Result needs 33 bits
  – Addition: a carry out occurs into the sign bit
  – Subtraction: a borrow occurs from the sign bit

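Examples
• 4 bits (instead of 32 in MIPS) => can represent integers in [-8 : 7]

  7 + 6
        0 1 1 1   (  7_ten)
      + 0 1 1 0   (  6_ten)
      = 1 1 0 1   ( 13_ten expected; the 4-bit pattern reads as -3_ten)

  -7 + -6
        1 0 0 1   ( -7_ten)
      + 1 0 1 0   ( -6_ten)
      = 0 0 1 1   (-13_ten expected; the 4-bit pattern reads as 3_ten)

  -7 – 6
        1 0 0 1   ( -7_ten)
      - 0 1 1 0   (  6_ten)
      = 0 0 1 1   (-13_ten expected; the 4-bit pattern reads as 3_ten)

  -7 – 6 = -7 + -6
        1 0 0 1   ( -7_ten)
      + 1 0 1 0   ( -6_ten)
      = 0 0 1 1   (-13_ten expected; the 4-bit pattern reads as 3_ten)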

Overflow Conditions

Operation   Operand A   Operand B   Result indicating overflow
A + B       >= 0        >= 0        < 0
A + B       < 0         < 0         >= 0
A – B       >= 0        < 0         < 0
A – B       < 0         >= 0        >= 0
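The four rows of the overflow table translate directly into sign checks. The sketch below uses a configurable width so the 4-bit examples can be replayed; `to_signed`, `add_overflows`, and `sub_overflows` are hypothetical helpers, and the operands are assumed to already lie within the representable range.

```python
def to_signed(x, bits=32):
    """Interpret the low `bits` bits of x as a two's complement integer."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x

def add_overflows(a, b, bits=32):
    """a + b overflows when the operands share a sign and the result's sign differs."""
    result = to_signed(a + b, bits)
    return (a >= 0) == (b >= 0) and (result >= 0) != (a >= 0)

def sub_overflows(a, b, bits=32):
    """a - b overflows when the operand signs differ and the result's sign differs from a's."""
    result = to_signed(a - b, bits)
    return (a >= 0) != (b >= 0) and (result >= 0) != (a >= 0)

print(add_overflows(7, 6, bits=4))     # True
print(add_overflows(-7, -6, bits=4))   # True
print(sub_overflows(-7, 6, bits=4))    # True
print(add_overflows(7, -6, bits=4))    # False
```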

MIPS Support
• MIPS raises an exception when overflow occurs
  – Exceptions (or interrupts) act like procedure calls
  – Register EPC stores the address of the offending instruction
        mfc0 $t1, $epc    # moves contents of EPC to $t1
  – No conditional branch to test overflow
• Two’s complement arithmetic (add, addi, and sub)
  – Exception on overflow
• Unsigned arithmetic (addu and addiu)
  – No exception on overflow
  – Used for address arithmetic
• Compilers
  – C ignores overflows (always uses addu, addiu, subu)
  – Fortran uses the appropriate instructions

Conditional branch on overflow

Signed addition
    addu $t0, $t1, $t2       # add but do not trap
    xor  $t3, $t1, $t2       # check if signs differ
    slt  $t3, $t3, $0        # $t3 = 1 if signs differ
    bne  $t3, $0, NO_OVFL    # signs of t1, t2 different => no overflow
    xor  $t3, $t0, $t1       # sign of sum (t0) different?
    slt  $t3, $t3, $0        # $t3 = 1 if sum has different sign
    bne  $t3, $0, OVFL       # go to overflow

Unsigned addition (range = [0 : 2^32 – 1] => $t1 + $t2 <= 2^32 – 1)
    addu $t0, $t1, $t2       # $t0 contains the sum
    nor  $t3, $t1, $0        # negate $t1 ($t3 = NOT $t1 = 2^32 – 1 – $t1)
    sltu $t3, $t3, $t2       # 2^32 – 1 – t1 < t2?
    bne  $t3, $0, OVFL       # t1 + t2 > 2^32 – 1 => overflow
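Both checks can be replayed on 32-bit patterns to see why they work. The sketch below mirrors the two instruction sequences step by step; the function names are hypothetical, and Python integers are masked to 32 bits to imitate register width.

```python
MASK32 = 0xFFFFFFFF

def addu_overflows(t1, t2):
    """Mirror of the nor/sltu sequence: overflow iff (2**32 - 1 - t1) < t2."""
    not_t1 = ~t1 & MASK32              # nor: NOT t1 equals 2**32 - 1 - t1
    return not_t1 < t2                 # sltu: unsigned comparison

def signed_add_overflows(t1, t2):
    """Mirror of the xor/slt sequence on 32-bit two's complement patterns."""
    t0 = (t1 + t2) & MASK32                    # addu: wraps, never traps
    signs_differ = ((t1 ^ t2) >> 31) & 1       # xor, then test the sign bit
    if signs_differ:
        return False                           # NO_OVFL
    return bool(((t0 ^ t1) >> 31) & 1)         # sum's sign differs from the operands'

print(addu_overflows(0xFFFFFFFF, 1))         # True
print(addu_overflows(0x7FFFFFFF, 1))         # False
print(signed_add_overflows(0x7FFFFFFF, 1))   # True
print(signed_add_overflows(0xFFFFFFFF, 1))   # False
```

The same pair of inputs (0x7FFFFFFF, 1) overflows as a signed addition but not as an unsigned one, which is why MIPS needs separate checks.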

Registers $k0 and $k1
• Exception handling procedure will use registers
• Procedure calling conventions do not work
• Reserve $k0 and $k1 for the operating system
[Figure: registers and EPC alongside memory; the text segment holds the offending instruction (add $t0, $t1, $t2), followed by data and a stack containing the offending procedure and the exception handling procedure]

Logical Operations
• Operations on fields of bits within a 32-bit word
  – Characters (8 bits)
  – Bit fields (in C)
• Logical operations to pack/unpack bits into words
  – sll          shift left
  – srl          shift right
  – and, andi    bitwise AND
  – or, ori      bitwise OR
• Bitwise operators treat the operand as a vector of bits

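C Bit Fields
    struct {
        unsigned int ready: 1;
        unsigned int enable: 1;
        unsigned int receivedByte: 8;
    } receiver;

    int data = receiver.receivedByte;
    receiver.ready = 0;
    receiver.enable = 1;

    # $s0: data; $s1: receiver
    sll  $s0, $s1, 22         # move receivedByte (bits 9..2) to the top of the word
    srl  $s0, $s0, 24         # shift it down to bits 7..0
    andi $s1, $s1, 0xfffe     # clear ready (bit 0)
    ori  $s1, $s1, 0x0002     # set enable (bit 1)

Layout of $s1: bits 31..10 unused, bits 9..2 receivedByte, bit 1 enable (e), bit 0 ready (r)
    $s1 before:      receivedByte | e | r
    $s1 after andi:  receivedByte | e | 0
    $s1 after ori:   receivedByte | 1 | 0
    $s0:             receivedByte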
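The same unpacking and flag updates can be written with shifts and masks in a high-level language. This sketch assumes the layout described for $s1 (ready in bit 0, enable in bit 1, receivedByte in bits 9..2); the constant and function names are hypothetical, and it models only the low bits of the word.

```python
READY_BIT = 0        # bit 0
ENABLE_BIT = 1       # bit 1
BYTE_SHIFT = 2       # receivedByte occupies bits 9..2

def received_byte(word):
    """Extract bits 9..2, like the sll 22 / srl 24 pair on a 32-bit register."""
    return ((word << 22) & 0xFFFFFFFF) >> 24

def clear_ready(word):
    """Clear bit 0, leaving every other bit untouched."""
    return word & ~(1 << READY_BIT) & 0xFFFFFFFF

def set_enable(word):
    """Set bit 1, leaving every other bit untouched."""
    return word | (1 << ENABLE_BIT)

receiver = (0xAB << BYTE_SHIFT) | 0b01    # receivedByte = 0xAB, enable = 0, ready = 1
data = received_byte(receiver)
receiver = set_enable(clear_ready(receiver))
print(hex(data), bin(receiver & 0b11))  # 0xab 0b10
```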

Conclusions
• ISA supports architectural development
• Hardware/Software, RISC/CISC emphasis
• Technology driven
• ALU = core of computer
• ALU problem = overflow
• Exception handling
• Think: Weekend!