Computer Architecture I: Outline and Instruction Set Architecture


CENG 331 - Computer Organization
Instructors: Murat Manguoglu (Section 1), Erol Sahin (Sections 2 & 3)

Course Outline

Background
  • Instruction sets
  • Logic design
Sequential Implementation
  • A simple, but not very fast processor design
Pipelining
  • Get more things running simultaneously
Pipelined Implementation
  • Make it work
Advanced Topics
  • Performance analysis
  • High performance processor design

CS:APP3e

Coverage

Our Approach
  • Work through designs for particular instruction set
    - Y86-64: a simplified version of the Intel x86-64
    - If you know one, you more-or-less know them all
  • Work at "microarchitectural" level
    - Assemble basic hardware blocks into overall processor structure (memories, functional units, etc.)
    - Surround by control logic to make sure each instruction flows through properly
  • Use simple hardware description language to describe control logic
    - Can extend and modify
    - Test via simulation
    - Route to design using Verilog Hardware Description Language (see Web aside ARCH:VLOG)

Schedule

Lecture #1
  • Instruction set architecture
  • Logic design
Lecture #2
  • Sequential implementation
  • Pipelining and initial pipelined implementation
  • Assignment: Add new instructions to sequential implementation
Lecture #3
  • Making the pipeline work
  • Modern processor design

Instruction Set Architecture

Assembly Language View
  • Processor state
    - Registers, memory, …
  • Instructions
    - addq, pushq, ret, …
    - How instructions are encoded as bytes
Layer of Abstraction
  • Above: how to program machine
    - Processor executes instructions in a sequence
  • Below: what needs to be built
    - Use variety of tricks to make it run fast
    - E.g., execute multiple instructions simultaneously

[Figure: abstraction layers, top to bottom: Application Program; Compiler, OS; ISA; CPU Design; Circuit Design; Chip Layout]

Y86-64 Processor State

[Figure: processor state blocks]
  RF: Program registers   %rax %rcx %rdx %rbx %rsp %rbp %rsi %rdi %r8 %r9 %r10 %r11 %r12 %r13 %r14
  CC: Condition codes     ZF SF OF
  Stat: Program status
  PC
  DMEM: Memory

  • Program Registers
    - 15 registers (omit %r15). Each 64 bits
  • Condition Codes
    - Single-bit flags set by arithmetic or logical instructions
    - ZF: Zero, SF: Negative, OF: Overflow
  • Program Counter
    - Indicates address of next instruction
  • Program Status
    - Indicates either normal operation or some error condition
  • Memory
    - Byte-addressable storage array
    - Words stored in little-endian byte order

Y86-64 Instruction Set #1

Each digit below is one 4-bit field. Instructions occupy bytes 0 through 9 at most.

Instruction           Byte 0   Following bytes
halt                  0 0
nop                   1 0
cmovXX rA, rB         2 fn     rA rB
irmovq V, rB          3 0      F rB, then V (bytes 2-9)
rmmovq rA, D(rB)      4 0      rA rB, then D (bytes 2-9)
mrmovq D(rB), rA      5 0      rA rB, then D (bytes 2-9)
OPq rA, rB            6 fn     rA rB
jXX Dest              7 fn     Dest (bytes 1-8)
call Dest             8 0      Dest (bytes 1-8)
ret                   9 0
pushq rA              A 0      rA F
popq rA               B 0      rA F

Y86-64 Instructions

Format
  • 1–10 bytes of information read from memory
    - Can determine instruction length from first byte
    - Not as many instruction types, and simpler encoding than with x86-64
  • Each accesses and modifies some part(s) of the program state
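The claim that the instruction length follows from the first byte can be sketched in C. This is only an illustration built from the encoding table in these slides; the function name y86_instr_length is hypothetical, and only the instruction code (high 4 bits) is inspected, not the function code.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the total length in bytes of the Y86-64 instruction whose first
 * byte is byte0, or 0 if the instruction code (high 4 bits) is not one of
 * the twelve codes in the encoding table. Hypothetical helper name. */
size_t y86_instr_length(uint8_t byte0)
{
    switch (byte0 >> 4) {
    case 0x0: /* halt */
    case 0x1: /* nop  */
    case 0x9: /* ret  */
        return 1;   /* instruction byte only */
    case 0x2: /* cmovXX */
    case 0x6: /* OPq    */
    case 0xA: /* pushq  */
    case 0xB: /* popq   */
        return 2;   /* instruction byte + register byte */
    case 0x7: /* jXX  */
    case 0x8: /* call */
        return 9;   /* instruction byte + 8-byte destination */
    case 0x3: /* irmovq */
    case 0x4: /* rmmovq */
    case 0x5: /* mrmovq */
        return 10;  /* instruction byte + register byte + 8-byte constant */
    default:
        return 0;   /* not a valid instruction code */
    }
}
```

For example, a first byte of 0x30 (irmovq) gives 10 and 0x73 (je) gives 9, matching the address steps in the assembler listing later in the deck.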

Y86-64 Instruction Set #2

Same encoding table as Instruction Set #1, with the function codes for cmovXX rA, rB (byte 0 = 2 fn) expanded:

Instruction   Byte 0
rrmovq        2 0
cmovle        2 1
cmovl         2 2
cmove         2 3
cmovne        2 4
cmovge        2 5
cmovg         2 6

Y86-64 Instruction Set #3

Same encoding table as Instruction Set #1, with the function codes for OPq rA, rB (byte 0 = 6 fn) expanded:

Instruction   Byte 0
addq          6 0
subq          6 1
andq          6 2
xorq          6 3

Y86-64 Instruction Set #4

Same encoding table as Instruction Set #1, with the function codes for jXX Dest (byte 0 = 7 fn) expanded:

Instruction   Byte 0
jmp           7 0
jle           7 1
jl            7 2
je            7 3
jne           7 4
jge           7 5
jg            7 6

Encoding Registers

Each register has 4-bit ID

Register   ID      Register      ID
%rax       0       %r8           8
%rcx       1       %r9           9
%rdx       2       %r10          A
%rbx       3       %r11          B
%rsp       4       %r12          C
%rbp       5       %r13          D
%rsi       6       %r14          E
%rdi       7       No Register   F

  • Same encoding as in x86-64
  • Register ID 15 (0xF) indicates "no register"
    - Will use this in our hardware design in multiple places

Instruction Example

Addition Instruction

Generic Form      Encoded Representation
addq rA, rB       6 0 rA rB

  • Add value in register rA to that in register rB
    - Store result in register rB
    - Note that Y86-64 only allows addition to be applied to register data
  • Set condition codes based on result
  • e.g., addq %rax, %rsi   Encoding: 60 06
  • Two-byte encoding
    - First indicates instruction type
    - Second gives source and destination registers
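The two-byte OPq encoding can be reproduced with a few shifts and masks. The sketch below is illustrative only: the enum and function names are made up for this example, while the numeric values come from the register-ID and function-code tables in these slides.

```c
#include <stdint.h>

/* Register IDs used in this example (values from the Encoding Registers slide). */
enum y86_reg_id { Y86_RAX = 0x0, Y86_RCX = 0x1, Y86_RDX = 0x2, Y86_RBX = 0x3,
                  Y86_RSP = 0x4, Y86_RBP = 0x5, Y86_RSI = 0x6, Y86_RDI = 0x7 };

/* OPq function codes (low 4 bits of the first byte). */
enum y86_alu_fn { Y86_FN_ADD = 0, Y86_FN_SUB = 1, Y86_FN_AND = 2, Y86_FN_XOR = 3 };

/* Encode "OPq rA, rB" into out[0..1].
 * Byte 0: instruction code 6 in the high 4 bits, function code in the low 4.
 * Byte 1: rA in the high 4 bits, rB in the low 4. */
void y86_encode_opq(enum y86_alu_fn fn, enum y86_reg_id rA, enum y86_reg_id rB,
                    uint8_t out[2])
{
    out[0] = (uint8_t)(0x60 | ((unsigned)fn & 0xF));
    out[1] = (uint8_t)((((unsigned)rA & 0xF) << 4) | ((unsigned)rB & 0xF));
}
```

Calling y86_encode_opq(Y86_FN_ADD, Y86_RAX, Y86_RSI, buf) fills buf with 60 06, the encoding given on the slide for addq %rax, %rsi.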

Arithmetic and Logical Operations

Operation                   Instruction     Encoding (instruction code, function code, registers)
Add                         addq rA, rB     6 0 rA rB
Subtract (rA from rB)       subq rA, rB     6 1 rA rB
And                         andq rA, rB     6 2 rA rB
Exclusive-Or                xorq rA, rB     6 3 rA rB

  • Refer to generically as "OPq"
  • Encodings differ only by "function code"
    - Low-order 4 bits of the first instruction byte
  • Set condition codes as side effect

Move Operations

Move                     Instruction           Encoding
Register to Register     rrmovq rA, rB         2 0 rA rB
Immediate to Register    irmovq V, rB          3 0 F rB V
Register to Memory       rmmovq rA, D(rB)      4 0 rA rB D
Memory to Register       mrmovq D(rB), rA      5 0 rA rB D

  • Like the x86-64 movq instruction
  • Simpler format for memory addresses
  • Give different names to keep them distinct

Move Instruction Examples

x86-64                       Y86-64                         Encoding
movq $0xabcd, %rdx           irmovq $0xabcd, %rdx           30 f2 cd ab 00 00 00 00 00 00
movq %rsp, %rbx              rrmovq %rsp, %rbx              20 43
movq -12(%rbp), %rcx         mrmovq -12(%rbp), %rcx         50 15 f4 ff ff ff ff ff ff ff
movq %rsi, 0x41c(%rsp)       rmmovq %rsi, 0x41c(%rsp)       40 64 1c 04 00 00 00 00 00 00

Constants and displacements take 8 bytes, stored in little-endian order.
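The 10-byte move encodings follow mechanically from the instruction table and the little-endian rule. The C sketch below shows one way to build them; the function names are hypothetical, and register IDs are passed as plain numbers taken from the register table (for example 2 for %rdx).

```c
#include <stdint.h>

/* Store a 64-bit value at buf[0..7], least significant byte first. */
static void put_le64(uint8_t *buf, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        buf[i] = (uint8_t)(v >> (8 * i));
}

/* Encode "irmovq V, rB": 3 0, then F rB, then the 8-byte constant V. */
void y86_encode_irmovq(uint64_t v, unsigned rB, uint8_t out[10])
{
    out[0] = 0x30;
    out[1] = (uint8_t)(0xF0 | (rB & 0xF));  /* F marks "no register" for rA */
    put_le64(out + 2, v);
}

/* Encode rmmovq (byte0 = 0x40) or mrmovq (byte0 = 0x50):
 * byte0, then rA rB, then the 8-byte displacement D. */
void y86_encode_memmove(uint8_t byte0, unsigned rA, unsigned rB, int64_t d,
                        uint8_t out[10])
{
    out[0] = byte0;
    out[1] = (uint8_t)(((rA & 0xF) << 4) | (rB & 0xF));
    put_le64(out + 2, (uint64_t)d);  /* two's complement for negative D */
}
```

With these helpers, irmovq $0xabcd, %rdx comes out as 30 f2 cd ab followed by six zero bytes, and mrmovq -12(%rbp), %rcx as 50 15 f4 followed by seven ff bytes.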

Conditional Move Instructions

Move                           Instruction        Encoding
Move Unconditionally           rrmovq rA, rB      2 0 rA rB
Move When Less or Equal        cmovle rA, rB      2 1 rA rB
Move When Less                 cmovl rA, rB       2 2 rA rB
Move When Equal                cmove rA, rB       2 3 rA rB
Move When Not Equal            cmovne rA, rB      2 4 rA rB
Move When Greater or Equal     cmovge rA, rB      2 5 rA rB
Move When Greater              cmovg rA, rB       2 6 rA rB

  • Refer to generically as "cmovXX"
  • Encodings differ only by "function code"
  • Based on values of condition codes
  • Variants of rrmovq instruction
    - (Conditionally) copy value from source to destination register

Jump Instructions

Jump (Conditionally)
jXX Dest      7 fn Dest

  • Refer to generically as "jXX"
  • Encodings differ only by "function code" fn
  • Based on values of condition codes
  • Same as x86-64 counterparts
  • Encode full destination address
    - Unlike PC-relative addressing seen in x86-64

Jump Instructions

Jump                           Instruction     Encoding
Jump Unconditionally           jmp Dest        7 0 Dest
Jump When Less or Equal        jle Dest        7 1 Dest
Jump When Less                 jl Dest         7 2 Dest
Jump When Equal                je Dest         7 3 Dest
Jump When Not Equal            jne Dest        7 4 Dest
Jump When Greater or Equal     jge Dest        7 5 Dest
Jump When Greater              jg Dest         7 6 Dest
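The jump and conditional-move tables share function codes 0 through 6. The slides name the conditions but do not spell out the flag tests, so the C sketch below assumes the usual signed-comparison definitions (for example, "less" means SF differs from OF). The function name y86_cond is hypothetical.

```c
#include <stdbool.h>

/* Decide whether a jXX or cmovXX with function code fn should take effect,
 * given the condition codes ZF, SF and OF.
 * Assumption: signed-comparison semantics, as for the x86-64 counterparts. */
bool y86_cond(unsigned fn, bool zf, bool sf, bool of)
{
    bool less = (sf != of);         /* result negative, corrected for overflow */

    switch (fn) {
    case 0: return true;            /* jmp / rrmovq: unconditional */
    case 1: return less || zf;      /* le */
    case 2: return less;            /* l  */
    case 3: return zf;              /* e  */
    case 4: return !zf;             /* ne */
    case 5: return !less;           /* ge */
    case 6: return !less && !zf;    /* g  */
    default: return false;          /* not a defined function code */
    }
}
```

In the len example later in the deck, andq %rdx, %rdx sets ZF when val is zero, so je (fn = 3) is taken exactly when y86_cond(3, true, sf, of) holds.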

Y86-64 Program Stack

  • Region of memory holding program data
  • Used in Y86-64 (and x86-64) for supporting procedure calls
  • Stack top indicated by %rsp
    - Address of top stack element
  • Stack grows toward lower addresses
    - Top element is at lowest address in the stack
    - When pushing, must first decrement stack pointer
    - After popping, increment stack pointer

[Figure: stack diagram labeled Stack "Bottom", Increasing Addresses, %rsp, Stack "Top"]

Stack Operations

pushq rA      A 0 rA F
  • Decrement %rsp by 8
  • Store word from rA to memory at %rsp
  • Like x86-64

popq rA       B 0 rA F
  • Read word from memory at %rsp
  • Save in rA
  • Increment %rsp by 8
  • Like x86-64
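The order of the steps matters: pushq decrements before storing, popq reads before incrementing. A minimal C model of this follows. The struct layout, memory size, and function names are assumptions for illustration only; there is no bounds checking, so the caller must keep %rsp inside the memory array.

```c
#include <stdint.h>

#define Y86_MEM_SIZE 0x200   /* arbitrary size chosen for this sketch */

/* Just enough processor state to model the stack. */
struct y86_stack_state {
    uint64_t rsp;
    uint8_t  mem[Y86_MEM_SIZE];   /* byte-addressable memory */
};

/* Write a 64-bit word at addr in little-endian byte order. */
static void store64(uint8_t *mem, uint64_t addr, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        mem[addr + i] = (uint8_t)(v >> (8 * i));
}

/* Read a 64-bit little-endian word from addr. */
static uint64_t load64(const uint8_t *mem, uint64_t addr)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)mem[addr + i] << (8 * i);
    return v;
}

/* pushq: decrement %rsp by 8, then store the word at the new %rsp. */
void y86_pushq(struct y86_stack_state *s, uint64_t val)
{
    s->rsp -= 8;
    store64(s->mem, s->rsp, val);
}

/* popq: read the word at %rsp, then increment %rsp by 8. */
uint64_t y86_popq(struct y86_stack_state *s)
{
    uint64_t val = load64(s->mem, s->rsp);
    s->rsp += 8;
    return val;
}
```

Starting from %rsp = 0x100, pushing 0x13 and then 0x53 leaves them at addresses 0xf8 and 0xf0. call and ret use the same two steps, with the return address as the pushed value.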

Subroutine Call and Return

call Dest     8 0 Dest
  • Push address of next instruction onto stack
  • Start executing instructions at Dest
  • Like x86-64

ret           9 0
  • Pop value from stack
  • Use as address for next instruction
  • Like x86-64

Miscellaneous Instructions

nop           1 0
  • Don't do anything

halt          0 0
  • Stop executing instructions
  • x86-64 has comparable instruction, but can't execute it in user mode
  • We will use it to stop the simulator
  • Encoding ensures that program hitting memory initialized to zero will halt

Status Conditions

Mnemonic   Code   Meaning
AOK        1      Normal operation
HLT        2      Halt instruction encountered
ADR        3      Bad address (either instruction or data) encountered
INS        4      Invalid instruction encountered

Desired Behavior
  • If AOK, keep going
  • Otherwise, stop program execution

Writing Y86-64 Code

Try to Use C Compiler as Much as Possible
  • Write code in C
  • Compile for x86-64 with gcc -Og -S
  • Transliterate into Y86-64
  • Modern compilers make this more difficult

Coding Example
  • Find number of elements in null-terminated list

        int len1(int a[]);

  • Example: for a = {5043, 6125, 7395, 0} the result is 3

Y86-64 Code Generation Example

First Try
  • Write typical array code

        /* Find number of elements in null-terminated list */
        long len(long a[])
        {
            long len;
            for (len = 0; a[len]; len++)
                ;
            return len;
        }

  • Compile with gcc -Og -S

Problem
  • Hard to do array indexing on Y86-64
    - Since don't have scaled addressing modes

        L3:
            addq $1, %rax
            cmpq $0, (%rdi,%rax,8)
            jne L3

Y86-64 Code Generation Example #2

Second Try
  • Write C code that mimics expected Y86-64 code

        long len2(long *a)
        {
            long ip = (long) a;
            long val = *(long *) ip;
            long len = 0;
            while (val) {
                ip += sizeof(long);
                len++;
                val = *(long *) ip;
            }
            return len;
        }

Result
  • Compiler generates exact same code as before!
  • Compiler converts both versions into same intermediate form

Y86-64 Code Generation Example #3

    len:
        irmovq $1, %r8          # Constant 1
        irmovq $8, %r9          # Constant 8
        irmovq $0, %rax         # len = 0
        mrmovq (%rdi), %rdx     # val = *a
        andq %rdx, %rdx         # Test val
        je Done                 # If zero, goto Done
    Loop:
        addq %r8, %rax          # len++
        addq %r9, %rdi          # a++
        mrmovq (%rdi), %rdx     # val = *a
        andq %rdx, %rdx         # Test val
        jne Loop                # If !0, goto Loop
    Done:
        ret

Register   Use
%rdi       a
%rax       len
%rdx       val
%r8        1
%r9        8

Y86-64 Sample Program Structure #1

    init:                   # Initialization
        . . .
        call Main
        halt

        .align 8            # Program data
    array:
        . . .

    Main:                   # Main function
        . . .
        call len
        . . .

    len:                    # Length function
        . . .

        .pos 0x100          # Placement of stack
    Stack:

  • Program starts at address 0
  • Must set up stack
    - Where located
    - Pointer values
    - Make sure don't overwrite code!
  • Must initialize data

Y86-64 Program Structure #2

    init:
        # Set up stack pointer
        irmovq Stack, %rsp
        # Execute main program
        call Main
        # Terminate
        halt

    # Array of 4 elements + terminating 0
        .align 8
    array:
        .quad 0x000d
        .quad 0x00c000c0
        .quad 0x0b000b00
        .quad 0xa000a000
        .quad 0

  • Program starts at address 0
  • Must set up stack
  • Must initialize data
  • Can use symbolic names

Y86-64 Program Structure #3

    Main:
        irmovq array, %rdi
        # call len(array)
        call len
        ret

Set up call to len
  • Follow x86-64 procedure conventions
  • Pass array address as argument in %rdi

Assembling Y86-64 Program

    unix> yas len.ys

  • Generates "object code" file len.yo
    - Actually looks like disassembler output

                                | len:
    0x054: 30f80100000000000000 |   irmovq $1, %r8          # Constant 1
    0x05e: 30f90800000000000000 |   irmovq $8, %r9          # Constant 8
    0x068: 30f00000000000000000 |   irmovq $0, %rax         # len = 0
    0x072: 50270000000000000000 |   mrmovq (%rdi), %rdx     # val = *a
    0x07c: 6222                 |   andq %rdx, %rdx         # Test val
    0x07e: 73a000000000000000   |   je Done                 # If zero, goto Done
                                | Loop:
    0x087: 6080                 |   addq %r8, %rax          # len++
    0x089: 6097                 |   addq %r9, %rdi          # a++
    0x08b: 50270000000000000000 |   mrmovq (%rdi), %rdx     # val = *a
    0x095: 6222                 |   andq %rdx, %rdx         # Test val
    0x097: 748700000000000000   |   jne Loop                # If !0, goto Loop
                                | Done:
    0x0a0: 90                   |   ret

Simulating Y86-64 Program

    unix> yis len.yo

  • Instruction set simulator
    - Computes effect of each instruction on processor state
    - Prints changes in state from original

    Stopped in 33 steps at PC = 0x13.  Status 'HLT', CC Z=1 S=0 O=0
    Changes to registers:
    %rax:   0x0000000000000000      0x0000000000000004
    %rsp:   0x0000000000000000      0x0000000000000100
    %rdi:   0x0000000000000000      0x0000000000000038
    %r8:    0x0000000000000000      0x0000000000000001
    %r9:    0x0000000000000000      0x0000000000000008

    Changes to memory:
    0x00f0: 0x0000000000000000      0x0000000000000053
    0x00f8: 0x0000000000000000      0x0000000000000013

CISC Instruction Sets

  • Complex Instruction Set Computer
  • IA32 is example

Stack-oriented instruction set
  • Use stack to pass arguments, save program counter
  • Explicit push and pop instructions

Arithmetic instructions can access memory
  • addq %rax, 12(%rbx,%rcx,8)
    - Requires memory read and write
    - Complex address calculation

Condition codes
  • Set as side effect of arithmetic and logical instructions

Philosophy
  • Add instructions to perform "typical" programming tasks

RISC Instruction Sets

  • Reduced Instruction Set Computer
  • Internal project at IBM, later popularized by Hennessy (Stanford) and Patterson (Berkeley)

Fewer, simpler instructions
  • Might take more to get given task done
  • Can execute them with small and fast hardware

Register-oriented instruction set
  • Many more (typically 32) registers
  • Use for arguments, return pointer, temporaries

Only load and store instructions can access memory
  • Similar to Y86-64 mrmovq and rmmovq

No condition codes
  • Test instructions return 0/1 in register

MIPS Registers

[Figure: table of MIPS registers; contents not recoverable from the extracted text]

MIPS Instruction Examples

R-R            Op | Ra | Rb | Rd | 00000 | Fn
    addu $3,$2,$1       # Register add: $3 = $2+$1

R-I            Op | Ra | Rb | Immediate
    addu $3,$2,3145     # Immediate add: $3 = $2+3145
    sll $3,$2,2         # Shift left: $3 = $2 << 2

Branch         Op | Ra | Rb | Offset
    beq $3,$2,dest      # Branch when $3 = $2

Load/Store     Op | Ra | Rb | Offset
    lw $3,16($2)        # Load Word: $3 = M[$2+16]
    sw $3,16($2)        # Store Word: M[$2+16] = $3
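The R-R layout (Op, Ra, Rb, Rd, 00000, Fn) packs into one 32-bit word. The C sketch below assumes the standard MIPS32 field widths of 6, 5, 5, 5, 5 and 6 bits, an Op of 0 for register-register operations, and function code 0x21 for addu; none of these numbers appear on the slide, and the function name is hypothetical.

```c
#include <stdint.h>

/* Pack a MIPS R-type instruction word.
 * Fields from high to low bits (assumed standard MIPS32 widths):
 *   op (6, zero for R-type) | rs (5) | rt (5) | rd (5) | shamt (5) | funct (6)
 * rs and rt are the two source registers (Ra, Rb on the slide),
 * rd is the destination register. */
uint32_t mips_encode_rtype(unsigned rs, unsigned rt, unsigned rd,
                           unsigned shamt, unsigned funct)
{
    return ((uint32_t)(rs & 0x1F) << 21) |
           ((uint32_t)(rt & 0x1F) << 16) |
           ((uint32_t)(rd & 0x1F) << 11) |
           ((uint32_t)(shamt & 0x1F) << 6) |
           (uint32_t)(funct & 0x3F);
}
```

Under these assumptions, addu $3,$2,$1 is mips_encode_rtype(2, 1, 3, 0, 0x21), which gives the word 0x00411821. Every instruction is a fixed 4 bytes, in contrast to the 1 to 10 bytes of Y86-64.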

CISC vs. RISC

Original Debate
  • Strong opinions!
  • CISC proponents: easy for compiler, fewer code bytes
  • RISC proponents: better for optimizing compilers, can make run fast with simple chip design

Current Status
  • For desktop processors, choice of ISA not a technical issue
    - With enough hardware, can make anything run fast
    - Code compatibility more important
  • x86-64 adopted many RISC features
    - More registers; use them for argument passing
  • For embedded processors, RISC makes sense
    - Smaller, cheaper, less power
    - Most cell phones use ARM processor

Summary

Y86-64 Instruction Set Architecture
  • Similar state and instructions as x86-64
  • Simpler encodings
  • Somewhere between CISC and RISC

How Important is ISA Design?
  • Less now than before
    - With enough hardware, can make almost anything go fast