
Machine-Level Programming: Introduction

Today
- Assembly programmer's execution model
- Accessing information
- Arithmetic operations

Next time
- More of the same

Fabián E. Bustamante, Spring 2007

IA32 Processors
- Totally dominate the computer market
- Evolutionary design
  - Starting in 1978 with the 8086
  - Added more features as time went on
  - Backward compatibility: able to run code written for earlier versions
- Complex Instruction Set Computer (CISC)
  - Many different instructions with many different formats
    - But only a small subset is encountered with Linux programs
  - Hard to match the performance of Reduced Instruction Set Computers (RISC)
  - But Intel has done just that!
- x86 evolution clones: Advanced Micro Devices (AMD)
  - Historically followed just behind Intel: a little bit slower, a lot cheaper

EECS 213 Introduction to Computer Systems, Northwestern University

x86 Evolution: Programmer's view (name, date, transistor count, comments)
- 8086 (1978, 29K): 16-bit processor, basis for IBM PC & DOS; limited to 1 MB address space
- 80286 (1982, 134K): Added elaborate, but not very useful, addressing scheme; basis for IBM PC AT and Windows
- 386 (1985, 275K): Extended to 32 bits, added "flat addressing"; capable of running Unix; Linux/gcc uses it
- 486 (1989, 1.9M): Improved performance; integrated FP unit into chip
- Pentium (1993, 3.1M): Improved performance
- PentiumPro (1995, 6.5M): Added conditional move instructions; big change in underlying microarchitecture (called P6 internally)
- Pentium/MMX (1997, 4.5M): Added special set of instructions for 64-bit vectors of 1, 2, or 4 byte integer data
- Pentium II (1997, 7M): Merged Pentium/MMX and PentiumPro, implementing MMX instructions within P6
- Pentium III (1999, 8.2M): Instructions for manipulating vectors of integers or floating point; later versions included Level 2 cache
- Pentium 4 (2001, 42M): Added 8-byte ints and floating point formats to vector instructions

Assembly programmer's view

[Figure: the CPU (%eip, registers, condition codes) and memory (object code, program data, OS data, stack), connected by addresses, data, and instructions]

Programmer-Visible State
- %eip: Program Counter
  - Address of next instruction
- Register file (8 x 32 bit)
  - Heavily used program data
- Condition codes
  - Store status information about most recent arithmetic operation
  - Used for conditional branching
- Floating point register file

Memory
- Byte addressable array
- Code, user data, (some) OS data
- Includes stack used to support procedures
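
Because memory is just a byte-addressable array, any object can be read back one byte at a time. A minimal C sketch, assuming a little-endian machine such as IA32 (least significant byte at the lowest address); the helper names `byte_at` and `reassemble_le` are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Treat any object as a plain array of bytes and return byte i. */
unsigned char byte_at(const void *p, size_t i)
{
    const unsigned char *bytes = (const unsigned char *)p;
    return bytes[i];
}

/* Rebuild a 32-bit value from the four bytes starting at p,
   assuming little-endian order, as on IA32. */
uint32_t reassemble_le(const void *p)
{
    uint32_t v = 0;
    for (size_t i = 0; i < 4; i++)
        v |= (uint32_t)byte_at(p, i) << (8 * i);
    return v;
}
```

On a little-endian host, the int 0x01234567 stored at address a occupies a: 0x67, a+1: 0x45, a+2: 0x23, a+3: 0x01.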

Turning C into object code
- Code in files p1.c p2.c
- Compile with command: gcc -O2 p1.c p2.c -o p
  - Use level 2 optimizations (-O2); put resulting binary in file p

Compilation pipeline:
- C program (p1.c p2.c, text) -> Compiler (gcc -S) -> Asm program (p1.s p2.s, text)
- Asm program -> Assembler (gcc or as) -> Object program (p1.o p2.o, binary)
- Object program + static libraries (.a) -> Linker (gcc or ld) -> Executable program (p, binary)

Compiling into assembly

C code:

    int sum(int x, int y)
    {
        int t = x+y;
        return t;
    }

Generated assembly:

    sum:
        pushl %ebp
        movl %esp,%ebp
        movl 12(%ebp),%eax
        addl 8(%ebp),%eax
        movl %ebp,%esp
        popl %ebp
        ret

- Obtain with command: gcc -O2 -S code.c
- Produces assembly code in GAS (GNU Assembler) format in file code.s

Assembly characteristics
- gcc default target architecture: i386 (flat addressing)
- Minimal data types
  - "Integer" data of 1, 2, or 4 bytes
    - Data values or addresses
  - Floating point data of 4, 8, or 10 bytes
  - No aggregate types such as arrays or structures
    - Just contiguously allocated bytes in memory
- Primitive operations
  - Perform arithmetic function on register or memory data
  - Transfer data between memory and register
    - Load data from memory into register
    - Store register data into memory
  - Transfer control
    - Unconditional jumps to/from procedures
    - Conditional branches
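
Since the machine has no aggregate types, a C struct or array becomes a block of bytes accessed as "base address + offset". A minimal sketch of that view, using a hypothetical `struct rec` and helper names chosen only for illustration:

```c
#include <stddef.h>

/* Hypothetical record type, for illustration only. */
struct rec {
    int a;
    int b;
    char c;
};

/* Byte offset of each field from the start of the struct: at the
   machine level a field access is base address + constant offset. */
size_t offset_a(void) { return offsetof(struct rec, a); }
size_t offset_b(void) { return offsetof(struct rec, b); }
size_t offset_c(void) { return offsetof(struct rec, c); }

/* Address of element i of an int array computed by hand as
   base + i * sizeof(int), the arithmetic the hardware performs. */
int *element_addr(int *base, size_t i)
{
    return (int *)((char *)base + i * sizeof(int));
}
```

The compiler chooses the offsets; the generated code simply adds them to a base address held in a register.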

Object code

Code for sum:

    0x401040 <sum>:
        0x55 0x89 0xe5 0x8b 0x45 0x0c 0x03 0x45 0x08 0x89 0xec 0x5d 0xc3

- Total of 13 bytes
- Each instruction 1, 2, or 3 bytes
- Starts at address 0x401040

Assembler
- Translates .s into .o
- Binary encoding of each instruction
- Nearly-complete image of executable code
- Missing linkages between code in different files

Linker
- Resolves references between files
  - One of the object files must contain function main()
- Combines with static run-time libraries
  - E.g., code for malloc, printf
- Some libraries are dynamically linked
  - Linking occurs when program begins execution

Machine instruction example

C code: int t = x+y;
- Add two signed integers

Assembly: addl 8(%ebp),%eax
- Add two 4-byte integers
  - "Long" words in GCC parlance
  - Same instruction whether signed or unsigned
- Operands:
  - x: Register %eax
  - y: Memory M[%ebp+8]
  - t: Register %eax
    - Return function value in %eax
- Similar to C expression x += y

Object code: 0x401046: 03 45 08
- 3-byte instruction
- Stored at address 0x401046
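
One addl serves both signed and unsigned data because two's-complement addition produces the same bit pattern either way. A minimal C sketch of that claim (helper names are hypothetical); converting an out-of-range unsigned value back to int32_t is implementation-defined in C, and gcc and clang keep the two's-complement bits:

```c
#include <stdint.h>

/* Unsigned 32-bit addition: wraps modulo 2^32, like addl. */
uint32_t add_unsigned(uint32_t x, uint32_t y)
{
    return x + y;
}

/* Signed addition done by reusing the unsigned adder on the bit
   patterns, mirroring how one instruction handles both cases. */
int32_t add_signed_via_bits(int32_t x, int32_t y)
{
    return (int32_t)add_unsigned((uint32_t)x, (uint32_t)y);
}
```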

Disassembling object code

Disassembled:

    00401040 <_sum>:
       0:  55          push   %ebp
       1:  89 e5       mov    %esp,%ebp
       3:  8b 45 0c    mov    0xc(%ebp),%eax
       6:  03 45 08    add    0x8(%ebp),%eax
       9:  89 ec       mov    %ebp,%esp
       b:  5d          pop    %ebp
       c:  c3          ret
       d:  8d 76 00    lea    0x0(%esi),%esi

(The lea at offset d follows ret and is never reached; it is a 3-byte no-op used as padding.)

Disassembler
- objdump -d p
- Useful tool for examining object code
- Analyzes bit pattern of series of instructions
- Produces approximate rendition of assembly code
- Can be run on either a.out (complete executable) or .o file

Alternate disassembly

Object:

    0x401040:
        0x55 0x89 0xe5 0x8b 0x45 0x0c 0x03 0x45 0x08 0x89 0xec 0x5d 0xc3

Disassembled:

    0x401040 <sum>:     push   %ebp
    0x401041 <sum+1>:   mov    %esp,%ebp
    0x401043 <sum+3>:   mov    0xc(%ebp),%eax
    0x401046 <sum+6>:   add    0x8(%ebp),%eax
    0x401049 <sum+9>:   mov    %ebp,%esp
    0x40104b <sum+11>:  pop    %ebp
    0x40104c <sum+12>:  ret
    0x40104d <sum+13>:  lea    0x0(%esi),%esi

Within gdb debugger
- gdb p.o
- x/13b sum
  - Once you know the length of sum from the disassembler, examine the 13 bytes starting at sum

Data formats
- "Word": 16-bit data type, due to its origins
  - 32 bits: double word
  - 64 bits: quad word

| C declaration | Intel data type | GAS suffix | Size (bytes) |
|---|---|---|---|
| char | Byte | b | 1 |
| short | Word | w | 2 |
| int, unsigned, long int, unsigned long, char * | Double word | l | 4 |
| float | Single precision | s | 4 |
| double | Double precision | l | 8 |
| long double | Extended precision | t | 10/12 |

- The overloading of "l" in GAS (double word and double precision) causes no problems, since FP involves different operations and registers
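
The integer rows of the table can be read as a mapping from operand size to GAS suffix. A minimal sketch, where `int_suffix_for_size` is a hypothetical helper covering only the IA32 integer sizes listed above:

```c
#include <stddef.h>

/* GAS suffix for an integer operand of the given size in bytes,
   following the table: b = byte, w = word, l = double word.
   Returns '?' for sizes outside the IA32 integer formats. */
char int_suffix_for_size(size_t nbytes)
{
    switch (nbytes) {
    case 1:  return 'b';
    case 2:  return 'w';
    case 4:  return 'l';
    default: return '?';
    }
}
```

For example, int_suffix_for_size(sizeof(int)) gives 'l' wherever int is 4 bytes. On 64-bit targets, long and pointers are usually 8 bytes, which this IA32 table does not cover.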

Accessing information
- Eight 32-bit registers
  - Six of them mostly for general purpose
  - Last two point to key data in a process stack
- The two low-order bytes of the first four can be accessed directly (the low-order 16 bits as well)

| Bits 31..0 | Bits 15..0 | Bits 15..8 | Bits 7..0 | Role |
|---|---|---|---|---|
| %eax | %ax | %ah | %al | |
| %ecx | %cx | %ch | %cl | |
| %edx | %dx | %dh | %dl | |
| %ebx | %bx | %bh | %bl | |
| %esi | %si | | | |
| %edi | %di | | | |
| %esp | %sp | | | Stack pointer |
| %ebp | %bp | | | Frame pointer |
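
The 16- and 8-bit names are just views of bit ranges inside the 32-bit register. A minimal C model of %eax and its sub-registers using masks and shifts (function names are hypothetical):

```c
#include <stdint.h>

/* Views of a 32-bit register value such as %eax. */
uint16_t low16(uint32_t reg) { return (uint16_t)(reg & 0xFFFFu); }      /* %ax */
uint8_t  low8(uint32_t reg)  { return (uint8_t)(reg & 0xFFu); }         /* %al */
uint8_t  high8(uint32_t reg) { return (uint8_t)((reg >> 8) & 0xFFu); }  /* %ah */

/* Writing %al replaces only bits 7..0; bits 31..8 are preserved. */
uint32_t write_low8(uint32_t reg, uint8_t val)
{
    return (reg & ~(uint32_t)0xFFu) | val;
}
```

With %eax = 0x12345678: %ax = 0x5678, %ah = 0x56, %al = 0x78.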

Operand specifiers
- Most instructions have 1 or 2 operands
  - Source: constant, or read from register or memory
  - Destination: register or memory
  - Types:
    - Immediate: constant, denoted with a "$" in front
    - Register: either 8-, 16-, or 32-bit registers
    - Memory: location given by an effective address
- Operand forms: the last is the most general
  - s, the scale factor, must be 1, 2, 4, or 8
  - Other memory forms are special cases of it, e.g. Absolute: M[Imm]; Based + displacement: M[Imm + R[Eb]]

| Type | Form | Operand value | Name |
|---|---|---|---|
| Immediate | $Imm | Imm | Immediate |
| Register | Ea | R[Ea] | Register |
| Memory | Imm | M[Imm] | Absolute |
| Memory | (Eb) | M[R[Eb]] | Indirect |
| Memory | Imm(Eb) | M[Imm + R[Eb]] | Based + displacement |
| Memory | Imm(Eb,Ei) | M[Imm + R[Eb] + R[Ei]] | Indexed |
| Memory | Imm(Eb,Ei,s) | M[Imm + R[Eb] + R[Ei]*s] | Scaled indexed |
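
The general memory form reduces to one formula, Imm + R[Eb] + R[Ei]*s, computed with 32-bit wraparound. A minimal sketch, with `effective_address` as a hypothetical helper; simpler forms are obtained by passing 0 for the missing parts:

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of Imm(Eb,Ei,s) = Imm + R[Eb] + R[Ei]*s.
   rb and ri are register contents; unsigned arithmetic gives the
   same modulo-2^32 wraparound as the hardware. */
uint32_t effective_address(int32_t imm, uint32_t rb, uint32_t ri, uint32_t s)
{
    assert(s == 1 || s == 2 || s == 4 || s == 8);  /* legal scale factors */
    return (uint32_t)imm + rb + ri * s;
}
```

With %ebp = 0x104, the operand 8(%ebp) is effective_address(8, 0x104, 0, 1) = 0x10c, and -4(%ebp) is 0x100.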

Moving data
- Among the most common instructions
- IA32 restriction: cannot move from one memory location to another with one instruction
- Note the differences between movb, movsbl, and movzbl
- The last two (pushl, popl) work with the stack
  - pushl %ebp is equivalent to subl $4,%esp followed by movl %ebp,(%esp)
  - Since the stack is part of program memory, it can also be accessed with ordinary memory operands

| Instruction | Effect | Description |
|---|---|---|
| mov{l,w,b} S, D | D ← S | Move double word, word, or byte |
| movsbl S, D | D ← SignExtend(S) | Move sign-extended byte |
| movzbl S, D | D ← ZeroExtend(S) | Move zero-extended byte |
| pushl S | R[%esp] ← R[%esp] - 4; M[R[%esp]] ← S | Push S onto the stack |
| popl D | D ← M[R[%esp]]; R[%esp] ← R[%esp] + 4 | Pop D from the stack |
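
The difference between movzbl and movsbl is only in how bits 31..8 are filled. A minimal C sketch of both (function names are hypothetical), written with explicit bit operations so the result does not depend on how C converts negative values:

```c
#include <stdint.h>

/* movzbl: copy a byte into 32 bits, filling bits 31..8 with 0. */
uint32_t zero_extend_byte(uint8_t b)
{
    return (uint32_t)b;
}

/* movsbl: copy a byte into 32 bits, filling bits 31..8 with
   copies of the byte's sign bit (bit 7). */
uint32_t sign_extend_byte(uint8_t b)
{
    return (b & 0x80u) ? (0xFFFFFF00u | b) : (uint32_t)b;
}
```

For the byte 0xF0, movzbl yields 0x000000F0 (240) and movsbl yields 0xFFFFFFF0 (-16 as a signed int). For 0x70, both yield 0x70.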

movl operand combinations

| Source | Destination | movl | C analog |
|---|---|---|---|
| Imm | Reg | movl $0x4,%eax | temp = 0x4; |
| Imm | Mem | movl $-147,(%eax) | *p = -147; |
| Reg | Reg | movl %eax,%edx | temp2 = temp1; |
| Reg | Mem | movl %eax,(%edx) | *p = temp; |
| Mem | Reg | movl (%eax),%edx | temp = *p; |

Using simple addressing modes

    void swap(int *xp, int *yp)
    {
        int t0 = *xp;
        int t1 = *yp;
        *xp = t1;
        *yp = t0;
    }

- int *xp declares xp as being a pointer to an int
- int t0 = *xp reads the value stored in location xp and stores it in t0

    swap:
        pushl %ebp            # Set up
        movl %esp,%ebp
        pushl %ebx

        movl 12(%ebp),%ecx    # Body
        movl 8(%ebp),%edx
        movl (%ecx),%eax
        movl (%edx),%ebx
        movl %eax,(%edx)
        movl %ebx,(%ecx)

        movl -4(%ebp),%ebx    # Finish
        movl %ebp,%esp
        popl %ebp
        ret

Understanding swap

| Register | Variable |
|---|---|
| %ecx | yp |
| %edx | xp |
| %eax | t1 |
| %ebx | t0 |

Stack and memory, initial state (%ebp = 0x104):

| Address | Offset from %ebp | Contents |
|---|---|---|
| 0x124 | | 123 |
| 0x120 | | 456 |
| 0x110 | 12 | yp = 0x120 |
| 0x10c | 8 | xp = 0x124 |
| 0x108 | 4 | Rtn adr |
| 0x104 | 0 | Old %ebp |
| 0x100 | -4 | Old %ebx |

    movl 12(%ebp),%ecx    # ecx = yp
    movl 8(%ebp),%edx     # edx = xp
    movl (%ecx),%eax      # eax = *yp (t1)
    movl (%edx),%ebx      # ebx = *xp (t0)
    movl %eax,(%edx)      # *xp = eax
    movl %ebx,(%ecx)      # *yp = ebx

Understanding swap: step-by-step trace (%ebp = 0x104 throughout; 12(%ebp) holds yp = 0x120, 8(%ebp) holds xp = 0x124)

| Step | Instruction | %eax | %edx | %ecx | %ebx | M[0x124] | M[0x120] |
|---|---|---|---|---|---|---|---|
| initial | | | | | | 123 | 456 |
| 1 | movl 12(%ebp),%ecx | | | 0x120 | | 123 | 456 |
| 2 | movl 8(%ebp),%edx | | 0x124 | 0x120 | | 123 | 456 |
| 3 | movl (%ecx),%eax | 456 | 0x124 | 0x120 | | 123 | 456 |
| 4 | movl (%edx),%ebx | 456 | 0x124 | 0x120 | 123 | 123 | 456 |
| 5 | movl %eax,(%edx) | 456 | 0x124 | 0x120 | 123 | 456 | 456 |
| 6 | movl %ebx,(%ecx) | 456 | 0x124 | 0x120 | 123 | 456 | 123 |
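
The swap walkthrough can be replayed in C on a toy memory. A minimal sketch, assuming a hypothetical word-array memory covering addresses 0x100 to 0x127 and the addresses and values from the walkthrough; `load`, `store`, `init_swap_state`, `run_swap_body`, and `peek` are illustrative names only:

```c
#include <stdint.h>

/* Toy memory: one 32-bit word per 4-byte-aligned address,
   covering 0x100..0x127 (10 words). */
#define MEM_BASE  0x100u
#define MEM_WORDS 10u

static uint32_t mem[MEM_WORDS];

static uint32_t load(uint32_t addr)              { return mem[(addr - MEM_BASE) / 4]; }
static void     store(uint32_t addr, uint32_t v) { mem[(addr - MEM_BASE) / 4] = v; }

/* Initial state from the walkthrough. */
void init_swap_state(void)
{
    store(0x124, 123);     /* *xp */
    store(0x120, 456);     /* *yp */
    store(0x110, 0x120);   /* 12(%ebp): yp */
    store(0x10c, 0x124);   /* 8(%ebp):  xp */
}

/* Replay the six body instructions of swap with %ebp = 0x104. */
void run_swap_body(void)
{
    uint32_t ebp = 0x104, eax, ebx, ecx, edx;

    ecx = load(ebp + 12);  /* movl 12(%ebp),%ecx   ecx = yp       */
    edx = load(ebp + 8);   /* movl 8(%ebp),%edx    edx = xp       */
    eax = load(ecx);       /* movl (%ecx),%eax     eax = *yp (t1) */
    ebx = load(edx);       /* movl (%edx),%ebx     ebx = *xp (t0) */
    store(edx, eax);       /* movl %eax,(%edx)     *xp = eax      */
    store(ecx, ebx);       /* movl %ebx,(%ecx)     *yp = ebx      */
}

/* Read a word from the toy memory. */
uint32_t peek(uint32_t addr) { return load(addr); }
```

After init_swap_state() and run_swap_body(), address 0x124 holds 456 and 0x120 holds 123, matching the last row of the trace.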

A second example

    void decode1(int *xp, int *yp, int *zp);

        movl 8(%ebp),%edi
        movl 12(%ebp),%ebx
        movl 16(%ebp),%esi
        movl (%edi),%eax
        movl (%ebx),%edx
        movl (%esi),%ecx
        movl %eax,(%ebx)
        movl %edx,(%esi)
        movl %ecx,(%edi)
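
One C body consistent with the decode1 listing, reading it as %edi = xp, %ebx = yp, %esi = zp (offsets 8, 12, 16). This is a sketch that reproduces the listing's loads and stores, not necessarily the source the compiler actually saw:

```c
/* Loads all three values, then rotates them:
   *yp gets old *xp, *zp gets old *yp, *xp gets old *zp. */
void decode1(int *xp, int *yp, int *zp)
{
    int x = *xp;   /* movl (%edi),%eax */
    int y = *yp;   /* movl (%ebx),%edx */
    int z = *zp;   /* movl (%esi),%ecx */
    *yp = x;       /* movl %eax,(%ebx) */
    *zp = y;       /* movl %edx,(%esi) */
    *xp = z;       /* movl %ecx,(%edi) */
}
```

Starting from *xp = 1, *yp = 2, *zp = 3, the call leaves *xp = 3, *yp = 1, *zp = 2.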

Address computation instruction
- leal S, D: D ← &S
  - leal = Load Effective Address
  - S is an address mode expression
  - Sets D to the address denoted by the expression
- Uses
  - Computing an address without doing a memory reference
    - E.g., translation of p = &x[i];
  - Computing arithmetic expressions of the form x + k*y, for k = 1, 2, 4, or 8
- Example: leal 7(%edx,%edx,4),%eax: when %edx = x, %eax becomes x + 4x + 7 = 5x + 7
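
Because leal only computes the effective address and never touches memory, it doubles as a cheap add-and-scale instruction. A minimal C sketch (the names `lea` and `five_x_plus_7` are hypothetical):

```c
#include <stdint.h>

/* What leal Imm(base,index,scale),D stores in D: the address value
   itself, with no memory reference. */
uint32_t lea(int32_t imm, uint32_t base, uint32_t index, uint32_t scale)
{
    return (uint32_t)imm + base + index * scale;
}

/* leal 7(%edx,%edx,4),%eax with %edx = x: x + 4x + 7 = 5x + 7. */
uint32_t five_x_plus_7(uint32_t x)
{
    return lea(7, x, x, 4);
}
```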

Some arithmetic operations

| Instruction | Effect | Description |
|---|---|---|
| incl D | D ← D + 1 | Increment |
| decl D | D ← D - 1 | Decrement |
| negl D | D ← -D | Negate |
| notl D | D ← ~D | Complement |
| addl S, D | D ← D + S | Add |
| subl S, D | D ← D - S | Subtract |
| imull S, D | D ← D * S | Multiply |
| xorl S, D | D ← D ^ S | Exclusive or |
| orl S, D | D ← D \| S | Or |
| andl S, D | D ← D & S | And |
| sall k, D | D ← D << k | Left shift |
| shll k, D | D ← D << k | Left shift (same as sall) |
| sarl k, D | D ← D >> k | Arithmetic right shift (fills with copies of the sign bit) |
| shrl k, D | D ← D >> k | Logical right shift (fills with zeros) |
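
The two right shifts differ only in what fills the vacated high bits. A minimal C sketch of both (function names are hypothetical). Right-shifting a negative signed value is implementation-defined in C, so the arithmetic version below avoids it: for negative x, ~x is non-negative, and ~(~x >> k) reproduces sign-filling:

```c
#include <stdint.h>

/* shrl: logical right shift, high bits filled with 0. */
uint32_t shr_logical(uint32_t x, unsigned k)
{
    return x >> k;
}

/* sarl: arithmetic right shift, high bits filled with copies of the
   sign bit. Assumes k < 32. */
int32_t sar_arith(int32_t x, unsigned k)
{
    if (x >= 0)
        return x >> k;
    return ~(~x >> k);   /* ~x >= 0, so this shift is well defined */
}
```

For example, sar_arith(-16, 2) is -4 and sar_arith(-17, 2) is -5 (rounding toward minus infinity), while shr_logical(0xF0000000, 4) is 0x0F000000.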

Using leal for arithmetic expressions

    int arith(int x, int y, int z)
    {
        int t1 = x+y;
        int t2 = z+t1;
        int t3 = x+4;
        int t4 = y * 48;
        int t5 = t3 + t4;
        int rval = t2 * t5;
        return rval;
    }

    arith:
        pushl %ebp                 # Set up
        movl %esp,%ebp

        movl 8(%ebp),%eax          # Body
        movl 12(%ebp),%edx
        leal (%edx,%eax),%ecx
        leal (%edx,%edx,2),%edx
        sall $4,%edx
        addl 16(%ebp),%ecx
        leal 4(%edx,%eax),%eax
        imull %ecx,%eax

        movl %ebp,%esp             # Finish
        popl %ebp
        ret

Understanding arith (for the arith function above)

Stack frame:

| Offset from %ebp | Contents |
|---|---|
| 16 | z |
| 12 | y |
| 8 | x |
| 4 | Rtn adr |
| 0 | Old %ebp (%ebp points here) |

    movl 8(%ebp),%eax          # eax = x
    movl 12(%ebp),%edx         # edx = y
    leal (%edx,%eax),%ecx      # ecx = x+y (t1)
    leal (%edx,%edx,2),%edx    # edx = 3*y
    sall $4,%edx               # edx = 48*y (t4)
    addl 16(%ebp),%ecx         # ecx = z+t1 (t2)
    leal 4(%edx,%eax),%eax     # eax = 4+t4+x (t5)
    imull %ecx,%eax            # eax = t5*t2 (rval)
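
To check that the compiled sequence computes the same value as the C source, here is a minimal sketch that replays each instruction on 32-bit unsigned "registers" (unsigned arithmetic wraps like the hardware and avoids signed-overflow undefined behavior in C). `arith_as_compiled` is a hypothetical name. imull is a signed multiply, but its low 32 bits equal the unsigned product. Converting the final value back to int is implementation-defined when it exceeds INT_MAX. gcc and clang keep the bits.

```c
#include <stdint.h>

/* The C function from the slide. */
int arith(int x, int y, int z)
{
    int t1 = x + y;
    int t2 = z + t1;
    int t3 = x + 4;
    int t4 = y * 48;
    int t5 = t3 + t4;
    int rval = t2 * t5;
    return rval;
}

/* Instruction-by-instruction replay of the compiled body. */
int arith_as_compiled(int x, int y, int z)
{
    uint32_t eax = (uint32_t)x;      /* movl 8(%ebp),%eax                 */
    uint32_t edx = (uint32_t)y;      /* movl 12(%ebp),%edx                */
    uint32_t ecx = edx + eax;        /* leal (%edx,%eax),%ecx   t1        */
    edx = edx + edx * 2;             /* leal (%edx,%edx,2),%edx 3*y       */
    edx <<= 4;                       /* sall $4,%edx            t4 = 48*y */
    ecx += (uint32_t)z;              /* addl 16(%ebp),%ecx      t2        */
    eax = 4 + edx + eax;             /* leal 4(%edx,%eax),%eax  t5        */
    eax *= ecx;                      /* imull %ecx,%eax         rval      */
    return (int)eax;
}
```

The key trick is that 48*y is formed as (y + 2y) << 4, since 3 * 16 = 48.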


Another example

    int logical(int x, int y)
    {
        int t1 = x^y;
        int t2 = t1 >> 17;
        int mask = (1<<13) - 7;
        int rval = t2 & mask;
        return rval;
    }

    logical:
        pushl %ebp            # Set up
        movl %esp,%ebp

        movl 8(%ebp),%eax     # Body: eax = x
        xorl 12(%ebp),%eax    # eax = x^y (t1)
        sarl $17,%eax         # eax = t1>>17 (t2)
        andl $8185,%eax       # eax = t2 & 8185

        movl %ebp,%esp        # Finish
        popl %ebp
        ret

- mask: 2^13 = 8192, so 2^13 - 7 = 8185; the compiler computes this constant at compile time
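
A minimal sketch comparing the source-level function with what the compiled body computes once the mask is folded to 8185. `logical_as_compiled` is a hypothetical name. The >> on a possibly negative int is implementation-defined in C. gcc implements it as an arithmetic shift, matching sarl.

```c
/* The C function from the slide. */
int logical(int x, int y)
{
    int t1 = x ^ y;
    int t2 = t1 >> 17;          /* arithmetic shift under gcc (sarl) */
    int mask = (1 << 13) - 7;   /* 8192 - 7 = 8185 */
    int rval = t2 & mask;
    return rval;
}

/* The compiled body: xorl, sarl $17, andl $8185. */
int logical_as_compiled(int x, int y)
{
    return ((x ^ y) >> 17) & 8185;
}
```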

CISC Properties
- Instructions can reference different operand types
  - Immediate, register, memory
- Arithmetic operations can read/write memory
- Memory reference can involve complex computation
  - Rb + S*Ri + D
  - Useful for arithmetic expressions, too
- Instructions can have varying lengths
  - IA32 instructions can range from 1 to 15 bytes

Whose assembler?

| Intel/Microsoft format | GAS/GNU format |
|---|---|
| lea eax,[ecx+ecx*2] | leal (%ecx,%ecx,2),%eax |
| sub esp,8 | subl $8,%esp |
| cmp dword ptr [ebp-8],0 | cmpl $0,-8(%ebp) |
| mov eax,dword ptr [eax*4+100h] | movl 0x100(,%eax,4),%eax |

Intel/Microsoft differs from GAS
- Operands listed in opposite order
  - mov Dest, Src vs. movl Src, Dest
- Constants not preceded by "$"; denote hex with "h" at end
  - 100h vs. $0x100
- Operand size indicated by operands rather than operator suffix
  - sub vs. subl
- Addressing format shows effective address computation
  - [eax*4+100h] vs. 0x100(,%eax,4)