COMP 40 Machine Structure and Assembly Language Programming
COMP 40: Machine Structure and Assembly Language Programming – Spring 2014
Machine/Assembler Language Control Flow & Compiling Function Calls
Noah Mendelsohn
Tufts University
Email: noah@cs.tufts.edu
Web: http://www.cs.tufts.edu/~noah
Goals for today – learn:
§ Review last week’s intro to machine & assembler lang
§ More examples
§ Function calls
§ Control flow
© 2010 Noah Mendelsohn
Review & A Few Updates
General Purpose Registers
[Diagram: the 64-bit register %rax, bits 63 to 0. %eax is its low 32 bits, %ax its low 16 bits, %ah is bits 15 to 8 and %al is bits 7 to 0.]
Successive slides highlight the part of the register each instruction writes:
    mov $123, %rax    // writes all 64 bits
    mov $123, %ax     // writes the low 16 bits; the upper 48 bits are unchanged
    mov $123, %eax    // writes the low 32 bits; on x86-64 the upper 32 bits are zeroed
x86-64 / AMD64 / Intel 64 General Purpose Registers
    bits 63-0   bits 31-0   bits 15-8   bits 7-0        bits 63-0
    %rax        %eax        %ah         %al             %r8
    %rcx        %ecx        %ch         %cl             %r9
    %rdx        %edx        %dh         %dl             %r10
    %rbx        %ebx        %bh         %bl             %r11
    %rsi        %esi                                    %r12
    %rdi        %edi                                    %r13
    %rsp        %esp                                    %r14
    %rbp        %ebp                                    %r15
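The relationship between the register names can be sketched in C by treating one 64-bit register as a uint64_t. This is only a model, not how you would access registers: the helper names (write_eax, read_ah, and so on) are made up for this illustration. It shows that a 32-bit write clears the upper half of the register, while a 16-bit write leaves the upper bits alone.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers (not from the slides): each takes the old 64-bit
 * register value and returns the value after a write of a given width. */

/* mov $v, %rax : replaces all 64 bits. */
uint64_t write_rax(uint64_t rax, uint64_t v) { (void)rax; return v; }

/* mov $v, %eax : replaces the low 32 bits; x86-64 zeroes bits 63..32. */
uint64_t write_eax(uint64_t rax, uint32_t v) { (void)rax; return (uint64_t)v; }

/* mov $v, %ax : replaces the low 16 bits; bits 63..16 are unchanged. */
uint64_t write_ax(uint64_t rax, uint16_t v)
{
    return (rax & ~(uint64_t)0xFFFF) | v;
}

/* Read views of the same register. */
uint32_t read_eax(uint64_t rax) { return (uint32_t)rax; }        /* bits 31..0 */
uint8_t  read_ah(uint64_t rax)  { return (uint8_t)(rax >> 8); }  /* bits 15..8 */
uint8_t  read_al(uint64_t rax)  { return (uint8_t)rax; }         /* bits 7..0  */

/* Small self-check using an easily recognized bit pattern. */
void check_register_views(void)
{
    uint64_t rax = 0x1122334455667788ULL;

    assert(write_rax(rax, 123) == 123);
    assert(write_eax(rax, 123) == 123);                    /* upper half cleared */
    assert(write_ax(rax, 123) == 0x112233445566007BULL);   /* upper bits kept */
    assert(read_eax(rax) == 0x55667788u);
    assert(read_ah(rax) == 0x77);
    assert(read_al(rax) == 0x88);
}
```

Calling check_register_views() runs the three writes from the earlier slides against the same starting value.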
Classes of AMD64 registers (new since last lecture)
§ General purpose registers
  – 16 registers, 64 bits each
  – Used to compute integer and pointer values
  – Used for integer call/return values to functions
§ XMM registers
  – 16 registers, 128 bits each
  – Used to compute float/double values, and for parallel integer computation
  – Used to pass double/float call/return values
§ x87 floating point registers
  – 8 registers, 80 bits each
  – Used to compute, pass/return long double
Machine code (typical)
§ Simple instructions – each does small unit of work
§ Stored in memory
§ Bitpacked into compact binary form
§ Directly executed by transistor/hardware logic*
* We’ll show later that some machines execute user-provided machine code directly, some convert it to an even lower level machine code and execute that.
Here’s the machine code for our function
    int times16(int i) {
        return i * 16;
    }

    89 f8 c1 e0 04 c3
Remember: this is what’s really in memory and what the machine executes!
But what does it mean?? Does it really implement the times16 function?
Consider a simple function in C
    int times16(int i) {
        return i * 16;
    }

    0:  89 f8       mov   %edi, %eax    // Load i into result register %eax
    2:  c1 e0 04    shl   $0x4, %eax    // Shifting left by 4 is a quick way to multiply by 16
    5:  c3          retq                // Return to caller, which will look for result in %eax
REMEMBER: you can see the assembler code for any C program by running gcc with the -S flag. Do it!!
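A quick way to convince yourself that the shift really does the multiplication is to write both versions in C and compare them. The sketch below assumes non-negative inputs small enough that the result fits in an int; times16_shift and check_times16 are names invented for this example.

```c
#include <assert.h>

/* The function from the slide. */
int times16(int i) { return i * 16; }

/* Hand-written equivalent of the compiled code. Assumes i is non-negative
 * and small enough that i * 16 fits in an int (left-shifting a negative
 * int is undefined in C, even though the hardware shl handles it). */
int times16_shift(int i) { return i << 4; }

/* Compare the two versions over a range of small inputs. */
void check_times16(void)
{
    for (int i = 0; i < 1000; i++)
        assert(times16(i) == times16_shift(i));
}
```

Putting times16 in its own file and running gcc with -S on it lets you compare the resulting .s file with the listing above; the exact instructions depend on the compiler version and optimization level.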
INTERPRETER: Software or hardware that does what the instructions say
COMPILER: Software that converts a program to another language
ASSEMBLER: Like a compiler, but the input assembler language is (mostly) 1-to-1 with machine instructions
Very simplified view of computer
[Diagram: Memory, holding the bytes 89 f8 c1 e0 04 c3, and Cache]
Instructions fetched and decoded
[Diagram: the same Memory and Cache, now with an ALU]
ALU: the Arithmetic and Logic Unit executes instructions like add and shift, updating registers.
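The fetch, decode, execute cycle can be sketched as a tiny software interpreter for the six bytes of times16. This is a toy under strong assumptions: it recognizes only the three exact encodings shown in the slides, and the names toy_cpu, toy_run and toy_times16 are invented here. A real decoder handles far more cases.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy machine state: only the two registers this example touches. */
struct toy_cpu {
    uint32_t eax;
    uint32_t edi;
};

/* Fetch, decode and execute bytes until a ret is reached.
 * Returns 0 when ret executes, -1 for anything not modeled here. */
int toy_run(struct toy_cpu *cpu, const uint8_t *code, size_t len)
{
    size_t pc = 0; /* index of the next instruction byte */

    while (pc < len) {
        if (pc + 1 < len && code[pc] == 0x89 && code[pc + 1] == 0xf8) {
            cpu->eax = cpu->edi;                 /* 89 f8: mov %edi, %eax */
            pc += 2;
        } else if (pc + 2 < len && code[pc] == 0xc1 && code[pc + 1] == 0xe0) {
            cpu->eax <<= (code[pc + 2] & 31);    /* c1 e0 ib: shl $ib, %eax */
            pc += 3;
        } else if (code[pc] == 0xc3) {
            return 0;                            /* c3: ret */
        } else {
            return -1;                           /* unknown encoding */
        }
    }
    return -1; /* ran off the end without reaching ret */
}

/* The six bytes from the slide. */
static const uint8_t times16_code[] = { 0x89, 0xf8, 0xc1, 0xe0, 0x04, 0xc3 };

/* Run the machine code for times16 on the toy CPU with argument i in %edi. */
uint32_t toy_times16(uint32_t i)
{
    struct toy_cpu cpu = { 0, i };   /* eax = 0, edi = i */
    int rc = toy_run(&cpu, times16_code, sizeof times16_code);
    assert(rc == 0);
    (void)rc;
    return cpu.eax;
}
```

For example, toy_times16(3) walks through mov, shl and ret and leaves 48 in the modeled %eax.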
The MIPS CPU Architecture
Remember
We will teach some highlights of machine/assembler code here in class, but you must take significant time to learn and practice on your own!
Suggestions for teaching yourself:
• Read Bryant and O’Hallaron carefully
  • Remember that the 64-bit section was added later
• Look for online reference material
  • Some linked from HW 5
• Write examples, compile with -S, figure out the resulting .s file!
Moving Data and Calculating Values
Examples
    %rax                    // contents of rax is data
    (%rax)                  // data pointed to by rax
    0x10(%rax)              // get *(16 + rax)
    0x4089a0(, %rax, 8)     // Global array index of 8-byte things
    (%ebx, %ecx, 2)         // Add base and scaled index
    4(%ebx, %ecx, 2)        // Add base and scaled index plus offset 4

    movl (%ebx, %ecx, 1), %edx    // edx <- *(ebx + (ecx * 1))
    leal (%ebx, %ecx, 1), %edx    // edx <- (ebx + (ecx * 1))
    movl (%ebx, %ecx, 4), %edx    // edx <- *(ebx + (ecx * 4))
    leal (%ebx, %ecx, 4), %edx    // edx <- (ebx + (ecx * 4))

movl moves the data at the address; similar to:
    char *ebxp;
    char edx = ebxp[ecx];
leal moves the address itself; similar to:
    char *ebxp;
    char *edxp = &(ebxp[ecx]);
Scale factors support indexing larger types; similar to:
    int *ebxp;
    int edx = ebxp[ecx];
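The address arithmetic behind these operands can be written out in C. The sketch below is a model only: effective_address, model_lea and model_movl are hypothetical helpers, and it assumes a flat address space where a pointer converts to uintptr_t and back unchanged (true on Linux/AMD64, not guaranteed by ISO C).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: the address named by disp(base, index, scale). */
uintptr_t effective_address(uintptr_t base, uintptr_t index,
                            unsigned scale, intptr_t disp)
{
    return base + index * scale + (uintptr_t)disp;
}

/* lea: the result is the computed address itself; memory is not touched. */
uintptr_t model_lea(uintptr_t base, uintptr_t index,
                    unsigned scale, intptr_t disp)
{
    return effective_address(base, index, scale, disp);
}

/* movl: the result is the 4 bytes stored at the computed address. */
uint32_t model_movl(uintptr_t base, uintptr_t index,
                    unsigned scale, intptr_t disp)
{
    uint32_t value;
    memcpy(&value,
           (const void *)effective_address(base, index, scale, disp),
           sizeof value);
    return value;
}

/* Index a small table of 4-byte elements both ways. */
void check_addressing(void)
{
    uint32_t table[4] = { 10, 20, 30, 40 };
    uintptr_t base = (uintptr_t)table;

    /* Scale 4 steps over 4-byte elements, like table[2]. */
    assert(model_movl(base, 2, 4, 0) == 30);
    assert(model_lea(base, 2, 4, 0) == (uintptr_t)&table[2]);

    /* A displacement of 4 moves one element further: table[3]. */
    assert(model_movl(base, 2, 4, 4) == 40);
}
```

The only difference between the two models is the final memory read, which is the difference between movl and leal.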
Control Flow
Simple jumps
    .L4:
        …code here…
        jmp .L4    // jump back to L4
Conditional jumps
    .L4:
        movq  (%rdi, %rdx), %rcx
        leaq  (%rax, %rcx), %rsi
        testq %rcx, %rcx
        cmovg %rsi, %rax
        addq  $8, %rdx
        cmpq  %r8, %rdx
        jne   .L4    // conditional: jump iff %r8 != %rdx
This technique is the key to compiling if statements, for loops, while loops, etc.
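To see how a loop becomes a label plus a conditional jump, it helps to write the same computation twice in C: once as an ordinary for loop, once with goto in the shape of the assembly. The original C source is not shown on the slide, so sum_positive below is only one plausible guess at it. The variable names in the goto version mirror the registers, and the code assumes 8-byte longs as on Linux/AMD64.

```c
#include <assert.h>
#include <stddef.h>

/* One plausible source for the loop (an assumption): add up the
 * positive elements of an array. */
long sum_positive(const long *a, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (a[i] > 0)
            sum += a[i];
    }
    return sum;
}

/* The same computation structured like the assembly: a label, a body,
 * and a conditional jump back. Requires n > 0, because the test comes
 * only after the first pass through the body. */
long sum_positive_goto(const long *a, size_t n)
{
    long rax = 0;                    /* running sum */
    size_t rdx = 0;                  /* byte offset into the array */
    size_t r8 = n * sizeof(long);    /* byte offset one past the end */

L4:
    {
        /* movq (%rdi, %rdx), %rcx */
        long rcx = *(const long *)((const char *)a + rdx);
        /* leaq (%rax, %rcx), %rsi */
        long rsi = rax + rcx;
        /* testq %rcx, %rcx ; cmovg %rsi, %rax */
        if (rcx > 0)
            rax = rsi;
    }
    rdx += sizeof(long);             /* addq $8, %rdx (8-byte longs) */
    if (rdx != r8)                   /* cmpq %r8, %rdx ; jne .L4 */
        goto L4;
    return rax;
}

/* Both versions should agree on a small mixed-sign array. */
void check_sum_positive(void)
{
    const long a[] = { 3, -1, 4, -5, 9 };
    assert(sum_positive(a, 5) == 16);
    assert(sum_positive_goto(a, 5) == 16);
}
```

Note that the conditional move (cmovg) handles the if without any jump at all; only the loop itself needs jne.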
Calling Functions
Why have a standard “linkage” for calling functions?
§ Functions are compiled separately and linked together
§ We need to standardize enough that function calls will succeed!
§ Note: optimizing compilers may “cheat” when caller and callee are in the same source file
  – More on this later
An interesting example
    int fact(int n) {
        if (n == 0)
            return 1;
        else
            return n * fact(n - 1);
    }
See course notes on “Calls and Returns”
The process memory illusion
§ Process thinks it's running in a private space
§ Separated into segments, from address 0
§ Stack: memory for executing subroutines
§ Heap: memory for malloc/new
§ Global static variables
§ Text segment: where program lives
Now we’ll learn how your program uses these!
[Diagram: the address space, top to bottom: argv, environ; Stack; Heap (malloc’d); Static uninitialized; Static initialized; Text (code); address 0. A side label reads “Loaded with your program”.]
We’re about to study in depth how function calls use the stack.
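One way to see these segments is to print an address from each. The sketch below is an illustration with made-up names (print_regions, initialized_global, uninitialized_global). The exact addresses change from run to run and system to system, so it makes no claim about specific values. Casting a function pointer to void * works on Linux with gcc but is not guaranteed by ISO C.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

int initialized_global = 40;   /* static initialized data */
int uninitialized_global;      /* static uninitialized data */

/* Print one address from each region named on the slide.
 * Returns 0 on success, -1 if malloc fails. */
int print_regions(void)
{
    int local = 0;                       /* lives in this call's stack frame */
    int *heap = malloc(sizeof *heap);    /* lives on the heap */

    if (heap == NULL)
        return -1;

    printf("stack                 %p\n", (void *)&local);
    printf("heap                  %p\n", (void *)heap);
    printf("static uninitialized  %p\n", (void *)&uninitialized_global);
    printf("static initialized    %p\n", (void *)&initialized_global);
    printf("text                  %p\n", (void *)print_regions);

    free(heap);
    return 0;
}

/* Usage: call once and read the printed addresses. */
void check_regions(void)
{
    assert(print_regions() == 0);
}
```

On a typical Linux run the stack address is the largest and the text address the smallest, matching the picture, but treat that as something to observe rather than rely on.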
Function calls on Linux/AMD64
§ Caller “pushes” return address on stack
§ Where practical, arguments passed in registers
§ Exceptions:
  – Structs, etc.
  – Too many
  – What can’t be passed in registers is at known offsets from stack pointer!
§ Return values
  – In register, typically %rax for integers and pointers
  – Exception: structures
§ Each function gets a stack frame
  – Leaf functions that make no calls may skip setting one up
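The stack – general case
[Diagram: three views of the stack.
 Before call: Arguments, with %rsp marking the top of the stack.
 After callq: Arguments, then Return address, with %rsp at the return address.
 If callee needs frame: Arguments, Return address, then a frame of framesize bytes holding callee vars and possibly args to the next call, with %rsp at the end of the frame.]
The callee makes room for its frame with:
    sub $0x{framesize}, %rsp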
Arguments/return values in registers
    Operand size    Argument number
    (bits)          1       2       3       4       5       6
    64              %rdi    %rsi    %rdx    %rcx    %r8     %r9
    32              %edi    %esi    %edx    %ecx    %r8d    %r9d
    16              %di     %si     %dx     %cx     %r8w    %r9w
    8               %dil    %sil    %dl     %cl     %r8b    %r9b
§ Arguments and return values passed in registers when types are suitable and when there aren’t too many
§ Return values usually in %rax, %eax, etc.
§ Callee may change these and some other registers!
§ XMM and x87 registers used for floating point
§ Read the specifications for full details!
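A function with more than six integer arguments shows both halves of the rule at once. In the sketch below, sum8 and half are names made up for illustration. The register assignments in the comments follow the table above and assume the call is not inlined or otherwise optimized away.

```c
#include <assert.h>

/* Eight integer arguments: the first six travel in registers, the last
 * two are passed on the stack. */
long sum8(long a, long b, long c, long d, long e, long f, long g, long h)
{
    /* a: %rdi   b: %rsi   c: %rdx   d: %rcx   e: %r8   f: %r9
     * g, h: at known offsets from the stack pointer
     * result: returned in %rax */
    return a + b + c + d + e + f + g + h;
}

/* A double argument and a double return value use an XMM register
 * (%xmm0) instead of the general purpose registers. */
double half(double x) { return x / 2.0; }

/* Usage: call both and check the arithmetic. */
void check_argument_passing(void)
{
    assert(sum8(1, 2, 3, 4, 5, 6, 7, 8) == 36);
    assert(half(3.0) == 1.5);
}
```

Compiling this with gcc -S and looking at the caller shows the six register moves followed by the two stack stores.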
Factorial Revisited
    int fact(int n) {
        if (n == 0)
            return 1;
        else
            return n * fact(n - 1);
    }

    fact:
    .LFB2:
        pushq %rbx
    .LCFI0:
        movq  %rdi, %rbx
        movl  $1, %eax
        testq %rdi, %rdi
        je    .L4
        leaq  -1(%rdi), %rdi
        call  fact
        imulq %rbx, %rax
    .L4:
        popq  %rbx
        ret
Function calls on Linux/AMD64 (cont.)
§ Much of what you’ve seen can be skipped for “leaf” functions in the call tree
§ Inlining:
  – If small procedure is in same source file (or included header): just merge the code
  – Unless function is static (i.e. private to source file), the compiler still needs to create the normal version, in case anyone outside calls it!
§ Optimizing compilers “cheat”
  – Don’t build full stack
  – Leave return address for called function (A calls B; last thing B does is call C; B leaves return address on stack and branches to C instead of calling it… when C does normal return, it goes straight back to A!)
  – Many other wild optimizations, always done so other functions can’t tell anything unusual is happening!
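The A, B, C situation can be written down in a few lines of C. The functions below are hypothetical stand-ins for A, B and C. Whether a compiler actually turns the call in func_b into a jump depends on the compiler and optimization level, so treat the comments as what may happen, not what must.

```c
#include <assert.h>

/* C: an ordinary function that returns to whoever's address is on the stack. */
int func_c(int x) { return x * 2; }

/* B: calling func_c is the last thing it does, so a compiler may replace
 * "call func_c; ret" with a plain "jmp func_c". func_c's ret then goes
 * straight back to B's caller. */
int func_b(int x) { return func_c(x + 1); }

/* A: not a tail call, because A still has work to do (the + 3) after
 * func_b returns. */
int func_a(int x) { return func_b(x) + 3; }

/* The result is the same whether or not the optimization is applied. */
void check_tail_call(void)
{
    assert(func_b(4) == 10);
    assert(func_a(4) == 13);
}
```

The point of the last bullet is visible here: callers get identical results either way, so nothing outside func_b can tell which form was generated.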
Optimized version
    int fact(int n) {
        if (n == 0)
            return 1;
        else
            return n * fact(n - 1);
    }

Lightly optimized = O1 (what we saw before)
    fact:
    .LFB2:
        pushq %rbx
    .LCFI0:
        movq  %rdi, %rbx
        movl  $1, %eax
        testq %rdi, %rdi
        je    .L4
        leaq  -1(%rdi), %rdi
        call  fact
        imulq %rbx, %rax
    .L4:
        popq  %rbx
        ret

Heavily optimized = O2
    fact:
    .LFB2:
        testq %rdi, %rdi
        movl  $1, %eax
        je    .L6
        .p2align 4,,7
    .L5:
        imulq %rdi, %rax
        subq  $1, %rdi
        jne   .L5
    .L6:
        rep ; ret

What happened to the recursion?!?!?
This version doesn’t need to create a stack frame either!
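The heavily optimized code is easier to read once it is translated back into C. The sketch below uses long to match the 64-bit registers in the listings (the slide's source says int) and assumes n is non-negative. fact_loop is a name invented here for the loop form.

```c
#include <assert.h>

/* The recursive function, as on the slide but with long instead of int. */
long fact(long n)
{
    if (n == 0)
        return 1;
    return n * fact(n - 1);
}

/* A loop with the same shape as the O2 assembly: no call, no stack frame.
 * Assumes n >= 0; a negative n would never reach zero. */
long fact_loop(long n)
{
    long result = 1;        /* movl $1, %eax */
    if (n == 0)             /* testq %rdi, %rdi ; je .L6 */
        return result;
    do {
        result *= n;        /* imulq %rdi, %rax */
        n -= 1;             /* subq $1, %rdi */
    } while (n != 0);       /* jne .L5 */
    return result;
}

/* Both forms agree on small inputs (15! still fits in 64 bits). */
void check_fact(void)
{
    for (long n = 0; n <= 15; n++)
        assert(fact(n) == fact_loop(n));
}
```

The loop multiplies in the order n, n-1, …, 1 while the recursion multiplies on the way back out, but integer multiplication gives the same product either way, which is what lets the compiler make the swap.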
Getting the details on function call “linkages”
§ Bryant and O’Hallaron has an excellent introduction
  – Watch for differences between 32 and 64 bit
§ The official specification:
  – System V Application Binary Interface: AMD64 Architecture Processor Supplement
  – Find it at: http://www.cs.tufts.edu/comp/40/readings/amd64-abi.pdf
  – See especially sections 3.1 and 3.2
Summary
§ C code compiled to assembler
§ Data moved to registers for manipulation
§ Conditional and jump instructions for control flow
§ Stack used for function calls
§ Compilers play all sorts of tricks when compiling code