Assemblers, Linkers, and Loaders CS 3410 Computer System Organization & Programming These slides are the product of many rounds of teaching CS 3410 by Professors Weatherspoon, Bala, Bracy, and Sirer.
Big Picture: Where are we going?
C: int x = 10; x = x + 15;
  -> compiler ->
MIPS assembly: addi r5, r0, 10; addi r5, r5, 15 (with r0 = 0: r5 = r0 + 10; r5 = r5 + 15)
  -> assembler ->
machine code: 001000001010000001010 001000001010000001111
  -> loader ->
CPU (register file) -> Circuits -> Gates -> Transistors -> Silicon
Big Picture: Where are we going?
High Level Languages (C): int x = 10; x = 2 * x + 15;
  -> compiler ->
MIPS assembly: addi r5, r0, 10; muli r5, r5, 2; addi r5, r5, 15
  -> assembler ->
machine code: 001000001010000001010 000000010100001000000 001000001010000001111
  -> loader ->
Instruction Set Architecture (ISA): the boundary between software and the CPU
CPU -> Circuits -> Gates -> Transistors -> Silicon
From Writing to Running
• Compiler (gcc -S): C source files -> assembly files (sum.s)
• Assembler (gcc -c): assembly files -> obj files
• Linker (gcc -o sum): obj files -> executable program, which exists on disk
• Loader: executable -> process, executing in memory (“It’s alive!”)
When most people say “compile” they mean the entire process: compile + assemble + link
Example: sum.c
    #include <stdio.h>

    int n = 100;

    int main (int argc, char* argv[ ]) {
        int i;
        int m = n;
        int sum = 0;
        for (i = 1; i <= m; i++) {
            sum += i;
        }
        printf ("Sum 1 to %d is %d\n", n, sum);
    }
Compiler
Input: Code File (.c)
• Source code
• #includes, function declarations & definitions, global variables, etc.
Output: Assembly File (MIPS)
• MIPS assembly instructions (.s file)
Example:
    for (i = 1; i <= m; i++) { sum += i; }
compiles to (excerpt):
    li $2, 1
    lw $3, 28($fp)
    slt $2, $3, $2
sum.s (abridged)
            .data
            .globl  n
            .type   n, @object
    n:      .word   100
            .rdata
    $str0:  .ascii  "Sum 1 to %d is %d\n"
            .text
            .globl  main
            .type   main, @function
    main:   addiu   $sp, $sp, -48
            sw      $31, 44($sp)
            sw      $fp, 40($sp)
            move    $fp, $sp
            sw      $4, 48($fp)
            sw      $5, 52($fp)
            la      $2, n
            lw      $2, 0($2)
            sw      $2, 28($fp)
            sw      $0, 32($fp)
            li      $2, 1
            sw      $2, 24($fp)
    $L2:    lw      $2, 24($fp)
            lw      $3, 28($fp)
            slt     $2, $3, $2
            bne     $2, $0, $L3
            lw      $3, 32($fp)
            lw      $2, 24($fp)
            addu    $2, $3, $2
            sw      $2, 32($fp)
            lw      $2, 24($fp)
            addiu   $2, $2, 1
            sw      $2, 24($fp)
            b       $L2
    $L3:    la      $4, $str0
            lw      $5, 28($fp)
            lw      $6, 32($fp)
            jal     printf
            move    $sp, $fp
            lw      $31, 44($sp)
            lw      $fp, 40($sp)
            addiu   $sp, $sp, 48
            j       $31
Assembler
Input: Assembly File (.s)
• assembly instructions, pseudo-instructions
• program data (strings, variables), layout directives
Output: Object File in binary machine code
• MIPS instructions in executable form (.o file in Unix, .obj in Windows)
Example:
    addi r5, r0, 10     001000001010000001010
    muli r5, r5, 2      000000010100001000000
    addi r5, r5, 15     001000001010000001111
MIPS Assembly Instructions
Arithmetic/Logical
• ADD, ADDU, SUBU, AND, OR, XOR, NOR, SLTU
• ADDI, ADDIU, ANDI, ORI, XORI, LUI, SLL, SRL, SLLV, SRAV, SLTIU
• MULT, DIV, MFLO, MTLO, MFHI, MTHI
Memory Access
• LW, LH, LB, LHU, LBU, LWL, LWR
• SW, SH, SB, SWL, SWR
Control flow
• BEQ, BNE, BLEZ, BLTZ, BGEZ, BGTZ
• J, JR, JALR, BEQL, BNEL, BLEZL, BGTZL
Special
• LL, SC, SYSCALL, BREAK, SYNC, COPROC
Pseudo-Instructions
Assembly shorthand, technically not machine instructions, but easily converted into 1+ instructions that are

    Pseudo-Insns            Actual Insns              Functionality
    NOP                     SLL r0, r0, 0             # do nothing
    MOVE reg, reg           ADD r2, r0, r1            # copy between regs
    LI reg, 0x45678         LUI reg, 0x4              # load immediate
                            ORI reg, reg, 0x5678
    LA reg, label                                     # load address (32 bits)
    B                                                 # unconditional branch
    BLT reg, reg, label     SLT r1, rA, rB            # branch less than
                            BNE r1, r0, label
    + a few more…
Program Layout
• Programs consist of segments used for different purposes
• Text: holds instructions
• Data: holds statically allocated program data such as variables, strings, etc.
Example
    data: “cornell cs”, 13, 25, ...
    text: add r1, r2, r3
          ori r2, r4, 3
          ...
Assembling Programs
• Assembly files consist of a mix of
  + instructions
  + pseudo-instructions
  + assembler (data/layout) directives
• (Assembler lays out binary values in memory based on directives)
• Assembled to an Object File
  - Header
  - Text Segment
  - Data Segment
  - Relocation Information
  - Symbol Table
  - Debugging Information
Example:
            .text
            .ent main
    main:   la $4, Larray
            li $5, 15
            ...
            li $4, 0
            jal exit
            .end main
            .data
    Larray: .long 51, 491, 3991
Assembling Programs
• Assembly using a (modified) Harvard architecture
• Need segments since data and program stored together in memory
[Figure: CPU (Registers, ALU, Control) connected by data, address, and control lines to Program Memory and Data Memory]
Takeaway
• Assembly is a low-level task
• Need to assemble assembly language into machine code binary. Requires
  - Assembly language instructions
  - Pseudo-instructions
  - Assembler directives that specify layout and data
• Today, we use a modified Harvard architecture: data and instructions share one memory (as in a Von Neumann architecture), but they are kept in separate segments and have separate caches
Symbols and References
math.c
    int pi = 3;
    int e = 2;
    static int randomval = 7;

    extern int usrid;
    extern int printf(char *str, …);

    int square(int x) { … }
    static int is_prime(int x) { … }
    int pick_prime() { … }
    int get_n() { return usrid; }
(extern == defined in another file)
Global labels: Externally visible “exported” symbols
• Can be referenced from other object files
• Exported functions, global variables
• Examples: pi, e, usrid, printf, pick_prime, square
Local labels: Internally visible only symbols
• Only used within this object file
• static functions, static variables, loop labels, …
• Examples: randomval, is_prime
Handling forward references
Example:
        bne   $1, $2, L        <- Looking for L
        sll   $0, $0, 0
    L:  addiu $2, $3, 0x2      <- Found L
The assembler will change this to
        bne   $1, $2, +1
        sll   $0, $0, 0
        addiu $2, $3, 0x2
Final machine code
    0x14220001   # bne      actually: 000101...
    0x00000000   # sll                000000...
    0x24620002   # addiu              001001...
Object file
Header
• Size and position of pieces of file
Text Segment
• instructions
Data Segment
• static data (local/global vars, strings, constants)
Debugging Information
• line number -> code address map, etc.
Symbol Table
• External (exported) references
• Unresolved (imported) references
Object File Formats
Unix
• a.out
• COFF: Common Object File Format
• ELF: Executable and Linking Format
Windows
• PE: Portable Executable
All support both executable and object files
Objdump disassembly
> mipsel-linux-objdump --disassemble math.o
Disassembly of section .text:
00000000 <get_n>:
   0: 27bdfff8   addiu sp, sp, -8
   4: afbe0000   sw    s8, 0(sp)
   8: 03a0f021   move  s8, sp
   c: 3c020000   lui   v0, 0x0
  10: 8c420008   lw    v0, 8(v0)
  14: 03c0e821   move  sp, s8
  18: 8fbe0000   lw    s8, 0(sp)
  1c: 27bd0008   addiu sp, sp, 8
  20: 03e00008   jr    ra
  24: 00000000   nop
Source (math.c): int get_n() { return usrid; }
Elsewhere, in another file: int usrid = 41;
Objdump symbols
> mipsel-linux-objdump --syms math.o
SYMBOL TABLE:
    address   flags  segment   size      name
    00000000  l  df  *ABS*     00000000  math.c
    00000000  l  d   .text     00000000  .text
    00000000  l  d   .data     00000000  .data
    00000000  l  d   .bss      00000000  .bss
    00000008  l  O   .data     00000004  randomval
    00000060  l  F   .text     00000028  is_prime
    00000000  l  d   .rodata   00000000  .rodata
    00000000  l  d   .comment  00000000  .comment
    00000000  g  O   .data     00000004  pi
    00000004  g  O   .data     00000004  e
    00000000  g  F   .text     00000028  get_n
    00000028  g  F   .text     00000038  square
    00000088  g  F   .text     0000004c  pick_prime
    00000000         *UND*     00000000  usrid
    00000000         *UND*     00000000  printf
Flags: [l]ocal, [g]lobal, [F]unction, [O]bject
Separate Compilation & Assembly
source files -> Compiler -> assembly files -> Assembler -> obj files -> Linker -> executable program
    sum.c  -> sum.s  -> sum.o
    math.c -> math.s -> math.o
    sum.o + math.o -> sum (executable program, exists on disk)
loader: executable -> process (Executing in Memory)
Linkers
Linker combines object files into an executable file
• Resolve as-yet-unresolved symbols
• Relocate each object’s text and data segments (each object file has the illusion of its own address space)
• Record top-level entry point in executable file
End result: a program on disk, ready to execute. E.g.
    ./sum          Linux
    ./sum.exe      Windows
    simulate sum   Class MIPS simulator
Static Libraries
Static Library: Collection of object files (think: like a zip archive)
Q: Every program contains the entire library?!?
Linker Example: Resolving an External Fn Call

math.o
    .text
        ...
        24  21032040
        28  0C000000
        2C  1b301402
        30  3C040000
        34  34040000
        ...
    Symbol table
        20     T  get_n
        00     D  pi
        *UND*     printf
        *UND*     usrid
    Relocation info
        28, JAL, printf

main.o
    .text
        ...
        40  0C000000
        44  21035000
        48  1b80050C
        4C  8C040000
        50  21047002
        54  0C000000
        ...
    Symbol table
        00     T  main
        00     D  usrid
        *UND*     printf
        *UND*     pi
        *UND*     get_n
    Relocation info
        40, JAL, printf
        ...
        54, JAL, get_n

printf.o
        ...
        3C     T  printf

sum.exe
    .text
        0040 0000 (math):    ... 21032040 0C40023C 1b301402 3C041000 34040004 ...
        0040 0100 (main):    ... 0C40023C 21035000 1b80050c 8C048004 21047002 0C400020 ...
        0040 0200 (printf):  ... 10201000 21040330 22500102 ...
    .data
        1000 0000: global variables go here (later)
    Entry: 0040 0100
    text:  0040 0000
    data:  1000 0000
C source files -> Compiler -> assembly files -> Assembler -> obj files -> Linker -> executable program
    sum.c  -> sum.s  -> sum.o
    math.c -> math.s -> math.o
              io.s   -> io.o
    library obj files: libc.o, libm.o
    all obj files -> sum.exe (executable program, exists on disk)
loader: executable -> process (Executing in Memory)
Loaders
Loader reads executable from disk into memory
• Initializes registers, stack, arguments to first function
• Jumps to entry-point
Part of the Operating System (OS)
Shared Libraries
Q: Every program contains parts of same library?!
Static and Dynamic Linking
Static linking
• Big executable files (all/most of needed libraries inside)
• Don’t benefit from updates to library
• No load-time linking
Dynamic linking
• Small executable files (just point to shared library)
• Library update benefits all programs that use it
• Load-time cost to do final linking
  - But dll code is probably already in memory
  - And can do the linking incrementally, on-demand
Takeaway
• Compiler produces assembly files (contain MIPS assembly, pseudo-instructions, directives, etc.)
• Assembler produces object files (contain MIPS machine code, missing symbols, some layout information, etc.)
• Linker joins object files into one executable file (contains MIPS machine code, no missing symbols, some layout information)
• Loader puts program into memory, jumps to the first instruction, and starts executing a process (machine code)