Assemblers, Linkers, and Loaders
Hakim Weatherspoon
CS 3410
Computer Science, Cornell University
[Weatherspoon, Bala, Bracy, and Sirer]

Big Picture: Where are we going?

C:
    int x = 10;
    x = 2 * x + 15;

-> compiler -> RISC-V assembly:
    addi x5, x0, 10      # x5 = x0 + 10   (x0 = 0)
    muli x5, x5, 2       # x5 = x5 << 1, i.e. x5 = x5 * 2
    addi x5, x5, 15      # x5 = x5 + 15

-> assembler -> machine code:
    00000000101000000000001010010011     # op = addi, x5, x0, 10
    00000000000100101001001010010011     # x5, shamt = 1, func = sll
    00000000111100101000001010010011     # op = addi, x5, x5, 15

-> CPU (32-bit register file, RF) -> Circuits -> Gates -> Transistors -> Silicon

Big Picture: Where are we going?

High Level Languages (C):
    int x = 10;
    x = 2 * x + 15;

-> compiler -> RISC-V assembly:
    addi x5, x0, 10
    muli x5, x5, 2
    addi x5, x5, 15

-> assembler -> machine code:
    00000000101000000000001010010011
    00000000000100101001001010010011
    00000000111100101000001010010011

Instruction Set Architecture (ISA)

CPU -> Circuits -> Gates -> Transistors -> Silicon

From Writing to Running

C source files -> Compiler (gcc -S) -> assembly files -> Assembler (gcc -c sum.s) -> obj files -> Linker (gcc -o sum) -> executable program -> loader -> process

• When most people say "compile" they mean the entire process: compile + assemble + link
• The executable program exists on disk
• The loader brings it to life ("It's alive!"): a process executing in memory

Example: sum.c

    #include <stdio.h>

    int n = 100;

    int main (int argc, char* argv[ ]) {
        int i;
        int m = n;
        int sum = 0;
        for (i = 1; i <= m; i++) {
            sum += i;
        }
        printf ("Sum 1 to %d is %d\n", n, sum);
    }

Compiler

Input: Code File (.c)
• Source code
• #includes, function declarations & definitions, global variables, etc.

Output: Assembly File (RISC-V)
• RISC-V assembly instructions (.s file)

Example:
    for (i = 1; i <= m; i++) {
        sum += i;
    }
compiles to (excerpt):
    li   x2, 1
    lw   x3, 28(fp)
    slt  x2, x3, x2

sum.s (abridged)

            .globl  n
            .data
            .type   n, @object
    n:      .word   100
            .rdata
    $str0:  .string "Sum 1 to %d is %d\n"
            .text
            .globl  main
            .type   main, @function
    main:   addiu   $sp, $sp, -48
            sw      $ra, 44($sp)
            sw      $fp, 40($sp)
            move    $fp, $sp
            sw      $a0, -36($fp)
            sw      $a1, -40($fp)
            la      $a5, n
            lw      $a5, 0($a5)
            sw      $a5, -28($fp)
            sw      $0, -24($fp)
            li      $a5, 1
            sw      $a5, -20($fp)
    $L2:    lw      $a4, -20($fp)
            lw      $a5, -28($fp)
            blt     $a5, $a4, $L3
            lw      $a4, -24($fp)
            lw      $a5, -20($fp)
            addu    $a5, $a4, $a5
            sw      $a5, -24($fp)
            lw      $a5, -20($fp)
            addi    $a5, $a5, 1
            sw      $a5, -20($fp)
            j       $L2
    $L3:    la      $4, $str0
            lw      $a1, -28($fp)
            lw      $a2, -24($fp)
            jal     printf
            li      $a0, 0
            mv      $sp, $fp
            lw      $ra, 44($sp)
            lw      $fp, 40($sp)
            addiu   $sp, $sp, 48

Assembler

Input: Assembly File (.s)
• assembly instructions, pseudo-instructions
• program data (strings, variables), layout directives

Output: Object File in binary machine code
• RISC-V instructions in executable form
• (.o file in Unix, .obj in Windows)

    addi x5, x0, 10      ->   00000000101000000000001010010011
    muli x5, x5, 2       ->   00000000000100101001001010010011
    addi x5, x5, 15      ->   00000000111100101000001010010011

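As a concrete instance of what the assembler emits, the sketch below packs one RV32I I-type instruction into its 32-bit word. The helper name `encode_itype` and its argument order are illustrative only, not part of any real assembler; the field layout (imm[11:0], rs1, funct3, rd, opcode) and the ADDI opcode 0010011 follow the RISC-V base ISA.

```python
def encode_itype(imm, rs1, funct3, rd, opcode):
    """Pack an RV32I I-type instruction into a 32-bit word (hypothetical helper)."""
    if not -2048 <= imm <= 2047:
        raise ValueError("I-type immediate must fit in 12 signed bits")
    # Field positions: imm[31:20] rs1[19:15] funct3[14:12] rd[11:7] opcode[6:0]
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


OP_IMM = 0b0010011      # opcode shared by ADDI, ANDI, ORI, ...
ADDI_FUNCT3 = 0b000     # funct3 value that selects ADDI

# addi x5, x0, 10
word = encode_itype(10, rs1=0, funct3=ADDI_FUNCT3, rd=5, opcode=OP_IMM)
print(f"{word:032b}")
print(f"{word:#010x}")
```

Negative immediates are stored in two's complement in the 12-bit field, which is why the helper masks with 0xFFF after the range check.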
RISC-V Assembly Instructions

Arithmetic/Logical
• ADD, SUB, AND, OR, XOR, SLTU
• ADDI, ANDI, ORI, XORI, LUI, SLL, SRL, SLTIU
• MUL, DIV

Memory Access
• LW, LH, LB, LHU, LBU
• SW, SH, SB

Control flow
• BEQ, BNE, BLT, BGE
• JAL, JALR

Special
• LR, SCALL, SBREAK

Pseudo-Instructions

Assembly shorthand, technically not machine instructions, but easily converted into 1+ instructions

    Pseudo-Insns           Actual Insns              Functionality
    NOP                    SLL x0, x0, 0             # do nothing
    MOVE reg, reg          ADD r2, r0, r1            # copy between regs
    LI reg, 0x45678        LUI reg, 0x45             # load immediate
                           ADDI reg, reg, 0x678
    LA reg, label                                    # load address (32 bits)
    B                                                # unconditional branch
    BLT reg, reg, label    SLT r1, rA, rB            # branch less than
                           BNE r1, r0, label
    + a few more…

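A minimal sketch of how an assembler might expand LI, assuming RV32I: LUI supplies bits 31..12 and ADDI adds a sign-extended 12-bit value, so the constant is split at bit 12 and the upper part is bumped by one when the low 12 bits would be read as negative. The function name `expand_li` and the string output format are hypothetical.

```python
def expand_li(rd, value):
    """Expand `li rd, value` into real RV32I instructions (illustrative sketch)."""
    value &= 0xFFFFFFFF
    signed = value - (1 << 32) if value & 0x80000000 else value
    if -2048 <= signed <= 2047:
        # Small constants fit in a single ADDI from the zero register.
        return [f"addi {rd}, x0, {signed}"]
    low = value & 0xFFF
    if low >= 0x800:
        low -= 0x1000           # ADDI sign-extends, so treat the low part as negative
    high = ((value - low) >> 12) & 0xFFFFF
    insns = [f"lui {rd}, {high:#x}"]
    if low:
        insns.append(f"addi {rd}, {rd}, {low}")
    return insns


for line in expand_li("x5", 0x45678):
    print(line)
```

For 0x12345FFF the same function emits `lui x5, 0x12346` followed by `addi x5, x5, -1`, which shows the carry adjustment.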
Program Layout

• Programs consist of segments used for different purposes
• Text: holds instructions
• Data: holds statically allocated program data such as variables, strings, etc.

Example contents:
    data:   "cornell cs"
            13
            25
    text:   add x1, x2, x3
            ori x2, x4, 3
            ...

Assembling Programs

• Assembly files consist of a mix of
  + instructions
  + pseudo-instructions
  + assembler (data/layout) directives
  (Assembler lays out binary values in memory based on directives)
• Assembled to an Object File
  • Header
  • Text Segment
  • Data Segment
  • Relocation Information
  • Symbol Table
  • Debugging Information

Example:
            .text
            .ent main
    main:   la $4, Larray
            li $5, 15
            ...
            li $4, 0
            jal exit
            .end main
            .data
    Larray: .long 51, 491, 3991

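To make "lays out binary values in memory based on directives" concrete, here is a small sketch of the bytes a data directive could produce. It assumes a little-endian target and 32-bit `.long` values; the helper names `emit_long` and `emit_string` are made up for this example.

```python
import struct


def emit_long(*values):
    """Bytes for a `.long` directive: one 32-bit little-endian word per value (assumed)."""
    return struct.pack(f"<{len(values)}I", *values)


def emit_string(text):
    """Bytes for a `.string` directive: the characters plus a terminating NUL."""
    return text.encode("ascii") + b"\x00"


# Larray: .long 51, 491, 3991
data_segment = emit_long(51, 491, 3991)
print(data_segment.hex(" ", 4))
print(emit_string("cornell cs"))
```

The label `Larray` then simply names the address of the first of these twelve bytes within the data segment.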
Assembling Programs

• Assembly using a (modified) Harvard architecture
• Need segments since data and program stored together in memory

[Figure: CPU (Registers, ALU, Control) connected by data, address, and control lines to Program Memory and Data Memory]

Takeaway

• Assembly is a low-level task
• Need to assemble assembly language into machine code binary. Requires
  - Assembly language instructions
  - pseudo-instructions
  - And specify layout and data using assembler directives
• Today, we use a modified Harvard Architecture (Von Neumann architecture) that mixes data and instructions in memory
  … but kept in separate segments
  … and has separate caches

Symbols and References

math.c:
    int pi = 3;
    int e = 2;
    static int randomval = 7;

    extern int usrid;
    extern int printf(char *str, ...);

    int square(int x) { … }
    static int is_prime(int x) { … }
    int pick_prime() { … }
    int get_n() { return usrid; }

(extern == defined in another file)

Global labels: Externally visible "exported" symbols
• Can be referenced from other object files
• Exported functions, global variables
• Examples: pi, e, usrid, printf, pick_prime, square, get_n

Local labels: Internally visible only symbols
• Only used within this object file
• static functions, static variables, loop labels, …
• Examples: randomval, is_prime

Handling forward references

Example:
            bne  x1, x2, L          <- Looking for L
            sll  x0, x0, 0
    L:      addi x2, x3, 0x2        <- Found L

The assembler will change this to
            bne  x1, x2, +1
            sll  x0, x0, 0
            addi x2, x3, 0x2

Final machine code
    0x14220001   # bne
    0x00000000   # sll
    0x24620002   # addiu
actually: 000101... 000000... 001001...

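The two-pass idea behind forward references can be sketched as follows: pass 1 records where each label lands, pass 2 rewrites branch targets as offsets. This follows the slide's convention of counting in instructions relative to the instruction after the branch; a real RISC-V assembler encodes byte offsets relative to the branch itself. The function name, the `BRANCHES` set, and the text format are assumptions for illustration.

```python
BRANCHES = {"beq", "bne", "blt", "bge"}


def resolve_labels(lines):
    """Replace label operands of branches with relative offsets (toy two-pass assembler)."""
    labels = {}
    insns = []
    # Pass 1: strip labels and remember the index of the instruction each one names.
    for line in lines:
        line = line.strip()
        if ":" in line:
            label, line = line.split(":", 1)
            labels[label.strip()] = len(insns)
            line = line.strip()
        if line:
            insns.append(line)
    # Pass 2: every label is now known, including ones defined after their use.
    resolved = []
    for index, insn in enumerate(insns):
        op, _, rest = insn.partition(" ")
        operands = [operand.strip() for operand in rest.split(",")]
        if op in BRANCHES and operands[-1] in labels:
            offset = labels[operands[-1]] - (index + 1)
            operands[-1] = f"{offset:+d}"
        resolved.append(f"{op} {', '.join(operands)}")
    return resolved


program = ["bne x1, x2, L", "sll x0, x0, 0", "L: addi x2, x3, 0x2"]
for line in resolve_labels(program):
    print(line)
```

A backward reference works the same way and simply yields a negative offset.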
Object file

Header
• Size and position of pieces of file
Text Segment
• instructions
Data Segment
• static data (local/global vars, strings, constants)
Debugging Information
• line number -> code address map, etc.
Symbol Table
• External (exported) references
• Unresolved (imported) references

Object File Formats

Unix
• a.out
• COFF: Common Object File Format
• ELF: Executable and Linking Format

Windows
• PE: Portable Executable

All support both executable and object files

Objdump disassembly

> mipsel-linux-objdump --disassemble math.o

Disassembly of section .text:

00000000 <get_n>:
     0:  27bdfff8    addiu  sp, sp, -8
     4:  afbe0000    sw     s8, 0(sp)
     8:  03a0f021    move   s8, sp
     c:  3c020000    lui    v0, 0x0
    10:  8c420008    lw     v0, 8(v0)
    14:  03c0e821    move   sp, s8
    18:  8fbe0000    lw     s8, 0(sp)
    1c:  27bd0008    addiu  sp, sp, 8
    20:  03e00008    jr     ra
    24:  00000000    nop

Source of get_n:
    int get_n() { return usrid; }
elsewhere in another file:
    int usrid = 41;

Objdump symbols

> mipsel-linux-objdump --syms math.o

SYMBOL TABLE:
    address  flags    segment   size      name
    00000000 l    df  *ABS*     00000000  math.c
    00000000 l    d   .text     00000000  .text
    00000000 l    d   .data     00000000  .data
    00000000 l    d   .bss      00000000  .bss
    00000008 l     O  .data     00000004  randomval
    00000060 l     F  .text     00000028  is_prime
    00000000 l    d   .rodata   00000000  .rodata
    00000000 l    d   .comment  00000000  .comment
    00000000 g     O  .data     00000004  pi
    00000004 g     O  .data     00000004  e
    00000000 g     F  .text     00000028  get_n
    00000028 g     F  .text     00000038  square
    00000088 g     F  .text     0000004c  pick_prime
    00000000          *UND*     00000000  usrid
    00000000          *UND*     00000000  printf

Flags: [F]unction, [O]bject, [l]ocal, [g]lobal

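A rough sketch of reading such a listing programmatically, to separate local, global, and undefined symbols. It assumes whitespace-separated lines shaped like the ones on this slide (address, optional flags, segment, size, name); real objdump output varies between versions and targets, and `parse_symbol` is a made-up helper.

```python
def parse_symbol(line):
    """Split one symbol-table line into its fields (assumes the slide's layout)."""
    fields = line.split()
    address, section, size, name = fields[0], fields[-3], fields[-2], fields[-1]
    flags = "".join(fields[1:-3])       # empty for undefined symbols
    if section == "*UND*":
        binding = "undefined"
    elif "g" in flags:
        binding = "global"
    else:
        binding = "local"
    return {
        "name": name,
        "address": int(address, 16),
        "size": int(size, 16),
        "section": section,
        "binding": binding,
    }


listing = [
    "00000008 l     O .data  00000004 randomval",
    "00000088 g     F .text  0000004c pick_prime",
    "00000000         *UND*  00000000 usrid",
]
for entry in map(parse_symbol, listing):
    print(entry["name"], entry["binding"], hex(entry["address"]))
```

The undefined entries are exactly the ones the linker still has to resolve against other object files.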
Separate Compilation & Assembly

    sum.c  -> Compiler -> sum.s  -> Assembler -> sum.o
    math.c -> Compiler -> math.s -> Assembler -> math.o
    (source files)        (assembly files)       (obj files)

    sum.o + math.o -> Linker -> executable program sum (exists on disk) -> loader -> process (Executing in Memory)

Linkers

Linker combines object files into an executable file
• Relocate each object's text and data segments (each object file has the illusion of its own address space)
• Resolve as-yet-unresolved symbols
• Record top-level entry point in executable file

End result: a program on disk, ready to execute. E.g.
    ./sum            Linux
    ./sum.exe        Windows
    simulate sum     Class RISC-V simulator

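The three linker steps can be sketched on a toy model in which each object file is a dictionary with a text size, the symbols it defines (as offsets), and the symbols it imports. The dictionary layout, the base address 0x00400000, and the 0x100-byte text sizes are assumptions chosen to mirror the addresses used in the course example; `link` is not a real tool's API.

```python
TEXT_BASE = 0x00400000   # assumed start of the text segment


def link(objects, entry="main"):
    """Relocate text segments, resolve symbols, and record the entry point (toy model)."""
    symbols = {}
    base = TEXT_BASE
    for obj in objects:
        # Relocate: each object's offsets are shifted by where its text lands.
        for name, offset in obj["defined"].items():
            if name in symbols:
                raise ValueError(f"duplicate symbol: {name}")
            symbols[name] = base + offset
        base += obj["text_size"]
    # Resolve: every imported name must be defined by some object.
    wanted = {name for obj in objects for name in obj["undefined"]}
    missing = sorted(wanted - set(symbols))
    if missing:
        raise ValueError(f"unresolved symbols: {', '.join(missing)}")
    return {"entry": symbols[entry], "symbols": symbols}


math_o = {"text_size": 0x100, "defined": {"get_n": 0x20}, "undefined": ["printf"]}
main_o = {"text_size": 0x100, "defined": {"main": 0x00}, "undefined": ["printf", "get_n"]}
io_o = {"text_size": 0x100, "defined": {"printf": 0x00}, "undefined": []}

executable = link([math_o, main_o, io_o])
print(hex(executable["entry"]), hex(executable["symbols"]["get_n"]))
```

Leaving `io_o` out of the list makes `link` fail with an unresolved `printf`, which is the situation a library on the link line normally fixes.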
Static Libraries

Static Library: Collection of object files (think: like a zip archive)

Q: Every program contains the entire library?!?

Linker Example: Loading a Global Variable

main.o
    .text:
        ...
        40  0C000000
        44  21035000
        48  1b80050C
        4C  8C040000
        50  21047002
        54  0C000000
        ...
    Symbol table:
        00  T  main
        00  D  usrid
        *UND*  printf
        *UND*  pi
        *UND*  get_n
    Relocation info:
        40, JAL, printf
        ...
        54, JAL, get_n

math.o
    .text:
        ...
        24  21032040
        28  0C000000
        2C  1b301402
        30  3C040000
        34  34040000
        ...
    Symbol table:
        20  T  get_n
        00  D  pi
        *UND*  printf
        *UND*  usrid
    Relocation info:
        28, JAL, printf
        30, LUI, usrid
        34, LA, usrid

sum.exe
    Entry: 0040 0100
    text:  0040 0000
    data:  1000 0000
    .text:
        0040 0000 (math):    ... 21032040  0C40023C  1b301402  3C041000  34040004 ...
                             LA num: 3C041000 is LUI 1000, 34040004 is ORI 0004
        0040 0100 (main):    ... 0C40023C  21035000  1b80050c  8C048004  21047002  0C400020 ...
        0040 0200 (printf):  10201000  21040330  22500102 ...
    .data:
        1000 0000:  pi     00000003
                    usrid  0077616B

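The LUI/ORI pair in this example shows what applying a relocation means: the linker fills the 16-bit immediate fields left as zero in math.o once the final address of usrid is known. The sketch below assumes usrid ends up at 0x10000004 (inferred from the patched words 3C041000 and 34040004) and uses made-up relocation kind names `upper16` and `lower16`.

```python
def apply_relocation(word, kind, address):
    """Patch the low 16 bits of an instruction word with half of a 32-bit address."""
    if kind == "upper16":       # paired with LUI: top half of the address
        half = (address >> 16) & 0xFFFF
    elif kind == "lower16":     # paired with ORI: bottom half of the address
        half = address & 0xFFFF
    else:
        raise ValueError(f"unknown relocation kind: {kind}")
    return (word & 0xFFFF0000) | half


USRID_ADDRESS = 0x10000004   # assumed final address of usrid in the data segment

print(f"{apply_relocation(0x3C040000, 'upper16', USRID_ADDRESS):08X}")
print(f"{apply_relocation(0x34040000, 'lower16', USRID_ADDRESS):08X}")
```

Because ORI zero-extends its immediate, the two halves can be inserted independently; a sign-extending second instruction would need a carry adjustment in the upper half.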
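From source files to a running process (with libraries)

    sum.c  -> Compiler -> sum.s  -> Assembler -> sum.o
    math.c -> Compiler -> math.s -> Assembler -> math.o
                          io.s   -> Assembler -> io.o
                                    libraries:   libc.o, libm.o
    (C source files)      (assembly files)       (obj files)

    obj files -> Linker -> executable program sum.exe (exists on disk) -> loader -> process (Executing in Memory)
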
Loaders

Loader reads executable from disk into memory
• Initializes registers, stack, arguments to first function
• Jumps to entry-point

Part of the Operating System (OS)

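A heavily simplified model of those loader steps: copy each segment to its address, set up a stack pointer and the first argument, and point the program counter at the entry point. The executable layout, the stack address 0x7FFFFFF0, and passing the argument count directly in a0 are all assumptions for illustration; a real OS loader also maps pages, builds the argument vector on the stack, and more.

```python
def load(executable, argv, stack_top=0x7FFFFFF0):
    """Place segments in a sparse memory and return the initial register state (toy model)."""
    memory = {}
    for segment in executable["segments"]:
        for offset, byte in enumerate(segment["bytes"]):
            memory[segment["address"] + offset] = byte
    registers = {
        "pc": executable["entry"],   # jump to the entry point
        "sp": stack_top,             # hypothetical initial stack pointer
        "a0": len(argv),             # argument count for the first function
    }
    return memory, registers


sum_exe = {
    "entry": 0x00400100,
    "segments": [
        {"address": 0x00400000, "bytes": bytes.fromhex("21032040")},   # start of text
        {"address": 0x10000000, "bytes": bytes.fromhex("00000003")},   # pi in data
    ],
}
memory, registers = load(sum_exe, argv=["./sum"])
print(hex(registers["pc"]), registers["a0"], memory[0x10000003])
```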
Shared Libraries

Q: Every program contains parts of same library?!?

Static and Dynamic Linking

Static linking
• Big executable files (all/most of needed libraries inside)
• Don't benefit from updates to library
• No load-time linking

Dynamic linking
• Small executable files (just point to shared library)
• Library update benefits all programs that use it
• Load-time cost to do final linking
  - But dll code is probably already in memory
  - And can do the linking incrementally, on-demand

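"Incrementally, on-demand" linking can be pictured as a table of stubs that look a function up the first time it is called and reuse the result afterwards. The class below is only a toy model of that idea: the dictionary stands in for a shared library, and `LazyBinder`, `call`, and `lookups` are invented names.

```python
class LazyBinder:
    """Resolve library functions on first call and cache the binding (toy model)."""

    def __init__(self, library):
        self.library = library   # name -> callable, stands in for a shared library
        self.bound = {}          # bindings resolved so far
        self.lookups = 0         # how many times the slow lookup path ran

    def call(self, name, *args):
        if name not in self.bound:
            # First use: pay the linking cost once, then remember the result.
            self.lookups += 1
            self.bound[name] = self.library[name]
        return self.bound[name](*args)


libm = {"square": lambda x: x * x}
binder = LazyBinder(libm)
print(binder.call("square", 7), binder.call("square", 8), binder.lookups)
```

Functions that are never called are never looked up, which is where the on-demand saving comes from.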
Takeaway

• Compiler produces assembly files (contain RISC-V assembly, pseudo-instructions, directives, etc.)
• Assembler produces object files (contain RISC-V machine code, missing symbols, some layout information, etc.)
• Linker joins object files into one executable file (contains RISC-V machine code, no missing symbols, some layout information)
• Loader puts program into memory, jumps to 1st insn, and starts executing a process (machine code)