Assemblers and Linkers Professor Jennifer Rexford COS 217

Goals of This Lecture
• Compilation process
  o Compile, assemble, archive, link, execute
• Assembling
  o Representing instructions
    – Prefix, opcode, addressing modes, operands
  o Translating labels into memory addresses
    – Symbol table, and filling in local addresses
  o Connecting symbolic references with definitions
    – Relocation records
  o Specifying the regions of memory
    – Generating sections (data, BSS, text, etc.)
• Linking
  o Concatenating object files
  o Patching references

Compilation Pipeline
• Compiler (gcc): .c → .s
  o High-level to assembly language
• Assembler (as): .s → .o
  o Assembly to machine language
• Archiver (ar): .o → .a
  o Object files into single library
• Linker (ld): .o + .a → a.out
  o Builds an executable file
• Execution (execlp)
  o Loads executable and starts it
[Figure: .c → Compiler → .s → Assembler → .o → Archiver → .a; .o and .a → Linker/Loader → a.out → Execution]

Assembler
• Assembly language
  o A symbolic representation of machine instructions
• Machine language
  o Contains everything needed to link, load, and execute the program
• Assembler
  o Translates assembly language into machine language
    – Translate instruction mnemonics into op-codes
    – Translate symbolic names for memory locations
  o Stores the result in an object file (.o)

General IA-32 Instruction Format
    Field                  Size
    Instruction prefixes   Up to 4 prefixes of 1 byte each (optional)
    Opcode                 1, 2, or 3 bytes
    ModR/M                 1 byte (if required)
    SIB                    1 byte (if required)
    Displacement           0, 1, 2, or 4 bytes
    Immediate              0, 1, 2, or 4 bytes

    ModR/M byte: bits 7-6 Mod, bits 5-3 Reg/Opcode, bits 2-0 R/M
    SIB byte:    bits 7-6 Scale, bits 5-3 Index, bits 2-0 Base
• Prefixes: we won't worry about these for now
• Opcode
• ModR/M and SIB (scale-index-base): for memory operands
• Displacement and immediate: depending on opcode, ModR/M and SIB
• Note: byte order is little-endian (low-order byte of word at lower addresses)

Example: Push on to Stack
• Assembly language: pushl %edx
• Machine code:
  o IA-32 has a separate opcode for push for each register operand
    – 50: pushl %eax
    – 51: pushl %ecx
    – 52: pushl %edx (0101 0010)
    – …
  o Results in a one-byte instruction
• Observe: sometimes one assembly language instruction can map to a group of different opcodes
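
The opcode pattern above (50, 51, 52, …) is the base opcode 0x50 plus the register number. A minimal sketch, assuming the standard IA-32 register numbering; the names `REGISTERS` and `encode_pushl` are hypothetical and only for illustration:

```python
# Register numbers 0..7 in IA-32 encoding order.
REGISTERS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def encode_pushl(register):
    """Return the one-byte encoding of pushl %register (0x50 + register number)."""
    return bytes([0x50 + REGISTERS.index(register)])

print(encode_pushl("edx").hex())  # prints 52
```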

Example: Load Effective Address
• Assembly language: leal (%eax,%eax,4), %eax
• Machine code:
  o Byte 1: 8D (opcode for "load effective address"): 1000 1101
  o Byte 2: 04 (dest %eax, with scale-index-base): 0000 0100
  o Byte 3: 80 (scale=4, index=%eax, base=%eax): 1000 0000
• Load the address %eax + 4 * %eax into register %eax
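
Bytes 2 and 3 of this instruction can be pulled apart with shifts and masks. The following is a small sketch of that field extraction; the helper names `split_modrm` and `split_sib` are made up for this example:

```python
def split_modrm(byte):
    """Split a ModR/M byte into its (mod, reg, rm) fields."""
    mod = (byte >> 6) & 0b11    # bits 7-6
    reg = (byte >> 3) & 0b111   # bits 5-3
    rm = byte & 0b111           # bits 2-0
    return mod, reg, rm

def split_sib(byte):
    """Split a SIB byte into (scale factor, index, base)."""
    scale = 1 << ((byte >> 6) & 0b11)  # 2-bit field selects a factor of 1, 2, 4, or 8
    index = (byte >> 3) & 0b111        # bits 5-3
    base = byte & 0b111                # bits 2-0
    return scale, index, base

print(split_modrm(0x04))  # prints (0, 0, 4): reg 0 is %eax, R/M 4 means "SIB follows"
print(split_sib(0x80))    # prints (4, 0, 0): scale 4, index %eax, base %eax
```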

Example: Movl
• movl %ecx, %ebx (mov r/m32, r32; mode %_, %_)
    Opcode      Mod   Reg   R/M
    1000 1001   11    001   011
                      ecx   ebx
• movl 12(%ecx), %ebx (mov r32, r/m32; mode disp8(%_), %_)
    Opcode      Mod   Reg   R/M     Displacement
    1000 1011   01    011   001     00001100
                      ebx   (ecx)   12
• Field widths in bits: opcode 4+4, Mod 2, Reg 3, R/M 3, displacement 8
    Value   Mod=11 table   Mod=01 table
    0       EAX            [EAX]+disp8
    1       ECX            [ECX]+disp8
    2       EDX            [EDX]+disp8
    3       EBX            [EBX]+disp8
    4       ESP            [--]+disp8
    5       EBP            [EBP]+disp8
    6       ESI            [ESI]+disp8
    7       EDI            [EDI]+disp8
Reference: IA-32 Intel Architecture Software Developer's Manual, volume 2, page 2-1, page 2-6, and page 3-441
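
As a rough sketch of how an assembler might build these two encodings, the code below packs the Mod, Reg, and R/M fields into one byte. It assumes opcode 0x89 (mov r/m32, r32) for the register-to-register form and 0x8B (mov r32, r/m32) for the load form; the function names are hypothetical, and only the cases shown on this slide are handled.

```python
REGISTERS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def modrm(mod, reg, rm):
    """Pack the 2-bit Mod, 3-bit Reg, and 3-bit R/M fields into one byte."""
    return (mod << 6) | (reg << 3) | rm

def movl_reg_to_reg(src, dst):
    """Encode movl %src, %dst with Mod=11 (both operands are registers)."""
    return bytes([0x89, modrm(0b11, REGISTERS.index(src), REGISTERS.index(dst))])

def movl_load_disp8(disp, base, dst):
    """Encode movl disp(%base), %dst with Mod=01 (8-bit displacement)."""
    if base == "esp":
        raise ValueError("R/M=100 selects a SIB byte, which this sketch omits")
    if not -128 <= disp <= 127:
        raise ValueError("displacement does not fit in 8 bits")
    return bytes([0x8B, modrm(0b01, REGISTERS.index(dst), REGISTERS.index(base)),
                  disp & 0xFF])

print(movl_reg_to_reg("ecx", "ebx").hex(" "))      # prints 89 cb
print(movl_load_disp8(12, "ecx", "ebx").hex(" "))  # prints 8b 59 0c
```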

Example: Mov Immediate to Memory
• movb $97, 999 (mov r/m8, imm8; mode disp32)
    Field          Width (bits)   Value
    Opcode         4+4            1100 0110
    Mod            2              00
    Reg            3              000
    R/M            3              101 (disp32)
    Displacement   32             00000000 00000000 00000011 11100111 (999)
    Immediate      8              01100001 (97)
• Mod=00 table
    Value   Operand
    0       [EAX]
    1       [ECX]
    2       [EDX]
    3       [EBX]
    4       [--]
    5       disp32
    6       [ESI]
    7       [EDI]

Encoding as Byte String
• movb $97, 999
    Field          Bits                                   Bytes
    Opcode         1100 0110                              C6
    ModR/M         00 000 101                             05
    Displacement   00000000 00000000 00000011 11100111    E7 03 00 00 (little-endian)
    Immediate      01100001                               61
• Byte string: C6 05 E7 03 00 00 61
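
The byte string can be reproduced with Python's `struct` module, where the `<` prefix requests little-endian packing with no padding. This is only an illustrative sketch; the function name `encode_movb_imm_to_abs` is invented for the example.

```python
import struct

def encode_movb_imm_to_abs(imm8, address):
    """Encode movb $imm8, address: opcode C6, Mod=00, Reg=000, R/M=101 (disp32)."""
    opcode = 0xC6
    modrm = (0b00 << 6) | (0b000 << 3) | 0b101
    # B = 1 byte, I = 4 bytes; "<" stores the low-order byte first.
    return struct.pack("<BBIB", opcode, modrm, address, imm8)

print(encode_movb_imm_to_abs(97, 999).hex(" "))  # prints c6 05 e7 03 00 00 61
```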

Assembly Language
• C source, with grade located at address 999:
    char grade = 67;
    …
    grade = 'a';
• Assembly language:
        .globl grade
        .data
    grade:
        .byte 67
        .text
        ...
        movb $97, grade
        ...
• Machine code for movb $97, 999: C6 05 E7 03 00 00 61

Symbol Manipulation
        .text
        ...
        movl count, %eax
        ...
        .data
    count:
        .word 0
        ...
        .globl loop
    loop:
        cmpl %edx, %eax
        jge done
        pushl %edx
        call foo
        jmp loop
    done:
• Create labels and remember their addresses
• Deal with the "forward reference problem"

Dealing with Forward References
• Most assemblers have two passes
  o Pass 1: symbol definition
  o Pass 2: instruction assembly
• Or, alternatively,
  o Pass 1: instruction assembly
  o Pass 2: patch the cross-reference
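
The first scheme (define symbols, then assemble) might look roughly like the sketch below. It is a toy, not a real assembler: the instruction sizes in `SIZES` are assumptions for the loop example (a 1-byte displacement for jge, 4-byte displacements for call and jmp), and all names are hypothetical.

```python
# Assumed instruction sizes in bytes for the loop example.
SIZES = {"cmpl": 2, "jge": 2, "pushl": 1, "call": 5, "jmp": 5}
BRANCHES = {"jge", "call", "jmp"}

PROGRAM = [
    "loop:",
    "cmpl %edx, %eax",
    "jge done",
    "pushl %edx",
    "call foo",
    "jmp loop",
    "done:",
]

def pass1(lines):
    """Pass 1: record the address of every label; emit nothing."""
    symbols, address = {}, 0
    for line in lines:
        if line.endswith(":"):
            symbols[line[:-1]] = address
        else:
            address += SIZES[line.split()[0]]
    return symbols

def pass2(lines, symbols):
    """Pass 2: look up each branch target; None marks a symbol with no local definition."""
    resolved, address = [], 0
    for line in lines:
        if line.endswith(":"):
            continue
        mnemonic, *operands = line.split()
        if mnemonic in BRANCHES:
            target = operands[0]
            resolved.append((address, mnemonic, target, symbols.get(target)))
        address += SIZES[mnemonic]
    return resolved

symbols = pass1(PROGRAM)
references = pass2(PROGRAM, symbols)
print(symbols)     # prints {'loop': 0, 'done': 15}
print(references)
```

Because pass 1 has already seen `done`, the forward reference in `jge done` resolves in pass 2 without any patching; `foo` stays unresolved because no label in this file defines it.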

Implementing an Assembler
[Figure: the .s file (on disk) is read by the input step into an in-memory list of instruction structures; the assemble step turns these into a symbol table and data, text, and bss sections (in memory); the output step writes them to the .o file (on disk).]

Input Functions
• Read assembly language and produce list of instructions
[Figure: assembler data flow, .s file → input → assemble → output → .o file]

Input Functions
• Lexical analyzer
  o Group a stream of characters into tokens
      add %g1, 10, %g2
• Syntactic analyzer
  o Check the syntax of the program
      <MNEMONIC><REG><COMMA><REG>
• Instruction list producer
  o Produce an in-memory list of instruction data structures
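
A lexical analyzer for lines like the one above can be sketched with a single regular expression. The token names MNEMONIC, REG, and COMMA follow the slide; NUMBER and the exact patterns are assumptions made for this example.

```python
import re

# One named group per token class; SPACE is matched and then discarded.
TOKEN_RE = re.compile(r"""
    (?P<REG>%[a-z][a-z0-9]*)     |
    (?P<NUMBER>-?\d+)            |
    (?P<MNEMONIC>[a-z][a-z0-9]*) |
    (?P<COMMA>,)                 |
    (?P<SPACE>\s+)
""", re.VERBOSE)

def tokenize(line):
    """Group the characters of one assembly line into (kind, text) tokens."""
    tokens, pos = [], 0
    while pos < len(line):
        match = TOKEN_RE.match(line, pos)
        if match is None:
            raise ValueError(f"unexpected character {line[pos]!r} at column {pos}")
        if match.lastgroup != "SPACE":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

print(tokenize("add %g1, 10, %g2"))
```

The syntactic analyzer would then check the sequence of token kinds against the patterns allowed for each mnemonic.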

Instruction Assembly
    Address   Source                 Machine code
              ...
    [0]       loop:
    [0]         cmpl %edx, %eax      39 D0
    [2]         jge done             7D disp?   (1-byte displacement)
    [4]         pushl %edx           52
    [5]         call foo             E8 disp?   (4-byte displacement)
    [10]        jmp loop             E9 disp?   (4-byte displacement)
    [15]      done:
• How to compute the address displacements?

Symbol Table
    Label   Type     Address
    loop    def      0
    done    disp_s   2
    foo     disp_l   5
    loop    disp_l   10
    done    def      15

    Address   Source                 Machine code
              .globl loop
    [0]       loop:
    [0]         cmpl %edx, %eax      39 D0
    [2]         jge done             7D disp_s
    [4]         pushl %edx           52
    [5]         call foo             E8 disp_l
    [10]        jmp loop             E9 disp_l
    [15]      done:

Symbol Table
    Label   Type     Address
    loop    def      0
    done    disp_s   2
    foo     disp_l   5
    loop    disp_l   10
    done    def      15

    Address   Source                 Machine code
              .globl loop
    [0]       loop:
    [0]         cmpl %edx, %eax      39 D0
    [2]         jge done             7D +13
    [4]         pushl %edx           52
    [5]         call foo             E8 disp_l
    [10]        jmp loop             E9 disp_l
    [15]      done:

Filling in Local Addresses
    Label   Type     Address
    loop    def      0
    done    disp_s   2
    foo     disp_l   5
    loop    disp_l   10
    done    def      15

    Address   Source                 Machine code
              .globl loop
    [0]       loop:
    [0]         cmpl %edx, %eax      39 D0
    [2]         jge done             7D +13
    [4]         pushl %edx           52
    [5]         call foo             E8 disp_l
    [10]        jmp loop             E9 -10
    [15]      done:
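
The filling-in step can be sketched as a walk over the symbol table. The sketch keeps the slides' simplified arithmetic, where a displacement is the definition address minus the address recorded for the reference; real IA-32 relative branches are measured from the address of the following instruction, so actual encoded values would differ. The names here are hypothetical.

```python
# (label, type, address) rows, as in the symbol table above.
SYMBOL_TABLE = [
    ("loop", "def",    0),
    ("done", "disp_s", 2),
    ("foo",  "disp_l", 5),
    ("loop", "disp_l", 10),
    ("done", "def",    15),
]

def fill_local_addresses(table):
    """Resolve references whose label is defined in this file; return the rest."""
    definitions = {label: addr for label, kind, addr in table if kind == "def"}
    patches, unresolved = {}, []
    for label, kind, addr in table:
        if kind == "def":
            continue
        if label in definitions:
            # Slide convention: displacement = definition address - reference address.
            patches[addr] = definitions[label] - addr
        else:
            unresolved.append((label, kind, addr))
    return patches, unresolved

patches, unresolved = fill_local_addresses(SYMBOL_TABLE)
print(patches)     # prints {2: 13, 10: -10}
print(unresolved)  # prints [('foo', 'disp_l', 5)]
```

Whatever remains in `unresolved` cannot be fixed by the assembler alone and has to be handed to the linker.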

Relocation Records
    Label   Type     Address
    loop    def      0
    foo     disp_l   5

    Address   Source                 Machine code
              .globl loop
    [0]       loop:
    [0]         cmpl %edx, %eax      39 D0
    [2]         jge done             7D +13
    [4]         pushl %edx           52
    [5]         call foo             E8 disp_l
    [10]        jmp loop             E9 -10
    [15]      done:

Assembler Directives
• Delineate segments
  o .section
• Allocate/initialize data and bss segments
  o .word .half .byte
  o .ascii .asciz
  o .align .skip
• Make symbols in text externally visible
  o .global

Assemble into Sections
• Process instructions and directives to produce object file output structures
[Figure: assembler data flow, .s file → input → assemble → output → .o file]

Output Functions
• Machine language output
  o Write symbol table and sections into object file
[Figure: assembler data flow, .s file → input → assemble → output → .o file]

ELF: Executable and Linking Format
• Format of .o and a.out files
  o Output by the assembler
  o Input and output of linker
• File layout
    ELF Header
    Program Hdr Table   (optional for .o files)
    Section 1
    ...
    Section n
    Section Hdr Table   (optional for a.out files)
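
The start of the ELF header is enough to tell a .o file from an executable. Below is a minimal sketch that inspects only the identification bytes and the 2-byte file type at offset 16; the function name `describe_elf` and the synthetic `sample` bytes are made up for this example, and the rest of the header is ignored.

```python
import struct

# e_type values: 1 = relocatable, 2 = executable, 3 = shared object.
ELF_TYPES = {1: "relocatable (.o)", 2: "executable", 3: "shared object"}

def describe_elf(header):
    """Summarize the first 18 bytes of an ELF file."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = header[4], header[5]   # 1 = 32-bit / little-endian
    byte_order = "<" if ei_data == 1 else ">"
    (e_type,) = struct.unpack_from(byte_order + "H", header, 16)
    return {
        "bits": 32 if ei_class == 1 else 64,
        "little_endian": ei_data == 1,
        "type": ELF_TYPES.get(e_type, "other"),
    }

# Synthetic prefix of a 32-bit little-endian relocatable file (not a real .o).
sample = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9) + struct.pack("<H", 1)
print(describe_elf(sample))
```

To try it on a real file, pass in the first 18 bytes read in binary mode.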

Invoking the Linker
• ld bar.o main.o -l libc.a -o a.out
  o bar.o main.o: compiled program modules
  o -l libc.a: library (contains more .o files)
  o -o a.out: output (also in ".o" format, but no undefined symbols)
• Invoked automatically by gcc, but you can call it directly if you like.

Multiple Object Files
• main.o (start: main; instruction addresses [0] [2] [4] [7] [8] [12] [15], end [20])
    Label   Type     Address
    main    def      0
    foo     def      8
    loop    disp_l   15
• bar.o
    Label   Type     Address
    loop    def      0
    foo     disp_l   5

    Address   Machine code
    [0]       39 D0
    [2]       7D +13
    [4]       52
    [5]       E8 disp_l
    [10]      E9 -10
    [15]      (end)

Step 1: Pick An Order
• bar.o is placed first, at addresses [0] to [15]
    Label   Type     Address
    loop    def      0
    foo     disp_l   5
• main.o follows, at addresses [15] [17] [19] [22] [23] [27] [30], end [35]
    Label   Type     Address
    main    def      15+0
    foo     def      15+8
    loop    disp_l   15+15

Step 2: Patch
• main.o (addresses [15] to [35])
    Label   Type     Address   Patched displacement
    main    def      15+0
    foo     def      15+8
    loop    disp_l   15+15     0 - (15+15) = -30
• bar.o (addresses [0] to [15])
    Label   Type     Address   Patched displacement
    loop    def      0
    foo     disp_l   5         15+8 - 5 = 18

    Address   Machine code
    [0]       39 D0
    [2]       7D +13
    [4]       52
    [5]       E8 +18
    [10]      E9 -10
    [15]      (end)

Step 3: Concatenate
• a.out (start: main at 15+0)
    Address                          Contents
    [0]                              39 D0
    [2]                              7D +13
    [4]                              52
    [5]                              E8 +18
    [10]                             E9 -10
    [15] [17] [19] [22] [23] [27]    code from main.o
    [30]                             reference to loop, patched to -30
    [35]                             (end)
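
The three linking steps can be sketched on the same two object files. This is a toy model, not how ld is implemented: each object file is reduced to a size, its definitions, and its relocation records, the dictionaries and the `link` function are hypothetical, and displacements follow the slides' convention of target address minus reference address.

```python
BAR_O = {"name": "bar.o", "size": 15,
         "defs": {"loop": 0}, "relocs": [("foo", 5)]}
MAIN_O = {"name": "main.o", "size": 20,
          "defs": {"main": 0, "foo": 8}, "relocs": [("loop", 15)]}

def link(objects):
    """Return (absolute symbol addresses, patched displacements, total size)."""
    # Step 1: pick an order by giving each file a base address in turn.
    bases, next_base = {}, 0
    for obj in objects:
        bases[obj["name"]] = next_base
        next_base += obj["size"]

    # Collect every definition at its absolute address.
    addresses = {}
    for obj in objects:
        for label, offset in obj["defs"].items():
            if label in addresses:
                raise ValueError(f"duplicate definition of {label}")
            addresses[label] = bases[obj["name"]] + offset

    # Step 2: patch each relocation record.
    patches = {}
    for obj in objects:
        for label, offset in obj["relocs"]:
            if label not in addresses:
                raise ValueError(f"undefined symbol {label}")
            site = bases[obj["name"]] + offset
            patches[site] = addresses[label] - site

    # Step 3: the concatenated image spans addresses 0 .. next_base.
    return addresses, patches, next_base

addresses, patches, size = link([BAR_O, MAIN_O])
print(addresses)  # prints {'loop': 0, 'main': 15, 'foo': 23}
print(patches)    # prints {5: 18, 30: -30}
print(size)       # prints 35
```

Linking in the other order would give different addresses and displacements but an equally valid executable.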

Summary
• Assembler
  o Read assembly language
  o Two-pass execution (resolve symbols)
  o Produce object file
• Linker
  o Order object codes
  o Patch and resolve displacements
  o Produce executable