Assembly Language Part 2
Professor Jennifer Rexford
COS 217
Goals of Today’s Lecture
• Machine language
  o Encoding the operation and the operands
  o Simpler MIPS instruction set as an example
• More on IA-32 assembly language
  o Different sizes of data
  o Example instructions
  o Addressing modes
• Layout of an assembly language program
Machine Language
Using MIPS Architecture as an Example (since it has a simpler instruction set than IA-32)
Three Levels of Languages
• High-level languages (e.g., Java and C)
  o Easier programming by describing operations in a more natural, human-oriented notation
  o Increased portability of the code
• Assembly language (e.g., IA-32 and MIPS)
  o Tied to the specifics of the underlying machine
  o Instructions and names to make code human readable
• Machine language
  o Also tied to the specifics of the underlying machine
  o In a binary format the computer can read and execute
  o Every instruction is a sequence of one or more numbers
Machine-Language Instructions
An ADD instruction (assembly): add r1 = r2 + r3
  o Opcode: add
  o Operands: r1, r2, r3
Parts of the instruction:
• Opcode (verb): what operation to perform
• Operands (nouns): what to operate upon
• Source operands: where values come from
• Destination operand: where to deposit data values
Machine-Language Instruction
• Opcode
  o What to do
• Source operand(s)
  o Immediate (in the instruction itself)
  o Register
  o Memory location
  o I/O port
• Destination operand
  o Register
  o Memory location
  o I/O port
• Assembly syntax
    opcode source1, [source2,] destination
MIPS Has Three Kinds of 32-bit Instructions
• R: Registers
  o Two source registers (rs and rt)
  o One destination register (rd)
  o E.g., “rd = rs + rt” or “rd = rs & rt” or “rd = rs xor rt”
  Field layout (width in bits):
    op (6) | rs (5) | rt (5) | rd (5) | shamt (5) | funct (6)
  o op and funct: operation and specific variant
  o shamt: shift amount
MIPS Has Three Kinds of 32-bit Instructions
• I: Immediate, transfer, branch
  o One source register (rs) and one 16-bit constant (imm)
  o One destination register (rd)
  o E.g., “rd = rs + imm” or “rd = rs & imm”
  o E.g., “rd = MEM[rs + imm]” (treating rs+imm as address)
  o E.g., “jump to address contained in rs” (rs as address)
  o E.g., “jump to word imm if rs is 0” (i.e., change instruction pointer)
  Field layout (width in bits):
    op (6) | rs (5) | rd (5) | address/immediate (16)
MIPS Has Three Kinds of 32-bit Instructions
• J: Jump
  o One 26-bit constant (imm) counting 32-bit words (a 28-bit byte address once shifted left by 2)
  o E.g., “jump by imm words” (i.e., change the instruction pointer)
  Field layout (width in bits):
    op (6) | target address (26)
MIPS “Add” Instruction Encoding
Add registers 18 and 19, and store the result in register 17.
add is an R instruction:
    op = 0 | rs = 18 | rt = 19 | rd = 17 | shamt = 0 | funct = 32
MIPS “Subtract” Instruction Encoding
Subtract register 19 from register 18, and store the result in register 17.
sub is an R instruction:
    op = 0 | rs = 18 | rt = 19 | rd = 17 | shamt = 0 | funct = 34
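The two encodings above can be checked by packing the six R-format fields into one 32-bit word. The following is a minimal sketch, assuming the standard field widths of 6, 5, 5, 5, 5, and 6 bits; the function name encode_r is hypothetical and not part of the lecture.

```c
#include <stdint.h>

/* Pack the six R-format fields into one 32-bit MIPS instruction word.
   Assumed field widths: op 6, rs 5, rt 5, rd 5, shamt 5, funct 6 bits.
   encode_r is an illustrative name, not a real library function. */
uint32_t encode_r(uint32_t op, uint32_t rs, uint32_t rt,
                  uint32_t rd, uint32_t shamt, uint32_t funct)
{
    return ((op    & 0x3Fu) << 26) |   /* bits 31-26 */
           ((rs    & 0x1Fu) << 21) |   /* bits 25-21 */
           ((rt    & 0x1Fu) << 16) |   /* bits 20-16 */
           ((rd    & 0x1Fu) << 11) |   /* bits 15-11 */
           ((shamt & 0x1Fu) <<  6) |   /* bits 10-6  */
            (funct & 0x3Fu);           /* bits 5-0   */
}
```

With the field values from the slides, encode_r(0, 18, 19, 17, 0, 32) yields 0x02538820 for add, and changing funct to 34 yields 0x02538822 for sub.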
Greater Detail on IA-32 Assembly: Instruction Set and Data Sizes
Earlier Example
Register assignment: n in %edx, count in %ecx

C code:
    count = 0;
    while (n > 1) {
        count++;
        if (n & 1)
            n = n*3 + 1;
        else
            n = n/2;
    }

Assembly code:
            movl  $0, %ecx
    .loop:
            cmpl  $1, %edx
            jle   .endloop
            addl  $1, %ecx
            movl  %edx, %eax
            andl  $1, %eax
            je    .else
            movl  %edx, %eax
            addl  %eax, %edx
            addl  %eax, %edx
            addl  $1, %edx
            jmp   .endif
    .else:
            sarl  $1, %edx
    .endif:
            jmp   .loop
    .endloop:
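To make the correspondence between the C loop and the assembly easier to follow, here is a sketch of the same loop written one register-level step at a time. The function name count_steps and the variable tmp (standing in for %eax) are illustrative assumptions; the comments pair each statement with the instruction it is meant to mirror.

```c
/* Mirrors the assembly loop step by step.
   n stands in for %edx, count for %ecx, tmp for %eax.
   count_steps is a hypothetical name used only for this sketch. */
int count_steps(int n)
{
    int count = 0;                  /* movl $0, %ecx */
    while (n > 1) {                 /* cmpl $1, %edx ; jle .endloop */
        int tmp;
        count += 1;                 /* addl $1, %ecx */
        tmp = n & 1;                /* movl %edx, %eax ; andl $1, %eax */
        if (tmp != 0) {             /* je .else is taken when tmp == 0 */
            tmp = n;                /* movl %edx, %eax */
            n = n + tmp + tmp + 1;  /* n*3 + 1 built from additions */
        } else {
            n = n >> 1;             /* sarl $1, %edx : n/2 for positive n */
        }
    }
    return count;
}
```

Starting from n = 6 the values visited are 3, 10, 5, 16, 8, 4, 2, 1, so the function returns 8.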
Size of Variables
• Data types in high-level languages vary in size
  o Character: 1 byte
  o Short, int, and long: varies, depending on the computer
  o Pointers: typically 4 bytes
  o Struct: arbitrary size, depending on the elements
• Implications
  o Need to be able to store and manipulate data in multiple sizes
  o Byte (1 byte), word (2 bytes), and long (4 bytes)
  o Separate assembly-language instructions
    – E.g., addb, addw, addl
  o Separate ways to access (parts of) a 4-byte register
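Four-Byte Memory Words
[Figure: memory drawn as a column of four-byte words, addresses running from 0 to 2^32 - 1. Within a word, bits 31-24, 23-16, 15-8, and 7-0 hold Byte 3, Byte 2, Byte 1, and Byte 0; the next word holds Byte 7 through Byte 4.]
Byte order is little endian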
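Little-endian order means the least significant byte of a four-byte value sits at the lowest address. The sketch below spells that layout out with explicit shifts, so it behaves the same on any host; store_le32 and load_le32 are hypothetical helper names.

```c
#include <stdint.h>

/* Write a 32-bit value into four consecutive bytes in little-endian order:
   bytes[0] (lowest address) receives bits 7-0, bytes[3] receives bits 31-24. */
void store_le32(uint8_t bytes[4], uint32_t value)
{
    bytes[0] = (uint8_t)(value & 0xFFu);
    bytes[1] = (uint8_t)((value >> 8) & 0xFFu);
    bytes[2] = (uint8_t)((value >> 16) & 0xFFu);
    bytes[3] = (uint8_t)((value >> 24) & 0xFFu);
}

/* Reassemble the 32-bit value from four little-endian bytes. */
uint32_t load_le32(const uint8_t bytes[4])
{
    return (uint32_t)bytes[0] |
           ((uint32_t)bytes[1] << 8) |
           ((uint32_t)bytes[2] << 16) |
           ((uint32_t)bytes[3] << 24);
}
```

Storing 0x11223344 places 0x44 in bytes[0] and 0x11 in bytes[3].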
IA-32 General-Purpose Registers
    32-bit (bits 31-0)   16-bit (bits 15-0)   8-bit (bits 15-8)   8-bit (bits 7-0)
    EAX                  AX                   AH                  AL
    EBX                  BX                   BH                  BL
    ECX                  CX                   CH                  CL
    EDX                  DX                   DH                  DL
    ESI                  SI
    EDI                  DI
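The register names are overlapping views of the same 32 bits. A small sketch, with hypothetical helper names, shows which bits AL, AH, and AX select out of EAX, and that writing AL leaves the upper 24 bits alone.

```c
#include <stdint.h>

/* Views of a 32-bit register value (e.g., EAX). Names are illustrative. */
uint8_t reg_low8(uint32_t e)    /* AL: bits 7-0 */
{
    return (uint8_t)(e & 0xFFu);
}

uint8_t reg_high8(uint32_t e)   /* AH: bits 15-8 */
{
    return (uint8_t)((e >> 8) & 0xFFu);
}

uint16_t reg_low16(uint32_t e)  /* AX: bits 15-0 */
{
    return (uint16_t)(e & 0xFFFFu);
}

/* Writing AL replaces bits 7-0 and leaves bits 31-8 unchanged. */
uint32_t write_low8(uint32_t e, uint8_t value)
{
    return (e & 0xFFFFFF00u) | value;
}
```

For EAX = 0x12345678, AL is 0x78, AH is 0x56, and AX is 0x5678.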
Arithmetic Instructions
• Simple instructions
  o add{b,w,l} source, dest        dest = source + dest
  o sub{b,w,l} source, dest        dest = dest - source
  o inc{b,w,l} dest                dest = dest + 1
  o dec{b,w,l} dest                dest = dest - 1
  o neg{b,w,l} dest                dest = -dest
  o cmp{b,w,l} source1, source2    source2 - source1 (result discarded, flags set)
• Multiply
  o mul (unsigned) or imul (signed)
      mull %ebx     # edx:eax = eax * ebx
• Divide
  o div (unsigned) or idiv (signed)
      idivl %ebx    # eax = edx:eax / ebx, edx = remainder
• Many more in the Intel manual (volume 2)
  o adc, sbb, decimal arithmetic instructions
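The multiply and divide instructions use the register pair edx:eax as one 64-bit quantity. The sketch below emulates the unsigned forms (mull and divl) rather than the signed idiv shown on the slide, to keep the arithmetic simple; the type reg_pair and both function names are assumptions made for illustration.

```c
#include <stdint.h>

/* The edx:eax register pair, with edx holding the upper 32 bits. */
typedef struct {
    uint32_t edx;
    uint32_t eax;
} reg_pair;

/* Unsigned multiply: edx:eax = eax * ebx (full 64-bit product). */
reg_pair emulate_mull(uint32_t eax, uint32_t ebx)
{
    uint64_t product = (uint64_t)eax * ebx;
    reg_pair r;
    r.edx = (uint32_t)(product >> 32);
    r.eax = (uint32_t)product;
    return r;
}

/* Unsigned divide: eax = edx:eax / ebx, edx = edx:eax % ebx.
   The caller must ensure ebx != 0 and that the quotient fits in 32 bits;
   real hardware raises a divide error in those cases. */
reg_pair emulate_divl(uint32_t edx, uint32_t eax, uint32_t ebx)
{
    uint64_t dividend = ((uint64_t)edx << 32) | eax;
    reg_pair r;
    r.eax = (uint32_t)(dividend / ebx);
    r.edx = (uint32_t)(dividend % ebx);
    return r;
}
```

For example, multiplying 0xFFFFFFFF by 2 leaves 1 in edx and 0xFFFFFFFE in eax, and dividing 17 (with edx = 0) by 5 leaves quotient 3 in eax and remainder 2 in edx.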
Bitwise Logic Instructions
• Simple instructions
  o and{b,w,l} source, dest    dest = source & dest
  o or{b,w,l} source, dest     dest = source | dest
  o xor{b,w,l} source, dest    dest = source ^ dest
  o not{b,w,l} dest            dest = ~dest
  o sal{b,w,l} source, dest    dest = dest << source (arithmetic)
  o sar{b,w,l} source, dest    dest = dest >> source (arithmetic)
• Many more in the Intel manual (volume 2)
  o Logical shift
  o Rotation shift
  o Bit scan
  o Bit test
  o Byte set on conditions
Branch Instructions
• Conditional jump
  o j{l, g, e, ne, ...} target    if (condition) {eip = target}

    Comparison        Signed   Unsigned
    =                 e        e          “equal”
    ≠                 ne       ne         “not equal”
    >                 g        a          “greater, above”
    ≥                 ge       ae         “...-or-equal”
    <                 l        b          “less, below”
    ≤                 le       be         “...-or-equal”
    overflow/carry    o        c
    no ovf/carry      no       nc

• Unconditional jump
  o jmp target
  o jmp *register
Setting the EFLAGS Register
• Comparison cmpl compares two integers
  o Done by subtracting the first number from the second
    – Discarding the result, but setting the eflags register
  o Example:
    – cmpl $1, %edx     (computes %edx - 1)
    – jle .endloop      (looks at the zero, sign, and overflow flags)
• Logical operation andl also sets the eflags register
  o Example:
    – andl $1, %eax     (bit-wise AND of %eax with 1)
    – je .else          (looks at the zero flag)
• Unconditional branch jmp does not consult the flags
  o Example:
    – jmp .endif and jmp .loop
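How cmpl followed by a conditional jump decides whether to branch can be sketched in C. This is an illustrative emulation, not the processor's definition: the flags struct and the function names are assumptions, and only the zero, sign, and overflow flags are modeled.

```c
#include <stdint.h>

/* Subset of eflags modeled in this sketch. */
typedef struct {
    int zf;   /* zero flag: result is 0 */
    int sf;   /* sign flag: top bit of the result */
    int of;   /* overflow flag: signed subtraction overflowed */
} flags;

/* cmpl src, dest: compute dest - src, keep only the flags. */
flags emulate_cmpl(uint32_t src, uint32_t dest)
{
    uint32_t result = dest - src;
    flags f;
    f.zf = (result == 0);
    f.sf = (int)(result >> 31);
    /* Overflow when the operands differ in sign and the result's sign
       differs from dest's sign. */
    f.of = (int)(((dest ^ src) & (dest ^ result)) >> 31);
    return f;
}

/* je: taken when the zero flag is set. */
int je_taken(flags f)
{
    return f.zf;
}

/* jle: taken when ZF is set or SF differs from OF (signed <=). */
int jle_taken(flags f)
{
    return f.zf || (f.sf != f.of);
}
```

The overflow flag matters in the corner case: comparing the most negative 32-bit value against 1 produces a positive-looking result, yet jle is still taken because SF and OF differ.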
EFLAGS Register & Condition Codes
    Bits 31-22   Reserved (set to 0)
    Bit 21       ID     Identification flag
    Bit 20       VIP    Virtual interrupt pending
    Bit 19       VIF    Virtual interrupt flag
    Bit 18       AC     Alignment check
    Bit 17       VM     Virtual 8086 mode
    Bit 16       RF     Resume flag
    Bit 15       0      (reserved)
    Bit 14       NT     Nested task flag
    Bits 13-12   IOPL   I/O privilege level
    Bit 11       OF     Overflow flag
    Bit 10       DF     Direction flag
    Bit 9        IF     Interrupt enable flag
    Bit 8        TF     Trap flag
    Bit 7        SF     Sign flag
    Bit 6        ZF     Zero flag
    Bit 5        0      (reserved)
    Bit 4        AF     Auxiliary carry flag or adjust flag
    Bit 3        0      (reserved)
    Bit 2        PF     Parity flag
    Bit 1        1      (reserved)
    Bit 0        CF     Carry flag
Data Transfer Instructions
• mov{b,w,l} source, dest
  o General move instruction
• push{w,l} source
      pushl %ebx
      # equivalent instructions:
      subl $4, %esp
      movl %ebx, (%esp)
• pop{w,l} dest
      popl %ebx
      # equivalent instructions:
      movl (%esp), %ebx
      addl $4, %esp
• Many more in the Intel manual (volume 2)
  o Type conversion, conditional move, exchange, compare and exchange, I/O port, string move, etc.
Greater Detail on IA-32 Assembly: Addressing Modes
Ways to Read and Write Data
• Processors have many ways to access data
  o Known as “addressing modes”
• Two simplest ways (used in earlier example)
  o Immediate addressing: movl $0, %ecx
    – Data embedded in the instruction
    – Initialize register ECX with zero
  o Register addressing: movl %edx, %ecx
    – Data stored in a register
    – Copy value in register EDX into register ECX
• The others all deal with memory addresses
  o To read and write data from main memory
  o E.g., to get data from memory into a register
  o E.g., to write data from a register back into memory
Direct vs. Indirect Addressing
• Read or write from a particular memory location
  o Essentially dereferencing a pointer
• Direct addressing: movl 2000, %ecx
  o Address embedded in the instruction
  o E.g., address 2000 corresponds to a global variable
  o Load ECX register with the long located at address 2000
• Indirect addressing: movl (%eax), %ebx
  o Address stored in a register
  o E.g., EAX register is a pointer
  o Load EBX register with the long located at the address in EAX
More Complex Addressing Modes
• Base pointer addressing: movl 4(%eax), %ebx
  o Extends indirect addressing by allowing an offset
  o E.g., add “4” to the register EAX to get the address
  o Allows access to a particular field in a structure
  o E.g., if “age” starts at byte offset 4 of a record
• Indexed addressing: movl 2000(, %ecx, 1), %ebx
  o Starts from a base address (e.g., 2000)
  o Adds an offset from a register (e.g., ECX)
  o With a multiplier of 1, 2, 4, or 8 (e.g., 1 to multiply by 1)
  o Allows the register to be an index into a byte, word, or long array
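Both modes correspond to familiar C constructs: a structure field access is a base pointer plus a constant offset, and an array access is a base address plus index times element size. The sketch below makes the address arithmetic explicit; struct record, its fields, and the function names are hypothetical and chosen to match the “age” example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record whose "age" field sits at byte offset 4. */
struct record {
    int32_t id;
    int32_t age;
};

/* Base pointer addressing: address = base register + constant offset,
   as in movl 4(%eax), %ebx. */
int32_t load_age(const struct record *r)
{
    const unsigned char *base = (const unsigned char *)r;
    int32_t value;
    memcpy(&value, base + offsetof(struct record, age), sizeof value);
    return value;
}

/* Indexed addressing: address = table address + index * scale,
   with scale 4 for an array of longs. */
int32_t load_element(const int32_t *table, size_t index)
{
    const unsigned char *base = (const unsigned char *)table;
    int32_t value;
    memcpy(&value, base + index * sizeof(int32_t), sizeof value);
    return value;
}
```

In ordinary C one would simply write r->age and table[index]; the byte-level version is spelled out only to show the arithmetic the addressing modes perform.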
Effective Address
    Offset = Base + (Index * Scale) + Displacement
  o Base: eax, ebx, ecx, edx, esp, ebp, esi, or edi
  o Index: eax, ebx, ecx, edx, ebp, esi, or edi (esp cannot serve as an index)
  o Scale: 1, 2, 4, or 8
  o Displacement: none, 8-bit, 16-bit, or 32-bit
• Displacement
    movl foo, %ebx
• Base
    movl (%eax), %ebx
• Base + displacement
    movl foo(%eax), %ebx
    movl 1(%eax), %ebx
• (Index * scale) + displacement
    movl (, %eax, 4), %ebx
• Base + (index * scale) + displacement
    movl foo(%edx, %eax, 4), %ebx
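The effective-address formula is plain 32-bit arithmetic. A minimal sketch follows, with the hypothetical name effective_address and with omitted components passed as 0 (and an omitted scale as 1).

```c
#include <stdint.h>

/* Offset = Base + (Index * Scale) + Displacement, computed modulo 2^32.
   Pass 0 for a component the instruction omits; scale is 1, 2, 4, or 8. */
uint32_t effective_address(uint32_t displacement, uint32_t base,
                           uint32_t index, uint32_t scale)
{
    return base + index * scale + displacement;
}
```

For instance, a displacement of 2000 with index 3 and scale 1 gives 2003 (the form 2000(, %ecx, 1)), and a displacement of 4 with base 1000 gives 1004 (the form 4(%eax)).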
Data Access Methods: Summary
• Immediate addressing: data stored in the instruction itself
  o movl $10, %ecx
• Register addressing: data stored in a register
  o movl %eax, %ecx
• Direct addressing: address stored in the instruction
  o movl 2000, %ecx
• Indirect addressing: address stored in a register
  o movl (%eax), %ebx
• Base pointer addressing: includes an offset as well
  o movl 4(%eax), %ebx
• Indexed addressing: instruction contains a base address, and specifies an index register and a multiplier (1, 2, 4, or 8)
  o movl 2000(, %ecx, 1), %ebx
Layout of an Assembly Language Program
A Simple Assembly Program
            .section .data
            # pre-initialized variables go here

            .section .bss
            # zero-initialized variables go here

            .section .rodata
            # pre-initialized constants go here

            .section .text
            .globl _start
    _start:
            # Program starts executing here

            # Body of the program goes here

            # Program ends with an “exit()” system call
            # to the operating system
            movl $1, %eax
            movl $0, %ebx
            int $0x80
Main Parts of the Program
• Break program into sections (.section)
  o Data, BSS, RoData, and Text
• Starting the program
  o Making _start a global (.globl _start)
    – Tells the assembler to remember the symbol _start
    – … because the linker will need it
  o Identifying the start of the program (_start:)
    – Defines the value of the label _start
Main Parts of the Program
• Exiting the program
  o Specifying the exit() system call (movl $1, %eax)
    – Linux expects the system call number in the EAX register
  o Specifying the status code (movl $0, %ebx)
    – Linux expects the status code in the EBX register
  o Interrupting the operating system (int $0x80)
Conclusions
• Machine code
  o Binary representation of instructions
  o What operation to do, and on what data
• IA-32 instructions
  o Manipulate bytes, words, or longs
  o Numerous kinds of operations
  o Wide variety of addressing modes
• Next time
  o Calling functions, using the stack