CS 2422 Assembly Language and System Programming
Data Transfers, Addressing, and Arithmetic
Department of Computer Science, National Tsing Hua University

Assembly Language for Intel-Based Computers, 5th Edition
Kip Irvine
Chapter 4: Data Transfers, Addressing, and Arithmetic
Slides prepared by the author. Revision date: June 4, 2006
(c) Pearson Education, 2006-2007. All rights reserved. You may modify and copy this slide show for your personal use, or for use in the classroom, as long as this copyright statement, the author's name, and the title are not changed.

Chapter Overview
- Data Transfer Instructions
  - MOV Instruction
  - Operand Types
  - Direct Memory Operands
  - Direct-Offset Operands
  - Zero and Sign Extension
  - XCHG Instruction
- Addition and Subtraction
- Data-Related Operators and Directives
- Indirect Addressing
- JMP and LOOP Instructions

Data Transfer Instructions
- MOV is for moving data between:
  - Memory
  - Register
  - Immediate (constant)
- Almost all combinations are allowed, except:
  - Memory to memory!

MOV Instruction
- Syntax: MOV destination, source
  - Both operands must be the same size
  - No more than one memory operand is permitted
  - CS, EIP, and IP cannot be the destination
  - No immediate to segment register moves
    .data
    count BYTE 100
    wVal  WORD 2
    .code
    mov bl, count
    mov ax, wVal
    mov count, al
    mov al, wVal      ; error
    mov ax, count     ; error
    mov wVal, count   ; error

Your Turn...
- Explain why each of the following MOV statements is invalid:
    .data
    bVal  BYTE 100
    bVal2 BYTE ?
    wVal  WORD 2
    dVal  DWORD 5
    .code
    mov ds, 45
    mov esi, wVal
    mov eip, dVal
    mov 25, bVal
    mov bVal2, bVal

Memory to Memory?
- Must go through a register:
    .data
    Var1 WORD 100h
    Var2 WORD ?
    .code
    MOV ax, Var1
    MOV Var2, ax

Three Types of Operands
- Immediate: a constant integer (8, 16, or 32 bits)
  - Value of the operand is encoded directly within the instruction
- Register: the id of a register
  - Register name is converted to a number (id) and encoded within the instruction
- Memory: a location in memory
  - Memory address is encoded within the instruction, or a register holds the address of a memory location

Direct-Memory Operands
- A named reference to storage in memory is a memory operand
- The named reference (label) is automatically dereferenced by the assembler
    .data
    var1 BYTE 10h
    .code
    mov al, var1      ; al = 10h
    mov al, [var1]    ; al = 10h (alternate format)
- In the alternate format, [] implies a dereference operation

Direct-Offset Operands
- A constant offset is added to a label to produce an effective address (EA)
  - The address is dereferenced to get the content of its memory location
    .data
    arrayB BYTE 10h, 20h, 30h, 40h
    .code
    mov al, arrayB+1      ; al = 20h
    mov al, [arrayB+1]    ; alternative notation
- Q: Why doesn't arrayB+1 produce 11h?

Direct-Offset Operands (cont)
    .data
    arrayW WORD 1000h, 2000h, 3000h
    arrayD DWORD 1, 2, 3, 4
    .code
    mov ax, [arrayW+2]     ; AX = 2000h
    mov ax, [arrayW+4]     ; AX = 3000h
    mov eax, [arrayD+4]    ; EAX = 00000002h
    ; Will the following statements assemble?
    mov ax, [arrayW-2]     ; ??
    mov eax, [arrayD+16]   ; ??
- What will happen when they run?

Zero or Sign Extension
- What happens to ECX if a negative value (here -16) is moved to CX?
    .data
    signedVal SWORD -16
    .code
    mov ecx, 0
    mov cx, signedVal
  - Are the higher 16 bits of ECX all 0?
  - What number does ECX represent now?
- The solution: MOVZX and MOVSX
  - MOVZX always fills the higher bits with 0.
  - MOVSX fills the higher bits by "sign extension".
  - Just extend the left-most bit!

Zero Extension
- When copying a smaller value into a larger destination, the MOVZX instruction fills (extends) the upper half of the destination with zeros
    mov bl, 10001111b
    movzx ax, bl      ; zero-extension: AX = 0000000010001111b
- The destination must be a register

Sign Extension
- MOVSX fills the upper half of the destination with a copy of the source operand's sign bit
    mov bl, 10001111b
    movsx ax, bl      ; sign extension: AX = 1111111110001111b
- Does it affect the value?
- The destination must be a register
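The two extension rules can be checked with a small model. The following is a minimal Python sketch, not MASM; the helper names `movzx` and `movsx` are hypothetical and only mirror the instruction names. It treats register contents as plain unsigned integers of a given bit width.

```python
def movzx(value, src_bits, dst_bits):
    """Zero-extend: keep the low src_bits; all upper destination bits are 0.
    (dst_bits is unused here and kept only for symmetry with movsx.)"""
    return value & ((1 << src_bits) - 1)


def movsx(value, src_bits, dst_bits):
    """Sign-extend: copy the source sign bit into every upper destination bit."""
    value &= (1 << src_bits) - 1
    if value & (1 << (src_bits - 1)):  # is the sign bit set?
        value |= ((1 << dst_bits) - 1) & ~((1 << src_bits) - 1)
    return value


bl = 0b10001111
ax_zero = movzx(bl, 8, 16)  # models: movzx ax, bl
ax_sign = movsx(bl, 8, 16)  # models: movsx ax, bl
print(f"{ax_zero:016b}")
print(f"{ax_sign:016b}")
```

Read as a signed number, BL holds -113 and the sign-extended AX still holds -113, whereas the zero-extended AX holds +143.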

LAHF/SAHF and XCHG
- LAHF loads the low byte of the flags register (EFLAGS) into AH
  - Loads Sign, Zero, Auxiliary Carry, Parity, Carry
- SAHF stores the contents of AH into the low byte of EFLAGS
- XCHG is for exchanging data between:
  - Register, register
  - Register, memory
  - Memory, register
  - (again, no memory to memory)

XCHG Instruction
- XCHG exchanges the values of two operands. At least one operand must be a register. No immediate operands are permitted.
    .data
    var1 WORD 1000h
    var2 WORD 2000h
    .code
    xchg ax, bx        ; exchange 16-bit regs
    xchg ah, al        ; exchange 8-bit regs
    xchg var1, bx      ; exchange mem, reg
    xchg eax, ebx      ; exchange 32-bit regs
    xchg var1, var2    ; error: two memory operands

Your Turn...
- Rearrange the values of the following doublewords as 3, 1, 2:
    .data
    arrayD DWORD 1, 2, 3
- Step 1: copy the first value into EAX and exchange it with the value in the second position.
    mov eax, arrayD
    xchg eax, [arrayD+4]
- Step 2: exchange EAX with the third array value and copy the value in EAX to the first array position.
    xchg eax, [arrayD+8]
    mov arrayD, eax

Evaluate This...
- Add the following three bytes:
    .data
    myBytes BYTE 80h, 66h, 0A5h
- What is your evaluation of the following code?
    mov al, myBytes
    add al, [myBytes+1]
    add al, [myBytes+2]
- What is your evaluation of the following code?
    mov ax, myBytes
    add ax, [myBytes+1]
    add ax, [myBytes+2]
- Any other possibilities?
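One way to reason about the first fragment is to model an 8-bit accumulator and compare it with a wider one. The following is a minimal Python sketch under that assumption; the variable names are illustrative only. The second fragment is not modeled, because it mixes a 16-bit register with byte-sized operands.

```python
my_bytes = [0x80, 0x66, 0xA5]  # myBytes BYTE 80h, 66h, 0A5h

# 8-bit accumulator: every addition wraps at 256, as with AL.
al = 0
for b in my_bytes:
    al = (al + b) & 0xFF

# 16-bit accumulator fed with zero-extended bytes: the true sum fits.
ax = 0
for b in my_bytes:
    ax = (ax + b) & 0xFFFF

print(hex(al), hex(ax))
```

The model suggests that the 8-bit sum loses the carry out of bit 7, so one of the "other possibilities" is to extend each byte before adding it to a wider register.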

What's Next
- Data Transfer Instructions
- Addition and Subtraction
  - INC and DEC Instructions
  - ADD and SUB Instructions
  - NEG Instruction
  - Implementing Arithmetic Expressions
  - Flags Affected by Arithmetic
    - Zero, Sign, Carry, Overflow
- Data-Related Operators and Directives
- Indirect Addressing
- JMP and LOOP Instructions

INC and DEC Instructions
- Add 1 to, or subtract 1 from, the destination operand
  - The operand may be a register or memory
    .data
    myWord  WORD 1000h
    myDword DWORD 10000000h
    .code
    inc myWord     ; 1001h
    dec myWord     ; 1000h
    inc myDword    ; 10000001h
    mov ax, 00FFh
    inc ax         ; AX = 0100h
    mov ax, 00FFh
    inc al         ; AX = 0000h

Your Turn...
- Show the value of the destination operand after each of the following instructions executes:
    .data
    myByte BYTE 0FFh, 0
    .code
    mov al, myByte        ; AL =
    mov ah, [myByte+1]    ; AH =
    dec ah                ; AH =
    inc al                ; AL =
    dec ax                ; AX =

ADD and SUB Instructions
- ADD destination, source
  - Logic: destination ← destination + source
- SUB destination, source
  - Logic: destination ← destination - source
- Same operand rules as for the MOV instruction
    .data
    var1 DWORD 10000h
    var2 DWORD 20000h
    .code                 ; ---EAX---
    mov eax, var1         ; 00010000h
    add eax, var2         ; 00030000h
    add ax, 0FFFFh        ; 0003FFFFh
    add eax, 1            ; 00040000h
    sub ax, 1             ; 0004FFFFh
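The EAX trace above depends on the fact that an operation on AX changes only the low 16 bits of EAX. The following minimal Python sketch models that behavior; `write_low16` is a hypothetical helper, not an instruction.

```python
def write_low16(eax, ax):
    """Replace the low 16 bits of EAX with AX; the upper 16 bits are untouched."""
    return (eax & 0xFFFF0000) | (ax & 0xFFFF)


trace = []
eax = 0x00010000                                 # mov eax, var1
trace.append(eax)
eax = (eax + 0x00020000) & 0xFFFFFFFF            # add eax, var2
trace.append(eax)
eax = write_low16(eax, (eax & 0xFFFF) + 0xFFFF)  # add ax, 0FFFFh
trace.append(eax)
eax = (eax + 1) & 0xFFFFFFFF                     # add eax, 1
trace.append(eax)
eax = write_low16(eax, (eax & 0xFFFF) - 1)       # sub ax, 1
trace.append(eax)

for value in trace:
    print(f"{value:08X}h")
```

Note that `sub ax, 1` borrows within AX only: the low word wraps from 0000h to FFFFh and the upper word stays at 0004h.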

NEG (negate) Instruction
- Reverses the sign of an operand. The operand can be a register or memory operand.
    .data
    valB BYTE -1
    valW WORD +32767
    .code
    mov al, valB    ; AL = -1
    neg al          ; AL = +1
    neg valW        ; valW = -32767
- Suppose AX contains -32,768 and we apply NEG to it. Will the result be valid?

NEG Instruction and the Flags
- NEG is implemented using the internal operation: SUB 0, operand
- Any nonzero operand causes the Carry flag to be set
    .data
    valB BYTE 1, 0
    valC SBYTE -128
    .code
    neg valB          ; CF = 1, OF = 0
    neg [valB + 1]    ; CF = 0, OF = 0
    neg valC          ; CF = 1, OF = 1
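The flag results on this slide can be reproduced by modeling NEG as a subtraction from zero. This is a minimal Python sketch; the function name `neg` and its return shape are assumptions made for illustration.

```python
def neg(value, bits=8):
    """Model NEG on a bits-wide operand; return (result, CF, OF)."""
    mask = (1 << bits) - 1
    value &= mask
    result = (0 - value) & mask          # SUB 0, operand
    cf = int(value != 0)                 # any nonzero operand sets Carry
    of = int(value == 1 << (bits - 1))   # only the most negative value overflows
    return result, cf, of


print(neg(1))       # valB
print(neg(0))       # [valB + 1]
print(neg(-128))    # valC
print(neg(-32768, bits=16))
```

In this model the most negative value negates to itself with OF = 1, which bears on the earlier question about applying NEG to -32,768 in AX.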

Arithmetic Expressions in Assembly
- HLL mathematical expressions are translated into assembly language by the compiler, e.g.
    Rval = -Xval + (Yval - Zval)
    .data
    Rval DWORD ?
    Xval DWORD 26
    Yval DWORD 30
    Zval DWORD 40
    .code
    mov eax, Xval
    neg eax           ; EAX = -26
    mov ebx, Yval
    sub ebx, Zval     ; EBX = -10
    add eax, ebx
    mov Rval, eax     ; -36
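The same sequence can be traced with 32-bit wraparound to see what bit pattern ends up in Rval. This is a minimal Python sketch; `to_signed32` is a hypothetical helper for reading a register as a signed value.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


xval, yval, zval = 26, 30, 40

eax = xval                    # mov eax, Xval
eax = (0 - eax) & MASK32      # neg eax
ebx = yval                    # mov ebx, Yval
ebx = (ebx - zval) & MASK32   # sub ebx, Zval
eax = (eax + ebx) & MASK32    # add eax, ebx
rval = eax                    # mov Rval, eax

print(f"{rval:08X}h", to_signed32(rval))
```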

Your Turn...
- Translate the following expression into assembly language. Do not modify Xval, Yval, or Zval.
    Rval = Xval - (-Yval + Zval)
- Assume that all values are signed doublewords.
    mov ebx, Yval
    neg ebx
    add ebx, Zval
    mov eax, Xval
    sub eax, ebx
    mov Rval, eax
- Can you do it using only one register? (compiler optimization)

Flags Affected by Arithmetic
- The ALU has a number of status flags that reflect the outcome of arithmetic (and bitwise) operations
  - based on the contents of the destination operand
- Essential flags:
  - Zero: set when destination equals zero
  - Sign: set when destination is negative
  - Carry: set when unsigned value is out of range
  - Overflow: set when signed value is out of range
- The MOV instruction never affects the flags

Zero Flag (ZF)
- The Zero flag is set when the result of an operation produces zero in the destination operand
    mov cx, 1
    sub cx, 1         ; CX = 0, ZF = 1
    mov ax, 0FFFFh
    inc ax            ; AX = 0, ZF = 1
    inc ax            ; AX = 1, ZF = 0
- Remember...
  - A flag is set when it equals 1.
  - A flag is clear when it equals 0.

Sign Flag (SF)
- The Sign flag is set when the destination operand is negative and clear when the destination is positive
    mov cx, 0
    sub cx, 1         ; CX = -1, SF = 1
    add cx, 2         ; CX = 1, SF = 0
- The Sign flag is a copy of the destination's highest bit:
    mov al, 0
    sub al, 1         ; AL = 11111111b, SF = 1
    add al, 2         ; AL = 00000001b, SF = 0

Signed and Unsigned Integers: A Hardware Viewpoint
- All CPU instructions operate exactly the same on signed and unsigned integers
- The CPU cannot distinguish between signed and unsigned integers
- YOU, the programmer, are solely responsible for using the correct data type with each instruction

Carry Flag (CF)
- The Carry flag is set when the result of an operation generates an unsigned value that is out of range (too big or too small for the destination operand), i.e. a carry or a borrow
    mov al, 0FFh
    add al, 1         ; CF = 1, AL = 00
    ; Try to go below zero:
    mov al, 0
    sub al, 1         ; CF = 1, AL = FF
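Both Carry flag cases can be checked with an 8-bit model. This is a minimal Python sketch; `add8` and `sub8` are hypothetical helpers that return the wrapped result together with CF.

```python
def add8(a, b):
    """8-bit ADD: return (result, CF). CF is the carry out of bit 7."""
    total = (a & 0xFF) + (b & 0xFF)
    return total & 0xFF, int(total > 0xFF)


def sub8(a, b):
    """8-bit SUB: return (result, CF). CF is set when a borrow is needed."""
    a &= 0xFF
    b &= 0xFF
    return (a - b) & 0xFF, int(a < b)


print(add8(0xFF, 1))  # mov al, 0FFh / add al, 1
print(sub8(0, 1))     # mov al, 0    / sub al, 1
```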

Your Turn...
- For each of the following marked entries, show the values of the destination operand and the Sign, Zero, and Carry flags:
    mov ax, 00FFh
    add ax, 1         ; AX=    SF=  ZF=  CF=
    sub ax, 1         ; AX=    SF=  ZF=  CF=
    add al, 1         ; AL=    SF=  ZF=  CF=
    mov bh, 6Ch
    add bh, 95h       ; BH=    SF=  ZF=  CF=
    mov al, 2
    sub al, 3         ; AL=    SF=  ZF=  CF=

Overflow Flag (OF)
- The Overflow flag is set when the signed result of an operation is invalid or out of range
    ; Example 1
    mov al, +127
    add al, 1         ; OF = 1, AL = ??
    ; Example 2
    mov al, 7Fh
    add al, 1         ; OF = 1, AL = 80h
- The two examples are identical at the binary level because 7Fh equals +127. To determine the value of the destination operand, it is often easier to calculate in hexadecimal.

A Rule of Thumb
- When adding two integers, remember that the Overflow flag is only set when...
  - Two positive operands are added and their sum is negative
  - Two negative operands are added and their sum is positive
- What will be the values of the Overflow flag?
    mov al, 80h
    add al, 92h       ; OF = 1
    mov al, -2
    add al, +127      ; OF = 0
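The rule of thumb translates directly into a sign comparison. This is a minimal Python sketch of that rule for 8-bit ADD; the helper names are hypothetical, and the sketch shows the rule itself rather than how the hardware computes OF.

```python
def to_signed8(x):
    """Interpret an 8-bit pattern as a two's-complement signed integer."""
    x &= 0xFF
    return x - 0x100 if x & 0x80 else x


def add8_overflow(a, b):
    """Return (result, OF) for an 8-bit ADD using the sign rule of thumb."""
    result = (a + b) & 0xFF
    a_neg, b_neg, r_neg = (to_signed8(v) < 0 for v in (a, b, result))
    # Overflow only when both operands share a sign and the result does not.
    of = int(a_neg == b_neg and r_neg != a_neg)
    return result, of


print(add8_overflow(0x80, 0x92))  # two negatives, positive sum
print(add8_overflow(-2, 127))     # mixed signs never overflow
print(add8_overflow(0x7F, 1))     # two positives, negative sum
```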

Your Turn...
- What will be the values of the given flags after each operation?
    mov al, -128
    neg al            ; CF =    OF =
    mov ax, 8000h
    add ax, 2         ; CF =    OF =
    mov ax, 0
    sub ax, 2         ; CF =    OF =
    mov al, -5
    sub al, +125      ; OF =

What's Next
- Data Transfer Instructions
- Addition and Subtraction
- Data-Related Operators and Directives (interpreted by the assembler)
  - OFFSET Operator
  - PTR Operator
  - TYPE Operator
  - LENGTHOF Operator
  - SIZEOF Operator
  - LABEL Directive
- Indirect Addressing
- JMP and LOOP Instructions

OFFSET Operator
- OFFSET returns the distance in bytes of a label from the beginning of its enclosing segment
  - Protected mode: 32 bits
  - Real mode: 16 bits
- The protected-mode programs that we write have only a single segment (we use the flat memory model)

OFFSET Example
    .data
    bVal  byte  1
    wVal  word  2
    dVal  dword 3
    dVal2 dword 4
    .code
    main PROC
        mov al, bVal
        mov bx, wVal
        mov ecx, dVal
        mov edx, dVal2
        call DumpRegs
        mov eax, offset bVal
        mov ebx, offset wVal
        mov ecx, offset dVal
        mov edx, offset dVal2
        call DumpRegs
        exit
    main ENDP

OFFSET Example
- Let's assume that the data segment begins at 00404000h
- Result of execution:
    ...
    EAX=75944801 EBX=7FFD0002 ECX=00000003 EDX=00000004
    ESI=00000000 EDI=00000000 EBP=0012FF94 ESP=0012FF8C
    EIP=0040102D EFL=00000246 CF=0 SF=0 ZF=1 OF=0
    EAX=00404000 EBX=00404001 ECX=00404003 EDX=00404007
    ESI=00000000 EDI=00000000 EBP=0012FF94 ESP=0012FF8C
    EIP=00401046 EFL=00000246 CF=0 SF=0 ZF=1 OF=0
    ...

OFFSET Example (listing file)
    00000000                        .data
    00000000 01                     bVal  byte  1
    00000001 0002                   wVal  word  2
    00000003 00000003               dVal  dword 3
    00000007 00000004               dVal2 dword 4
    00000000                        .code
    00000000                        main PROC
    00000000 A0 00000000 R          mov al, bVal
    00000005 66| 8B 1D 00000001 R   mov bx, wVal
    0000000C 8B 0D 00000003 R       mov ecx, dVal
    00000012 8B 15 00000007 R       mov edx, dVal2
    00000018 E8 00000000 E          call DumpRegs
    0000001D B8 00000000 R          mov eax, offset bVal
    00000022 BB 00000001 R          mov ebx, offset wVal
    00000027 B9 00000003 R          mov ecx, offset dVal
    0000002C BA 00000007 R          mov edx, offset dVal2

OFFSET Example
- Let's assume that the data segment begins at 00404000h:
    .data
    bVal  BYTE ?
    wVal  WORD ?
    dVal  DWORD ?
    dVal2 DWORD ?
    .code
    mov esi, OFFSET bVal     ; ESI = 00404000
    mov esi, OFFSET wVal     ; ESI = 00404001
    mov esi, OFFSET dVal     ; ESI = 00404003
    mov esi, OFFSET dVal2    ; ESI = 00404007

Relating to C/C++
- The value returned by OFFSET is a pointer
- Compare the following code written for both C++ and assembly language:
    C++ version:
    char array[1000];
    char * p = array;
    Assembly version:
    .data
    array BYTE 1000 DUP(?)
    .code
    mov esi, OFFSET array    ; ESI is p

PTR Operator
- Overrides the default type of a label (variable) and provides the flexibility to access part of a variable
    .data
    myDouble DWORD 12345678h
    .code
    mov ax, myDouble                 ; error - why?
    mov ax, WORD PTR myDouble        ; loads 5678h
    mov WORD PTR myDouble, 4321h     ; saves 4321h
- Recall that little endian order is used when storing data in memory (see Section 3.4.9)

PTR Operator Examples
    .data
    myDouble DWORD 12345678h
    .code
    mov al, BYTE PTR myDouble        ; AL = 78h
    mov al, BYTE PTR [myDouble+1]    ; AL = 56h
    mov al, BYTE PTR [myDouble+2]    ; AL = 34h
    mov ax, WORD PTR myDouble        ; AX = 5678h
    mov ax, WORD PTR [myDouble+2]    ; AX = 1234h

PTR Operator (cont)
- PTR can also be used to combine elements of a smaller data type and move them into a larger operand
  - The processor will automatically reverse the bytes
    .data
    myBytes BYTE 12h, 34h, 56h, 78h
    .code
    mov ax, WORD PTR [myBytes]       ; AX = 3412h
    mov ax, WORD PTR [myBytes+2]     ; AX = 7856h
    mov eax, DWORD PTR myBytes       ; EAX = 78563412h
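The byte reversal follows from little endian storage and can be checked by reading a byte buffer at different widths. This is a minimal Python sketch; `read` is a hypothetical helper standing in for a PTR-qualified memory access.

```python
def read(mem, offset, size):
    """Read size bytes starting at offset as a little-endian unsigned integer."""
    return int.from_bytes(mem[offset:offset + size], "little")


my_bytes = bytes([0x12, 0x34, 0x56, 0x78])    # myBytes BYTE 12h, 34h, 56h, 78h
ax_first = read(my_bytes, 0, 2)               # WORD PTR [myBytes]
ax_second = read(my_bytes, 2, 2)              # WORD PTR [myBytes+2]
eax = read(my_bytes, 0, 4)                    # DWORD PTR myBytes

my_double = (0x12345678).to_bytes(4, "little")  # myDouble DWORD 12345678h
al = read(my_double, 0, 1)                      # BYTE PTR myDouble
ax_high = read(my_double, 2, 2)                 # WORD PTR [myDouble+2]

print(hex(ax_first), hex(ax_second), hex(eax), hex(al), hex(ax_high))
```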

Your Turn...
- Write down the value of each destination operand:
    .data
    varB BYTE 65h, 31h, 02h, 05h
    varW WORD 6543h, 1202h
    varD DWORD 12345678h
    .code
    mov ax, WORD PTR [varB+2]    ; a.
    mov bl, BYTE PTR varD        ; b.
    mov bl, BYTE PTR [varW+2]    ; c.
    mov ax, WORD PTR [varD+2]    ; d.
    mov eax, DWORD PTR varW      ; e.

TYPE Operator
- The TYPE operator returns the size, in bytes, of a single element of a data declaration
    .data
    var1 BYTE ?
    var2 WORD ?
    var3 DWORD ?
    var4 QWORD ?
    .code
    mov eax, TYPE var1    ; 1
    mov eax, TYPE var2    ; 2
    mov eax, TYPE var3    ; 4
    mov eax, TYPE var4    ; 8

LENGTHOF Operator
- The LENGTHOF operator counts the number of elements in a single data declaration
    .data                                ; LENGTHOF
    byte1    BYTE 10, 20, 30             ; 3
    array1   WORD 30 DUP(?), 0, 0        ; 32
    array2   WORD 5 DUP(3 DUP(?))        ; 15
    array3   DWORD 1, 2, 3, 4            ; 4
    digitStr BYTE "12345678", 0          ; 9
    .code
    mov ecx, LENGTHOF array1             ; 32

SIZEOF Operator
- SIZEOF returns a value that is equivalent to multiplying LENGTHOF by TYPE
    .data                                ; SIZEOF
    byte1    BYTE 10, 20, 30             ; 3
    array1   WORD 30 DUP(?), 0, 0        ; 64
    array2   WORD 5 DUP(3 DUP(?))        ; 30
    array3   DWORD 1, 2, 3, 4            ; 16
    digitStr BYTE "12345678", 0          ; 9
    .code
    mov ecx, SIZEOF array1               ; 64
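The relation SIZEOF = LENGTHOF x TYPE can be checked against the table above. This is a minimal Python sketch in which each declaration is modeled as a list of elements; the `TYPE` table and the `lengthof` and `sizeof` helpers are illustrative names, not assembler features.

```python
TYPE = {"BYTE": 1, "WORD": 2, "DWORD": 4, "QWORD": 8}  # element sizes in bytes


def lengthof(elements):
    """Number of elements in one data declaration."""
    return len(elements)


def sizeof(type_name, elements):
    """Total bytes: element count times element size."""
    return lengthof(elements) * TYPE[type_name]


byte1 = [10, 20, 30]                   # BYTE 10, 20, 30
array1 = [None] * 30 + [0, 0]          # WORD 30 DUP(?), 0, 0
array2 = [None] * (5 * 3)              # WORD 5 DUP(3 DUP(?))
array3 = [1, 2, 3, 4]                  # DWORD 1, 2, 3, 4
digit_str = list(b"12345678") + [0]    # BYTE "12345678", 0

print(lengthof(array1), sizeof("WORD", array1))
```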

Spanning Multiple Lines (1 of 2)
- A data declaration spans multiple lines if each line (except the last) ends with a comma. The LENGTHOF and SIZEOF operators include all lines belonging to the declaration:
    .data
    array WORD 10, 20,
               30, 40,
               50, 60
    .code
    mov eax, LENGTHOF array    ; 6
    mov ebx, SIZEOF array      ; 12

Spanning Multiple Lines (2 of 2)
- In the following example, array identifies only the first WORD declaration. Compare the values returned by LENGTHOF and SIZEOF here to those in the previous slide:
    .data
    array WORD 10, 20
          WORD 30, 40
          WORD 50, 60
    .code
    mov eax, LENGTHOF array    ; 2
    mov ebx, SIZEOF array      ; 4

LABEL Directive
- Assigns an alternate label name and type to a storage location
- Does not allocate any storage of its own
- Removes the need for the PTR operator
    .data
    dwList   LABEL DWORD
    wordList LABEL WORD
    intList  BYTE 00h, 10h, 00h, 20h
    .code
    mov eax, dwList      ; 20001000h
    mov cx, wordList     ; 1000h
    mov dl, intList      ; 00h

What's Next
- Data Transfer Instructions
- Addition and Subtraction
- Data-Related Operators and Directives
- Indirect Addressing
  - Indirect Operands
  - Array Sum Example
  - Indexed Operands
  - Pointers
- JMP and LOOP Instructions

Direct-Offset Addressing
- We have discussed direct-offset operands:
    .data
    arrayB BYTE 10h, 20h, 30h, 40h
    .code
    mov al, arrayB+1      ; al = 20h
    mov al, [arrayB+1]    ; alternative notation
- Problem: the offset is fixed.
  - Can't handle an array index, like A[i]

Indirect Addressing
- The solution? The memory address must be a variable too! Store it in a register!
- Compare these:
  - MOV AL, [10000h]    (address fixed statically)
  - MOV AL, [Var1+1]    (address fixed statically)
  - MOV AL, [ESI]       (indirect addressing)

Indirect Operands (1 of 2)
- An indirect operand holds the address of a variable, usually an array or string
  - It can be dereferenced (just like a pointer)
    .data
    val1 BYTE 10h, 20h, 30h
    .code
    mov esi, OFFSET val1
    mov al, [esi]     ; dereference ESI (AL = 10h)
    inc esi
    mov al, [esi]     ; AL = 20h
    inc esi
    mov al, [esi]     ; AL = 30h

Indirect Operands (2 of 2)
- Use PTR to clarify the size attribute of a memory operand
    .data
    myCount WORD 0
    .code
    mov esi, OFFSET myCount
    inc [esi]             ; error: can't tell from context
    inc WORD PTR [esi]    ; ok
- Should PTR be used here?
    add [esi], 20

Array Traversal
- Indirect operands are good for traversing an array
  - The register in brackets must be incremented by a value that matches the array type
    .data
    arrayW WORD 1000h, 2000h, 3000h
    .code
    mov esi, OFFSET arrayW
    mov ax, [esi]
    add esi, 2        ; or: add esi, TYPE arrayW
    add ax, [esi]
    add esi, 2
    add ax, [esi]     ; AX = sum of the array
- Try: mov eax, [esi]
- ToDo: Modify this example for an array of doublewords.
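The need to step the pointer by the element size shows up clearly in a byte-level model of memory. This is a minimal Python sketch; the `memory` buffer, the starting offset of 0, and the helper `read_word` are assumptions made for illustration.

```python
TYPE_WORD = 2

# arrayW WORD 1000h, 2000h, 3000h laid out in little endian order.
memory = bytearray()
for word in (0x1000, 0x2000, 0x3000):
    memory += word.to_bytes(TYPE_WORD, "little")


def read_word(mem, address):
    """Model of: mov ax, [esi] with ESI = address."""
    return int.from_bytes(mem[address:address + TYPE_WORD], "little")


esi = 0   # mov esi, OFFSET arrayW (the array starts at offset 0 in this model)
ax = 0
for _ in range(3):
    ax = (ax + read_word(memory, esi)) & 0xFFFF   # add ax, [esi]
    esi += TYPE_WORD                              # add esi, TYPE arrayW

# Stepping by 1 instead of 2 lands in the middle of an element.
misaligned = read_word(memory, 1)

print(hex(ax), hex(misaligned))
```

For an array of doublewords the same model would use a stride and read size of 4.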

Indexed Operands
- An indexed operand adds a constant to a register to generate an effective address. There are two notational forms:
    [label + reg]
    label[reg]
    .data
    arrayW WORD 1000h, 2000h, 3000h
    .code
    mov esi, 0
    mov ax, [arrayW + esi]    ; AX = 1000h
    mov ax, arrayW[esi]       ; alternate format
    add esi, 2
    add ax, [arrayW + esi]
- ToDo: Modify this example for an array of doublewords.

Pointers
- You can declare a pointer variable that contains the offset of another variable
    .data
    arrayW WORD 1000h, 2000h, 3000h
    ptrW   DWORD arrayW
    .code
    mov esi, ptrW
    mov ax, [esi]     ; AX = 1000h
- Alternate format:
    ptrW DWORD OFFSET arrayW

What's Next
- Data Transfer Instructions
- Addition and Subtraction
- Data-Related Operators and Directives
- Indirect Addressing
- JMP and LOOP Instructions
  - JMP Instruction
  - LOOP Example
  - Summing an Integer Array
  - Copying a String

JMP Instruction
- An unconditional jump to a label that is usually within the same procedure
- Syntax: JMP target
- Logic: EIP ← target
- Example:
    top:
        .
        .
        jmp top
- A jump outside the current procedure must be to a special type of label called a global label (see Section 5.5.2.3).

LOOP Instruction
- The LOOP instruction creates a counting loop
- Syntax: LOOP target
- Logic:
  - ECX ← ECX - 1
  - if ECX != 0, jump to target
- Implementation:
  - The assembler calculates the distance, in bytes, between the offset of the following instruction and the offset of the target label. This distance is called the relative offset.
  - The relative offset is added to EIP

LOOP Example
- Calculates the sum 5 + 4 + 3 + 2 + 1:
    offset    machine code   source code
    00000000  66 B8 0000     mov ax, 0
    00000004  B9 00000005    mov ecx, 5
    00000009  66 03 C1       L1: add ax, cx
    0000000C  E2 FB          loop L1
    0000000E
- When LOOP is executed, the current location = 0000000E (offset of the next instruction). Then -5 (FBh) is added to the current location, causing a jump to location 00000009:
    00000009 ← 0000000E + FB
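The displacement byte FBh in the listing can be recomputed from the two offsets. This is a minimal Python sketch of that calculation; the function names are hypothetical, and the range check reflects the single signed byte used by the LOOP encoding shown above.

```python
def loop_displacement(target, next_instruction):
    """Signed distance from the following instruction to the target, as one byte."""
    distance = target - next_instruction
    if not -128 <= distance <= 127:
        raise ValueError("target out of range for a one-byte displacement")
    return distance & 0xFF


def sign_extend8(byte):
    """Interpret a displacement byte as a signed value."""
    return byte - 0x100 if byte & 0x80 else byte


disp = loop_displacement(0x00000009, 0x0000000E)            # encoded after opcode E2
new_eip = (0x0000000E + sign_extend8(disp)) & 0xFFFFFFFF    # where the jump lands

# The loop itself: AX accumulates CX while ECX counts down from 5.
ax, ecx = 0, 5
while True:
    ax = (ax + (ecx & 0xFFFF)) & 0xFFFF   # add ax, cx
    ecx = (ecx - 1) & 0xFFFFFFFF          # loop L1: decrement ECX
    if ecx == 0:                          # fall through when ECX reaches 0
        break

print(hex(disp), hex(new_eip), ax)
```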

Your Turn...
- If the relative offset is encoded in a single signed byte,
  - (a) what is the largest possible backward jump?
  - (b) what is the largest possible forward jump?
- Answers: (a) -128 (b) +127

Your Turn...
- What will be the final value of AX?
    mov ax, 6
    mov ecx, 4
    L1:
        inc ax
        loop L1
- How many times will the loop execute?
    mov ecx, 0
    X2:
        inc ax
        loop X2
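Because LOOP decrements ECX before testing it, a starting value of 0 wraps around instead of ending the loop at once. This is a minimal Python sketch of the iteration count under that rule; `loop_iterations` and `final_ax` are hypothetical helpers that compute the count arithmetically rather than by simulating every pass.

```python
def loop_iterations(ecx):
    """How many times the body of 'L: ... loop L' runs for an initial ECX."""
    ecx &= 0xFFFFFFFF
    # ECX = 0 wraps to FFFFFFFFh on the first decrement, giving 2**32 passes.
    return ecx if ecx != 0 else 2**32


def final_ax(ax, ecx):
    """AX after a loop whose body is a single 'inc ax' (16-bit wraparound)."""
    return (ax + loop_iterations(ecx)) & 0xFFFF


print(loop_iterations(4), final_ax(6, 4))
print(loop_iterations(0))
```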

Nested Loop
- Must save the outer loop counter's ECX value
  - Saved in register or memory?
- Example: the outer loop executes 100 times, and the inner loop 20 times
    .data
    count DWORD ?
    .code
        mov ecx, 100      ; set outer loop count
    L1:
        mov count, ecx    ; save outer loop count
        mov ecx, 20       ; set inner loop count
    L2:
        .
        .
        loop L2           ; repeat the inner loop
        mov ecx, count    ; restore outer loop count
        loop L1           ; repeat the outer loop

Summing an Integer Array
- The following code calculates the sum of an array of 16-bit integers
    .data
    intarray WORD 100h, 200h, 300h, 400h
    .code
        mov edi, OFFSET intarray      ; address of intarray
        mov ecx, LENGTHOF intarray    ; loop counter
        mov ax, 0                     ; zero the accumulator
    L1:
        add ax, [edi]                 ; add an integer
        add edi, TYPE intarray        ; point to next integer
        loop L1                       ; repeat until ECX = 0

Copying a String
- The following code copies a string from source to target:
    .data
    source BYTE "This is the source string", 0
    target BYTE SIZEOF source DUP(0)    ; good use of SIZEOF
    .code
        mov esi, 0                ; index register
        mov ecx, SIZEOF source    ; loop counter
    L1:
        mov al, source[esi]       ; get char from source
        mov target[esi], al       ; store it in the target
        inc esi                   ; move to next character
        loop L1                   ; repeat for entire string

Summary
- Data Transfer
  - MOV: data transfer from source to destination
  - MOVSX, MOVZX, XCHG
- Operand types
  - direct, direct-offset, indirect, indexed
- Arithmetic
  - INC, DEC, ADD, SUB, NEG
  - Sign, Carry, Zero, Overflow flags
- Operators
  - OFFSET, PTR, TYPE, LENGTHOF, SIZEOF, TYPEDEF
- JMP and LOOP: branching instructions