Assembly Language Programming Bilal Zahoor Kashmir University North Campus

8086 Assembly Language
The language is not case sensitive (MOV and mov are the same).
Each instruction is represented by one assembly language statement.
There may be only one statement per line.
A statement may start in any column.
A statement is either an 8086 instruction or an assembler directive, which provides directions to the assembler program.

Format of 8086 Instruction
General format for an assembly language statement:
    [Label:] Mnemonic [Operand1 [, Operand2]] [; Comment]
Example:
    Start: MOV AX, BX    ; copy BX into AX
           INC CX        ; increment CX
           RET           ; return from procedure
The entities enclosed within square brackets are optional.
Start is a user-defined name; put a label in a statement only when necessary.
The symbol : indicates a label.
The symbol ; indicates a comment.

Format of an Assembler Directive
General format for an assembler directive:
    [Name] Pseudo-Op [Operands] [; Comment]
Example:
    temp DB 5            ; define temp as a byte variable initialized with value 5
    myProcedure PROC     ; beginning of a procedure (myProcedure)
    myProcedure ENDP     ; end of a procedure (myProcedure)
Name is used for variables, procedure names, etc.
Pseudo-ops qualify Name as a segment name, macro name and so on.
Assemblers use various pseudo-ops, and they have different names in different assemblers.

ALP Development
The most common assemblers for assembly language programming development are:
Intel: ASM.
Microsoft: MASM (Macro Assembler). Works on Microsoft platforms (Windows and DOS). Produces only .com and .exe files.
Borland: TASM (Turbo Assembler). 16-bit and 32-bit. Supports all platforms.

Defining Data in a Program
Data is usually stored in the data segment.
You can define constants and work areas (chunks of memory).
Data can be defined in different lengths (8-bit, 16-bit, 32-bit and so on).
Each character byte is stored as its ASCII value.
The definition for data:
    <Name> Dx <Expression>
Name: the name of the data item.
Dx: the data definition directive (DB, DW, DD and so on). It defines the length of the data item.
Expression: defines the initial value (content) of the data item.

Defining Data in a Program (cont.)
    Pseudo-op   Description          Data size (in bytes)
    DB          Define Byte          1
    DW          Define Word          2
    DD          Define Double word   4
    DQ          Define Quad-word     8
    DT          Define Ten bytes     10
Examples:
    var DW 1234H    ; define var as a word and initialize it with value 1234 (in hex)
    var DB ?        ; define var as a byte whose initial value is unknown

Defining Arrays
Initialized array:
    arr DB 'a', 'b', 2     ; initialized array of 3 elements, size of each element is 1 byte
    arr DB 20 dup(5)       ; arr is a byte array of 20 elements, all initialized to value 5
Uninitialized array:
    arr DB 20 dup(?)       ; arr is an uninitialized byte array of 20 elements
DUP (duplicate) can be used to define multiple storage locations.
arr stores the first value, arr + 1 stores the second value and so on. E.g.
    MOV AL, [arr+1]        ; [] is used for dereferencing; loads the value at arr+1 into AL
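The indexing rule above (arr, arr + 1, ...) can be sketched as a loop that sums a byte array. This is a minimal sketch assuming MASM with the simplified segment directives; the names nums and next are illustrative and do not come from the slides.

```asm
.MODEL SMALL
.STACK
.DATA
nums  DB 1, 2, 3, 4        ; illustrative byte array of 4 elements
.CODE
.STARTUP
      MOV CX, 4            ; loop counter = number of elements
      MOV SI, 0            ; index of the current element
      MOV AL, 0            ; running sum
next: ADD AL, [nums+SI]    ; add the byte stored at nums + SI
      INC SI               ; each element is 1 byte, so step by 1
      LOOP next            ; decrement CX and repeat until CX = 0
      ; AL holds the sum of the elements here
      .EXIT 0              ; return to DOS (this overwrites AX)
END
```

For a word array (DW), the index would have to advance by 2 per element instead of 1.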

Named Constant
Directive EQU: this directive does not define a data item; instead, it defines a value that the assembler substitutes in other instructions (similar to defining a constant in C or using #define).
    FACT EQU 12       ; defines FACT as a named constant
    MOV CX, FACT      ; without dereferencing
No memory is allocated.
Strings are also possible.

User-Defined Data Definition Directives
Structure:
    Struct myStruct                ; declares myStruct as a structure
    var1 DB 0                      ; var1 data byte initialized with 0
    var2 DB 1                      ; var2 data byte initialized with 1
    Ends myStruct
Structure variable:
    structVar myStruct ?           ; creates structure variable
Accessing structure:
    MOV [structVar.var1], 20       ; move 20 into var1 of structVar

Segment Directive
The SEGMENT directive identifies the start of a memory segment and ENDS identifies the end of a segment when full-segment definitions are in use.
Syntax:
    <logical-segment-name> SEGMENT
    .
    .
    <logical-segment-name> ENDS
E.g.
    mySegment SEGMENT
    MOV AX, BX
    .
    .
    mySegment ENDS

Assume Directive
The ASSUME statement tells the assembler what names have been chosen for the code, data, extra, and stack segments.
Without the ASSUME statement, the assembler assumes nothing and automatically uses a segment override prefix on all instructions that address memory data.
The ASSUME statement is only used with full-segment definitions.
Syntax:
    ASSUME <Physical-Segment>: <logical-segment-name>
E.g.
    mySegment SEGMENT
    ASSUME CS:mySegment    ; the code segment (CS) is associated with mySegment
    MOV AX, BX
    mySegment ENDS
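A minimal sketch of a complete program written with full-segment definitions, assuming MASM and a DOS target. The segment names myData, myStack and myCode, the variable value and the label main are illustrative. DOS function 4CH (terminate program) is not covered in the slides and is used here only to exit cleanly.

```asm
myData  SEGMENT
value   DW 1234H               ; illustrative word variable
myData  ENDS

myStack SEGMENT STACK
        DW 64 DUP(?)           ; 128 bytes of stack space
myStack ENDS

myCode  SEGMENT
        ASSUME CS:myCode, DS:myData, SS:myStack
main:   MOV AX, myData         ; ASSUME does not load DS, so load it explicitly
        MOV DS, AX
        MOV BX, value          ; addressed through DS, as ASSUME declared
        MOV AX, 4C00H          ; DOS function 4CH: terminate with return code 0
        INT 21H
myCode  ENDS
        END main               ; main is the starting address of the program
```

ASSUME only informs the assembler; the segment registers still have to be loaded by the program itself, which is why the two MOV instructions at main are needed.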

Memory Organization
The assembler uses two basic formats for developing software:
1. Using memory models: Memory models are unique to the MASM assembler program. The TASM assembler also uses memory models, but they differ somewhat from the MASM models. The models are easier to use for simple tasks.
2. Using full-segment definitions: The full-segment definitions are common to most assemblers, including the Intel assembler, and are often used for software development. The full-segment definitions offer better control over the assembly language task and are recommended for complex programs.

Simplified Segment Directives
.MODEL
The .MODEL statement followed by the size of the memory system designates the memory model. E.g.
    .MODEL SMALL            ; select small model
.CODE
Designates the beginning of the code segment in the program.
.DATA
Designates the beginning of the data segment in the program.
.STACK
Defines the stack segment in the program.
Syntax:
    .STACK [memory-size]    ; memory-size is optional
Default memory size for the stack segment is 1 KB.
Initializes the stack segment (SS), stack pointer (SP) and base pointer (BP).

Simplified Segment Directives (cont.)
.STARTUP
Designates the start of the execution of the program (similar to main() in C).
Loads DATA segment and EXTRA segment.
If the .STARTUP directive is used, the MOV AX, @DATA followed by MOV DS, AX statements can be eliminated.
The .STARTUP directive also eliminates the need to store the starting address next to the END label.
.EXIT
The .EXIT 0 directive returns to DOS with an error code of 0 (no error).
If no parameter is added to .EXIT, it still returns to DOS, but the error code is not defined.
END
Marks the end of the file.
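For comparison, a minimal sketch of the same kind of program written without .STARTUP and .EXIT, assuming MASM and the small model. The names msg and main are illustrative, and DOS function 4CH (terminate program) is an addition that the slides do not cover.

```asm
.MODEL SMALL
.STACK
.DATA
msg   DB "Hi$"             ; illustrative '$'-terminated string
.CODE
main: MOV AX, @DATA        ; what .STARTUP would otherwise do for DS
      MOV DS, AX
      MOV DX, OFFSET msg
      MOV AH, 09H          ; DOS function 09H: display string
      INT 21H
      MOV AX, 4C00H        ; DOS function 4CH: terminate with return code 0
      INT 21H
END main                   ; starting address must be given after END
```

Without .STARTUP the starting label has to follow END, which is the extra bookkeeping the directive removes.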

Memory Model
    Model   Description
    TINY    All data and code must fit into a single 64 KB memory segment. Tiny programs assemble as DOS .COM files and must use an origin at 0100H for the code.
    SMALL   A two-segment model with a single code segment and a single data segment. Small programs generate DOS .EXE files and have an origin of 0000H.
    FLAT    The flat model uses a single segment of up to 4 GB in length. Flat programs are programs that will only function in Windows, with an origin of 0000H.
All models except the TINY model assemble as .exe files.
The data segment (DS) and extra segment (ES) need to be initialized explicitly by the programmer.

Standard I/O
DOS function calls are used for standard input/output in 8086 assembly language.
To use a DOS function call in a DOS program:
1. Place the function number in AH (8-bit register) and any other data that might be necessary in other registers.
2. Once everything is loaded, execute the INT 21H instruction to perform the task.
After execution, a DOS function may return results in some specific registers.
01H: Read the Keyboard
This function waits until a character is input from the keyboard. It returns the ASCII key code of the character in the AL register. E.g.
    MOV AH, 01H    ; load DOS function number in AH
    INT 21H        ; access DOS; returns with AL = ASCII key code

Standard I/O (cont.)
02H: Write to Standard Output Device
This function displays a single character on the video display. The ASCII code of the character to be displayed must be loaded in the DL register. E.g.
    MOV DL, 'A'    ; load ASCII code of character 'A' in DL
    MOV AH, 02H    ; load DOS function number in AH
    INT 21H        ; access DOS
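Functions 01H and 02H can be combined. The following is a minimal sketch, assuming MASM and the small model, that reads one key and prints it again on the next line; using BL as a temporary register is an illustrative choice.

```asm
.MODEL SMALL
.STACK
.CODE
.STARTUP
      MOV AH, 01H     ; function 01H: wait for a key
      INT 21H         ; returns with AL = ASCII key code
      MOV BL, AL      ; keep a copy of the character

      MOV DL, 0DH     ; carriage return
      MOV AH, 02H     ; function 02H: write character in DL
      INT 21H
      MOV DL, 0AH     ; line feed
      MOV AH, 02H
      INT 21H

      MOV DL, BL      ; the character that was typed
      MOV AH, 02H
      INT 21H
      .EXIT 0
END
```

AH is reloaded before every INT 21H so that the sketch does not depend on what a function leaves in AX.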

Standard I/O (cont.)
09H: Display a Character String
This function displays a character string on the video display.
The character string must end with the ASCII code of the symbol '$' (24H).
The character string can be of any length and may contain control characters such as carriage return (0DH) and line feed (0AH).
DX must contain the address of the character string. E.g.
    Buf DB "Hello World$"    ; define character string
    MOV DX, offset Buf       ; load address of the string in DX; offset gives the address of Buf
    MOV AH, 09H              ; load DOS function number in AH
    INT 21H                  ; access DOS

Standard I/O (cont.)
0AH: Buffered Keyboard Input
This function continues to read the keyboard (displaying data as typed) until either the specified number of characters are typed or the Enter key is pressed.
The first byte of the buffer contains the size of the buffer (up to 255).
The second byte is filled with the number of characters typed upon return.
The third byte through the end of the buffer contains the character string typed, followed by a carriage return (0DH).
Buffer e.g.:
    Buf DB 13, 10, "Welcome$"
E.g.
    Buf DB 10, ?, 10 dup(0)    ; declare a buffer
    MOV DX, offset Buf         ; load address of the buffer in DX; offset gives the address of Buf
    MOV AH, 0AH                ; load DOS function number in AH
    INT 21H                    ; access DOS
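A minimal sketch that reads a line with function 0AH and prints it back with function 09H, assuming MASM and the small model. Pre-filling the buffer with '$' so that the typed text is already terminated for function 09H is an illustrative trick, not something the slides prescribe.

```asm
.MODEL SMALL
.STACK
.DATA
Buf   DB 20, ?, 21 DUP('$')   ; byte 0: buffer size, byte 1: count typed,
                              ; rest: text area pre-filled with '$'
.CODE
.STARTUP
      MOV DX, OFFSET Buf      ; address of the input buffer
      MOV AH, 0AH             ; function 0AH: buffered keyboard input
      INT 21H

      MOV DL, 0AH             ; line feed, so the echo starts on a new line
      MOV AH, 02H
      INT 21H

      MOV DX, OFFSET Buf + 2  ; typed text starts at the third byte
      MOV AH, 09H             ; function 09H: display up to the first '$'
      INT 21H
      .EXIT 0
END
```

The text area is one byte longer than the declared size so that a '$' always remains after the stored carriage return. If the user types a '$', the echoed text stops at that character.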

Procedure/Subroutines
Procedures can be declared within the .CODE (code segment). However, ensure that a procedure is not executed without explicit invocation.
Use the CALL instruction to call the procedure. E.g.
    CALL procedure-name
Example:
    .MODEL SMALL
    .DATA
    msg DB "Hello World$"
    .STACK
    .CODE
    myProc PROC
        MOV DX, offset msg
        MOV AH, 09H
        INT 21H
        RET
    myProc ENDP
    .STARTUP
        CALL myProc
    .EXIT
    END

Procedure/Subroutines (cont.)
The PROC and ENDP directives indicate the start and end of a procedure (subroutine).
Both the PROC and ENDP directives require a label to indicate the name of the procedure.
Syntax:
    <Procedure-name> PROC    ; starts procedure
    .
    .                        ; body of the procedure
    .
    RET
    <Procedure-name> ENDP    ; close of the procedure
When returning from procedures, use the RET instruction.
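A minimal sketch of a procedure that takes its argument in a register, assuming MASM and the small model. The procedure name putChar and the convention of passing the character in DL are illustrative. The procedure is placed after .EXIT so that execution cannot fall into it without a CALL.

```asm
.MODEL SMALL
.STACK
.CODE
.STARTUP
        MOV DL, 'O'
        CALL putChar        ; prints O
        MOV DL, 'K'
        CALL putChar        ; prints K
        .EXIT 0             ; program ends here, before the procedure body

putChar PROC                ; illustrative helper: expects the character in DL
        MOV AH, 02H         ; DOS function 02H: write character
        INT 21H
        RET                 ; return to the instruction after CALL
putChar ENDP
END
```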

Macros
A macro is a group of instructions that perform one task, just as a procedure performs one task. The difference is that a procedure is accessed via a CALL instruction, whereas a macro, and all the instructions defined in the macro, is inserted in the program at the point of usage.
Creating a macro is very similar to creating a new opcode, which is actually a sequence of instructions, in this case, that can be used in the program. You type the name of the macro and any parameters associated with it, and the assembler then inserts them into the program.
Macro sequences execute faster than procedures because there is no CALL or RET instruction to execute. The instructions of the macro are placed in your program by the assembler at the point where they are invoked.

Macros (cont.)
The MACRO and ENDM directives delineate a macro sequence.
The first statement of a macro is the MACRO instruction, which contains the name of the macro and any parameters associated with it. An example is MOVE MACRO A, B, which defines the macro name as MOVE. This new pseudo opcode uses two parameters: A and B.
The last statement of a macro is the ENDM instruction, which is placed on a line by itself. Never place a label in front of the ENDM statement. If a label appears before ENDM, the macro will not assemble.
Syntax:
    <Macro-name> MACRO [<Arg1> <, Arg2> ... <, ArgN>]
    .
    .
    ENDM

Macros (cont.)
Example: adding two numbers using a macro.
    .MODEL SMALL        ; define memory model
    SUM MACRO X, Y      ; macro definition SUM, X and Y are parameters
        MOV AX, X       ; move value of X into AX
        MOV BX, Y       ; move value of Y into BX
        ADD AX, BX      ; add values of X and Y
    ENDM                ; end of macro
    .DATA               ; start of data segment
    .STACK              ; start of stack segment
    .CODE               ; start of code segment
    .STARTUP            ; start of execution
        SUM 5, 10       ; call macro with parameters 5 and 10
    .EXIT               ; return to DOS
    END                 ; end of program file
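A further minimal sketch of how a macro is expanded inline at each point of use, assuming MASM and the small model. The macro name PUTCH and its parameter chr are illustrative.

```asm
.MODEL SMALL
PUTCH MACRO chr        ; illustrative macro: display one character
      MOV DL, chr      ; parameter is substituted textually
      MOV AH, 02H      ; DOS function 02H: write character
      INT 21H
ENDM
.STACK
.CODE
.STARTUP
      PUTCH 'H'        ; assembler inserts the three instructions here
      PUTCH 'i'        ; and inserts a second copy here
      .EXIT 0
END
```

Each invocation adds another copy of the macro body to the program, so the saving of CALL and RET is paid for with larger code.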

Modular Programming
Many programs are too large to be developed by one person. This means that programs are routinely developed by teams of programmers. The linker program is used so that programming modules can be linked together into a complete program.
The assembler program converts a symbolic source module (file) into a hexadecimal object file.
The linker program reads the object files that are created by the assembler program and links them together into a single execution file.
An execution file is created with the file name extension EXE. If a file is short enough (less than 64 KB long), it can be converted from an execution file to a command file (.COM).
The command file is slightly different from an execution file in that the program must be originated at location 0100H before it can execute.

Modular Programming (cont.)
Figure: Program.asm -> Assembler -> object file (.obj); Object file 1.obj, Object file 2.obj, Object file 3.obj -> Linker -> Executable file (.exe)

Modular Programming (cont.)
The PUBLIC and EXTRN directives are very important to modular programming because they allow communication between modules.
We use PUBLIC to declare that labels of code, data, or entire segments are available to other program modules.
EXTRN (external) declares that labels are external to a module.
The PUBLIC directive is placed in the opcode field of an assembly language statement to define a label as public, so that the label can be used (seen) by other modules.
The label declared as public can be a jump address, a data address, or an entire segment.
When segments are made public, they are combined with other public segments that contain data with the same segment name.
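A minimal sketch of two modules that communicate through PUBLIC and EXTRN, assuming MASM and the small model. The two parts are meant to be separate source files that are assembled individually and then linked; the file names main.asm and lib.asm, the procedure showMsg and the variable msg are all illustrative.

```asm
; ---- file 1: main.asm (illustrative name) ----
.MODEL SMALL
.STACK
.CODE
EXTRN showMsg:NEAR      ; showMsg is defined in another module
.STARTUP
      CALL showMsg      ; address is resolved by the linker
      .EXIT 0
END

; ---- file 2: lib.asm (illustrative name) ----
.MODEL SMALL
.DATA
msg   DB "Hello World$"
.CODE
PUBLIC showMsg          ; make showMsg visible to other modules
showMsg PROC
      MOV DX, OFFSET msg
      MOV AH, 09H       ; DOS function 09H: display string
      INT 21H
      RET
showMsg ENDP
END
```

The EXTRN statement has to give the type of the external label (NEAR here) because the assembler cannot see the definition when it assembles the first file.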

Modular Programming (cont.)
Placing MACRO Definitions in Their Own Module:
Macro definitions can be placed in the program file, or they can be placed in their own macro module.
A file can be created that contains only macros to be included with other program files.
We use the INCLUDE directive to indicate that a program file will include a module that contains external macro definitions.
When macro sequences are placed in a file (often with the extension INC or MAC), they do not contain PUBLIC statements as does a library.

Modular Programming (cont.)
Example
MyFun.inc file:
    SUM MACRO X, Y
        MOV AX, X
        MOV BX, Y
        ADD AX, BX
    ENDM
Program file:
    .MODEL SMALL
    .DATA
    .STACK
    .CODE
    INCLUDE MyFun.inc    ; include MyFun file
    .STARTUP
        SUM 5, 10
    .EXIT
    END

Conditional Assembly (or Controlled Expansion)
Conditional assembly language statements allow portions of a program to be assembled if a condition is met.
These are useful for tailoring software to an application.
The following table shows the forms used for the IF statement in conditional assembly.
    Statement   Function
    IF          If the expression is true
    IFB         If argument is blank
    IFE         If the expression is not true
    IFDEF       If the label has been defined
    IFNB        If argument is not blank
    IFNDEF      If the label has not been defined
    IFIDN       If argument 1 equals argument 2
    IFDIF       If argument 1 does not equal argument 2
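A minimal sketch of conditional assembly, assuming MASM and the small model. The constant TRACE is illustrative; changing its value and reassembling selects which of the two instruction groups ends up in the program. The ELSE and ENDIF directives used here are standard companions of IF that the table does not list.

```asm
TRACE EQU 1            ; illustrative switch: 1 = build with trace output
.MODEL SMALL
.STACK
.CODE
.STARTUP
IF TRACE               ; assembled only when TRACE is non-zero
      MOV DL, 'T'
      MOV AH, 02H      ; DOS function 02H: write character
      INT 21H
ELSE                   ; assembled only when TRACE is zero
      MOV DL, '-'
      MOV AH, 02H
      INT 21H
ENDIF
      .EXIT 0
END
```

The decision is made by the assembler, not at run time: only one of the two branches exists in the object file.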