Introduction (Chapter 1)
- What is Assembly Language?
- Data Representation

What is Assembly Language?
- A low-level, processor-specific programming language
  - designed to match the processor’s machine instruction set
  - each assembly language instruction matches exactly one machine language instruction
- We study here Intel’s 80x86 family (including the Pentiums)

Why learn Assembly Language?
- To learn how high-level language (HLL) code gets translated into machine language
  - i.e., to learn the details hidden in HLL code
- To learn the computer’s hardware
  - by direct access to memory, video controller, sound card, keyboard…
- To speed up applications
  - direct access to hardware (ex: writing directly to I/O ports instead of doing a system call)
  - good ASM code is faster and smaller: rewrite in ASM the critical areas of code

Assembly Language Applications
- Application programs are rarely written completely in assembly language
  - only time-critical parts are written in ASM
    - Ex: an interface subroutine (called from HLL programs) is written in ASM for direct hardware access
    - Ex 2: device drivers (called from the OS)
- ASM is often used for embedded systems (programs stored in PROM chips)
    - computer cartridge games, microcontrollers (automobiles, industrial plants...), telecommunication equipment…
- Very fast and compact but processor-specific

Machine Language
- An assembler is a program that converts ASM code into machine language code:
  - mov al, 5 (Assembly Language)
  - 10110000 00000101 (Machine Language)
    - the first byte (B0h) is the opcode for “move into register AL”
    - the second byte (05h) is the operand “5”
- Directly programming in machine language offers no advantage (over Assembly)...
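
The translation on this slide can be sketched in a few lines of Python. This is only an illustration of the single instruction shown, not an assembler; the function name `encode_mov_al_imm8` is made up for this example.

```python
def encode_mov_al_imm8(value):
    """Encode `mov al, imm8`: opcode byte B0h followed by the 8-bit operand."""
    if not 0 <= value <= 0xFF:
        raise ValueError("the operand must fit in one byte")
    return bytes([0xB0, value])


code = encode_mov_al_imm8(5)
print(code.hex())                          # b005
print(" ".join(f"{b:08b}" for b in code))  # 10110000 00000101
```

Reading the two bytes in hexadecimal (B0 05) is usually easier than reading the 16 bits.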

Binary Numbers
- Binary numbers are used to store both code and data
- On Intel’s x86:
  - byte = 8 bits (smallest addressable unit)
  - word = 2 bytes
  - doubleword = 2 words
  - quadword = 2 doublewords
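
As a quick check of these sizes, the sketch below uses Python's `struct` module, whose `<B`, `<H`, `<I` and `<Q` format codes have fixed sizes of 1, 2, 4 and 8 bytes. The `UNITS` table and `size_in_bytes` helper are names chosen for this example only.

```python
import struct

# With the "<" prefix, struct uses fixed standard sizes, independent of the host.
UNITS = {"byte": "<B", "word": "<H", "doubleword": "<I", "quadword": "<Q"}


def size_in_bytes(unit):
    """Return the size in bytes of an x86 data unit named on the slide."""
    return struct.calcsize(UNITS[unit])


for unit in UNITS:
    n = size_in_bytes(unit)
    print(f"{unit:10s} {n} bytes = {n * 8:2d} bits, unsigned max = {2 ** (n * 8) - 1}")
```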

Number Systems
- A written number is meaningful only with respect to a base
- To tell the assembler which base we use:
  - Hexadecimal 25 is written as 25h
  - Octal 25 is written as 25o or 25q
  - Binary 1010 is written as 1010b
  - Decimal 1010 is written as 1010 or 1010d
- You are expected to know how to convert from one base to another (see appendix A)
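
The suffix convention can be illustrated with a small parser. This is a simplified sketch covering only the suffixes listed on the slide (h, o, q, b, d); `parse_asm_number` is a hypothetical helper, not part of any assembler.

```python
def parse_asm_number(text):
    """Convert an assembler-style literal such as '25h' or '1010b' to an int."""
    bases = {"h": 16, "o": 8, "q": 8, "b": 2, "d": 10}
    text = text.strip().lower()
    suffix = text[-1]
    if suffix in bases:
        return int(text[:-1], bases[suffix])
    return int(text, 10)  # no suffix: decimal by default


print(parse_asm_number("25h"))    # 37
print(parse_asm_number("25o"))    # 21
print(parse_asm_number("1010b"))  # 10
print(parse_asm_number("1010"))   # 1010
```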

Data Representation
- Even if we know that a block of memory contains data, to obtain its value we need to choose an interpretation
- Ex: memory content 0100 0001 can either represent:
  - the number 2^{6} + 1 = 65
  - or the ASCII code of character “A”
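
The same example in Python, as a minimal sketch: one bit pattern, read once as a number and once as an ASCII character. The helper name `interpretations` is made up for this illustration.

```python
def interpretations(byte):
    """Return the same byte read as a number and as an ASCII character."""
    return byte, chr(byte)


number, character = interpretations(0b0100_0001)
print(number)     # 65
print(character)  # A
```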

Signed and Unsigned Interpretation
- When a memory block contains a number, to obtain its value we must choose either:
  - the signed interpretation: in that case the most significant bit (msb) represents the sign
    - Positive number (or zero) if msb = 0
    - Negative number if msb = 1
  - the unsigned interpretation: in that case all the bits are used to represent a magnitude (i.e., a positive number or zero)

Twos Complement Notation
- Used to represent negative numbers
- The twos complement of a positive number X, denoted by NEG(X), is obtained by complementing all its bits and adding 1:
  NEG(X) = NOT(X) + 1
  - Ex: NEG(10) = NOT(10) + 1
        = NOT(0000 1010b) + 1
        = (1111 0101b) + 1
        = 1111 0110b = -10
- It follows that X + NEG(X) = 0
- To perform the difference X - Y:
  - the machine executes the addition X + NEG(Y)
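
The rule NEG(X) = NOT(X) + 1 can be checked with a short sketch restricted to 8-bit values. The names `neg8` and `sub8` are chosen for this example; the `& 0xFF` mask stands in for the fixed width of a byte register.

```python
def neg8(x):
    """Twos complement of an 8-bit value: complement all bits, then add 1."""
    return (~x + 1) & 0xFF


def sub8(x, y):
    """Compute X - Y as the addition X + NEG(Y), keeping only 8 bits."""
    return (x + neg8(y)) & 0xFF


print(f"{neg8(10):08b}")       # 11110110
print((10 + neg8(10)) & 0xFF)  # 0
print(sub8(15, 10))            # 5
```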

Maximum and Minimum Values
- The msb of a signed number is used for its sign
  - fewer bits are left for its magnitude
- Ex: for a signed byte
  - smallest non-negative = 0000 0000b = 0
  - largest positive = 0111 1111b = 127
  - largest negative = 1111 1111b = -1
  - smallest negative = 1000 0000b = -128
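
The byte limits above generalize to any width: with n bits, the signed range is -2^(n-1) to 2^(n-1) - 1 and the unsigned range is 0 to 2^n - 1. A minimal sketch follows, with helper names chosen for this example.

```python
def signed_range(bits):
    """Smallest and largest values of a twos complement number of this width."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def unsigned_range(bits):
    """Smallest and largest values when all bits hold the magnitude."""
    return 0, (1 << bits) - 1


print(signed_range(8))     # (-128, 127)
print(unsigned_range(8))   # (0, 255)
print(signed_range(16))    # (-32768, 32767)
```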

Signed/Unsigned Interpretation (again)
- To obtain the value of a number we need to choose an interpretation
- Ex: memory content 1111 1111 can either represent:
  - -1 if a signed interpretation is used
  - 255 if an unsigned interpretation is used
- Only the programmer can provide an interpretation of the content of memory
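
A sketch of the two readings of one byte, assuming 8-bit twos complement for the signed case. The function names are illustrative only.

```python
def as_unsigned(byte):
    """All 8 bits represent the magnitude."""
    return byte & 0xFF


def as_signed(byte):
    """The msb is the sign: patterns with msb = 1 stand for value - 256."""
    value = byte & 0xFF
    return value - 256 if value & 0x80 else value


pattern = 0b1111_1111
print(as_unsigned(pattern))  # 255
print(as_signed(pattern))    # -1
```

The bits in memory are identical in both calls; only the choice of function (the interpretation) differs.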

Character Representation
- Each character is represented by a 7-bit code: the ASCII code (from 00h to 7Fh)
  - Only codes from 20h to 7Eh represent printable characters. The rest are control codes (used for printing, transmission…).
- An extended character set is obtained by setting the msb to 1 (codes 80h to FFh) so that each character is stored in 1 byte
  - Varies from one system to another
    - MS-DOS usage: for accented characters, Greek symbols and some graphic characters
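
The code ranges on this slide can be expressed as a small classifier. This is a sketch based only on the ranges stated above; `classify_code` is a hypothetical helper.

```python
def classify_code(code):
    """Classify a 1-byte character code using the ranges given on the slide."""
    if not 0 <= code <= 0xFF:
        raise ValueError("a character code must fit in one byte")
    if code >= 0x80:
        return "extended"   # msb = 1, meaning depends on the system
    if 0x20 <= code <= 0x7E:
        return "printable"
    return "control"        # 00h to 1Fh, and 7Fh


print(classify_code(ord("A")))  # printable
print(classify_code(0x0D))      # control
print(classify_code(0xE9))      # extended
```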

The ASCII character set
- CR = “carriage return” (MS-DOS: move to beginning of line)
- LF = “line feed” (MS-DOS: move directly one line below)
- SPC = “blank space”

Text Files
- These are files containing only ASCII characters
- But different conventions are used for indicating an “end-of-line”
  - MS-DOS: <CR>+<LF>
  - UNIX: <LF>
  - MAC: <CR>
- This is the source of many problems encountered during transfers of text files from one system to another
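
One way to handle the three conventions when moving a text file between systems is to normalize every end-of-line to a single form. The sketch below works on raw bytes; both function names are made up for this example.

```python
def to_unix_newlines(data):
    """Rewrite <CR><LF> and lone <CR> as <LF>."""
    # <CR><LF> must be handled first, or each pair would become two <LF>.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def to_dos_newlines(data):
    """Rewrite every end-of-line as <CR><LF>."""
    return to_unix_newlines(data).replace(b"\n", b"\r\n")


mixed = b"dos\r\nmac\runix\n"
print(to_unix_newlines(mixed))  # b'dos\nmac\nunix\n'
print(to_dos_newlines(mixed))   # b'dos\r\nmac\r\nunix\r\n'
```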

Strings and numbers
- A string is stored as an array of characters
- A 1-byte ASCII code is stored for each char
- Hence, we can either store the number 123 in numerical form or as the string “123”
  - The string form is best for display
  - The numerical form is best for computations
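
The difference between the two forms, and the conversion between them, can be sketched as follows. The helpers `ascii_to_number` and `number_to_ascii` are illustrative names and assume non-negative decimal values only.

```python
def ascii_to_number(digits):
    """Convert ASCII digit codes (e.g. 31h 32h 33h) to their numerical value."""
    value = 0
    for code in digits:
        value = value * 10 + (code - 0x30)  # 30h is the ASCII code of "0"
    return value


def number_to_ascii(value):
    """Convert a non-negative number to its string of ASCII digit codes."""
    if value == 0:
        return b"0"
    codes = []
    while value > 0:
        codes.append(0x30 + value % 10)  # least significant digit first
        value //= 10
    return bytes(reversed(codes))


print(b"123".hex())             # 313233 (string form: three bytes)
print(hex(123))                 # 0x7b   (numerical form: one byte)
print(ascii_to_number(b"123"))  # 123
print(number_to_ascii(123))     # b'123'
```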