Introduction to Assembly Language Programming and Computer Architecture
By Anand George
Agenda • Understanding of basic computer architecture. • Introduction to x86 CPU architecture. • Introduction to the x86 instruction set. • Examples and demos.
Why do we need to learn assembly? • Better understanding of the output code generated by the compiler. • Understand different C language features better by looking at the assembly generated for them. • Certain language features, such as pointers and calling conventions, are very difficult to understand without knowing the equivalent assembly code. • Many other uses: reverse engineering, debugging and optimizing, to name a few.
Basic Computer Architecture [Diagram: on the motherboard, devices ( disk, NIC, USB, DMA ) connect through bus controllers and buses to the CPU; the CPU connects through buses to memory, which contains instructions and data.]
CPU • A chip on the motherboard. • Connected to devices and memory via buses and bus controllers. • Buses are practically wires. • Fetches and executes instructions from memory.
CPU Registers • The CPU contains small internal memory regions. • They are small and much faster to access than main memory. • They are called registers and are well defined for a particular make of CPU. Note ( Cache is not going to be relevant to this discussion )
Working in a nutshell • Programs are loaded into memory. • The CPU starts executing them from memory. • The CPU mostly does the following 3 things: 1. Read data from memory or devices into a register. 2. Modify ( or process ) the data. 3. Write the result back to memory or devices.
Example: Adding 2 numbers • Suppose you have a program which adds 2 numbers and displays the output. • You type the 2 input numbers on the keyboard. 1. The CPU reads those 2 numbers from the keyboard into CPU registers. ( read ) 2. It adds the values in the registers. ( modify / process ) 3. The result is written to video memory for display. ( write )
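The read / modify / write steps on this slide can be sketched in C. This is only a minimal sketch, not the demo program from the talk: the names `add_two` and `show_sum` are hypothetical, and the comments describe what a compiler typically does rather than guaranteed output.

```c
#include <stdio.h>

/* Hypothetical helper: the "modify" step from the slide. */
int add_two(int a, int b)
{
    int sum = a + b;   /* the CPU adds the two values, typically held in registers */
    return sum;
}

/* Hypothetical helper: the "write" step, sending the result to the display. */
void show_sum(int a, int b)
{
    printf("%d + %d = %d\n", a, b, add_two(a, b));
}
```

Calling `show_sum(2, 3)` from `main` would print the sum; the "read" step would be whatever fills `a` and `b`, for example `scanf`.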
Note • The CPU does its processing mainly on values in registers, for performance reasons, and not on memory or devices directly. • Registers are like a wallet for the CPU: the wallet is what you use for immediate purchases ( processing ), but we can't put the entire bank account ( memory ) into the wallet, as the wallet is small.
Understanding Data and Code • Memory normally contains programs. • Programs are nothing but binary data. • The binary information that constitutes a program can be divided into 2 types: 1. Code 2. Data
Code and Data • Code is a command to the CPU that instructs it what to do with data it has already been given or is going to be given. • Example: suppose a baby is the CPU and the following instructions are given to the baby: 1. Drink Milk 2. Smile 3. Close eyes 4. Go to Sleep
Code and Data • Drink, Smile, Close, Go to are code. • Milk, Mouth, eyes, Sleep are data. • Note that Mouth is an implicit kind of data ( no instruction names it ), which is very common when it comes to a real CPU.
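The split between code and data can be sketched as a toy instruction format in C. Everything here is an assumption made for illustration: the opcodes, the `instruction` struct and `run_program` are hypothetical and do not correspond to real x86 encodings. The accumulator plays the role of the baby's mouth: implicit data that no instruction names.

```c
#include <stddef.h>

/* Hypothetical opcodes: the "code" part of each instruction. */
enum opcode { OP_LOAD, OP_ADD, OP_SUB };

/* One instruction = a command (code) plus the value it works on (data). */
struct instruction {
    enum opcode op;   /* code: what to do */
    int operand;      /* data: what to do it with */
};

/* Runs the instructions in order and returns the final accumulator.
   The accumulator is implicit data: every instruction uses it,
   but none of them names it. */
int run_program(const struct instruction *program, size_t count)
{
    int accumulator = 0;
    for (size_t i = 0; i < count; i++) {
        switch (program[i].op) {
        case OP_LOAD: accumulator = program[i].operand;  break;
        case OP_ADD:  accumulator += program[i].operand; break;
        case OP_SUB:  accumulator -= program[i].operand; break;
        }
    }
    return accumulator;
}
```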
Intel 32-bit CPU registers • There are many; we concentrate on the few which are important for the time being. • All are 32 bits in size. • EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP, EIP and EFLAGS. • ESP and EIP are strictly special purpose. • ECX, ESI and EDI are occasionally special purpose. • EAX, EBX, EDX and EBP are normally general purpose, although the use of EBP for a special purpose is compiler dependent. ( The CPU places few restrictions on them, and few instructions depend on them implicitly ) • EFLAGS is also special purpose, but it differs from the other registers in that the individual bits inside the register are used rather than the register as a whole ( details later ).
Demo • View CPU registers in Visual Studio in a hello world C application.
Intel x86 32-bit Registers [Diagram of the CPU] Mostly used registers: EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP, EIP, EFLAGS. Control registers and others ( CR0 to CR4, DR0 to DR7, TR3 to TR7, GDTR, IDTR etc ). Segment registers. Special coprocessor registers such as MMX and floating-point unit registers.
Registers and Programming • Registers are the programmer-visible part of the CPU. • From a programmer's perspective, the registers are the CPU. • Every programming language's compiler ( or compiler plus runtime ) ends up, one way or another, producing binary instructions for the CPU the program is targeted at. No exception whatsoever. ( C#, Java, Perl, JavaScript or C/C++ )
Special Purpose registers • Certain instructions use some registers implicitly. • In the baby example, for the instruction SMILE the baby has to use the mouth ( special purpose register ). • But the mouth can be used for EATING as well, so it is not strictly special purpose. • Now consider the instruction HEAR: the baby has to use the ear ( strict special purpose register ) to do it. • And the ear normally cannot be used for anything other than HEAR.
EIP ( Instruction pointer ) • Very strict special purpose. • Always points to CODE: the next instruction the CPU is going to execute. • EIP stands for Extended Instruction Pointer. • Every instruction executed depends on this register's value. • Controls the flow of the program in general. • Like the steering wheel of a car.
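How an instruction pointer steers execution can be sketched with a toy fetch-and-execute loop in C. This is a hypothetical model, not real x86 behaviour in detail: the `IP_*` opcodes, `struct instr` and `run_with_ip` are invented for illustration, and the sketch assumes every jump target is valid and the program ends with `IP_HALT`.

```c
#include <stddef.h>

/* Hypothetical opcodes for a toy machine. */
enum ip_op { IP_NOP, IP_JMP, IP_HALT };

struct instr {
    enum ip_op op;
    size_t target;   /* used only by IP_JMP */
};

/* Executes code[] and records the index of each executed instruction
   in trace[]. Stops at IP_HALT or after max_steps instructions.
   Returns the number of instructions executed. */
size_t run_with_ip(const struct instr *code, size_t *trace, size_t max_steps)
{
    size_t ip = 0;      /* plays the role of EIP */
    size_t steps = 0;
    while (steps < max_steps) {
        const struct instr *current = &code[ip];   /* fetch */
        trace[steps++] = ip;
        ip++;                       /* ip now points at the next instruction */
        if (current->op == IP_HALT)
            break;
        if (current->op == IP_JMP)
            ip = current->target;   /* a jump simply overwrites ip */
    }
    return steps;
}
```

In this model, a program of `JMP 2`, `NOP`, `HALT` never executes the `NOP`, because the jump rewrites `ip` before the next fetch.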
Demo EIP in Visual Studio
ESP ( Stack pointer ) • Normally a program has hundreds of instructions, if not thousands. • All programs need some temporary storage to keep track of the immediate state of variables, flow etc. • To manage that, every program uses a small chunk of memory called the stack. • The ESP register always points to the top of the stack. • PUSH, POP and RET are some of the instructions that depend on ESP. • Normally none of the compilers use ESP for any purpose other than pointing to the stack of a thread ( program ). • So it is normally a strict special purpose register.
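The way PUSH and POP move the stack pointer can be sketched with a small array in C. This is a simplified, hypothetical model: `stack_model`, `stack_push` and `stack_pop` are illustrative names, and the index `esp` stands in for the real register, which holds an address and moves in steps of 4 bytes.

```c
#include <stdint.h>

#define STACK_WORDS 16

/* Hypothetical model: a small stack that grows toward lower indexes,
   the way the x86 stack grows toward lower addresses. */
struct stack_model {
    uint32_t memory[STACK_WORDS];
    int esp;   /* index of the top of the stack */
};

/* An empty stack has esp just past the end of the array. */
void stack_init(struct stack_model *s)
{
    s->esp = STACK_WORDS;
}

/* PUSH: move esp down, then store the value at the new top.
   Returns 0 on success, -1 if the stack is full. */
int stack_push(struct stack_model *s, uint32_t value)
{
    if (s->esp == 0)
        return -1;
    s->esp--;
    s->memory[s->esp] = value;
    return 0;
}

/* POP: read the value at the top, then move esp up.
   Returns 0 on success, -1 if the stack is empty. */
int stack_pop(struct stack_model *s, uint32_t *value)
{
    if (s->esp == STACK_WORDS)
        return -1;
    *value = s->memory[s->esp];
    s->esp++;
    return 0;
}
```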
Demo ESP In Visual Studio
EFLAGS
Bit     Label  Description
0       CF     Carry flag
2       PF     Parity flag
4       AF     Auxiliary carry flag
6       ZF     Zero flag
7       SF     Sign flag
8       TF     Trap flag
9       IF     Interrupt enable flag
10      DF     Direction flag
11      OF     Overflow flag
12-13   IOPL   I/O privilege level
14      NT     Nested task flag
16      RF     Resume flag
17      VM     Virtual 8086 mode flag
18      AC     Alignment check flag
19      VIF    Virtual interrupt flag
20      VIP    Virtual interrupt pending flag
21      ID     ID flag
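EFLAGS • Flags are set or cleared by the results of operations like addition, subtraction etc. • The CPU does not know whether a value is signed or unsigned: it updates both the unsigned ( carry ) and the signed ( sign, overflow ) flags, and the program checks the ones it needs. INC and DEC are a special case, as they leave the carry flag unchanged. • We will restrict our discussion to the operation-result-based flags. • Mainly Carry, Zero, Sign and Overflow. ( The Direction flag is set explicitly by the program, not by a result. )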
Operation based flags • The CPU clears or sets each flag after an operation, as required. • That is how the programmer "checks" the result of certain operations. • We will revisit this when we discuss the Jump instructions later, to get a complete picture.
Carry Flag • Will be set ( value 1 ) if the result of the previous operation is more than the maximum value the target register can hold. • Mainly used for unsigned calculations. • Unlike in normal math, the stored result of an operation cannot be higher than the maximum value a register can hold. • The carry flag indicates that the result of the operation was too large and that the value in the result register is not the complete result.
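The carry condition can be sketched in C with fixed-width unsigned arithmetic. This is a minimal sketch of the idea, not of how the CPU computes the flag internally; `add_u32_with_carry` is a hypothetical name.

```c
#include <stdint.h>

/* Adds two unsigned 32-bit values the way a 32-bit register would.
   *carry is set to 1 when the true sum does not fit in 32 bits,
   and to 0 otherwise. */
uint32_t add_u32_with_carry(uint32_t a, uint32_t b, int *carry)
{
    uint32_t sum = a + b;   /* wraps around modulo 2^32 */
    *carry = (sum < a);     /* a wrapped sum is smaller than either input */
    return sum;
}
```

Adding 1 to 0xFFFFFFFF gives a stored result of 0 with the carry set, which is the "value in the result register is not complete" case described on the slide.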
Zero Flag • Will be set if the result of an operation is zero.
Sign Flag • Will be set if the most significant bit of the result register after an operation is 1. • In signed arithmetic, this means the result is negative. • Can be used to check whether the result of the operation was positive or negative.
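The zero and sign flags can be modelled in C as simple tests on a 32-bit result. The function names below are hypothetical, and the sketch only mirrors the definitions on the slides; it does not read the real EFLAGS register.

```c
#include <stdint.h>

/* 32-bit subtraction as a register would perform it (wraps modulo 2^32). */
uint32_t sub_u32(uint32_t a, uint32_t b)
{
    return a - b;
}

/* ZF model: 1 when the 32-bit result is zero. */
int zero_flag(uint32_t result)
{
    return result == 0;
}

/* SF model: a copy of the most significant bit (bit 31) of the result. */
int sign_flag(uint32_t result)
{
    return (int)(result >> 31);
}
```

For example, 5 - 5 sets the zero flag model, while 3 - 5 leaves a result whose top bit is 1, so the sign flag model reports a negative value.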
Overflow flag • Like a carry flag for signed operations: set when the signed result does not fit in the target register.
Demo • For the assembly language practical, again all you need is Visual Studio; no separate assembler etc.
Instructions - MOV • One of the very basic x86 instructions. • Moves data between memory and a CPU register, in either direction. • Moves data between 2 CPU registers. • Many other instructions, such as LEA, PUSH and POP, indirectly do what MOV does; some can be faster than MOV.
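Signed overflow for addition can be sketched in C as a sign comparison: the flag is set when both inputs have the same sign but the result has the opposite sign. This is a hedged model of the rule for 32-bit addition only, with the hypothetical name `overflow_flag_add`.

```c
#include <stdint.h>

/* OF model for a 32-bit addition: returns 1 when a + b does not fit
   in a signed 32-bit value, 0 otherwise. */
int overflow_flag_add(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a;
    uint32_t ub = (uint32_t)b;
    uint32_t sum = ua + ub;   /* unsigned add avoids undefined behaviour in C */

    /* Bit 31 of the expression is 1 only when the sign of the sum
       differs from the sign of both inputs. */
    return (int)(((ua ^ sum) & (ub ^ sum)) >> 31);
}
```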
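A C assignment through pointers shows the kinds of moves listed on this slide. This is only a sketch: `copy_value` is a hypothetical function, and the comments describe what a compiler commonly generates for 32-bit x86, not guaranteed output. MOV has no memory-to-memory form, so a copy like this normally passes through a register.

```c
/* Copies one int from *src to *dst and returns the copied value. */
int copy_value(const int *src, int *dst)
{
    int value = *src;   /* load: memory -> register (a MOV-style read) */
    *dst = value;       /* store: register -> memory (a MOV-style write) */
    return value;       /* commonly a register -> register move into EAX */
}
```

Looking at the disassembly of a function like this in the debugger is one way to see which moves the compiler actually chose.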
Demo • MOV instruction in Visual Studio.
Some common Operations • Add – ADD • Subtract – SUB • OR • AND • XOR • Shift
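Each operation on this slide has a direct C operator. The wrappers below are a minimal sketch with hypothetical names. The shift helpers mask the count to 5 bits, which matches how x86 treats 32-bit shift counts and also avoids undefined behaviour in C.

```c
#include <stdint.h>

uint32_t op_add(uint32_t a, uint32_t b) { return a + b; }   /* ADD */
uint32_t op_sub(uint32_t a, uint32_t b) { return a - b; }   /* SUB */
uint32_t op_or(uint32_t a, uint32_t b)  { return a | b; }   /* OR  */
uint32_t op_and(uint32_t a, uint32_t b) { return a & b; }   /* AND */
uint32_t op_xor(uint32_t a, uint32_t b) { return a ^ b; }   /* XOR */

/* Shift left and shift right; the count is masked to the range 0..31. */
uint32_t op_shl(uint32_t a, unsigned count) { return a << (count & 31u); }
uint32_t op_shr(uint32_t a, unsigned count) { return a >> (count & 31u); }
```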
Demo • Sum of 2 numbers • Other operations
Jump instructions • JMP – unconditional jump. • JNZ – jump if not zero. • Jump instructions change the EIP register, which means they change the flow of execution. • There are many types of jumps, based on all the flags we discussed. We will not go through them all, as we are not going to program in assembly but only to understand it.
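C's `goto` is the closest source-level relative of a jump, so a loop written with it mirrors the jump pattern on this slide. This is an illustrative sketch with the hypothetical name `sum_down`; a real compiler is free to generate different instructions for it.

```c
/* Returns n + (n-1) + ... + 1, using only goto for control flow. */
unsigned sum_down(unsigned n)
{
    unsigned total = 0;
    if (n == 0)
        goto done;    /* like a conditional jump that skips the loop */
loop:
    total += n;
    n--;              /* like DEC: the result updates the zero flag */
    if (n != 0)
        goto loop;    /* like JNZ: jump back while the result is not zero */
done:
    return total;
}
```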
Demo • Jump instruction.
Instructions which use the stack pointer • PUSH • POP • RET PUSH and POP are more or less MOV instructions based on the stack pointer. RET is more or less a JUMP based on the stack pointer. All of these instructions are used by functions in C.
CALL instruction • Similar to a jump, but it also does some additional actions on the stack ( it pushes the return address ). • Normally used for function calls.
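CALL and RET can be sketched as a pair of operations on a toy machine that has an instruction pointer and a stack. The model is hypothetical: `tiny_cpu`, `cpu_call` and `cpu_ret` are illustrative names, indexes stand in for addresses, and there is no bounds checking, so it assumes at most `CALL_STACK_WORDS` nested calls and no RET without a matching CALL.

```c
#include <stddef.h>

#define CALL_STACK_WORDS 8

struct tiny_cpu {
    size_t ip;                        /* next instruction index, like EIP */
    size_t sp;                        /* index of the top of stack, like ESP */
    size_t stack[CALL_STACK_WORDS];
};

void cpu_init(struct tiny_cpu *cpu)
{
    cpu->ip = 0;
    cpu->sp = CALL_STACK_WORDS;       /* empty stack */
}

/* CALL: push the return address, then jump to the target.
   ip is assumed to already point at the instruction after the call. */
void cpu_call(struct tiny_cpu *cpu, size_t target)
{
    cpu->sp--;
    cpu->stack[cpu->sp] = cpu->ip;
    cpu->ip = target;
}

/* RET: pop the return address from the stack into ip. */
void cpu_ret(struct tiny_cpu *cpu)
{
    cpu->ip = cpu->stack[cpu->sp];
    cpu->sp++;
}
```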
Demo • PUSH, POP, RET and CALL
Functions and calling convention in C • How the arguments are passed on the stack to the callee. • Who cleans up the stack ( by incrementing the stack pointer ). • C calling convention. • Standard calling convention. • Others exist, like thiscall, fastcall, x64 etc. • All depend on the compiler. • Caller and callee must follow the same calling convention. • Normally the programmer doesn't have to bother, but if you are calling a function in a dynamically loaded binary through a function pointer, the compiler is not there to give an error.
Pro-Con C calling convention: • Caller cleans the stack. • Caller manages the stack. Standard calling convention: • Callee cleans the stack. • Callee manages the stack.
Pro-Con C calling convention: • Supports a variable number of arguments. • Generates more code, so the binary becomes bigger. Standard calling convention: • Smaller binary. • No variable arguments.
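The two conventions can be written out in C as a minimal sketch. `__cdecl` and `__stdcall` are Microsoft compiler keywords that only matter on 32-bit x86; the macros `CALL_CDECL` and `CALL_STDCALL` and all function names below are hypothetical, and on other compilers or targets the macros expand to nothing, so the code still builds but the distinction disappears.

```c
#include <stdarg.h>

#if defined(_MSC_VER) && defined(_M_IX86)
#define CALL_CDECL   __cdecl
#define CALL_STDCALL __stdcall
#else
#define CALL_CDECL
#define CALL_STDCALL
#endif

/* C calling convention: the caller removes the arguments after the call. */
int CALL_CDECL add_cdecl(int a, int b)
{
    return a + b;
}

/* Standard calling convention: the callee removes its own arguments. */
int CALL_STDCALL add_stdcall(int a, int b)
{
    return a + b;
}

/* Variable arguments rely on caller cleanup, because only the caller
   knows how many arguments it pushed. Adds up 'count' int arguments. */
int CALL_CDECL sum_ints(int count, ...)
{
    va_list args;
    int total = 0;
    va_start(args, count);
    for (int i = 0; i < count; i++)
        total += va_arg(args, int);
    va_end(args);
    return total;
}
```

Comparing the disassembly of calls to `add_cdecl` and `add_stdcall` in a 32-bit Visual Studio build is one way to see where the stack cleanup ends up.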
A lot more on assembly, but we may not need it • How to go on from here? • Look at the assembly generated by the programs in previous sessions. • Get acquainted with the assembly instruction patterns. • Fill in any gaps in the required knowledge found during the exercise. • Make sure we are all set to start pointers in C.
Demo • cdecl, or C calling convention. • stdcall, or standard calling convention. • Looking at the assembly and understanding the difference.
Thank you