Introduction to Assembly
Chapter 2
Sepehr Naimi
www.NicerLand.com
Topics
- ARM's CPU
  - Its architecture
  - Some simple programs
- Data memory access
- Program memory
- RISC architecture
ARM's CPU
- ALU
- Registers
  - General-purpose registers (R0 to R12)
  - R13 (SP) and R14 (LR)
  - PC register (R15)
  - CPSR: N Z C V I F T M4-M0
- Instruction decoder and instruction register
[Figure: CPU block diagram showing registers R0 to R15 and the CPSR, the ALU, the instruction register, and the instruction decoder]
CPU
[Figure: CPU]
Some simple instructions
1. MOV (move)
- MOV Rd, #k        ; Rd = k, where k is an 8-bit value
  Examples:
    MOV R5, #53             ; R5 = 53
    MOV R9, #0x27           ; R9 = 0x27
    MOV R3, #0b11101100     ; R3 = 0b11101100
- MOV Rd, Rs        ; Rd = Rs
  Examples:
    MOV R5, R2              ; R5 = R2
    MOV R9, R7              ; R9 = R7
LDR pseudo-instruction (loading 32-bit values)
- LDR Rd, =k        ; Rd = k, where k is a 32-bit value
  Examples:
    LDR R5, =5543           ; R5 = 5543
    LDR R9, =0x123456       ; R9 = 0x123456
    LDR R4, =0b1011011001   ; R4 = 0b1011011001
Some simple instructions
2. Arithmetic calculation
- General form: Opcode destination, source1, source2
- Opcodes: ADD, SUB, AND, etc.
- Examples:
    ADD R5, R2, R1          ; R5 = R2 + R1
    SUB R5, R9, #23         ; R5 = R9 - 23

Instruction          Description
ADC Rd, Rn, Op2*     ADD Rn to Op2 with carry and place the result in Rd
ADD Rd, Rn, Op2      ADD Rn to Op2 and place the result in Rd
AND Rd, Rn, Op2      AND Rn with Op2 and place the result in Rd
BIC Rd, Rn, Op2      AND Rn with NOT of Op2 and place the result in Rd
CMP Rn, Op2          Compare Rn with Op2 and set the status bits of CPSR**
CMN Rn, Op2          Compare Rn with negative of Op2 and set the status bits
EOR Rd, Rn, Op2      Exclusive-OR Rn with Op2 and place the result in Rd
MVN Rd, Op2          Store the bitwise NOT (one's complement) of Op2 in Rd
MOV Rd, Op2          Move (copy) Op2 to Rd
ORR Rd, Rn, Op2      OR Rn with Op2 and place the result in Rd
RSB Rd, Rn, Op2      Subtract Rn from Op2 and place the result in Rd
RSC Rd, Rn, Op2      Subtract Rn from Op2 with carry and place the result in Rd
SBC Rd, Rn, Op2      Subtract Op2 from Rn with carry and place the result in Rd
SUB Rd, Rn, Op2      Subtract Op2 from Rn and place the result in Rd
TEQ Rn, Op2          Exclusive-OR Rn with Op2 and set the status bits of CPSR
TST Rn, Op2          AND Rn with Op2 and set the status bits of CPSR

* Op2 can be an immediate 8-bit value #K, which can be 0-255 in decimal (00-FF in hex). Op2 can also be a register Rm. Rd, Rn, and Rm are any of the general-purpose registers.
** CPSR is discussed later in this chapter.
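The less familiar rows of the table (BIC, MVN, RSB) can be checked with a small model. The Python sketch below is not ARM code: it assumes registers are modelled as Python integers masked to 32 bits, and the function names are hypothetical helpers chosen for illustration.

```python
MASK32 = 0xFFFFFFFF  # assumption: every register holds an unsigned 32-bit value


def bic(rn, op2):
    """BIC: AND Rn with the bitwise NOT of Op2 (clears the bits set in Op2)."""
    return rn & ~op2 & MASK32


def mvn(op2):
    """MVN: bitwise NOT of Op2."""
    return ~op2 & MASK32


def sub(rn, op2):
    """SUB: Rn - Op2, wrapped to 32 bits."""
    return (rn - op2) & MASK32


def rsb(rn, op2):
    """RSB: Op2 - Rn (the operands of SUB reversed), wrapped to 32 bits."""
    return (op2 - rn) & MASK32


print(hex(bic(0xFF, 0x0F)))  # 0xf0
print(hex(mvn(0)))           # 0xffffffff
print(sub(100, 23))          # 77
print(rsb(23, 100))          # 77
```

Masking with `MASK32` after every operation is what keeps the Python integers behaving like fixed-width registers.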
A simple program
- Write a program that calculates 19 + 95
    MOV R6, #19     ; R6 = 19
    MOV R2, #95     ; R2 = 95
    ADD R6, R2      ; R6 = R6 + R2
A simple program
- Write a program that calculates 19 + 95 - 5
    MOV R1, #19         ; R1 = 19
    MOV R2, #95         ; R2 = 95
    MOV R3, #5          ; R3 = 5
    ADD R6, R1, R2      ; R6 = R1 + R2
    SUB R6, R3          ; R6 = R6 - R3
- The same calculation, reusing R2 instead of a third register:
    MOV R1, #19         ; R1 = 19
    MOV R2, #95         ; R2 = 95
    ADD R6, R1, R2      ; R6 = R1 + R2
    MOV R2, #5          ; R2 = 5
    SUB R6, R2          ; R6 = R6 - R2
Von Neumann architecture in ARM7
[Figure: Von Neumann architecture in ARM7]
Harvard architecture in ARM9 and Cortex
[Figure: Harvard architecture in ARM9 and Cortex]
Memory Map in Raspberry Pi
- LDR (load register)
    LDR Rd, [Rx]            ; Rd = [Rx]
  Example:
    LDR R4, =0x20000000
    LDR R1, [R4]
- STR (store register)
    STR Rx, [Rd]            ; [Rd] = Rx
  Example:
    LDR R5, =0x12345678
    LDR R2, =0x20000000
    STR R5, [R2]            ; [R2] = R5, that is, [0x20000000] = 0x12345678
- Example: Write a program that copies the contents of location 0x80 into location 0x88.
  Solution:
    LDR R2, =0x80           ; R2 = 0x80
    LDR R1, [R2]            ; R1 = [0x80]
    LDR R2, =0x88           ; R2 = 0x88
    STR R1, [R2]            ; [0x88] = R1
- Example: Add the contents of location 0x90 to the contents of location 0x94 and store the result in location 0x300.
  Solution:
    LDR R6, =0x90           ; R6 = 0x90
    LDR R1, [R6]            ; R1 = [0x90]
    LDR R6, =0x94           ; R6 = 0x94
    LDR R2, [R6]            ; R2 = [0x94]
    ADD R2, R1              ; R2 = R2 + R1
    LDR R6, =0x300          ; R6 = 0x300
    STR R2, [R6]            ; [0x300] = R2
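The register-indirect pattern in the last example (load an address into a register, then load or store through it) can be traced step by step. The following is a minimal Python sketch, not ARM code: memory is modelled as a dictionary from address to word, and the initial contents of 0x90 and 0x94 (7 and 5) are made-up values for illustration.

```python
# Hypothetical memory contents; only the three addresses used by the example exist.
memory = {0x90: 7, 0x94: 5, 0x300: 0}
regs = {}

regs["R6"] = 0x90                                    # LDR R6, =0x90
regs["R1"] = memory[regs["R6"]]                      # LDR R1, [R6]
regs["R6"] = 0x94                                    # LDR R6, =0x94
regs["R2"] = memory[regs["R6"]]                      # LDR R2, [R6]
regs["R2"] = (regs["R2"] + regs["R1"]) & 0xFFFFFFFF  # ADD R2, R1
regs["R6"] = 0x300                                   # LDR R6, =0x300
memory[regs["R6"]] = regs["R2"]                      # STR R2, [R6]

print(memory[0x300])  # 12
```

Each line of the model corresponds to one instruction of the solution, which makes it clear that R6 only ever holds an address while R1 and R2 hold data.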
LDRB, LDRH, STRB, STRH

Data size    Bits    Load instruction used    Store instruction used
Byte         8       LDRB                     STRB
Half-word    16      LDRH                     STRH
Word         32      LDR                      STR

    LDR  Rd, [Rs]       STR  Rs, [Rd]
    LDRB Rd, [Rs]       STRB Rs, [Rd]
    LDRH Rd, [Rs]       STRH Rs, [Rd]
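The difference between the three access sizes is easiest to see on the same bytes. The sketch below is a Python model rather than ARM code; it assumes little-endian byte order (the usual configuration on a Raspberry Pi) and unsigned loads, and `load` and `store` are hypothetical helper names.

```python
def load(mem, addr, size):
    """Read `size` bytes at `addr` as an unsigned little-endian integer."""
    return int.from_bytes(mem[addr:addr + size], "little")


def store(mem, addr, value, size):
    """Write the low `size` bytes of `value` at `addr`, little-endian."""
    value &= (1 << (8 * size)) - 1
    mem[addr:addr + size] = value.to_bytes(size, "little")


mem = bytearray(8)              # assumption: 8 bytes of zeroed RAM
store(mem, 0, 0x12345678, 4)    # STR
print(hex(load(mem, 0, 1)))     # LDRB -> 0x78
print(hex(load(mem, 0, 2)))     # LDRH -> 0x5678
print(hex(load(mem, 0, 4)))     # LDR  -> 0x12345678
store(mem, 4, 0x1FF, 1)         # STRB keeps only the low byte
print(hex(mem[4]))              # 0xff
```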
Status Register (CPSR)
CPSR flags: N (Negative), Z (Zero), C (Carry), V (oVerflow), I (Interrupt), T (Thumb)

Example: Show the status of the C and Z flags after the addition of 0x38 and 0x2F in the following instructions:
    MOV R6, #0x38       ; R6 = 0x38
    MOV R7, #0x2F       ; R7 = 0x2F
    ADDS R6, R6, R7     ; add R7 to R6
Solution:
      38    0000 0000 ... 0011 1000
    + 2F    0000 0000 ... 0010 1111
      67    0000 0000 ... 0110 0111     R6 = 0x67
C = 0 because there is no carry beyond the D31 bit.
Z = 0 because R6 (the result) has a value other than 0 after the addition.

Example: Show the status of the C and Z flags after the addition of 0x0000009C and 0xFFFFFF64 in the following instructions:
    LDR R0, =0x9C
    LDR R1, =0xFFFFFF64
    ADDS R0, R0, R1     ; add R1 to R0
Solution:
        0000009C    0000 0000 ... 1001 1100
    +   FFFFFF64    1111 1111 ... 0110 0100
      1 00000000  1 0000 0000 ... 0000 0000     R0 = 0x00000000
C = 1 because there is a carry beyond the D31 bit.
Z = 1 because R0 (the result) has the value 0 after the addition.

Example: Show the status of the Z flag after the subtraction of 0x23 from 0xA5 in the following instructions:
    LDR R0, =0xA5
    LDR R1, =0x23
    SUBS R0, R0, R1     ; subtract R1 from R0
Solution:
      A5    1010 0101
    - 23    0010 0011
      82    1000 0010     R0 = 0x82
Z = 0 because R0 has a value other than 0 after the subtraction.
C = 1 because R1 is not bigger than R0 and there is no borrow from the D32 bit.

Example: Show the status of the Z flag after the subtraction of 0x9C from 0x9C in the following instructions:
    LDR R0, =0x9C
    LDR R1, =0x9C
    SUBS R0, R0, R1     ; subtract R1 from R0
Solution:
      9C    1001 1100
    - 9C    1001 1100
      00    0000 0000     R0 = 0x00
Z = 1 because R0 is zero after the subtraction.
C = 1 because R1 is not bigger than R0 and there is no borrow from the D32 bit.

Example: Show the status of the Z flag after the subtraction of 0x73 from 0x52 in the following instructions:
    LDR R0, =0x52
    LDR R1, =0x73
    SUBS R0, R0, R1     ; subtract R1 from R0
Solution:
      52    0101 0010
    - 73    0111 0011
      DF    1101 1111     R0 = 0xFFFFFFDF
Z = 0 because R0 has a value other than 0 after the subtraction.
C = 0 because R1 is bigger than R0 and there is a borrow from the D32 bit.
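The C and Z results in these examples follow two simple rules, which can be checked mechanically. The Python sketch below is a model, not ARM code: it assumes both operands are unsigned 32-bit values, covers only the C and Z flags, and uses hypothetical function names `adds` and `subs`.

```python
MASK32 = 0xFFFFFFFF  # assumption: operands are unsigned 32-bit values


def adds(a, b):
    """Model ADDS: return (result, C, Z)."""
    total = a + b
    result = total & MASK32
    c = int(total > MASK32)   # carry out of bit D31
    z = int(result == 0)
    return result, c, z


def subs(a, b):
    """Model SUBS (a - b): return (result, C, Z)."""
    result = (a - b) & MASK32
    c = int(a >= b)           # on ARM, C = 1 means there was no borrow
    z = int(result == 0)
    return result, c, z


for name, (result, c, z) in [
    ("0x38 + 0x2F", adds(0x38, 0x2F)),
    ("0x9C + 0xFFFFFF64", adds(0x9C, 0xFFFFFF64)),
    ("0xA5 - 0x23", subs(0xA5, 0x23)),
    ("0x9C - 0x9C", subs(0x9C, 0x9C)),
    ("0x52 - 0x73", subs(0x52, 0x73)),
]:
    print(f"{name}: result = 0x{result:08X}, C = {c}, Z = {z}")
```

Note that after a subtraction the carry flag is the inverse of a borrow, which is why C = 1 when the second operand is not bigger than the first.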
Assembler
Instructions, pseudo-instructions, and assembler directives
- Instructions: the real instructions of the CPU
    mov r5, #23
    add r4, r5, r6
- Pseudo-instructions: translated to one or more real instructions by the assembler
    ldr r5, =0x253234
- Assembler directives:
    .equ myValue, 0x3fffc045
    .global _start
Some Widely Used Directives

Directive    Description
.text        Informs the assembler that a code section begins.
.data        Informs the assembler that an initialized data section begins.
.global      Informs the assembler that a name or symbol will be referenced in other files.
.extern      Informs the assembler that the code accesses a name or symbol defined in another file.
.thumb       Forces the assembler to convert the following instructions to Thumb machine instructions.
.arm         Forces the assembler to convert the following instructions to ARM machine instructions.
.global and .extern directives

file1.s:
        .text
        .extern myFunc
        ...
        bl myFunc
        ...

file2.s:
        .text
        .global myFunc
myFunc:
        add r2, r1, r5
        ...
Assembler Directives: .EQU
- .equ name, value
- Example:
        .equ COUNT, 0x25
        mov r1, #COUNT          @ r1 = 0x25
        mov r2, #COUNT + 3      @ r2 = 0x28
Assembler Directives: .INCLUDE
- .include "filename.ext"

hFile.inc:
        .equ SREG, 0x3f
        .equ SPL, 0x3d
        .equ SPH, 0x3e

Program.asm:
        .include "hFile.inc"
A simple program
@ ARM Assembly language program to add some data
@ and store the SUM in r0.
        .text
        .global _start
_start:                         @ the beginning point for ARM assembly programs
        mov r1, #0x25           @ r1 = 0x25
        mov r2, #0x34           @ r2 = 0x34
        add r0, r2, r1          @ r0 = r2 + r1
        mov r7, #1
        svc 0                   @ system call to terminate the program
Memory allocation using .byte, .hword, and .word

Directive    Description
.byte        Allocates one or more bytes of memory and defines the initial runtime contents of the memory.
.hword       Allocates one or more halfwords of memory and defines the initial runtime contents of the memory. The data is not aligned.
.float       Allocates one or more words of memory and initializes them with a floating-point number.
A Simple Program that Stores Fixed Data in Program Memory
@ storing data in program memory
        .text
        .global _start
_start:
        ldr r2, =our_fixed_data         @ point to our_fixed_data
        @ load r0 with the contents of memory pointed to by r2
        ldrb r0, [r2]
        @ terminate the program
        mov r7, #1
        svc 0
our_fixed_data:
        .byte 0x55, 0x33, 1, 2, 3, 4, 5, 6
        .word 0x23222120, 0x30
        .hword 0x4540, 0x50
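To see how the three data directives lay out bytes in memory, the same values can be packed with Python's `struct` module. This is only a model of the assembler's output: it assumes little-endian byte order and that no padding is inserted between the directives (the eight `.byte` values already leave the `.word` data 4-byte aligned relative to the label).

```python
import struct

# "<" selects little-endian byte order with no padding between fields.
our_fixed_data = (
    struct.pack("<8B", 0x55, 0x33, 1, 2, 3, 4, 5, 6)   # .byte
    + struct.pack("<2I", 0x23222120, 0x30)             # .word
    + struct.pack("<2H", 0x4540, 0x50)                 # .hword
)

print(len(our_fixed_data))                         # 20
print(" ".join(f"{b:02x}" for b in our_fixed_data))
# 55 33 01 02 03 04 05 06 20 21 22 23 30 00 00 00 40 45 50 00

# ldrb r0, [r2] reads the single byte at the label itself.
print(hex(our_fixed_data[0]))                      # 0x55
```

The dump shows the least significant byte of each word and halfword stored first, which is what little-endian order means.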
Defining variables a, b, and c in RAM
        .text
        .global _start
_start:
        @ r1 = a
        ldr r0, =a              @ r0 = addr. of a
        ldr r1, [r0]            @ r1 = value of a
        @ r2 = b
        ldr r0, =b              @ r0 = addr. of b
        ldr r2, [r0]            @ r2 = value of b
        @ c = r1 + r2 (c = a + b)
        add r3, r1, r2          @ r3 = a + b
        ldr r0, =c              @ r0 = addr. of c
        str r3, [r0]            @ c = r3
        mov r7, #1              @ terminate the program
        svc 0

        @ allocates the following in data memory
        .data
a:      .word 5
b:      .word 4
c:      .word 0
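Flash memory and PC register
        .text
        .global _start
_start:
        mov r1, #0x25
        mov r2, #0x34
        add r0, r2, r1
        mov r7, #1
        svc 0

Each location holds one 32-bit machine instruction:

Offset    Machine code
start     E3A01025
+04       E3A02034
+08       E0810002
+12       E3A07001
+16       EF000000

[Figure: the PC value advancing by 4 as each instruction is fetched from flash memory]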
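The three `mov` machine codes in the listing can be reproduced by hand, which shows where the register number and the immediate sit inside the 32-bit word. The Python sketch below is an illustration under stated assumptions: `encode_mov_imm` is a hypothetical helper that handles only the unconditional `mov rd, #imm8` form with no rotation, and the `add` and `svc` words are copied from the slide rather than computed.

```python
def encode_mov_imm(rd, imm8):
    """Encode 'mov rd, #imm8' (always executed, flags untouched) as a 32-bit word."""
    if not 0 <= rd <= 15:
        raise ValueError("rd must be a register number from 0 to 15")
    if not 0 <= imm8 <= 0xFF:
        raise ValueError("imm8 must fit in 8 bits")
    # 0xE3A00000: condition = always, immediate operand, opcode = MOV, S = 0
    return 0xE3A00000 | (rd << 12) | imm8


words = [
    encode_mov_imm(1, 0x25),   # mov r1, #0x25
    encode_mov_imm(2, 0x34),   # mov r2, #0x34
    0xE0810002,                # add word, copied from the slide
    encode_mov_imm(7, 1),      # mov r7, #1
    0xEF000000,                # svc 0, copied from the slide
]

# Every instruction is 4 bytes, so the offset grows by 4 per instruction.
for index, word in enumerate(words):
    print(f"+{4 * index:02d}  {word:08X}")
```

The destination register occupies bits 12 to 15 and the immediate the low 8 bits, which is why `E3A01025` reads almost directly as "r1, 0x25".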
How to speed up the CPU
- Increase the clock frequency
  - Higher frequency means more power consumption and more heat
  - Limitations
- Change the architecture
  - Pipelining
  - RISC
Changing the architecture: RISC vs. CISC
- CISC (Complex Instruction Set Computer)
  - Put as many instructions as you can into the CPU
- RISC (Reduced Instruction Set Computer)
  - Reduce the number of instructions, and make better use of the available hardware resources
RISC architecture
- Feature 1: fixed instruction size
  - RISC processors have a fixed instruction size, which makes the task of the instruction decoder easier.
    - In ARM, the instructions are 4 bytes.
    - In Thumb-2, the instructions are either 2 or 4 bytes.
  - In CISC processors, instructions have different lengths.
    - E.g., in the 8051:
        CLR C           ; a 1-byte instruction
        ADD A, #20H     ; a 2-byte instruction
        LJMP HERE       ; a 3-byte instruction
RISC architecture
- Feature 2: reduce the number of instructions
  - Pros: reduces the number of transistors used
  - Cons:
    - Can make assembly programming more difficult
    - Can lead to using more memory
RISC architecture
- Feature 3: limit the addressing modes
  - Advantage: the control logic can be hardwired
  - Disadvantage: can make assembly programming more difficult
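RISC architecture
- Feature 4: Load/Store (only load and store instructions access memory; arithmetic works on registers)
    LDR R8, =0x20       ; R8 = 0x20
    LDR R0, [R8]        ; R0 = [0x20]
    LDR R8, =0x220      ; R8 = 0x220
    LDR R1, [R8]        ; R1 = [0x220]
    ADD R0, R1          ; R0 = R0 + R1
    LDR R8, =0x230      ; R8 = 0x230
    STR R0, [R8]        ; [0x230] = R0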
RISC architecture
- Feature 5 (Harvard architecture): separate buses for opcodes and operands
  - Advantage: opcodes and operands can go in and out of the CPU together.
  - Disadvantage: leads to more cost in general-purpose computers.
    LDR R2, [R4]        ; R2 = [R4]
    ADD R0, R1          ; R0 = R0 + R1
    SUB R3, R4
[Figure: the Fetch, Decode, and Execute stages overlapping for the three instructions above; the CPU is connected to code memory and to data memory, each through its own address, data, and control buses]
RISC architecture
- Feature 6: more than 95% of instructions are executed in 1 machine cycle
RISC architecture
- Feature 7: a large register file
  - RISC processors have at least 32 registers, which decreases the need for stack and memory accesses.
  - In ARM there are 16 registers (R0 to R15).