Introduction to Arm Assembly Chapter 2 Sepehr Naimi
www.NicerLand.com

Topics
- ARM's CPU
  - Its architecture
  - Some simple programs
- Data memory access
- Program memory
- RISC architecture

ARM's CPU
- ALU
- 16 general-purpose registers (R0 to R15)
- PC register (R15)
- Instruction decoder
[Figure: CPU block diagram showing the ALU, the registers R0 to R15 with R13 (SP), R14 (LR), and R15 (PC), the CPSR, the instruction register, and the instruction decoder]

CPU

Some simple instructions
1. MOV (MOVE)
- MOV Rd, #k
  - Rd = k
  - k is an 8-bit value
  - Example:
        MOV R5, #53            ; R5 = 53
        MOV R9, #0x27          ; R9 = 0x27
        MOV R3, #2_11101100
- MOV Rd, Rs
  - Rd = Rs
  - Example:
        MOV R5, R2             ; R5 = R2
        MOV R9, R7             ; R9 = R7
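
The examples above write immediates in three notations: decimal (53), hexadecimal (0x27), and the base-prefixed form (2_11101100, a binary value). As a rough illustration only, the following Python sketch converts such literals to integers. The helper name `parse_literal` is hypothetical and is not part of any assembler.

```python
def parse_literal(text):
    """Convert an armasm-style numeric literal to an int.

    Handles decimal ("53"), hexadecimal ("0x27"), and base-prefixed
    forms such as "2_11101100" (base 2). Illustrative only.
    """
    text = text.strip()
    if "_" in text:
        base, digits = text.split("_", 1)
        return int(digits, int(base))
    return int(text, 0)  # base 0 accepts both "53" and "0x27"


for literal in ("53", "0x27", "2_11101100"):
    value = parse_literal(literal)
    print(f"{literal} = {value} = 0x{value:X}")
```

All three values fit in 8 bits, which is consistent with the 8-bit limit stated for k on this slide.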

LDR pseudo-instruction (loading 32-bit values)
- LDR Rd, =k
  - Rd = k
  - k is a 32-bit value
  - Example:
        LDR R5, =5543          ; R5 = 5543
        LDR R9, =0x123456      ; R9 = 0x123456
        LDR R4, =2_1011011001

Some simple instructions
2. Arithmetic calculation
- Opcode destination, source1, source2
- Opcodes: ADD, SUB, AND, etc.
- Examples:
        ADD R5, R2, R1         ; R5 = R2 + R1
        SUB R5, R9, #23        ; R5 = R9 - 23

Instruction          Description
ADC Rd, Rn, Op2 *    ADD Rn to Op2 with Carry and place the result in Rd
ADD Rd, Rn, Op2      ADD Rn to Op2 and place the result in Rd
AND Rd, Rn, Op2      AND Rn with Op2 and place the result in Rd
BIC Rd, Rn, Op2      AND Rn with NOT of Op2 and place the result in Rd
CMN Rn, Op2          Compare Rn with negative of Op2 and set the status bits
CMP Rn, Op2          Compare Rn with Op2 and set the status bits of CPSR **
EOR Rd, Rn, Op2      Exclusive OR Rn with Op2 and place the result in Rd
MVN Rd, Op2          Store the bitwise NOT (one's complement) of Op2 in Rd
MOV Rd, Op2          Move (Copy) Op2 to Rd
ORR Rd, Rn, Op2      OR Rn with Op2 and place the result in Rd
RSB Rd, Rn, Op2      Subtract Rn from Op2 and place the result in Rd
RSC Rd, Rn, Op2      Subtract Rn from Op2 with carry and place the result in Rd
SBC Rd, Rn, Op2      Subtract Op2 from Rn with carry and place the result in Rd
SUB Rd, Rn, Op2      Subtract Op2 from Rn and place the result in Rd
TEQ Rn, Op2          Exclusive-OR Rn with Op2 and set the status bits of CPSR
TST Rn, Op2          AND Rn with Op2 and set the status bits of CPSR

* Op2 can be an immediate 8-bit value #K, which can be 0 to 255 in decimal (00 to FF in hex). Op2 can also be a register Rm. Rd, Rn, and Rm are any of the general-purpose registers.
** CPSR is discussed later in this chapter.
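
Several rows of the table differ only in operand order or in an inversion (SUB versus RSB, AND versus BIC, MOV versus MVN). The sketch below models four of them on 32-bit values in Python to make the differences concrete. It is a simplified model, not an emulator: the function names are illustrative, and flags are ignored.

```python
MASK = 0xFFFFFFFF  # registers are 32 bits wide


def sub(rn, op2):
    """SUB Rd, Rn, Op2: Rd = Rn - Op2."""
    return (rn - op2) & MASK


def rsb(rn, op2):
    """RSB Rd, Rn, Op2: Rd = Op2 - Rn (operands reversed)."""
    return (op2 - rn) & MASK


def bic(rn, op2):
    """BIC Rd, Rn, Op2: Rd = Rn AND (NOT Op2), i.e. clear selected bits."""
    return rn & ~op2 & MASK


def mvn(op2):
    """MVN Rd, Op2: Rd = bitwise NOT of Op2."""
    return ~op2 & MASK


print(hex(sub(100, 23)), hex(rsb(23, 100)))  # both compute 100 - 23
print(hex(bic(0xFF, 0x0F)))                  # low four bits cleared
print(hex(mvn(0)))                           # all 32 bits set
```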

A simple program
- Write a program that calculates 19 + 95

        MOV R6, #19            ; R6 = 19
        MOV R2, #95            ; R2 = 95
        ADD R6, R2             ; R6 = R6 + R2

A simple program
- Write a program that calculates 19 + 95 - 5

Version 1:
        MOV R1, #19            ; R1 = 19
        MOV R2, #95            ; R2 = 95
        MOV R3, #5             ; R3 = 5
        ADD R6, R1, R2         ; R6 = R1 + R2
        SUB R6, R3             ; R6 = R6 - R3

Version 2 (reuses R2):
        MOV R1, #19            ; R1 = 19
        MOV R2, #95            ; R2 = 95
        ADD R6, R1, R2         ; R6 = R1 + R2
        MOV R2, #5             ; R2 = 5
        SUB R6, R2             ; R6 = R6 - R2

Status Register (CPSR)
CPSR flags: N (Negative), Z (Zero), C (Carry), V (oVerflow), I (Interrupt), T (Thumb)

Example: Show the status of the C and Z flags after the subtraction of 0x23 from 0xA5 in the following instructions:
        LDR  R0, =0xA5
        LDR  R1, =0x23
        SUBS R0, R1            ; subtract R1 from R0
Solution:
          0xA5    1010 0101
        - 0x23    0010 0011
          0x82    1000 0010    R0 = 0x82
C = 1 because R1 is not bigger than R0 and there is no borrow from the D32 bit.
Z = 0 because R0 has a value other than 0 after the subtraction.

Example: Show the status of the C and Z flags after the subtraction of 0x73 from 0x52 in the following instructions:
        LDR  R0, =0x52
        LDR  R1, =0x73
        SUBS R0, R1            ; subtract R1 from R0
Solution:
          0x52    0101 0010
        - 0x73    0111 0011
          0xDF    1101 1111    R0 = 0xFFFFFFDF (only the low byte is shown in binary)
C = 0 because R1 is bigger than R0 and there is a borrow from the D32 bit.
Z = 0 because R0 has a value other than zero after the subtraction.

Example: Show the status of the C and Z flags after the subtraction of 0x9C from 0x9C in the following instructions:
        LDR  R0, =0x9C
        LDR  R1, =0x9C
        SUBS R0, R1            ; subtract R1 from R0
Solution:
          0x9C    1001 1100
        - 0x9C    1001 1100
          0x00    0000 0000    R0 = 0x00
C = 1 because R1 is not bigger than R0 and there is no borrow from the D32 bit.
Z = 1 because R0 is zero after the subtraction.

Example: Show the status of the C and Z flags after the addition of 0x0000009C and 0xFFFFFF64 in the following instructions:
        LDR  R0, =0x9C
        LDR  R1, =0xFFFFFF64
        ADDS R0, R1            ; add R1 to R0
Solution:
          0x0000009C
        + 0xFFFFFF64
        1 0x00000000           R0 = 0x00000000
C = 1 because there is a carry beyond the D31 bit.
Z = 1 because R0 (the result) has the value 0 in it after the addition.

Example: Show the status of the C and Z flags after the addition of 0x38 and 0x2F in the following instructions:
        MOV  R6, #0x38         ; R6 = 0x38
        MOV  R7, #0x2F         ; R7 = 0x2F
        ADDS R6, R7            ; add R7 to R6
Solution:
          0x38    0011 1000
        + 0x2F    0010 1111
          0x67    0110 0111    R6 = 0x67
C = 0 because there is no carry beyond the D31 bit.
Z = 0 because R6 (the result) has a value other than 0 after the addition.
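
The C and Z results in these examples can be checked mechanically. The Python sketch below models only the result, C, and Z of 32-bit ADDS and SUBS (N and V are left out for brevity). The function names are illustrative. Note that for a subtraction, C = 1 means that no borrow occurred.

```python
MASK = 0xFFFFFFFF  # 32-bit register width


def adds(a, b):
    """Model ADDS: return (result, C, Z) for a + b on 32-bit values."""
    total = (a & MASK) + (b & MASK)
    result = total & MASK
    carry = 1 if total > MASK else 0      # carry out of bit D31
    zero = 1 if result == 0 else 0
    return result, carry, zero


def subs(a, b):
    """Model SUBS: return (result, C, Z) for a - b on 32-bit values."""
    a &= MASK
    b &= MASK
    result = (a - b) & MASK
    carry = 1 if a >= b else 0            # C = 1 means no borrow
    zero = 1 if result == 0 else 0
    return result, carry, zero


for name, op, x, y in (
    ("SUBS", subs, 0xA5, 0x23),
    ("SUBS", subs, 0x52, 0x73),
    ("SUBS", subs, 0x9C, 0x9C),
    ("ADDS", adds, 0x9C, 0xFFFFFF64),
    ("ADDS", adds, 0x38, 0x2F),
):
    result, c, z = op(x, y)
    print(f"{name} 0x{x:X}, 0x{y:X} -> 0x{result:08X}  C={c} Z={z}")
```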

Harvard in ARM9 and Cortex

Memory Map in STM32F103

LDR (Load register)
        LDR Rd, [Rx]           ; Rd = [Rx]
Example:
        ; [0x20000000] = 0x12345678
        LDR R4, =0x20000000
        LDR R1, [R4]

STR (Store register)
        STR Rx, [Rd]           ; [Rd] = Rx
Example:
        LDR R5, =0x12345678
        LDR R2, =0x20000000
        STR R5, [R2]           ; [R2] = R5

Example: Write a program that copies the contents of location 0x80 into location 0x88.
Solution:
        LDR R2, =0x80          ; R2 = 0x80
        LDR R1, [R2]           ; R1 = [0x80]
        LDR R2, =0x88          ; R2 = 0x88
        STR R1, [R2]           ; [0x88] = R1

Example: Add contents of location 0x90 to contents of location 0x94 and store the result in location 0x20000300.
Solution:
        LDR R6, =0x90          ; R6 = 0x90
        LDR R1, [R6]           ; R1 = [0x90]
        LDR R6, =0x94          ; R6 = 0x94
        LDR R2, [R6]           ; R2 = [0x94]
        ADD R2, R2, R1         ; R2 = R2 + R1
        LDR R6, =0x20000300    ; R6 = 0x20000300
        STR R2, [R6]           ; [0x20000300] = R2

LDRB, LDRH, STRB, STRH

Data Size    Bits   Load instruction used   Store instruction used
Byte         8      LDRB                    STRB
Half-word    16     LDRH                    STRH
Word         32     LDR                     STR

        LDR  Rd, [Rs]
        LDRB Rd, [Rs]
        LDRH Rd, [Rs]
        STR  Rs, [Rd]
        STRB Rs, [Rd]
        STRH Rs, [Rd]
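
To illustrate how the three load sizes in the table differ, the Python sketch below reads a byte, a half-word, and a word from the same address of a small byte array. It assumes little-endian byte order, which is the usual configuration on Cortex-M parts but is not stated on the slide. The function names mirror the instructions purely for readability.

```python
import struct

# One word, 0x12345678, stored at offset 0 in little-endian order (assumed).
memory = struct.pack("<I", 0x12345678)


def ldrb(mem, addr):
    """Model LDRB: load 8 bits, zero-extended."""
    return mem[addr]


def ldrh(mem, addr):
    """Model LDRH: load 16 bits, zero-extended."""
    return struct.unpack_from("<H", mem, addr)[0]


def ldr(mem, addr):
    """Model LDR: load a full 32-bit word."""
    return struct.unpack_from("<I", mem, addr)[0]


print(hex(ldrb(memory, 0)))  # lowest byte only
print(hex(ldrh(memory, 0)))  # lowest two bytes
print(hex(ldr(memory, 0)))   # the whole word
```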

Memory Map in STM32F103

I/O Register    Address
GPIOA_LCKR      0x40010818
GPIOA_BRR       0x40010814
GPIOA_BSRR      0x40010810
GPIOA_ODR       0x4001080C
GPIOA_IDR       0x40010808
GPIOA_CRH       0x40010804
GPIOA_CRL       0x40010800

Example: Read the contents of GPIOA_IDR.
Solution:
        LDR R1, =0x40010808    ; R1 = 0x40010808
        LDR R2, [R1]           ; R2 = [0x40010808]

Example: Write 0x53F6 into GPIOA_ODR.
Solution:
        LDR R2, =0x53F6        ; R2 = 0x53F6
        LDR R1, =0x4001080C    ; R1 = 0x4001080C
        STR R2, [R1]           ; [0x4001080C] = 0x53F6

Some Arm addressing modes
- Immediate
        MOV R1, #0x25          ; machine code: F04F0125
        ... R6, #0x40
- Register addressing mode
        MOV R2, R4
        ADD R3, R2, R1         ; machine code: EB020301
- Register indirect (indexed)
        STR R5, [R6]
        LDR R10, [R3]

Assembler Directives

Assembler
Assembly -> assembler -> Machine Language

Assembler directives vs. Instructions
- Instructions (e.g. ADD, MOV) tell the CPU what to do
- Assembler directives tell the assembler what to do
  - AREA
  - IMPORT and EXPORT
  - END
  - DCD, DCW, DCB
  - EQU
  - INCLUDE

AREA
- AREA sectionName, attribute1, attribute2, …
- Code:
        AREA myCode, CODE, READONLY
- Data:
        AREA myData1, DATA, READWRITE
        AREA myConst, DATA, READONLY

Example:
        AREA MY_PROG, CODE, READONLY
__main
        MOV R4, #6
        ADD R1, R2
        …
myFunc
        ADD R2, R3, R4
        …

IMPORT and EXPORT

File1.s
        ; from the main program:
        IMPORT MY_FUNC
        ...
        BL MY_FUNC             ; call MY_FUNC function
        ...

File2.s
        AREA OUR_EXAMPLE, CODE, READONLY
        EXPORT MY_FUNC
        IMPORT DATA1
MY_FUNC
        LDR R1, =DATA1
        ...

First Assembly Program

        EXPORT __main
        AREA PROG_2_1, CODE, READONLY
__main
        MOV R1, #0x25          ; R1 = 0x25
        MOV R2, #0x34          ; R2 = 0x34
        ADD R3, R2, R1         ; R3 = R2 + R1
HERE    B HERE                 ; stay here forever
        END                    ; end of source file

Defining Const. Values using DCD, DCW, and DCB
- DCB allocates bytes of memory & initializes them.
  - Examples:
        MYVALUE DCB 5
        FIBO    DCB 1, 1, 2, 3, 5, 8
        MY_MSG  DCB "Hello World!"
- DCW allocates a half-word
  - Example:
        MYVALUE DCW 25425
- DCD allocates a word of memory
        MYDATA  DCD 0x200000, 0x30F5, 5000000

Storing Fixed Data in Program Memory

        EXPORT __main
        AREA PROG2_2, CODE, READONLY
__main
        LDR  R2, =OUR_FIXED_DATA   ; point to OUR_FIXED_DATA
        LDRB R0, [R2]              ; load R0 with the contents
                                   ; of memory pointed to by R2
        ADD  R1, R0                ; add R0 to R1
HERE    B HERE                     ; stay here forever

        AREA LOOKUP_EXAMPLE, DATA, READONLY
OUR_FIXED_DATA
        DCB 0x55, 0x33, 1, 2, 3, 4, 5, 6
        DCD 0x23222120, 0x30
        DCW 0x4540, 0x50
        END

Allocating memory using SPACE
- SPACE allocates memory without initializing.
- Example 1: Allocating 4 bytes of memory:
        MY_LONG  SPACE 4
- Example 2: Allocating 2 bytes:
        ALFA     SPACE 2
- Example 3: Allocating an array of 20 bytes:
        MY_ARRAY SPACE 20

Defining 3 variables A, B, and C

        EXPORT __main
        AREA OUR_PROG, CODE, READONLY
__main
        ; A = 5
        LDR R0, =A             ; R0 = Addr. of A
        MOV R1, #5             ; R1 = 5
        STR R1, [R0]           ; init. A with 5
        ; B = 4
        LDR R0, =B             ; R0 = Addr. of B
        MOV R1, #4             ; R1 = 4
        STR R1, [R0]           ; init. B with 4
        ; R1 = A
        LDR R0, =A             ; R0 = Addr. of A
        LDR R1, [R0]           ; R1 = value of A
        ; R2 = B
        LDR R0, =B             ; R0 = Addr. of B
        LDR R2, [R0]           ; R2 = value of B
        ; C = R1 + R2 (C = A + B)
        ADD R3, R1, R2         ; R3 = A + B
        LDR R0, =C             ; R0 = Addr. of C
        STR R3, [R0]           ; C = R3
loop    B loop

        AREA OUR_DATA, DATA, READWRITE
        ; Allocates the following in SRAM
A       SPACE 4
B       SPACE 4
C       SPACE 4
        END

Equivalent C program:
        int main()
        {
            int a = 5;
            int b = 4;
            int c = a + b;
            while(1) { }
        }

ALIGN
- ALIGN is used to align data on a 32-bit or 16-bit boundary.

a)
DTA     DCB 0x55
        DCB 0x22
        END

b)
DTA     DCB 0x55
        ALIGN 2
        DCB 0x22
        END

c)
DTA     DCB 0x55
        ALIGN 4
        DCB 0x22
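
The effect of ALIGN in cases a), b), and c) can be worked out by rounding the current location up to the next multiple of the boundary. The Python sketch below computes the offsets of the two bytes in each case, under the assumption that DTA sits at offset 0 of its section. The helper names `align_up` and `layout` are illustrative.

```python
def align_up(addr, boundary):
    """Round addr up to the next multiple of boundary."""
    return (addr + boundary - 1) // boundary * boundary


def layout(boundary=None):
    """Return the offsets of the 0x55 and 0x22 bytes.

    boundary is the ALIGN operand (2 or 4), or None when no ALIGN is used.
    The section is assumed to start at offset 0.
    """
    addr = 0
    first = addr          # DTA DCB 0x55
    addr += 1
    if boundary:
        addr = align_up(addr, boundary)   # ALIGN inserts padding bytes
    second = addr         # DCB 0x22
    return first, second


print("a) no ALIGN:", layout())
print("b) ALIGN 2: ", layout(2))
print("c) ALIGN 4: ", layout(4))
```

Without ALIGN the second byte directly follows the first. With ALIGN 2 one padding byte is skipped, and with ALIGN 4 three are skipped.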

Assembler Directives EQU and RN
- name EQU value
  - Example 1:
        COUNT EQU 0x25
        MOV R1, #COUNT         ; R1 = 0x25
        MOV R2, #COUNT + 3     ; R2 = 0x28
  - Example 2:
        GPIOA_ODR EQU 0x4001080C
- name RN register
  - Example 1:
        RESULT RN R2
        MOV RESULT, #23
  - Example 2:
        ProgCounter RN R15

Assembler Directives INCLUDE
- INCLUDE "filename.ext"

hFile.inc
        GPIOA_CRL EQU 0x40010800
        GPIOA_CRH EQU 0x40010804
        GPIOA_IDR EQU 0x40010808
        GPIOA_ODR EQU 0x4001080C
        ...

Program.s
        INCLUDE "hFile.inc"

Power up in Cortex-M

Startup and main files

Startup_stm32f10x.s
        ; reserving 0x400 bytes for stack
        AREA STACK, NOINIT, READWRITE, ALIGN=3
Stack_Mem
        SPACE 0x400
__initial_sp

        AREA RESET, DATA, READONLY
        EXPORT __Vectors
__Vectors
        DCD __initial_sp       ; loc. 0 to 3 (Stack init)
        DCD Reset_Handler      ; loc. 4 to 7
        ...
Reset_Handler PROC
        IMPORT __main
        LDR R0, =__main
        BX R0
        ...

main.s
        AREA OUR_EXAMPLE, CODE, READONLY
        EXPORT __main
__main
        ...

Flash memory and PC register

Flash contents:
Address       Contents
0x08000200    F04F0125
0x08000204    F04F0234
0x08000208    EB020301
0x0800020C    E7FE
0x0800020E

main.lst
Line  Offset     Machine     Instruction
1     00000000               ; The program adds some data
2     00000000                       EXPORT __main
3     00000000                       AREA PROG_2_4, CODE, READONLY
4     00000000               __main
5     00000000   F04F0125            MOV R1, #0x25     ; R1 = 0x25
6     00000004   F04F0234            MOV R2, #0x34     ; R2 = 0x34
7     00000008   EB020301            ADD R3, R2, R1    ; R3 = R2 + R1
8     0000000C
9     0000000C   E7FE        HERE    B HERE            ; stay here forever
10    0000000E                       END

[Figure: the PC register takes the values 0x08000200, 0x08000204, 0x08000208, and 0x0800020C as the program runs]
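
The addresses in the flash table follow directly from the instruction sizes in the listing: each instruction starts where the previous one ended. The Python sketch below recomputes them from the machine-code column, taking the base address 0x08000200 from the slide. The function name `place` is illustrative.

```python
BASE = 0x08000200  # address of the first instruction, as shown on the slide

# Machine code from main.lst, in program order.
machine_code = ["F04F0125", "F04F0234", "EB020301", "E7FE"]


def place(base, codes):
    """Return (start addresses, address just past the last instruction)."""
    addresses = []
    addr = base
    for code in codes:
        addresses.append(addr)
        addr += len(code) // 2   # two hex digits per byte
    return addresses, addr


addresses, end = place(BASE, machine_code)
for addr, code in zip(addresses, machine_code):
    print(f"0x{addr:08X}  {code}  ({len(code) // 2} bytes)")
print(f"0x{end:08X}  (first free location)")
```

The first three instructions are 4 bytes each and the final branch is 2 bytes, which is why the last two addresses differ by only 2.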

How to speed up the CPU
- Increase the clock frequency
  - More frequency -> more power consumption & more heat
  - Limitations
- Change the architecture
  - Pipelining
  - Harvard
  - RISC

Pipeline
- Non-pipeline
  - Just fetches, decodes, or executes in a given time
- Pipeline

Pipeline (Cont.)

        ADD R0, R1             ; R0 = R0 + R1
        LDR R2, [R4]           ; R2 = [R4]
        SUB R3, R4

[Figure: the three instructions moving through the Fetch, Decode, and Execute stages]
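
The overlap shown on this slide can be tabulated. The Python sketch below builds the stage occupancy per clock cycle for the three instructions, assuming an idealized three-stage pipeline in which every stage takes one cycle and nothing stalls (a real core may take longer for a load such as LDR). The function names are illustrative.

```python
STAGES = ("Fetch", "Decode", "Execute")


def schedule(instructions):
    """Return {cycle: {stage: instruction}} for an ideal pipeline."""
    table = {}
    for i, instr in enumerate(instructions):
        for s, stage in enumerate(STAGES):
            # Instruction i enters stage s in cycle i + s + 1.
            table.setdefault(i + s + 1, {})[stage] = instr
    return table


def total_cycles(count, pipelined=True):
    """Cycles needed for count instructions, with or without overlap."""
    if pipelined:
        return count + len(STAGES) - 1
    return count * len(STAGES)


program = ["ADD R0, R1", "LDR R2, [R4]", "SUB R3, R4"]
for cycle, stages in sorted(schedule(program).items()):
    row = ", ".join(f"{stage}: {stages[stage]}" for stage in STAGES if stage in stages)
    print(f"cycle {cycle}: {row}")
print("pipelined:", total_cycles(len(program)), "cycles")
print("non-pipelined:", total_cycles(len(program), pipelined=False), "cycles")
```

In cycle 3 all three stages are busy at once, which is where the speed-up over the non-pipelined case comes from.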

Harvard Architecture
- Separate buses for opcodes and operands
  - Advantage: opcodes and operands can go in and out of the CPU together.
  - Disadvantage: Using Harvard architecture in motherboards leads to more cost in general-purpose computers.
[Figure: the CPU connected to Code Memory and to Data Memory, each through its own control bus, data bus, and address bus]

Changing the architecture: RISC vs. CISC
- CISC (Complex Instruction Set Computer)
  - Put as many instructions as you can into the CPU
- RISC (Reduced Instruction Set Computer)
  - Reduce the number of instructions, and make better use of the available hardware.

RISC architecture
- Feature 1 (fixed instruction size)
  - RISC processors have a fixed instruction size. This makes the task of the instruction decoder easier.
    - In ARM the instructions are 4 bytes.
    - In Thumb2 the instructions are either 2 or 4 bytes.
  - In CISC processors instructions have different lengths
    - E.g. in 8051
        CLR C                  ; a 1-byte instruction
        ADD A, #20H            ; a 2-byte instruction
        LJMP HERE              ; a 3-byte instruction

RISC architecture
- Feature 2: reduce the number of instructions
  - Pros: Reduces the number of used transistors
  - Cons:
    - Can make the assembly programming more difficult
    - Can lead to using more memory

RISC architecture
- Feature 3: limit the addressing modes
  - Advantage
    - Hardwiring
  - Disadvantage
    - Can make the assembly programming more difficult

RISC architecture
- Feature 4: Load/Store

        LDR R8, =0x20
        LDR R0, [R8]
        LDR R8, =0x220
        LDR R1, [R8]
        ADD R0, R1
        LDR R8, =0x230
        STR R0, [R8]

RISC architecture
- Feature 5: more than 95% of instructions are executed in 1 machine cycle

RISC architecture
- Feature 6
  - RISC processors typically have a large register file (often 32 registers or more). This decreases the need for stack and memory accesses.
  - In ARM there are 16 general-purpose registers (R0 to R15)