ARM Assembly Programming II Computer Organization and Assembly Languages Yung-Yu Chuang 2007/11/26 with slides by Peng-Sheng Chen
GNU compiler and binutils • HAM uses GNU compiler and binutils – gcc: GNU C compiler – as: GNU assembler – ld: GNU linker – gdb: GNU project debugger – insight: a (Tcl/Tk) graphic interface to gdb
Pipeline • COFF (common object file format) • ELF (executable and linkable format) • Segments in the object file – Text: code – Data: initialized global variables – BSS: uninitialized global variables • Build flow: C source (.c) → gcc → asm source (.s) → as → object file (.coff) → ld → executable (.elf) → simulator / debugger
GAS program format
        .file "test.s"
        .text
        .global main
        .type main, %function
main:
        MOV R0, #100
        ADD R0, R0, R0
        SWI #11
        .end
GAS program format
        .file "test.s"
        .text
        .global main              @ export the symbol main
        .type main, %function     @ set the type of a symbol to be either a function or an object
main:
        MOV R0, #100
        ADD R0, R0, R0
        SWI #11                   @ call interrupt to end the program
        .end                      @ marks the end of the assembly source
ARM assembly program
label    operation  operand        comments
main:    LDR        R1, value      @ load value
         STR        R1, result
         SWI        #11
value:   .word      0x0000C123
result:  .word      0
Control structures • A program implements algorithms to solve problems. Program decomposition and flow of control are the key concepts for expressing algorithms. • Flow of control: – Sequence – Decision: if-then-else, switch – Iteration: repeat-until, do-while, for • Decomposition: split a problem into several smaller, manageable ones and solve them independently (subroutines/functions/procedures).
Decision • If-then-else • switch
If statements
    if C then T else E
        C
        BNE else
        T
        B endif
else:   E
endif:
    // find maximum
    if (R0>R1) then R2:=R0 else R2:=R1
If statements
    if C then T else E
        C
        BNE else
        T
        B endif
else:   E
endif:
    // find maximum
    if (R0>R1) then R2:=R0 else R2:=R1
        CMP R0, R1
        BLE else
        MOV R2, R0
        B endif
else:   MOV R2, R1
endif:
If statements
    // find maximum
    if (R0>R1) then R2:=R0 else R2:=R1
Branching version:
        CMP R0, R1
        BLE else
        MOV R2, R0
        B endif
else:   MOV R2, R1
endif:
Two other options:
        CMP   R0, R1
        MOVGT R2, R0
        MOVLE R2, R1
and
        MOV   R2, R0
        CMP   R0, R1
        MOVLE R2, R1
If statements
    if (R1==1 || R1==5 || R1==12) R0=1;
        TEQ   R1, #1
        TEQNE R1, #5
        TEQNE R1, #12
        MOVEQ R0, #1
        BNE   fail      @ none of the three values matched
If statements
    if (R1==0) zero
    else if (R1>0) plus
    else if (R1<0) neg
        TEQ R1, #0
        BMI neg
        BEQ zero
        BPL plus
neg:    ...
        B exit
zero:   ...
        B exit
        ...
If statements
    R0=abs(R0)
        TEQ   R0, #0
        RSBMI R0, R0, #0
Multi-way branches
        CMP R0, #'0'
        BCC other       @ less than '0'
        CMP R0, #'9'
        BLS digit       @ between '0' and '9'
        CMP R0, #'A'
        BCC other
        CMP R0, #'Z'
        BLS letter      @ between 'A' and 'Z'
        CMP R0, #'a'
        BCC other
        CMP R0, #'z'
        BHI other       @ not between 'a' and 'z'
letter: ...
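The compare chain above can be read as the following C sketch. The names char_class and classify are hypothetical and only serve the illustration; each test is annotated with the branch it is assumed to correspond to, and the comparisons are unsigned, as BCC/BLS/BHI are.

```c
/* Hypothetical names, used only for this illustration. */
enum char_class { CLASS_OTHER, CLASS_DIGIT, CLASS_LETTER };

/* Classify a character with the same sequence of unsigned range tests. */
enum char_class classify(unsigned char c)
{
    if (c < '0')  return CLASS_OTHER;   /* BCC other: less than '0'       */
    if (c <= '9') return CLASS_DIGIT;   /* BLS digit: between '0' and '9' */
    if (c < 'A')  return CLASS_OTHER;   /* BCC other                      */
    if (c <= 'Z') return CLASS_LETTER;  /* BLS letter: 'A' to 'Z'         */
    if (c < 'a')  return CLASS_OTHER;   /* BCC other                      */
    if (c > 'z')  return CLASS_OTHER;   /* BHI other: above 'z'           */
    return CLASS_LETTER;                /* fall through to letter         */
}
```

Because the ranges are tested in ascending order, each range needs only one extra comparison for its upper bound.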
Switch statements
    switch (exp) {
    case c1: S1; break;
    case c2: S2; break;
    ...
    case cN: SN; break;
    default: SD;
    }
is equivalent to
    e=exp;
    if (e==c1) {S1}
    else if (e==c2) {S2}
    else ...
Switch statements
    switch (R0) {
    case 0: S0; break;
    case 1: S1; break;
    case 2: S2; break;
    case 3: S3; break;
    default: err;
    }
        CMP R0, #0
        BEQ S0
        CMP R0, #1
        BEQ S1
        CMP R0, #2
        BEQ S2
        CMP R0, #3
        BEQ S3
err:    ...
        B exit
S0:     ...
        B exit
• The range is between 0 and N • Slow if N is large
Switch statements
        ADR   R1, JMPTBL
        CMP   R0, #3
        LDRLS PC, [R1, R0, LSL #2]
err:    ...
        B     exit
S0:     ...
JMPTBL: .word S0
        .word S1
        .word S2
        .word S3
• What if the range is between M and N? • For larger N and sparse values, we could use a hash function. • Diagram: R1 points to JMPTBL; R0 selects one of the entries S0 to S3.
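A jump table can be sketched in C as an array of function pointers. Everything here is hypothetical (handler_fn, jmptbl, dispatch, dispatch_from_m, RANGE_M, and the handler return values); the second function shows one possible answer to the "between M and N" question, namely subtracting M before the unsigned bounds check.

```c
/* Hypothetical handlers standing in for the code at labels S0..S3 and err. */
typedef int (*handler_fn)(void);

static int s0(void)  { return 10; }
static int s1(void)  { return 11; }
static int s2(void)  { return 12; }
static int s3(void)  { return 13; }
static int err(void) { return -1; }

/* Plays the role of JMPTBL: one entry per case value. */
static const handler_fn jmptbl[] = { s0, s1, s2, s3 };

/* Mirrors CMP R0, #3 followed by LDRLS: a single unsigned bounds check. */
int dispatch(unsigned int r0)
{
    if (r0 <= 3u)
        return jmptbl[r0]();
    return err();
}

/* Assumed lower bound M for the "range between M and N" variant. */
#define RANGE_M 5u

/* Values below RANGE_M wrap to large unsigned numbers and fail the check. */
int dispatch_from_m(unsigned int r0)
{
    return dispatch(r0 - RANGE_M);
}
```

The cost of the dispatch is constant regardless of how many cases there are, which is the advantage over the compare chain.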
Iteration • repeat-until • do-while • for
repeat loops
    do { S } while ( C )
loop:   S
        C
        BEQ loop
endw:
while loops
    while ( C ) { S }
Test at the top:
loop:   C
        BNE endw
        S
        B loop
endw:
Test at the bottom:
        B test
loop:   S
test:   C
        BEQ loop
endw:
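while loops
    while ( C ) { S }
Starting from the bottom-test form:
        B test
loop:   S
test:   C
        BEQ loop
endw:
the initial branch is replaced by a copy of the test:
        C
        BNE endw
loop:   S
test:   C
        BEQ loop
endw: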
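The two basic layouts of a while loop can be mimicked in C with goto, which makes the branch structure visible. This is only a sketch: the function names are hypothetical, and the loop body (halving n and counting) is an arbitrary stand-in for S with the condition n != 0 standing in for C.

```c
/* Test at the top: one conditional exit plus one unconditional
 * backward branch per iteration. */
unsigned halvings_top_test(unsigned n)
{
    unsigned count = 0;
loop:
    if (!(n != 0)) goto endw;   /* C, then branch to endw when it fails */
    n >>= 1;                    /* S */
    count++;
    goto loop;                  /* B loop */
endw:
    return count;
}

/* Test at the bottom: one branch into the loop, then a single
 * conditional backward branch per iteration. */
unsigned halvings_bottom_test(unsigned n)
{
    unsigned count = 0;
    goto test;                  /* B test */
loop:
    n >>= 1;                    /* S */
    count++;
test:
    if (n != 0) goto loop;      /* C, then branch back while it holds */
    return count;
}
```

Both functions compute the same result; the bottom-test form executes one branch fewer per iteration.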
GCD
    int gcd (int i, int j)
    {
        while (i!=j) {
            if (i>j) i -= j;
            else j -= i;
        }
        return i;
    }
GCD
@ R1: i, R2: j
loop:   CMP   R1, R2
        SUBGT R1, R1, R2
        SUBLT R2, R2, R1
        BNE   loop
for loops
    for ( I ; C ; A ) { S }
        I
loop:   C
        BNE endfor
        S
        A
        B loop
endfor:
    for (i=0; i<10; i++) { a[i]:=0; }
for loops
    for ( I ; C ; A ) { S }
        I
loop:   C
        BNE endfor
        S
        A
        B loop
endfor:
    for (i=0; i<10; i++) { a[i]:=0; }
        MOV R0, #0
        ADR R2, A
        MOV R1, #0
loop:   CMP R1, #10
        BGE endfor
        STR R0, [R2, R1, LSL #2]
        ADD R1, #1
        B loop
endfor:
for loops
    for (i=0; i<10; i++) { do something; }
        MOV R1, #0
loop:   CMP R1, #10
        BGE endfor
        @ do something
        ADD R1, #1
        B loop
endfor:
Execute a loop a constant number of times:
        MOV R1, #10
loop:
        @ do something
        SUBS R1, #1
        BNE loop
endfor:
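The difference between the two loop shapes can be sketched in C. The function names and the runs counter are hypothetical stand-ins for "do something". Note the assumption in the count-down form: the body always runs at least once, so it is only equivalent when the trip count is a nonzero constant.

```c
/* Counting up: compare against the bound on every iteration. */
unsigned run_counting_up(void)
{
    unsigned runs = 0;
    for (int i = 0; i < 10; i++)
        runs++;                 /* do something */
    return runs;
}

/* Counting down: the decrement itself produces the loop condition,
 * like SUBS followed by BNE. Requires a trip count of at least 1. */
unsigned run_counting_down(void)
{
    unsigned runs = 0;
    int i = 10;
    do {
        runs++;                 /* do something */
    } while (--i != 0);
    return runs;
}
```

The count-down form is usable whenever the body does not depend on the index running upward.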
Procedures • Arguments: expressions passed into a function • Parameters: values received by the function • Caller and callee
    void func(int a, int b)     // callee; a and b are parameters
    {
        ...
    }
    int main(void)              // caller
    {
        func(100, 200);         // 100 and 200 are arguments
        ...
    }
Procedures
main:
        ...
        BL func
        ...
        .end
func:
        ...
        .end
• How to pass arguments? By registers? By stack? By memory? In what order?
Procedures
main:                   @ caller
        @ use R5
        BL func
        @ use R5
        ...
        .end
func:                   @ callee
        ...
        @ use R5
        ...
        .end
• How to pass arguments? By registers? By stack? By memory? In what order? • Who should save R5? Caller? Callee?
Procedures (caller save)
main:                   @ caller
        @ use R5
        @ save R5
        BL func
        @ restore R5
        @ use R5
        .end
func:                   @ callee
        ...
        @ use R5
        .end
• How to pass arguments? By registers? By stack? By memory? In what order? • Who should save R5? Caller? Callee?
Procedures (callee save)
main:                   @ caller
        @ use R5
        BL func
        @ use R5
        .end
func:                   @ callee
        @ save R5
        ...
        @ use R5
        @ restore R5
        .end
• How to pass arguments? By registers? By stack? By memory? In what order? • Who should save R5? Caller? Callee?
Procedures
main:                   @ caller
        @ use R5
        BL func
        @ use R5
        ...
        .end
func:                   @ callee
        ...
        @ use R5
        ...
        .end
• How to pass arguments? By registers? By stack? By memory? In what order? • Who should save R5? Caller? Callee? • We need a protocol for these.
ARM Procedure Call Standard (APCS) • ARM Ltd. defines a set of rules for procedure entry and exit so that – Object code generated by different compilers can be linked together – Procedures can be called between high-level languages and assembly • APCS defines – Use of registers – Use of stack – Format of stack-based data structure – Mechanism for argument passing
APCS register usage convention
APCS register usage convention • R0–R3 (a1–a4): used to pass the first 4 parameters • Caller-saved if necessary
APCS register usage convention • R4–R8 (v1–v5): register variables, must return unchanged • Callee-saved
APCS register usage convention • R9–R15 (sb, sl, fp, ip, sp, lr, pc): registers for special purposes • Could be used as temporary variables if saved properly.
Argument passing • The first four word arguments are passed through R0 to R3. • Remaining parameters are pushed onto the stack in reverse order. • Procedures with four or fewer parameters are more efficient, since every argument stays in a register.
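As a small illustration of the rule above, consider two hypothetical C functions, sum4 and sum6. The register comments assume a compiler that follows the APCS rules as stated on this slide; the actual assignment depends on the compiler and ABI in use.

```c
/* All four arguments fit in registers: a in R0, b in R1, c in R2, d in R3
 * (assuming the APCS convention described above). */
int sum4(int a, int b, int c, int d)
{
    return a + b + c + d;
}

/* a..d travel in R0..R3 as before; e and f are assumed to be passed on
 * the stack, pushed in reverse order (f first, then e), so the callee
 * needs two extra loads to read them. */
int sum6(int a, int b, int c, int d, int e, int f)
{
    return a + b + c + d + e + f;
}
```

A call such as sum6(1, 2, 3, 4, 5, 6) therefore costs extra stores in the caller and extra loads in the callee compared with sum4(1, 2, 3, 4).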
Return value • One word value in R0 • A value of length 2~4 words (R0–R1, R0–R2, R0–R3)
Function entry/exit • A simple leaf function with four or fewer parameters has minimal overhead. 50% of calls are to leaf functions.
        BL leaf1
        ...
leaf1:  ...
        ...
        MOV PC, LR      @ return
Function entry/exit • Save a minimal set of temporary variables
        BL leaf2
        ...
leaf2:  STMFD sp!, {regs, lr}   @ save
        ...
        LDMFD sp!, {regs, pc}   @ restore and return
Standard ARM C program address space • Memory layout from low to high addresses: application load address, application image (code, then static data), top of application, heap (up to top of heap), stack limit (sl), stack pointer (sp), stack (up to top of memory)
Accessing operands • A procedure often accesses operands in the following ways – An argument passed in a register: no further work – An argument passed on the stack: use stack pointer (R13) relative addressing with an immediate offset known at compile time – A constant: PC-relative addressing, offset known at compile time – A local variable: allocated on the stack and accessed through stack pointer relative addressing – A global variable: allocated in the static area and accessed through static base (R9) relative addressing
Procedure
main:
        MOV R0, #0      @ first argument (a1) in R0
        ...
        BL func
        ...
Stack diagram: low addresses at the top, high addresses at the bottom.
Procedure
func:
        STMFD SP!, {R4-R6, LR}
        SUB   SP, #0xC
        ...
        STR   R0, [SP, #0]      @ v1=a1
        ...
        ADD   SP, #0xC
        LDMFD SP!, {R4-R6, PC}
Stack after entry, from low to high addresses: v1, v2, v3, R4, R5, R6, LR
Block copy example void bcopy(char *to, char *from, int n) { while (n--) *to++ = *from++; }
Block copy example
@ arguments: R0: to, R1: from, R2: n
bcopy:  TEQ  R2, #0
        BEQ  end
loop:   SUB  R2, #1
        LDRB R3, [R1], #1
        STRB R3, [R0], #1
        B    bcopy
end:    MOV  PC, LR
Block copy example
@ arguments: R0: to, R1: from, R2: n
@ rewrite "n--" as "--n>=0"
bcopy:  SUBS   R2, #1
        LDRPLB R3, [R1], #1
        STRPLB R3, [R0], #1
        BPL    bcopy
        MOV    PC, LR
Block copy example
@ arguments: R0: to, R1: from, R2: n
@ assume n is a multiple of 4; loop unrolling
bcopy:  SUBS   R2, #4
        LDRPLB R3, [R1], #1     @ this LDRPLB/STRPLB pair appears
        STRPLB R3, [R0], #1     @ four times in the unrolled body
        BPL    bcopy
        MOV    PC, LR
Block copy example
@ arguments: R0: to, R1: from, R2: n
@ n is a multiple of 16; copy one word at a time
bcopy:  SUBS  R2, #16
        LDRPL R3, [R1], #4      @ this LDRPL/STRPL pair appears
        STRPL R3, [R0], #4      @ four times (4 x 4 = 16 bytes)
        BPL   bcopy
        MOV   PC, LR
Block copy example
@ arguments: R0: to, R1: from, R2: n
@ n is a multiple of 16
bcopy:  SUBS  R2, #16
        LDMPL R1!, {R3-R6}
        STMPL R0!, {R3-R6}
        BPL   bcopy
        MOV   PC, LR
@ could be extended to copy 40 bytes at a time
@ if n is not a multiple of 40, add a copy_rest loop
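The block-at-a-time idea together with a copy_rest loop can be sketched in C as follows. The name bcopy_blocks is hypothetical, and memcpy on a 16-byte chunk merely stands in for the LDM/STM pair; this is an illustration of the structure, not a tuned implementation.

```c
#include <stddef.h>
#include <string.h>

/* Copy n bytes: 16-byte blocks first, then the remainder byte by byte. */
void bcopy_blocks(char *to, const char *from, size_t n)
{
    /* Main loop: one block per iteration, like LDM/STM of four words. */
    while (n >= 16) {
        memcpy(to, from, 16);
        to += 16;
        from += 16;
        n -= 16;
    }
    /* copy_rest: handles the 0..15 bytes left when n is not a multiple of 16. */
    while (n--)
        *to++ = *from++;
}
```

With the remainder loop in place, the routine no longer needs the assumption that n is a multiple of the block size.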
Search example
    int main(void)
    {
        int a[10]={7, 6, 4, 5, 5, 1, 3, 2, 9, 8};
        int i;
        int s=4;
        for (i=0; i<10; i++)
            if (s==a[i]) break;
        if (i>=10) return -1;
        else return i;
    }
Search
        .section .rodata
.LC0:
        .word 7, 6, 4, 5, 5, 1, 3, 2, 9, 8
Search
        .text
        .global main
        .type main, %function
main:   sub   sp, #48
        adr   r4, L9            @ =.LC0
        add   r5, sp, #8
        ldmia r4!, {r0, r1, r2, r3}
        stmia r5!, {r0, r1, r2, r3}
        ldmia r4!, {r0, r1, r2, r3}
        stmia r5!, {r0, r1, r2, r3}
        ldmia r4!, {r0, r1}
        stmia r5!, {r0, r1}
Stack frame, from low to high addresses: s, i, a[0], ..., a[9]
Search
        mov r3, #4
        str r3, [sp, #0]        @ s=4
        mov r3, #0
        str r3, [sp, #4]        @ i=0
loop:   ldr r0, [sp, #4]        @ r0=i
        cmp r0, #10             @ i<10?
        bge end
        ldr r1, [sp, #0]        @ r1=s
        mov r2, #4
        mul r3, r0, r2
        add r3, #8
        ldr r4, [sp, r3]        @ r4=a[i]
Stack frame, from low to high addresses: s, i, a[0], ..., a[9]
Search
        teq r1, r4              @ test if s==a[i]
        beq end
        add r0, #1              @ i++
        str r0, [sp, #4]        @ update i
        b   loop
end:    str r0, [sp, #4]
        cmp r0, #10
        movge r0, #-1
        add sp, #48
        mov pc, lr
Stack frame, from low to high addresses: s, i, a[0], ..., a[9]
Optimization • Remove unnecessary load/store • Remove loop invariant • Use addressing mode • Use conditional execution
Search (remove load/store)
        mov r1, #4              @ s=4, kept in r1 (store to [sp, #0] removed)
        mov r0, #0              @ i=0, kept in r0 (store to [sp, #4] removed)
loop:   cmp r0, #10             @ i<10? (reload of i removed)
        bge end
        mov r2, #4              @ (reload of s removed)
        mul r3, r0, r2
        add r3, #8
        ldr r4, [sp, r3]        @ r4=a[i]
Search (remove load/store)
        teq r1, r4              @ test if s==a[i]
        beq end
        add r0, #1              @ i++ (store of i removed)
        b   loop
end:    cmp r0, #10             @ (store of i removed)
        movge r0, #-1
        add sp, #48
        mov pc, lr
Search (loop invariant/addressing mode)
        mov r1, #4              @ s=4
        mov r0, #0              @ i=0
        add r2, sp, #8          @ r2=&a[0], computed once outside the loop
loop:   cmp r0, #10             @ i<10?
        bge end
        ldr r4, [r2, r0, LSL #2]        @ r4=a[i], replaces the mov/mul/add/ldr sequence
Search (conditional execution)
        teq   r1, r4            @ test if s==a[i]
        addne r0, #1            @ i++ only when s!=a[i]
        bne   loop              @ keep searching; fall through to end on a match
end:    cmp   r0, #10
        movge r0, #-1
        add   sp, #48
        mov   pc, lr
Optimization • Remove unnecessary load/store • Remove loop invariant • Use addressing mode • Use conditional execution • The search code shrinks from 22 words to 13 words, and each loop iteration drops from 13 instructions to 6, so execution time is greatly reduced.
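For comparison, the shape of the optimized search can be written back in C. The function name search and its parameters are hypothetical (the original program keeps everything in main); the sketch only shows the structure the optimizations arrive at: i and s live in local variables rather than memory, and each iteration performs one bounds test, one load, and one comparison.

```c
/* Return the index of the first element equal to s, or -1 if absent. */
int search(const int *a, int len, int s)
{
    int i = 0;                          /* i stays in a local, like r0 */
    while (i < len && a[i] != s)        /* bounds test, load a[i], compare */
        i++;                            /* advance only when there is no match */
    return (i >= len) ? -1 : i;         /* same final test as cmp/movge */
}
```

With the array from the example and s=4, the search stops at index 2.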