CMPE 450/490 ARM Programming
© 2010 Elliott, Durdle, Minderman. Portions courtesy of ARM, Green Hills.
Green Hills MULTI IDE: Integrated Development Environment
MULTI
• MULTI is a complete Integrated Development Environment (IDE)
  – Designed especially for embedded systems engineers
  – Assists them in analyzing, editing, compiling, optimizing, and debugging embedded applications
• The MULTI IDE includes graphical tools for each part of the software development process.
  – IDE launcher
    • MULTI Launcher -- The gateway to the MULTI IDE: launch any of the primary MULTI tools, access open windows, and manage MULTI workspaces
  – Editing tools
    • MULTI Editor, Checkout Browser, Diff Viewer, Hex Editor
  – Building tools
    • MULTI Builder -- A graphical interface for managing and building projects
    • CodeBalance -- A graphical interface for optimizing an executable for size or speed
    • INTEGRATE -- A graphical utility for configuring tasks, connections, and kernel objects across multiple address spaces
    • Linker Directives File Editor -- A graphical editor for creating and modifying linker directives files
  – Debugging tools
    • MULTI Debugger (multi) -- A graphical source-level debugger
    • EventAnalyzer -- A graphical viewer for monitoring complex real-time interactions
    • ResourceAnalyzer -- A graphical viewer for monitoring CPU and memory usage
    • Script Debugger -- A graphical debugger for writing, recording, and debugging scripts
    • Serial Terminal -- A serial terminal emulator for connecting to serial ports on embedded devices
  – Miscellaneous and administrative tools
Launcher
MULTI Debugger (I)
• A powerful graphical debugger that supports source, assembly, and mixed-language debugging.
• Allows you to perform the following tasks quickly and easily:
  – Browse, view, and search all aspects of your program code
  – Download, execute, control, and debug embedded applications written in C, C++, FORTRAN, assembly, or a combination of these languages
  – View and edit variables, pointers, structures, registers, and memory ranges
  – Create, view, edit, and remove conditional breakpoints
  – View performance profiling, function profiling, memory allocation, code coverage, and stack trace information
  – Interface seamlessly with the MULTI Editor and the MULTI Builder, or with third-party editors and compilers
  – Perform multiprocess debugging through a single JTAG connection, even when those processes are running on multiple processors
  – Perform non-intrusive field debugging of live systems
  – Develop board setup scripts
MULTI Debugger (II)
Main Debugger Window (I)
Main Debugger Window (II)
Introduction to the ARM 7 Microprocessor Architecture
ARM 7 RISC CPU Architecture
• Load/Store architecture
• Large Register Bank
  – Typically thirty-two 32-bit registers
• Fixed size for all instructions
  – 32 bits long
• Pipelined execution
• Single cycle execution
• Orthogonal Instruction Set
• Hardwired instruction decode logic
ARM 7 32-bit RISC Architecture
• Von Neumann Enhanced RISC Architecture
• Three Stage Pipeline
  – Fetch, Decode & Execute
• Conditional execution of every instruction
• 32-bit flat address space
  – (4 GB memory map)
• Most instructions execute in a single cycle.
• Combined ALU and shifter for high speed bit manipulation
ARM 7 RISC Architecture (cont.)
• Powerful multiple load and store instructions combined with auto-indexing addressing modes
  – Block Copy
  – Stack Manipulation
• Open instruction set extension via coprocessors
ARM Powered Products
iPod, Game Boy, Toshiba PDA, Samsung Video Recorder, etc.
ARM 7 Block Diagram
• Von Neumann Architecture
• 3-stage pipeline
  – fetch, decode, execute
• 32-bit Data Bus
• 32-bit Address Bus
• 37 32-bit registers
• 32-bit ARM instruction set
• 16-bit THUMB instruction set
• 32 x 8 Multiplier
• Barrel Shifter
Pipeline Organization
• 3-stage pipeline: Fetch, Decode, Execute
• Three-cycle latency, one instruction per cycle throughput

  instruction   t        t+1      t+2      t+3      t+4
  i             Fetch    Decode   Execute
  i+1                    Fetch    Decode   Execute
  i+2                             Fetch    Decode   Execute
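The latency and throughput figures above can be checked with a small model. This is only a sketch in Python of an ideal three-stage pipeline with no branches or stalls. The function names `pipeline_schedule` and `total_cycles` are made up for this illustration.

```python
STAGES = ("Fetch", "Decode", "Execute")


def pipeline_schedule(n_instructions):
    """Map each cycle to the (instruction, stage) pairs active in it.

    Assumes an ideal pipeline: instruction i enters Fetch in cycle i and
    advances one stage per cycle, with no branches or stalls.
    """
    schedule = {}
    for i in range(n_instructions):
        for offset, stage in enumerate(STAGES):
            schedule.setdefault(i + offset, []).append((i, stage))
    return schedule


def total_cycles(n_instructions):
    """Cycles needed to finish n instructions: fill latency plus one per instruction."""
    if n_instructions == 0:
        return 0
    return n_instructions + len(STAGES) - 1


if __name__ == "__main__":
    for cycle, work in sorted(pipeline_schedule(3).items()):
        print(cycle, work)
    print("cycles for 3 instructions:", total_cycles(3))
```

Once the pipeline is full, every cycle retires one instruction, so the first result arrives after three cycles and each later one after a single extra cycle.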
Pipeline Organization (2)
• Pipeline flushed and refilled on branch, causing execution to slow down
• Special features in the instruction set eliminate small jumps in code to obtain the best flow through the pipeline
Operating Modes
• Seven operating modes:
  – User
  – Privileged:
    • System
    • FIQ, IRQ, Abort, Undefined, Supervisor (the exception modes)
Operating Modes (2)
User mode:
  – Normal program execution mode
  – System resources unavailable
  – Mode changed by exception or software interrupt (trap instruction)
Exception modes:
  – Entered upon exception
  – Full access to system resources
  – Mode changed freely
Exceptions

  Exception               Mode         Priority   IV Address
  Reset                   Supervisor   1          0x00000000
  Undefined instruction   Undefined    6          0x00000004
  Software interrupt      Supervisor   6          0x00000008
  Prefetch Abort          Abort        5          0x0000000C
  Data Abort              Abort        2          0x00000010
  Interrupt               IRQ          4          0x00000018
  Fast interrupt          FIQ          3          0x0000001C

Table 1 - Exception types, sorted by Interrupt Vector addresses
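As a sketch of how the priority column is used, the Python below picks which of several simultaneously pending exceptions is taken first, and reports the mode entered and the vector address. The table data comes from Table 1. The names `EXCEPTIONS` and `select_exception` are hypothetical, and the model ignores interrupt masking.

```python
# (mode entered, priority, vector address); priority 1 is the highest.
EXCEPTIONS = {
    "Reset": ("Supervisor", 1, 0x00000000),
    "Undefined instruction": ("Undefined", 6, 0x00000004),
    "Software interrupt": ("Supervisor", 6, 0x00000008),
    "Prefetch Abort": ("Abort", 5, 0x0000000C),
    "Data Abort": ("Abort", 2, 0x00000010),
    "Interrupt": ("IRQ", 4, 0x00000018),
    "Fast interrupt": ("FIQ", 3, 0x0000001C),
}


def select_exception(pending):
    """Return (name, mode, vector) of the highest-priority pending exception."""
    name = min(pending, key=lambda n: EXCEPTIONS[n][1])
    mode, _priority, vector = EXCEPTIONS[name]
    return name, mode, vector


if __name__ == "__main__":
    name, mode, vector = select_exception(["Interrupt", "Data Abort", "Fast interrupt"])
    print(name, mode, hex(vector))
```

Undefined instruction and software interrupt share priority 6 because both are decodings of the same instruction, so they cannot be pending together.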
ARM Register Organization
[Figure: general purpose registers by mode. Columns: User Mode, FIQ Mode, IRQ Mode, Supervisor Mode, Abort Mode, Undef Mode]
Thumb Code Compression
Thumb Code Example
In C:
    int iabs(int x)
    {
        if (x >= 0) return x;
        else return -x;
    }
In ARM code (3 instructions, 12 bytes):
    CMP   r0, #0
    RSBLT r0, r0, #0
    MOV   pc, lr
In Thumb code (4 instructions, 8 bytes, 67% of the ARM size):
    CMP   r0, #0
    BGE   return
    NEG   r0, r0
return
    MOV   pc, lr
ARM 7 TM Block Diagram
Thumb Features
• Thumb addresses code density
• All Thumb instructions are 16 bits long
• Thumb may be viewed as a compressed form of a subset of the 32-bit ARM instruction set.
• Implementations of Thumb use dynamic decompression in an ARM instruction pipeline. This logic translates each 16-bit Thumb instruction into its equivalent 32-bit ARM instruction.
• The decompression logic was added without compromising cycle time or pipeline latency, because the original ARM 7 pipeline did very little work in phase one of the decode cycle.
• Programmer’s Model: r0-r7, r13, r15
Thumb Applications
A typical early embedded system, e.g. a mobile phone, will include a small amount of fast 32-bit memory (to store speed-critical DSP code) and 16-bit off-chip memory to store the control code.
  – Thumb code requires 70% of the space of the ARM code
  – Thumb code uses 40% more instructions than ARM code
  – With 32-bit memory, the ARM code is 40% faster than Thumb code
  – With 16-bit memory, the Thumb code is 45% faster than ARM code
  – Thumb code uses 30% less external memory power than ARM code
ARM 7 Family
60-100 MIPS
Code Examples
Example 1
Basic Arithmetic Operations
    ADD r0, r1, r2    ; r0 := r1 + r2
    ADC r0, r1, r2    ; r0 := r1 + r2 + C
    SUB r0, r1, r2    ; r0 := r1 - r2
    SBC r0, r1, r2    ; r0 := r1 - r2 + C - 1
    RSB r0, r1, r2    ; r0 := r2 - r1
    RSC r0, r1, r2    ; r0 := r2 - r1 + C - 1
Extended Precision
• E.g. add two 64-bit numbers X and Y and store the result in Z
  Store X in r1:r0, Y in r3:r2 and Z in r5:r4
    ADDS r4, r0, r2    ; add least sig. words, result in r4, carry out in C
    ADC  r5, r1, r3    ; add most sig. words plus C, result in r5
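The ADDS/ADC pair can be modelled in a few lines of Python to see how the carry links the two 32-bit halves. This is an illustrative sketch only. The helper names `adds`, `adc` and `add64` are invented here, and only the carry flag is modelled.

```python
MASK32 = 0xFFFFFFFF


def adds(a, b):
    """Model ADDS: 32-bit add that also returns the carry flag."""
    total = (a & MASK32) + (b & MASK32)
    return total & MASK32, total >> 32


def adc(a, b, carry):
    """Model ADC: 32-bit add including the incoming carry flag."""
    return ((a & MASK32) + (b & MASK32) + carry) & MASK32


def add64(x, y):
    """Add two 64-bit values using only 32-bit operations, as on the slide."""
    r0, r1 = x & MASK32, (x >> 32) & MASK32   # X in r1:r0
    r2, r3 = y & MASK32, (y >> 32) & MASK32   # Y in r3:r2
    r4, carry = adds(r0, r2)                  # ADDS r4, r0, r2
    r5 = adc(r1, r3, carry)                   # ADC  r5, r1, r3
    return (r5 << 32) | r4                    # Z in r5:r4


if __name__ == "__main__":
    print(hex(add64(0x00000001FFFFFFFF, 1)))
```

The first add must use the S suffix. A plain ADD leaves the flags untouched, so the following ADC would pick up a stale carry.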
Operations with Shifts
    ADD r3, r2, r1, LSL #3    ; r3 := r2 + 8 x r1
    ADD r5, r5, r3, LSL r2    ; r5 := r5 + r3 x 2^r2
; Types of shift: LSL, LSR, ASR, ROR, RRX
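The five shift types can be sketched as plain functions on 32-bit values. This is a simplified Python model that assumes shift amounts of 0 to 31 and ignores the carry-out of every shift except RRX. The function names simply mirror the mnemonics.

```python
MASK32 = 0xFFFFFFFF


def lsl(value, n):
    """Logical shift left: zeros enter at bit 0."""
    return (value << n) & MASK32


def lsr(value, n):
    """Logical shift right: zeros enter at bit 31."""
    return (value & MASK32) >> n


def asr(value, n):
    """Arithmetic shift right: the sign bit is copied into the vacated bits."""
    value &= MASK32
    if value & 0x80000000:
        value -= 1 << 32          # reinterpret as a negative Python int
    return (value >> n) & MASK32


def ror(value, n):
    """Rotate right: bits leaving bit 0 re-enter at bit 31."""
    value &= MASK32
    return ((value >> n) | (value << (32 - n))) & MASK32


def rrx(value, carry):
    """Rotate right by one through carry; returns (result, new carry)."""
    value &= MASK32
    return ((carry << 31) | (value >> 1)) & MASK32, value & 1


if __name__ == "__main__":
    r1, r2 = 2, 10
    print(r2 + lsl(r1, 3))        # models ADD r3, r2, r1, LSL #3
```

Because the shift is applied to the second operand inside the same instruction, multiplying by a power of two and adding costs one instruction.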
ARM Instructions I
Instruction Set
• Two instruction sets:
  – ARM
    • Standard 32-bit instruction set
  – THUMB
    • 16-bit compressed form
    • Code density better than most CISC
    • Dynamic decompression in pipeline
ARM Instruction Set
• Features:
  – Load / Store architecture
  – 3-address data processing instructions
  – Conditional execution
  – Load / Store multiple registers
  – Shift & ALU operation in single clock cycle
ARM Instruction Set (2)
• Conditional execution:
  – Every instruction carries a condition code (a 4-bit field at the top of the encoding, written as a suffix on the mnemonic)
  – Result: smooth flow of instructions through the pipeline
  – 16 condition codes:

    EQ  equal
    NE  not equal
    CS  unsigned higher or same
    CC  unsigned lower
    MI  negative
    PL  positive or zero
    VS  overflow
    VC  no overflow
    HI  unsigned higher
    LS  unsigned lower or same
    GE  signed greater than or equal
    LT  signed less than
    GT  signed greater than
    LE  signed less than or equal
    AL  always
    NV  special purpose
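Each condition code is a test on the N, Z, C and V flags. The Python sketch below spells those tests out and computes the flags that a CMP would set, so the signed and unsigned conditions can be compared. The names `cmp_flags` and `condition_passed` are illustrative, and NV is left out because it is reserved for special purposes.

```python
MASK32 = 0xFFFFFFFF


def cmp_flags(a, b):
    """Return (N, Z, C, V) as set by CMP a, b, i.e. by computing a - b."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    n = bool(result >> 31)                        # result is negative
    z = result == 0                               # result is zero
    c = a >= b                                    # no borrow occurred
    v = bool(((a ^ b) & (a ^ result)) >> 31)      # signed overflow
    return n, z, c, v


def condition_passed(cond, n, z, c, v):
    """Evaluate an ARM condition code against the four flags."""
    tests = {
        "EQ": z, "NE": not z,
        "CS": c, "CC": not c,
        "MI": n, "PL": not n,
        "VS": v, "VC": not v,
        "HI": c and not z, "LS": (not c) or z,
        "GE": n == v, "LT": n != v,
        "GT": (not z) and n == v, "LE": z or n != v,
        "AL": True,
    }
    return bool(tests[cond])


if __name__ == "__main__":
    flags = cmp_flags(0xFFFFFFFF, 1)              # -1 signed, 4294967295 unsigned
    print(condition_passed("LT", *flags), condition_passed("HI", *flags))
```

The same bit pattern 0xFFFFFFFF compares as lower than 1 under the signed test LT and as higher than 1 under the unsigned test HI, which is why both families of conditions exist.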
ARM Instruction Set (3)
ARM instruction set:
  – Data processing instructions
  – Data transfer instructions
  – Block transfer instructions
  – Branching instructions
  – Multiply instructions
  – Software interrupt instructions
Data Processing Instructions
• Arithmetic and logical operations
• 3-address format:
  – Two 32-bit operands (op1 is a register, op2 is a register or an immediate)
  – 32-bit result placed in a register
• Barrel shifter for operand 2 allows a full 32-bit shift within the instruction cycle
Data Processing Instructions (2)
• Arithmetic operations:
  – ADD, ADC, SUB, SBC, RSB, RSC
• Bit-wise logical operations:
  – AND, EOR, ORR, BIC
• Register movement operations:
  – MOV, MVN
• Comparison operations:
  – TST, TEQ, CMP, CMN
Data Processing Instructions
e.g.: if (Z == 1) R1 = R2 + (R3 * 4), where Z is the zero flag,
compiles to
    ADDEQS R1, R2, R3, LSL #2
(SINGLE INSTRUCTION!)
Data Transfer Instructions
• Load/store instructions
• Used to move signed and unsigned Word, Half Word and Byte values to and from registers
• Can be used to load the PC (if the target address is beyond the branch instruction range)

    LDR    Load Word                 STR    Store Word
    LDRH   Load Half Word            STRH   Store Half Word
    LDRSH  Load Signed Half Word
    LDRB   Load Byte                 STRB   Store Byte
    LDRSB  Load Signed Byte

• Stores have no signed variants: STRH and STRB write the low half word or byte of the register, so sign extension only matters on loads.
Block Transfer Instructions
• Load / Store Multiple instructions (LDM / STM)
• Whole register bank or a subset copied to memory or restored with a single instruction
[Figure: STM copies registers R0 ... R15 to memory words Mi ... Mi+15; LDM copies them back]
ARM Addressing Modes
Addressing Modes
• Immediate addressing
  – The desired value is a binary value in the instruction
• Absolute addressing
  – The instruction contains the full binary address
• Indirect addressing
  – The instruction contains the binary address of a memory location containing the binary address
• Base relative addressing
  – Plus offset
  – Plus index
  – Plus scaled index
• Stack addressing
Immediate Addressing
• Used to load an immediate 8-bit value into a register
  e.g. mov r0, #0xFF
• Used to control the operation of the barrel shifter on the 3rd operand
  e.g. add r3, r2, r1, LSL #3 ; r3 := r2 + 8 x r1
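The 8-bit limit is less restrictive than it looks. In ARM data processing instructions the immediate is an 8-bit value rotated right by an even number of bit positions, so constants such as 0xFF000000 also fit. The Python sketch below checks whether a constant can be encoded that way. `encode_immediate` is a hypothetical helper, not an assembler API.

```python
MASK32 = 0xFFFFFFFF


def rol32(value, n):
    """Rotate a 32-bit value left by n bits (0 <= n <= 31)."""
    value &= MASK32
    return ((value << n) | (value >> (32 - n))) & MASK32


def encode_immediate(value):
    """Return (imm8, rotate_right) such that ROR(imm8, rotate_right) == value.

    Returns None when the constant cannot be written as an 8-bit value
    rotated right by an even amount, in which case it must be loaded
    another way (for example from a literal pool with ldr rX, =value).
    """
    value &= MASK32
    for rot in range(0, 32, 2):
        imm8 = rol32(value, rot)      # undo a rotate-right by rot
        if imm8 <= 0xFF:
            return imm8, rot
    return None


if __name__ == "__main__":
    for constant in (0xFF, 0xFF000000, 0x104, 0x101):
        print(hex(constant), encode_immediate(constant))
```

A constant such as 0x101 spans nine bits and so has no encoding, which is one reason the later slides load addresses with `ldr r1, =address` rather than `mov`.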
Absolute Addressing
• To load an absolute address into a register
  example:
start:
    ldr r1, =address    ; r1 := address of the word labelled "address"
    ldr r0, [r1]        ; r0 := mem32[r1]
address:
    .word 0x15000000
Indirect Addressing
    ldr r0, [r1]    ; r0 := mem32[r1]
    str r0, [r1]    ; mem32[r1] := r0
Base Plus Offset Addressing
    ldr r0, [r1, #4]     ; r0 := mem32[r1+4]
r1 is not altered.
Another form is
    ldr r0, [r1, #4]!    ; r0 := mem32[r1+4]
                         ; r1 := r1+4      (! == update)
And another
    ldr r0, [r1], #4     ; r0 := mem32[r1]
                         ; r1 := r1+4
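The three forms differ only in which address is used and whether the base register is written back. A minimal Python sketch makes the difference explicit. Here registers and memory are plain dictionaries, the function names are invented for illustration, and the case where the destination and base are the same register is ignored.

```python
def ldr_offset(regs, mem, rd, rn, offset):
    """ldr rd, [rn, #offset]: use rn+offset, leave rn unchanged."""
    regs[rd] = mem[regs[rn] + offset]


def ldr_pre_indexed(regs, mem, rd, rn, offset):
    """ldr rd, [rn, #offset]!: update rn first, then load from the new rn."""
    regs[rn] += offset
    regs[rd] = mem[regs[rn]]


def ldr_post_indexed(regs, mem, rd, rn, offset):
    """ldr rd, [rn], #offset: load from rn, then update rn."""
    regs[rd] = mem[regs[rn]]
    regs[rn] += offset


if __name__ == "__main__":
    mem = {0x100: 11, 0x104: 22, 0x108: 33}
    regs = {"r0": 0, "r1": 0x100}
    ldr_post_indexed(regs, mem, "r0", "r1", 4)
    print(regs)   # r0 holds the word at 0x100, r1 has advanced to 0x104
```

Post-indexing is the natural form for walking through an array, since each load leaves the base pointing at the next element.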
Base Plus Index Addressing
    ldr r1, =base        ; load r1 with base address
    ldr r2, =index       ; load r2 with an index
    ldr r0, [r1, r2]     ; get data record into r0
Base Plus Scaled Index Addressing
    ldr r1, =base              ; load r1 with base address
    ldr r2, =index             ; load r2 with an index
    ldr r0, [r1, r2, LSL #2]   ; r0 := mem32[r1+4*r2]
Direct functionality of Block Data Transfer
• When LDM / STM are not being used to implement stacks, it is clearer to specify exactly what the functionality of the instruction is:
  – i.e. specify whether to increment / decrement the base pointer, before or after the memory access.
• In order to do this, LDM / STM support a further syntax in addition to the stack one (C analogy for a pointer p in parentheses):
  – STMIA / LDMIA : Increment After  (p++)
  – STMIB / LDMIB : Increment Before (++p)
  – STMDA / LDMDA : Decrement After  (p--)
  – STMDB / LDMDB : Decrement Before (--p)
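The four suffixes can be summarised by the addresses they touch. The Python sketch below returns the word addresses used by a transfer of `n_regs` registers and the base value after writeback (the `!` form). `block_addresses` is a name made up for this example. In every mode the lowest-numbered register goes to the lowest address.

```python
def block_addresses(base, n_regs, mode):
    """Return (addresses, new_base) for an LDM/STM of n_regs registers.

    addresses[0] belongs to the lowest-numbered register in the list.
    new_base is the value written back to the base register when the
    instruction uses "!".
    """
    size = 4 * n_regs
    if mode == "IA":        # increment after: start at base
        start, new_base = base, base + size
    elif mode == "IB":      # increment before: start one word above base
        start, new_base = base + 4, base + size
    elif mode == "DA":      # decrement after: block ends at base
        start, new_base = base - size + 4, base - size
    elif mode == "DB":      # decrement before: block ends one word below base
        start, new_base = base - size, base - size
    else:
        raise ValueError("mode must be IA, IB, DA or DB")
    return [start + 4 * i for i in range(n_regs)], new_base


if __name__ == "__main__":
    for mode in ("IA", "IB", "DA", "DB"):
        addrs, new_base = block_addresses(0x400, 3, mode)
        print(mode, [hex(a) for a in addrs], hex(new_base))
```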
Example: Block Copy
  – Copy a block of memory, which is an exact multiple of 12 words long, from the location pointed to by r12 to the location pointed to by r13. r14 points to the end of the block to be copied.

    ; r12 points to the start of the source data
    ; r14 points to the end of the source data
    ; r13 points to the start of the destination data
loop    LDMIA r12!, {r0-r11}    ; load 48 bytes
        STMIA r13!, {r0-r11}    ; and store them
        CMP   r12, r14          ; check for the end
        BNE   loop              ; and loop until done

  – This loop transfers 48 bytes in 31 cycles
  – Over 50 Mbytes/sec at 33 MHz
[Figure: memory map with addresses increasing upward, showing r12 at the start of the source block, r14 at its end, and r13 at the start of the destination]
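The throughput claim follows directly from the loop figures, and the loop itself is easy to model. The Python sketch below does both. Memory is a dictionary of word addresses, and `block_copy` and `copy_throughput` are illustrative names. The 31-cycle count is taken from the slide rather than derived.

```python
def block_copy(mem, src, dst, end, words_per_transfer=12):
    """Model the LDMIA/STMIA loop; returns the number of iterations."""
    step = 4 * words_per_transfer
    iterations = 0
    while True:
        regs = [mem[src + 4 * i] for i in range(words_per_transfer)]  # LDMIA r12!, {r0-r11}
        src += step
        for i, word in enumerate(regs):                               # STMIA r13!, {r0-r11}
            mem[dst + 4 * i] = word
        dst += step
        iterations += 1
        if src == end:                                                # CMP r12, r14 / BNE loop
            return iterations


def copy_throughput(bytes_per_iteration, cycles_per_iteration, clock_hz):
    """Bytes copied per second for a loop with the given cost."""
    return bytes_per_iteration * clock_hz / cycles_per_iteration


if __name__ == "__main__":
    mem = {0x1000 + 4 * i: i for i in range(24)}
    print(block_copy(mem, 0x1000, 0x2000, 0x1000 + 96))
    print(copy_throughput(48, 31, 33_000_000) / 1e6, "Mbytes/sec")
```

At 48 bytes per 31 cycles and a 33 MHz clock the result is about 51 Mbytes/sec, consistent with "over 50 Mbytes/sec". Like the assembly, the model loops forever if the block length is not an exact multiple of 12 words.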
Stacks
• A stack is an area of memory which grows as new data is “pushed” onto the “top” of it, and shrinks as data is “popped” off the top.
• Two pointers define the current limits of the stack.
  – A base pointer
    • used to point to the “bottom” of the stack (the first location).
  – A stack pointer
    • used to point to the current “top” of the stack.
[Figure: after PUSH {1, 2, 3} the stack holds 1 at BASE, then 2, then 3 at SP; after POP, SP points at 2 and the result of the pop is 3]
Stack Operation
• Traditionally, a stack grows down in memory, with the last “pushed” value at the lowest address. The ARM also supports ascending stacks, where the stack structure grows up through memory.
• The value of the stack pointer can either:
  – Point to the last occupied address (Full stack)
    • and so needs pre-decrementing (i.e. before the push)
  – Point to the next free address (Empty stack)
    • and so needs post-decrementing (i.e. after the push)
• The stack type to be used is given by the postfix to the instruction:
  – STMFD / LDMFD : Full Descending stack
  – STMFA / LDMFA : Full Ascending stack
  – STMED / LDMED : Empty Descending stack
  – STMEA / LDMEA : Empty Ascending stack
Stack Examples
    STMED sp!, {r0, r1, r3-r5}
    STMFD sp!, {r0, r1, r3-r5}
    STMEA sp!, {r0, r1, r3-r5}
    STMFA sp!, {r0, r1, r3-r5}
[Figure: memory from 0x3e8 to 0x418 after each instruction, starting from Old SP = 0x400. In every case r5 is stored at the highest address and r0 at the lowest; the figure marks the new SP for each stack type.]
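The four stack types can be compared by modelling a push and the matching pop. The Python sketch below assumes word-addressed memory held in a dictionary, and `push` and `pop` are illustrative names. For reference, STMFD is the same instruction as STMDB and LDMFD the same as LDMIA. The other types pair up similarly.

```python
def push(mem, sp, values, kind):
    """Model STM<kind> sp!, {...}; kind is "FD", "ED", "FA" or "EA".

    values[0] is the lowest-numbered register and always lands at the
    lowest address. Returns the new stack pointer.
    """
    full, ascending = kind[0] == "F", kind[1] == "A"
    size = 4 * len(values)
    if ascending:
        new_sp = sp + size
        lowest = sp + 4 if full else sp
    else:
        new_sp = sp - size
        lowest = new_sp if full else new_sp + 4
    for i, value in enumerate(values):
        mem[lowest + 4 * i] = value
    return new_sp


def pop(mem, sp, count, kind):
    """Model LDM<kind> sp!, {...}; returns (values, new stack pointer)."""
    full, ascending = kind[0] == "F", kind[1] == "A"
    size = 4 * count
    if ascending:
        new_sp = sp - size
        lowest = new_sp + 4 if full else new_sp
    else:
        new_sp = sp + size
        lowest = sp if full else sp + 4
    return [mem[lowest + 4 * i] for i in range(count)], new_sp


if __name__ == "__main__":
    for kind in ("FD", "ED", "FA", "EA"):
        mem = {}
        sp = push(mem, 0x400, [0, 1, 3, 4, 5], kind)   # r0, r1, r3-r5
        print(kind, "new SP", hex(sp), "lowest word at", hex(min(mem)))
```

Starting from SP = 0x400 with five registers, both descending types leave SP at 0x3ec and both ascending types leave it at 0x414. Full and empty stacks differ only in whether the word at SP itself is occupied.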
Stacks and Subroutines
• One use of stacks is to create temporary register workspace for subroutines. Any registers that are needed can be pushed onto the stack at the start of the subroutine and popped off again at the end, so as to restore them before returning to the caller:

    STMFD sp!, {r0-r12, lr}    ; stack all registers
                               ; and the return address
    ....
    LDMFD sp!, {r0-r12, pc}    ; load all the registers
                               ; and return automatically