CMPE 450/490 ARM Programming © 2010 Elliott, Durdle, Minderman. Portions courtesy of ARM and Green Hills.

Green Hills MULTI IDE: Integrated Development Environment

MULTI
  • MULTI is a complete Integrated Development Environment (IDE)
      – Designed especially for embedded systems engineers
      – Assists them in analyzing, editing, compiling, optimizing, and debugging embedded applications
  • The MULTI IDE includes graphical tools for each part of the software development process.
      – IDE launcher
          • MULTI Launcher -- The gateway to the MULTI IDE: launch any of the primary MULTI tools, access open windows, and manage MULTI workspaces
      – Editing tools
          • MULTI Editor, Checkout Browser, Diff Viewer, Hex Editor
      – Building tools
          • MULTI Builder -- A graphical interface for managing and building projects
          • CodeBalance -- A graphical interface for optimizing an executable for size or speed
          • INTEGRATE -- A graphical utility for configuring tasks, connections, and kernel objects across multiple address spaces
          • Linker Directives File Editor -- A graphical editor for creating and modifying linker directives files
      – Debugging tools
          • MULTI Debugger (multi) -- A graphical source-level debugger
          • EventAnalyzer -- A graphical viewer for monitoring complex real-time interactions
          • ResourceAnalyzer -- A graphical viewer for monitoring CPU and memory usage
          • Script Debugger -- A graphical debugger for writing, recording, and debugging scripts
          • Serial Terminal -- A serial terminal emulator for connecting to serial ports on embedded devices
      – Miscellaneous and administrative tools

Launcher

MULTI Debugger (I)
  • A powerful graphical debugger that supports source, assembly, and mixed-language debugging.
  • Allows you to perform the following tasks quickly and easily:
      – Browse, view, and search all aspects of your program code
      – Download, execute, control, and debug embedded applications written in C, C++, FORTRAN, assembly, or a combination of these languages
      – View and edit variables, pointers, structures, registers, and memory ranges
      – Create, view, edit, and remove conditional breakpoints
      – View performance profiling, function profiling, memory allocation, code coverage, and stack trace information
      – Interface seamlessly with the MULTI Editor and the MULTI Builder, or with third-party editors and compilers
      – Perform multiprocess debugging through a single JTAG connection, even when those processes are running on multiple processors
      – Perform non-intrusive field debugging of live systems
      – Develop board setup scripts

MULTI Debugger (II)

Main Debugger Window (I)

Main Debugger Window (II)

Introduction to the ARM7 Microprocessor Architecture

ARM7 RISC CPU Architecture
  • Load/Store architecture
  • Large Register Bank
      – Typically thirty-two 32-bit registers
  • Fixed size for all instructions
      – 32 bits long
  • Pipelined execution
  • Single cycle execution
  • Orthogonal Instruction Set
  • Hardwired instruction decode logic

ARM7 32-bit RISC Architecture
  • Von Neumann Enhanced RISC Architecture
  • Three Stage Pipeline
      – Fetch, Decode & Execute
  • Conditional execution of every instruction
  • 32-bit flat address space
      – (4 GB memory map)
  • Most instructions execute in a single cycle.
  • Combined ALU and shifter for high speed bit manipulation

ARM7 RISC Architecture (cont.)
  • Powerful multiple load and store instructions combined with auto-indexing addressing modes
      – Block Copy
      – Stack Manipulation
  • Open instruction set extension via coprocessors

ARM Powered Products: iPod, Gameboy, Toshiba PDA, Samsung Video Recorder, etc.

ARM7 Block Diagram
  • Von Neumann Architecture
  • 3-stage pipeline
      – fetch, decode, execute
  • 32-bit Data Bus
  • 32-bit Address Bus
  • 37 32-bit registers
  • 32-bit ARM instruction set
  • 16-bit THUMB instruction set
  • 32 x 8 Multiplier
  • Barrel Shifter

Pipeline Organization
  • 3-stage pipeline: Fetch - Decode - Execute
  • Three-cycle latency, one instruction per cycle throughput

    instruction
      i       Fetch   Decode  Execute
      i+1             Fetch   Decode  Execute
      i+2                     Fetch   Decode  Execute
      cycle   t       t+1     t+2     t+3     t+4

Pipeline Organization (2)
  • Pipeline is flushed and refilled on a branch, causing execution to slow down
  • Special features in the instruction set eliminate small jumps in code to obtain the best flow through the pipeline
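The latency and throughput figures on the two pipeline slides can be turned into a rough cycle count. The sketch below is an idealised model, not a cycle-accurate one: it assumes every instruction occupies the execute stage for one cycle and that each taken branch costs a full refill of the two stages ahead of execute. The function names are made up for illustration.

```c
#include <assert.h>

/* Depth of the pipeline described above: fetch, decode, execute. */
enum { PIPELINE_STAGES = 3 };

/* Cycles to complete n back-to-back single-cycle instructions with no
 * branches or stalls (idealised): the first instruction needs
 * PIPELINE_STAGES cycles, every later one completes one cycle after
 * its predecessor. */
unsigned ideal_cycles(unsigned n)
{
    assert(n > 0);
    return n + (PIPELINE_STAGES - 1);
}

/* Same stream, but taken_branches of the n instructions are taken
 * branches. Each one flushes the pipeline, so the fetch and decode
 * stages must be refilled (PIPELINE_STAGES - 1 extra cycles). */
unsigned cycles_with_branches(unsigned n, unsigned taken_branches)
{
    assert(taken_branches <= n);
    return ideal_cycles(n) + taken_branches * (PIPELINE_STAGES - 1);
}
```

For the three instructions in the diagram, `ideal_cycles(3)` covers cycles t through t+4. The refill penalty is why conditional execution, which avoids short branches, helps keep the pipeline full.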

Operating Modes
  • Seven operating modes:
      – User
      – Privileged:
          • System
          • Exception modes: FIQ, IRQ, Abort, Undefined, Supervisor

Operating Modes (2)
  User mode:
      – Normal program execution mode
      – System resources unavailable
      – Mode changed by exception or software interrupt (trap instruction)
  Exception modes:
      – Entered upon exception
      – Full access to system resources
      – Mode changed freely

Exceptions

    Exception              Mode        Priority  IV Address
    Reset                  Supervisor  1         0x00000000
    Undefined instruction  Undefined   6         0x00000004
    Software interrupt     Supervisor  6         0x00000008
    Prefetch abort         Abort       5         0x0000000C
    Data abort             Abort       2         0x00000010
    Interrupt              IRQ         4         0x00000018
    Fast interrupt         FIQ         3         0x0000001C

Table 1 - Exception types, sorted by Interrupt Vector addresses

ARM Register Organization
[Figure: general purpose registers as seen in User, FIQ, IRQ, Supervisor, Abort, and Undef modes]

Thumb Code Compression

Thumb Code Example

In C:
    int iabs(int x)
    {
        if (x >= 0)
            return x;
        else
            return -x;
    }

In ARM code (12 bytes):
            CMP    r0, #0
            RSBLT  r0, r0, #0
            MOV    pc, lr

In Thumb code (8 bytes, 67% of the ARM size):
            CMP    r0, #0
            BGE    return
            NEG    r0, r0
    return  MOV    pc, lr

ARM7TM Block Diagram: Thumb Features
  • Thumb addresses code density
  • All Thumb instructions are 16 bits long
  • Thumb may be viewed as a compressed form of a subset of the 32-bit ARM instruction set.
  • Implementations of Thumb use dynamic decompression in an ARM instruction pipeline. This logic translates each 16-bit Thumb instruction into its equivalent 32-bit ARM instruction.
  • Decompression logic was added without compromising cycle time or pipeline latency: the original ARM7 pipeline did very little work in phase one of the decode cycle.
  • Programmer's Model: r0-r7, r13, r15

Thumb Applications
A typical early embedded system, e.g. a mobile phone, will include a small amount of fast 32-bit memory (to store speed-critical DSP code) and 16-bit off-chip memory to store the control code.
      – Thumb code requires 70% of the space of the ARM code
      – Thumb code uses 40% more instructions than ARM code
      – With 32-bit memory, the ARM code is 40% faster than Thumb code
      – With 16-bit memory, the Thumb code is 45% faster than ARM code
      – Thumb code uses 30% less external memory power than ARM code

ARM7 Family: 60-100 MIPS

Code Examples

Example 1

Basic Arithmetic Operations
    ADD  r0, r1, r2    ; r0 := r1 + r2
    ADC  r0, r1, r2    ; r0 := r1 + r2 + C
    SUB  r0, r1, r2    ; r0 := r1 - r2
    SBC  r0, r1, r2    ; r0 := r1 - r2 + C - 1
    RSB  r0, r1, r2    ; r0 := r2 - r1
    RSC  r0, r1, r2    ; r0 := r2 - r1 + C - 1
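The register-transfer comments above map directly onto 32-bit unsigned arithmetic. The following C sketch restates them; the `arm_*` function names are illustrative only, and `c` stands for the carry flag (0 or 1). Flag updates are not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Each function returns what the instruction would write to r0, given
 * r1 (rn), r2 (op2) and, where used, the carry flag c. Arithmetic wraps
 * modulo 2^32, as it does in the 32-bit registers. */
uint32_t arm_add(uint32_t rn, uint32_t op2) { return rn + op2; }
uint32_t arm_sub(uint32_t rn, uint32_t op2) { return rn - op2; }
uint32_t arm_rsb(uint32_t rn, uint32_t op2) { return op2 - rn; }

uint32_t arm_adc(uint32_t rn, uint32_t op2, uint32_t c)
{
    assert(c <= 1);
    return rn + op2 + c;
}

/* With C = 1 (no borrow pending), SBC behaves exactly like SUB. */
uint32_t arm_sbc(uint32_t rn, uint32_t op2, uint32_t c)
{
    assert(c <= 1);
    return rn - op2 + c - 1;
}

uint32_t arm_rsc(uint32_t rn, uint32_t op2, uint32_t c)
{
    assert(c <= 1);
    return op2 - rn + c - 1;
}
```

The `+ C - 1` term reflects that ARM uses the carry flag as an inverted borrow: a clear carry subtracts one extra.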

Extended Precision
  • E.g. add two 64-bit numbers X and Y and store the result in Z
    Store X in r1:r0, Y in r3:r2, and Z in r5:r4

    ADDS r4, r0, r2    ; add least sig. word, result in r4
    ADC  r5, r1, r3    ; add most sig. word, result in r5
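The same two-step addition can be written in C with explicit 32-bit halves. This is a sketch for illustration: the `u64_pair` type and `add64` name are assumptions, and the carry is derived from unsigned wrap-around rather than read from a flag register.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit value held as two 32-bit words, like the r1:r0 pairing above. */
typedef struct { uint32_t lo, hi; } u64_pair;

u64_pair add64(u64_pair x, u64_pair y)
{
    u64_pair z;

    z.lo = x.lo + y.lo;                     /* ADDS r4, r0, r2 */
    uint32_t carry = (z.lo < x.lo) ? 1 : 0; /* C flag: low-word sum wrapped */
    z.hi = x.hi + y.hi + carry;             /* ADC  r5, r1, r3 */

    /* Cross-check against native 64-bit arithmetic. */
    uint64_t expect = (((uint64_t)x.hi << 32) | x.lo)
                    + (((uint64_t)y.hi << 32) | y.lo);
    assert((((uint64_t)z.hi << 32) | z.lo) == expect);
    return z;
}
```

The S suffix on the first ADD matters: without it the carry flag would not be updated and ADC would add a stale carry.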

Operations with Shifts
    ADD r3, r2, r1, LSL #3    ; r3 := r2 + 8 x r1
    ADD r5, r3, LSL r2
    ; Types of shift: LSR, LSL, ASR, ROR, RRX
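For readers more familiar with C, the five shift types can be sketched as below. This is a simplified model under stated assumptions: shift amounts are restricted to 0..31 (the real barrel shifter also defines behaviour for larger register-specified amounts), and the carry-out of the shifter is not modelled. The function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* LSL: logical shift left, zeros enter from the right. */
uint32_t lsl(uint32_t x, unsigned n) { assert(n < 32); return x << n; }

/* LSR: logical shift right, zeros enter from the left. */
uint32_t lsr(uint32_t x, unsigned n) { assert(n < 32); return x >> n; }

/* ASR: arithmetic shift right, the sign bit is replicated. */
uint32_t asr(uint32_t x, unsigned n)
{
    assert(n < 32);
    uint32_t r = x >> n;
    if ((x & UINT32_C(0x80000000)) && n > 0)
        r |= (uint32_t)~(UINT32_C(0xFFFFFFFF) >> n); /* fill vacated bits */
    return r;
}

/* ROR: rotate right, bits leaving on the right re-enter on the left. */
uint32_t ror(uint32_t x, unsigned n)
{
    assert(n < 32);
    return n ? (x >> n) | (x << (32 - n)) : x;
}

/* RRX: rotate right by one bit through the carry flag. */
uint32_t rrx(uint32_t x, uint32_t carry_in)
{
    assert(carry_in <= 1);
    return (x >> 1) | (carry_in << 31);
}

/* ADD r3, r2, r1, LSL #3 expressed with the helpers above. */
uint32_t add_lsl(uint32_t r2, uint32_t r1, unsigned n) { return r2 + lsl(r1, n); }
```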

ARM Instructions I

Instruction Set
  • Two instruction sets:
      – ARM
          • Standard 32-bit instruction set
      – THUMB
          • 16-bit compressed form
          • Code density better than most CISC
          • Dynamic decompression in pipeline

ARM Instruction Set
  • Features:
      – Load / Store architecture
      – 3-address data processing instructions
      – Conditional execution
      – Load / Store multiple registers
      – Shift & ALU operation in single clock cycle

ARM Instruction Set (2)
  • Conditional execution:
      – Each data processing instruction is prefixed by a condition code
      – Result: smooth flow of instructions through the pipeline
      – 16 condition codes:

        EQ  equal                         NE  not equal
        CS  unsigned higher or same       CC  unsigned lower
        MI  negative                      PL  positive or zero
        VS  overflow                      VC  no overflow
        HI  unsigned higher               LS  unsigned lower or same
        GE  signed greater than or equal  LT  signed less than
        GT  signed greater than           LE  signed less than or equal
        AL  always                        NV  special purpose
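Each condition code is a test on the N, Z, C, and V flags. The C sketch below shows one way to express those tests; the type and function names are invented for this example, `cmp_flags` models the flags a `CMP a, b` would produce, and treating NV as "never executes" is an assumption made here because the slide only calls it special purpose.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool n, z, c, v; } flags_t;

/* Listed in encoding order, 0 (EQ) to 15 (NV). */
typedef enum {
    COND_EQ, COND_NE, COND_CS, COND_CC, COND_MI, COND_PL, COND_VS, COND_VC,
    COND_HI, COND_LS, COND_GE, COND_LT, COND_GT, COND_LE, COND_AL, COND_NV
} cond_t;

/* Flags produced by CMP a, b, i.e. by computing a - b. */
flags_t cmp_flags(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;
    flags_t f;
    f.n = (r >> 31) != 0;                        /* result negative        */
    f.z = (r == 0);                              /* result zero            */
    f.c = (a >= b);                              /* no borrow occurred     */
    f.v = (((a ^ b) & (a ^ r)) >> 31) != 0;      /* signed overflow        */
    return f;
}

/* True if an instruction carrying this condition would execute. */
bool cond_passed(cond_t cond, flags_t f)
{
    assert(cond >= COND_EQ && cond <= COND_NV);
    switch (cond) {
    case COND_EQ: return f.z;
    case COND_NE: return !f.z;
    case COND_CS: return f.c;
    case COND_CC: return !f.c;
    case COND_MI: return f.n;
    case COND_PL: return !f.n;
    case COND_VS: return f.v;
    case COND_VC: return !f.v;
    case COND_HI: return f.c && !f.z;
    case COND_LS: return !f.c || f.z;
    case COND_GE: return f.n == f.v;
    case COND_LT: return f.n != f.v;
    case COND_GT: return !f.z && (f.n == f.v);
    case COND_LE: return f.z || (f.n != f.v);
    case COND_AL: return true;
    default:      return false;  /* COND_NV: assumed never to execute */
    }
}
```

Comparing 0x80000000 with 1 shows the signed/unsigned split: HI passes (unsigned higher) while LT also passes (signed less than).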

ARM Instruction Set (3)
ARM instruction set:
  • Data processing instructions
  • Data transfer instructions
  • Block transfer instructions
  • Branching instructions
  • Multiply instructions
  • Software interrupt instructions

Data Processing Instructions
  • Arithmetic and logical operations
  • 3-address format:
      – Two 32-bit operands (op1 is a register, op2 is a register or an immediate)
      – 32-bit result placed in a register
  • Barrel shifter for operand 2 allows a full 32-bit shift within the instruction cycle

Data Processing Instructions (2)
  • Arithmetic operations:
      – ADD, ADC, SUB, SBC, RSB, RSC
  • Bit-wise logical operations:
      – AND, EOR, ORR, BIC
  • Register movement operations:
      – MOV, MVN
  • Comparison operations:
      – TST, TEQ, CMP, CMN

Data Processing Instructions
e.g.: if (Z == 1) R1 = R2 + (R3 * 4)
compiles to
    ADDEQS R1, R2, R3, LSL #2
(SINGLE INSTRUCTION!)

Data Transfer Instructions
  • Load/store instructions
  • Used to move signed and unsigned Word, Half Word and Byte values to and from registers
  • Can be used to load the PC (if the target address is beyond the branch instruction range)

    LDR    Load Word                STR   Store Word
    LDRH   Load Half Word           STRH  Store Half Word
    LDRSH  Load Signed Half Word
    LDRB   Load Byte                STRB  Store Byte
    LDRSB  Load Signed Byte

  • Stores have no signed forms: STRH and STRB write the low half word or byte of the register, so signed and unsigned values are stored the same way.
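The difference between the plain and signed loads is only in how the upper bits of the destination register are filled. A minimal C sketch, assuming a little-endian byte-addressed memory (ARM7 cores can be configured either way) and using illustrative function names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* LDRB: byte is zero-extended to 32 bits. */
uint32_t ldrb(const uint8_t *mem, size_t addr) { return mem[addr]; }

/* LDRSB: byte is sign-extended to 32 bits. */
int32_t ldrsb(const uint8_t *mem, size_t addr)
{
    uint32_t v = mem[addr];
    return (v & 0x80u) ? (int32_t)v - 0x100 : (int32_t)v;
}

/* LDRH: half word is zero-extended (little-endian layout assumed). */
uint32_t ldrh(const uint8_t *mem, size_t addr)
{
    assert(addr % 2 == 0);  /* half-word accesses are aligned */
    return (uint32_t)mem[addr] | ((uint32_t)mem[addr + 1] << 8);
}

/* LDRSH: half word is sign-extended. */
int32_t ldrsh(const uint8_t *mem, size_t addr)
{
    uint32_t v = ldrh(mem, addr);
    return (v & 0x8000u) ? (int32_t)v - 0x10000 : (int32_t)v;
}

/* STRB: only the low byte of the register is written. */
void strb(uint8_t *mem, size_t addr, uint32_t value)
{
    mem[addr] = (uint8_t)(value & 0xFFu);
}
```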

Block Transfer Instructions
  • Load / Store Multiple instructions (LDM / STM)
  • Whole register bank or a subset copied to memory or restored with a single instruction
[Figure: registers R0 to R15 alongside memory words Mi to Mi+15; LDM copies memory into the registers, STM copies the registers to memory]

ARM Addressing Modes

Addressing Modes
  • Immediate addressing
      – The desired value is a binary value in the instruction
  • Absolute addressing
      – The instruction contains the full binary address
  • Indirect addressing
      – The instruction contains the binary address of a memory location containing the binary address
  • Base relative addressing
      – Plus offset
      – Plus index
      – Plus scaled index
  • Stack addressing

Immediate Addressing
  • Used to load an immediate 8-bit value into a register
        e.g. mov r0, #0xFF
  • Used to control the operation of the barrel shifter on the 3rd operand
        e.g. add r3, r2, r1, LSL #3    ; r3 := r2 + 8 x r1
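As background that the slide does not spell out: in ARM data-processing instructions the 8-bit immediate is combined with a 4-bit rotate field, so the operand is the 8-bit value rotated right by an even number of bits. The sketch below checks whether a 32-bit constant fits that form. The function name and output parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Rotate left by n bits (n taken modulo 32). */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x << n) | (x >> (32u - n)) : x;
}

/* Returns true if value can be written as imm8 rotated right by 2*rot,
 * with imm8 in 0..255 and rot in 0..15, and reports the first such pair. */
bool arm_imm_encodable(uint32_t value, unsigned *imm8, unsigned *rot)
{
    assert(imm8 != 0 && rot != 0);
    for (unsigned r = 0; r < 16; r++) {
        /* Rotating left by 2*r undoes a rotate right by 2*r. */
        uint32_t candidate = rotl32(value, 2u * r);
        if (candidate <= 0xFFu) {
            *imm8 = (unsigned)candidate;
            *rot = r;
            return true;
        }
    }
    return false;
}
```

Constants that fail this test are typically loaded from memory instead, for example with the `ldr r1, =value` form used on the following slides.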

Absolute Addressing
  • To load an absolute address into a register
    example:
    start:    ldr   r1, =address
              ldr   r0, [r1]
    address:  .word 0x15000000

Indirect Addressing
    ldr r0, [r1]    ; r0 := mem32[r1]
    str r0, [r1]    ; mem32[r1] := r0

Base Plus Offset Addressing
    ldr r0, [r1, #4]     ; r0 := mem32[r1+4]; r1 is not altered
Another form is
    ldr r0, [r1, #4]!    ; r0 := mem32[r1+4]; r1 := r1+4   (! == update)
And another
    ldr r0, [r1], #4     ; r0 := mem32[r1]; r1 := r1+4
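The three forms differ only in which address is used for the access and whether the base register is written back. A small C model, assuming word-aligned byte addresses into an array of 32-bit words; the helper names are illustrative:

```c
#include <assert.h>
#include <stdint.h>

/* Read the word at a byte address from a word array. */
static uint32_t mem32(const uint32_t *mem, uint32_t addr)
{
    assert(addr % 4 == 0);  /* word accesses are aligned in this model */
    return mem[addr / 4];
}

/* ldr r0, [r1, #off] : offset addressing, r1 unchanged. */
uint32_t ldr_offset(const uint32_t *mem, uint32_t r1, int32_t off)
{
    return mem32(mem, r1 + (uint32_t)off);
}

/* ldr r0, [r1, #off]! : pre-indexed, r1 updated before the access. */
uint32_t ldr_preindex(const uint32_t *mem, uint32_t *r1, int32_t off)
{
    *r1 += (uint32_t)off;
    return mem32(mem, *r1);
}

/* ldr r0, [r1], #off : post-indexed, r1 updated after the access. */
uint32_t ldr_postindex(const uint32_t *mem, uint32_t *r1, int32_t off)
{
    uint32_t value = mem32(mem, *r1);
    *r1 += (uint32_t)off;
    return value;
}
```

Post-indexing is the natural fit for walking an array: each load returns the current element and leaves the base pointing at the next one.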

Base Plus Index Addressing
    ldr r1, =base        ; load r1 with base address
    ldr r2, =index       ; load r2 with an index
    ldr r0, [r1, r2]     ; get data record into r0

Base Plus Scaled Index Addressing
    ldr r1, =base                ; load r1 with base address
    ldr r2, =index               ; load r2 with an index
    ldr r0, [r1, r2, LSL #2]     ; r0 := mem32[r1 + 4*r2]

Direct functionality of Block Data Transfer
  • When LDM / STM are not being used to implement stacks, it is clearer to specify exactly what the functionality of the instruction is:
      – i.e. specify whether to increment / decrement the base pointer, before or after the memory access.
  • In order to do this, LDM / STM support a further syntax in addition to the stack one (C pointer analogy for int *p on the right):
      – STMIA / LDMIA : Increment After     p++
      – STMIB / LDMIB : Increment Before    ++p
      – STMDA / LDMDA : Decrement After     p--
      – STMDB / LDMDB : Decrement Before    --p
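The four suffixes can be modelled by computing the lowest address touched and the written-back base. The sketch below does this for a store; it assumes word-aligned byte addresses into a word array, and relies on the rule that the lowest-numbered register always goes to the lowest address regardless of direction. Names such as `stm` and `ldm_mode` are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { MODE_IA, MODE_IB, MODE_DA, MODE_DB } ldm_mode;

/* Store regs[0..n-1] (lowest-numbered register first) starting from the
 * byte address base, and return the base value after writeback ("!"). */
uint32_t stm(ldm_mode mode, uint32_t *mem, uint32_t base,
             const uint32_t *regs, unsigned n)
{
    assert(base % 4 == 0 && n > 0);
    uint32_t bytes = 4u * n;
    uint32_t lowest = base;          /* MODE_IA: base, base+4, ...      */

    if (mode == MODE_IB)
        lowest = base + 4u;          /* base+4 ... base+4n              */
    else if (mode == MODE_DA)
        lowest = base - bytes + 4u;  /* base-4(n-1) ... base            */
    else if (mode == MODE_DB)
        lowest = base - bytes;       /* base-4n ... base-4              */

    for (unsigned i = 0; i < n; i++)
        mem[lowest / 4u + i] = regs[i];

    /* Increment modes move the base up, decrement modes move it down. */
    return (mode == MODE_IA || mode == MODE_IB) ? base + bytes : base - bytes;
}
```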

Example: Block Copy
  – Copy a block of memory, which is an exact multiple of 12 words long, from the location pointed to by r12 to the location pointed to by r13. r14 points to the end of the block to be copied.

        ; r12 points to the start of the source data
        ; r14 points to the end of the source data
        ; r13 points to the start of the destination data
loop    LDMIA  r12!, {r0-r11}   ; load 48 bytes
        STMIA  r13!, {r0-r11}   ; and store them
        CMP    r12, r14         ; check for the end
        BNE    loop             ; and loop until done

  – This loop transfers 48 bytes in 31 cycles
  – Over 50 Mbytes/sec at 33 MHz
[Figure: memory map showing r12, r13, and r14 against increasing memory addresses]
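A C rendering of the same loop, together with the arithmetic behind the throughput figure, might look like the sketch below. The function names are illustrative, and the local `regs` array merely stands in for r0-r11.

```c
#include <assert.h>
#include <stdint.h>

enum { WORDS_PER_PASS = 12 };  /* r0-r11 */

/* Copy [src, src_end) to dst, 12 words per pass, mirroring the
 * LDMIA/STMIA loop. Like the assembly, the test is at the bottom of the
 * loop, so the block must be non-empty and a multiple of 12 words. */
void block_copy(uint32_t *dst, const uint32_t *src, const uint32_t *src_end)
{
    assert(src_end > src);
    assert((src_end - src) % WORDS_PER_PASS == 0);
    do {
        uint32_t regs[WORDS_PER_PASS];
        for (int i = 0; i < WORDS_PER_PASS; i++)
            regs[i] = *src++;            /* LDMIA r12!, {r0-r11} */
        for (int i = 0; i < WORDS_PER_PASS; i++)
            *dst++ = regs[i];            /* STMIA r13!, {r0-r11} */
    } while (src != src_end);            /* CMP r12, r14 / BNE loop */
}

/* Bytes per second for a loop moving bytes_per_pass in cycles_per_pass. */
double throughput(double bytes_per_pass, double cycles_per_pass, double clock_hz)
{
    return bytes_per_pass / cycles_per_pass * clock_hz;
}
```

With the slide's numbers, 48 bytes per 31 cycles at 33 MHz works out to roughly 51 Mbytes/sec, consistent with the "over 50" claim.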

Stacks
  • A stack is an area of memory which grows as new data is “pushed” onto the “top” of it, and shrinks as data is “popped” off the top.
  • Two pointers define the current limits of the stack.
      – A base pointer
          • used to point to the “bottom” of the stack (the first location).
      – A stack pointer
          • used to point to the current “top” of the stack.
[Figure: PUSH {1, 2, 3} leaves 1 at BASE and 3 at the top of the stack (SP); a POP then returns 3 and moves SP down to 2. Result of pop = 3]

Stack Operation
  • Traditionally, a stack grows down in memory, with the last “pushed” value at the lowest address. The ARM also supports ascending stacks, where the stack structure grows up through memory.
  • The value of the stack pointer can either:
      – Point to the last occupied address (Full stack)
          • and so needs pre-decrementing (i.e. before the push)
      – Point to the next unoccupied address (Empty stack)
          • and so needs post-decrementing (i.e. after the push)
  • The stack type to be used is given by the postfix to the instruction:
      – STMFD / LDMFD : Full Descending stack
      – STMFA / LDMFA : Full Ascending stack
      – STMED / LDMED : Empty Descending stack
      – STMEA / LDMEA : Empty Ascending stack

Stack Examples
    STMED sp!, {r0, r1, r3-r5}
    STMFD sp!, {r0, r1, r3-r5}
    STMEA sp!, {r0, r1, r3-r5}
    STMFA sp!, {r0, r1, r3-r5}
[Figure: memory between 0x3e8 and 0x418 after each of the four stores, showing where r0, r1, r3, r4, and r5 land relative to the old SP and the new SP]
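The layouts in the figure can be reproduced numerically. The sketch below computes, for each stack type, the lowest and highest word address written and the new SP after pushing a given number of registers. It relies on the standard equivalences STMFD = STMDB, STMED = STMDA, STMFA = STMIB, and STMEA = STMIA. The type names are invented, and reading the figure as having the old SP at 0x400 is an assumption.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { STACK_FD, STACK_ED, STACK_FA, STACK_EA } stack_kind;

typedef struct {
    uint32_t lowest;   /* address of the lowest-numbered register  */
    uint32_t highest;  /* address of the highest-numbered register */
    uint32_t new_sp;   /* SP after writeback                       */
} push_result;

/* Model STMxx sp!, {...} for nregs registers. */
push_result stm_push(stack_kind kind, uint32_t sp, unsigned nregs)
{
    assert(sp % 4 == 0 && nregs > 0);
    uint32_t bytes = 4u * nregs;
    push_result r;

    switch (kind) {
    case STACK_FD:  /* decrement before */
        r.lowest = sp - bytes;      r.new_sp = sp - bytes; break;
    case STACK_ED:  /* decrement after  */
        r.lowest = sp - bytes + 4u; r.new_sp = sp - bytes; break;
    case STACK_FA:  /* increment before */
        r.lowest = sp + 4u;         r.new_sp = sp + bytes; break;
    default:        /* STACK_EA: increment after */
        r.lowest = sp;              r.new_sp = sp + bytes; break;
    }
    r.highest = r.lowest + bytes - 4u;
    return r;
}
```

For five registers and an assumed old SP of 0x400, every result stays inside the 0x3e8 to 0x418 window shown in the figure. In the full variants the new SP points at stored data, while in the empty variants it points one word past it.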

Stacks and Subroutines
  • One use of stacks is to create temporary register workspace for subroutines.
  • Any registers that are needed can be pushed onto the stack at the start of the subroutine and popped off again at the end, so as to restore them before returning to the caller:

        STMFD sp!, {r0-r12, lr}   ; stack all registers
                                  ; and the return address
        ....
        LDMFD sp!, {r0-r12, pc}   ; load all the registers
                                  ; and return automatically