ARM 1: Introduction to ARM Processor Architecture and Registers
Introduction to ARM Processor: Architecture, Registers, Pipeline, Interrupts, Architecture Revisions, ARM Instructions, LPC2148 Architecture, GPIO, Instruction Format, Processor Modes, Interrupts vs. Exceptions
What is ARM?
• Advanced RISC Machine (originally Acorn RISC Machine)
• First RISC microprocessor developed for commercial use
• Market leader for low-power and cost-sensitive embedded applications
ARM Powered Products [figure]
Features
• Architectural simplicity
• Very small implementations
• Very low power consumption
The History of ARM
• Developed at Acorn Computers Limited, Cambridge, England, between 1983 and 1985
• Problems with CISC designs of the time:
  – Processors slower than the available memory parts
  – Many clock cycles per instruction
Architecture Revisions [figure]
The History of ARM (2)
• Solution: the Berkeley RISC I
  – Competitive
  – Easy to develop (less than a year)
  – Cheap
  – Pointing the way to the future
ARM Architecture
• Typical RISC architecture:
  – Large uniform register file
  – Load/store architecture
  – Simple addressing modes
  – Uniform and fixed-length instruction fields
ARM Architecture (2)
• Enhancements:
  – Each instruction controls the ALU and the shifter
  – Auto-increment and auto-decrement addressing modes
  – Multiple load/store
  – Conditional execution
ARM Architecture (3)
• Results:
  – High performance
  – Low code size
  – Low power consumption
  – Low silicon area
Pin Diagram [figure]
ARM Architecture [figure]
LPC2148 Chip Features
• 16-bit/32-bit ARM7TDMI-S microcontroller in a tiny LQFP64 package.
• 8 kB to 40 kB of on-chip static RAM and 32 kB to 512 kB of on-chip flash memory (ranges across the LPC214x family; the LPC2148 itself has 32 kB + 8 kB USB RAM and 512 kB of flash).
• 128-bit wide interface/accelerator enables high-speed 60 MHz operation.
• In-System Programming/In-Application Programming (ISP/IAP) via on-chip boot loader software. Single flash sector or full chip erase in 400 ms and programming of 256 bytes in 1 ms.
• EmbeddedICE RT and Embedded Trace interfaces offer real-time debugging with the on-chip RealMonitor software and high-speed tracing of instruction execution.
• USB 2.0 Full-speed compliant device controller with 2 kB of endpoint RAM.
• Single 10-bit DAC provides variable analog output (LPC2142/44/46/48 only).
• Two 32-bit timers/external event counters (with four capture and four compare channels each), PWM unit (six outputs) and watchdog.
• Low-power Real-Time Clock (RTC) with independent power and 32 kHz clock input.
• Multiple serial interfaces including two UARTs (16C550), two Fast I2C-buses (400 kbit/s), SPI and SSP with buffering and variable data length capabilities.
ARM7TDMI Core
• Low-end ARM core for applications such as digital mobile phones
• TDMI:
  – T: Thumb, 16-bit instruction set
  – D: on-chip Debug support, enabling the processor to halt in response to a debug request
  – M: enhanced Multiplier, yielding a full 64-bit result; high performance
  – I: EmbeddedICE hardware
• Von Neumann architecture
• 3-stage pipeline
Nomenclature: ARM{x}{y}{z}{T}{D}{M}{I}{E}{J}{F}{-S}
• x – ARM processor family (series)
• y – memory management unit / memory protection unit
• z – cache memory
• T – Thumb: support for the 16-bit instruction set
• D – Debugger support
• M – Fast multiplier
• I – EmbeddedICE (In-Circuit Emulator) logic
• E – Enhanced DSP instructions
• J – Jazelle: execution of Java byte code, supporting Java programs
• F – Vector floating-point (VFP) coprocessor
• S – Synthesizable version: the core is supplied as source (RTL) that can be synthesized with suitable EDA tools for a target process
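The naming scheme above can be read mechanically. The following is a minimal sketch in C, assuming a name of the form "ARM", digits, optional feature letters and an optional "-S"; the type arm_core_name and the function decode_arm_core_name are hypothetical helpers for illustration only.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Feature flags read literally from the letters of an ARM core name
   (hypothetical helper type for illustration). */
typedef struct {
    int  family;         /* x: first digit after "ARM" */
    bool thumb;          /* T */
    bool debug;          /* D */
    bool fast_multiply;  /* M */
    bool embedded_ice;   /* I */
    bool dsp_ext;        /* E: enhanced DSP instructions */
    bool jazelle;        /* J */
    bool vfp;            /* F */
    bool synthesizable;  /* -S */
} arm_core_name;

/* Returns false if the name does not start with "ARM" followed by a digit. */
bool decode_arm_core_name(const char *name, arm_core_name *out)
{
    memset(out, 0, sizeof *out);
    if (strncmp(name, "ARM", 3) != 0 || !isdigit((unsigned char)name[3]))
        return false;

    const char *p = name + 3;
    out->family = *p - '0';
    while (isdigit((unsigned char)*p))   /* skip the x, y, z digits */
        p++;

    for (; *p != '\0' && *p != '-'; p++) {
        switch (*p) {
        case 'T': out->thumb = true;         break;
        case 'D': out->debug = true;         break;
        case 'M': out->fast_multiply = true; break;
        case 'I': out->embedded_ice = true;  break;
        case 'E': out->dsp_ext = true;       break;
        case 'J': out->jazelle = true;       break;
        case 'F': out->vfp = true;           break;
        default:                             break; /* other letters ignored */
        }
    }
    out->synthesizable = (strcmp(p, "-S") == 0);
    return true;
}
```

For example, "ARM7TDMI-S" decodes to family 7 with T, D, M, I and -S set. Because the sketch reads letters literally, "ARM926EJ-S" reports no Thumb support even though that core does implement Thumb; later cores often leave such letters implied.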
CPU + memory [figure: the PC holds address 200; the CPU fetches the instruction ADD r5, r1, r3 stored at memory address 200 into the instruction register (IR)] © 2008 Wayne Wolf, Overheads for Computers as Components, 2nd ed.
Harvard architecture [figure: the CPU, driven by the PC, uses separate address and data buses for program memory and data memory] © 2008 Wayne Wolf, Overheads for Computers as Components, 2nd ed.
ARM Architecture [figure]
Data Sizes and Instruction Sets
• The ARM is a 32-bit architecture.
• When used in relation to the ARM:
  – Byte means 8 bits
  – Halfword means 16 bits (two bytes)
  – Word means 32 bits (four bytes)
• Most ARMs implement two instruction sets:
  – 32-bit ARM instruction set
  – 16-bit Thumb instruction set
• Jazelle cores can also execute Java byte code
ARM Registers
• 31 general-purpose 32-bit registers
• 16 visible at any one time, R0–R15
• The others speed up exception processing
Registers
• Only 16 registers are visible in a given mode. A mode can access:
  – a particular set of r0–r12
  – r13 (sp, stack pointer)
  – r14 (lr, link register)
  – r15 (pc, program counter)
  – the Current Program Status Register (cpsr)
• The uses of r0–r13 are orthogonal
ARM Registers (4)
Register | User & System | FIQ            | Supervisor | Abort    | IRQ      | Undefined
R0–R7    | R0–R7         | R0–R7          | R0–R7      | R0–R7    | R0–R7    | R0–R7
R8–R12   | R8–R12        | R8_fiq–R12_fiq | R8–R12     | R8–R12   | R8–R12   | R8–R12
R13 (SP) | R13           | R13_fiq        | R13_svc    | R13_abt  | R13_irq  | R13_und
R14 (LR) | R14           | R14_fiq        | R14_svc    | R14_abt  | R14_irq  | R14_und
R15 (PC) | R15           | R15            | R15        | R15      | R15      | R15
CPSR     | CPSR          | CPSR           | CPSR       | CPSR     | CPSR     | CPSR
SPSR     | (none)        | SPSR_fiq       | SPSR_svc   | SPSR_abt | SPSR_irq | SPSR_und
FIQ mode banks R8–R14; the other exception modes bank only R13 and R14. This gives 16 + 7 + 4 × 2 = 31 general-purpose registers.
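The banking rule in the register table can be expressed as a lookup from (mode, visible register) to one of the 31 physical registers. This is a minimal sketch; the enum arm_mode, the function physical_reg and the physical numbering 0–30 are hypothetical choices for illustration, not how any real core numbers its register file.

```c
/* Processor modes (names only; CPSR encodings are not needed here). */
typedef enum { MODE_USR, MODE_SYS, MODE_FIQ, MODE_SVC, MODE_ABT, MODE_IRQ, MODE_UND } arm_mode;

/* Map a visible register number (0-15) in a given mode onto one of the
   31 physical general-purpose registers, numbered (hypothetically) as:
     0-15   R0-R15 as seen in User/System mode
     16-22  R8_fiq-R14_fiq
     23-24  R13_svc, R14_svc     25-26  R13_abt, R14_abt
     27-28  R13_irq, R14_irq     29-30  R13_und, R14_und
   Returns -1 for an invalid register number. */
int physical_reg(arm_mode mode, int r)
{
    if (r < 0 || r > 15)
        return -1;
    if (r <= 7 || r == 15)          /* R0-R7 and the PC are never banked */
        return r;
    if (mode == MODE_FIQ)           /* FIQ banks R8-R14 */
        return 16 + (r - 8);
    if (r <= 12)                    /* R8-R12 shared by all other modes */
        return r;
    switch (mode) {                 /* SP and LR banked per exception mode */
    case MODE_SVC: return 23 + (r - 13);
    case MODE_ABT: return 25 + (r - 13);
    case MODE_IRQ: return 27 + (r - 13);
    case MODE_UND: return 29 + (r - 13);
    default:       return r;        /* User and System share registers */
    }
}
```

Because FIQ has private copies of R8–R12, a fast interrupt handler can use them without saving the interrupted code's values first, which is one reason FIQ latency is lower than IRQ latency.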
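Program Counter
• Holds the address of the instruction being fetched; because of the 3-stage pipeline, reading the PC in ARM state gives the address of the current instruction + 8
• In ARM state all instructions are 32 bits wide and word-aligned
• Thus bits [1:0] of the PC are always zero in ARM state (Thumb instructions are 16 bits wide and halfword-aligned, so only bit 0 is zero)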
Program Status Register (CPSR)
• Bits 31–28: condition flags N (negative), Z (zero), C (carry/borrow), V (overflow)
• Bit 7: I, IRQ disable
• Bit 6: F, FIQ disable
• Bit 5: T, Thumb state
• Bits 4–0: mode bits
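The CPSR fields listed above can be decoded with simple masks. This sketch assumes the ARMv4T layout used by the ARM7TDMI; the macro names and helper functions are hypothetical, while the mode encodings (0x10 User, 0x11 FIQ, 0x12 IRQ, 0x13 Supervisor, 0x17 Abort, 0x1B Undefined, 0x1F System) are the architectural values.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* CPSR bit positions (ARMv4T, e.g. ARM7TDMI). */
#define CPSR_N (1u << 31)  /* negative */
#define CPSR_Z (1u << 30)  /* zero */
#define CPSR_C (1u << 29)  /* carry / not-borrow */
#define CPSR_V (1u << 28)  /* overflow */
#define CPSR_I (1u << 7)   /* IRQ disable */
#define CPSR_F (1u << 6)   /* FIQ disable */
#define CPSR_T (1u << 5)   /* Thumb state */
#define CPSR_MODE_MASK 0x1Fu

/* Name of the mode encoded in CPSR[4:0], or NULL if the encoding is invalid. */
const char *cpsr_mode_name(uint32_t cpsr)
{
    switch (cpsr & CPSR_MODE_MASK) {
    case 0x10: return "User";
    case 0x11: return "FIQ";
    case 0x12: return "IRQ";
    case 0x13: return "Supervisor";
    case 0x17: return "Abort";
    case 0x1B: return "Undefined";
    case 0x1F: return "System";
    default:   return NULL;
    }
}

/* IRQs are taken only when the I bit is clear. */
bool cpsr_irq_enabled(uint32_t cpsr) { return (cpsr & CPSR_I) == 0; }

/* The T bit selects the Thumb instruction set. */
bool cpsr_in_thumb(uint32_t cpsr)    { return (cpsr & CPSR_T) != 0; }
```

For example, the value 0xD3 (I and F set, mode 0x13) describes the state after reset: Supervisor mode, ARM state, both interrupt types disabled.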
Pipeline Organization
• Increases speed: most instructions complete at a rate of one per cycle
• Versions:
  – 3-stage (ARM7TDMI and earlier)
  – 5-stage (ARM8, ARM9TDMI)
  – 6-stage (ARM10TDMI)
Pipeline Organization (2)
• 3-stage pipeline: Fetch – Decode – Execute
• Three-cycle latency, one instruction per cycle throughput

Cycle         t        t+1      t+2      t+3      t+4
i             Fetch    Decode   Execute
i+1                    Fetch    Decode   Execute
i+2                             Fetch    Decode   Execute
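The diagram above follows a simple rule: in an ideal pipeline with no stalls, instruction i enters stage s in cycle i + s, so n instructions on a k-stage pipeline finish after k + n − 1 cycles. A minimal sketch of that arithmetic, with hypothetical function names and stalls ignored:

```c
/* Ideal k-stage pipeline with no stalls: instruction i (0-based) is in
   stage s (0-based) during cycle i + s, so n instructions need k + n - 1 cycles. */
unsigned pipeline_cycles(unsigned stages, unsigned n_instructions)
{
    return n_instructions == 0 ? 0 : stages + n_instructions - 1;
}

/* Stage (0 = Fetch, 1 = Decode, 2 = Execute for a 3-stage pipeline) occupied
   by instruction i in cycle c, or -1 if the instruction is not in the pipeline. */
int pipeline_stage(unsigned stages, unsigned i, unsigned c)
{
    if (c < i || c - i >= stages)
        return -1;
    return (int)(c - i);
}
```

With three stages and three instructions this gives 5 cycles (t to t+4), matching the diagram; for 1000 instructions it gives 1002 cycles, which is why the throughput approaches one instruction per cycle.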
Pipeline Organization (3)
• 5-stage pipeline:
  – Reduces work per cycle, allowing a higher clock frequency
  – Separates data and instruction memory, reducing CPI (average number of clock cycles per instruction)
• Stages: Fetch, Decode, Execute, Buffer/Data, Write-back
Operating Modes
• Seven operating modes:
  – User
  – Privileged:
    • System (architecture version 4 and above)
    • Exception modes: FIQ, IRQ, Abort, Undefined, Supervisor
Processor Modes
The ARM has seven basic operating modes:
• User: unprivileged mode under which most tasks run
• FIQ: entered when a high-priority (fast) interrupt is raised
• IRQ: entered when a low-priority (normal) interrupt is raised
• Supervisor: entered on reset and when a Software Interrupt instruction is executed
• Abort: used to handle memory access violations
• Undefined: used to handle undefined instructions
• System: privileged mode using the same registers as User mode
Operating Modes (2)
• User mode:
  – Normal program execution mode
  – System resources unavailable
  – Mode changed by exception only
• Exception modes:
  – Entered upon exception
  – Full access to system resources
  – Mode changed freely
Exceptions
Exception             | Mode       | Priority | Vector address
Reset                 | Supervisor | 1        | 0x00000000
Undefined instruction | Undefined  | 6        | 0x00000004
Software interrupt    | Supervisor | 6        | 0x00000008
Prefetch Abort        | Abort      | 5        | 0x0000000C
Data Abort            | Abort      | 2        | 0x00000010
(reserved)            |            |          | 0x00000014
Interrupt (IRQ)       | IRQ        | 4        | 0x00000018
Fast interrupt (FIQ)  | FIQ        | 3        | 0x0000001C
Table 1 - Exception types, sorted by interrupt vector address (priority 1 is the highest)
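Table 1 can be turned into data plus a priority rule. This minimal sketch, using hypothetical names, stores each exception's vector address and priority and picks the pending exception with the highest priority (lowest number). It deliberately ignores the I and F masking bits, which on a real core decide whether a pending IRQ or FIQ is taken at all.

```c
#include <stdint.h>

typedef enum {
    EXC_RESET, EXC_UNDEF, EXC_SWI, EXC_PREFETCH_ABORT,
    EXC_DATA_ABORT, EXC_IRQ, EXC_FIQ, EXC_COUNT
} arm_exception;

/* Vector address and priority (1 = highest) for each exception, from Table 1. */
static const struct { uint32_t vector; int priority; } exc_info[EXC_COUNT] = {
    [EXC_RESET]          = { 0x00, 1 },
    [EXC_UNDEF]          = { 0x04, 6 },
    [EXC_SWI]            = { 0x08, 6 },
    [EXC_PREFETCH_ABORT] = { 0x0C, 5 },
    [EXC_DATA_ABORT]     = { 0x10, 2 },
    [EXC_IRQ]            = { 0x18, 4 },
    [EXC_FIQ]            = { 0x1C, 3 },
};

/* Given a bitmask of pending exceptions (bit n = arm_exception n), return
   the one taken first, or -1 if none is pending. I/F masking is not modelled. */
int highest_priority_pending(unsigned pending)
{
    int best = -1;
    for (int e = 0; e < EXC_COUNT; e++) {
        if ((pending & (1u << e)) &&
            (best < 0 || exc_info[e].priority < exc_info[best].priority))
            best = e;
    }
    return best;
}

/* Address the core branches to when taking exception e. */
uint32_t vector_address(arm_exception e) { return exc_info[e].vector; }
```

Undefined instruction and SWI share priority 6 because a single instruction cannot cause both at once.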
Instruction Set
• Two instruction sets:
  – ARM: standard 32-bit instruction set
  – Thumb: 16-bit compressed form
    • Code density better than most CISC
    • Dynamic decompression in pipeline
ARM Instruction Set
• Features:
  – Load/store architecture
  – 3-address data processing instructions
  – Conditional execution
  – Load/store multiple registers
  – Shift and ALU operation in a single clock cycle
ARM Instruction Set (2)
• Conditional execution:
  – Every ARM instruction carries a 4-bit condition code (bits 31–28 of the encoding), written as a suffix to the mnemonic, e.g. ADDEQ
  – Result: smooth flow of instructions through the pipeline
  – 16 condition codes:
    EQ  equal                          | NE  not equal
    CS  unsigned higher or same        | CC  unsigned lower
    MI  negative                       | PL  positive or zero
    VS  overflow                       | VC  no overflow
    HI  unsigned higher                | LS  unsigned lower or same
    GE  signed greater than or equal   | LT  signed less than
    GT  signed greater than            | LE  signed less than or equal
    AL  always                         | NV  special purpose
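Each condition code listed above is a test on the N, Z, C and V flags. This sketch evaluates a 4-bit condition field against a CPSR value using the architectural encodings (EQ = 0x0 through AL = 0xE); the function name condition_passed is hypothetical, and treating NV (0xF) as "never" is a simplification of its special-purpose role.

```c
#include <stdbool.h>
#include <stdint.h>

/* Evaluate a 4-bit ARM condition field (instruction bits 31-28) against
   the N, Z, C, V flags held in CPSR bits 31-28. */
bool condition_passed(unsigned cond, uint32_t cpsr)
{
    bool n = (cpsr >> 31) & 1, z = (cpsr >> 30) & 1;
    bool c = (cpsr >> 29) & 1, v = (cpsr >> 28) & 1;

    switch (cond & 0xF) {
    case 0x0: return z;                  /* EQ */
    case 0x1: return !z;                 /* NE */
    case 0x2: return c;                  /* CS */
    case 0x3: return !c;                 /* CC */
    case 0x4: return n;                  /* MI */
    case 0x5: return !n;                 /* PL */
    case 0x6: return v;                  /* VS */
    case 0x7: return !v;                 /* VC */
    case 0x8: return c && !z;            /* HI */
    case 0x9: return !c || z;            /* LS */
    case 0xA: return n == v;             /* GE */
    case 0xB: return n != v;             /* LT */
    case 0xC: return !z && n == v;       /* GT */
    case 0xD: return z || n != v;        /* LE */
    case 0xE: return true;               /* AL */
    default:  return false;              /* NV: treated as "never" in this sketch */
    }
}
```

An instruction whose condition fails still passes through the pipeline but has no effect, which avoids the pipeline refill a branch would cause.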
ARM Instruction Set (3)
ARM instruction set:
• Data processing instructions
• Data transfer instructions
• Block transfer instructions
• Branching instructions
• Multiply instructions
• Software interrupt instructions
Difference between ARM and Thumb [figure]
Data Processing Instructions
• Arithmetic and logical operations
• 3-address format:
  – Two 32-bit operands (op1 is a register, op2 is a register or an immediate)
  – 32-bit result placed in a register
• Barrel shifter for op2 allows a full 32-bit shift within the instruction cycle
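The barrel shifter described above transforms op2 before it reaches the ALU. This sketch models the four shift types for amounts 0–31 only; the special encodings (LSR #32, ASR #32, RRX) and the shifter carry-out are not modelled, and the names are hypothetical.

```c
#include <stdint.h>

typedef enum { SHIFT_LSL, SHIFT_LSR, SHIFT_ASR, SHIFT_ROR } shift_type;

/* Value of a register operand 2 after shifting by 0-31 places, as the
   barrel shifter delivers it to the ALU (carry-out not modelled). */
uint32_t barrel_shift(uint32_t value, shift_type type, unsigned amount)
{
    amount &= 31;                       /* this sketch only covers 0-31 */
    if (amount == 0)
        return value;
    switch (type) {
    case SHIFT_LSL: return value << amount;
    case SHIFT_LSR: return value >> amount;
    case SHIFT_ASR:                     /* copy the sign bit into vacated bits */
        return (value & 0x80000000u) ? ~(~value >> amount) : value >> amount;
    case SHIFT_ROR: return (value >> amount) | (value << (32 - amount));
    }
    return value;
}
```

For instance, ADD R1, R2, R3, LSL #2 with R2 = 10 and R3 = 5 computes 10 + barrel_shift(5, SHIFT_LSL, 2) = 30 in one instruction.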
Data Processing Instructions (2)
• Arithmetic operations: ADD, ADC, SUB, SBC, RSB, RSC
• Bit-wise logical operations: AND, EOR, ORR, BIC
• Register movement operations: MOV, MVN
• Comparison operations: TST, TEQ, CMP, CMN
Data Processing Instructions (3)
Condition codes + data processing instructions + barrel shifter = powerful tools for writing efficient code
Data Processing Instructions (4)
e.g. if (Z == 1) R1 = R2 + (R3 * 4) compiles to
ADDEQ R1, R2, R3, LSL #2 (SINGLE INSTRUCTION!)
The condition is written as a suffix (ADDEQ, not EQADD); ADDEQS would additionally update the flags.
Data Transfer Instructions
• Load/store instructions
• Used to move signed and unsigned words, halfwords and bytes to and from registers
• Can be used to load the PC (if the target address is beyond the branch instruction range)

LDR    Load Word               | STR    Store Word
LDRH   Load Halfword           | STRH   Store Halfword
LDRSH  Load Signed Halfword    |
LDRB   Load Byte               | STRB   Store Byte
LDRSB  Load Signed Byte        |

• There are no signed store instructions: a store writes the low byte or halfword of the register, which is the same bit pattern whether the value is signed or unsigned.
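The difference between the signed and unsigned loads above is only in how the upper bits of the destination register are filled. This sketch models each load on a byte array, assuming little-endian byte order and aligned addresses (alignment is not checked); the function names mirror the mnemonics but are hypothetical C helpers.

```c
#include <stdint.h>

/* Each helper reads from a little-endian byte array at byte offset addr and
   returns the 32-bit value the load would place in the destination register. */
uint32_t ldrb(const uint8_t *mem, uint32_t addr)
{
    return mem[addr];                                   /* zero-extend byte */
}

uint32_t ldrsb(const uint8_t *mem, uint32_t addr)
{
    return (uint32_t)(int32_t)(int8_t)mem[addr];        /* sign-extend byte */
}

uint32_t ldrh(const uint8_t *mem, uint32_t addr)
{
    return (uint32_t)mem[addr] | ((uint32_t)mem[addr + 1] << 8);  /* zero-extend */
}

uint32_t ldrsh(const uint8_t *mem, uint32_t addr)
{
    return (uint32_t)(int32_t)(int16_t)ldrh(mem, addr); /* sign-extend halfword */
}

uint32_t ldr(const uint8_t *mem, uint32_t addr)
{
    return ldrh(mem, addr) | (ldrh(mem, addr + 2) << 16);
}
```

With the bytes F0 80 34 12 in memory, LDRB gives 0x000000F0 while LDRSB gives 0xFFFFFFF0, and LDRH gives 0x000080F0 while LDRSH gives 0xFFFF80F0.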
Block Transfer Instructions
• Load/Store Multiple instructions (LDM/STM)
• Whole register bank or a subset copied to memory or restored with a single instruction
[figure: STM copies registers R0, R1, R2, … R14, R15 to consecutive memory words Mi, Mi+1, Mi+2, … Mi+14, Mi+15; LDM restores them]
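The block transfer above follows a fixed ordering rule: the lowest-numbered register goes to the lowest address. This sketch models only the "increment after" form (STMIA/LDMIA) without base write-back, on a memory array indexed by word offset; the function names and the 16-bit register-list mask are illustrative assumptions.

```c
#include <stdint.h>

/* STMIA-style store: registers named in the 16-bit list are written to
   consecutive words starting at base_word, lowest-numbered register at the
   lowest address. Returns the number of registers transferred. */
int stm_ia(uint32_t *mem, uint32_t base_word, const uint32_t regs[16], uint16_t reglist)
{
    int count = 0;
    for (int r = 0; r < 16; r++)
        if (reglist & (1u << r))
            mem[base_word + count++] = regs[r];
    return count;
}

/* LDMIA-style load: the inverse of stm_ia. Registers not in the list are untouched. */
int ldm_ia(const uint32_t *mem, uint32_t base_word, uint32_t regs[16], uint16_t reglist)
{
    int count = 0;
    for (int r = 0; r < 16; r++)
        if (reglist & (1u << r))
            regs[r] = mem[base_word + count++];
    return count;
}
```

Storing {R1, R2, R14} on function entry and loading the same list on exit is the pattern compilers use to save and restore registers in one instruction each.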
Swap Instruction
• Exchanges a word between a register and memory (SWP; SWPB swaps a byte)
• Two cycles but a single atomic action
• Support for real-time (RT) semaphores
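A swap-based semaphore works by atomically writing "taken" and inspecting the old value. This sketch uses the C11 atomic_exchange as a portable analogue of SWP; how a compiler lowers it depends on the target (it is not guaranteed to emit SWP, and newer ARM architectures use LDREX/STREX instead), and the swap_lock names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* 0 = free, 1 = taken. */
typedef atomic_uint swap_lock;

/* Atomically write 1 and look at the old value, as a SWP-based semaphore
   would: an old value of 0 means the lock was free and is now ours. */
bool swap_lock_try_acquire(swap_lock *lock)
{
    return atomic_exchange(lock, 1u) == 0u;
}

void swap_lock_acquire(swap_lock *lock)
{
    while (!swap_lock_try_acquire(lock))
        ;  /* spin until the current holder releases the lock */
}

void swap_lock_release(swap_lock *lock)
{
    atomic_store(lock, 0u);
}
```

Because the read and the write happen as one indivisible action, two tasks (or a task and an interrupt handler) can never both see the old value 0.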
Modifying the Status Registers
• MSR moves the contents of a selected general-purpose register to the CPSR/SPSR
• MRS moves the contents of the CPSR/SPSR to a selected general-purpose register
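Because MSR and MRS only move whole registers, changing a single CPSR bit is a read-modify-write. This sketch mirrors the common three-instruction sequence for disabling IRQs in C. It assumes the ARMv4 rule that in User mode writes to the CPSR control byte are ignored; msr_cpsr_c and disable_irq are hypothetical names.

```c
#include <stdint.h>

#define CPSR_I    (1u << 7)   /* IRQ disable bit */
#define CPSR_MODE 0x1Fu       /* mode field, bits 4-0 */
#define MODE_USR  0x10u

/* Model of "MSR CPSR_c, Rm": writes the control byte (bits 7-0) of the CPSR.
   In User mode the control bits cannot be changed, so the write is ignored. */
uint32_t msr_cpsr_c(uint32_t cpsr, uint32_t rm)
{
    if ((cpsr & CPSR_MODE) == MODE_USR)
        return cpsr;
    return (cpsr & ~0xFFu) | (rm & 0xFFu);
}

/* C equivalent of the sequence
       MRS r0, CPSR
       ORR r0, r0, #0x80
       MSR CPSR_c, r0
   which disables IRQs. Returns the new CPSR value. */
uint32_t disable_irq(uint32_t cpsr)
{
    uint32_t r0 = cpsr;            /* MRS: copy CPSR into a register */
    r0 |= CPSR_I;                  /* ORR: set the I bit */
    return msr_cpsr_c(cpsr, r0);   /* MSR: write the control byte back */
}
```

In Supervisor mode (CPSR = 0x13) the result is 0x93, while in User mode the CPSR is unchanged, which is why interrupt control is reserved for privileged code.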
Multiply Instructions
• Integer multiplication (32-bit result)
• Long integer multiplication (64-bit result)
• Built-in Multiply Accumulate unit (MAC)
• Multiply-accumulate instructions add the product to a running total
Multiply Instructions
• Instructions:
  MUL    Multiply                       32-bit result
  MLA    Multiply accumulate            32-bit result
  UMULL  Unsigned multiply              64-bit result
  UMLAL  Unsigned multiply accumulate   64-bit result
  SMULL  Signed multiply                64-bit result
  SMLAL  Signed multiply accumulate     64-bit result
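The result widths in the table above map directly onto C integer types. This sketch gives one hypothetical helper per instruction; the 64-bit return value stands for the RdHi:RdLo register pair.

```c
#include <stdint.h>

/* 32-bit results keep only the low word of the product (MUL, MLA). */
uint32_t mul(uint32_t rm, uint32_t rs)              { return rm * rs; }
uint32_t mla(uint32_t rm, uint32_t rs, uint32_t rn) { return rm * rs + rn; }

/* 64-bit results: RdHi:RdLo = full product (UMULL, SMULL). */
uint64_t umull(uint32_t rm, uint32_t rs) { return (uint64_t)rm * rs; }
int64_t  smull(int32_t rm, int32_t rs)   { return (int64_t)rm * rs; }

/* Accumulating forms add the product to the existing 64-bit RdHi:RdLo value. */
uint64_t umlal(uint64_t acc, uint32_t rm, uint32_t rs) { return acc + (uint64_t)rm * rs; }
int64_t  smlal(int64_t acc, int32_t rm, int32_t rs)    { return acc + (int64_t)rm * rs; }
```

There is no signed/unsigned split for the 32-bit forms because the low 32 bits of a product are the same either way; the distinction only matters once the upper word is kept.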
Software Interrupt
• SWI instruction:
  – Forces the CPU into Supervisor mode
  – Usage: SWI #n
• Encoding: bits 31–28 Cond | bits 27–24 Opcode (1111) | bits 23–0 Ordinal (n)
• Maximum 2^24 calls
• Suitable for running privileged code and making OS calls
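The SWI encoding above packs the condition, the fixed opcode 1111 and a 24-bit number into one word. This sketch encodes and decodes that word; swi_encode and swi_decode are hypothetical helpers. A handler commonly recovers n by reading the SWI instruction itself (in ARM state, at LR − 4) and masking off the low 24 bits, as swi_decode does.

```c
#include <stdint.h>

#define SWI_OPCODE 0xFu   /* bits 27-24 of an ARM-state SWI */

/* Build an ARM-state SWI word: cond in bits 31-28, 1111 in 27-24, n in 23-0. */
uint32_t swi_encode(unsigned cond, uint32_t number)
{
    return ((uint32_t)(cond & 0xFu) << 28) | (SWI_OPCODE << 24) | (number & 0x00FFFFFFu);
}

/* Returns 1 and stores the 24-bit number if instr is an SWI, otherwise 0. */
int swi_decode(uint32_t instr, uint32_t *number)
{
    if (((instr >> 24) & 0xFu) != SWI_OPCODE)
        return 0;
    *number = instr & 0x00FFFFFFu;
    return 1;
}
```

For example, an unconditional (AL = 0xE) SWI #0x123456 encodes as 0xEF123456.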
Branching Instructions
• Branch (B): jumps forwards or backwards up to 32 MB
• Branch with Link (BL): same, and also saves the return address (the address of the instruction after the BL, i.e. the BL's address + 4) in LR
• Suitable for function call/return
• Condition codes for conditional branches
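The ±32 MB range follows from the encoding: B and BL hold a signed 24-bit word offset that is shifted left by 2 and added to the PC, which reads as the branch address + 8 in ARM state. This sketch computes the target; the function names are hypothetical, and the sign extension assumes an arithmetic right shift of a negative int32_t, as GCC and Clang provide.

```c
#include <stdint.h>

/* Target of an ARM-state B/BL located at pc_of_branch: the signed 24-bit word
   offset in bits 23-0 is multiplied by 4 and added to PC (= branch address + 8). */
uint32_t branch_target(uint32_t pc_of_branch, uint32_t instr)
{
    /* Move imm24 to the top, then arithmetic-shift back: sign-extends and multiplies by 4. */
    int32_t offset = (int32_t)(instr << 8) >> 6;
    return pc_of_branch + 8u + (uint32_t)offset;
}

/* BL also writes the return address (the instruction after the BL) to LR. */
uint32_t bl_link_value(uint32_t pc_of_branch) { return pc_of_branch + 4u; }
```

The word 0xEAFFFFFE (offset −2) branches to itself, and the largest forward offset 0x7FFFFF reaches 8 + 0x1FFFFFC bytes ahead, just over 32 MB.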
Thumb Instruction Set
• Compressed form of ARM:
  – Instructions stored as 16-bit,
  – decompressed into ARM instructions and
  – executed
• Lower performance (ARM 40% faster)
• Higher density (Thumb saves 30% space)
• Optimal: "interworking" (combining the two sets), compiler supported
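Interworking relies on the branch-and-exchange instruction BX: bit 0 of the target address selects the instruction set. This sketch models only that rule; the arm_state struct and the bx function are hypothetical, and unpredictable cases (such as an ARM-state target with bit 1 set) are not checked.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t pc;
    bool     thumb;   /* mirrors the CPSR T bit */
} arm_state;

/* Model of BX Rm: bit 0 of the target selects the instruction set
   (1 = Thumb, 0 = ARM) and is cleared before the address is loaded into the PC. */
void bx(arm_state *s, uint32_t rm)
{
    s->thumb = (rm & 1u) != 0;
    s->pc    = rm & ~1u;
}
```

This is why interworking code refers to Thumb functions by their address + 1: a BX to 0x2001 switches to Thumb state and continues at 0x2000.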
Thumb Instruction Set (2)
• More traditional:
  – No condition codes (only branches can be conditional)
  – Two-address data processing instructions
• Access to R8–R15 (the high registers) restricted to MOV, ADD, CMP
• PUSH/POP for stack manipulation
  – Descending stack (SP hardwired to R13)
Thumb Instruction Set (3)
• No MSR and MRS: must change to ARM state to modify the CPSR
• ARM state entered automatically after reset or on entering an exception mode
• Maximum 256 SWI calls (8-bit number, 0–255)
The Next Step
• New ARM Cortex family of processors:
  – New NEON™ media and signal processing extensions
  – Thumb®-2 blended 16/32-bit instruction set for performance and low power
  – Improved interrupt handling
Summary
• Adoption of ARM technology has increased efficiency and lowered costs
• ARM is the world's leading architecture: 3 billion ARM Powered chips and counting (at the time of writing)