ARM 1 Introduction to ARM Processor Architecture Registers


Introduction to ARM Processor: Architecture, Registers, Pipeline, Interrupts, Architecture Revisions, ARM Instructions, LPC2148 Architecture, GPIO, Instruction Format, Processor Modes, Interrupts vs. Exceptions

What is ARM?
• Advanced RISC Machine
• First RISC microprocessor for commercial use
• Market leader for low-power and cost-sensitive embedded applications

ARM Powered Products

Features
• Architectural simplicity
• Very small implementations
• Very low power consumption

The History of ARM
• Developed at Acorn Computers Limited, of Cambridge, England, between 1983 and 1985
• Problems with CISC:
  – Slower memory parts
  – Clock cycles per instruction

Architecture Revisions

The History of ARM (2)
• Solution: the Berkeley RISC I:
  – Competitive
  – Easy to develop (less than a year)
  – Cheap
  – Pointing the way to the future

ARM Architecture
• Typical RISC architecture:
  – Large uniform register file
  – Load/store architecture
  – Simple addressing modes
  – Uniform and fixed-length instruction fields

ARM Architecture (2)
• Enhancements:
  – Each instruction controls the ALU and shifter
  – Auto-increment and auto-decrement addressing modes
  – Multiple Load/Store
  – Conditional execution

ARM Architecture (3)
• Results:
  – High performance
  – Low code size
  – Low power consumption
  – Low silicon area

Pin Diagram

ARM architecture

LPC2148 Chip Features
• 16-bit/32-bit ARM7TDMI-S microcontroller in a tiny LQFP64 package.
• 8 kB to 40 kB of on-chip static RAM and 32 kB to 512 kB of on-chip flash memory.
• 128-bit wide interface/accelerator enables high-speed 60 MHz operation.
• In-System Programming/In-Application Programming (ISP/IAP) via on-chip boot loader software. Single flash sector or full chip erase in 400 ms and programming of 256 bytes in 1 ms.
• EmbeddedICE RT and Embedded Trace interfaces offer real-time debugging with the on-chip RealMonitor software and high-speed tracing of instruction execution.
• USB 2.0 Full-speed compliant device controller with 2 kB of endpoint RAM.
• Single 10-bit DAC provides variable analog output (LPC2142/44/46/48 only).
• Two 32-bit timers/external event counters (with four capture and four compare channels each), PWM unit (six outputs) and watchdog.
• Low-power Real-Time Clock (RTC) with independent power and 32 kHz clock input.
• Multiple serial interfaces including two UARTs (16C550), two Fast I2C-bus (400 kbit/s), SPI and SSP with buffering and variable data length capabilities.

ARM Architecture (3)
• Current low-end ARM core for applications like digital mobile phones
• TDMI:
  – T: Thumb, 16-bit instruction set
  – D: on-chip Debug support, enabling the processor to halt in response to a debug request
  – M: enhanced Multiplier, yielding a full 64-bit result, high performance
  – I: EmbeddedICE hardware
• Von Neumann architecture
• 3-stage pipeline

Nomenclature: ARM XYZ TDMI EJF-S
• X – series (family) of ARM processor
• Y – memory management unit or memory protection unit
• Z – cache memory
• T – Thumb architecture, support for 16-bit instructions
• D – Debugger support
• M – Fast Multiplier
• I – EmbeddedICE (In-Circuit Emulator)
• E – Enhanced (DSP) instructions
• J – Jazelle instruction set (Java byte code), support for Java programs
• F – Floating-point coprocessor
• S – Synthesizable version: the core is delivered as a hardware description that can be compiled (synthesized) with suitable tools for a target process

CPU + memory
[Figure: a CPU, with PC = 200 and IR holding ADD r5, r1, r3, connected by address and data lines to a memory whose location 200 holds ADD r5, r1, r3.]
© 2008 Wayne Wolf, Overheads for Computers as Components, 2nd ed.

Harvard architecture
[Figure: a CPU with a PC, connected by separate address and data lines to a data memory and to a program memory.]
© 2008 Wayne Wolf, Overheads for Computers as Components, 2nd ed.

ARM architecture

Data Sizes and Instruction Sets
• The ARM is a 32-bit architecture.
• When used in relation to the ARM:
  – Byte means 8 bits
  – Half word means 16 bits (two bytes)
  – Word means 32 bits (four bytes)
• Most ARM cores implement two instruction sets:
  – 32-bit ARM Instruction Set
  – 16-bit Thumb Instruction Set
• Jazelle cores can also execute Java byte code

ARM Registers
• 31 general-purpose 32-bit registers
• 16 visible, R0 – R15
• Others speed up the exception process

Registers
• Only 16 registers are visible to a specific mode. A mode can access:
  – A particular set of r0-r12
  – r13 (sp, stack pointer)
  – r14 (lr, link register)
  – r15 (pc, program counter)
  – Current program status register (cpsr)
• The uses of r0-r13 are orthogonal

ARM Registers (4)

Register   System & User   FIQ              Supervisor   Abort      IRQ        Undefined
R0-R7      R0-R7           R0-R7            R0-R7        R0-R7      R0-R7      R0-R7
R8-R12     R8-R12          R8_fiq-R12_fiq   R8-R12       R8-R12     R8-R12     R8-R12
R13        R13             R13_fiq          R13_svc      R13_abt    R13_irq    R13_und
R14        R14             R14_fiq          R14_svc      R14_abt    R14_irq    R14_und
R15        R15 (PC)        R15 (PC)         R15 (PC)     R15 (PC)   R15 (PC)   R15 (PC)
CPSR       CPSR            CPSR             CPSR         CPSR       CPSR       CPSR
SPSR       (none)          SPSR_fiq         SPSR_svc     SPSR_abt   SPSR_irq   SPSR_und

The FIQ bank starts at R8: R0-R7 are shared by all modes.
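The banking scheme can be sketched as a lookup from (mode, register number) to a physical register name. This is only an illustration: the short mode names (`usr`, `fiq`, ...) and the function `physical_register` are hypothetical, and the sketch assumes the FIQ bank covers R8-R14 as on the ARM7TDMI.

```python
# Registers that have a private (banked) copy in each exception mode.
# Assumption: FIQ banks R8-R14; the other exception modes bank only R13 and R14.
BANKED = {
    "fiq": range(8, 15),
    "svc": range(13, 15),
    "abt": range(13, 15),
    "irq": range(13, 15),
    "und": range(13, 15),
}

ALL_MODES = ["usr", "sys", "fiq", "svc", "abt", "irq", "und"]


def physical_register(mode, n):
    """Return the name of the physical register that Rn refers to in `mode`."""
    if not 0 <= n <= 15:
        raise ValueError("register number must be in 0..15")
    if mode in ("usr", "sys"):
        return f"R{n}"  # System mode shares the User mode registers
    if mode not in BANKED:
        raise ValueError(f"unknown mode: {mode!r}")
    return f"R{n}_{mode}" if n in BANKED[mode] else f"R{n}"


# Every distinct physical general-purpose register across all modes.
distinct = {physical_register(m, n) for m in ALL_MODES for n in range(16)}
print(len(distinct))  # 16 + 7 + 4 * 2 = 31
```

Counting the distinct names reproduces the figure of 31 general-purpose registers quoted earlier in the deck.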

Program counter
• Stores the address of the instruction to be executed
• In ARM state, all instructions are 32 bits wide and word-aligned
• Thus, the last two bits of the pc are undefined: they carry no address information and must be ignored

Program status register (CPSR)
• Condition flags: N (negative, bit 31), Z (zero, bit 30), C (carry/borrow, bit 29), V (overflow, bit 28)
• Interrupt masks: I (IRQ disable, bit 7), F (FIQ disable, bit 6)
• T (Thumb state, bit 5)
• Mode bits (bits 4-0)
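As a rough illustration of the CPSR fields, the sketch below unpacks a 32-bit CPSR value into its flags and mode. The function name `decode_cpsr` is hypothetical; the bit positions (N, Z, C, V in bits 31-28, I, F, T in bits 7-5, mode in bits 4-0) and mode encodings are the standard ARM ones.

```python
# Mode field encodings (CPSR bits 4-0).
MODES = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "Supervisor",
    0b10111: "Abort",
    0b11011: "Undefined",
    0b11111: "System",
}


def decode_cpsr(value):
    """Split a 32-bit CPSR value into condition flags, control bits and mode."""
    def bit(position):
        return bool((value >> position) & 1)

    return {
        "N": bit(31),  # negative
        "Z": bit(30),  # zero
        "C": bit(29),  # carry/borrow
        "V": bit(28),  # overflow
        "I": bit(7),   # IRQ disable
        "F": bit(6),   # FIQ disable
        "T": bit(5),   # Thumb state
        "mode": MODES.get(value & 0x1F, "reserved"),
    }


# Example: Z and C set, both interrupts disabled, ARM state, Supervisor mode.
print(decode_cpsr(0x600000D3))
```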

Pipeline Organization
• Increases speed: most instructions execute in a single cycle
• Versions:
  – 3-stage (ARM7TDMI and earlier)
  – 5-stage (ARM8, ARM9TDMI)
  – 6-stage (ARM10TDMI)

Pipeline Organization (2)
• 3-stage pipeline: Fetch – Decode – Execute
• Three-cycle latency, one instruction per cycle throughput

instruction   t       t+1      t+2       t+3       t+4
i             Fetch   Decode   Execute
i+1                   Fetch    Decode    Execute
i+2                            Fetch     Decode    Execute
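The timing above can be reproduced with a small model of an ideal 3-stage pipeline, with no stalls or branches (a simplifying assumption). The names `pipeline_schedule` and `total_cycles` are hypothetical.

```python
STAGES = ["Fetch", "Decode", "Execute"]


def pipeline_schedule(n_instructions):
    """Map each cycle to the (instruction index, stage) pairs active in it."""
    schedule = {}
    for i in range(n_instructions):
        for offset, stage in enumerate(STAGES):
            # Instruction i enters Fetch in cycle i and advances one stage per cycle.
            schedule.setdefault(i + offset, []).append((i, stage))
    return schedule


def total_cycles(n_instructions):
    """Cycles needed to complete n instructions in the ideal pipeline."""
    if n_instructions == 0:
        return 0
    return n_instructions + len(STAGES) - 1


for cycle, active in sorted(pipeline_schedule(3).items()):
    print(f"t+{cycle}:", ", ".join(f"i+{i} {stage}" for i, stage in active))
print(total_cycles(3))  # 5 cycles: t .. t+4
```

Each instruction still takes three cycles from fetch to completion, but once the pipeline is full one instruction finishes every cycle.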

Pipeline Organization (3)
• 5-stage pipeline:
  – Reduces work per cycle => allows higher clock frequency
  – Separates data and instruction memory => reduces CPI (average number of clock cycles per instruction)
• Stages: Fetch, Decode, Execute, Buffer/data, Write-back

Operating Modes
• Seven operating modes:
  – User
  – Privileged:
    • System (version 4 and above)
    • Exception modes: FIQ, IRQ, Abort, Undefined, Supervisor

Processor Modes
The ARM has seven basic operating modes:
  – User: unprivileged mode under which most tasks run
  – FIQ: entered when a high-priority (fast) interrupt is raised
  – IRQ: entered when a low-priority (normal) interrupt is raised
  – Supervisor: entered on reset and when a Software Interrupt instruction is executed
  – Abort: used to handle memory access violations
  – Undef: used to handle undefined instructions
  – System: privileged mode using the same registers as user mode

Operating Modes (2)
• User mode:
  – Normal program execution mode
  – System resources unavailable
  – Mode changed by exception only
• Exception modes:
  – Entered upon exception
  – Full access to system resources
  – Mode changed freely

Exceptions

Exception               Mode         Priority   IV Address
Reset                   Supervisor   1          0x00000000
Undefined instruction   Undefined    6          0x00000004
Software interrupt      Supervisor   6          0x00000008
Prefetch Abort          Abort        5          0x0000000C
Data Abort              Abort        2          0x00000010
Interrupt               IRQ          4          0x00000018
Fast interrupt          FIQ          3          0x0000001C

Table 1 - Exception types, sorted by Interrupt Vector addresses
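One way to read Table 1 is as a lookup structure: when several exceptions are pending, the one with the lowest priority number is taken first and execution continues at its vector address. The sketch below is illustrative only; the tuple layout and the function `highest_priority` are hypothetical, and the values are copied from the table.

```python
# (exception, mode entered, priority, vector address), as in Table 1.
EXCEPTIONS = [
    ("Reset", "Supervisor", 1, 0x00000000),
    ("Undefined instruction", "Undefined", 6, 0x00000004),
    ("Software interrupt", "Supervisor", 6, 0x00000008),
    ("Prefetch Abort", "Abort", 5, 0x0000000C),
    ("Data Abort", "Abort", 2, 0x00000010),
    ("Interrupt", "IRQ", 4, 0x00000018),
    ("Fast interrupt", "FIQ", 3, 0x0000001C),
]


def highest_priority(pending):
    """Return the table row for the pending exception that is serviced first."""
    candidates = [row for row in EXCEPTIONS if row[0] in pending]
    if not candidates:
        raise ValueError("no known exception pending")
    return min(candidates, key=lambda row: row[2])  # 1 is the highest priority


name, mode, priority, vector = highest_priority({"Interrupt", "Fast interrupt"})
print(name, mode, hex(vector))  # Fast interrupt FIQ 0x1c
```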

Instruction Set
• Two instruction sets:
  – ARM
    • Standard 32-bit instruction set
  – THUMB
    • 16-bit compressed form
    • Code density better than most CISC
    • Dynamic decompression in pipeline

ARM Instruction Set
• Features:
  – Load/Store architecture
  – 3-address data processing instructions
  – Conditional execution
  – Load/Store multiple registers
  – Shift & ALU operation in single clock cycle

ARM Instruction Set (2)
• Conditional execution:
  – Each instruction carries a condition code (a 4-bit field in the encoding, written as a suffix on the mnemonic)
  – Result: smooth flow of instructions through pipeline
  – 16 condition codes:

EQ   equal                          NE   not equal
CS   unsigned higher or same        CC   unsigned lower
MI   negative                       PL   positive or zero
VS   overflow                       VC   no overflow
HI   unsigned higher                LS   unsigned lower or same
GE   signed greater than or equal   LT   signed less than
GT   signed greater than            LE   signed less than or equal
AL   always                         NV   special purpose
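Each condition code is a test on the N, Z, C and V flags. The sketch below spells out those tests and, as a convenience, models how a 32-bit compare sets the flags. The helper names `flags_after_cmp` and `condition_passed` are hypothetical, and NV is left out because the deck only describes it as special purpose.

```python
MASK32 = 0xFFFFFFFF


def flags_after_cmp(a, b):
    """Return (N, Z, C, V) as set by a 32-bit compare, i.e. by computing a - b."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    n = bool(result >> 31)
    z = result == 0
    c = a >= b  # C is set when the subtraction needs no borrow
    v = bool(((a ^ b) & (a ^ result)) >> 31)  # signed overflow
    return n, z, c, v


def condition_passed(cond, n, z, c, v):
    """Evaluate a condition code mnemonic against the four flags."""
    checks = {
        "EQ": z, "NE": not z,
        "CS": c, "CC": not c,
        "MI": n, "PL": not n,
        "VS": v, "VC": not v,
        "HI": c and not z, "LS": (not c) or z,
        "GE": n == v, "LT": n != v,
        "GT": (not z) and n == v, "LE": z or n != v,
        "AL": True,
    }
    return bool(checks[cond])


flags = flags_after_cmp(3, 5)
print(condition_passed("LT", *flags), condition_passed("HI", *flags))
```

Comparing 3 with 5 leaves N set and C clear, so both the signed test LT and the unsigned test CC pass, while GE and HI fail.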

ARM Instruction Set (3)
• ARM instruction set:
  – Data processing instructions
  – Data transfer instructions
  – Block transfer instructions
  – Branching instructions
  – Multiply instructions
  – Software interrupt instructions

Difference between ARM and THUMB

Data Processing Instructions
• Arithmetic and logical operations
• 3-address format:
  – Two 32-bit operands (op1 is a register, op2 is a register or an immediate)
  – 32-bit result placed in a register
• Barrel shifter for op2 allows full 32-bit shift within instruction cycle

Data Processing Instructions (2)
• Arithmetic operations:
  – ADD, ADC, SUB, SBC
• Bit-wise logical operations:
  – AND, EOR, ORR
• Register movement operations:
  – MOV, MVN
• Comparison operations:
  – TST, TEQ, CMP, CMN

Data Processing Instructions (3)
Conditional codes + Data processing instructions + Barrel shifter = Powerful tools for efficiently coded programs

Data Processing Instructions (4)
e.g.: if (Z == 1) R1 = R2 + (R3 * 4)
compiles to
ADDEQS R1, R2, R3, LSL #2
(single instruction!)
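To make the effect of that single instruction concrete, here is a rough software model of a conditional add with a shifted second operand. The function name `add_eq_s_lsl` and the dictionary-based register file are hypothetical, and only the N and Z flag updates are modelled (C and V are omitted for brevity).

```python
MASK32 = 0xFFFFFFFF


def add_eq_s_lsl(regs, flags, rd, rn, rm, shift):
    """Model: if Z is set, Rd = Rn + (Rm << shift), then update N and Z."""
    if not flags["Z"]:
        return  # EQ condition failed: the instruction has no effect
    shifted = (regs[rm] << shift) & MASK32       # barrel shifter, LSL #shift
    result = (regs[rn] + shifted) & MASK32       # ALU add, wrapped to 32 bits
    regs[rd] = result
    flags["N"] = bool(result >> 31)              # the S suffix updates the flags
    flags["Z"] = result == 0


regs = {1: 0, 2: 10, 3: 5}
flags = {"N": False, "Z": True}
add_eq_s_lsl(regs, flags, rd=1, rn=2, rm=3, shift=2)
print(regs[1])  # 10 + 5 * 4 = 30
```

The shift by 2 performs the multiplication by 4, so the test, the multiply and the add are folded into one instruction.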

Data Transfer Instructions
• Load/store instructions
• Used to move signed and unsigned Word, Half Word and Byte to and from registers
• Can be used to load PC (if target address is beyond branch instruction range)

LDR     Load Word               STR    Store Word
LDRH    Load Half Word          STRH   Store Half Word
LDRSH   Load Signed Half Word
LDRB    Load Byte               STRB   Store Byte
LDRSB   Load Signed Byte

Stores have no signed forms: STRH and STRB write the low 16 or 8 bits of the register whether the value is signed or unsigned.

Block Transfer Instructions
• Load/Store Multiple instructions (LDM/STM)
• Whole register bank or a subset copied to memory or restored with a single instruction
[Figure: registers R0 to R15 alongside consecutive memory words Mi to Mi+15; STM copies registers to memory and LDM loads them back.]

Swap Instruction
• Exchanges a word between a register and memory
• Two cycles but single atomic action
• Support for RT semaphores

Modifying the Status Registers
• MSR moves contents from selected GPR to CPSR/SPSR
• MRS moves contents from CPSR/SPSR to selected GPR

Multiply Instructions
• Integer multiplication (32-bit result)
• Long integer multiplication (64-bit result)
• Built-in Multiply Accumulate Unit (MAC)
• Multiply and accumulate instructions add product to running total

Multiply Instructions (2)
• Instructions:

MUL     Multiply                         32-bit result
MLA     Multiply accumulate              32-bit result
UMULL   Unsigned multiply                64-bit result
UMLAL   Unsigned multiply accumulate     64-bit result
SMULL   Signed multiply                  64-bit result
SMLAL   Signed multiply accumulate       64-bit result
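The difference between the 32-bit and 64-bit forms, and between the signed and unsigned long forms, can be sketched with plain integer arithmetic. The lowercase function names below are hypothetical stand-ins for the instructions; the long forms return the result as a (low word, high word) pair, mirroring the two destination registers.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed32(x):
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mul(rm, rs):
    """32-bit multiply: only the low 32 bits of the product are kept."""
    return (rm * rs) & MASK32


def mla(rm, rs, rn):
    """32-bit multiply accumulate: product plus Rn, truncated to 32 bits."""
    return (rm * rs + rn) & MASK32


def umull(rm, rs):
    """Unsigned 64-bit multiply, returned as (low word, high word)."""
    product = (rm & MASK32) * (rs & MASK32)
    return product & MASK32, (product >> 32) & MASK32


def smull(rm, rs):
    """Signed 64-bit multiply, returned as (low word, high word)."""
    product = (to_signed32(rm) * to_signed32(rs)) & MASK64
    return product & MASK32, (product >> 32) & MASK32


print(umull(0xFFFFFFFF, 2))  # operand treated as 4294967295
print(smull(0xFFFFFFFF, 2))  # operand treated as -1
```

The same bit pattern 0xFFFFFFFF gives different high words in the two long forms, which is why separate signed and unsigned instructions exist, while the low 32 bits are identical.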

Software Interrupt
• SWI instruction
  – Forces CPU into supervisor mode
  – Usage: SWI #n
• Encoding: bits 31-28 Cond, bits 27-24 Opcode, bits 23-0 Ordinal
• Maximum 2^24 calls
• Suitable for running privileged code and making OS calls
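The field layout can be illustrated by packing and unpacking an SWI instruction word. The function names are hypothetical; the sketch assumes the standard ARM encoding, in which the opcode field (bits 27-24) is 1111 and condition code 1110 means AL (always).

```python
SWI_OPCODE = 0xF   # bits 27-24
COND_AL = 0xE      # bits 31-28, "always"


def encode_swi(number, cond=COND_AL):
    """Build the 32-bit word for SWI #number."""
    if not 0 <= number < (1 << 24):
        raise ValueError("SWI ordinal must fit in 24 bits")
    return (cond << 28) | (SWI_OPCODE << 24) | number


def decode_swi(word):
    """Return (cond, ordinal) for an SWI instruction word."""
    if ((word >> 24) & 0xF) != SWI_OPCODE:
        raise ValueError("not an SWI instruction")
    return (word >> 28) & 0xF, word & 0xFFFFFF


print(hex(encode_swi(0x11)))  # 0xef000011
```

A handler in supervisor mode would typically read the instruction word back and mask off the top 8 bits, as `decode_swi` does, to find which service was requested.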

Branching Instructions
• Branch (B): jumps forwards/backwards up to 32 MB
• Branch link (BL): same, and saves the return address (the address of the instruction after the BL) in LR
• Suitable for function call/return
• Condition codes for conditional branches
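The 32 MB range follows from the encoding: the instruction holds a signed 24-bit word offset, measured from the branch address plus 8 because of the pipeline. The sketch below assumes the standard ARM B/BL encoding (bits 27-25 = 101, bit 24 = link); the function name `encode_branch` is hypothetical.

```python
COND_AL = 0xE  # condition field for "always"


def encode_branch(src, dst, link=False, cond=COND_AL):
    """Encode B (or BL when link=True) at address src jumping to address dst."""
    offset = dst - (src + 8)  # the PC reads as the branch address + 8
    if offset % 4:
        raise ValueError("target must be word-aligned relative to the branch")
    words = offset >> 2       # offset is stored in units of 4 bytes
    if not -(1 << 23) <= words < (1 << 23):
        raise ValueError("target is outside the +/-32 MB branch range")
    return (cond << 28) | (0b101 << 25) | (int(link) << 24) | (words & 0xFFFFFF)


print(hex(encode_branch(0x8000, 0x8000)))             # 0xeafffffe, a branch to itself
print(hex(encode_branch(0x8000, 0x9000, link=True)))  # 0xeb0003fe
```

A 24-bit signed word count covers 2^23 words in either direction, which is 32 MB; targets further away need the load-into-PC technique mentioned under data transfer instructions.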

THUMB Instruction Set
• Compressed form of ARM:
  – Instructions stored as 16-bit,
  – Decompressed into ARM instructions and
  – Executed
• Lower performance (ARM 40% faster)
• Higher density (THUMB saves 30% space)
• Optimal: "interworking" (combining the two sets), compiler supported

THUMB Instruction Set (2)
• More traditional:
  – No condition codes
  – Two-address data processing instructions
• Access to R8 – R15 restricted to:
  – MOV, ADD, CMP
• PUSH/POP for stack manipulation:
  – Descending stack (SP hardwired to R13)

THUMB Instruction Set (3)
• No MSR and MRS; must change to ARM state to modify CPSR
• ARM state entered automatically after RESET or on entering an exception mode
• SWI number limited to 8 bits (0 to 255)

The Next Step
• New ARM Cortex family of processors:
  – New NEON™ media and signal processing extensions
  – Thumb®-2 blended 16/32-bit instruction set for performance and low power
  – Improved interrupt handling

Summary
• Adoption of ARM technology has increased efficiency and lowered costs
• ARM is the world's leading architecture today:
  – 3 billion ARM Powered chips and counting
