Computer Architecture

ARM Processor

Why ARM?
  • As of 2007, about 98% of the more than one billion mobile phones sold each year used at least one ARM processor.
  • As of 2009, ARM processors accounted for approximately 90% of all embedded 32-bit RISC processors.
  • Source: http://en.wikipedia.org/wiki/ARM_architecture

History
  • ARM was developed at Acorn Computers Limited of Cambridge, England, between 1983 and 1985.
  • The RISC concept was introduced in 1980 at Stanford and Berkeley.
  • ARM Limited was founded in 1990.
  • ARM cores are licensed to partners, who develop and fabricate new microcontrollers.

ARM Architecture
  • Based on the RISC architecture, with enhancements to meet the requirements of embedded applications
  • A large, uniform register file
  • Load-store architecture: data processing operations operate on register contents only
  • Uniform, fixed-length instructions
  • 32-bit processor; instructions are 32 bits long
  • Good speed/power consumption ratio
  • High code density

Enhancements to Basic RISC Features
  • Variable-cycle execution for certain instructions
      • For example, load-store-multiple instructions
  • Inline barrel shifter, leading to more complex instructions
      • Preprocesses one of the input registers before use
  • Thumb 16-bit instruction set
      • Code density improved by about 30% over 32-bit instructions
  • Enhanced DSP instructions
      • Support for fast 16 x 16 multiplier operations

Enhancements to Basic RISC Features (2)
  • Auto-increment and auto-decrement addressing modes to optimize program loops
  • Load and store multiple instructions to maximize data throughput
  • Conditional execution of instructions to maximize execution throughput

ARM Architecture Versions
  • Version 1 (1983-85): 26-bit addressing, no multiply or co-processor
  • Version 2: includes 32-bit result multiply and co-processor support
  • Version 3: 32-bit addressing
  • Version 4: adds signed and unsigned half-word and signed byte load and store instructions
  • Version 4T: introduces Thumb, a 16-bit compressed form of the instruction set

ARM Architecture Versions (2)
  • Version 5T: superset of 4T, adding new instructions
  • Version 5TE: adds the signal processing instruction set extension
  • Examples:
      • ARM6: v3
      • ARM7: v3
      • ARM7TDMI: v4T
      • StrongARM: v4
      • ARM9E-S: v5TE

Overview: Core Data Path
  • Data items are placed in the register file
      • No data processing instruction directly manipulates data in memory
  • Instructions typically use two source registers and a single result (destination) register
  • A barrel shifter on the data path can pre-process data before it enters the ALU
  • Increment/decrement logic can update register contents for sequential access, independent of the ALU

Basic ARM Organization

Registers
  • General-purpose registers hold either data or an address
  • All registers are 32 bits wide
  • Up to 18 registers are active at a time: 16 data registers and 2 status registers (in user mode only the CPSR is accessible, since user mode has no SPSR)
  • Data registers: r0 to r15
  • Three registers, r13, r14, and r15, perform special functions
      • r13: stack pointer
      • r14: link register
      • r15: program counter

Registers (2)
  • Depending on context, registers r13 and r14 can also be used as general-purpose registers (GPRs)
  • Any instruction that uses r0 can equally be used with any other GPR (r1-r13): the register set is orthogonal
  • In addition, there are two status registers
      • CPSR: current program status register
      • SPSR: saved program status register

Status Registers
  • CPSR: monitors and controls internal operations

CPSR: Example

ARM Status Bits
  • Arithmetic, logical, and shifting operations can set the CPSR condition bits (data processing instructions do so when they carry the S suffix):
      • N (negative)
      • Z (zero)
      • C (carry)
      • V (overflow)
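As a rough illustration of how the four condition bits relate to a result, the following Python sketch models a flag-setting 32-bit addition. The function name adds and the dictionary layout are invented for this example; it models the arithmetic only and is not taken from the slides.

```python
MASK32 = 0xFFFFFFFF

def adds(a, b):
    """Model a flag-setting 32-bit addition; returns (result, flags)."""
    a &= MASK32
    b &= MASK32
    full = a + b                      # unbounded Python integer sum
    result = full & MASK32            # what fits in a 32-bit register
    flags = {
        "N": result >> 31,            # negative: bit 31 of the result
        "Z": int(result == 0),        # zero: result is all zeroes
        "C": int(full > MASK32),      # carry: unsigned carry out of bit 31
        # overflow: both operands have a sign that differs from the result's
        "V": ((a ^ result) & (b ^ result)) >> 31,
    }
    return result, flags
```

Adding 1 to 0x7FFFFFFF, for instance, sets N and V (signed overflow) but not C, while adding 1 to 0xFFFFFFFF sets Z and C.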

Processor Modes
  • The processor mode determines
      • which registers are active, and
      • the access rights to the CPSR itself
  • Each processor mode is either
      • Privileged: full read-write access to the CPSR
      • Non-privileged: read-only access to the control field of the CPSR, but read-write access to the condition flags

Processor Modes (2)
  • ARM has seven modes
      • Privileged: abort, fast interrupt request, interrupt request, supervisor, system, and undefined
      • Non-privileged: user
  • User mode is used for programs and applications

Privileged Modes
  • Abort: entered when there is a failed attempt to access memory
  • Fast interrupt request (FIQ) and interrupt request (IRQ): correspond to the two interrupt levels available on ARM
  • Supervisor: the mode the processor is in after reset, and generally the mode in which the OS kernel executes

Privileged Modes (2)
  • System: a special version of user mode that allows full read-write access to the CPSR
  • Undefined: entered when the processor encounters an undefined instruction

Processor Modes

Banked Registers
  • The register file contains 37 registers in all
      • 20 of them are hidden from the program at different times
      • These are called banked registers
  • Banked registers are available only when the processor is in a particular mode
  • Every processor mode other than system mode has an associated set of banked registers that is a subset of the main 16 registers
      • Each banked register maps one-to-one onto a user mode register
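The counts of 37 and 20 can be checked with a small tally. The per-mode breakdown below is an assumption based on the classic ARM register layout (FIQ banks r8-r14, the other exception modes bank r13 and r14, and each has its own SPSR); the slides themselves do not list it, and the register names are only labels chosen for this sketch.

```python
# Registers visible in user/system mode: r0-r15 plus the CPSR.
USER_VISIBLE = [f"r{i}" for i in range(16)] + ["cpsr"]

# Banked registers per mode (assumed classic ARM layout, not given in the slides).
BANKED = {
    "fiq": [f"r{i}_fiq" for i in range(8, 15)] + ["spsr_fiq"],  # r8-r14 + SPSR
    "irq": ["r13_irq", "r14_irq", "spsr_irq"],
    "svc": ["r13_svc", "r14_svc", "spsr_svc"],
    "abt": ["r13_abt", "r14_abt", "spsr_abt"],
    "und": ["r13_und", "r14_und", "spsr_und"],
}

hidden = sum(len(regs) for regs in BANKED.values())  # banked (hidden) registers
total = len(USER_VISIBLE) + hidden                   # whole register file
```

Under these assumptions the tally gives 17 always-visible registers plus 20 banked ones, which matches the total of 37.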

Register Banking

SPSR
  • Each privileged mode (except system mode) has an associated Saved Program Status Register (SPSR)
  • The SPSR saves the state of the CPSR (Current Program Status Register) when the privileged mode is entered, so that the user state can be fully restored when the user process resumes

Mode Changing
  • The mode changes when a program writes directly to the CPSR, or when the hardware responds to an exception or interrupt
  • To return to user mode, a special return instruction tells the core to restore the original CPSR and banked registers
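A minimal sketch of the save-and-restore behaviour described above, written in Python. The mode-bit encodings and the position of the I (IRQ disable) bit are assumptions taken from the classic ARM CPSR layout rather than from the slides, and the function names are invented. The model leaves out the link register, the vector address, and the extra F bit handling on FIQ.

```python
# Assumed CPSR mode field encodings (bits 4:0) for classic ARM.
MODE_BITS = {
    "usr": 0b10000, "fiq": 0b10001, "irq": 0b10010, "svc": 0b10011,
    "abt": 0b10111, "und": 0b11011, "sys": 0b11111,
}
MODE_MASK = 0x1F
I_BIT = 1 << 7          # assumed position of the IRQ disable bit

def enter_exception(state, mode):
    """Hardware side: save the CPSR in the mode's SPSR, then switch mode."""
    state["spsr"][mode] = state["cpsr"]
    state["cpsr"] = (state["cpsr"] & ~MODE_MASK) | MODE_BITS[mode] | I_BIT

def return_from_exception(state, mode):
    """Return side: restore the CPSR that was saved on entry."""
    state["cpsr"] = state["spsr"][mode]

state = {"cpsr": 0x60000010, "spsr": {}}   # user mode with Z and C set
enter_exception(state, "irq")
cpsr_in_handler = state["cpsr"]
return_from_exception(state, "irq")
```

The condition flags in the upper bits survive the round trip because the whole CPSR, not just the mode field, is copied into the SPSR.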

Mode Changing

ARM Instruction Set

Instructions process data held in registers and access memory with load and store instructions
  • Classes of instructions:
      • Data processing instructions
      • Branch instructions
      • Load-store instructions
      • Software interrupt instructions
      • Program status register instructions

Features of ARM instruction set
  • 3-address data processing instructions
  • Conditional execution of every instruction
  • Load and store multiple registers
  • Shift and ALU operation in a single instruction

ARM data instructions
  • Basic format:
        ADD r0, r1, r2
    Computes r1+r2, stores the result in r0.
  • Immediate operand:
        ADD r0, r1, #2
    Computes r1+2, stores the result in r0.

Data Processing
  • Manipulate data within registers
      • Move instructions
      • Arithmetic instructions
      • Logical instructions
      • Comparison instructions
  • The suffix S on a data processing instruction updates the flags in the CPSR

Data Processing Instructions
  • Operands are 32 bits wide; they come from registers or are specified as literals (immediate operands) in the instruction itself
  • The second operand is sent to the ALU via the barrel shifter
  • The 32-bit result is placed in a register; long multiply instructions produce a 64-bit result

Move instruction
  • MOV Rd, N
      • Rd: destination register
      • N: an immediate value or a source register
      • Example: MOV r7, r5
  • MVN Rd, N
      • Moves into Rd the bitwise NOT (inverse) of the 32-bit value from the source

Using Barrel Shifter
  • Enables shifting the 32-bit operand in one of the source registers left or right by a specific number of positions
  • Basic barrel shifter operations: shift left, shift right, rotate right
  • Facilitates fast multiplication and division by powers of two, and increases code density
  • Example: MOV r7, r5, LSL #2
      • Multiplies the content of r5 by 4 and puts the result in r7

Using Barrel Shifter

Barrel Shift Instructions
  • LSL, LSR: logical shift left/right; vacated bits are filled with zeroes
  • ASL, ASR: arithmetic shift left/right; ASL is identical to LSL, while ASR fills the vacated bits with copies of the sign bit (bit 31)
  • ROR: rotate right
  • RRX: rotate right extended with C; performs a 33-bit rotate, with the C bit from the CPSR placed above the sign bit
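The shift operations listed above can be modelled on 32-bit values in a few lines of Python. This is a sketch under the assumption that shift amounts stay in the range 0 to 31; the function names simply mirror the mnemonics and are not part of any ARM tool.

```python
MASK32 = 0xFFFFFFFF

def lsl(x, n):
    """Logical shift left: zeroes enter from the right."""
    return (x << n) & MASK32

def lsr(x, n):
    """Logical shift right: zeroes enter from the left."""
    return (x & MASK32) >> n

def asr(x, n):
    """Arithmetic shift right: copies of bit 31 enter from the left."""
    x &= MASK32
    if x & 0x80000000:
        x -= 1 << 32                 # reinterpret as a negative Python int
    return (x >> n) & MASK32

def ror(x, n):
    """Rotate right: bits leaving on the right re-enter on the left."""
    x &= MASK32
    n %= 32
    return ((x >> n) | (x << (32 - n))) & MASK32

def rrx(x, carry):
    """33-bit rotate right by one through C; returns (result, new_carry)."""
    x &= MASK32
    return (carry << 31) | (x >> 1), x & 1
```

For example, lsl(5, 2) gives 20 (multiplication by 4), while asr keeps a negative value negative by replicating its top bit.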

Barrel Shift with Carry

Arithmetic Instructions
  • Implement 32-bit addition and subtraction
  • 3-operand form
  • Examples
      • SUB r0, r1, r2
        Subtracts the value stored in r2 from that in r1 and stores the result in r0
      • SUBS r1, r1, #1
        Subtracts 1 from r1, stores the result in r1, and updates the condition flags (N, Z, C, V)

Arithmetic Instructions (2)
  • ADD: add
  • SUB: subtract
  • MUL, MLA: multiply (and accumulate)

Multiply Instructions
  • Multiply the contents of a pair of registers
      • Long multiply generates a 64-bit result
  • Examples:
      • MUL r0, r1, r2
        The contents of r1 and r2 are multiplied and the low 32 bits of the product are put in r0
      • UMULL r0, r1, r2, r3
        Unsigned multiply of r2 and r3; the low 32 bits of the 64-bit result are stored in r0 and the high 32 bits in r1

Multiply and Accumulate
  • The result of a multiplication can be accumulated with the content of another register
  • MLA Rd, Rm, Rs, Rn
      • Rd = (Rm * Rs) + Rn
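To make the result widths of the multiply instructions concrete, here is a small Python model of MUL, MLA, and UMULL. The helper names mirror the mnemonics; returning UMULL's result as a (low, high) tuple is just a convention chosen for this sketch.

```python
MASK32 = 0xFFFFFFFF

def mul(rm, rs):
    """MUL Rd, Rm, Rs: keep only the low 32 bits of the product."""
    return (rm * rs) & MASK32

def mla(rm, rs, rn):
    """MLA Rd, Rm, Rs, Rn: Rd = (Rm * Rs) + Rn, truncated to 32 bits."""
    return (rm * rs + rn) & MASK32

def umull(rm, rs):
    """UMULL RdLo, RdHi, Rm, Rs: full 64-bit unsigned product as (lo, hi)."""
    product = (rm & MASK32) * (rs & MASK32)
    return product & MASK32, product >> 32
```

A product that overflows 32 bits is silently truncated by mul, whereas umull keeps the upper half in the second register.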

Logical Instructions
  • Bit-wise logical operations on the two source registers
      • Operators: AND, ORR (OR), EOR (exclusive OR), BIC (bit clear)
  • Example: BIC r0, r1, r2
      • r2 contains a binary pattern in which every binary 1 clears the corresponding bit of the value in r1; the result is stored in r0
      • Useful for manipulating status flags and interrupt masks
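The bit-clear operation is simply an AND with the complement of the mask. A minimal Python sketch follows; the function name bic and the sample values are chosen for illustration only.

```python
MASK32 = 0xFFFFFFFF

def bic(rn, rm):
    """BIC Rd, Rn, Rm: clear in Rn every bit that is 1 in Rm."""
    return rn & ~rm & MASK32

cleared = bic(0b1111, 0b0101)      # bits 0 and 2 are cleared
status = bic(0x000000D3, 0x80)     # clear bit 7 of a status-like value
```

Clearing a single mask bit this way leaves every other bit of the value untouched, which is why the instruction suits flag and interrupt-mask manipulation.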

With Barrel Shifter
  • Using the barrel shifter with arithmetic and logical instructions increases the set of available operations
  • Example: ADD r0, r1, r1, LSL #1
      • Register r1 is shifted left by 1 (doubled), then added to r1, and the result (3 times r1) is stored in r0
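The shift-then-add example can be traced with a short Python model. The helper add_shifted is a made-up name; it assumes a left shift applied to the second operand before the addition, as in the example above.

```python
MASK32 = 0xFFFFFFFF

def add_shifted(rn, rm, shift):
    """Model ADD Rd, Rn, Rm, LSL #shift on 32-bit registers."""
    operand2 = (rm << shift) & MASK32   # barrel shifter pre-processes Rm
    return (rn + operand2) & MASK32     # ALU adds the shifted operand

r1 = 7
r0 = add_shifted(r1, r1, 1)             # ADD r0, r1, r1, LSL #1
```

With r1 = 7 the shifted operand is 14, so r0 ends up holding 21, three times r1, in a single instruction.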

Compare Instructions
  • Enable comparison of 32-bit values
      • Update the CPSR flags but do not affect other registers
  • Examples
      • CMP r0, r9: flags set as a result of r0 - r9
      • TEQ r0, r9: flags set as a result of r0 EOR r9
      • TST r0, r9: flags set as a result of r0 AND r9

Compare Instructions (2)
  • CMP: compare
  • TST: bit-wise test (AND)
  • TEQ: bit-wise test for equality (XOR)
  • These instructions set only the NZCV bits of the CPSR
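A sketch of what the comparison instructions compute, in Python. The function names are invented; the C flag for CMP follows the ARM convention that carry means "no borrow". For TST and TEQ only N and Z are modelled here, since C comes from the shifter and V is left unchanged.

```python
MASK32 = 0xFFFFFFFF

def cmp_flags(rn, rm):
    """CMP Rn, Rm: flags of Rn - Rm; the difference itself is discarded."""
    rn &= MASK32
    rm &= MASK32
    result = (rn - rm) & MASK32
    return {
        "N": result >> 31,
        "Z": int(result == 0),
        "C": int(rn >= rm),                        # 1 means no borrow occurred
        "V": ((rn ^ rm) & (rn ^ result)) >> 31,    # signed overflow
    }

def tst_flags(rn, rm):
    """TST Rn, Rm: N and Z of Rn AND Rm."""
    result = rn & rm & MASK32
    return {"N": result >> 31, "Z": int(result == 0)}

def teq_flags(rn, rm):
    """TEQ Rn, Rm: N and Z of Rn EOR Rm (Z is set when the values are equal)."""
    result = (rn ^ rm) & MASK32
    return {"N": result >> 31, "Z": int(result == 0)}
```

A following conditional instruction can then test these flags, for instance EQ after CMP when the two registers held the same value.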

Load-Store Instructions
  • Transfer data between memory and processor registers
  • Single-register transfer
      • Data types supported are signed and unsigned words (32 bits), half-words, and bytes
  • Multiple-register transfer
      • Transfers multiple registers between memory and the processor in a single instruction
  • Swap
      • Swaps the content of a memory location with the contents of a register

Single Transfer Instructions
  • Load and store data
      • LDR, LDRH, LDRB: load (word, half-word, byte)
      • STR, STRH, STRB: store (word, half-word, byte)
  • Supports different addressing modes:
      • 3 primary addressing modes: preindex with writeback, preindex, postindex
      • About 9 derived addressing modes: immediate, register, scaled register, …

Addressing Modes (1)
  • Preindex with writeback
        LDR r0, [r1, #4]!
      • Updates the address base register with the new address

Addressing Modes (2)
  • Preindex (immediate offset)
        LDR r0, [r1, #4]
      • A 12-bit offset is added to the base register

Addressing Modes (3)
  • Postindex
        LDR r0, [r1], #4
      • Updates the address register after the address is used

Example (1)
  • Initial:
        r0 = 0x0000
        r1 = 0x00009000
        mem32[0x00009000] = 0x0101
        mem32[0x00009004] = 0x0202
  • Preindexing with writeback: LDR r0, [r1, #4]!
        r0 = 0x0202
        r1 = 0x00009004
  • Preindexing: LDR r0, [r1, #4]
        r0 = 0x0202
        r1 = 0x00009000

Example (2)
  • Initial:
        r0 = 0x0000
        r1 = 0x00009000
        mem32[0x00009000] = 0x0101
        mem32[0x00009004] = 0x0202
  • Postindexing: LDR r0, [r1], #4
        r0 = 0x0101
        r1 = 0x00009004
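The three primary addressing modes from the examples above can be replayed with a small Python model. The ldr helper, its mode strings, and the use of dictionaries for registers and memory are all inventions of this sketch; the initial values are the ones given in the examples.

```python
def initial_state():
    """Registers and word-addressed memory as given in the examples."""
    regs = {"r0": 0x0000, "r1": 0x00009000}
    mem = {0x00009000: 0x0101, 0x00009004: 0x0202}
    return regs, mem

def ldr(regs, mem, rd, rn, offset, mode):
    """mode is 'preindex', 'preindex_wb', or 'postindex' (names are ours)."""
    base = regs[rn]
    if mode == "postindex":
        regs[rd] = mem[base]           # use the base address as it is
        regs[rn] = base + offset       # then update the base register
    else:
        addr = base + offset           # offset is applied before the access
        regs[rd] = mem[addr]
        if mode == "preindex_wb":
            regs[rn] = addr            # '!' writes the new address back
    return regs
```

Starting each case from initial_state() reproduces the register values listed in Example (1) and Example (2).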

Derived Addressing Modes
  • Register indirect: LDR r0, [r1]
  • Register operation: LDR r0, [r1, -r2]
      • The calculated address uses the base register and another register
  • Scaled: LDR r0, [r1, r2, LSL #2]
      • The address is calculated using the base address register and a barrel shift operation
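The address arithmetic behind these derived modes can be sketched as a single Python function. The name effective_address and its parameters are hypothetical; the scaled form with a shift of 2 corresponds to indexing an array of 32-bit words.

```python
MASK32 = 0xFFFFFFFF

def effective_address(base, offset=0, shift=0, subtract=False):
    """Address used by LDR/STR for [base], [base, +/-reg] and [base, reg, LSL #n]."""
    scaled = (offset << shift) & MASK32        # optional barrel shift of the offset
    addr = base - scaled if subtract else base + scaled
    return addr & MASK32

indirect = effective_address(0x9000)                       # [r1]
register = effective_address(0x9000, 0x10, subtract=True)  # [r1, -r2], r2 = 0x10
scaled = effective_address(0x9000, 3, shift=2)             # [r1, r2, LSL #2], r2 = 3
```

In the scaled case, index 3 selects the fourth word of an array whose base address is in r1.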

Example: C assignments
  • C:
        x = (a + b) - c;
  • Assembler:
        ADR r4, a       ; get address for a
        LDR r0, [r4]    ; get value of a
        ADR r4, b       ; get address for b, reusing r4
        LDR r1, [r4]    ; get value of b
        ADD r3, r0, r1  ; compute a+b
        ADR r4, c       ; get address for c
        LDR r2, [r4]    ; get value of c

C assignment, cont’d.
        SUB r3, r3, r2  ; complete computation of x
        ADR r4, x       ; get address for x
        STR r3, [r4]    ; store value of x

Example: C assignment
  • C:
        y = a*(b+c);
  • Assembler:
        ADR r4, b       ; get address for b
        LDR r0, [r4]    ; get value of b
        ADR r4, c       ; get address for c
        LDR r1, [r4]    ; get value of c
        ADD r2, r0, r1  ; compute partial result
        ADR r4, a       ; get address for a
        LDR r0, [r4]    ; get value of a

C assignment, cont’d.
        MUL r2, r2, r0  ; compute final value for y
        ADR r4, y       ; get address for y
        STR r2, [r4]    ; store y

Example: C assignment
  • C:
        z = (a << 2) | (b & 15);
  • Assembler:
        ADR r4, a           ; get address for a
        LDR r0, [r4]        ; get value of a
        MOV r0, r0, LSL #2  ; perform shift
        ADR r4, b           ; get address for b
        LDR r1, [r4]        ; get value of b
        AND r1, r1, #15     ; perform AND
        ORR r1, r0, r1      ; perform OR

C assignment, cont’d.
        ADR r4, z       ; get address for z
        STR r1, [r4]    ; store value for z
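To check that the instruction sequence for z really computes the C expression, the register steps can be mirrored one by one in Python. This is only a trace of the data flow under the assumption of unsigned 32-bit values; the function name compute_z is invented, and the ADR address calculations are left out.

```python
MASK32 = 0xFFFFFFFF

def compute_z(a, b):
    """Mirror the register steps of z = (a << 2) | (b & 15)."""
    r0 = a & MASK32             # LDR r0, [r4]        ; value of a
    r0 = (r0 << 2) & MASK32     # MOV r0, r0, LSL #2  ; perform shift
    r1 = b & MASK32             # LDR r1, [r4]        ; value of b
    r1 = r1 & 15                # AND r1, r1, #15     ; perform AND
    r1 = r0 | r1                # ORR r1, r0, r1      ; perform OR
    return r1                   # STR r1, [r4]        ; store z
```

The same approach works for the other two assignments by replacing the middle steps with the corresponding ADD, SUB, or MUL.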

Multiple Register Transfer
  • Load-store multiple instructions transfer the contents of multiple registers between memory and the processor in a single instruction
  • More efficient for moving blocks of memory and for saving and restoring context and stack

Multiple Byte Load-Store
  • Any subset of the current bank of registers can be transferred to memory or fetched from memory
      • LDM: load multiple
      • STM: store multiple
  • Syntax:
        <LDM|STM>{<cond>}<addressing mode> Rn{!}, <registers>{^}
  • The base register Rn determines the source or destination address
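As a sketch of how a block transfer walks through memory, the following Python model implements only the increment-after (IA) addressing mode, one of the four LDM/STM modes (IA, IB, DA, DB). Registers are keyed by their number, memory is a dictionary of word addresses, and the function names are invented for the example. The case where the base register also appears in the register list is not modelled.

```python
def stmia(mem, regs, rn, reglist, writeback=False):
    """STMIA Rn{!}, {reglist}: store registers to ascending addresses."""
    addr = regs[rn]
    for r in sorted(reglist):      # lowest-numbered register, lowest address
        mem[addr] = regs[r]
        addr += 4                  # one 32-bit word per register
    if writeback:
        regs[rn] = addr            # '!' updates the base register

def ldmia(mem, regs, rn, reglist, writeback=False):
    """LDMIA Rn{!}, {reglist}: load registers from ascending addresses."""
    addr = regs[rn]
    for r in sorted(reglist):
        regs[r] = mem[addr]
        addr += 4
    if writeback:
        regs[rn] = addr

regs = {0: 0x9000, 1: 11, 2: 22, 3: 33, 4: 0, 5: 0, 6: 0}
mem = {}
stmia(mem, regs, 0, [3, 1, 2], writeback=True)   # STMIA r0!, {r1-r3}
regs[0] = 0x9000                                 # point back at the block
ldmia(mem, regs, 0, [4, 5, 6])                   # LDMIA r0, {r4-r6}
```

The order in which registers are written in the list does not matter; the transfer always pairs lower register numbers with lower addresses.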

Address Modes (load-store multiple)

Load/Store Multiple Addressing

Control Flow Instructions
  • Branch instructions
  • Conditional branches
  • Conditional execution
  • Branch and link instructions
  • Subroutine return instructions

Branch Instruction
  • Branch instruction: B label
      • Example: B forward
      • The address of the label is stored in the instruction as a signed pc-relative offset
  • Conditional branch: B<cond> label
      • Example: BNE loop
      • The branch has a condition associated with it and is executed only if the condition codes have the correct value
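The signed pc-relative offset can be illustrated with a short encoder and decoder in Python. The details assumed here come from the classic 32-bit ARM branch encoding rather than from the slides: a 24-bit word offset, and a program counter that reads as the branch address plus 8. The function names are invented.

```python
MASK32 = 0xFFFFFFFF

def encode_branch_offset(branch_addr, target_addr):
    """Return the 24-bit offset field for a branch at branch_addr."""
    offset = target_addr - (branch_addr + 8)     # pc reads two instructions ahead
    if offset % 4 != 0:
        raise ValueError("target must be word aligned")
    words = offset >> 2                          # offset is stored in words
    if not -(1 << 23) <= words < (1 << 23):
        raise ValueError("target out of range for a 24-bit offset")
    return words & 0xFFFFFF                      # two's complement, 24 bits

def decode_branch_target(branch_addr, field):
    """Recover the target address from a 24-bit offset field."""
    if field & 0x800000:                         # sign-extend the field
        field -= 1 << 24
    return (branch_addr + 8 + (field << 2)) & MASK32
```

A branch to its own address, for example, needs an offset of minus two words, which appears in the field as 0xFFFFFE.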

Conditional Execution
  • An unusual feature of the ARM instruction set is that conditional execution applies not only to branches but to all ARM instructions
  • Example: ADDEQ r0, r1, r2
      • The instruction is executed only when the zero flag is set to 1
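One way to picture conditional execution is as a check on the N, Z, C, and V flags before the instruction takes effect. The Python sketch below lists the standard ARM condition codes with the flag tests they imply; the table and the helper add_cond are written for this example, and the slides themselves only show the EQ case.

```python
# Flag test for each condition suffix; f is a dict with keys N, Z, C, V.
CONDITIONS = {
    "EQ": lambda f: f["Z"] == 1,                      # equal
    "NE": lambda f: f["Z"] == 0,                      # not equal
    "CS": lambda f: f["C"] == 1,                      # carry set / unsigned >=
    "CC": lambda f: f["C"] == 0,                      # carry clear / unsigned <
    "MI": lambda f: f["N"] == 1,                      # negative
    "PL": lambda f: f["N"] == 0,                      # positive or zero
    "VS": lambda f: f["V"] == 1,                      # overflow
    "VC": lambda f: f["V"] == 0,                      # no overflow
    "HI": lambda f: f["C"] == 1 and f["Z"] == 0,      # unsigned >
    "LS": lambda f: f["C"] == 0 or f["Z"] == 1,       # unsigned <=
    "GE": lambda f: f["N"] == f["V"],                 # signed >=
    "LT": lambda f: f["N"] != f["V"],                 # signed <
    "GT": lambda f: f["Z"] == 0 and f["N"] == f["V"], # signed >
    "LE": lambda f: f["Z"] == 1 or f["N"] != f["V"],  # signed <=
    "AL": lambda f: True,                             # always
}

def add_cond(cond, flags, regs, rd, rn, rm):
    """Model ADD<cond> Rd, Rn, Rm: the add happens only if cond passes."""
    if CONDITIONS[cond](flags):
        regs[rd] = (regs[rn] + regs[rm]) & 0xFFFFFFFF
    return regs
```

With the zero flag set, add_cond("EQ", ...) updates the destination register; with it clear, the registers are left exactly as they were, as if the instruction were a no-op.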

Thumb
  • Thumb encodes a subset of the 32-bit instruction set into a 16-bit subspace
  • Thumb has higher performance than ARM on a processor with a 16-bit data bus
  • Thumb has higher code density
      • Suited to memory-constrained embedded systems
  • On average, a Thumb implementation takes 30% less memory than the equivalent ARM implementation (source: ARM System Developer’s Guide)

Thumb Instruction Decoding
  • Each Thumb instruction is related to a 32-bit ARM instruction.

ARMv5E Extensions
  • Extensions to facilitate signal processing operations
  • Support for signed multiply-accumulate instructions
  • Greater flexibility and efficiency when manipulating 16-bit values, for applications such as 16-bit digital audio processing

Summary
  • We have studied the instruction set of ARM processors
  • We discussed the use of the barrel shifter
  • We studied various addressing modes
  • We have examined the Thumb mode of operation