The ARM Architecture with focus on Cortex-M3

The ARM Architecture (with focus on Cortex-M3)
Joe Bungo, Applications Engineer, ARM University Program

Agenda
• Introduction to ARM Ltd (current section)
• ARM Architecture/Programmer's Model
• Data Path and Pipelines
• System Design
• Development Tools

ARM Ltd
• Founded in November 1990
• Spun out of Acorn Computers
• Initial funding from Apple, Acorn and VLSI
• Designs the ARM range of RISC processor cores
• Licenses ARM core designs to semiconductor partners, who fabricate them and sell to their customers
  • ARM does not fabricate silicon itself
• Also develops technologies to assist with the design-in of the ARM architecture
  • Software tools, boards, debug hardware
  • Application software
  • Bus architectures
  • Peripherals, etc.

ARM's Activities
[Diagram: activities around the SoC: Processors; System-Level IP (data engines, fabric, 3D graphics); Physical IP (memory); Software IP; Development Tools; Connected Community]

ARM Connected Community – 700+

Huge Range of Applications
Adopting 32-bit ARM microcontrollers: intelligent toys, utility meters, IR fire detectors, exercise machines, energy-efficient appliances, tele-parking equipment, intelligent vending

World's Smallest ARM Computer?
[Diagram: wireless sensor-network node, University of Michigan]
• Cortex-M0 + 16 KB RAM, sensors and timers
• 65 nm UWB radio and antenna
• 10 kB storage memory, ~3 fW/bit
• 12 µAh Li-ion battery
• Wirelessly networked into large-scale sensor arrays
• Cortex-M0; 65¢

World's Largest ARM Computer?
• 4200 ARM-powered neutrino detectors
• 70 bore holes, 2.5 km deep
• 60 detectors per string, starting 1.5 km down
• 1 km³ of active telescope
Work supported by the National Science Foundation and University of Wisconsin-Madison

From 1 mm³ to 1 km³
[Figure: ARM-based systems scaled from 1 mm³ to 1 km³ across Embedded, Consumer, Mobile, Home, Computing, PC, Enterprise, Server and HPC; cost labels from 10¢ to $1000]

Agenda
• Introduction to ARM Ltd
• ARM Architecture/Programmer's Model (current section)
• Data Path and Pipelines
• System Design
• Development Tools

ARM Cortex Processors (v7)
• ARM Cortex-A family (v7-A):
  • Applications processors for full OS and 3rd-party applications
• ARM Cortex-R family (v7-R):
  • Embedded processors for real-time signal processing and control applications
• ARM Cortex-M family (v7-M):
  • Microcontroller-oriented processors for MCU and SoC applications
[Chart: from about 12k gates to 2.5 GHz: Cortex-M0, Cortex-M1, SC300™, Cortex™-M3, Cortex-M4, Cortex-R4, "Heron" (R), Cortex-A5, Cortex-A8, Cortex-A9, Cortex-A15; several cores offered in x1-4 multicore configurations]

Cortex family
• Cortex-A8: Architecture v7-A, MMU, AXI, VFP & NEON support
• Cortex-R4: Architecture v7-R, MPU (optional), AXI, dual issue
• Cortex-M3: Architecture v7-M, MPU (optional), AHB-Lite & APB

Relative Performance*
[Chart: maximum frequency (MHz, axis 0-2500) and minimum power (mW/MHz) per core]
• Cortex-M0, Cortex-M3, ARM7: 50 and 184 MHz; 0.012, 0.06 and 0.35 mW/MHz
• ARM926: 470 MHz; 0.235 mW/MHz
• ARM1026, ARM1136, ARM1176: 540, 610 and 750 MHz; 0.36, 0.335 and 0.568 mW/MHz
• Cortex-A8: 1100 MHz; 0.43 mW/MHz
• Cortex-A9 dual-core: 2000 MHz; 0.5 mW/MHz
*Represents attainable speeds in 130, 90, 65, or 45 nm processes

Data Sizes and Instruction Sets
• The ARM is a 32-bit architecture.
• When used in relation to the ARM:
  • Byte means 8 bits
  • Halfword means 16 bits (two bytes)
  • Word means 32 bits (four bytes)
• Most ARM cores implement two instruction sets:
  • 32-bit ARM instruction set
  • 16-bit Thumb instruction set
• Jazelle cores can also execute Java bytecode

ARM and Thumb Performance
[Chart: Dhrystone 2.1/sec @ 20 MHz versus memory width (zero wait state)]

The Thumb-2 instruction set
• Variable-length instructions
  • ARM instructions are a fixed length of 32 bits
  • Thumb instructions are a fixed length of 16 bits
  • Thumb-2 instructions can be either 16-bit or 32-bit
• Thumb-2 gives approximately 26% improvement in code density over ARM
• Thumb-2 gives approximately 25% improvement in performance over Thumb
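Thumb-2 mixes 16-bit and 32-bit encodings, so a decoder must inspect the first halfword of each instruction to find its length. The sketch below assumes the ARMv7-M rule: an instruction is 32 bits wide when bits [15:11] of its first halfword are 0b11101, 0b11110 or 0b11111. It also assumes the halfwords are already assembled in fetch order. `thumb_is_32bit` and `thumb_count_instructions` are hypothetical helper names.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if 'first_halfword' starts a 32-bit Thumb-2 instruction,
   i.e. its bits [15:11] are 0b11101, 0b11110 or 0b11111. */
static bool thumb_is_32bit(uint16_t first_halfword)
{
    unsigned top5 = (unsigned)(first_halfword >> 11);
    return top5 == 0x1Du || top5 == 0x1Eu || top5 == 0x1Fu;
}

/* Counts instructions in a halfword stream by stepping over either
   one halfword (16-bit instruction) or two (32-bit instruction). */
static unsigned thumb_count_instructions(const uint16_t *hw, unsigned n_halfwords)
{
    unsigned count = 0;
    unsigned i = 0;
    while (i < n_halfwords) {
        i += thumb_is_32bit(hw[i]) ? 2u : 1u;
        ++count;
    }
    return count;
}
```

For example, the stream 0xF000, 0xF800, 0x4770 contains a 32-bit BL with zero offset followed by a 16-bit BX LR, so it counts as two instructions.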

Cortex-M Programmer's Model
• Fully programmable in C
• Stack-based exception model
• Only two processor modes
  • Thread Mode for user tasks
  • Handler Mode for OS tasks and exceptions
• Vector table contains addresses (of the exception handlers)
[Register diagram: r0-r12, sp (banked: Main SP and Process SP), lr, r15 (pc), xPSR]
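The vector table can be made concrete with a little address arithmetic. This is host-side C, not startup code for any particular device. It assumes the ARMv7-M layout: word 0 holds the initial main stack pointer, word N holds the handler address for exception number N, and external interrupt IRQn is exception number 16 + IRQn. `vector_offset` and `irq_to_exception` are hypothetical names.

```c
#include <stdint.h>

/* Byte offset of the vector for 'exception_number' from the start of the
   vector table; every entry is one 32-bit word (assumed ARMv7-M layout). */
static uint32_t vector_offset(uint32_t exception_number)
{
    return exception_number * 4u;
}

/* Exception number of external interrupt line 'irqn' (0..239):
   numbers 0-15 are the system exceptions, so IRQs start at 16. */
static uint32_t irq_to_exception(uint32_t irqn)
{
    return 16u + irqn;
}
```

Under these assumptions, SysTick (exception 15) sits at offset 0x3C and the first external interrupt at offset 0x40.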

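Cortex-M3 Processor Privilege
[Diagram: Privileged: Handler Mode (the supervisor role) handles reset, interrupts, aborts, OS system calls (SVCall) and undefined instructions. Non-privileged (User): Thread Mode runs application code. Both access memory (instructions & data).]
• Thread Mode starts privileged after reset and can be switched to non-privileged through the CONTROL register.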

Cortex-M3 Interrupt Handling
• One Non-Maskable Interrupt (INTNMI) supported
• 1-240 prioritizable interrupts supported
  • Interrupts can be masked
  • Implementation option selects number of interrupts supported
• Nested Vectored Interrupt Controller (NVIC) is tightly coupled with the processor core
• Interrupt inputs are active HIGH
[Diagram: INTNMI and INTISR[239:0] (1-240 interrupts) feed the NVIC, which sits inside the Cortex-M3 next to the processor core]
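The NVIC is programmed through memory-mapped registers. As a host-side sketch, the helpers below compute which register and bit control a given interrupt. They assume the ARMv7-M System Control Space addresses:

• Interrupt Set-Enable Registers start at 0xE000E100, with one bit per interrupt and 32 interrupts per word.
• Interrupt Priority Registers start at 0xE000E400, with one byte per interrupt.

The helper names are hypothetical. The number of implemented priority bits is device-specific (ARMv7-M requires at least 3), so it is passed in as a parameter.

```c
#include <stdint.h>

#define NVIC_ISER_BASE 0xE000E100u /* Interrupt Set-Enable Registers (assumed ARMv7-M address) */
#define NVIC_IPR_BASE  0xE000E400u /* Interrupt Priority Registers, one byte per IRQ */

/* Address of the ISER word that contains the enable bit for IRQ 'irqn' (0..239). */
static uint32_t nvic_iser_addr(uint32_t irqn)
{
    return NVIC_ISER_BASE + 4u * (irqn >> 5);
}

/* Mask to write to that word to enable IRQ 'irqn' (bits are write-1-to-set). */
static uint32_t nvic_iser_mask(uint32_t irqn)
{
    return 1u << (irqn & 31u);
}

/* Address of the priority byte for IRQ 'irqn'. */
static uint32_t nvic_ipr_addr(uint32_t irqn)
{
    return NVIC_IPR_BASE + irqn;
}

/* Places 'level' in the top 'prio_bits' bits of the priority byte,
   clamping it to the largest representable level. Lower = more urgent. */
static uint8_t nvic_encode_priority(uint32_t level, uint32_t prio_bits)
{
    uint32_t max_level = (1u << prio_bits) - 1u;
    if (level > max_level)
        level = max_level;
    return (uint8_t)(level << (8u - prio_bits));
}
```

On a target, enabling IRQ n would then be a single store such as `*(volatile uint32_t *)nvic_iser_addr(n) = nvic_iser_mask(n);`. Because ISER bits are write-1-to-set, the zero bits in that store leave other interrupts' enables unchanged.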

Cortex-M3 Exception Handling
• Reset: power-on or system reset
• NMI: cannot be stopped or preempted by any exception other than reset
• Faults
  • Hard Fault: the default fault, also taken when another fault cannot activate its own handler
  • Memory Manage: MPU violations
  • Bus Fault: prefetch and memory access violations
  • Usage Fault: undefined instructions, divide by zero (when that trap is enabled), etc.
• SVCall: privileged OS requests
• Debug Monitor: debug monitor program
• PendSV: pendable request for system service (typically used by an RTOS for context switching)
• SysTick Interrupt: internal system timer, e.g., used by an RTOS to periodically check resources or peripherals
• External Interrupt: e.g., external peripherals
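Each exception in this list has a fixed exception number, and that number is what software sees in IPSR. The sketch below assumes the ARMv7-M numbering:

• 1 Reset, 2 NMI, 3 HardFault, 4 MemManage, 5 BusFault, 6 UsageFault
• 11 SVCall, 12 Debug Monitor, 14 PendSV, 15 SysTick
• 16 and above: external interrupts
• 7-10 and 13: reserved

It also encodes the rule that Reset, NMI and HardFault have fixed priorities of -3, -2 and -1, above every configurable exception. `exception_name` and `exception_fixed_priority` are hypothetical names.

```c
#include <stdint.h>

/* Name of an ARMv7-M exception number, e.g. as read from IPSR bits [8:0]. */
static const char *exception_name(uint32_t n)
{
    switch (n) {
    case 0:  return "Thread mode (no active exception)";
    case 1:  return "Reset";
    case 2:  return "NMI";
    case 3:  return "HardFault";
    case 4:  return "MemManage";
    case 5:  return "BusFault";
    case 6:  return "UsageFault";
    case 11: return "SVCall";
    case 12: return "DebugMonitor";
    case 14: return "PendSV";
    case 15: return "SysTick";
    default: return n >= 16u ? "External interrupt" : "Reserved";
    }
}

/* Reset, NMI and HardFault have fixed negative priorities (-3, -2, -1).
   Returns 1 and stores the priority for those; returns 0 for exceptions
   whose priority is configurable (levels 0 and above). */
static int exception_fixed_priority(uint32_t n, int *prio)
{
    if (n >= 1u && n <= 3u) {
        *prio = (int)n - 4; /* 1 -> -3, 2 -> -2, 3 -> -1 */
        return 1;
    }
    return 0;
}
```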

Cortex-M3 Program Status Register
[Bit layout: N (31), Z (30), C (29), V (28), Q (27), IT/ICI (26:25), T (24), IT/ICI (15:10), ISR number (8:0)]
• One status register consisting of:
  • APSR - Application Program Status Register: ALU flags
  • IPSR - Interrupt Program Status Register: interrupt/exception number
  • EPSR - Execution Program Status Register
    • IT field: If-Then block information
    • ICI field: Interruptible-Continuable Instruction information
• xPSR
  • Composite of the 3 PSRs
  • Stored on the stack on exception entry
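Because the xPSR is stacked on exception entry, a fault handler often has to pick it apart. This sketch assumes the ARMv7-M bit positions:

• N, Z, C, V and Q in bits 31-27
• T in bit 24
• exception number in bits 8:0
• the 8-bit IT state split between bits 26:25 (IT[1:0]) and 15:10 (IT[7:2])

The type and function names are hypothetical.

```c
#include <stdint.h>

/* Decoded fields of an ARMv7-M xPSR word (hypothetical helper type). */
typedef struct {
    unsigned n, z, c, v, q; /* APSR flags, bits 31..27 */
    unsigned t;             /* EPSR Thumb bit, bit 24 */
    unsigned exception;     /* IPSR exception number, bits 8..0 */
} xpsr_fields;

static xpsr_fields xpsr_decode(uint32_t xpsr)
{
    xpsr_fields f;
    f.n = (xpsr >> 31) & 1u;
    f.z = (xpsr >> 30) & 1u;
    f.c = (xpsr >> 29) & 1u;
    f.v = (xpsr >> 28) & 1u;
    f.q = (xpsr >> 27) & 1u;
    f.t = (xpsr >> 24) & 1u;
    f.exception = xpsr & 0x1FFu;
    return f;
}

/* Reassembles the 8-bit IT state: IT[1:0] from bits 26:25 and
   IT[7:2] from bits 15:10. */
static uint32_t xpsr_it_state(uint32_t xpsr)
{
    return ((xpsr >> 25) & 0x3u) | (((xpsr >> 10) & 0x3Fu) << 2);
}
```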

Conditional Execution
• If-Then (IT) instruction added (16-bit)
  • Up to 3 additional "then" or "else" conditions may be specified (T or E)
  • Makes up to 4 following instructions conditional
  • e.g. ITTET EQ makes Inst 1-4 execute as MOVEQ, ADDEQ, SUBNE, ORREQ
• Any normal ARM condition code can be used
• 16-bit instructions in the block do not affect condition code flags
  • Apart from comparison instructions
• 32-bit instructions may affect flags (normal rules apply)
• Current "if-then status" is stored in the PSR (the EPSR IT field on Cortex-M3)
  • Conditional block may be safely interrupted and returned to
  • Must NOT branch into or out of an 'if-then' block
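How an IT mnemonic expands into per-instruction conditions can be sketched as a small host-side model. The first instruction always uses the base condition. Each following T or E letter adds one more instruction that uses the base condition (T) or its inverse (E). `it_block_pattern` is a hypothetical helper, not part of any toolchain API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Given the letters after "IT" (e.g. "TET" for ITTET), fills then_slot[i]
   with true if instruction i runs on the base condition and false if it
   runs on the inverse. Returns the block length (1..4), or 0 if the
   suffix is longer than 3 letters or contains anything but T/E. */
static size_t it_block_pattern(const char *suffix, bool then_slot[4])
{
    size_t len = 1;
    then_slot[0] = true; /* first instruction always uses the base condition */
    for (const char *p = suffix; *p != '\0'; ++p) {
        if (len == 4)
            return 0; /* at most 3 extra conditions */
        if (*p == 'T')
            then_slot[len] = true;
        else if (*p == 'E')
            then_slot[len] = false;
        else
            return 0;
        ++len;
    }
    return len;
}
```

For ITTET EQ this gives EQ, EQ, NE, EQ, matching the MOVEQ/ADDEQ/SUBNE/ORREQ example. A Thumb-2 C compiler may emit such blocks for short conditionals like `x = (a == b) ? x + 1 : x - 1;`, but whether it does depends on the compiler and its options.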

Classes of Instructions (v4T)
[Diagram: four classes: Load/Store, Data Operations, Change of Flow (Bcc, BL, BLX, MOV PC, Rm) and Miscellaneous]

Data Processing Instructions
• Consist of:
  • Arithmetic: ADD ADC SUB SBC RSB RSC (RSC exists only in the ARM instruction set, not in Thumb-2)
  • Logical: AND ORR EOR BIC
  • Comparisons: CMP CMN TST TEQ
  • Data movement: MOV MVN
• These instructions only work on registers, NOT memory.
• Syntax: <Operation>{<cond>}{S} Rd, Rn, Operand2
  • Comparisons set flags only; they do not specify Rd
  • Data movement does not specify Rn
• The second operand is sent to the ALU via the barrel shifter.
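Comparisons are subtractions whose result is discarded, so their whole effect lies in the flags. The host-side sketch below models CMP Rn, Operand2. It uses the ARM convention that C is set when the subtraction does not borrow, which is the opposite of x86. It also models RSB (reverse subtract) and BIC (bit clear), two mnemonics that are easy to misread. All function and type names are hypothetical.

```c
#include <stdint.h>

typedef struct { unsigned n, z, c, v; } nzcv;

/* Flags produced by CMP Rn, Op2, i.e. Rn - Op2 with the result discarded. */
static nzcv cmp_flags(uint32_t rn, uint32_t op2)
{
    uint32_t res = rn - op2;                      /* wraps modulo 2^32 */
    nzcv f;
    f.n = res >> 31;                              /* negative: sign bit of result */
    f.z = (res == 0u);                            /* zero */
    f.c = (rn >= op2);                            /* carry = NOT borrow */
    f.v = ((rn ^ op2) & (rn ^ res)) >> 31;        /* signed overflow */
    return f;
}

/* RSB Rd, Rn, Op2 computes Op2 - Rn. */
static uint32_t rsb(uint32_t rn, uint32_t op2) { return op2 - rn; }

/* BIC Rd, Rn, Op2 clears the bits of Rn that are set in Op2. */
static uint32_t bic(uint32_t rn, uint32_t op2) { return rn & ~op2; }
```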

Using a Barrel Shifter: The 2nd Operand
[Diagram: Operand 1 goes straight to the ALU; Operand 2 passes through the barrel shifter first; the ALU produces the result]
• Register, optionally with shift operation
  • Shift value can be either:
    • 5-bit unsigned integer
    • Specified in the bottom byte of another register
  • Used for multiplication by a constant
• Immediate value
  • 8-bit number, with a range of 0-255
  • Rotated right through an even number of positions
  • Allows an increased range of 32-bit constants to be loaded directly into registers
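The immediate rule can be checked mechanically: a constant is usable only if some even right-rotation of an 8-bit value produces it. The sketch below tests that rule for the ARM (A32) encoding described here. Thumb-2 on Cortex-M3 uses a different "modified immediate" scheme, so the set of encodable constants differs between the two. `ror32`, `arm_encode_imm` and `times5` are hypothetical names. `times5` mirrors the shift-and-add trick (ADD Rd, Rn, Rn, LSL #2) used for multiplying by a constant.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 32-bit value right by n bits (n taken modulo 32). */
static uint32_t ror32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x >> n) | (x << (32u - n)) : x;
}

/* If 'value' equals some 8-bit constant rotated right by an even amount
   (0, 2, ..., 30), stores the 4-bit rotate field (rotation / 2) and the
   8-bit constant, and returns true. */
static bool arm_encode_imm(uint32_t value, unsigned *rot_field, unsigned *imm8)
{
    for (unsigned rot = 0; rot < 32u; rot += 2u) {
        /* Undo a right rotation by 'rot' by rotating right by 32 - rot. */
        uint32_t unrotated = ror32(value, (32u - rot) & 31u);
        if (unrotated <= 0xFFu) {
            *rot_field = rot / 2u;
            *imm8 = (unsigned)unrotated;
            return true;
        }
    }
    return false;
}

/* x * 5 computed as x + (x << 2), as ADD Rd, Rn, Rn, LSL #2 would. */
static uint32_t times5(uint32_t x) { return x + (x << 2); }
```

For example, 0x104 is encodable as 0x41 rotated right by 30, while 0x101 is not encodable, because its set bits are too far apart to fit in one rotated 8-bit field.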

Single Register Data Transfer
• LDR / STR: word
• LDRB / STRB: byte
• LDRH / STRH: halfword
• LDRSB: signed byte load
• LDRSH: signed halfword load
• Memory system must support all access sizes
• Syntax:
  • LDR{<cond>}{<size>} Rd, <address>
  • STR{<cond>}{<size>} Rd, <address>
  • e.g. LDREQB (written LDRBEQ in unified assembler syntax)
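Signed and unsigned loads differ only in how they fill the upper bits of Rd. The host-side sketch below models LDRB/LDRSB and LDRH/LDRSH over a byte array that stands in for memory. It assumes a little-endian memory system, the usual Cortex-M3 configuration, although endianness is chosen by the implementation. The function names mirror the mnemonics but are hypothetical helpers, not a real API.

```c
#include <stdint.h>

/* LDRB: zero-extend a byte. */
static uint32_t ldrb(const uint8_t *mem, uint32_t addr)
{
    return mem[addr];
}

/* LDRSB: sign-extend a byte (flip-and-subtract avoids implementation-defined casts). */
static uint32_t ldrsb(const uint8_t *mem, uint32_t addr)
{
    return ((uint32_t)mem[addr] ^ 0x80u) - 0x80u;
}

/* LDRH: zero-extend a little-endian halfword. */
static uint32_t ldrh(const uint8_t *mem, uint32_t addr)
{
    return (uint32_t)mem[addr] | ((uint32_t)mem[addr + 1u] << 8);
}

/* LDRSH: sign-extend a little-endian halfword. */
static uint32_t ldrsh(const uint8_t *mem, uint32_t addr)
{
    return (ldrh(mem, addr) ^ 0x8000u) - 0x8000u;
}
```

In C source, reads through `int8_t` or `int16_t` typically compile to LDRSB or LDRSH, and reads through `uint8_t` or `uint16_t` to LDRB or LDRH. The exact instruction choice is left to the compiler.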

Agenda
• Introduction to ARM Ltd
• ARM Architecture/Programmer's Model
• Data Path and Pipelines (current section)
• System Design
• Development Tools

Cortex-M3 Datapath
[Diagram: blocks and signals: instruction bus (I_HADDR, I_HRDATA) and instruction decode; register bank with A and B operand buses; barrel shifter, ALU and Mul/Div; address register and address incrementer driving D_HADDR; read data register (D_HRDATA) and write data register (D_HWDATA); writeback to the register bank; INTADDR]

Cortex-M3 Pipeline
• Cortex-M3 has a 3-stage fetch-decode-execute pipeline
  • Similar to ARM7
  • Cortex-M3 does more in each stage to increase overall performance
[Diagram:
 1st stage, Fetch: instruction fetch (prefetch)
 2nd stage, Decode: AGU, instruction decode & register read, branch forwarding & speculation
 3rd stage, Execute: address phase & write back; data phase (load/store & branch); multiply & divide; shift, ALU & branch; write; execute-stage branch (ALU branch & load/store branch)]
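In an ideal 3-stage pipeline with no stalls, a new instruction enters every cycle, so N instructions finish in N + 2 cycles rather than 3N. The sketch below models only that idealised timing. It ignores branches, multi-cycle instructions and the Cortex-M3's branch speculation, and the function names are hypothetical.

```c
/* Stage (0 = fetch, 1 = decode, 2 = execute) occupied by instruction
   'instr' (0-based) during 'cycle', or -1 if it is not in the pipeline.
   Instruction i is fetched in cycle i, decoded in i+1, executed in i+2. */
static int pipeline_stage(int instr, int cycle)
{
    int s = cycle - instr;
    return (s >= 0 && s <= 2) ? s : -1;
}

/* Cycles needed to complete n instructions: 2 cycles to fill the
   pipeline, then one instruction completes per cycle. */
static int pipeline_cycles(int n)
{
    return n > 0 ? n + 2 : 0;
}
```

In cycle 2, for instance, instruction 0 is executing, instruction 1 is being decoded and instruction 2 is being fetched.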

ARM10 vs. ARM11 Pipelines
• ARM10: Fetch (branch prediction, instruction fetch) → Issue (ARM or Thumb instruction decode) → Decode (register read) → Execute (shift + ALU, multiply) → Memory (memory access, multiply add) → Write (register write)
• ARM11: Fetch 1 → Fetch 2 → Decode → Issue → Shift / MAC 1 / Address → ALU / MAC 2 / Data Cache 1 → Saturate / MAC 3 / Data Cache 2 → Write back

Full Cortex-A8 Pipeline Diagram
[Diagram: 13-stage integer pipeline and 10-stage NEON pipeline]

Agenda
• Introduction to ARM Ltd
• ARM Architecture/Programmer's Model
• Data Path and Pipelines
• System Design (current section)
• Development Tools

An Example AMBA System
[Diagram: a high-performance ARM processor, high-bandwidth external memory interface, high-bandwidth on-chip RAM and a DMA bus master share the AHB (high performance, pipelined, burst support, multiple bus masters); an APB bridge connects the AHB to the APB (low power, non-pipelined, simple interface) carrying a UART, timer, keypad and PIO]

Agenda
• Introduction to ARM Ltd
• ARM Architecture/Programmer's Model
• Data Path and Pipelines
• System Design
• Development Tools (current section)

ARM Debug Architecture
• EmbeddedICE Logic
  • Provides breakpoints and processor/system access
• JTAG interface (ICE)
  • Converts debugger commands to JTAG signals
• Embedded Trace Macrocell (ETM)
  • Compresses real-time instruction and data access trace
  • Contains ICE features (trigger & filter logic)
• Trace port analyzer (TPA)
  • Captures trace in a deep buffer
[Diagram: an Ethernet debugger (+ optional trace tools) connects through the JTAG port to the TAP controller and EmbeddedICE logic, and through the trace port to the ETM, alongside the ARM core]

Keil Development Tools for ARM
• Includes ARM macro assembler, compilers (ARM RealView C/C++ Compiler, Keil CARM Compiler, or GNU compiler), ARM linker, Keil µVision Debugger and Keil µVision IDE
• Keil µVision Debugger accurately simulates on-chip peripherals (I2C, CAN, UART, SPI, interrupts, I/O ports, A/D and D/A converters, PWM, etc.)
• Evaluation limitations
  • 16 KB object code + 16 KB data limitation
  • Some linker restrictions, such as base addresses for code/constants
  • GNU tools provided are not restricted in any way
http://www.keil.com/demo/

University Resources
• http://www.arm.com/support/university/
• University@arm.com

Your Future at ARM…
• Graduate and Internship/Co-op Opportunities
  • Engineering: Memory, Validation, Performance, DFT, R&D, GPU and more!
  • Sales and Marketing: Corporate and Technical
  • Corporate: IT, Patents, Services (Training and Support), and Human Resources
• Incredible Culture and Comprehensive Benefit Package
  • Competitive Reward
  • Work/Life Balance
  • Personal Development
  • Brilliant Minds and Innovative Solutions
• Keep in Touch!
  • www.arm.com/about/careers

TI Panda Board
• OMAP4430 Processor
  • 1 GHz dual-core ARM Cortex-A9 (NEON+VFP)
  • C64x+ DSP
  • PowerVR SGX 3D GPU
  • 1080p video support
• POP Memory
  • 1 GB LPDDR2 RAM
• USB Powered
  • < 4 W max consumption (OMAP is a small % of that)
  • Many adapter options (car, wall, battery, solar, ...)

Project Ideas Using Panda
• OS Projects
  • OS porting to ARM/Cortex (TI OMAP)
  • MythTV system
  • "Super-Panda": stack of Pandas as compute engine and task distribution
  • Linux applications
• NEON Optimization Projects
  • Codec optimization in ffmpeg (pick your favorite codec)
  • Voice and image recognition
  • Open-source Flash player optimizations (swfdec)

Fin

Nokia N95 Multimedia Computer
• OMAP™ 2420 Applications Processor: ARM1136™ processor-based SoC, developed using Magma® Blast® family and winner of the 2005 INSIGHT Award for 'Most Innovative SoC'
• Symbian OS™ v9.2: operating system supporting ARM processor-based mobile devices, developed using ARM® RealView® Compilation Tools
• S60™ 3rd Edition: S60 platform supporting ARM processor-based mobile devices
• Mobiclip™ Video Codec: software video codec for ARM processor-based mobile devices
• ST WLAN Solution: ultra-low-power 802.11b/g WLAN chip with ARM9™ processor-based MAC
Connect. Collaborate. Create.

Beagle Board

Targeting community development
• $149, personally affordable
• >1000 participants and growing
• Active & technical community
• Wikis, blogs, promotion of community activity
• Open access to hardware documentation
• Opportunity to tinker and learn; freedom to innovate
• Addressing open source community needs
• Instant access to >10 million lines of code
• Free software

Fast, low power, flexible expansion
• OMAP3530 Processor
  • 600 MHz Cortex-A8
    • NEON+VFPv3
    • 16 KB/16 KB L1$
    • 256 KB L2$
  • 430 MHz C64x+ DSP
    • 32K/32K L1$
    • 48K L1D
    • 32K L2
  • PowerVR SGX GPU
  • 64K on-chip RAM
• POP Memory
  • 128 MB LPDDR RAM
  • 256 MB NAND flash
• Peripheral I/O (3” board)
  • DVI-D video out
  • SD/MMC+
  • S-Video out
  • USB 2.0 HS OTG
  • I2C, I2S, SPI, MMC/SD
  • JTAG
  • Stereo in/out
  • Alternate power
  • RS-232 serial
• USB Powered
  • 2 W maximum consumption
    • OMAP is a small % of that
  • Many adapter options
    • Car, wall, battery, solar, …

And more…
• Other features
  • 4 LEDs: USR0, USR1, PMU_STAT, PWR
  • 2 buttons: USER, RESET
  • 4 boot sources: SD/MMC, NAND flash, USB, serial
• On-going collaboration at BeagleBoard.org
  • Live chat via IRC for 24/7 community support
  • Links to software projects to download

Project Ideas Using Beagle
• OS Projects
  • OS porting to ARM/Cortex (TI OMAP)
  • MythTV system
  • "Super-Beagle": stack of Beagles as compute engine and task distribution
  • Linux applications
• NEON Optimization Projects
  • Codec optimization in ffmpeg (pick your favorite codec)
  • Voice and image recognition
  • Open-source Flash player optimizations (swfdec)