The ARM Architecture (with focus on Cortex-M3)

- Slides: 48
Joe Bungo, Applications Engineer, ARM University Program
Agenda
§ Introduction to ARM Ltd
  ARM Architecture/Programmer's Model
  Data Path and Pipelines
  System Design
  Development Tools
ARM Ltd
§ Founded in November 1990
  § Spun out of Acorn Computers
  § Initial funding from Apple, Acorn and VLSI
§ Designs the ARM range of RISC processor cores
  § Licenses ARM core designs to semiconductor partners who fabricate and sell to their customers
  § ARM does not fabricate silicon itself
§ Also develops technologies to assist with the design-in of the ARM architecture
  § Software tools, boards, debug hardware
  § Application software
  § Bus architectures
  § Peripherals, etc.
ARM’s Activities
[Diagram labels: Connected Community, Development Tools, Software IP, Processors, Memory, System Level IP (Data Engines, Fabric, 3D Graphics), Physical IP, SoC]
ARM Connected Community: 700+
Huge Range of Applications
Equipment Adopting 32-bit ARM Microcontrollers
§ Intelligent toys
§ Utility Meters
§ IR Fire Detector
§ Exercise Machines
§ Energy Efficient Appliances
§ Tele-parking
§ Intelligent Vending
World’s Smallest ARM Computer?
§ Wireless Sensor Network (University of Michigan)
§ Sensors, timers
§ Cortex-M0 + 16 KB RAM, 65 nm
§ UWB Radio antenna
§ 10 kB Storage memory, ~3 fW/bit
§ 12 µAh Li-ion Battery
§ Wirelessly networked into large scale sensor arrays
§ Cortex-M0; 65¢
World’s Largest ARM Computer?
§ 4200 ARM powered Neutrino Detectors
§ 70 bore holes 2.5 km deep
§ 60 detectors per string starting 1.5 km down
§ 1 km³ of active telescope
Work supported by the National Science Foundation and University of Wisconsin-Madison
From 1 mm³ to 1 km³
[Figure: scale running from 1 mm³ to 1 km³ and from 10¢ to $1000; segments labeled Embedded, Consumer, Mobile Computing, Home, Enterprise, PC, Server, HPC]
Agenda
  Introduction to ARM Ltd
§ ARM Architecture/Programmer's Model
  Data Path and Pipelines
  System Design
  Development Tools
ARM Cortex Processors (v7)
§ ARM Cortex-A family (v7-A):
  § Applications processors for full OS and 3rd party applications
§ ARM Cortex-R family (v7-R):
  § Embedded processors for real-time signal processing, control applications
§ ARM Cortex-M family (v7-M):
  § Microcontroller-oriented processors for MCU and SoC applications
[Figure labels: Cortex-A15, Cortex-A9, Cortex-A8, Cortex-A5, R Heron, Cortex-R4, Cortex-M4, Cortex™-M3, SC300™, Cortex-M1, Cortex-M0; annotations include "... 2.5 GHz", "x1-4", "1-2" and "12k gates ..."]
Cortex family
§ Cortex-A8
  § Architecture v7A
  § MMU
  § AXI
  § VFP & NEON support
§ Cortex-R4
  § Architecture v7R
  § MPU (optional)
  § AXI
  § Dual Issue
§ Cortex-M3
  § Architecture v7M
  § MPU (optional)
  § AHB Lite & APB
Relative Performance*
[Chart: Max Frequency (MHz), axis 0 to 2500, and Min Power (mW/MHz) per processor]
Processor            Max Freq (MHz)   Min Power (mW/MHz)
Cortex-M0            (see note)       0.012
Cortex-M3            (see note)       0.06
ARM7                 (see note)       0.35
ARM926               470              0.235
ARM1026              540              0.36
ARM1136              610              0.335
ARM1176              750              0.568
Cortex-A8            1100             0.43
Cortex-A9 Dual-core  2000             0.5
Note: only two Max Freq values (50 and 184) are legible for Cortex-M0, Cortex-M3 and ARM7, so they are not assigned to rows.
*Represents attainable speeds in 130, 90, 65, or 45 nm processes
Data Sizes and Instruction Sets
§ The ARM is a 32-bit architecture.
§ When used in relation to the ARM:
  § Byte means 8 bits
  § Halfword means 16 bits (two bytes)
  § Word means 32 bits (four bytes)
§ Most ARM cores implement two instruction sets
  § 32-bit ARM Instruction Set
  § 16-bit Thumb Instruction Set
§ Jazelle cores can also execute Java bytecode
ARM and Thumb Performance
[Chart: Dhrystone 2.1/sec @ 20 MHz versus memory width (zero wait state)]
The Thumb-2 instruction set
§ Variable-length instructions
  § ARM instructions are a fixed length of 32 bits
  § Thumb instructions are a fixed length of 16 bits
  § Thumb-2 instructions can be either 16-bit or 32-bit
§ Thumb-2 gives approximately 26% improvement in code density over ARM
§ Thumb-2 gives approximately 25% improvement in performance over Thumb
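To illustrate the variable-length point, here is a minimal Python sketch (not from the slides) of how a decoder can tell a 16-bit Thumb-2 instruction from a 32-bit one. It relies on the encoding rule that a first halfword whose top five bits are 0b11101, 0b11110 or 0b11111 begins a 32-bit instruction. The function names are hypothetical.

```python
def thumb2_instruction_length(first_halfword: int) -> int:
    """Return the instruction length in bytes (2 or 4) from its first halfword."""
    top5 = (first_halfword >> 11) & 0x1F
    # These three prefixes mark the first half of a 32-bit instruction.
    return 4 if top5 in (0b11101, 0b11110, 0b11111) else 2


def split_instructions(halfwords):
    """Group a stream of halfwords into per-instruction tuples."""
    grouped = []
    i = 0
    while i < len(halfwords):
        count = thumb2_instruction_length(halfwords[i]) // 2
        grouped.append(tuple(halfwords[i:i + count]))
        i += count
    return grouped


# 0x4770 is BX LR (16-bit); 0xF000 0xF800 is a BL pair (32-bit); 0xBF00 is NOP.
stream = [0x4770, 0xF000, 0xF800, 0xBF00]
print(split_instructions(stream))
```

Because the length is known from the first halfword alone, 16-bit and 32-bit instructions can be mixed freely without a mode switch.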
Cortex-M Programmer’s Model
§ Fully programmable in C
§ Stack-based exception model
§ Only two processor modes
  § Thread Mode for User tasks
  § Handler Mode for OS tasks and exceptions
§ Vector table contains addresses
[Register diagram: r0 to r12, sp (Main sp and Process sp), lr, r15 (pc), xPSR]
Cortex-M3 Processor Privilege
[Diagram: ARM Cortex-M3 privilege levels]
§ Privileged: Handler Mode (Supervisor), OS
  § Aborts, Interrupts, Reset, System Call (SVCall), Undefined Instruction
§ Non-Privileged: Thread Mode (User), Application code
§ Memory: Instructions & Data
Cortex-M3 Interrupt Handling
§ One Non-Maskable Interrupt (INTNMI) supported
§ 1-240 prioritizable interrupts supported
  § Interrupts can be masked
  § Implementation option selects number of interrupts supported
§ Nested Vectored Interrupt Controller (NVIC) is tightly coupled with processor core
§ Interrupt inputs are active HIGH
[Diagram: INTNMI and INTISR[239:0] (1-240 interrupts) feed the NVIC, which sits beside the Cortex-M3 Processor Core inside the Cortex-M3]
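The prioritization and masking described above can be sketched as follows. This is a simplified Python model, not the NVIC's actual logic: it assumes no priority grouping and ignores PRIMASK/BASEPRI, and the function and parameter names are hypothetical. It does follow the Cortex-M rule that a numerically lower priority value is more urgent, with ties going to the lower interrupt number.

```python
def next_interrupt(pending, enabled, priority, current_priority=256):
    """Pick the IRQ to service next, or None if nothing can preempt.

    pending, enabled: sets of IRQ numbers
    priority: dict mapping IRQ number -> priority value (lower = more urgent)
    current_priority: priority of the running code; 256 stands for
        "Thread Mode, nothing active" in this sketch
    """
    candidates = [
        irq for irq in pending
        if irq in enabled and priority[irq] < current_priority
    ]
    if not candidates:
        return None
    # Lowest priority value wins; the lower IRQ number breaks ties.
    return min(candidates, key=lambda irq: (priority[irq], irq))


priorities = {3: 64, 7: 32, 12: 32}
print(next_interrupt({3, 7, 12}, {3, 7, 12}, priorities))      # IRQ 7
print(next_interrupt({3, 7, 12}, {3, 12}, priorities))         # IRQ 12 (7 is masked)
print(next_interrupt({3, 7, 12}, {3, 7, 12}, priorities, 32))  # None (no preemption)
```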
Cortex-M3 Exception Handling
§ Reset: power-on or system reset
§ NMI: cannot be stopped or preempted by any exception other than reset
§ Faults
  § Hard Fault: default Fault or any fault unable to activate
  § Memory Manage: MPU violations
  § Bus Fault: prefetch and memory access violations
  § Usage Fault: undefined instructions, divide by zero, etc.
§ SVCall: privileged OS requests
§ Debug Monitor: debug monitor program
§ PendSV: pendable (deferred) system service requests
§ SysTick Interrupt: internal system timer, e.g., used by RTOS to periodically check resources or peripherals
§ External Interrupt: e.g., external peripherals
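As a rough illustration of how these exceptions map onto the vector table, the Python sketch below lists the ARMv7-M exception numbers and computes each vector's byte offset. Entry 0 of the table holds the initial stack pointer, and exception n has its handler address at offset 4 × n. The helper names are hypothetical.

```python
# ARMv7-M exception numbers for the system exceptions listed above.
SYSTEM_EXCEPTIONS = {
    "Reset": 1,
    "NMI": 2,
    "HardFault": 3,
    "MemManage": 4,
    "BusFault": 5,
    "UsageFault": 6,
    "SVCall": 11,
    "DebugMonitor": 12,
    "PendSV": 14,
    "SysTick": 15,
}
FIRST_EXTERNAL = 16  # external interrupt IRQ0 is exception number 16


def irq_exception_number(irq: int) -> int:
    """Exception number of external interrupt `irq` (0..239 on Cortex-M3)."""
    if not 0 <= irq <= 239:
        raise ValueError("Cortex-M3 supports IRQ 0..239")
    return FIRST_EXTERNAL + irq


def vector_offset(exception_number: int) -> int:
    """Byte offset of the handler address within the vector table."""
    return 4 * exception_number


for name, number in SYSTEM_EXCEPTIONS.items():
    print(f"{name:<13} #{number:<3} offset 0x{vector_offset(number):02X}")
print(f"IRQ0          #{irq_exception_number(0)}  offset 0x{vector_offset(irq_exception_number(0)):02X}")
```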
Cortex-M3 Program Status Register
[Bit layout: bits 31-28: N Z C V; bit 27: Q; bits 26-25 and 15-10: IT/ICI; bit 24: T; low-order bits: ISR Number]
§ One Status Register consisting of
  § APSR (Application Program Status Register): ALU flags
  § IPSR (Interrupt Program Status Register): Interrupt/Exception No.
  § EPSR (Execution Program Status Register)
    § IT field: If/Then block information
    § ICI field: Interruptible-Continuable Instruction information
§ xPSR
  § Composite of the 3 PSRs
  § Stored on the stack on exception entry
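A stacked xPSR value can be pulled apart with a few shifts and masks. The Python sketch below assumes the ARMv7-M layout, in which the exception number occupies bits [8:0]; the IT/ICI bits are returned as two raw fields rather than reassembled. The function name and dictionary keys are hypothetical.

```python
def decode_xpsr(value: int) -> dict:
    """Split a 32-bit xPSR value into its APSR, EPSR and IPSR fields."""
    return {
        # APSR: ALU flags
        "N": (value >> 31) & 1,
        "Z": (value >> 30) & 1,
        "C": (value >> 29) & 1,
        "V": (value >> 28) & 1,
        "Q": (value >> 27) & 1,
        # EPSR: Thumb bit plus the two IT/ICI bit groups (raw)
        "T": (value >> 24) & 1,
        "it_ici_26_25": (value >> 25) & 0x3,
        "it_ici_15_10": (value >> 10) & 0x3F,
        # IPSR: number of the exception being handled (0 in Thread Mode)
        "exception": value & 0x1FF,
    }


# Z and C set, Thumb bit set, currently handling exception 15 (SysTick).
fields = decode_xpsr(0x6100000F)
print(fields)
```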
Conditional Execution
§ If-Then (IT) instruction added (16-bit)
  § Up to 3 additional “then” or “else” conditions may be specified (T or E)
  § Makes up to 4 following instructions conditional
    ITTET EQ
    MOVEQ    (Inst 1)
    ADDEQ    (Inst 2)
    SUBNE    (Inst 3)
    ORREQ    (Inst 4)
§ Any normal ARM condition code can be used
§ 16-bit instructions in block do not affect condition code flags
  § Apart from comparison instructions
  § 32-bit instructions may affect flags (normal rules apply)
§ Current “if-then status” stored in CPSR (the EPSR portion of xPSR on Cortex-M3)
  § Conditional block may be safely interrupted and returned to
§ Must NOT branch into or out of ‘if-then’ block
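The ITTET EQ example can be checked mechanically. In the Python sketch below, the first instruction always takes the stated condition, each following T repeats it, and each E takes its inverse. The helper name is hypothetical, and the AL condition is deliberately left out to keep the sketch short.

```python
# Each condition code paired with its logical inverse.
INVERSE = {
    "EQ": "NE", "NE": "EQ", "CS": "CC", "CC": "CS",
    "MI": "PL", "PL": "MI", "VS": "VC", "VC": "VS",
    "HI": "LS", "LS": "HI", "GE": "LT", "LT": "GE",
    "GT": "LE", "LE": "GT",
}


def expand_it(mnemonic: str, cond: str) -> list:
    """Return the condition applied to each instruction in an IT block."""
    mnemonic, cond = mnemonic.upper(), cond.upper()
    suffix = mnemonic[2:]
    if not mnemonic.startswith("IT") or len(suffix) > 3:
        raise ValueError("expected IT followed by up to three T/E letters")
    if any(letter not in "TE" for letter in suffix):
        raise ValueError("only T and E may follow IT")
    if cond not in INVERSE:
        raise ValueError("unsupported condition in this sketch: " + cond)
    # The first instruction is the implicit 'then'.
    return [cond] + [cond if letter == "T" else INVERSE[cond] for letter in suffix]


print(expand_it("ITTET", "EQ"))  # conditions for MOVEQ, ADDEQ, SUBNE, ORREQ
```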
Classes of Instructions (v4T)
§ Load/Store
§ Miscellaneous
§ Data Operations
§ Change of Flow: MOV PC, Rm; Bcc; BL; BLX
Data processing Instructions
§ Consist of:
  § Arithmetic: ADD ADC SUB SBC RSB RSC
  § Logical: AND ORR EOR BIC
  § Comparisons: CMP CMN TST TEQ
  § Data movement: MOV MVN
§ These instructions only work on registers, NOT memory.
§ Syntax: <Operation>{<cond>}{S} Rd, Rn, Operand2
  § Comparisons set flags only - they do not specify Rd
  § Data movement does not specify Rn
§ Second operand is sent to the ALU via barrel shifter.
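To make the effect of the {S} suffix concrete, the following Python sketch models how a 32-bit ADDS or SUBS would set the N, Z, C and V flags. It is a behavioral model under the usual ARM conventions (subtraction is addition of the inverted operand with carry-in 1, so C = 1 means "no borrow"). The function names are hypothetical.

```python
MASK = 0xFFFFFFFF


def adds(rn: int, op2: int, carry_in: int = 0):
    """Model a 32-bit add with flag update; returns (result, flags)."""
    rn &= MASK
    op2 &= MASK
    full = rn + op2 + carry_in
    result = full & MASK
    flags = {
        "N": result >> 31,                 # sign bit of the result
        "Z": int(result == 0),             # result is zero
        "C": int(full > MASK),             # unsigned carry out of bit 31
        # Signed overflow: both operands differ in sign from the result.
        "V": (((rn ^ result) & (op2 ^ result)) >> 31) & 1,
    }
    return result, flags


def subs(rn: int, op2: int):
    """Model SUBS as rn + NOT(op2) + 1, as the ALU does."""
    return adds(rn, ~op2 & MASK, 1)


print(adds(0x7FFFFFFF, 1))  # signed overflow: N and V set
print(subs(5, 5))           # zero result: Z and C set
print(subs(3, 5))           # borrow: N set, C clear
```

CMP behaves like `subs` with the result discarded, which is why comparisons do not specify Rd.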
Using a Barrel Shifter: The 2nd Operand
§ Register, optionally with shift operation
  § Shift value can be either:
    § 5-bit unsigned integer
    § Specified in bottom byte of another register
  § Used for multiplication by constant
§ Immediate value
  § 8-bit number, with a range of 0-255
    § Rotated right through even number of positions
  § Allows increased range of 32-bit constants to be loaded directly into registers
[Diagram: Operand 1 goes straight to the ALU; Operand 2 passes through the Barrel Shifter to the ALU; the ALU produces the Result]
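The immediate rule above (an 8-bit value rotated right by an even amount) determines which 32-bit constants fit in a single instruction. The Python sketch below searches for such an encoding. It models the classic ARM instruction set scheme described on this slide; Thumb-2 code on Cortex-M3 uses a different modified-immediate scheme that is not modeled here. The function names are hypothetical.

```python
MASK = 0xFFFFFFFF


def ror32(value: int, amount: int) -> int:
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & MASK


def encode_immediate(value: int):
    """Return (rot, imm8) with value == ror32(imm8, 2 * rot), or None."""
    value &= MASK
    for rot in range(16):
        # Rotating left by 2*rot undoes a rotate right by 2*rot.
        imm8 = ror32(value, 32 - 2 * rot)
        if imm8 <= 0xFF:
            return rot, imm8
    return None


print(encode_immediate(0xFF000000))  # encodable: 0xFF rotated right by 8
print(encode_immediate(0x000003FC))  # encodable: 0xFF rotated right by 30
print(encode_immediate(0x00000101))  # not encodable: set bits span 9 positions
```

A constant that returns None has to be built another way, for example loaded from a literal pool or assembled from two instructions.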
Single register data transfer
§ LDR / STR: Word
§ LDRB / STRB: Byte
§ LDRH / STRH: Halfword
§ LDRSB: Signed byte load
§ LDRSH: Signed halfword load
§ Memory system must support all access sizes
§ Syntax:
  § LDR{<cond>}{<size>} Rd, <address>
  § STR{<cond>}{<size>} Rd, <address>
  e.g. LDREQB
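The difference between the plain and signed loads is how the upper bits of the 32-bit destination register are filled. This Python sketch models that behavior over a small byte array, assuming little-endian memory. The function names mirror the mnemonics but are otherwise hypothetical.

```python
MASK = 0xFFFFFFFF


def ldrb(mem: bytes, addr: int) -> int:
    """Byte load: zero-extended to 32 bits."""
    return mem[addr]


def ldrsb(mem: bytes, addr: int) -> int:
    """Signed byte load: sign-extended to 32 bits."""
    return int.from_bytes(mem[addr:addr + 1], "little", signed=True) & MASK


def ldrh(mem: bytes, addr: int) -> int:
    """Halfword load: zero-extended to 32 bits."""
    return int.from_bytes(mem[addr:addr + 2], "little")


def ldrsh(mem: bytes, addr: int) -> int:
    """Signed halfword load: sign-extended to 32 bits."""
    return int.from_bytes(mem[addr:addr + 2], "little", signed=True) & MASK


def ldr(mem: bytes, addr: int) -> int:
    """Word load."""
    return int.from_bytes(mem[addr:addr + 4], "little")


memory = bytes([0x80, 0xFF, 0x34, 0x12])
print(hex(ldrb(memory, 0)), hex(ldrsb(memory, 0)))   # 0x80 vs 0xffffff80
print(hex(ldrh(memory, 0)), hex(ldrsh(memory, 0)))   # 0xff80 vs 0xffffff80
print(hex(ldr(memory, 0)))                           # 0x1234ff80
```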
Agenda
  Introduction to ARM Ltd
  ARM Architecture/Programmer's Model
§ Data Path and Pipelines
  System Design
  Development Tools
Cortex-M3 Datapath
[Block diagram labels: Instruction Decode, Register Bank, Barrel Shifter, ALU, Mul/Div, Writeback; instruction side: I_HADDR, I_HRDATA; data side: D_HADDR, D_HWDATA, D_HRDATA; Address Register and Address Incrementer on each address path; Read Data Register, Write Data Register; operand buses A and B; INTADDR]
Cortex-M3 Pipeline
§ Cortex-M3 has 3-stage fetch-decode-execute pipeline
  § Similar to ARM7
  § Cortex-M3 does more in each stage to increase overall performance
[Diagram:
  1st Stage - Fetch: Fetch (Prefetch)
  2nd Stage - Decode: Instruction Decode & Register Read; AGU; Branch forwarding & speculation
  3rd Stage - Execute: Address Phase & Write Back; Data Phase Load/Store & Branch; Multiply & Divide; Shift; ALU & Branch; Write
  Execute stage branch (ALU branch & Load Store Branch)]
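The overlap that a 3-stage pipeline provides can be shown with a small Python sketch. It models an idealized pipeline in which every instruction spends exactly one cycle in each stage. That is an assumption for illustration only, since real loads, branches and multi-cycle operations on the Cortex-M3 take longer. The function names are hypothetical.

```python
STAGES = ("Fetch", "Decode", "Execute")


def pipeline_schedule(instructions):
    """Map each cycle (starting at 1) to the instruction in each stage."""
    schedule = {}
    for index, instruction in enumerate(instructions):
        for stage_index, stage in enumerate(STAGES):
            cycle = index + stage_index + 1
            schedule.setdefault(cycle, {})[stage] = instruction
    return schedule


def total_cycles(count: int) -> int:
    """Cycles to finish `count` instructions with no stalls or flushes."""
    return count + len(STAGES) - 1 if count else 0


program = ["ADD", "SUB", "LDR", "B"]
for cycle, stages in sorted(pipeline_schedule(program).items()):
    row = "  ".join(f"{stage}={stages.get(stage, '-'):<4}" for stage in STAGES)
    print(f"cycle {cycle}: {row}")
print("total cycles:", total_cycles(len(program)))
```

Once the pipeline is full, one instruction completes per cycle, so four instructions finish in six cycles rather than twelve.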
ARM10 vs. ARM11 Pipelines
[Diagram:
  ARM10 stages: FETCH (Branch Prediction, Instruction Fetch), ISSUE (ARM or Thumb Instruction Decode), DECODE (Reg Read), EXECUTE (Shift + ALU, Multiply), MEMORY (Memory Access, Multiply Add), WRITE (Reg Write)
  ARM11 stages: Fetch 1, Fetch 2, Decode, Issue, then parallel paths Shift / ALU / Saturate, MAC 1 / MAC 2 / MAC 3, and Address / Data Cache 1 / Data Cache 2, followed by Write back]
Full Cortex-A8 Pipeline Diagram
[Diagram: 13-Stage Integer Pipeline and 10-Stage NEON Pipeline]
Agenda
  Introduction to ARM Ltd
  ARM Architecture/Programmer's Model
  Data Path and Pipelines
§ System Design
  Development Tools
An Example AMBA System
[Diagram: High Performance ARM processor, High-bandwidth on-chip RAM, High Bandwidth External Memory Interface and DMA Bus Master on the AHB; an APB Bridge links the AHB to the APB, which carries UART, Timer, Keypad and PIO]
§ AHB
  § High Performance
  § Pipelined
  § Burst Support
  § Multiple Bus Masters
§ APB
  § Low Power
  § Non-pipelined
  § Simple Interface
Agenda
  Introduction to ARM Ltd
  ARM Architecture/Programmer's Model
  Data Path and Pipelines
  System Design
§ Development Tools
ARM Debug Architecture
§ EmbeddedICE Logic
  § Provides breakpoints and processor/system access
§ JTAG interface (ICE)
  § Converts debugger commands to JTAG signals
§ Embedded Trace Macrocell (ETM)
  § Compresses real-time instruction and data access trace
  § Contains ICE features (trigger & filter logic)
§ Trace port analyzer (TPA)
  § Captures trace in a deep buffer
[Diagram: Debugger (+ optional trace tools) connected over Ethernet to the JTAG port and Trace Port; on chip: TAP controller, ETM, EmbeddedICE Logic, ARM core]
Keil Development Tools for ARM
§ Includes ARM macro assembler, compilers (ARM RealView C/C++ Compiler, Keil CARM Compiler, or GNU compiler), ARM linker, Keil uVision Debugger and Keil uVision IDE
§ Keil uVision Debugger accurately simulates on-chip peripherals (I2C, CAN, UART, SPI, Interrupts, I/O Ports, A/D and D/A converters, PWM, etc.)
§ Evaluation Limitations
  § 16K byte object code + 16K data limitation
  § Some linker restrictions such as base addresses for code/constants
  § GNU tools provided are not restricted in any way
§ http://www.keil.com/demo/
Keil Development Tools for ARM
University Resources
§ http://www.arm.com/support/university/
§ University@arm.com
Your Future at ARM…
§ Graduate and Internship/Co-op Opportunities
  § Engineering: Memory, Validation, Performance, DFT, R&D, GPU and more!
  § Sales and Marketing: Corporate and Technical
  § Corporate: IT, Patents, Services (Training and Support), and Human Resources
§ Incredible Culture and Comprehensive Benefit Package
  § Competitive Reward
  § Work/Life Balance
  § Personal Development
  § Brilliant Minds and Innovative Solutions
§ Keep in Touch!
  § www.arm.com/about/careers
TI Panda Board
OMAP4430 Processor
§ 1 GHz Dual-core ARM Cortex-A9 (NEON+VFP)
§ C64x+ DSP
§ PowerVR SGX 3D GPU
§ 1080p Video Support
POP Memory
§ 1 GB LPDDR2 RAM
USB Powered
§ < 4 W max consumption (OMAP small % of that)
§ Many adapter options (Car, wall, battery, solar, ...)
Project Ideas Using Panda
§ OS Projects
  § OS porting to ARM/Cortex (TI OMAP)
  § MythTV system
  § “Super-Panda”: stack of Pandas as compute engine and task distribution
  § Linux applications
§ NEON Optimization Projects
  § Codec optimization in ffmpeg (pick your favorite codec)
  § Voice and image recognition
  § Open-source Flash player optimizations (swfdec)
Fin
Nokia N95 Multimedia Computer
§ OMAP™ 2420 Applications Processor: ARM1136™ processor-based SoC, developed using Magma® Blast® family and winner of 2005 INSIGHT Award for ‘Most Innovative SoC’
§ Symbian OS™ v9.2: Operating System supporting ARM processor-based mobile devices, developed using ARM® RealView® Compilation Tools
§ S60™ 3rd Edition: S60 Platform supporting ARM processor-based mobile devices
§ Mobiclip™ Video Codec: Software video codec for ARM processor-based mobile devices
§ ST WLAN Solution: Ultra-low power 802.11b/g WLAN chip with ARM9™ processor-based MAC
Connect. Collaborate. Create.
Beagle Board
Targeting community development
§ Personally affordable
§ $149
§ Active & technical community
§ > 1000 participants and growing
§ Open access to hardware documentation
§ Opportunity to tinker and learn
§ Wikis, blogs, promotion of community activity
§ Freedom to innovate
§ Addressing open source community needs
§ Instant access to >10 million lines of code
§ Free software
Fast, low power, flexible expansion
OMAP3530 Processor
§ 600 MHz Cortex-A8
  § NEON+VFPv3
  § 16 KB/16 KB L1$
  § 256 KB L2$
§ 430 MHz C64x+ DSP
  § 32K/32K L1$
  § 48K L1D
  § 32K L2
§ PowerVR SGX GPU
§ 64K on-chip RAM
POP Memory
§ 128 MB LPDDR RAM
§ 256 MB NAND flash
Peripheral I/O
§ DVI-D video out
§ SD/MMC+
§ S-Video out
§ USB 2.0 HS OTG
§ I2C, I2S, SPI, MMC/SD
§ JTAG
§ Stereo in/out
§ Alternate power
§ RS-232 serial
USB Powered
§ 2 W maximum consumption
  § OMAP is small % of that
§ Many adapter options
  § Car, wall, battery, solar, …
[Board dimension shown: 3”]
And more…
Other Features
§ 4 LEDs
  § USR0
  § USR1
  § PMU_STAT
  § PWR
§ 2 buttons
  § USER
  § RESET
§ 4 boot sources
  § SD/MMC
  § NAND flash
  § USB
  § Serial
On-going collaboration at BeagleBoard.org
§ Live chat via IRC for 24/7 community support
§ Links to software projects to download
Peripheral I/O: DVI-D video out, SD/MMC+, S-Video out, USB HS OTG, I2C, I2S, SPI, MMC/SD, JTAG, Stereo in/out, Alternate power, RS-232 serial
Project Ideas Using Beagle
§ OS Projects
  § OS porting to ARM/Cortex (TI OMAP)
  § MythTV system
  § “Super-Beagle”: stack of Beagles as compute engine and task distribution
  § Linux applications
§ NEON Optimization Projects
  § Codec optimization in ffmpeg (pick your favorite codec)
  § Voice and image recognition
  § Open-source Flash player optimizations (swfdec)