Lecture 3: Embedded System Implementation Platforms

Learning Outcomes
Identify the advantages and disadvantages of common embedded platforms
Relate embedded platform advantages to system specifications
Evaluate and compare alternative implementation platforms
Select an appropriate implementation platform according to the system's functional and non-functional requirements and specifications

Embedded System Design: Software and Hardware (1/2)
Embedded systems are usually delivered as a combination of software and hardware
Unlike general-purpose systems, where the hardware developer does not know which software the system will execute, the developer of an embedded system is responsible for both the hardware and the software

Embedded System Design: Software and Hardware (2/2)
This gives us a unique opportunity, and at the same time the daunting task, of building hardware and software together, orchestrating a finely tuned solution that satisfies the customer's requirements for functionality, safety, and economic feasibility
In other words, we select a hardware implementation platform based on the software we will write, and we write software with the hardware platform in mind

Implementation Platform
The hardware on which the embedded computing system will execute the software
Could be a combination of platforms
Hardware platforms:
◦ General Purpose Processors: RISC, CISC
◦ Application-Specific Processors: Microcontrollers, Digital Signal Processors, ASIPs
◦ Hardware Accelerators: GPUs, FPGAs

Processor technology
The architecture of the computation engine used to implement a system's desired functionality
A processor does not have to be programmable
◦ "Processor" is not the same as "general-purpose processor"
[Figure: controller and datapath of three processors running "total = 0; for i = 1 to …". General-purpose ("software"): control logic and state register, IR, PC, register file, general ALU, program memory, data memory. Application-specific: control logic and state register, IR, PC, registers, custom ALU, program memory, data memory. Single-purpose ("hardware"): control logic, state register, index and total registers, adder, data memory.]

Processor technology
Processors vary in their customization for the problem at hand
Desired functionality:
    total = 0
    for i = 1 to N loop
        total += M[i]
    end loop
[Figure: the desired functionality mapped onto a general-purpose processor, an application-specific processor, and a single-purpose processor]

General-purpose processors
Programmable device used in a variety of applications
◦ Also known as "microprocessor"
Features
◦ Program memory
◦ General datapath with large register file and general ALU
User benefits
◦ Low time-to-market and NRE costs
◦ High flexibility
"Pentium" is the most well-known, but there are hundreds of others
[Figure: controller (control logic and state register, IR, PC) and datapath (register file, general ALU) with program memory and data memory]

Single-purpose processors
Digital circuit designed to execute exactly one program
◦ a.k.a. coprocessor, accelerator or peripheral
Features
◦ Contains only the components needed to execute a single program
◦ No program memory
Benefits
◦ Fast
◦ Low power
◦ Small size
[Figure: controller (control logic, state register) and datapath (index and total registers, adder) with data memory]

Application-specific processors
Programmable processor optimized for a particular class of applications having common characteristics
◦ Compromise between general-purpose and single-purpose processors
Features
◦ Program memory
◦ Optimized datapath
◦ Special functional units
Benefits
◦ Some flexibility, good performance, size and power
[Figure: controller (control logic and state register, IR, PC) and datapath (registers, custom ALU) with program memory and data memory]

General Purpose Processors
High flexibility
◦ General instruction set
◦ Good for all applications, optimized for none
Low development time
◦ Programming in high-level language
Low/medium processing power
Medium cost
High power consumption

Application-Specific Processors: Microcontrollers
Medium flexibility
◦ ISA optimized for control applications
Low processing power
◦ Not suitable for data-intensive applications
Low power consumption
Low cost
Low development time

Application-Specific Processors: DSPs
Medium flexibility
◦ ISA optimized for fast execution of the numerical algorithms needed to analyze signals
Medium/high processing power
Low/medium power consumption
Medium cost
Low development time

Computer Architecture Taxonomy
Unified vs separate memories
◦ Von Neumann vs Harvard
Instruction format
◦ RISC vs CISC vs VLIW
Data
◦ Register-Memory vs Load-Store vs Accumulator

Accumulator
One operand is implicitly in the accumulator, the other is in memory, and the result is stored in the accumulator

Register-Memory
One operand is in a register, the other is in memory, and the result is stored in a register

Register-Register / Load-Store
Both operands and the result are stored in registers

Instruction Format
Variable length (x86)
Fixed length (ARM, MIPS, PowerPC)
Hybrid (MIPS16, Thumb, TI TMS320C54x)

The 8051 microcontroller
A Harvard architecture (separate instruction and data memories), single-chip microcontroller (µC) developed by Intel in 1980 for use in embedded systems
Today largely superseded by a vast range of faster and/or functionally enhanced 8051-compatible devices from more than 20 independent manufacturers

Block Diagram
[Figure: 8051 block diagram: CPU, oscillator (OSC), interrupt control with external interrupts, on-chip ROM for program code, on-chip RAM, timer/counter (Timer 0, Timer 1) with counter inputs, bus control, four I/O ports (P0–P3, address/data), serial port (TxD, RxD)]

Registers
Some 8-bit registers of the 8051: A, B, R0, R1, R2, R3, R4, R5, R6, R7
◦ A: Accumulator
◦ B: Used specially in MUL/DIV
◦ R0-R7: GPRs
Some 8051 16-bit registers: DPTR (DPH, DPL), PC
ACOE 343 - Embedded Real-Time Processor Systems - Frederick University

The MOV Instruction – Addressing Modes
    MOV dest, source  ; dest = source
    MOV A, #72H       ; A = 72H
    MOV A, #'r'       ; A = 'r' or 72H
    MOV R4, #62H      ; R4 = 62H
    MOV B, 0F9H       ; B = content of the byte at direct address F9H
    MOV DPTR, #7634H
    MOV DPL, #34H
    MOV DPH, #76H
    MOV P1, A         ; move A to port 1
Note 1: MOV A, #72H ≠ MOV A, 72H
After the instruction "MOV A, 72H", the accumulator holds the content of RAM byte 72H.
    8086                          8051
    MOV AL, 72H                   MOV A, #72H
    MOV AL, 'r'                   MOV A, #'r'
    MOV BX, 72H / MOV AL, [BX]    MOV A, 72H
Note 2: MOV A, R3 ≡ MOV A, 3 (with register bank 0 selected)
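
The difference between immediate and direct operands in Note 1 can be illustrated with a small Python model. This is only a sketch: the `ram` list, the value ABH stored at address 72H, and the helper names are assumptions made for illustration, not part of the 8051 toolchain.

```python
# Toy model of 8051 internal RAM; names and sample data are illustrative.
ram = [0] * 128          # 128 bytes of on-chip RAM, initially zero
ram[0x72] = 0xAB         # assume earlier code stored ABH at address 72H

def mov_a_immediate(value):
    """Models MOV A, #value: the operand itself is loaded into A."""
    return value & 0xFF

def mov_a_direct(address):
    """Models MOV A, address: the byte stored at that address is loaded."""
    return ram[address]

a_imm = mov_a_immediate(0x72)   # MOV A, #72H
a_dir = mov_a_direct(0x72)      # MOV A, 72H
print(hex(a_imm), hex(a_dir))
```

The same operand text, with and without `#`, therefore yields two different accumulator values.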

Arithmetic Instructions
    ADD A, Source  ; A = A + SOURCE
    ADD A, #6      ; A = A + 6
    ADD A, R6      ; A = A + R6
    ADD A, 6       ; A = A + [6] or A = A + R6
    ADD A, 0F3H    ; A = A + [0F3H]

Set and Clear Instructions
    SETB bit    ; bit = 1
    CLR bit     ; bit = 0
    SETB C      ; CY = 1
    SETB P0.0   ; bit 0 of port 0 = 1
    SETB P3.7   ; bit 7 of port 3 = 1
    SETB ACC.2  ; bit 2 of the accumulator = 1
    SETB 05     ; set bit D5 of RAM loc. 20h
Note: the CLR instruction is used in the same way as SETB, e.g.:
    CLR C       ; CY = 0
But the following form exists only for CLR:
    CLR A       ; A = 0

    SUBB A, source  ; A = A - source - CY
    SETB C          ; CY = 1
    SUBB A, R5      ; A = A - R5 - 1
    ADC A, source   ; A = A + source + CY
    SETB C          ; CY = 1
    ADC A, R5       ; A = A + R5 + 1
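
As a rough sketch of why the carry takes part in ADC and SUBB, the Python model below chains two 8-bit additions into a 16-bit one. The function names and the operand values 12FFH and 0001H are assumptions chosen for illustration.

```python
def adc(a, src, cy):
    """Models ADC A, src for 8-bit values; returns (new A, new CY)."""
    total = a + src + cy
    return total & 0xFF, 1 if total > 0xFF else 0

def subb(a, src, cy):
    """Models SUBB A, src; returns (new A, new CY), CY set on borrow."""
    diff = a - src - cy
    return diff & 0xFF, 1 if diff < 0 else 0

# 16-bit addition 12FFH + 0001H built from two 8-bit steps
low, cy = adc(0xFF, 0x01, 0)     # low bytes, carry-in 0
high, cy = adc(0x12, 0x00, cy)   # high bytes plus the carry from the low byte
result16 = (high << 8) | low
print(hex(result16))
```

The carry produced by the low-byte addition is what propagates into the high byte; without it the result would be 1200H instead of 1300H.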

    DEC byte   ; byte = byte - 1
    INC byte   ; byte = byte + 1
    INC R7
    DEC A
    DEC 40H    ; [40] = [40] - 1
    CPL A      ; 1's complement
Example:
         MOV A, #55H    ; A = 01010101B
    L01: CPL A
         MOV P1, A
         ACALL DELAY
         SJMP L01
CALL, NOP and RETI: all are similar to the corresponding 8086 instructions.

Logic Instructions
    ANL byte/bit
    ORL byte/bit
    XRL byte
EXAMPLE:
    MOV A, #89H
    ANL A, #08H   ; A = 89H AND 08H = 08H
(ANL cannot take Rn as its destination; the destination is A, a direct address, or C for bit operations)

Rotate Instructions
    RR A    ; rotate accumulator right
    RL A    ; rotate accumulator left
    RRC A   ; rotate accumulator right through the carry
    RLC A   ; rotate accumulator left through the carry
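
The four rotates differ only in where the bit that leaves one end goes. A minimal Python sketch of the 8-bit behaviour follows; the function names mirror the mnemonics but are otherwise illustrative.

```python
def rr(a):
    """RR A: bit 0 wraps around to bit 7."""
    return ((a >> 1) | (a << 7)) & 0xFF

def rl(a):
    """RL A: bit 7 wraps around to bit 0."""
    return ((a << 1) | (a >> 7)) & 0xFF

def rrc(a, cy):
    """RRC A: CY enters bit 7, bit 0 leaves into CY. Returns (A, CY)."""
    return ((cy << 7) | (a >> 1)) & 0xFF, a & 1

def rlc(a, cy):
    """RLC A: CY enters bit 0, bit 7 leaves into CY. Returns (A, CY)."""
    return ((a << 1) | cy) & 0xFF, (a >> 7) & 1

print(hex(rr(0x01)), hex(rl(0x80)), rrc(0x01, 0), rlc(0x80, 1))
```

RR and RL are 8-bit rotations, while RRC and RLC behave as 9-bit rotations that include the carry flag.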

Structure of Assembly language and Running an 8051 program
          ORG 0H
          MOV R5, #25H
          MOV R7, #34H
          MOV A, #0
          ADD A, R5
          ADD A, #12H
    HERE: SJMP HERE
          END
Build flow:
    EDITOR PROGRAM → Myfile.asm
    ASSEMBLER PROGRAM → Myfile.lst, Myfile.obj
    LINKER PROGRAM (Myfile.obj + other obj files) → Myfile.abs
    OH PROGRAM → Myfile.hex

Addressing Modes
Immediate
Register
Direct
Register Indirect
Indexed

Immediate Addressing Mode
    MOV A, #65H
    MOV A, #'A'
    MOV R6, #65H
    MOV DPTR, #2343H
    MOV P1, #65H
Example:
    Num    EQU 30
           …
           MOV R0, #Num
           MOV DPTR, #data1
           …
           ORG 100H
    data1: db "Example"

Direct Addressing Mode
Although the entire 128 bytes of RAM can be accessed using direct addressing mode, it is most often used to access RAM loc. 30 – 7FH.
    MOV R0, 40H
    MOV 56H, A
    MOV A, 4      ; ≡ MOV A, R4
    MOV 6, 2      ; copy R2 to R6
                  ; MOV R6, R2 is invalid!
SFR registers and their addresses
    MOV 0E0H, #66H  ; ≡ MOV A, #66H
    MOV 0F0H, R2    ; ≡ MOV B, R2
    MOV 80H, A      ; ≡ MOV P0, A (P0 is at address 80H; P1 is at 90H)

Register Indirect Addressing Mode
In this mode, a register is used as a pointer to the data.
    MOV A, @Ri   ; move the content of the RAM loc. whose address is held by Ri into A (i = 0 or 1)
    MOV @R1, B
In other words, the content of register R0 or R1 is the address of the source or target in MOV, ADD and SUBB instructions.
Example: Write a program to copy a block of 10 bytes from RAM locations starting at 37h to RAM locations starting at 59h.
Solution:
        MOV R0, #37h   ; source pointer
        MOV R1, #59h   ; dest pointer
        MOV R2, #10    ; counter
    L1: MOV A, @R0
        MOV @R1, A
        INC R0
        INC R1
        DJNZ R2, L1    ; jump
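
The block-copy loop can be traced with a short Python model in which R0 and R1 act as pointers into a RAM array. This is a sketch only: the sample source bytes (A0H to A9H) and variable names are assumptions for illustration.

```python
ram = [0] * 128
for i in range(10):
    ram[0x37 + i] = 0xA0 + i    # assumed sample data in the source block

r0, r1, r2 = 0x37, 0x59, 10     # source pointer, dest pointer, counter
while True:
    a = ram[r0]                 # MOV A, @R0
    ram[r1] = a                 # MOV @R1, A
    r0 += 1                     # INC R0
    r1 += 1                     # INC R1
    r2 -= 1                     # DJNZ R2, L1: decrement ...
    if r2 == 0:                 # ... and leave the loop once R2 is zero
        break

print([hex(b) for b in ram[0x59:0x59 + 10]])
```

Note that the pointers and the counter must be loaded with immediate values (`#37h`, `#59h`, `#10`); without the `#` the registers would be loaded from RAM addresses 37h, 59h and 10 instead.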

Indexed Addressing Mode And On-Chip ROM Access
This mode is widely used to access data elements of look-up tables located in the program (code) space ROM of the 8051
    MOVC A, @A+DPTR   ; A = content of ROM address A + DPTR
Note: Because the data elements are stored in the program (code) space ROM of the 8051, the instruction MOVC is used instead of MOV. The "C" means code.

MUL & DIV
    MUL AB        ; B|A = A*B
    MOV A, #25H
    MOV B, #65H
    MUL AB        ; 25H*65H = 0E99H
                  ; B = 0EH, A = 99H
    DIV AB        ; A = A/B, B = A mod B
    MOV A, #25
    MOV B, #10
    DIV AB        ; A = 2, B = 5
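
The two worked results can be checked with a small Python sketch of how MUL AB and DIV AB split their results across A and B. The function names are illustrative, and the overflow and divide-by-zero flag behaviour is deliberately not modelled.

```python
def mul_ab(a, b):
    """Models MUL AB: 16-bit product, low byte to A, high byte to B."""
    product = a * b
    return product & 0xFF, product >> 8      # (A, B)

def div_ab(a, b):
    """Models DIV AB: quotient to A, remainder to B (b assumed non-zero)."""
    return a // b, a % b                     # (A, B)

a, b = mul_ab(0x25, 0x65)
print(hex(b), hex(a))          # high byte, low byte of 25H * 65H
print(div_ab(25, 10))          # quotient and remainder of 25 / 10
```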

Stack in the 8051
The register used to access the stack is called the SP (stack pointer) register.
The stack pointer in the 8051 is only 8 bits wide, which means that it can take values 00 to FFH.
When the 8051 is powered up, the SP register contains the value 07.
Internal RAM map:
    7FH–30H  Scratch pad RAM
    2FH–20H  Bit-addressable RAM
    1FH–18H  Register bank 3
    17H–10H  Register bank 2
    0FH–08H  Register bank 1 (stack)
    07H–00H  Register bank 0

Example:
    MOV R6, #25H
    MOV R1, #12H
    MOV R4, #0F3H
    PUSH 6
    PUSH 1
    PUSH 4
Stack contents after each step:
    Start:         SP = 07H
    After PUSH 6:  SP = 08H, [08H] = 25
    After PUSH 1:  SP = 09H, [09H] = 12
    After PUSH 4:  SP = 0AH, [0AH] = F3

Example (cont.)
    POP 4
    POP 1
    POP 6
Stack contents after each step:
    Start:        SP = 0AH, [0AH] = F3, [09H] = 12, [08H] = 25
    After POP 4:  SP = 09H, [09H] = 12, [08H] = 25
    After POP 1:  SP = 08H, [08H] = 25
    After POP 6:  SP = 07H
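
The push and pop sequence can be replayed with a toy Python model of the stack. The class and variable names are assumptions for illustration; the model only captures the ordering rule that PUSH increments SP before storing and POP reads before decrementing.

```python
class Stack8051:
    """Toy model of the 8051 stack in internal RAM (illustrative names)."""

    def __init__(self):
        self.ram = [0] * 128
        self.sp = 0x07                 # reset value of SP

    def push(self, value):
        self.sp += 1                   # SP is incremented first ...
        self.ram[self.sp] = value      # ... then the byte is stored

    def pop(self):
        value = self.ram[self.sp]      # the byte is read first ...
        self.sp -= 1                   # ... then SP is decremented
        return value

stack = Stack8051()
for byte in (0x25, 0x12, 0xF3):        # PUSH 6, PUSH 1, PUSH 4
    stack.push(byte)
sp_after_pushes = stack.sp             # three pushes from 07H give 0AH
popped = [stack.pop() for _ in range(3)]   # POP 4, POP 1, POP 6
print(hex(sp_after_pushes), [hex(v) for v in popped], hex(stack.sp))
```

Because the values come back in reverse order, popping into registers 4, 1 and 6 restores R4, R1 and R6 to the values they held before the pushes.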

LOOP and JUMP Instructions
DJNZ: Write a program to clear ACC, then add 3 to the accumulator ten times
Solution:
           MOV A, #0
           MOV R2, #10
    AGAIN: ADD A, #03
           DJNZ R2, AGAIN   ; repeat until R2 = 0 (10 times)
           MOV R5, A
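
A line-by-line Python trace of the DJNZ solution, as a sketch with illustrative variable names, shows the value that ends up in R5.

```python
a = 0                        # MOV A, #0
r2 = 10                      # MOV R2, #10
while True:
    a = (a + 3) & 0xFF       # AGAIN: ADD A, #03
    r2 = (r2 - 1) & 0xFF     # DJNZ R2, AGAIN: decrement first ...
    if r2 == 0:              # ... and fall through once R2 reaches zero
        break
r5 = a                       # MOV R5, A
print(hex(r5))
```

Ten additions of 3 give 30, that is 1EH. Since DJNZ decrements before testing, loading R2 with 0 would run the loop 256 times rather than zero times.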

Other conditional jumps:
    JZ                Jump if A = 0
    JNZ               Jump if A ≠ 0
    DJNZ              Decrement the register or byte and jump if the result ≠ 0
    CJNE A, byte      Jump if A ≠ byte
    CJNE reg, #data   Jump if reg ≠ #data
    JC                Jump if CY = 1
    JNC               Jump if CY = 0
    JB                Jump if bit = 1
    JNB               Jump if bit = 0
    JBC               Jump if bit = 1 and clear bit

SJMP and LJMP
LJMP (long jump)
LJMP is an unconditional jump. It is a 3-byte instruction in which the first byte is the opcode and the second and third bytes represent the 16-bit address of the target location. The 2-byte target address allows a jump to any memory location from 0000 to FFFFH.
SJMP (short jump)
SJMP is a 2-byte instruction: the first byte is the opcode and the second byte is the relative address of the target location. The relative address range of 00–FFH is divided into forward and backward jumps, that is, within -128 to +127 bytes of memory relative to the address of the current PC (the address of the instruction following the SJMP).
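
The relative-address arithmetic for SJMP can be sketched in Python as follows. The function names and the example address 0100H are assumptions for illustration; the offset byte is treated as a signed 8-bit value added to the address of the next instruction.

```python
def sjmp_target(instr_addr, rel_byte):
    """Target of a 2-byte SJMP located at instr_addr with offset byte rel_byte."""
    offset = rel_byte - 256 if rel_byte >= 0x80 else rel_byte   # sign-extend
    next_pc = instr_addr + 2          # PC already points past the SJMP
    return (next_pc + offset) & 0xFFFF

def sjmp_offset(instr_addr, target):
    """Offset byte an assembler would emit, or an error if out of range."""
    offset = target - (instr_addr + 2)
    if not -128 <= offset <= 127:
        raise ValueError("target out of SJMP range; use LJMP")
    return offset & 0xFF

# HERE: SJMP HERE placed at 0100H jumps back onto itself
print(hex(sjmp_offset(0x0100, 0x0100)))
print(hex(sjmp_target(0x0100, 0x7F)))
```

A jump to itself needs an offset of -2, which is encoded as FEH, and the farthest forward target from 0100H is 0181H.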

CJNE, JNC
Exercise: Write a program that compares R0 and R1. If R0 > R1, send 1 to port 2; else if R0 < R1, send 0FFh to port 2; otherwise send 0 to port 2.

INTRODUCTION
ARM is a RISC processor.
It is used in applications that need small size and high performance.
Simple architecture, hence low power consumption.
ARM System-on-Chip Architecture

TIMELINE (1/2)
1985: Acorn Computer Group manufactures the first commercial RISC microprocessor.
1990: Acorn and Apple participation leads to the founding of Advanced RISC Machines (A.R.M.).
1991: ARM6, the first embeddable RISC microprocessor.
1992–1994: Various companies use ARM (Sharp, Samsung), while in 1993 ARM7, the first multimedia microprocessor, is introduced.

TIMELINE (2/2)
1995: Introduction of Thumb and ARM8.
1996–2000: Alcatel, Hyundai, Philips and Sony use ARM, while in 1999 ARM cooperates with Ericsson on the development of Bluetooth.
2000–2002: ARM's share of the 32-bit embedded RISC microprocessor market is 80%. ARM Developer Suite is introduced.

THE ARM ARCHITECTURE

GENERAL INFO (1/2)
AIM: Simple design
Load-store architecture
32-bit data bus
3 addressing modes

GENERAL INFO (2/2)
Simple architecture + simple instruction set + code density →
◦ Small size
◦ Low power consumption

Registers
32-bit general-purpose registers, 16 visible at a time (r0–r15)
7 modes of operation
Each mode has a different set of visible registers and a different level of control over the CPSR.

The visible registers of the ARM
Usable in user mode:
    r0–r14, r15 (PC), CPSR
System modes only:
    fiq mode:        r8_fiq–r14_fiq, SPSR_fiq
    svc mode:        r13_svc, r14_svc, SPSR_svc
    abort mode:      r13_abt, r14_abt, SPSR_abt
    irq mode:        r13_irq, r14_irq, SPSR_irq
    undefined mode:  r13_und, r14_und, SPSR_und

CPSR
ARM CPSR format:
N: Negative
Z: Zero
C: Carry
V: Overflow
Q: Saturation (for enhanced DSP instructions)

Memory Organization
Address bus: 32 bits
1 word = 32 bits

Instruction Set
Three instruction types
◦ Data processing
◦ Data transfer
◦ Control flow

Supervisor mode
For code running in user mode, the operating system handles operations that are outside user privileges.
Using "supervisor calls", a user program enters system level, where system functions can be performed on its behalf.

I/O System
ARM handles peripherals as "memory mapped devices with interrupt support".
Interrupts:
◦ IRQ: normal interrupt
◦ FIQ: fast interrupt

Exceptions
Exceptions:
◦ Interrupts
◦ Supervisor Call
◦ Traps
When an exception takes place:
◦ The value of PC is copied to r14_exc
◦ The operating mode changes into the respective exception mode.
◦ The PC takes the exception handler vector address.

ARM programming model
[Figure: ARM register set by mode: r0–r15 (PC) and CPSR usable in user mode; banked registers and SPSRs for the fiq, svc, abort, irq and undefined modes]

THE ARM INSTRUCTION SET

Data Processing Instructions (1/2)
Arithmetic operations
    ADD  r0, r1, r2   ; r0 := r1 + r2 and don't update flags
    ADDS r0, r1, r2   ; r0 := r1 + r2 and update flags
Logical operations
    AND  r0, r1, r2   ; r0 := r1 AND r2
Register movement
    MOV  r0, r2
Comparison
    CMP  r1, r2

Data Processing Instructions (2/2)
Operands:
◦ Immediate operands: ADD r3, #1
◦ Shifted register operands: ADD r3, r2, r1, LSL #3
Miscellaneous data processing instructions:
◦ Multiplication: MUL r4, r3, r2

Data transfer instructions
Load and store instructions:
    LDR r0, [r1]
    STR r0, [r1]
◦ Offset: LDR r0, [r1, #4]
◦ Post-indexed: LDR r0, [r1], #16
◦ Auto-indexed: LDR r0, [r1, #16]!
Multiple data transfers:
    LDMIA r1, {r0, r2, r5}

Examples (auto-indexed)
PRE:
◦ r0 = 0x0000
◦ r1 = 0x00009000
◦ mem32[0x00009000] = 0x0101
◦ mem32[0x00009004] = 0x0202
    LDR r0, [r1, #4]!
POST:
◦ r0 = 0x0202
◦ r1 = 0x00009004

Examples (offset)
PRE:
◦ r0 = 0x0000
◦ r1 = 0x00009000
◦ mem32[0x00009000] = 0x0101
◦ mem32[0x00009004] = 0x0202
    LDR r0, [r1, #4]
POST:
◦ r0 = 0x0202
◦ r1 = 0x00009000

Examples (post-indexed)
PRE:
◦ r0 = 0x0000
◦ r1 = 0x00009000
◦ mem32[0x00009000] = 0x0101
◦ mem32[0x00009004] = 0x0202
    LDR r0, [r1], #4
POST:
◦ r0 = 0x0101
◦ r1 = 0x00009004
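
The three LDR examples differ only in when the offset is applied and whether the base register is written back. A minimal Python sketch, using the PRE values from the slides and an illustrative `ldr` helper and mode names, reproduces the three POST states.

```python
def ldr(mode):
    """Returns (r0, r1) after one LDR with offset 4 in the given mode."""
    mem = {0x00009000: 0x0101, 0x00009004: 0x0202}   # PRE memory contents
    r1 = 0x00009000                                  # PRE base register
    if mode == "auto_indexed":      # LDR r0, [r1, #4]!
        r1 += 4                     # base updated first, then used
        r0 = mem[r1]
    elif mode == "offset":          # LDR r0, [r1, #4]
        r0 = mem[r1 + 4]            # base left unchanged
    elif mode == "post_indexed":    # LDR r0, [r1], #4
        r0 = mem[r1]                # base used first, then updated
        r1 += 4
    else:
        raise ValueError(mode)
    return r0, r1

for mode in ("auto_indexed", "offset", "post_indexed"):
    r0, r1 = ldr(mode)
    print(mode, hex(r0), hex(r1))
```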

Examples
PRE:
◦ mem32[0x80018] = 0x03
◦ mem32[0x80014] = 0x02
◦ mem32[0x80010] = 0x01
◦ r0 = 0x00080010
    LDMIA r0!, {r1-r3}
POST:
◦ r0 = 0x0008001c
◦ r1 = 0x00000001
◦ r2 = 0x00000002
◦ r3 = 0x00000003

Examples
PRE:
◦ mem32[0x8001c] = 0x04
◦ mem32[0x80018] = 0x03
◦ mem32[0x80014] = 0x02
◦ mem32[0x80010] = 0x01
◦ r0 = 0x00080010
    LDMIB r0!, {r1-r3}
POST:
◦ r0 = 0x0008001c
◦ r1 = 0x00000002
◦ r2 = 0x00000003
◦ r3 = 0x00000004
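
The difference between "increment after" (IA) and "increment before" (IB) can be sketched in Python with the memory contents from the two examples. The `ldm` helper and its parameters are illustrative assumptions, not ARM tooling.

```python
def ldm(mem, base, count, increment_before):
    """Models LDMIA/LDMIB base!, {count registers}.

    Returns (loaded values, written-back base address).
    """
    addr = base
    values = []
    for _ in range(count):
        if increment_before:
            addr += 4               # IB: step to the next word, then load
        values.append(mem[addr])
        if not increment_before:
            addr += 4               # IA: load, then step to the next word
    return values, addr

mem = {0x80010: 0x01, 0x80014: 0x02, 0x80018: 0x03, 0x8001C: 0x04}
print(ldm(mem, 0x80010, 3, increment_before=False))   # LDMIA r0!, {r1-r3}
print(ldm(mem, 0x80010, 3, increment_before=True))    # LDMIB r0!, {r1-r3}
```

Both variants leave r0 at 0x8001c after three transfers, but LDMIB skips the word at the original base address.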

Conditional execution

Control flow instructions
Branch instruction: B label
Conditional branch: BNE label
Branch and Link: BL label
          BL   Loop
          …
    Loop  …
          …
          MOV  PC, r14   ; return

Common DSP features
Harvard architecture
Dedicated single-cycle Multiply-Accumulate (MAC) instruction (hardware MAC units)
Single-Instruction Multiple Data (SIMD)
Very Long Instruction Word (VLIW) architecture
Pipelining
Saturation arithmetic
Zero overhead looping
Hardware circular addressing
Cache
DMA

Harvard Architecture
Physically separate memories and paths for instruction and data

Single-Cycle MAC unit
Can compute a sum of n products in n cycles
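
The "n products in n cycles" claim can be illustrated by counting multiply-accumulate steps in a dot product. This is a rough sketch: the `mac_sum` name is illustrative, and the cycle count simply assumes one MAC per cycle while ignoring pipeline fill and memory access.

```python
def mac_sum(xs, hs):
    """Sum of products using one multiply-accumulate per loop iteration.

    Returns (accumulator, number of MAC operations performed).
    """
    acc = 0
    macs = 0
    for x, h in zip(xs, hs):
        acc += x * h        # one MAC: multiply and add in a single step
        macs += 1
    return acc, macs

print(mac_sum([1, 2, 3], [4, 5, 6]))
```

On a processor without a MAC unit the multiply and the add would be separate instructions, so the same loop would need at least two operations per product.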

Single Instruction - Multiple Data (SIMD)
A technique for data-level parallelism in which a number of processing elements apply the same operation to different data items in parallel

Very Long Instruction Word (VLIW)
A technique for instruction-level parallelism that executes instructions without dependencies (known at compile time) in parallel
Example of a single VLIW instruction: F=a+b; c=e/g; d=x&y; w=z*h;

CISC vs. RISC vs. VLIW

Pipelining
DSPs commonly feature deep pipelines
TMS320C6x processors have 3 pipeline stages with a number of phases (cycles):
◦ Fetch: Program Address Generate (PG), Program Address Send (PS), Program ready wait (PW), Program receive (PR)
◦ Decode: Dispatch (DP), Decode (DC)
◦ Execute: 6 to 10 phases

Saturation Arithmetic
Fixed range for operations like addition and multiplication
Overflow and underflow produce the maximum and minimum allowed value, respectively
Associativity and distributivity no longer apply
1 signed byte saturation arithmetic examples:
    64 + 69 = 127
    -127 – 5 = -128
    (64 + 70) – 25 = 102 ≠ 64 + (70 - 25) = 109
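
The signed-byte examples can be reproduced with a few lines of Python. This is a sketch with illustrative function names; it clamps every intermediate result to the range -128 to 127.

```python
def sat8(value):
    """Clamp a result to the signed 8-bit range."""
    return max(-128, min(127, value))

def sat_add(a, b):
    """Saturating signed 8-bit addition."""
    return sat8(a + b)

print(sat_add(64, 69))                      # overflow clamps to the maximum
print(sat_add(-127, -5))                    # underflow clamps to the minimum
print(sat_add(sat_add(64, 70), -25))        # (64 + 70) - 25
print(sat_add(64, sat_add(70, -25)))        # 64 + (70 - 25)
```

In the third case the intermediate sum 134 saturates to 127 before 25 is subtracted, which is why the grouping of the operations changes the result.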

Lecture 5: The TMS320C6x Family of DSPs

Features
High-Performance Fixed-Point Digital Signal Processor (TMS320C6413/C6410)
Eight 32-Bit Instructions/Cycle
− TMS320C6413
◦ 2-ns Instruction Cycle Time
◦ 500-MHz Clock Rate
◦ 4000 MIPS
− TMS320C6410
◦ 2.5-ns Instruction Cycle Time
◦ 400-MHz Clock Rate
◦ 3200 MIPS

Features
Eight Highly Independent Functional Units
◦ Six ALUs (32-/40-Bit), Each Supports Single 32-Bit, Dual 16-Bit, or Quad 8-Bit Arithmetic per Clock Cycle
◦ Two Multipliers Support Four 16 x 16-Bit Multiplies (32-Bit Results) per Clock Cycle or Eight 8 x 8-Bit Multiplies (16-Bit Results) per Clock Cycle
Load-Store Architecture
64 32-Bit General-Purpose Registers
Instruction Packing Reduces Code Size
All Instructions Conditional

Features
L1/L2 Memory Architecture
− 128K-Bit (16K-Byte) L1P Program Cache (Direct Mapped)
− 128K-Bit (16K-Byte) L1D Data Cache (2-Way Set-Associative)
− 2M-Bit (256K-Byte) L2 Unified Mapped RAM/Cache [C6413]
− 1M-Bit (128K-Byte) L2 Unified Mapped RAM/Cache [C6410]
Endianness: Little Endian, Big Endian
512M-Byte Total Addressable External Memory Space
Enhanced Direct-Memory-Access (EDMA) Controller (64 Independent Channels)
16 prioritized interrupts

Block Diagram

Programming the TMS320C6x Family of DSPs
Programming model
Assembly language
◦ Assembly code structure
◦ Assembly instructions
C/C++
◦ Intrinsic functions
◦ Optimizations
◦ Software Pipelining
◦ Inline Assembly
◦ Calling Assembly functions
Using Interrupts
Using DMA

Programming model
Two register files: A and B
16 registers in each register file (A0-A15), (B0-B15)
A1, A2, B0, B1, B2 used in conditions
A4-A7, B4-B7 used for circular addressing

Assembly language structure
A TMS320C6x assembly instruction includes up to seven items:
◦ Label
◦ Parallel bars
◦ Conditions
◦ Instruction
◦ Functional unit
◦ Operands
◦ Comment
Format of assembly instruction:
    Label: parallel bars [condition] instruction unit operands ; comment

Parallel bars
|| : indicates that the current instruction executes in parallel with the previous instruction; otherwise the field is left blank

Condition
All assembly instructions are conditional
If no condition is specified, the instruction always executes
If a condition is specified, the instruction executes only if the condition is true
Registers used in conditions are A1, A2, B0, B1, and B2
Examples:
    [A]   ; executes if A ≠ 0
    [!A]  ; executes if A = 0
       [B0]  ADD .L1 A1, A2, A3
    || [!B0] ADD .L2 B1, B2, B3

Instruction
Either directive or mnemonic
Directives must begin with a period (.)
Mnemonics should be in column 2 or higher
Examples:
    .sect data    ; creates a code section
    .word value   ; one word of data

Functional units (optional)
L units: 32/40-bit arithmetic/compare and 32-bit logic operations
S units: 32-bit arithmetic operations, 32/40-bit shifts and 32-bit bit-field operations, 32-bit logical operations, branches, constant generation, register transfers to/from control register file (.S2 only)
M units: 16 x 16 multiply operations
D units: 32-bit add, subtract, linear and circular address calculation, loads and stores with 5-bit constant offset, loads and stores with 15-bit constant offset (.D2 only)

Operands
All instructions require a destination operand.
Most instructions require one or two source operands.
The destination operand must be in the same register file as one source operand.
One source operand from each register file per execute packet can come from the register file opposite that of the other source operand.
Example:
◦ ADD .L1 A0, A1, A3
◦ ADD .L1 A0, B1, A2

Instruction format
Fetch packet
The same functional unit cannot be used by two instructions that execute in parallel (joined by ||); the following pair is therefore not allowed:
       ADD .S1 A0, A1, A2   ; .S1 is used for
    || SHR .S1 A3, 15, A4   ; ... both instructions

Arithmetic instructions
Add/subtract/multiply:
       ADD  .L1 A3, A2, A1   ; A1←A2+A3
       SUB  .S1 A1, 1, A1    ; decrement A1
       MPY  .M2 A7, B6       ; multiply LSBs
    || MPYH .M1 A7, B7, A6   ; multiply MSBs

Move and Load/Store Instructions. Addressing Modes
Loading constants:
    MVK  .S1 val1, A4   ; move low halfword
    MVKH .S1 val1, A4   ; move high halfword
Indirect Addressing Mode:
       LDH .D2 *B2++, B7      ; load halfword B7←[B2], increment B2
    || LDH .D1 *A2++, A7      ; load halfword A7←[A2], increment A2
       STW .D2 A1, *+A4[20]   ; store [A4 + 20 words] ← A1,
                              ; preincrement/don't modify A4

Example
Calculate the values of the registers and memory for the following instructions:
A2 = 0x00000010, MEM[0x00000010] = 0x0, MEM[0x00000014] = 0x1, MEM[0x00000018] = 0x2, MEM[0x0000001C] = 0x3
    LDH .D1 *++A2, A7      ; A2 = ?  A7 = ?
    LDH .D1 *A2--[2], A7   ; A2 = ?  A7 = ?
    LDH .D1 *-A2, A7       ; A2 = ?  A7 = ?
    LDH .D1 *++A2[2], A7   ; A2 = ?  A7 = ?

Branch and Loop Instructions
Loop example:
             MVK  .S1 count, A1   ; loop counter
    ||       MVKH .S2 count, A1
    LOOP     MVK  .S1 val1, A4    ; loop
             MVKH .S1 val1, A4    ; body
             SUB  .S1 A1, 1, A1   ; decrement counter
    [A1]     B    .S2 LOOP        ; branch if A1 ≠ 0
             NOP  5               ; 5 NOPs for branch

Programmable Logic Devices
All layers (diffusion, polysilicon, [multi-] metal) may exist
◦ Designers can purchase an IC
◦ Connections on the IC are either created or destroyed to implement desired functionality
◦ Field-Programmable Gate Array (FPGA) and recently Gate Arrays are very popular
Benefits
◦ Low NRE costs, almost instant IC availability
Drawbacks
◦ Bigger, expensive (perhaps $30 per unit), power hungry, slower

Xilinx XC4000 FPGA
CLBs are configured to implement simple logic

XC4000 CLB

Xilinx Zynq-7000 Extensible Processing Platform (EPP): Dual Cortex-A9 + FPGA SoC
Dual-core ARM Cortex-A9
28-nm programmable digital FPGA
Programmable analog capabilities
Target applications:
◦ Automotive (video processing and analytics requirements for driver assistance systems)
◦ Broadcast (high-bit-rate bandwidth for high-accuracy video processing and analytics)
◦ Industrial control

Platform-Based Design
"Only the consumer gets freedom of choice; designers need freedom from choice" (Orfali, et al, 1996, p. 522)
A platform is a restriction on the space of possible implementation choices, providing a well-defined abstraction of the underlying technology for the application developer
New platforms will be defined at the architecture-microarchitecture boundary
They will be component-based, and will provide a range of choices from structured-custom to fully programmable implementations
Key to such approaches is the representation of communication in the platform model
Source: R. Newton

Platform-based Design – System-on-Chip
Use of predefined Intellectual Property (IP)
A platform-based system consists of a RISC processor, memories, busses and a common language
Platform-based design poses the problem of partitioning a solution between hardware (HDL) and software (programming processors)

Platforms Enable Simplified SoC Design
[Figure: core, near peripherals, far peripherals]
Customer demands
– Fast turn-around time
– Easy access to pre-qualified building blocks
– Web enabled
Design technology
– Core platforms
– 'Big' IP
– Emerging SoC bus standards
– Embedded software
– HW/SW co-verification

And Automation of IP Selection & Integration

Heterogeneous Programmable Platforms
Xilinx Virtex-II Pro:
◦ FPGA fabric
◦ Embedded memories
◦ Embedded PowerPC
◦ Hardwired multipliers
◦ High-speed I/O

Xilinx's products

Comparison of CMOS design methods / Implementation Platforms
Design Method | NRE | Unit Cost | Power Dissipation | Complexity of Implementation | Time-to-Market | Performance | Flexibility
µProcessor/DSP | low | medium | high | low | low | high
PLA | low | medium | low
FPGA | low | high | medium | medium
Gate Array | medium | low | medium
Cell Based | high | low | high | low
Custom Design | high | low | high | very high | low
Platform Based | high | low/medium | low | high | medium/low | high | medium