Embedded Systems Design: A Unified Hardware/Software Introduction
Microprocessors
CMOS transistor on silicon
• Transistor
  – The basic electrical component in digital systems
  – Acts as an on/off switch
  – Voltage at “gate” controls whether current flows from source to drain
  – Don’t confuse this “gate” with a logic gate
[Figure: an IC package containing an IC; cross-section of a transistor on a silicon substrate showing source, gate, oxide, channel and drain. The transistor conducts if gate=1.]
Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis
CMOS transistor implementations
• Complementary Metal Oxide Semiconductor
• We refer to logic levels
  – Typically 0 is 0 V, 1 is 5 V
• Two basic CMOS types
  – nMOS conducts if gate=1
  – pMOS conducts if gate=0
  – Hence “complementary”
• Basic gates
  – Inverter, NAND, NOR
[Figure: nMOS and pMOS transistor symbols (source, gate, drain); transistor-level circuits for the inverter, F = x'; the NAND gate, F = (xy)'; and the NOR gate, F = (x+y)'.]
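The three basic gates can be sketched at switch level in C. This is only an illustrative model: it assumes ideal transistors and the usual CMOS arrangement (pMOS transistors in the pull-up network, nMOS transistors in the pull-down network), which the slide's figure is assumed to follow. All function names are made up for this sketch.

```c
#include <stdbool.h>

/* Ideal switch models (illustrative names, not from the slides). */
static bool nmos_conducts(bool gate) { return gate; }   /* nMOS conducts if gate=1 */
static bool pmos_conducts(bool gate) { return !gate; }  /* pMOS conducts if gate=0 */

/* The output is 1 when the pull-up network connects it to the supply and
   0 when the pull-down network connects it to ground. In these gates
   exactly one of the two networks conducts for any input combination. */
static bool resolve(bool pull_up, bool pull_down) {
    (void)pull_down;   /* always the complement of pull_up in these gates */
    return pull_up;
}

/* Inverter: one pMOS to the supply, one nMOS to ground. F = x' */
bool cmos_inverter(bool x) {
    return resolve(pmos_conducts(x), nmos_conducts(x));
}

/* NAND: pMOS in parallel (pull-up), nMOS in series (pull-down). F = (xy)' */
bool cmos_nand(bool x, bool y) {
    bool up = pmos_conducts(x) || pmos_conducts(y);
    bool down = nmos_conducts(x) && nmos_conducts(y);
    return resolve(up, down);
}

/* NOR: pMOS in series (pull-up), nMOS in parallel (pull-down). F = (x+y)' */
bool cmos_nor(bool x, bool y) {
    bool up = pmos_conducts(x) && pmos_conducts(y);
    bool down = nmos_conducts(x) || nmos_conducts(y);
    return resolve(up, down);
}
```

Calling `cmos_nand(true, true)` yields 0 because both series nMOS switches conduct while neither pMOS switch does.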
Basic logic gates
One-input gates:
  x | Driver F = x | Inverter F = x'
  0 |      0       |       1
  1 |      1       |       0
Two-input gates:
  x y | AND F = xy | NAND F = (xy)' | OR F = x+y | NOR F = (x+y)' | XOR F = x ⊕ y | XNOR F = (x ⊕ y)'
  0 0 |     0      |       1        |     0      |       1        |       0       |        1
  0 1 |     0      |       1        |     1      |       0        |       1       |        0
  1 0 |     0      |       1        |     1      |       0        |       1       |        0
  1 1 |     1      |       0        |     1      |       0        |       0       |        1
Combinational logic design
A) Problem description
y is 1 if a is 1, or b and c are 1. z is 1 if b or c is 1, but not both, or if all are 1.
B) Truth table
  Inputs | Outputs
  a b c  | y z
  0 0 0  | 0 0
  0 0 1  | 0 1
  0 1 0  | 0 1
  0 1 1  | 1 0
  1 0 0  | 1 0
  1 0 1  | 1 1
  1 1 0  | 1 1
  1 1 1  | 1 1
C) Output equations
y = a'bc + ab'c' + ab'c + abc' + abc
z = a'b'c + a'bc' + ab'c + abc' + abc
D) Minimized output equations
  y:  bc = 00 01 11 10
  a=0      0  0  1  0
  a=1      1  1  1  1
y = a + bc
  z:  bc = 00 01 11 10
  a=0      0  1  0  1
  a=1      0  1  1  1
z = ab + b’c + bc’
E) Logic Gates
[Figure: gate-level circuit with inputs a, b, c and outputs y, z.]
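The minimization step can be checked by brute force. The sketch below evaluates the sum-of-minterms equations from step C and the minimized equations from step D on all eight input rows and counts disagreements. Function names are illustrative.

```c
#include <stdbool.h>

/* Step C: sum-of-minterms output equations read off the truth table. */
bool y_sop(bool a, bool b, bool c) {
    return (!a && b && c) || (a && !b && !c) || (a && !b && c)
        || (a && b && !c) || (a && b && c);
}

bool z_sop(bool a, bool b, bool c) {
    return (!a && !b && c) || (!a && b && !c) || (a && !b && c)
        || (a && b && !c) || (a && b && c);
}

/* Step D: minimized equations y = a + bc and z = ab + b'c + bc'. */
bool y_min(bool a, bool b, bool c) { return a || (b && c); }
bool z_min(bool a, bool b, bool c) { return (a && b) || (!b && c) || (b && !c); }

/* Number of (row, output) pairs on which the minimized equations
   disagree with the sum-of-minterms equations. */
int count_mismatches(void) {
    int mismatches = 0;
    for (int row = 0; row < 8; row++) {
        bool a = (row >> 2) & 1;
        bool b = (row >> 1) & 1;
        bool c = row & 1;
        if (y_sop(a, b, c) != y_min(a, b, c)) mismatches++;
        if (z_sop(a, b, c) != z_min(a, b, c)) mismatches++;
    }
    return mismatches;
}
```

With only three inputs, exhaustive comparison is cheap and is a convenient sanity check on a Karnaugh-map reduction done by hand.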
Combinational components
• n-bit, m x 1 Multiplexor
  – Inputs I0 … I(m-1), select S0 … S(log m), output O
  – O = I0 if S=0..00, I1 if S=0..01, …, I(m-1) if S=1..11
• log n x n Decoder
  – Inputs I0 … I(log n -1), outputs O0 … O(n-1)
  – O0=1 if I=0..00, O1=1 if I=0..01, …, O(n-1)=1 if I=1..11
  – With enable input e: all O’s are 0 if e=0
• n-bit Adder
  – Inputs A, B; outputs sum, carry
  – sum = A+B (first n bits)
  – carry = (n+1)’th bit of A+B
  – With carry-in input Ci: sum = A + B + Ci
• n-bit Comparator
  – Inputs A, B; outputs less, equal, greater
  – less=1 if A<B, equal=1 if A=B, greater=1 if A>B
• n-bit, m function ALU
  – Inputs A, B, select S0 … S(log m); output O
  – O = A op B, op determined by S
  – May have status outputs carry, zero, etc.
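The component behaviors listed above can be written as small pure functions. The sketch below fixes the widths (8-bit data, a 4 x 1 multiplexor, a 2 x 4 decoder) purely for illustration. The type and function names are assumptions, not part of the slides.

```c
#include <stdbool.h>
#include <stdint.h>

/* 8-bit, 4 x 1 multiplexor: O = I[S]. */
uint8_t mux4(const uint8_t in[4], unsigned sel) {
    return in[sel & 3u];
}

/* 2 x 4 decoder with enable: one-hot output, all zeros when e=0. */
uint8_t decoder2to4(unsigned i, bool e) {
    return e ? (uint8_t)(1u << (i & 3u)) : 0;
}

/* 8-bit adder with carry-in: sum is the low 8 bits, carry is the 9th bit. */
typedef struct { uint8_t sum; bool carry; } add_result;

add_result adder8(uint8_t a, uint8_t b, bool ci) {
    unsigned full = (unsigned)a + (unsigned)b + (ci ? 1u : 0u);
    add_result r = { (uint8_t)(full & 0xFFu), (full >> 8) != 0 };
    return r;
}

/* 8-bit comparator: exactly one of the three outputs is 1. */
typedef struct { bool less, equal, greater; } cmp_result;

cmp_result comparator8(uint8_t a, uint8_t b) {
    cmp_result r = { a < b, a == b, a > b };
    return r;
}
```

For example, `adder8(200, 100, false)` gives sum 44 with carry set, since 300 does not fit in 8 bits.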
Sequential components
• n-bit Register
  – Inputs I, load, clear; output Q
  – Q = 0 if clear=1, I if load=1 and clock=1, Q(previous) otherwise.
• n-bit Shift register
  – Inputs I, shift; output Q
  – Q = lsb
  – Content shifted
  – I stored in msb
• n-bit Counter
  – Output Q
  – Q = 0 if clear=1, Q(prev)+1 if count=1 and clock=1.
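A common way to model these components in software is a "next state" function that is called once per clock edge. The sketch below uses 8-bit values and treats clear as taking effect at the clock edge, which is an assumption made for simplicity. The function names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Each function returns the stored value after one clock edge. */

/* Register: clear wins over load; otherwise the old value is held. */
uint8_t register_next(uint8_t q, uint8_t i, bool load, bool clear) {
    if (clear) return 0;
    if (load) return i;
    return q;
}

/* Shift register: contents move one position toward the lsb and the
   serial input i is stored in the msb. The serial output Q is the lsb. */
uint8_t shift_next(uint8_t q, bool i, bool shift) {
    if (!shift) return q;
    return (uint8_t)((q >> 1) | (i ? 0x80u : 0x00u));
}

bool shift_output(uint8_t q) {
    return (q & 1u) != 0;
}

/* Counter: clear wins over count; the count wraps around at 255. */
uint8_t counter_next(uint8_t q, bool count, bool clear) {
    if (clear) return 0;
    if (count) return (uint8_t)(q + 1u);
    return q;
}
```

Calling `counter_next` repeatedly with count=1 steps through 0, 1, 2, ... and wraps to 0 after 255.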
Gated R-S Latch (clocked S-R flip-flop)
Enb = 1: latch closed (outputs unchanged)
Enb = 0: enabled (outputs depend on inputs)
J-K Flip-flop
How to eliminate the forbidden state?
Idea: use output feedback to guarantee that R and S are never both one
J, K both one yields toggle
Characteristic Equation: Q+ = Q K' + Q' J
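The J-K characteristic equation, Q+ = Q K' + Q' J in its standard form, translates directly into a next-state function. The function name below is illustrative.

```c
#include <stdbool.h>

/* Next state of a J-K flip-flop at a clock edge: Q+ = Q K' + Q' J.
   J=K=0 holds, J=1 K=0 sets, J=0 K=1 resets, J=K=1 toggles. */
bool jk_next(bool q, bool j, bool k) {
    return (q && !k) || (!q && j);
}
```

With J = K = 1 the expression reduces to Q', which is the toggle behavior mentioned on the slide.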
Sequential logic design
A) Problem Description
You want to construct a clock divider. Slow down your preexisting clock so that you output a 1 for every four clock cycles.
B) State Diagram
[Figure: four states 0, 1, 2, 3. In each state, a=0 stays in the same state and a=1 advances to the next state, with state 3 returning to state 0. Output x=0 in states 0, 1 and 2; x=1 in state 3.]
C) Implementation Model
[Figure: combinational logic with inputs a, Q1, Q0 and outputs x, I1, I0; a state register loads I1, I0 and drives Q1, Q0.]
D) State Table (Moore-type)
  Inputs   | Outputs
  Q1 Q0 a  | I1 I0 x
  0  0  0  | 0  0  0
  0  0  1  | 0  1  0
  0  1  0  | 0  1  0
  0  1  1  | 1  0  0
  1  0  0  | 1  0  0
  1  0  1  | 1  1  0
  1  1  0  | 1  1  1
  1  1  1  | 0  0  1
• Given this implementation model
  – Sequential logic design quickly reduces to combinational logic design
Sequential logic design (cont.)
E) Minimized Output Equations
  I1:  Q1Q0 = 00 01 11 10
  a=0         0  0  1  1
  a=1         0  1  0  1
I1 = Q1’Q0a + Q1a’ + Q1Q0’
  I0:  Q1Q0 = 00 01 11 10
  a=0         0  1  1  0
  a=1         1  0  0  1
I0 = Q0a’ + Q0’a
  x:   Q1Q0 = 00 01 11 10
  a=0         0  0  1  0
  a=1         0  0  1  0
x = Q1Q0
F) Combinational Logic
[Figure: gate-level circuit with inputs a, Q1, Q0 and outputs x, I1, I0.]
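The minimized equations can be exercised with a short simulation of the implementation model: a two-bit state register plus combinational logic. This is a sketch only. The struct and function names are made up, and the input a is simply held at 1 for the demonstration.

```c
#include <stdbool.h>

/* State register contents. */
typedef struct { bool q1, q0; } divider_state;

/* Moore output: x = Q1 Q0. */
bool divider_output(divider_state s) {
    return s.q1 && s.q0;
}

/* Combinational next-state logic followed by a clock edge:
   I1 = Q1'Q0a + Q1a' + Q1Q0'   and   I0 = Q0a' + Q0'a. */
divider_state divider_step(divider_state s, bool a) {
    divider_state next;
    next.q1 = (!s.q1 && s.q0 && a) || (s.q1 && !a) || (s.q1 && !s.q0);
    next.q0 = (s.q0 && !a) || (!s.q0 && a);
    return next;
}

/* Starting in state 0 with a held at 1, count the clock cycles
   (out of the first `cycles`) during which x is 1. */
int count_pulses(int cycles) {
    divider_state s = { false, false };
    int pulses = 0;
    for (int t = 0; t < cycles; t++) {
        if (divider_output(s)) pulses++;
        s = divider_step(s, true);
    }
    return pulses;
}
```

Over eight cycles the state sequence is 0, 1, 2, 3, 0, 1, 2, 3, so x is 1 in exactly two of them, one per four clock cycles.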
Basic Architecture
• Control unit and datapath
  – Note similarity to single-purpose processor
• Key differences
  – Datapath is general
  – Control unit doesn’t store the algorithm – the algorithm is “programmed” into the memory
[Figure: processor containing a control unit (controller, PC, IR) and a datapath (ALU, registers), linked by control/status signals and connected to I/O and memory.]
Datapath Operations
• Load
  – Read memory location into register
• ALU operation
  – Input certain registers through ALU, store back in register
• Store
  – Write register to memory location
[Figure: processor datapath with ALU (+1) and registers; the value 10 is loaded from memory into a register, incremented to 11 through the ALU, and written back to memory.]
Control Unit
• Control unit: configures the datapath operations
  – Sequence of desired operations (“instructions”) stored in memory – “program”
• Instruction cycle – broken into several sub-operations, each one clock cycle, e.g.:
  – Fetch: Get next instruction into IR
  – Decode: Determine what the instruction means
  – Fetch operands: Move data from memory to datapath register
  – Execute: Move data through the ALU
  – Store results: Write data from register to memory
[Figure: processor (control unit with controller, PC, IR; datapath with ALU and registers R0, R1) connected to I/O and memory. Memory holds the program
  100  load R0, M[500]
  101  inc R1, R0
  102  store M[501], R1
and the data M[500] = 10.]
Control Unit Sub-Operations
• Fetch
  – Get next instruction into IR
  – PC: program counter, always points to next instruction
  – IR: holds the fetched instruction
[Figure: PC = 100, IR = load R0, M[500]; memory holds the program at 100–102 and M[500] = 10.]
Control Unit Sub-Operations
• Decode
  – Determine what the instruction means
[Figure: PC = 100, IR = load R0, M[500]; memory holds the program at 100–102 and M[500] = 10.]
Control Unit Sub-Operations
• Fetch operands
  – Move data from memory to datapath register
[Figure: PC = 100, IR = load R0, M[500]; the value 10 from M[500] is moved into R0.]
Control Unit Sub-Operations
• Execute
  – Move data through the ALU
  – This particular instruction does nothing during this sub-operation
[Figure: PC = 100, IR = load R0, M[500]; R0 = 10.]
Control Unit Sub-Operations
• Store results
  – Write data from register to memory
  – This particular instruction does nothing during this sub-operation
[Figure: PC = 100, IR = load R0, M[500]; R0 = 10.]
Instruction Cycles
[Figure: timing diagram for PC=100, one clock cycle each for Fetch, Decode, Fetch ops, Exec., Store results. Processor state: PC = 100, IR = load R0, M[500], R0 = 10; memory holds the program at 100–102 and M[500] = 10.]
Instruction Cycles
[Figure: timing diagrams for PC=100 and PC=101, each with Fetch, Decode, Fetch ops, Exec., Store results. Processor state: PC = 101, IR = inc R1, R0; R0 = 10 passes through the ALU (+1) and R1 = 11.]
Instruction Cycles
[Figure: timing diagrams for PC=100, PC=101 and PC=102, each with Fetch, Decode, Fetch ops, Exec., Store results. Processor state: PC = 102, IR = store M[501], R1; R0 = 10, R1 = 11; memory now holds M[500] = 10 and M[501] = 11.]
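The three instruction cycles above can be traced with a toy simulator. This is a deliberately simplified sketch: the instruction representation, the struct layout and all names are assumptions, the program is kept in a separate array rather than in the same memory as the data, and the five sub-operations are collapsed into one function call per instruction.

```c
#include <stdint.h>

enum opcode { OP_LOAD, OP_INC, OP_STORE };

typedef struct {
    enum opcode op;
    int reg_a;   /* destination for load/inc, source for store */
    int reg_b;   /* source register for inc */
    int addr;    /* data-memory address for load/store */
} instruction;

typedef struct {
    int pc;               /* program counter */
    instruction ir;       /* instruction register */
    int16_t r[2];         /* R0, R1 */
    int16_t data[1024];   /* data memory */
} machine;

/* The program from the slides, stored at addresses 100..102. */
static const instruction program[3] = {
    { OP_LOAD,  0, 0, 500 },   /* 100: load R0, M[500]  */
    { OP_INC,   1, 0, 0   },   /* 101: inc R1, R0       */
    { OP_STORE, 1, 0, 501 },   /* 102: store M[501], R1 */
};

/* One instruction cycle. */
void run_one_instruction(machine *m) {
    /* Fetch: get the next instruction into IR and advance PC. */
    m->ir = program[m->pc - 100];
    m->pc += 1;
    /* Decode, fetch operands, execute, store results. */
    switch (m->ir.op) {
    case OP_LOAD:
        m->r[m->ir.reg_a] = m->data[m->ir.addr];
        break;
    case OP_INC:
        m->r[m->ir.reg_a] = (int16_t)(m->r[m->ir.reg_b] + 1);
        break;
    case OP_STORE:
        m->data[m->ir.addr] = m->r[m->ir.reg_a];
        break;
    }
}

/* Runs the whole program with M[500] = initial and returns M[501]. */
int run_program(int16_t initial) {
    machine m = { 0 };
    m.pc = 100;
    m.data[500] = initial;
    while (m.pc <= 102) {
        run_one_instruction(&m);
    }
    return m.data[501];
}
```

With M[500] = 10, as in the figures, `run_program(10)` leaves 11 in M[501].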
Architectural Considerations
• N-bit processor
  – N-bit ALU, registers, buses, memory data interface
  – Embedded: 8-bit, 16-bit, 32-bit common
  – Desktop/servers: 32-bit, even 64
• PC size determines address space
Architectural Considerations
• Clock frequency
  – Inverse of clock period
  – Clock period must be longer than the longest register-to-register delay in the entire processor
  – Memory access is often the longest
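The relationship between path delays and clock frequency can be shown with a small calculation. In the sketch below the helper name is hypothetical and any delay values passed in are the caller's assumptions. It simply takes the longest delay as a lower bound on the clock period and inverts it.

```c
#include <stddef.h>

/* Upper bound on the clock frequency, in MHz, given the
   register-to-register path delays in nanoseconds.
   Returns 0.0 if no positive delay is supplied. */
double max_clock_mhz(const double *delays_ns, size_t n) {
    double longest = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (delays_ns[i] > longest) {
            longest = delays_ns[i];
        }
    }
    if (longest <= 0.0) {
        return 0.0;
    }
    /* Frequency is the inverse of the period: 1/ns is GHz, so 1000/ns is MHz. */
    return 1000.0 / longest;
}
```

If, say, the longest path is a 10 ns memory access, the clock can run at no more than 100 MHz, however fast the other paths are.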
Pipelining: Increasing Instruction Throughput
[Figure: three timing charts over a time axis. Non-pipelined dish cleaning: dishes 1–8 are each washed and then dried before the next dish starts. Pipelined dish cleaning: Wash and Dry stages overlap, so one dish is dried while the next is washed. Pipelined instruction execution: instructions 1–8 move through the stages Fetch-instr., Decode, Fetch ops., Execute, Store res., with a new instruction entering the pipeline each time step.]
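The throughput gain in the charts can be estimated with a simple count of time steps. The sketch assumes every stage takes exactly one time step and that the pipeline never stalls, which is the idealized case drawn in the figure. The function names are illustrative.

```c
/* Time steps to process n items through k stages when each item must
   finish all stages before the next item starts. */
unsigned long non_pipelined_steps(unsigned long n, unsigned long k) {
    return n * k;
}

/* Time steps when stages overlap: the first item needs k steps and
   every further item completes one step after the previous one. */
unsigned long pipelined_steps(unsigned long n, unsigned long k) {
    if (n == 0) {
        return 0;
    }
    return k + (n - 1);
}
```

For the eight dishes and two stages in the figure this gives 16 steps without pipelining and 9 with it. For eight instructions and five stages it gives 40 versus 12.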
Superscalar and VLIW Architectures
• Performance can be improved by:
  – Faster clock (but there’s a limit)
  – Pipelining: slice up instruction into stages, overlap stages
  – Multiple ALUs to support more than one instruction stream
• Superscalar
  – Scalar: non-vector operations
  – Fetches instructions in batches, executes as many as possible
    • May require extensive hardware to detect independent instructions
  – VLIW: each word in memory has multiple independent instructions
    • Relies on the compiler to detect and schedule instructions
    • Currently growing in popularity
Two Memory Architectures
• Princeton
  – Fewer memory wires
• Harvard
  – Simultaneous program and data memory access
[Figure: Harvard: processor connected to separate program memory and data memory. Princeton: processor connected to a single memory (program and data).]
Cache Memory
• Memory access may be slow
• Cache is small but fast memory close to processor
  – Holds copy of part of memory
  – Hits and misses
[Figure: processor and cache (fast/expensive technology, usually on the same chip) in front of memory (slower/cheaper technology, usually on a different chip).]
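Hits and misses can be illustrated with a very small cache model. The slides do not say how the cache is organized, so the direct-mapped layout, the eight-line size and every name below are assumptions chosen only to keep the sketch short.

```c
#include <stdbool.h>
#include <stdint.h>

#define CACHE_LINES 8u   /* illustrative size */

typedef struct {
    bool valid[CACHE_LINES];
    uint32_t tag[CACHE_LINES];
    unsigned hits;
    unsigned misses;
} cache;

/* Looks up one word address. Returns true on a hit. On a miss the
   line is filled, replacing whatever was stored at that index. */
bool cache_access(cache *c, uint32_t addr) {
    uint32_t index = addr % CACHE_LINES;
    uint32_t tag = addr / CACHE_LINES;
    if (c->valid[index] && c->tag[index] == tag) {
        c->hits++;
        return true;
    }
    c->valid[index] = true;
    c->tag[index] = tag;
    c->misses++;
    return false;
}

/* Replays a loop that touches addresses 100..103 three times and
   returns the number of hits. */
unsigned demo_hits(void) {
    cache c = { { false }, { 0 }, 0, 0 };
    for (int pass = 0; pass < 3; pass++) {
        for (uint32_t addr = 100; addr < 104; addr++) {
            cache_access(&c, addr);
        }
    }
    return c.hits;
}
```

The first pass misses four times while the copies are brought in. The two later passes hit every time, so `demo_hits()` returns 8.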
Programmer’s View
• Programmer doesn’t need detailed understanding of architecture
  – Instead, needs to know what instructions can be executed
• Two levels of instructions:
  – Assembly level
  – Structured languages (C, C++, Java, etc.)
• Most development today done using structured languages
  – But, some assembly level programming may still be necessary
  – Drivers: portion of program that communicates with and/or controls (drives) another device
    • Often have detailed timing considerations, extensive bit manipulation
    • Assembly level may be best for these
Assembly-Level Instructions
  Instruction 1 | opcode | operand 1 | operand 2
  Instruction 2 | opcode | operand 1 | operand 2
  Instruction 3 | opcode | operand 1 | operand 2
  Instruction 4 | opcode | operand 1 | operand 2
  ...
• Instruction Set
  – Defines the legal set of instructions for that processor
    • Data transfer: memory/register, register/register, I/O, etc.
    • Arithmetic/logical: move register through ALU and back
    • Branches: determine next PC value when not just PC+1
A Simple (Trivial) Instruction Set
  Assembly instruct. | First byte | Second byte | Operation
  MOV Rn, direct     | 0000 Rn    | direct      | Rn = M(direct)
  MOV direct, Rn     | 0001 Rn    | direct      | M(direct) = Rn
  MOV @Rn, Rm        | 0010 Rn    | Rm          | M(Rn) = Rm
  MOV Rn, #immed.    | 0011 Rn    | immediate   | Rn = immediate
  ADD Rn, Rm         | 0100 Rn    | Rm          | Rn = Rn + Rm
  SUB Rn, Rm         | 0101 Rn    | Rm          | Rn = Rn - Rm
  JZ Rn, relative    | 0110 Rn    | relative    | PC = PC + relative (only if Rn is 0)
The first four bits of the first byte are the opcode; the remaining fields are the operands.
Addressing Modes
  Addressing mode   | Operand field    | Register-file contents | Memory contents
  Immediate         | Data             |                        |
  Register-direct   | Register address | Data                   |
  Register indirect | Register address | Memory address         | Data
  Direct            | Memory address   |                        | Data
  Indirect          | Memory address   |                        | Memory address, then Data
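Each addressing mode is a different rule for turning the operand field into data. A sketch in C, using 8-bit fields and plain arrays for the register file and memory; the enum and function names are illustrative, and the caller is assumed to pass a valid register number for the register modes.

```c
#include <stdint.h>

enum addr_mode {
    IMMEDIATE,
    REGISTER_DIRECT,
    REGISTER_INDIRECT,
    DIRECT,
    INDIRECT
};

/* Resolves an operand field to the data it designates. */
uint8_t fetch_operand(enum addr_mode mode, uint8_t field,
                      const uint8_t *regs, const uint8_t *mem) {
    switch (mode) {
    case IMMEDIATE:         return field;             /* field is the data */
    case REGISTER_DIRECT:   return regs[field];       /* register holds the data */
    case REGISTER_INDIRECT: return mem[regs[field]];  /* register holds an address */
    case DIRECT:            return mem[field];        /* field is an address */
    case INDIRECT:          return mem[mem[field]];   /* memory holds an address */
    }
    return 0;
}

/* Resolves the operand field 2 under each mode with a fixed,
   made-up register file and memory image. */
int demo_operand(enum addr_mode mode) {
    uint8_t regs[16] = { 0 };
    uint8_t mem[256] = { 0 };
    regs[2] = 40;
    mem[2] = 50;
    mem[40] = 7;
    mem[50] = 9;
    return fetch_operand(mode, 2, regs, mem);
}
```

With that image, the same operand field 2 yields 2, 40, 7, 50 and 9 for the five modes in table order.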
Sample Programs
C program:
    int total = 0;
    for (int i=10; i!=0; i--)
        total += i;
    // next instructions...
Equivalent assembly program:
          MOV R0, #0;    // total = 0
          MOV R1, #10;   // i = 10
          MOV R2, #1;    // constant 1
          MOV R3, #0;    // constant 0
    Loop: JZ R1, Next;   // Done if i=0
          ADD R0, R1;    // total += i
          SUB R1, R2;    // i--
          JZ R3, Loop;   // Jump always
    Next: // next instructions...
• Try some others
  – Handshake: Wait until the value of M[254] is not 0, set M[255] to 1, wait until M[254] is 0, set M[255] to 0 (assume those locations are ports).
  – (Harder) Count the occurrences of zero in an array stored in memory locations 100 through 199.
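The sample assembly program can be checked with a small interpreter for the simple instruction set. Several details here are assumptions rather than something the slides specify: addresses count whole instructions, the relative offset of JZ is a signed 8-bit value applied after PC has already advanced past the jump, registers and memory are 8 bits wide, and all names are made up.

```c
#include <stdint.h>

/* Opcodes: the upper four bits of the first byte. */
enum {
    OP_MOV_LOAD  = 0x0,   /* MOV Rn, direct   */
    OP_MOV_STORE = 0x1,   /* MOV direct, Rn   */
    OP_MOV_INDIR = 0x2,   /* MOV @Rn, Rm      */
    OP_MOV_IMMED = 0x3,   /* MOV Rn, #immed.  */
    OP_ADD       = 0x4,   /* ADD Rn, Rm       */
    OP_SUB       = 0x5,   /* SUB Rn, Rm       */
    OP_JZ        = 0x6    /* JZ Rn, relative  */
};

typedef struct { uint8_t first, second; } instr;

/* The sample program, hand-assembled. */
static const instr sample[] = {
    { 0x30, 0    },   /* 0: MOV R0, #0            */
    { 0x31, 10   },   /* 1: MOV R1, #10           */
    { 0x32, 1    },   /* 2: MOV R2, #1            */
    { 0x33, 0    },   /* 3: MOV R3, #0            */
    { 0x61, 3    },   /* 4: Loop: JZ R1, Next     */
    { 0x40, 0x01 },   /* 5: ADD R0, R1            */
    { 0x51, 0x02 },   /* 6: SUB R1, R2            */
    { 0x63, 0xFC },   /* 7: JZ R3, Loop (-4)      */
};

/* Runs the sample program and returns the final value of R0 (total),
   or -1 if an unknown opcode is encountered. */
int run_sample(void) {
    uint8_t reg[16] = { 0 };
    uint8_t mem[256] = { 0 };
    const unsigned end = sizeof sample / sizeof sample[0];
    unsigned pc = 0;

    while (pc < end) {
        instr in = sample[pc++];              /* fetch, PC now points to next */
        unsigned op = in.first >> 4;          /* decode */
        unsigned rn = in.first & 0x0Fu;
        unsigned rm = in.second & 0x0Fu;
        int rel = in.second >= 128 ? (int)in.second - 256 : (int)in.second;

        switch (op) {
        case OP_MOV_LOAD:  reg[rn] = mem[in.second];               break;
        case OP_MOV_STORE: mem[in.second] = reg[rn];               break;
        case OP_MOV_INDIR: mem[reg[rn]] = reg[rm];                 break;
        case OP_MOV_IMMED: reg[rn] = in.second;                    break;
        case OP_ADD:       reg[rn] = (uint8_t)(reg[rn] + reg[rm]); break;
        case OP_SUB:       reg[rn] = (uint8_t)(reg[rn] - reg[rm]); break;
        case OP_JZ:
            if (reg[rn] == 0) {
                pc = (unsigned)((int)pc + rel);
            }
            break;
        default:
            return -1;
        }
    }
    return reg[0];
}
```

`run_sample()` returns 55, the sum 10 + 9 + ... + 1 that the C loop computes. Because R3 always holds 0, the final `JZ R3, Loop` behaves as an unconditional jump.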
Application-Specific Instruction-Set Processors (ASIPs)
• General-purpose processors
  – Sometimes too general to be effective in demanding applications
    • e.g., video processing – requires huge video buffers and operations on large arrays of data, inefficient on a GPP
  – But single-purpose processor has high NRE, not programmable
• ASIPs – targeted to a particular domain
  – Contain architectural features specific to that domain
    • e.g., embedded control, digital signal processing, video processing, network processing, telecommunications, etc.
  – Still programmable
A Common ASIP: Microcontroller
• For embedded control applications
  – Reading sensors, setting actuators
  – Mostly dealing with events (bits): data is present, but not in huge amounts
  – e.g., VCR, disk drive, digital camera (assuming SPP for image compression), washing machine, microwave oven
• Microcontroller features
  – On-chip peripherals
    • Timers, analog-digital converters, serial communication, etc.
    • Tightly integrated for programmer, typically part of register space
  – On-chip program and data memory
  – Direct programmer access to many of the chip’s pins
  – Specialized instructions for bit-manipulation and other low-level operations
Another Common ASIP: Digital Signal Processors (DSP)
• For signal processing applications
  – Large amounts of digitized data, often streaming
  – Data transformations must be applied fast
  – e.g., cell-phone voice filter, digital TV, music synthesizer
• DSP features
  – Several instruction execution units
  – Multiply-accumulate single-cycle instruction, other instrs.
  – Efficient vector operations – e.g., add two arrays
    • Vector ALUs, loop buffers, etc.
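The multiply-accumulate operation is the inner step of many filters. The sketch below shows it as a plain C loop over 16-bit samples and coefficients. The function name and data widths are assumptions, and on a DSP each loop iteration would typically map to a single MAC instruction.

```c
#include <stddef.h>
#include <stdint.h>

/* Dot product of n samples x[] and coefficients h[], accumulated in a
   wide register so that intermediate sums do not overflow. */
int64_t mac_dot(const int16_t *x, const int16_t *h, size_t n) {
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += (int32_t)x[i] * (int32_t)h[i];   /* one multiply-accumulate */
    }
    return acc;
}
```

For samples {1, 2, 3} and coefficients {4, 5, 6} the result is 4 + 10 + 18 = 32.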
Trend: Even More Customized ASIPs
• In the past, microprocessors were acquired as chips
• Today, we increasingly acquire a processor as Intellectual Property (IP)
  – e.g., synthesizable VHDL model
• Opportunity to add custom datapath hardware and a few custom instructions, or delete a few instructions
  – Can have significant performance, power and size impacts
  – Problem: need compiler/debugger for customized ASIP
    • Remember, most development uses structured languages
    • One solution: automatic compiler/debugger generation
      – e.g., www.tensillica.com
    • Another solution: retargetable compilers
      – e.g., www.improvsys.com (customized VLIW architectures)
Programmer Considerations
• Program and data memory space
  – Embedded processors often very limited
    • e.g., 64 Kbytes program, 256 bytes of RAM (expandable)
• Registers: How many are there?
  – Only a direct concern for assembly-level programmers
• I/O
  – How to communicate with external signals?
• Interrupts
Selecting a Microprocessor
• Issues
  – Technical: speed, power, size, cost
  – Other: development environment, prior expertise, licensing, etc.
• Speed: how to evaluate a processor’s speed?
  – Clock speed – but instructions per cycle may differ
  – Instructions per second – but work per instr. may differ
  – Dhrystone: Synthetic benchmark, developed in 1984. Dhrystones/sec.
    • MIPS: 1 MIPS = 1757 Dhrystones per second (based on Digital’s VAX 11/780). A.k.a. Dhrystone MIPS. Commonly used today.
      – So, 750 MIPS = 750*1757 = 1,317,750 Dhrystones per second
  – SPEC: set of more realistic benchmarks, but oriented to desktops
  – EEMBC – EDN Embedded Benchmark Consortium, www.eembc.org
    • Suites of benchmarks: automotive, consumer electronics, networking, office automation, telecommunications
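The Dhrystone MIPS conversion is a single multiplication or division by 1757. A minimal sketch, with made-up names for the constant and the two helpers:

```c
/* Dhrystones per second achieved by the reference machine, which is
   defined as 1 MIPS (figure quoted on the slide). */
#define DHRYSTONES_PER_SEC_PER_MIPS 1757UL

unsigned long mips_to_dhrystones_per_sec(unsigned long mips) {
    return mips * DHRYSTONES_PER_SEC_PER_MIPS;
}

/* Integer division: any fractional part of the MIPS rating is dropped. */
unsigned long dhrystones_per_sec_to_mips(unsigned long dhrystones_per_sec) {
    return dhrystones_per_sec / DHRYSTONES_PER_SEC_PER_MIPS;
}
```

`mips_to_dhrystones_per_sec(750)` gives 1317750, matching the worked figure on the slide.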
General Purpose Processors
Sources: Intel, Motorola, MIPS, ARM, TI, and IBM Website/Datasheet; Embedded Systems Programming, Nov. 1998
Microprocessor Architecture Overview
• If you are using a particular microprocessor, now is a good time to review its architecture
Microcontroller catalogue
Microcontroller packaging