DIGITAL SIGNAL PROCESSING Dr. Hugh Blanton ENTC 4337/ENTC 5337
Outline
• Signal processing applications
• Conventional DSP architecture
• TI TMS320C6000 DSP architecture introduction
• Signal processing on general-purpose processors
• Conclusion
Dr. Blanton - ENTC 4337 - General Architecture
Signal Processing Applications
• Embedded system demand: product volume matters
  • 400 million units/year: automobiles, PCs, and cell phones
  • 30 million units/year: ADSL modems and printers
• Embedded system cost and input/output rates
  • Low-cost, medium-throughput (single DSP): low-end printers, wireless handsets, sound cards, car audio, disk drives
  • High-cost, high-throughput (multiple DSPs): high-end printers, wireless basestations, 3-D sonar, 3-D images from 2-D X-rays (tomographic reconstruction)
• Embedded processor requirements
  • Inexpensive with small area and volume
  • Predictable input/output (I/O) rates to/from processor
  • Power constraints (severe for handheld devices)
Conventional DSP Processors
• Low cost: $3/processor in volume
• Deterministic interrupt service routine latency guarantees predictable input/output rates
  • On-chip direct memory access (DMA) controllers
    • Process streaming input/output separately from CPU
    • Send interrupt to CPU when block has been read/written
  • Ping-pong buffering
    • CPU reads/writes buffer 1 while DMA reads/writes buffer 2
    • When DMA finishes with buffer 2, roles of buffers 1 and 2 switch
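The ping-pong scheme above can be sketched in C as follows. This is a minimal illustration, not vendor code: the `PingPong` type, the block size `BLOCK_LEN`, and the function `pp_simulate_dma_fill` (a software stand-in for a real DMA engine and its completion interrupt) are all hypothetical names introduced here.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_LEN 4  /* hypothetical block size, chosen only for illustration */

typedef struct {
    int16_t buf[2][BLOCK_LEN]; /* the two ping-pong buffers */
    int dma_idx;               /* index of the buffer currently owned by the DMA */
} PingPong;

/* Buffer the CPU may process while the DMA works on the other one. */
static int16_t *pp_cpu_buffer(PingPong *pp) { return pp->buf[1 - pp->dma_idx]; }

/* Buffer the DMA is currently filling. */
static int16_t *pp_dma_buffer(PingPong *pp) { return pp->buf[pp->dma_idx]; }

/* What a DMA-complete interrupt handler would do: swap the buffer roles. */
static void pp_swap(PingPong *pp) { pp->dma_idx = 1 - pp->dma_idx; }

/* Software stand-in for the DMA: copy one input block, then signal completion. */
static void pp_simulate_dma_fill(PingPong *pp, const int16_t *src) {
    int16_t *dst = pp_dma_buffer(pp);
    for (size_t i = 0; i < BLOCK_LEN; i++) {
        dst[i] = src[i];
    }
    pp_swap(pp); /* block finished: the CPU now owns the freshly filled buffer */
}
```

After each simulated transfer, `pp_cpu_buffer` returns the block that was just filled, while the next transfer lands in the other buffer, so the CPU and the DMA never touch the same block at the same time.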
Conventional DSP Processors
• Low power consumption: 10-100 mW
  • TI TMS320C54: 0.32 mA/MIP, 76.8 mW at 1.5 V, 160 MHz
  • TI TMS320C55: 0.05 mA/MIP, 22.5 mW at 1.5 V, 300 MHz
Conventional DSP Architecture
• Multiply-accumulate (MAC) in one instruction cycle
• Harvard architecture for fast on-chip input/output
  • Data memory/bus(es) separate from program memory/bus
  • One read from program memory per instruction cycle
  • Two reads/writes from/to data memory per instruction cycle
• Instructions to keep pipeline (3-6 stages) full
  • Zero-overhead looping (one pipeline flush to set up)
  • Delayed branches
• Special addressing modes supported in hardware
  • Bit-reversed addressing (e.g. fast Fourier transforms)
  • Modulo addressing for circular buffers (e.g. FIR filters)
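To make bit-reversed addressing concrete, here is a minimal software sketch of the index mapping that a DSP address generator performs in hardware for FFT data reordering. The function name `bit_reverse` is an assumption made for this example; on a conventional DSP the reversal costs no extra instructions.

```c
#include <stdint.h>

/* Reverse the lowest `bits` bits of index i.
 * For an FFT of length 2^bits, element i is exchanged with element bit_reverse(i, bits). */
static uint32_t bit_reverse(uint32_t i, unsigned bits) {
    uint32_t r = 0;
    for (unsigned b = 0; b < bits; b++) {
        r = (r << 1) | (i & 1u); /* move the lowest remaining bit of i into r */
        i >>= 1;
    }
    return r;
}
```

For an 8-point FFT (3 index bits), index 1 (001) maps to 4 (100) and index 3 (011) maps to 6 (110).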
Conventional DSP Architecture (cont'd)
• Buffer of length K
  • Used in finite and infinite impulse response filters
• Linear buffer
  • Order by time index
  • Data shifting update: discard oldest data, copy old data left, insert new data

Data Shifting
Time    Buffer contents                              Next sample
n=N     x[N-K+1]  x[N-K+2]  ...  x[N-1]  x[N]        x[N+1]
n=N+1   x[N-K+2]  x[N-K+3]  ...  x[N]    x[N+1]      x[N+2]
n=N+2   x[N-K+3]  x[N-K+4]  ...  x[N+1]  x[N+2]      x[N+3]
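The data shifting update can be sketched in C as below. The function name `linear_buffer_push` and the use of `float` samples are assumptions for illustration; the point is that every update moves K-1 samples.

```c
#include <stddef.h>
#include <string.h>

/* Linear buffer of length K (K >= 1): buf[0] holds the oldest sample,
 * buf[K-1] the newest. Each update costs K-1 sample copies. */
static void linear_buffer_push(float *buf, size_t K, float x_new) {
    /* Discard the oldest sample by copying the remaining K-1 samples left. */
    memmove(buf, buf + 1, (K - 1) * sizeof buf[0]);
    /* Insert the new sample at the newest position. */
    buf[K - 1] = x_new;
}
```

Starting from a buffer holding 1, 2, 3, 4, pushing 5 leaves 2, 3, 4, 5, which matches the row-to-row change in the table above.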
Conventional DSP Architecture (cont'd)
• Circular buffer
  • Index oldest sample
  • Modulo addressing update: insert new data at oldest index, update oldest index

Modulo Addressing
Time    Buffer contents                                          Next sample
n=N     ... x[N-2] x[N-1] x[N] x[N-K+1] x[N-K+2] ...             x[N+1]
n=N+1   ... x[N-2] x[N-1] x[N] x[N+1] x[N-K+2] x[N-K+3] ...      x[N+2]
n=N+2   ... x[N-1] x[N] x[N+1] x[N+2] x[N-K+3] x[N-K+4] ...      x[N+3]
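A minimal C sketch of the modulo addressing update, together with an FIR filter that reads the circular buffer, is given below. The `CircBuf` type and the functions `circ_push` and `circ_fir` are hypothetical names for this example. The `%` operations here stand in for what DSP hardware does at no extra cost in the address generator.

```c
#include <stddef.h>

typedef struct {
    float *data;   /* storage for K samples */
    size_t K;      /* buffer length */
    size_t oldest; /* index of the oldest sample */
} CircBuf;

/* Modulo addressing update: overwrite the oldest sample, then advance the index. */
static void circ_push(CircBuf *cb, float x_new) {
    cb->data[cb->oldest] = x_new;
    cb->oldest = (cb->oldest + 1) % cb->K;
}

/* FIR output y[n] = sum over k of h[k] * x[n-k], for k = 0..K-1.
 * The newest sample sits one position before `oldest` (modulo K). */
static float circ_fir(const CircBuf *cb, const float *h) {
    float acc = 0.0f;
    size_t idx = cb->oldest;
    for (size_t k = 0; k < cb->K; k++) {
        idx = (idx + cb->K - 1) % cb->K; /* step one sample back in time */
        acc += h[k] * cb->data[idx];     /* one multiply-accumulate per tap */
    }
    return acc;
}
```

Unlike the data shifting update, `circ_push` writes one sample and changes one index regardless of K.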
Conventional DSP Processors Summary
Conventional DSP Processor Families
• Floating-point DSPs
  • Used in first-pass prototyping of algorithms
  • Resurgence due to professional and car audio
• DSP market: fixed-point 95%, floating-point 5%
• Different on-chip configurations in each family
  • Size and map of data and program memory
  • A/D, input/output buffers, interfaces, timers, and D/A
• Drawbacks to conventional DSP processors
  • No byte addressing (needed for images and video)
  • Limited on-chip memory
  • Limited addressable memory on fixed-point DSPs (exceptions include Motorola 56300 and TI C5409)
  • Non-standard C extensions for fixed-point data type
TI TMS320C6000 DSP Architecture
[Figure: Simplified Architecture. CPU with functional units .D1, .M1, .L1, .S1 and registers A0-A15; functional units .D2, .M2, .L2, .S2 and registers B0-B15; control registers. On-chip program RAM or cache and data RAM, connected by internal address and data buses. Peripherals: external memory interface (sync and async), DMA, serial port, host port, boot load, timers, power down.]
TI TMS320C6000 DSP Architecture
• Families: all support same C6000 instruction set
  • C6200 fixed-point, 150-300 MHz: ADSL, printers
  • C6400 fixed-point, 300-1,000 MHz: video/comm. apps.
  • C6700 floating-point, 100-225 MHz: medical imaging, pro-audio
• TMS320C6211: 150 MHz, $21 in volume
  • 300 million multiply-accumulates/s, 1200 RISC MIPS
  • On-chip memory: 16 kwords program, 16 kwords data
• TMS320C6701 Evaluation Module Board: 167 MHz
  • 334 million multiply-accumulates/s, 1336 RISC MIPS
  • On-chip memory: 16 kwords program, 16 kwords data
  • External: one 133-MHz 64-kword, two 100-MHz 1-Mword
TI TMS320C6000 DSP Architecture
• Very long instruction word (VLIW) size of 256 bits
  • Eight 32-bit functional units with single cycle throughput
  • One instruction cycle per clock cycle
• Data word size is 32 bits
  • 16 (32 on C64) 32-bit registers in each of two data paths
  • 40 bits can be stored in adjacent even/odd registers
• Two parallel data paths, each with four units
  • Data unit - 32-bit address calculations (modulo, linear)
  • Multiplier unit - 16-bit x 16-bit multiply with 32-bit result
  • Logical unit - 40-bit (saturation) arithmetic & compares
  • Shifter unit - 32-bit integer ALU and 40-bit shifter
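The word sizes above can be illustrated with a short C sketch of saturation arithmetic: a 32-bit saturating add, and a 16-bit x 16-bit multiply whose 32-bit product is accumulated and clamped to a signed 40-bit range. The function names `sat_add32` and `mac40` are hypothetical, and holding the 40-bit value in a 64-bit C integer is an assumption of this sketch, not a description of how the hardware register pair is laid out.

```c
#include <stdint.h>

/* 32-bit saturating add: clamp to the int32_t range instead of wrapping around. */
static int32_t sat_add32(int32_t a, int32_t b) {
    int64_t s = (int64_t)a + (int64_t)b;
    if (s > INT32_MAX) return INT32_MAX;
    if (s < INT32_MIN) return INT32_MIN;
    return (int32_t)s;
}

/* 16-bit x 16-bit multiply (32-bit product) accumulated into a value that is
 * clamped to the signed 40-bit range [-2^39, 2^39 - 1]. */
static int64_t mac40(int64_t acc, int16_t a, int16_t b) {
    const int64_t max40 = ((int64_t)1 << 39) - 1;
    const int64_t min40 = -max40 - 1;
    int32_t product = (int32_t)a * (int32_t)b; /* always fits in 32 bits */
    acc += product;
    if (acc > max40) acc = max40;
    if (acc < min40) acc = min40;
    return acc;
}
```

The 8 bits above the 32-bit product act as guard bits, so many products can be summed before clamping is needed.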
TI TMS320C6000 Instruction Set by Functional Unit
• .S Unit: ADD, ADDK, ADD2, AND, B, CLR, EXT, MV, MVC, MVK, MVKH, NEG, NOT, OR, SET, SHL, SHR, SSHL, SUB, SUB2, XOR, ZERO
• .L Unit: ABS, ADD, AND, CMPEQ, CMPGT, CMPLT, LMBD, MV, NEG, NORM, NOT, OR, SADD, SAT, SSUB, SUB, SUBC, XOR, ZERO
• .D Unit: ADD, ADDA, LD, MV, NEG, ST, SUB, SUBA, ZERO
• .M Unit: MPY, MPYH, SMPY, SMPYH
• Other: NOP, IDLE
Six of the eight functional units can perform integer add, subtract, and move operations.
TI TMS320C6000 Instruction Set
C6000 Instruction Set by Category
• Arithmetic: ABS, ADDA, ADDK, ADD2, MPYH, NEG, SMPYH, SADD, SAT, SSUB, SUBA, SUBC, SUB2, ZERO
  • Includes (un)signed multiplication and saturation/packed arithmetic
• Logical: AND, CMPEQ, CMPGT, CMPLT, NOT, OR, SHL, SHR, SSHL, XOR
• Data Management: LD, MV, MVC, MVKH, ST
• Bit Management: CLR, EXT, LMBD, NORM, SET
• Program Control: B, IDLE, NOP
C6000 vs. C5000 Addressing Modes
• Immediate
  • The operand is part of the instruction
• Register
  • Operand is specified in a register
• Direct
  • Address of operand is part of the instruction (added to implied memory page)
• Indirect
  • Address of operand is stored in a register

Mode        TI C5000     TI C6000
Immediate   ADD #0FFh    add .L1 -13,A1,A6
Register    (implied)    add .L1 A7,A6,A7
Direct      ADD 010h     not supported
Indirect    ADD *        ldw .D1 *A5++[8],A1
TI TMS320C6000 DSP Architecture
• Deep pipeline
  • 7-11 stages in C6200: fetch 4, decode 2, execute 1-5
  • 7-16 stages in C6700: fetch 4, decode 2, execute 1-10
  • Pentium IV has an estimated 20 pipeline stages
• Avoid using branch instructions in code
  • Branch instruction in pipeline disables interrupts: latency of a branch is 5 cycles
  • Avoid branches by using conditional execution: every instruction can be conditionally executed
• No hardware protection against pipeline hazards
  • Compiler and assembler must prevent pipeline hazards
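As a loose analogy to replacing branches with conditional execution, the following C sketch clips a value to a range without any `if` statement. It is not C6000 code: the function names `select32` and `clip_branch_free` are hypothetical, and whether a given compiler emits predicated instructions for it is an assumption that would have to be checked in the generated assembly.

```c
#include <stdint.h>

/* Branch-free select: returns a when cond is nonzero, otherwise b.
 * The mask is all ones or all zeros, so both inputs are always computed
 * and exactly one is kept, much like a predicated move. */
static int32_t select32(int cond, int32_t a, int32_t b) {
    uint32_t mask = (uint32_t)0 - (uint32_t)(cond != 0);
    return (int32_t)(((uint32_t)a & mask) | ((uint32_t)b & ~mask));
}

/* Clip x to the range [lo, hi] with no branch instructions in the source. */
static int32_t clip_branch_free(int32_t x, int32_t lo, int32_t hi) {
    x = select32(x < lo, lo, x);
    x = select32(x > hi, hi, x);
    return x;
}
```

Both comparisons are always evaluated, so the execution time does not depend on the data, which is the same property that makes conditional execution attractive on a deep pipeline.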
TI TMS320C6700 Extensions
C6700 Floating Point Extensions by Unit
• .S Unit: ABSDP, ABSSP, CMPEQDP, CMPEQSP, CMPGTDP, CMPGTSP, CMPLTDP, CMPLTSP, RCPDP, RCPSP, RSQRDP, RSQRSP, SPDP
• .L Unit: ADDDP, ADDSP, DPINT, DPSP, DPTRUNC, INTDP, INTSP, SPINT, SPTRUNC, SUBDP, SUBSP
• .D Unit: ADDAD, LDDW
• .M Unit: MPYDP, MPYI, MPYID, MPYSP
Four functional units can perform IEEE single-precision (SP) and double-precision (DP) floating-point add, subtract, and move. Operations beginning with R are reciprocal calculations.
Digital Signal Processor Cores
• Application-specific integrated circuit (ASIC) with
  • Programmable digital signal processor core
  • RAM
  • ROM
  • Standard cells
  • Codec
  • Peripherals
  • Gate array
  • Microcontroller core
General Purpose Processors
• Multimedia applications on PCs
  • Video, audio, graphics and animation
  • Repetitive parallel sequences of instructions
• Single Instruction Multiple Data (SIMD)
  • One instruction acts on multiple data in parallel
  • Well-suited for graphics
• Native signal processing extensions use SIMD
  • Sun Visual Instruction Set [1995] (UltraSPARC 1/2)
  • Intel MMX [1996] (Pentium I/II/IV)
  • Intel Streaming SIMD Extensions (Pentium III)
DSP on General Purpose Processors (cont'd)
• Programming is considerably tougher
  • Ability of compilers to generate code for instruction set extensions may lag (e.g. four years for Pentium MMX)
  • Libraries of routines using native signal processing
  • Hand code in assembly for best performance
• Single-instruction multiple-data (SIMD) approach
  • Pack/unpack data not aligned on SIMD word boundaries
  • Saturation arithmetic in MMX; not supported in VIS
  • Extended-precision accumulation in MMX; none in VIS
• Application speedup for Intel MMX and Sun VIS
  • Signal and image processing: 1.5:1 to 2:1
  • Graphics: 4:1 to 6:1 (no packing/unpacking)
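The packing overhead and the saturation arithmetic mentioned above can be sketched in portable C by emulating one SIMD operation: a saturating add of four signed 16-bit lanes packed into a 64-bit word. This is a scalar emulation for illustration only; it does not use MMX or VIS instructions, and the names `sat16`, `pack4x16`, `unpack4x16`, and `packed_add_sat16` are hypothetical.

```c
#include <stdint.h>

/* Clamp a 32-bit intermediate result to the signed 16-bit range. */
static int16_t sat16(int32_t v) {
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Pack four 16-bit lanes into one 64-bit word, lane 0 in the low bits. */
static uint64_t pack4x16(const int16_t v[4]) {
    uint64_t w = 0;
    for (int i = 0; i < 4; i++) {
        w |= (uint64_t)(uint16_t)v[i] << (16 * i);
    }
    return w;
}

/* Unpack a 64-bit word back into four 16-bit lanes. */
static void unpack4x16(uint64_t w, int16_t v[4]) {
    for (int i = 0; i < 4; i++) {
        v[i] = (int16_t)(uint16_t)(w >> (16 * i));
    }
}

/* Lane-wise saturating add: what a single SIMD instruction would do in one step. */
static uint64_t packed_add_sat16(uint64_t a, uint64_t b) {
    int16_t x[4], y[4], r[4];
    unpack4x16(a, x);
    unpack4x16(b, y);
    for (int i = 0; i < 4; i++) {
        r[i] = sat16((int32_t)x[i] + (int32_t)y[i]);
    }
    return pack4x16(r);
}
```

The pack and unpack steps are the cost the slide refers to: when data are not already laid out on SIMD word boundaries, this rearrangement can take a large share of the time saved by the parallel add.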
Concluding Remarks
• Conventional digital signal processors
  • High performance vs. power consumption/cost/volume
  • Excellent at one-dimensional processing
  • Per cycle: 1 16x16 MAC & 4 16-bit RISC instructions
• TMS320C6000 VLIW DSP family
  • High performance vs. cost/volume
  • Excellent at multidimensional signal processing
  • Per cycle: 2 16x16 MACs & 4 32-bit RISC instructions
• Native signal processing
  • Available on desktop computers
  • Excels at graphics
  • Per cycle: 2 8x16 MACs OR 8 8-bit RISC instructions
• Use assembly for computational kernels and C for main program (control code, interrupt definition)
Concluding Remarks
• Digital signal processor market
  • 40% annual growth rate 1990-2000, fastest in semiconductor market
  • Revenue: $3.5B '98, $4.4B '99, $6.1B '00, $4.5B '01, $4.9B '02
  • 2000: 44% TI, 23% Agere, 13% Motorola, 10% Analog Devices
  • 2001: 40% TI, 16% Agere, 12% Motorola, 8% Analog Devices
  • 2002: 43% TI, 14% Motorola, 14% Agere, 9% Analog Devices
• Independent processor benchmarking by industry
  • Berkeley Design Technology Inc. http://www.bdti.com
  • EDN Embedded Microprocessor Benchmark Consortium http://www.eembc.org
Concluding Remarks
• Web resources
  • Newsgroup comp.dsp: FAQ http://www.bdti.com/faq/dsp_faq.html
  • Embedded processors and systems: http://www.eg3.com
  • On-line courses: http://www.techonline.com