Recursive Filtering on a Vector DSP with Linear Speedup
Martijn v/d Horst, M.G.v.d.Horst@tue.nl
TU/e Computer Science, System Architecture and Networking
12/16/2021

Outline
• Introduction
• Vector DSP
• Linear Speedup
• Recursive (IIR) Filters
• Implementation
• Generalization
• Improvement
• Conclusion
• Future Work

Introduction
• Moore's Law: The processing power of a microchip doubles every 18 months.
• Gilder's Law: The total bandwidth of communication systems triples every 12 months.
• Corollary: Without parallelism, our communication systems will run out of processing power.

Vector DSP
• SIMD processor with vector length P
• Operations
  – Basic element-wise operations
  – Strided memory access
  – Intra-add operation
• One operation per clock cycle
• Why?
  – Flexibility
  – Parallelism
  – Low cost
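The three operation classes listed above can be modelled in a few lines of Python. This is only an illustrative sketch: the function names, the vector length P = 4, and the reading of "intra-add" as a sum over the elements of one vector are assumptions, not the instruction set of any particular DSP.

```python
P = 4  # assumed vector length, chosen small for illustration


def vec_add(a, b):
    """Element-wise addition of two vectors."""
    return [x + y for x, y in zip(a, b)]


def vec_mul(a, b):
    """Element-wise multiplication of two vectors."""
    return [x * y for x, y in zip(a, b)]


def strided_load(memory, base, stride, length=P):
    """Gather `length` elements starting at `base`, `stride` addresses apart."""
    return [memory[base + i * stride] for i in range(length)]


def intra_add(v):
    """Assumed meaning of intra-add: sum the elements within one vector."""
    return sum(v)


memory = list(range(16))
a = strided_load(memory, base=0, stride=1)  # contiguous load
b = strided_load(memory, base=1, stride=4)  # every fourth element
dot = intra_add(vec_mul(a, b))              # dot product via multiply + intra-add
```

On a real vector DSP each of these helpers would correspond to one instruction over all P lanes; here they are ordinary Python loops.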

Linear Speedup
• If you pay twice the cost you get twice the performance (no diminishing returns)
• Measure of performance: throughput (outputs per clock cycle)
• Measure of cost: vector size of the DSP
• Approach: produce a number of outputs (depending on the vector size) in constant time.
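The approach can be stated as a short formula. The symbols are assumptions introduced here for illustration: n(P) is the number of outputs produced per block, C is the constant number of clock cycles per block, and k is a proportionality constant.

```latex
T(P) = \frac{n(P)}{C}, \qquad
n(P) = k\,P \;\Rightarrow\; T(2P) = \frac{2kP}{C} = 2\,T(P)
```

If the number of outputs grows in proportion to the vector size while the cycle count stays fixed, doubling the cost P doubles the throughput T.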

FIR Filters
[Figure: filter with labeled Input and Output]
The output of an N-th order FIR filter is the weighted sum of the current input and the N previous inputs.
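As a minimal sketch of this definition, the following Python function computes an FIR output sample by sample. The name `fir_filter`, the coefficient list `b`, and the choice to treat inputs before the start of the signal as zero are assumptions made for the example.

```python
def fir_filter(x, b):
    """y[n] = sum over k = 0..N of b[k] * x[n - k], with x[n] = 0 for n < 0."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, bk in enumerate(b):
            if n - k >= 0:  # skip samples before the start of the signal
                acc += bk * x[n - k]
        y.append(acc)
    return y


# First-order example: average of the current and the previous input
y = fir_filter([1.0, 2.0, 3.0, 4.0], b=[0.5, 0.5])
```

Every output depends only on inputs, so all outputs of a block can be computed independently of each other.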

IIR Filters
[Figure: filter with labeled Input and Output]
The output of an N-th order IIR filter is the weighted sum of the current input, the N previous inputs and the N previous outputs.
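The same definition can be sketched for the recursive case. The name `iir_filter`, the coefficient lists `b` and `a`, the plus sign on the feedback terms, and zero initial conditions are assumptions of this example; many textbooks write the feedback terms with a minus sign instead.

```python
def iir_filter(x, b, a):
    """y[n] = sum_{k=0..N} b[k] * x[n-k] + sum_{k=1..N} a[k-1] * y[n-k]."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, bk in enumerate(b):           # feed-forward part (inputs)
            if n - k >= 0:
                acc += bk * x[n - k]
        for k, ak in enumerate(a, start=1):  # feedback part (previous outputs)
            if n - k >= 0:
                acc += ak * y[n - k]
        y.append(acc)
    return y


# First-order example: the impulse response decays by a factor 0.5 per sample
y = iir_filter([1.0, 0.0, 0.0, 0.0], b=[1.0, 0.0], a=[0.5])
```

Because each output needs the previous outputs, the inner loop cannot simply be spread over SIMD lanes; this dependency is what makes recursive filters hard to vectorize.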

Describing Filters
• Transfer function:
• Difference equation:
• State space form:
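For reference, these three descriptions are commonly written as follows for an N-th order filter. The notation is an assumption and may differ from the slide's: b_k and a_k are the feed-forward and feedback weights (with the feedback terms added, as in a plain weighted sum), s[n] is the state vector, and A, B, C, D are the state space matrices.

```latex
% Transfer function
H(z) = \frac{\sum_{k=0}^{N} b_k z^{-k}}{1 - \sum_{k=1}^{N} a_k z^{-k}}

% Difference equation
y[n] = \sum_{k=0}^{N} b_k\, x[n-k] + \sum_{k=1}^{N} a_k\, y[n-k]

% State space form
s[n+1] = A\, s[n] + B\, x[n], \qquad y[n] = C\, s[n] + D\, x[n]
```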

Block-State
The state space form can be rewritten into block state space form:
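One common way to carry out this rewriting is sketched below in plain Python. It assumes a single-input, single-output state space form s[n+1] = A s[n] + B x[n], y[n] = C s[n] + D x[n] and unrolls it over a block of L samples; the function names, the example matrices and the block size are hypothetical, and the slide's own notation may differ. The sketch checks that block processing reproduces sample-by-sample processing.

```python
def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def step_by_step(A, B, C, D, s, x):
    """Reference: one output and one state update per input sample."""
    y = []
    for xn in x:
        y.append(sum(c * si for c, si in zip(C, s)) + D * xn)
        s = [r + b * xn for r, b in zip(mat_vec(A, s), B)]
    return y, s


def block_matrices(A, B, C, D, L):
    """Unroll the recursion over L samples."""
    n = len(A)
    powers = [[[float(i == j) for j in range(n)] for i in range(n)]]  # A^0
    for _ in range(L):
        powers.append(mat_mul(powers[-1], A))
    AL = powers[L]                                              # state to next block state
    BL_cols = [mat_vec(powers[L - 1 - j], B) for j in range(L)]  # column j is A^(L-1-j) B
    CL = [[sum(C[k] * powers[i][k][j] for k in range(n)) for j in range(n)]
          for i in range(L)]                                    # row i is C A^i
    DL = [[0.0] * L for _ in range(L)]                          # lower-triangular Toeplitz
    for i in range(L):
        DL[i][i] = D
        for j in range(i):
            DL[i][j] = sum(c * v for c, v in zip(C, mat_vec(powers[i - 1 - j], B)))
    return AL, BL_cols, CL, DL


def block_step(mats, s, xblock):
    """Produce L outputs and the state L samples ahead in one step."""
    AL, BL_cols, CL, DL = mats
    y = [u + v for u, v in zip(mat_vec(CL, s), mat_vec(DL, xblock))]
    s_next = mat_vec(AL, s)
    for col, xj in zip(BL_cols, xblock):
        s_next = [sn + c * xj for sn, c in zip(s_next, col)]
    return y, s_next


# Hypothetical second order example
A = [[0.5, -0.25], [1.0, 0.0]]
B = [1.0, 0.0]
C = [0.5, -0.25]
D = 1.0
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
L = 4

y_ref, s_ref = step_by_step(A, B, C, D, [0.0, 0.0], x)
mats = block_matrices(A, B, C, D, L)
s_blk, y_blk = [0.0, 0.0], []
for start in range(0, len(x), L):
    y, s_blk = block_step(mats, s_blk, x[start:start + L])
    y_blk.extend(y)
```

In the block form, the only remaining recursion is the state update once per block; the L outputs of a block are plain matrix-vector products of the block's inputs and its starting state.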

Block-State Architecture
• State of the art (2004) in SIMD
• A better VLSI implementation has existed since 1987

Incremental Block-State
• Linear dependency between block size and hardware
• Problem: How to map it onto SIMD?

Incremental Block-State
• Choose L = I · P
• Remove dependencies with pipelining
• Assign each stage to a SIMD slice

Philips EVP16
• VLIW SIMD processor with vector length 16
• Simulated strided access
• We implemented a second order filter
• Speedup is relative to a VLIW DSP
                       Choice for I
Characteristic         I=1    I=2    I=3    I=4
Clock cycles/block      13     24     36    111
Block size L            16     32     48     64
Throughput            1.23   1.33   1.33   0.58
Speedup               6.15   6.67   6.67   2.88
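The throughput figures follow directly from the block size and the cycle count, as the short Python check below shows. The baseline cost of 5 clock cycles per output is not stated on the slide; it is an assumption inferred from the ratio between the listed speedup and throughput values, and the names used are illustrative.

```python
P = 16                                # vector length of the EVP16
cycles_per_block = {1: 13, 2: 24, 3: 36, 4: 111}
BASELINE_CYCLES_PER_OUTPUT = 5        # assumption inferred from the table, not stated


def throughput(i):
    """Outputs per clock cycle for block size L = i * P."""
    return (i * P) / cycles_per_block[i]


def speedup(i):
    """Throughput relative to the assumed baseline of one output per 5 cycles."""
    return throughput(i) * BASELINE_CYCLES_PER_OUTPUT


for i in sorted(cycles_per_block):
    print(f"I={i}: L={i * P}, throughput={throughput(i):.2f}, speedup={speedup(i):.2f}")
```

The sharp drop at I=4 comes from the jump to 111 cycles per block, while the block size only grows from 48 to 64.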

Generalization

Improvement
• No intra-add operation
• Achieved by applying our method to an MVM (matrix-vector multiplication)
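One general way to compute a matrix-vector product without any reduction inside a vector is to accumulate scaled columns instead of forming row dot products. The Python sketch below illustrates that idea only; it is not necessarily the method the slide refers to, and the function names are hypothetical.

```python
def mvm_row_wise(M, x):
    """Row dot products: each output needs a sum across one vector (intra-add)."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def mvm_column_wise(M, x):
    """Accumulate x[j] times column j using element-wise multiply-adds only."""
    rows = len(M)
    acc = [0.0] * rows
    for j, xj in enumerate(x):
        # Reading a column of a row-major matrix is a strided access
        column = [M[i][j] for i in range(rows)]
        # Element-wise multiply-accumulate with the scalar x[j] broadcast to all lanes
        acc = [a + c * xj for a, c in zip(acc, column)]
    return acc


M = [[1.0, 2.0], [3.0, 4.0]]
x = [5.0, 6.0]
y_rows = mvm_row_wise(M, x)
y_cols = mvm_column_wise(M, x)
```

Both functions give the same result, but the column-wise version keeps every lane independent, so it needs only element-wise operations and strided loads.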

Conclusion
• Recursive filtering on vector DSPs with linear speedup is possible, provided that the DSP supports strided memory access
• This speedup is not bounded by the order of the filter
• This speedup holds for filters of any order
• The method used can be applied to other cases as well

Future Work
• Implementation on vector DSPs without strided memory access
• Adaptive filters
• Other signal processing algorithms

Questions?