Fourier Transforms
Roger Hoang

Overview
  • Fourier Transforms
  • Discrete Fourier Transforms
  • Fast Fourier Transforms
  • CUFFT
  • NPOT FFT
  • CPU/GPU Hybrid FFT

Fourier Transforms

Fourier Transforms
  • Time/Space Domain
  • Frequency Domain

What are they used for?

Discrete Fourier Transforms

Complexity
  • N elements, each summing N elements
  • Naively, O(N^2)
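
The O(N^2) count can be read straight off a direct implementation. The following is a minimal sketch, assuming the common forward convention X[k] = sum over t of x[t] * exp(-2*pi*i*k*t/N); the slide's own formula was not preserved, so the sign convention and the name `naive_dft` are assumptions.

```python
import cmath

def naive_dft(x):
    """Direct DFT: each of the N outputs sums N terms, so O(N^2) work."""
    n = len(x)
    return [
        # Inner sum over all N input samples for output bin k.
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)  # Outer loop over all N output bins.
    ]

# A constant signal should put all of its energy in the k = 0 bin.
spectrum = naive_dft([1.0, 1.0, 1.0, 1.0])
```

The two nested loops over `k` and `t` are the "N elements each summing N elements" on the slide.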

Why bother?
  • Example: 2D FIR Filter
  • Simple Approach:
      • N x N Image
      • M x M Box Blur Kernel
      • Computational Complexity: O(N^2 M^2)
      • Reasonable for small M
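
The simple approach can be sketched as four nested loops, which is where the O(N^2 M^2) figure comes from. This is an illustrative sketch only: the function name `box_blur_direct` and the choice of zero padding at the image border are assumptions, not something the slides specify.

```python
def box_blur_direct(image, m):
    """Blur an N x N image with an m x m box kernel (zero padding at edges)."""
    n = len(image)
    half = m // 2
    out = [[0.0] * n for _ in range(n)]
    for r in range(n):                              # N image rows
        for c in range(n):                          # N image columns
            acc = 0.0
            for dr in range(-half, m - half):       # M kernel rows
                for dc in range(-half, m - half):   # M kernel columns
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < n and 0 <= cc < n:
                        acc += image[rr][cc]
            out[r][c] = acc / (m * m)
    return out

# A single bright pixel spreads evenly over its 3 x 3 neighborhood.
image = [[0.0] * 5 for _ in range(5)]
image[2][2] = 9.0
blurred = box_blur_direct(image, 3)
```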

Convolution

Convolution Theorem

Why bother?
  • Example: 2D FIR Filter
  • Simple Approach:
      • N x N Image
      • M x M Box Blur Kernel
      • Computational Complexity: O(N^2 M^2)
      • Reasonable for small M
  • DFT Based Approach:
      • N x N Image => DFT costs O(N^3)
      • M x M Box Blur Kernel => DFT costs O(M^3)
      • Point-wise Multiply in Frequency Domain => O(max(N, M)^2)
      • IDFT costs the same as DFT
      • Computational Complexity: O(N^3 + M^3)
      • Cubics are rather demotivational...
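
The DFT-based route relies on the convolution theorem: a point-wise product in the frequency domain corresponds to a (circular) convolution in the time/space domain. The sketch below checks this on a small 1D example rather than a 2D image, to keep it short. The helper names and the zero-padded 2-tap kernel are illustrative assumptions.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(N^2) DFT; the inverse flips the exponent sign and scales by 1/N."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out

def circular_convolve(a, b):
    """Direct circular convolution of two equal-length sequences."""
    n = len(a)
    return [sum(a[j] * b[(i - j) % n] for j in range(n)) for i in range(n)]

def convolve_via_dft(a, b):
    """Transform both inputs, multiply point-wise, transform back."""
    product = [p * q for p, q in zip(dft(a), dft(b))]
    return [v.real for v in dft(product, inverse=True)]

signal = [1.0, 2.0, 3.0, 4.0]
kernel = [0.5, 0.5, 0.0, 0.0]  # 2-tap box blur, zero-padded to the signal length
direct = circular_convolve(signal, kernel)
via_dft = convolve_via_dft(signal, kernel)
```

Both paths give the same values up to floating-point rounding. Note that the kernel has to be padded to the signal length before the point-wise multiply.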

Fast Fourier Transforms (FFT)

Complexity
  • Composed recursively of half-sized DFTs
  • O(N log N)
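
The recursive composition can be sketched with a radix-2 decimation-in-time FFT. This is one textbook formulation (Cooley-Tukey), shown under the assumption that the input length is a power of two; it is not necessarily the exact variant on the original slide.

```python
import cmath

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return list(x)
    even = fft(x[0::2])   # half-sized DFT of the even-indexed samples
    odd = fft(x[1::2])    # half-sized DFT of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle            # butterfly, lower half
        out[k + n // 2] = even[k] - twiddle   # butterfly, upper half
    return out

# A unit impulse has a flat spectrum: every bin equals 1.
spectrum = fft([1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
```

Each level of recursion does O(N) butterfly work and there are log2(N) levels, which gives the O(N log N) total.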

Is it worth it yet?
  • Example: 2D FIR Filter
  • Simple Approach:
      • N x N Image
      • M x M Box Blur Kernel
      • Computational Complexity: O(N^2 M^2)
      • Reasonable for small M
  • FFT Based Approach:
      • N x N Image => FFT costs O(N^2 log N)
      • M x M Box Blur Kernel => FFT costs O(M^2 log M)
      • Point-wise Multiply in Frequency Domain => O(max(N, M)^2)
      • IFFT costs the same as FFT
      • Computational Complexity: O(N^2 log N + M^2 log M)
      • As long as K log N < M^2...
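
The break-even condition K log N < M^2 can be explored with a rough operation-count model. Everything here is a sketch under stated assumptions: the constant `k` stands in for the slide's K (its real value depends on the implementation and hardware), the logarithm is taken base 2, and the M^2 log M term is dropped on the assumption that M is no larger than N.

```python
import math

def direct_cost(n, m):
    """Cost model for direct filtering: N^2 pixels times M^2 kernel taps."""
    return n * n * m * m

def fft_cost(n, k=1.0):
    """Cost model for the FFT route: k * N^2 * log2(N); k is hypothetical."""
    return k * n * n * math.log2(n)

def fft_wins(n, m, k=1.0):
    """True when the FFT model is cheaper, i.e. k * log2(N) < M^2."""
    return fft_cost(n, k) < direct_cost(n, m)

# With N = 1024 and an assumed k = 10, k * log2(N) = 100.
small_kernel = fft_wins(1024, 3, k=10.0)    # M^2 = 9
large_kernel = fft_wins(1024, 11, k=10.0)   # M^2 = 121
```

Under these assumed numbers the direct approach stays cheaper for the 3 x 3 kernel, while the FFT approach pulls ahead for the 11 x 11 kernel.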

CUFFT
  • NVIDIA's GPU FFT Library
  • Part of the CUDA Toolkit
  • API modeled after FFTW (Fastest Fourier Transform in the West)
  • Given a size, direction, and data layout, generates a plan
  • Reuse plan to perform multiple similar FFTs

Fast Computation of General Fourier Transforms on GPUs

In summary...
  • Direct3D
  • No CUDA card required
  • Supports non-power-of-two (NPOT) sizes
  • Handles large 1D FFTs
  • Packs real-valued FFTs (CUFFT does not do this)

Real-valued Input FFTs
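
The slide's own content was not preserved, so the following only illustrates one common way to exploit real-valued input: pack two real sequences into the real and imaginary parts of one complex sequence, run a single complex transform, and separate the two spectra using Hermitian symmetry. Whether the library discussed here packs its data this way is an assumption. A naive DFT stands in for the FFT to keep the sketch short.

```python
import cmath

def dft(x):
    """Naive complex DFT, used here as a stand-in for an FFT."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def two_real_dfts(a, b):
    """Compute the DFTs of real sequences a and b with one complex DFT."""
    n = len(a)
    z = dft([complex(p, q) for p, q in zip(a, b)])  # z = a + i*b
    fa, fb = [], []
    for k in range(n):
        zk = z[k]
        zc = z[-k % n].conjugate()   # conj(Z[N - k]), with index 0 mapping to 0
        fa.append((zk + zc) / 2)     # spectrum of a
        fb.append((zk - zc) / 2j)    # spectrum of b
    return fa, fb

a = [1.0, 2.0, 3.0, 4.0]
b = [0.0, 1.0, 0.0, -1.0]
fa, fb = two_real_dfts(a, b)
```

The separation works because the spectrum of a real sequence satisfies X[N - k] = conj(X[k]), so the two spectra can be pulled apart with one addition and one subtraction per bin.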

An Efficient, Model-Based CPU-GPU Heterogeneous FFT Library

Summary
  • Measured timings for every substep
  • Modeled performance and picked minimum time
  • 20% prediction time error at 16 x