
Fourier Transforms
Roger Hoang

Overview
- Fourier Transforms
- Discrete Fourier Transforms
- Fast Fourier Transforms
- CUFFT
- NPOT FFT
- CPU/GPU Hybrid FFT

Fourier Transforms

Fourier Transforms: Time/Space Domain <-> Frequency Domain
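For reference, here is one common convention for the continuous Fourier transform pair. It uses angular frequency ω. Other conventions place the 2π factor differently.

```latex
F(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{-i \omega t}\, dt,
\qquad
f(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} F(\omega)\, e^{i \omega t}\, d\omega
```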

What are they used for?

Discrete Fourier Transforms
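The DFT analyzed on the next slides is usually defined as shown below. Sign and normalization conventions vary between libraries. This version uses the common choice of a negative exponent for the forward transform and a 1/N factor on the inverse.

```latex
X_k = \sum_{n=0}^{N-1} x_n\, e^{-2\pi i k n / N},
\qquad
x_n = \frac{1}{N} \sum_{k=0}^{N-1} X_k\, e^{2\pi i k n / N},
\qquad k, n = 0, \dots, N-1
```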

Complexity: N output elements, each summing over N input elements... naively, O(N^2).
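A minimal pure-Python sketch of the naive DFT shows the O(N^2) cost as two nested loops. The names `naive_dft` and `naive_idft` are illustrative and do not come from any library.

```python
import cmath


def naive_dft(x):
    """Forward DFT computed straight from the definition: O(N^2) multiply-adds."""
    n = len(x)
    out = []
    for k in range(n):          # N output bins...
        s = 0j
        for t in range(n):      # ...each summing N input samples
            s += x[t] * cmath.exp(-2j * cmath.pi * k * t / n)
        out.append(s)
    return out


def naive_idft(X):
    """Inverse DFT: the same O(N^2) loop with a + sign and a 1/N scale."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


# An impulse transforms to a flat spectrum (every bin equals 1).
spectrum = naive_dft([1, 0, 0, 0])
```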

Why bother? Example: 2D FIR filter
Simple approach:
- N×N image
- M×M box-blur kernel
- Computational complexity: O(N^2 M^2)
- Reasonable for small M
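The sketch below performs a direct 2D convolution in pure Python. The helper names are hypothetical, and the output is the full linear convolution of size (N+M-1)×(N+M-1). It shows where O(N^2 M^2) comes from: each of the N^2 pixels is multiplied by each of the M^2 kernel taps.

```python
def direct_convolve_2d(image, kernel):
    """Full 2D linear convolution of an N x N image with an M x M kernel."""
    n, m = len(image), len(kernel)
    size = n + m - 1
    out = [[0.0] * size for _ in range(size)]
    for i in range(n):
        for j in range(n):              # N^2 image pixels...
            for a in range(m):
                for b in range(m):      # ...times M^2 kernel taps
                    out[i + a][j + b] += image[i][j] * kernel[a][b]
    return out


def box_kernel(m):
    """M x M box-blur kernel whose taps sum to 1."""
    return [[1.0 / (m * m)] * m for _ in range(m)]


blurred = direct_convolve_2d([[1, 2], [3, 4]], [[1, 1], [1, 1]])
```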

Convolution
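Consider an N×N image f, treated as zero outside its bounds, and an M×M kernel h. Their 2D (linear) convolution is:

```latex
(f * h)[x, y] = \sum_{a=0}^{M-1} \sum_{b=0}^{M-1} h[a, b]\, f[x - a,\, y - b],
\qquad 0 \le x, y \le N + M - 2
```

Each of the (N+M-1)^2 outputs sums up to M^2 products. This is the source of the O(N^2 M^2) estimate.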

Convolution Theorem
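For length-N sequences under circular convolution, the convolution theorem states that DFT(f ⊛ g)[k] = F[k]·G[k]. Ordinary (linear) convolution needs one extra step. Both inputs are first zero-padded to at least N + M - 1 samples, so the circular wrap-around only touches zeros. The sketch below checks this numerically using a naive DFT. Its helper names are illustrative.

```python
import cmath


def dft(x, sign=-1):
    """Naive DFT; sign=+1 gives the unnormalized inverse."""
    n = len(x)
    return [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def circular_convolve(f, g):
    """Direct circular convolution: (f ⊛ g)[k] = sum_m f[m] g[(k - m) mod N]."""
    n = len(f)
    return [sum(f[m] * g[(k - m) % n] for m in range(n)) for k in range(n)]


def convolve_via_dft(f, g):
    """Same result via the convolution theorem: IDFT(DFT(f) * DFT(g))."""
    prod = [a * b for a, b in zip(dft(f), dft(g))]
    return [v / len(f) for v in dft(prod, sign=+1)]


# [1, 2, 3] and [1, 1] zero-padded to length 5 >= 3 + 2 - 1,
# so circular convolution equals linear convolution here.
f = [1, 2, 3, 0, 0]
g = [1, 1, 0, 0, 0]
direct = circular_convolve(f, g)
via_dft = convolve_via_dft(f, g)
```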

Why bother? Example: 2D FIR filter
Simple approach:
- N×N image
- M×M box-blur kernel
- Computational complexity: O(N^2 M^2)
- Reasonable for small M
DFT-based approach:
- N×N image => DFT costs O(N^3) (computed separably: N row DFTs, then N column DFTs, each O(N^2))
- M×M box-blur kernel => DFT costs O(M^3)
- Point-wise multiply in the frequency domain => O(max(N, M)^2)
- The IDFT costs the same as the DFT
- Computational complexity: O(N^3 + M^3)
Cubic costs are rather demotivating...
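The O(N^3) figure for an N×N DFT assumes the 2D transform is computed separably. A 1D DFT is applied to each of the N rows at O(N^2) each, then to each of the N columns. The minimal sketch below compares this against the fully direct O(N^4) double sum. The helper names are hypothetical.

```python
import cmath


def dft_1d(x):
    """Naive 1D DFT, O(N^2)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def dft_2d_direct(a):
    """O(N^4): each of the N^2 output bins sums over all N^2 pixels."""
    n = len(a)
    return [[sum(a[x][y] * cmath.exp(-2j * cmath.pi * (u * x + v * y) / n)
                 for x in range(n) for y in range(n))
             for v in range(n)]
            for u in range(n)]


def dft_2d_separable(a):
    """O(N^3): N row DFTs, then N column DFTs, each costing O(N^2)."""
    rows = [dft_1d(row) for row in a]            # transform along y
    cols = [dft_1d(col) for col in zip(*rows)]   # transform along x
    return [list(r) for r in zip(*cols)]         # transpose back to [u][v]


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
sep = dft_2d_separable(image)
ref = dft_2d_direct(image)
```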

Fast Fourier Transforms (FFT)

Complexity: the FFT computes a DFT recursively from half-sized DFTs, giving O(N log N).
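Below is a minimal recursive radix-2 (Cooley-Tukey) sketch, assuming the input length is a power of two. Splitting into even- and odd-indexed halves gives the recurrence T(N) = 2T(N/2) + O(N), which solves to O(N log N). It is a textbook illustration and does not reflect how CUFFT or any particular library is implemented.

```python
import cmath


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("radix-2 FFT needs a power-of-two length")
    even = fft(x[0::2])   # half-sized DFT of even-indexed samples
    odd = fft(x[1::2])    # half-sized DFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Butterfly: combine the halves with twiddle factor e^{-2 pi i k / n}.
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out
```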

Is it worth it yet? Example: 2D FIR filter
Simple approach:
- N×N image
- M×M box-blur kernel
- Computational complexity: O(N^2 M^2)
- Reasonable for small M
FFT-based approach:
- N×N image => FFT costs O(N^2 log N)
- M×M box-blur kernel => FFT costs O(M^2 log M)
- Point-wise multiply in the frequency domain => O(max(N, M)^2)
- The IFFT costs the same as the FFT
- Computational complexity: O(N^2 log N + M^2 log M)
Worth it as long as K log N < M^2, where K is the constant-factor overhead of the FFT approach (compare K·N^2 log N against N^2 M^2)...
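The pieces combine into an FFT-based 2D convolution in four steps. First, zero-pad image and kernel to a power-of-two size P ≥ N + M - 1. Then transform both, multiply point-wise, and inverse-transform. In this sketch the kernel is padded to the same P×P size as the image, so both forward transforms cost O(P^2 log P). The helper names are hypothetical; real code would call an FFT library.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (len(x) must be a power of two); unnormalized."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    sign = 1 if inverse else -1
    even, odd = fft(x[0::2], inverse), fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + tw, even[k] - tw
    return out


def fft_2d(a, inverse=False):
    """Separable 2D FFT: transform every row, then every column."""
    rows = [fft(list(r), inverse) for r in a]
    cols = [fft(list(c), inverse) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def next_pow2(v):
    p = 1
    while p < v:
        p *= 2
    return p


def fft_convolve_2d(image, kernel):
    """Full 2D linear convolution via FFT: pad, transform, multiply, invert."""
    n, m = len(image), len(kernel)
    size = n + m - 1                 # linear-convolution output size
    p = next_pow2(size)              # padded so circular wrap-around hits zeros

    def pad(a):
        k = len(a)
        return [[a[i][j] if i < k and j < k else 0.0 for j in range(p)]
                for i in range(p)]

    A, K = fft_2d(pad(image)), fft_2d(pad(kernel))
    prod = [[A[i][j] * K[i][j] for j in range(p)] for i in range(p)]
    res = fft_2d(prod, inverse=True)
    scale = p * p                    # undo the two unnormalized inverse passes
    return [[res[i][j].real / scale for j in range(size)] for i in range(size)]


result = fft_convolve_2d([[1, 2], [3, 4]], [[1, 1], [1, 1]])
```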

CUFFT
- NVIDIA's GPU FFT library, part of the CUDA Toolkit
- API modeled on FFTW (the "Fastest Fourier Transform in the West")
- Given a size, transform type, and data layout, creates a plan; the transform direction is chosen when the plan is executed
- Reuse a plan to perform multiple similar FFTs
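The plan idea can be shown with a hypothetical `FFTPlan` class. This is not the CUFFT API. The constructor computes, once, everything that depends only on the transform size: here, the bit-reversal permutation and the twiddle factors for an iterative radix-2 FFT. Each `execute` call then reuses them.

```python
import cmath


class FFTPlan:
    """Hypothetical plan: precompute size-dependent data once, reuse it per FFT."""

    def __init__(self, n):
        if n < 1 or n & (n - 1):
            raise ValueError("this sketch only supports power-of-two sizes")
        self.n = n
        bits = n.bit_length() - 1
        # Bit-reversal permutation used to reorder the input in place.
        self.perm = [int(format(i, f"0{bits}b")[::-1], 2) if bits else 0
                     for i in range(n)]
        # Twiddle factors W_n^k = exp(-2*pi*i*k/n) for k < n/2.
        self.twiddles = [cmath.exp(-2j * cmath.pi * k / n) for k in range(n // 2)]

    def execute(self, x):
        """Forward FFT of x (length n), reusing the precomputed tables."""
        n = self.n
        if len(x) != n:
            raise ValueError("input length does not match the plan")
        a = [complex(x[i]) for i in self.perm]
        size = 2
        while size <= n:                 # log2(n) butterfly stages
            half, step = size // 2, n // size
            for start in range(0, n, size):
                for k in range(half):
                    t = self.twiddles[k * step] * a[start + k + half]
                    u = a[start + k]
                    a[start + k] = u + t
                    a[start + k + half] = u - t
            size *= 2
        return a


plan = FFTPlan(8)                        # built once...
spectra = [plan.execute(sig) for sig in ([1, 0, 0, 0, 0, 0, 0, 0],
                                         [1, 1, 1, 1, 1, 1, 1, 1])]  # ...reused
```

In actual CUFFT code the analogous pattern is to create a plan once with `cufftPlan1d` (or `cufftPlanMany` for batches), call `cufftExecC2C` repeatedly, and release the plan with `cufftDestroy`.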

Fast Computation of General Fourier Transforms on GPUs

In summary...
- Direct3D
- No CUDA card required
- Supports non-power-of-two (NPOT) sizes
- Handles large 1D FFTs
- Packs real-valued FFTs (CUFFT does not do this)
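The slide does not say how NPOT sizes are handled. One standard technique is Bluestein's algorithm, shown here only as an illustration and not as the paper's method. It rewrites 2kn = k^2 + n^2 - (k - n)^2, which turns an arbitrary-length DFT into a convolution. That convolution is then evaluated with power-of-two FFTs of length at least 2N - 1. The function names are hypothetical.

```python
import cmath


def fft_pow2(x, inverse=False):
    """Recursive radix-2 FFT (power-of-two length); unnormalized."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    sign = 1 if inverse else -1
    even, odd = fft_pow2(x[0::2], inverse), fft_pow2(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + tw, even[k] - tw
    return out


def bluestein_dft(x):
    """DFT of any length n via a power-of-two circular convolution."""
    n = len(x)
    m = 1
    while m < 2 * n - 1:             # room for lags -(n-1)..(n-1) without overlap
        m *= 2
    # Chirp w_j = exp(-pi*i*j^2/n); j^2 reduced mod 2n keeps the angle small.
    w = [cmath.exp(-1j * cmath.pi * ((j * j) % (2 * n)) / n) for j in range(n)]
    a = [x[j] * w[j] for j in range(n)] + [0j] * (m - n)
    b = [0j] * m
    b[0] = w[0].conjugate()
    for j in range(1, n):            # conj(chirp) at positive and wrapped negative lags
        b[j] = b[m - j] = w[j].conjugate()
    A, B = fft_pow2(a), fft_pow2(b)
    conv = fft_pow2([p * q for p, q in zip(A, B)], inverse=True)
    return [w[k] * conv[k] / m for k in range(n)]
```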

Real-valued Input FFTs
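One common packing trick for real input computes two real-valued length-N FFTs with a single complex FFT. It is not necessarily the trick the paper uses. The two inputs are packed as z = x + i·y and transformed once. The results are then separated using the conjugate symmetry X[N-k] = conj(X[k]), which holds for real x. To keep the sketch short, a naive DFT stands in for the FFT. The names are hypothetical.

```python
import cmath


def dft(x):
    """Naive complex DFT, standing in for a real FFT routine."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def two_real_ffts(x, y):
    """Transforms of two real sequences from one complex transform."""
    n = len(x)
    Z = dft([complex(a, b) for a, b in zip(x, y)])   # Z = X + i*Y
    X, Y = [], []
    for k in range(n):
        zc = Z[(-k) % n].conjugate()                 # conj(Z[N-k]) = X[k] - i*Y[k]
        X.append((Z[k] + zc) / 2)
        Y.append((Z[k] - zc) / 2j)
    return X, Y


X, Y = two_real_ffts([1, 2, 3, 4, 5], [0, -1, 2, 0.5, 3])
```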

An Efficient, Model-Based CPU-GPU Heterogeneous FFT Library

Summary
- Measured timings for every substep
- Modeled performance and picked the configuration with the minimum predicted time
- 20% prediction time error at 16 x
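The slide only summarizes the approach. The sketch below is a heavily simplified, hypothetical illustration of "model performance, pick the minimum". It takes per-substep times on CPU and GPU plus a host-GPU transfer cost. It then uses dynamic programming to choose a device for each substep. It assumes that data starts and ends in host memory and that substeps run one after another. The library's actual model is not described here, and the example numbers are placeholders, not measurements.

```python
def choose_devices(cpu_times, gpu_times, transfer_time):
    """Return (modeled_total_time, device_per_substep) with minimum total time.

    Hypothetical model: data starts and ends on the host; each host<->GPU
    move costs transfer_time; substeps execute sequentially.
    """
    times = {"cpu": cpu_times, "gpu": gpu_times}

    def move(src, dst):
        return 0.0 if src == dst else transfer_time

    # best[d] = (cost so far, devices chosen) over schedules ending on device d.
    best = {d: (move("cpu", d) + times[d][0], [d]) for d in times}
    for i in range(1, len(cpu_times)):
        best = {
            d: min((cost + move(prev, d) + times[d][i], path + [d])
                   for prev, (cost, path) in best.items())
            for d in times
        }
    # Account for copying the final result back to the host.
    return min((cost + move(d, "cpu"), path) for d, (cost, path) in best.items())


# Placeholder per-substep timings, for illustration only.
plan = choose_devices([4.0, 4.0, 4.0], [1.0, 1.0, 1.0], transfer_time=2.0)
```

With cheap transfers the GPU wins every substep. With a very large `transfer_time`, the same call keeps everything on the CPU.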