- Slides: 22
Fourier Transforms Roger Hoang
Overview: Fourier Transforms; Discrete Fourier Transforms; Fast Fourier Transforms; CUFFT; NPOT FFT; CPU/GPU Hybrid FFT
Fourier Transforms
Fourier Transforms: Time/Space Domain <=> Frequency Domain
What are they used for?
Discrete Fourier Transforms
Complexity: each of the N output elements is a sum over N input elements, so a naive DFT costs O(N^2).
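The O(N^2) cost is easy to see in a direct implementation. The following is a minimal sketch, not taken from the slides: the function name naive_dft and the forward sign convention (negative exponent, no scaling) are assumptions, since the slide's own equation did not survive.

```python
import cmath

def naive_dft(x):
    """Direct DFT: X[k] = sum over n of x[n] * exp(-2*pi*i*k*n/N).

    The sign convention and lack of scaling are assumptions of this sketch.
    """
    n_points = len(x)
    result = []
    for k in range(n_points):            # N output elements ...
        total = 0j
        for n in range(n_points):        # ... each summing N terms: O(N^2)
            total += x[n] * cmath.exp(-2j * cmath.pi * k * n / n_points)
        result.append(total)
    return result
```

A unit impulse such as [1, 0, 0, 0] transforms to a flat spectrum, and a constant signal puts all of its energy in X[0].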
Why bother? Example: 2D FIR filter. Simple approach: NxN image, MxM box blur kernel. Computational complexity: O(N^2 M^2). Reasonable for small M.
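The simple approach can be sketched as four nested loops. This is an illustration only: the name box_blur_direct, the odd kernel size, and the zero-padded edges are assumptions, since the slide does not specify boundary handling.

```python
def box_blur_direct(image, m):
    """Blur an N x N image (list of rows) with an m x m box kernel.

    Assumptions of this sketch: m is odd, and pixels outside the image
    count as zero.
    """
    n = len(image)
    half = m // 2
    weight = 1.0 / (m * m)
    out = [[0.0] * n for _ in range(n)]
    for r in range(n):                              # N^2 output pixels ...
        for c in range(n):
            acc = 0.0
            for dr in range(-half, half + 1):       # ... each visiting M^2 taps
                for dc in range(-half, half + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < n and 0 <= cc < n:
                        acc += image[rr][cc]
            out[r][c] = acc * weight
    return out
```

The two outer loops run N^2 times and the two inner loops M^2 times, which is where O(N^2 M^2) comes from.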
Convolution
Convolution Theorem
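The convolution theorem says that convolution in the time/space domain corresponds to point-wise multiplication in the frequency domain. A small numerical check, assuming circular (wrap-around) convolution and an unscaled forward DFT with a 1/N inverse; all function names here are illustrative:

```python
import cmath

def dft(x, sign=-1):
    """Naive DFT; sign=-1 is the forward transform, sign=+1 the unscaled inverse."""
    n = len(x)
    return [sum(x[j] * cmath.exp(sign * 2j * cmath.pi * k * j / n)
                for j in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse DFT with 1/N scaling."""
    n = len(spectrum)
    return [v / n for v in dft(spectrum, sign=+1)]

def circular_convolve(f, g):
    """Direct circular convolution: O(N^2)."""
    n = len(f)
    return [sum(f[j] * g[(k - j) % n] for j in range(n)) for k in range(n)]

def convolve_via_dft(f, g):
    """Same result by multiplying the two spectra point-wise."""
    product = [a * b for a, b in zip(dft(f), dft(g))]
    return [v.real for v in idft(product)]
```

For real inputs of equal length, both functions should agree up to floating-point rounding.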
Why bother? Example: 2D FIR filter. Simple approach: NxN image, MxM box blur kernel; computational complexity O(N^2 M^2); reasonable for small M. DFT-based approach: NxN image => DFT costs O(N^3); MxM box blur kernel => DFT costs O(M^3); point-wise multiply in frequency domain => O(max(N, M)^2); IDFT costs the same as DFT; computational complexity O(N^3 + M^3). Cubics are rather demotivational...
Fast Fourier Transforms (FFT)
Complexity: an FFT is composed recursively of half-sized DFTs, which gives O(N log N).
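The recursive structure can be sketched with the textbook radix-2 decimation-in-time scheme. The slides do not say which FFT variant they have in mind, so this is an assumption, and the sketch handles power-of-two lengths only:

```python
import cmath

def fft(x):
    """Recursive radix-2 FFT (forward, unscaled); length must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])      # half-sized DFT of the even-indexed samples
    odd = fft(x[1::2])       # half-sized DFT of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):  # O(N) combine step at each of log2(N) levels
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out
```

Each level of recursion does O(N) work to combine the halves, and there are log2(N) levels, hence O(N log N).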
Is it worth it yet? Example: 2D FIR filter. Simple approach: NxN image, MxM box blur kernel; computational complexity O(N^2 M^2); reasonable for small M. FFT-based approach: NxN image => FFT costs O(N^2 log N); MxM box blur kernel => FFT costs O(M^2 log M); point-wise multiply in frequency domain => O(max(N, M)^2); IFFT costs the same as FFT; computational complexity O(N^2 log N + M^2 log M). Worth it as long as K log N < M^2, where K stands for the constant factor hidden in the FFT cost (compare K N^2 log N against N^2 M^2)...
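The break-even condition can be turned into a quick back-of-the-envelope check. This sketch assumes the dominant FFT cost is K N^2 log2 N with a hypothetical constant factor k that would have to be measured in practice; the function name is illustrative.

```python
import math

def fft_blur_is_cheaper(n, m, k=1.0):
    """Compare the two cost models for an n x n image and an m x m kernel.

    k is a hypothetical constant factor for the FFT-based path; it is an
    assumption of this sketch, not a measured value.
    """
    direct_cost = (n ** 2) * (m ** 2)
    fft_cost = k * (n ** 2) * math.log2(n)
    return fft_cost < direct_cost
```

Dividing both costs by n^2 gives the slide's condition: the FFT path wins when k log2(n) is smaller than m^2.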
CUFFT: NVIDIA's GPU FFT library. Part of the CUDA Toolkit. Interface modeled on FFTW (Fastest Fourier Transform in the West). Given a size, direction, and data layout, it generates a plan; the plan can be reused to perform multiple similar FFTs.
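The plan idea can be illustrated without a GPU. The class below is a hypothetical stand-in, not the CUFFT API: it only shows the pattern of doing size-dependent setup once (here, a twiddle-factor table) and reusing it across many transforms. The class name, the power-of-two restriction, and the 1/N scaling of the inverse are all choices of this sketch.

```python
import cmath

class FFTPlan:
    """Hypothetical plan object (NOT the CUFFT API): precompute once, reuse."""

    def __init__(self, size, inverse=False):
        if size < 1 or size & (size - 1):
            raise ValueError("this sketch handles power-of-two sizes only")
        self.size = size
        self.inverse = inverse
        sign = 1 if inverse else -1
        # Setup work done once per plan: the twiddle-factor table.
        self.twiddles = [cmath.exp(sign * 2j * cmath.pi * k / size)
                         for k in range(size // 2)]

    def _run(self, x, stride):
        n = len(x)
        if n == 1:
            return [complex(x[0])]
        even = self._run(x[0::2], stride * 2)
        odd = self._run(x[1::2], stride * 2)
        out = [0j] * n
        for k in range(n // 2):
            # Sub-transforms index the shared table with a larger stride.
            t = self.twiddles[k * stride] * odd[k]
            out[k] = even[k] + t
            out[k + n // 2] = even[k] - t
        return out

    def execute(self, data):
        if len(data) != self.size:
            raise ValueError("input length does not match the plan size")
        result = self._run(list(data), 1)
        if self.inverse:
            # Scaling the inverse by 1/N is a choice of this sketch.
            result = [v / self.size for v in result]
        return result
```

Usage would look like creating forward = FFTPlan(4) once and then calling forward.execute(...) on each input of that size.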
Fast Computation of General Fourier Transforms on GPUs
In summary: Direct3D; no CUDA card required; supports non-power-of-two (NPOT) sizes; handles large 1D FFTs; packs real-valued FFTs (CUFFT does not do this).
Real-valued Input FFTs
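One standard way to exploit real-valued input is to pack two real sequences into the real and imaginary parts of a single complex transform and separate the results afterwards using conjugate symmetry. The slide text does not say which packing scheme the library uses, so this is only an illustration of the general idea; the function names are made up, and a naive DFT stands in for the FFT to keep the sketch short.

```python
import cmath

def dft(x):
    """Naive forward DFT, used here only as a stand-in for an FFT."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * k * j / n) for j in range(n))
            for k in range(n)]

def two_real_dfts(a, b):
    """Transform two real sequences with ONE complex transform."""
    n = len(a)
    if len(b) != n:
        raise ValueError("both sequences must have the same length")
    packed = [complex(p, q) for p, q in zip(a, b)]   # z[n] = a[n] + i*b[n]
    Z = dft(packed)
    A, B = [], []
    for k in range(n):
        zk = Z[k]
        zc = Z[(-k) % n].conjugate()
        A.append((zk + zc) / 2)      # spectrum of a
        B.append((zk - zc) / 2j)     # spectrum of b
    return A, B
```

The separation works because the spectrum of a real sequence satisfies X[N-k] = conj(X[k]).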
An Efficient, Model-Based CPU-GPU Heterogeneous FFT Library
Summary: measured timings for every substep; modeled performance and picked the minimum time; 20% prediction time error at 16 x