Radar Pulse Compression Using the NVIDIA CUDA SDK

Stephen Bash, David Carpman, and David Holl
HPEC 2008, September 23-25, 2008
MIT Lincoln Laboratory 0839-118-1

This work is sponsored by the Air Force Research Laboratory under Air Force contract FA8721-05-C-0002. Opinions, interpretations, conclusions and recommendations are those of the author and not necessarily endorsed by the United States Government.

NVIDIA Compute Unified Device Architecture SDK
  • Create custom kernels that run on GPU
  • Extension of C language
  • Provides driver- and runtime-level APIs
  • Includes numerical libraries
    – CUFFT
    – CUBLAS
  • $/GFLOP: GPU = $1.27, CPU = $29.18
HPEC 07:

Radar Pulse Compression
  • Waveform design and processing to achieve higher range resolution and sensitivity*
  • Processing consists of convolution with FIR filter
    – Doppler tolerant (top): traditional frequency domain convolution
    – Doppler intolerant (bottom): additional FFT and Doppler correction required
[Figure: block diagrams of the two processing chains (top and bottom). Block labels: Fast Time FFT, Replica, Fast Time IFFT, Slow Time FFT, Doppler Correction.]
* Skolnik, Radar Handbook, Second Edition. McGraw-Hill Publishing, Boston, MA, 1990.
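The Doppler-tolerant chain above (fast-time FFT, multiply by the replica, fast-time IFFT) can be sketched in a few lines. This is a minimal illustration in pure Python, not the authors' CUDA implementation. It assumes the FIR filter is a matched filter, so the received spectrum is multiplied by the conjugate of the replica spectrum. The linear-FM pulse, the function names, and all sizes are hypothetical choices made for the example.

```python
import cmath
import math


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def ifft(spectrum):
    """Inverse FFT with the 1/N normalization applied."""
    n = len(spectrum)
    return [v / n for v in fft(spectrum, inverse=True)]


def lfm_chirp(num_samples, bandwidth_fraction=0.5):
    """Hypothetical baseband linear-FM pulse with unit amplitude."""
    rate = bandwidth_fraction / num_samples  # sweep rate in cycles/sample^2
    mid = num_samples / 2
    return [cmath.exp(1j * math.pi * rate * (t - mid) ** 2)
            for t in range(num_samples)]


def pulse_compress(received, replica):
    """Frequency-domain matched filtering of one fast-time record.

    Assumption: the replica is zero-padded to the record length and the
    product uses the conjugate replica spectrum (circular correlation).
    """
    n = len(received)
    padded = list(replica) + [0j] * (n - len(replica))
    rx_spec = fft(received)                      # fast-time FFT
    rep_spec = fft(padded)                       # replica spectrum
    product = [r * h.conjugate() for r, h in zip(rx_spec, rep_spec)]
    return ifft(product)                         # fast-time IFFT


# Place one echo of the pulse at a known delay and compress it.
n_fft = 256
delay = 100
pulse = lfm_chirp(32)
echo = [0j] * n_fft
for i, sample in enumerate(pulse):
    echo[delay + i] += sample

compressed = pulse_compress(echo, pulse)
peak_index = max(range(n_fft), key=lambda m: abs(compressed[m]))
print("peak at range bin", peak_index)
```

In a real system the replica spectrum would be computed once and reused for every pulse, which is why the multiply appears as its own stage between the FFT and the IFFT.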

GPU vs. CPU Comparison
  • CPU vs GPU comparison in real-world conditions
    – 2 GHz dual quad-core AMD Opterons vs EVGA e-GeForce 8800 Ultra
    – Memory transfer to and from GPU included in timing
[Figure: "1D FFT", GPU Speedup plotted against FFT Size and Batch Size.]
[Figure: "Processing Time Per Stage", Time (ms) for each processing stage, CPU vs GPU.]
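Including memory transfer in the timing changes what "speedup" means: the GPU time is the sum of the host-to-device copy, the kernel execution, and the device-to-host copy. The sketch below shows that bookkeeping. The function name and the millisecond values are hypothetical and are not measurements from the slides.

```python
def gpu_speedup(cpu_ms, h2d_ms, kernel_ms, d2h_ms):
    """CPU time divided by total GPU time, with both transfers counted."""
    gpu_total_ms = h2d_ms + kernel_ms + d2h_ms
    return cpu_ms / gpu_total_ms


# Hypothetical timings for one processing stage, in milliseconds.
with_transfers = gpu_speedup(cpu_ms=10.0, h2d_ms=1.0, kernel_ms=2.0, d2h_ms=1.0)
kernel_only = gpu_speedup(cpu_ms=10.0, h2d_ms=0.0, kernel_ms=2.0, d2h_ms=0.0)
print(f"speedup with transfers: {with_transfers:.1f}x")
print(f"speedup, kernel only:   {kernel_only:.1f}x")
```

With these made-up numbers the transfers halve the apparent speedup, which is why a comparison that omits them can overstate the benefit for small workloads.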

Backups

Reference: $/GFLOP
As of July 2007, these products represent the top-of-the-line consumer CPU and graphics card according to floating point computational power:
1. Kentsfield Core 2 Extreme QX6800
   37.7 GFLOPS – fastest CPU as of 7/16/2007
   http://www.tomshardware.com/2007/07/16/cpu_charts_2007/page36.html
   $1100 – price as of March 10, 2008
   http://www.google.com/products?q=Kentsfield+Core+2+Extreme+QX6800
   $/GFLOPS = $29.18
   Notes: Price excludes motherboard + power supply + memory + GPU
2. EVGA GeForce 8800 Ultra Superclocked (NVIDIA)
   576 GFLOPS – theoretical peak
   http://en.wikipedia.org/wiki/GeForce_8_Series
   $730 – price as of March 10, 2008
   http://www.google.com/products?q=768-P2-N887-AR&scoring=p
   $/GFLOPS = $1.27
   Notes: Price includes 768 MB GDDR3 memory, but excludes: motherboard + power supply + CPU
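The two $/GFLOPS figures follow directly from the prices and throughput numbers quoted above. A short check of the arithmetic, using only values from this slide (the function name is an illustrative choice):

```python
def dollars_per_gflops(price_usd, gflops):
    """Price divided by floating-point throughput."""
    return price_usd / gflops


cpu_cost = dollars_per_gflops(1100, 37.7)   # Core 2 Extreme QX6800
gpu_cost = dollars_per_gflops(730, 576)     # GeForce 8800 Ultra, theoretical peak
print(f"CPU: ${cpu_cost:.2f} per GFLOPS")
print(f"GPU: ${gpu_cost:.2f} per GFLOPS")
```

Note that the GPU figure uses a theoretical peak while the CPU figure comes from a published benchmark chart, so the ratio between the two is best read as a rough indication rather than a like-for-like comparison.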