Finding Body Parts with Vector Processing
Cynthia Bruyns, Bryan Feldman
CS 252

Introduction
- Take an existing algorithm for tracking human motion and speed it up by computing on the GPU.
- Demonstrate that many vision algorithms are prime candidates for vector processing.

Demo
- Results after false candidates have been removed

Vision Algorithms
- Often computationally expensive: searching over many pixels for objects at many orientations and scales.
- E.g. [(1024 x 768 pixels) x 3 colors] x [12 orientations] x [5 scales]
- Very often highly parallelizable.
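
To make the size of that search concrete, the product in the example above can be multiplied out. This is only a worked-arithmetic sketch; the variable names are illustrative, and "evaluations" here simply means one filter response per pixel, color channel, orientation and scale.

```python
# Illustrative only: multiply out the search space from the slide's example.
width, height = 1024, 768
channels = 3        # color channels
orientations = 12
scales = 5

pixels = width * height
evaluations = pixels * channels * orientations * scales

print(f"{pixels:,} pixels -> {evaluations:,} evaluations per image")
# prints "786,432 pixels -> 141,557,760 evaluations per image"
```

Each of those evaluations is independent of the others, which is what makes the workload a candidate for parallel hardware.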

Limb Finding
- Goal: find candidate limbs.
- Limbs look like long dark rectangles on light backgrounds, or long light rectangles on dark backgrounds.

Algorithm specifics
1. Convolution with filter
- Convolve the image with the filter, using the FFT.
- The response indicates how strongly pixels go from low to high intensity.
- Convolve over all three color channels so as not to miss, e.g., a red-to-blue edge of the same intensity.
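
The idea behind convolving via the FFT is the convolution theorem: transform both inputs, multiply pointwise, transform back. The sketch below shows this on a single image row in one dimension. It is not the authors' implementation: it uses a naive O(n^2) DFT as a stand-in for a real FFT, and the two-tap difference kernel is a hypothetical edge filter chosen for illustration.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (stand-in for a real FFT)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out

def circular_convolve_direct(signal, kernel):
    """Reference result: circular convolution computed directly."""
    n = len(signal)
    return [sum(signal[(i - j) % n] * kernel[j] for j in range(n)) for i in range(n)]

def circular_convolve_fourier(signal, kernel):
    """Same result via the convolution theorem: multiply the spectra."""
    spectrum = [a * b for a, b in zip(dft(signal), dft(kernel))]
    return [v.real for v in dft(spectrum, inverse=True)]

# One image row containing a light bar on a dark background.
row = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0]
# Hypothetical edge kernel (pixel minus its left neighbor), zero-padded to the row length.
kernel = [1.0, -1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

direct = circular_convolve_direct(row, kernel)
fourier = circular_convolve_fourier(row, kernel)
```

Here `direct` is positive (1.0) at index 3, where intensity goes from low to high, and negative (-1.0) at index 6, where it drops; `fourier` agrees up to floating-point rounding. For a color image the same step would be repeated per channel.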
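
Algorithm specifics
2. For every pixel location, take respconv from the "left" and from the "right" and put the combination into a new matrix, resplimb.
[Figure: respconv sampled on one side of pixel x and -respconv on the other, combined into resplimb]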
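
One plausible reading of this step, sketched on a single row: a limb centered at x should have a rising edge some distance to its left and a falling edge the same distance to its right, so the left response and the negated right response add up. The offset `half_width`, the zero padding at the borders, and the function name are assumptions made for this example, not details given on the slide.

```python
def limb_response(respconv, half_width):
    """Combine the edge response to the left with the negated response to the right."""
    n = len(respconv)

    def at(i):
        # Assumption: positions outside the row contribute no response.
        return respconv[i] if 0 <= i < n else 0.0

    return [at(x - half_width) - at(x + half_width) for x in range(n)]

# Edge response for a light bar covering indices 3..6:
# rising edge at index 3, falling edge at index 7.
respconv = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, -1.0, 0.0, 0.0]

resplimb = limb_response(respconv, half_width=2)
best = max(range(len(resplimb)), key=resplimb.__getitem__)
```

With these inputs `resplimb` peaks at index 5, the center of the bar, where both edges contribute. A dark limb on a light background gives the mirror-image (negative) peak, so in practice one might look at the magnitude.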

Algorithm specifics
3. Find local maxima: for every pixel, replace its value with the maximum of its local neighbors (locMax). If resplimb equals locMax at a pixel, that pixel is a local maximum.

resplimb
.50 .25 .40 .23
.75 .41 .98 .75
.11 .43 .15 .23
.78 .34 .13 .15

locMax (only three values per row survive in the source)
.75 .98 .98
.75 .98 .98
.78 .98 .98
.78 .87 .23
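
The local-maximum test can be sketched as follows, using the resplimb values from the slide. The 3 x 3 neighborhood (`radius=1`) and the clipping of the window at the image border are assumptions; the slide does not state the neighborhood size.

```python
def local_max(resp, radius=1):
    """Replace every value with the maximum over its (2*radius+1)^2 neighborhood."""
    rows, cols = len(resp), len(resp[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [resp[rr][cc]
                      for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                      for cc in range(max(0, c - radius), min(cols, c + radius + 1))]
            out_row.append(max(window))
        out.append(out_row)
    return out

resplimb = [
    [0.50, 0.25, 0.40, 0.23],
    [0.75, 0.41, 0.98, 0.75],
    [0.11, 0.43, 0.15, 0.23],
    [0.78, 0.34, 0.13, 0.15],
]

loc_max = local_max(resplimb)

# A pixel is a local maximum when its own value survives the max filter.
maxima = [(r, c)
          for r in range(len(resplimb))
          for c in range(len(resplimb[0]))
          if resplimb[r][c] == loc_max[r][c]]
print(maxima)  # prints [(1, 0), (1, 2), (3, 0)]
```

Both passes (max filter, then equality test) touch each pixel independently, which fits the per-pixel execution model discussed on the GPU slides.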

GPU
- It's a good choice because each operation is per pixel (SIMD-like).
- Data is stored in texture buffers, equivalent to a local cache.
- Clean instruction set, and a developing interface language to exploit vector operations.
- Justify your gaming habits.

GPU dataflow model
- Hardware supports several data types for bandwidth optimization, e.g. 32-bit floating point, half, etc.
- Data passed to main memory stages via binding.
[Diagram: Application -> Vertex Processor -> Assembly & Rasterization -> Fragment Processor -> Framebuffer Operations -> Framebuffer, with Textures as a separate input block]

Fragment processor has high resource limits
- 1024 instructions
- 512 constants or uniform parameters
  - Each constant counts as one instruction
- 16 texture units
  - Reuse as many times as desired
- No branching
  - But can do a lot with condition codes
- No indexed reads from registers
  - Use texture reads instead
- No memory writes
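
Working without branches usually means computing both outcomes and selecting one with a 0/1 mask. The sketch below is a CPU-side analogy in Python, not fragment program code; `step`, `select` and `threshold` are hypothetical helper names, and the Python comparison inside `step` stands in for a hardware compare that yields 0 or 1.

```python
def step(edge, x):
    """Return 1.0 where x >= edge, else 0.0 (stand-in for a hardware compare)."""
    return float(x >= edge)

def select(condition, if_true, if_false):
    """Branch-free select: condition must be exactly 0.0 or 1.0."""
    return condition * if_true + (1.0 - condition) * if_false

def threshold(values, cutoff):
    """Keep responses at or above the cutoff, zero the rest, without an if-statement."""
    return [select(step(cutoff, v), v, 0.0) for v in values]

kept = threshold([0.2, 0.8, 0.5], cutoff=0.5)
print(kept)  # prints [0.0, 0.8, 0.5]
```

The cost of this style is that both candidate values are always computed, which is acceptable when each is cheap.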

The algorithm
- Draw invokes the fragment programs.
- The texture becomes a data structure; use two as framebuffers to avoid RAW (read-after-write) hazards.
[Diagram labels: Image, FFT fragment program; Mask, FFT fragment program; for each orientation to search: Convolution program, Cylinder program, Find Max program]
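
The two-buffer arrangement is often called ping-pong buffering: each pass reads only from one buffer and writes only to the other, then the two swap roles. The sketch below imitates that on the CPU with nested lists standing in for textures. The pass functions `blur_horizontal` and `double` are made up for the example and are not the programs named on the slide.

```python
def run_passes(image, passes):
    """Apply each pass, reading from one buffer and writing to the other."""
    src = [row[:] for row in image]                # buffer currently being read
    dst = [[0.0] * len(row) for row in image]      # buffer currently being written
    for kernel in passes:
        for r in range(len(src)):
            for c in range(len(src[0])):
                dst[r][c] = kernel(src, r, c)      # reads src only, writes dst only
        src, dst = dst, src                        # swap roles for the next pass
    return src

def blur_horizontal(buf, r, c):
    """Average a pixel with its left and right neighbors (clamped at the border)."""
    left = buf[r][max(c - 1, 0)]
    right = buf[r][min(c + 1, len(buf[0]) - 1)]
    return (left + buf[r][c] + right) / 3.0

def double(buf, r, c):
    """Scale a pixel by two."""
    return 2.0 * buf[r][c]

result = run_passes([[0.0, 3.0, 0.0]], [blur_horizontal, double])
print(result)  # prints [[2.0, 2.0, 2.0]]
```

Had the blur written back into the buffer it was reading, the second pixel would have seen its left neighbor's already-updated value, which is the read-after-write hazard the slide refers to.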

Results
- Mask size fixed (22 x 13); image size varied.
* Additional GPU optimizations possible. (CPU: 2.53 GHz P4; GPU: Nvidia FX 5900)

Results (log scale)
- Mask size fixed (22 x 13); image size varied.
- Plot annotations: 252.1 sec and 42.7 sec
* Additional GPU optimizations possible. (CPU: 2.53 GHz P4; GPU: Nvidia FX 5900)
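
The two times that survive from the log-scale plot can be compared directly. The slide text does not say which machine produced which figure; the natural reading, treated here purely as an assumption, is that the longer time is the CPU run and the shorter one the GPU run at the same image size.

```python
# Times taken from the plot annotations; which device produced which is an assumption.
slower_sec = 252.1   # assumed: CPU
faster_sec = 42.7    # assumed: GPU

ratio = slower_sec / faster_sec
print(f"ratio: {ratio:.1f}x")  # prints "ratio: 5.9x"
```

Under that reading, the larger problem size ran roughly six times faster, before the additional GPU optimizations the footnote mentions.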

Results
- Image size fixed (512 x 512); mask size varied.
- Varying mask sizes allow for varying limb sizes on the same image.
Results

Comments
- GPU and image processing are a good match.
- Moving memory from the CPU to the GPU is cumbersome, but this can be overcome.
- Non-uniformity of installations and products; exact specifications are hearsay.

Acknowledgements
- Kenneth Moreland
- Deva Ramanan
- Okan Arikan