Photon Mapping on Programmable Graphics Hardware

Timothy J. Purcell, Mike Cammarano, Pat Hanrahan (Stanford University)
Craig Donner, Henrik Wann Jensen (University of California, San Diego)

Motivation
• Interactive global illumination on the GPU
• GPUs nearly have sufficient compute power and flexibility
• Explore GPU-based computation algorithms
Related Work
• CPU-based interactive global illumination
  – Supercomputers [Parker et al.]
  – Clusters [Tole et al., Wald et al.]
• Global illumination on programmable GPUs
  – Ray tracing [Carr et al., Purcell et al.]
  – Photon mapping [Ma et al.]
  – Radiosity [Carr et al., Coombe et al.]
  – Translucency [Carr et al., Stamminger et al.]

Photon Mapping Algorithm Review
• Photon tracing
  – Emission, scattering, storing into kd-tree
  – Similar to ray tracing
• Rendering
  – Ray tracing for direct illumination
  – Photon map visualization (indirect bounce)

Computational Challenge for GPUs #1
• Constructing an irregular or sparse data structure

Computational Challenge for GPUs #2
• Adaptive nearest neighbor search
• Noise vs. blur
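For context, the standard photon-map radiance estimate (Jensen's density estimate; the slides do not spell it out) makes the noise/blur trade-off concrete. Assume the k photons nearest to x lie inside a sphere of radius r, and photon p carries power ΔΦ_p arriving from direction ω_p:

$$
L_r(x, \vec{\omega}) \approx \frac{1}{\pi r^2} \sum_{p=1}^{k} f_r(x, \vec{\omega}_p, \vec{\omega})\, \Delta\Phi_p(x, \vec{\omega}_p)
$$

Few photons (small r) give a high-variance, noisy estimate; many photons (large r) average over a larger area and blur detail. An adaptive search chooses r per sample point so that each estimate uses roughly k photons.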

Photon Mapping on the CPU
• Balanced kd-tree
  – Compact storage of photons
  – Efficient O(log n) search
• Priority queue
  – Nearest neighbor search
  – Incremental insertion and removal of photons

Algorithmic Changes for the GPU
• Direct visualization of photon map
  – Keeps rendering costs low
• Use grid instead of kd-tree
  – Tried kd-tree…
  – Kd-tree construction is difficult
  – Radiance estimate: fixed-radius search works fine; adaptive search needs a priority queue
• No priority queue
  – Can't build one on the GPU
  – Too much state

Contributions
• Mapped a complete grid-based photon mapping algorithm onto the GPU
  – Including photon tracing, ray tracing, etc.
• Implemented an adaptive k-nearest neighbor search
  – kNN-grid
• Showed how to construct a sparse data structure on the GPU
  – Bitonic merge sort with binary search
  – Stencil routing

Configuring the GPU for Computing
• GPU as a data-parallel compute engine
  – Fragment programs execute compute kernels
  – A screen-sized quad initiates the computation
  – SIMD execution
• Floating-point texture memory
  – Render-to-texture for intermediate results
  – Data structure storage
  – Pointer dereferencing via dependent fetches
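The execution model on this slide can be mimicked on the CPU to make the constraints explicit. This is a minimal sketch under stated assumptions: `run_pass` and the pointer-jumping kernel `jump` are hypothetical, and textures are modeled as 1D Python lists. Each pass evaluates the kernel once per output texel, a kernel may gather from any input texel (including a dependent fetch, where a fetched value is used as an address), and the output of one pass becomes the input of the next, as with render-to-texture.

```python
def run_pass(kernel, size, inputs):
    """Simulate one rendering pass over a screen-sized quad: call `kernel`
    once per output texel. A kernel may read (gather) any texel of the input
    textures, but it only returns the value for its own output texel; it
    cannot scatter a write elsewhere."""
    return [kernel(i, inputs) for i in range(size)]


def jump(i, inputs):
    """Hypothetical kernel: follow the pointer stored at texel i twice."""
    (next_ptr,) = inputs
    return next_ptr[next_ptr[i]]  # dependent fetch: a fetched value is the address


# Linked list 0 -> 1 -> 2 -> 3 -> 4, with the tail pointing at itself.
ptr = [1, 2, 3, 4, 4]
ptr = run_pass(jump, len(ptr), [ptr])  # render-to-texture; output feeds the next pass
```

After one pass every texel points two hops ahead (`[2, 3, 4, 4, 4]`); a second pass reaches the tail everywhere.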

Computational Challenge #1: Building a Sparse Data Structure

Building a Sparse Data Structure
• Requires scatter
  – Dependent texture write
• Why don't we have fragment scatter?
  – Fragment processing has highly coherent, blocked memory writes
  – Extra hardware support would be needed
  – Write hazards
  – Memory latencies

Scatter on the GPU
• Sort photons into grid cells
  – Grid cell is the sort key
• Simulate scatter with fragment programs
  – Bitonic merge sort followed by binary search
  – Compact grid
  – O(log² n) rendering passes

Bitonic Merge Sort
[Figure: bitonic merge sort of eight keys (1 through 8), shown stage by stage]
• O(log² n) rendering passes
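The sort step can be sketched as a CPU simulation. This is not the authors' Cg code, and `bitonic_sort_passes` is a hypothetical name. Each iteration of the inner `while` loop stands in for one rendering pass. In that pass, every output element independently gathers its own key and one partner key (at index `i ^ j`) and keeps either the min or the max, so nothing is ever written except an element's own slot. The input length must be a power of two; padding with sentinel keys is an assumption made here for illustration.

```python
import math


def bitonic_sort_passes(keys):
    """Sort `keys` (length a power of two) with a bitonic merge network,
    one gather-only pass per (k, j) stage. Returns (sorted_keys, passes)."""
    n = len(keys)
    assert n > 0 and n & (n - 1) == 0, "pad the input to a power of two"
    data, passes = list(keys), 0
    k = 2
    while k <= n:                 # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:             # compare distance within this merge stage
            prev = data           # read-only input "texture" for this pass
            data = []
            for i in range(n):    # every output element runs the same kernel
                partner = i ^ j
                ascending = (i & k) == 0
                lower = (i & j) == 0          # i is the lower index of its pair
                a, b = prev[i], prev[partner]
                data.append(min(a, b) if ascending == lower else max(a, b))
            passes += 1
            j //= 2
        k *= 2
    return data, passes


# Hypothetical photons as (grid cell, photon id); the grid cell is the sort key.
photons = [(5, 0), (2, 1), (5, 2), (0, 3), (2, 4)]
padded = photons + [(math.inf, -1)] * (8 - len(photons))  # sentinels sort last
sorted_photons, passes = bitonic_sort_passes(padded)
```

The pass count is log n (log n + 1) / 2, which is O(log² n). For n = 8 that is 6 passes; for n = 2^16 = 65,536 keys it is 16 · 17 / 2 = 136 passes.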

Binary Search
• Each grid cell searches for itself in the sorted photon list
  – If it is not present, find the first element in the next cell
• Empty grid cells waste compute
• log(n) + 1 steps
[Figure: sorted photon list v0 v0 v2 v2 v5; the search for the first v5 photon, shown at initialization and after steps 1 through 4]
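The per-cell search can be sketched as a lock-step lower-bound search. This is a CPU illustration under stated assumptions, not the authors' kernel: `grid_offsets` is a hypothetical name, and the input is the list of sorted photon cell keys. Every grid cell keeps one piece of state (`pos`, the last index known to hold a key smaller than the cell) and all cells advance through the same fixed number of steps, here floor(log2 n) + 1, one step standing in for one rendering pass. Empty cells run every step too, which is the wasted compute noted on the slide.

```python
def grid_offsets(sorted_cells, num_cells):
    """For each grid cell g, return the index of the first photon whose cell
    is >= g in the sorted list: the first photon of g, or of the next
    non-empty cell if g is empty. Returns (offsets, steps)."""
    n = len(sorted_cells)
    steps = n.bit_length()              # floor(log2 n) + 1 stride halvings
    pos = [-1] * num_cells              # per-cell state: last index with key < g
    stride = 1 << max(steps - 1, 0)
    for _ in range(steps):              # each iteration ~ one rendering pass
        pos = [p + stride if p + stride < n and sorted_cells[p + stride] < g else p
               for g, p in enumerate(pos)]
        stride //= 2
    return [p + 1 for p in pos], steps


# The slide's example list v0 v0 v2 v2 v5, with grid cells 0..5.
offsets, steps = grid_offsets([0, 0, 2, 2, 5], 6)
```

Here `offsets` is `[0, 2, 2, 4, 4, 4]`. Photons of cell g occupy indices `offsets[g]` up to (but excluding) `offsets[g + 1]`, or the list length for the last cell, so empty cells 1, 3 and 4 get zero-length ranges.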

Scatter on the GPU
• Vertex programs can scatter
  – Draw a point to the buffer
  – Collisions?
• Stencil routing
  – Limit photon count per grid cell (pre-allocate grid cell space)
  – Draw photons as points; the vertex program computes the grid cell
  – Stencil buffer controls the location within the cell
  – Single rendering pass

Stencil Routing
• Fix each grid cell size to n² pixels
• Draw fat points to cover each grid cell
  – glPointSize(n)
[Figure: Vertex(photon_pos) → vertex program → flattened grid of 4-pixel cells]

Stencil Routing
• Control the location written to with the stencil buffer
  – Pass when the stencil value is n² - 1
  – Stencil always increments
  – Location written depends on draw order
[Figure: vertex program writing into a flattened grid of 4-pixel cells, with the stencil value of each pixel shown]
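The routing rule can be simulated on the CPU to see why a single pass places photons without collisions. This is a sketch, not the authors' implementation. It assumes that pixel s of every n²-pixel cell starts with stencil value s, and that the increment saturates at 255 as with an 8-bit stencil buffer; the slides state neither. `stencil_route` is a hypothetical name. Each photon is drawn as a fat point covering its whole cell. Exactly one covered pixel passes the test (value n² - 1) and receives the photon, then every covered pixel increments, so the next photon in draw order lands in the next slot. Photons beyond n² per cell fail everywhere and are dropped.

```python
def stencil_route(photon_cells, num_cells, n, stencil_max=255):
    """Simulate stencil routing into a flattened grid where each cell owns
    n*n pixels. Returns per-cell pixel contents (photon index or None)."""
    slots = n * n
    # Assumption: pixel s of every cell is initialized to stencil value s.
    stencil = [list(range(slots)) for _ in range(num_cells)]
    color = [[None] * slots for _ in range(num_cells)]
    for photon, cell in enumerate(photon_cells):   # draw order matters
        for s in range(slots):                     # every pixel the fat point covers
            if stencil[cell][s] == slots - 1:      # stencil test passes here only
                color[cell][s] = photon            # fragment is written
            # Stencil op increments on pass and on fail (saturating).
            stencil[cell][s] = min(stencil[cell][s] + 1, stencil_max)
    return color


# Hypothetical input: 7 photons, 2 cells of 2 x 2 pixels; cell 0 receives 5.
grid = stencil_route([0, 1, 0, 0, 0, 0, 1], num_cells=2, n=2)
```

Cell 0 ends up as `[4, 3, 2, 0]`: photons 0, 2, 3 and 4 fill its four pixels in draw order, and photon 5 is dropped because the cell is full. Cell 1 holds photons 1 and 6 in its last two pixels.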

Computational Challenge #2: Adaptive Nearest Neighbor Search

Adaptive Nearest Neighbor Search
• Iterative algorithm
• Accept or reject photons in cell visit order

kNN-grid Algorithm
[Figure: a sample point, candidate photons, and photons in the estimate; the example wants a 4-photon estimate. The number in parentheses is the count of photons in the estimate at each step.]
• Candidate photons must be within the maximum search radius
• Visit voxels in order of distance to the sample point
• If the current number of photons in the estimate is less than the number requested, grow the search radius (1, then 2)
• Don't add photons outside the maximum search radius, and don't grow the search radius when a photon is outside the maximum radius (2)
• Add photons within the search radius (3, then 4)
• Don't expand the search radius if enough photons have already been found (4)
• Add photons within the search radius (5)
• Visit all other voxels accessible within the determined search radius, adding photons within the search radius (6)
• Finds all photons within a sphere centered about the sample point
  – May locate more than the requested k nearest neighbors
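The rules on the slide can be combined into one loop. This is a minimal CPU sketch of the kNN-grid idea as described above, not the authors' Cg implementation. Assumptions: a dictionary-based uniform grid with cell size `cell`, photons as 3D position tuples, photons inside a voxel tested in storage order, and the hypothetical helpers `build_grid`, `voxel_distance` and `knn_grid`. The radius only grows while fewer than k photons have been accepted. Once it is fixed, only photons inside it are added, and the search stops at the first voxel farther away than the radius.

```python
import math
from collections import defaultdict


def build_grid(photons, cell):
    """Bucket 3D photon positions into a uniform grid: voxel key -> photons."""
    grid = defaultdict(list)
    for p in photons:
        grid[tuple(math.floor(c / cell) for c in p)].append(p)
    return grid


def voxel_distance(x, key, cell):
    """Distance from point x to the closest point of voxel `key`."""
    d2 = 0.0
    for xi, ki in zip(x, key):
        lo, hi = ki * cell, (ki + 1) * cell
        if xi < lo:
            d2 += (lo - xi) ** 2
        elif xi > hi:
            d2 += (xi - hi) ** 2
    return math.sqrt(d2)


def knn_grid(grid, cell, x, k, r_max):
    """Return (photons in the estimate, final search radius)."""
    lo = [math.floor((c - r_max) / cell) for c in x]
    hi = [math.floor((c + r_max) / cell) for c in x]
    voxels = [(ix, iy, iz)
              for ix in range(lo[0], hi[0] + 1)
              for iy in range(lo[1], hi[1] + 1)
              for iz in range(lo[2], hi[2] + 1)]
    # Keep voxels that touch the max-radius sphere; visit nearest first.
    voxels = [v for v in voxels if voxel_distance(x, v, cell) <= r_max]
    voxels.sort(key=lambda v: voxel_distance(x, v, cell))

    radius, found = 0.0, []
    for v in voxels:
        if len(found) >= k and voxel_distance(x, v, cell) > radius:
            break  # radius is fixed and all remaining voxels lie beyond it
        for p in grid.get(v, ()):
            d = math.dist(x, p)
            if d > r_max:
                continue           # outside max radius: never add, never grow
            if d <= radius:
                found.append(p)    # inside current radius: add
            elif len(found) < k:
                radius = d         # too few photons: grow radius to include p
                found.append(p)
            # else: enough photons already, so don't expand the radius
    return found, radius
```

With photons at (0.5, 0.5, 0.5), (0.9, 0.5, 0.5) and (0.6, 0.5, 0.5) stored in that order in one unit voxel, a query at (0.5, 0.5, 0.5) with k = 2 returns all three, with radius 0.4. The photon at distance 0.4 is visited second, while fewer than two photons have been found, so the radius grows to include it. The closer photon at 0.1 is then also inside the sphere. The result is every photon in a sphere about the sample point rather than exactly the k nearest.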

System Implementation
• NVIDIA GeForce FX 5900 Ultra (NV35)
• Cg compiler 1.1
[Figure: system pipeline. Compute Lighting: Trace Photons, Build Photon Map. Render Image: Ray Trace Scene, Compute Radiance Estimate.]

Demos

Glass Ball – Bitonic Sort: 18 s @ 512 × 384, 5K photons

Glass Ball – Stencil Routing: 11 s @ 512 × 384, 5K photons

Ring – Bitonic Sort: 9 s @ 512 × 384, 16K photons

Ring – Stencil Routing: 8 s @ 512 × 384, 16K photons

Cornell Box – Bitonic Sort: 64 s @ 512 × 512, 65K photons

Cornell Box – Stencil Routing: 47 s @ 512 × 512, 65K photons

Cornell Box – Increased Search Radius

Open Issues (1)
• How to prevent program execution over a subset of pixels?
  – Non-uniform pixel computation distribution (radiance estimate)
  – KILL is only a write mask
  – Early-z occlusion culling: no pixel-level control
• Compute mask, branching, or stream buffer?
  – Improve radiance estimate speed by 30-70% over tiling

Open Issues (2)
• Scatter
  – Makes (a programmer's) life easier
  – Is it worth implementing?
  – Gain factor of log² n by avoiding the sort

Future Work
• Kd-trees
• Photon power redistribution
• Adaptive sampling
• Progressive refinement

Conclusions
• The GPU can compute an entire global illumination solution
  – Nearly interactive
• Implemented an adaptive k-nearest neighbor query for the GPU
  – kNN-grid
• Showed how to construct sparse data structures on the GPU
  – Bitonic merge sort and binary search
  – Stencil routing
• Sorting and searching algorithms applicable to other computations

Acknowledgments
• Stanford FlashG
  – Ian Buck, Mike Houston, Kekoa Proudfoot
• Stencil routing
  – Kurt Akeley, Matt Papakipos
• Hardware and drivers
  – David Kirk, Nick Triantos
• Funding
  – NVIDIA, DARPA, NSF, 3Com