Photon Mapping on Programmable Graphics Hardware
Timothy J. Purcell, Mike Cammarano, Pat Hanrahan (Stanford University)
Craig Donner, Henrik Wann Jensen (University of California, San Diego)
Motivation • Interactive global illumination on the GPU • Nearly have sufficient compute power and flexibility • Explore GPU-based computation algorithms
Related Work • CPU-based interactive global illumination • Supercomputers [Parker et al.] • Clusters [Tole et al., Wald et al.] • Global illumination on programmable GPUs • Ray tracing [Carr et al., Purcell et al.] • Photon mapping [Ma et al.] • Radiosity [Carr et al., Coombe et al.] • Translucency [Carr et al., Stamminger et al.]
Photon Mapping Algorithm Review • Photon tracing • Emission, scattering, storing into kd-tree • Similar to ray tracing • Rendering • Ray tracing for direct illumination • Photon map visualization • Indirect bounce
Computational Challenge for GPUs #1 • Constructing an irregular or sparse data structure
Computational Challenge for GPUs #2 • Adaptive nearest neighbor search • Noise vs. blur
Photon Mapping on the CPU • Balanced kd-tree • Compact storage of photons • Efficient • O(log n) search • Priority queue • Nearest neighbor search • Incremental insertion and removal of photons
Algorithmic Changes for the GPU • Direct visualization of photon map • Keeps rendering costs low • Use grid instead of kd-tree • Tried kd-tree… • Kd-tree construction is difficult • Radiance estimate – Fixed radius search works fine – Adaptive search needs priority queue • No priority queue • Can’t build on GPU • Too much state
Contributions • Mapped complete grid-based photon mapping algorithm onto the GPU • Including photon tracing, ray tracing, etc. • Implemented an adaptive k-nearest neighbor search • kNN-grid • Showed how to construct a sparse data structure on the GPU • Bitonic merge sort with binary search • Stencil routing
Configuring the GPU for Computing • GPU as data parallel compute engine • Fragment programs execute compute kernels • Screen sized quad initializes computation • SIMD execution • Floating point texture memory • Render-to-texture for intermediate results • Data structure storage • Pointer dereferencing via dependent fetches
Computational Challenge #1 Building a Sparse Data Structure
Building a Sparse Data Structure • Requires scatter • Dependent texture write • Why don’t we have fragment scatter? • Fragment processing has highly coherent blocked memory writes • Extra hardware support would be needed • Write hazards • Memory latencies
Scatter on the GPU • Sort photons into grid cells • Grid cell is sort key • Simulate scatter with fragment programs • Bitonic merge sort followed by binary search • Compact grid • O(log² n) rendering passes
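The "grid cell is sort key" idea can be sketched on the CPU as follows. This is a minimal illustration, not the talk's fragment program: the names `cell_key`, `grid_min`, `cell_size`, and `dims`, as well as the row-major flattening and the clamping of out-of-range positions, are assumptions made for the example.

```python
def cell_key(pos, grid_min, cell_size, dims):
    """Flattened index of the uniform-grid cell containing pos (hypothetical helper)."""
    idx = []
    for p, g, d in zip(pos, grid_min, dims):
        i = int((p - g) // cell_size)
        idx.append(min(max(i, 0), d - 1))  # assumption: clamp photons to the grid bounds
    ix, iy, iz = idx
    return ix + dims[0] * (iy + dims[1] * iz)  # assumption: row-major flattening


def sort_photons_by_cell(photons, grid_min, cell_size, dims):
    """Order photons so that all photons of one cell are contiguous."""
    return sorted(photons, key=lambda p: cell_key(p, grid_min, cell_size, dims))
```

On the CPU the built-in sort stands in for the bitonic merge sort that the slide names; only the key computation carries over directly.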
Bitonic Merge Sort • O(log² n) rendering passes [Figure: bitonic sorting network applied to the eight values 1 to 8, one column per pass]
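A bitonic sort can be written so that every output element only reads (gathers) from two input locations, which is what lets it run as a sequence of rendering passes. The sketch below is a CPU illustration under that assumption; the function name `bitonic_sort_passes` and the pass counter are hypothetical, and the input length is assumed to be a power of two.

```python
def bitonic_sort_passes(keys):
    """Sort keys with a bitonic network; return (sorted_list, number_of_passes)."""
    n = len(keys)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    data = list(keys)
    passes = 0
    k = 2
    while k <= n:              # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:          # compare distance within this merge stage
            out = []
            for i in range(n):  # each i plays the role of one fragment
                partner = i ^ j
                ascending = (i & k) == 0
                lo = min(data[i], data[partner])
                hi = max(data[i], data[partner])
                # The lower index keeps the minimum in ascending blocks,
                # the maximum in descending blocks; the partner keeps the other.
                keep_min = (i < partner) == ascending
                out.append(lo if keep_min else hi)
            data = out         # one full pass writes a new buffer
            passes += 1
            j //= 2
        k *= 2
    return data, passes
```

For n = 8 the loop runs 1 + 2 + 3 = 6 passes, that is log₂(n)·(log₂(n) + 1)/2, which is the O(log² n) pass count quoted on the slide.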
Binary Search • Grid cell searches for self in photon list • If none, find first element in next cell • Empty grid cells waste compute • log(n) + 1 steps [Figure: sorted photon list (v0 v0 v2 v2 v5) searched for the first v5 photon, shown at initialization and after steps 1 to 4]
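The per-cell search can be sketched with a standard lower-bound binary search: each grid cell looks up the index of the first photon whose key is at least its own, so an empty cell lands on the first photon of the next occupied cell. The function name `build_cell_starts` and the extra sentinel entry at the end are assumptions for this illustration.

```python
from bisect import bisect_left


def build_cell_starts(sorted_cell_keys, num_cells):
    """For each cell c, index of the first photon with key >= c.

    The list has num_cells + 1 entries so that the photons of cell c are
    sorted_cell_keys[starts[c]:starts[c + 1]] (an empty slice for empty cells).
    """
    return [bisect_left(sorted_cell_keys, c) for c in range(num_cells + 1)]


keys = [0, 0, 2, 2, 5]            # cell keys of the sorted photon list
starts = build_cell_starts(keys, 6)
# starts == [0, 2, 2, 4, 4, 4, 5]
```

Cells 1, 3, and 4 are empty here, yet each still performs a full search, which is the wasted compute the slide mentions.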
Scatter on the GPU • Vertex programs can scatter • Draw point to buffer • Collisions? • Stencil routing • Limit photon count per grid cell – Pre-allocate grid cell space • Draw photons as points – Vertex program computes grid cell • Stencil buffer controls location within cell • Single rendering pass
Stencil Routing • Fix each grid cell size to n² pixels • Draw fat points to cover each cell • glPointSize(n) [Figure: Vertex(photon_pos) passes through the vertex program and lands on a 4-pixel cell of the flattened grid]
Stencil Routing • Control location written to with stencil • Pass when stencil is n² − 1 • Stencil always increments • Location written depends on draw order [Figure: Vertex(photon_pos) passes through the vertex program to a 4-pixel cell of the flattened grid; the stencil admits 1 pixel; stencil values shown before and after a point is drawn]
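The routing rule can be simulated on the CPU as a sanity check. This is a sketch under stated assumptions: the function name `stencil_route` is hypothetical, each cell's stencil values are assumed to start as 0 … n² − 1 (which is consistent with the "pass when stencil is n² − 1" rule but is not spelled out in the text), and stencil saturation is not modeled.

```python
def stencil_route(photon_cells, num_cells, n):
    """Simulate stencil routing; returns, per cell, the photon id stored in each pixel."""
    slots = n * n                                        # pixels pre-allocated per cell
    stencil = [list(range(slots)) for _ in range(num_cells)]  # assumed initial values
    grid = [[None] * slots for _ in range(num_cells)]
    for photon_id, cell in enumerate(photon_cells):      # draw order = list order
        for s in range(slots):                           # the fat point covers the whole cell
            if stencil[cell][s] == slots - 1:            # stencil test passes on exactly one pixel
                grid[cell][s] = photon_id
            stencil[cell][s] += 1                        # stencil increments on pass and on fail
    return grid


routed = stencil_route([1, 0, 1, 1, 1, 1, 1], num_cells=2, n=2)
# routed == [[None, None, None, 1], [4, 3, 2, 0]]
```

Cell 1 receives six photons but has only four pixels, so the last two are dropped: after n² draws no stencil value in that cell equals n² − 1 any more. This is the per-cell photon limit that the pre-allocation implies.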
Computational Challenge #2 Adaptive Nearest Neighbor Search
Adaptive Nearest Neighbor Search • Iterative algorithm • Accept or reject photons in cell visit order
kNN-grid Algorithm • Want a 4-photon estimate [Figure: sample point, candidate photons, photons in estimate]
kNN-grid Algorithm • Candidate photons must be within max search radius • Visit voxels in order of distance to sample point
kNN-grid Algorithm • If current number of photons in estimate is less than number requested, grow search radius [Figure label: 1]
kNN-grid Algorithm • If current number of photons in estimate is less than number requested, grow search radius [Figure label: 2]
kNN-grid Algorithm • Don’t add photons outside maximum search radius • Don’t grow search radius when photon is outside maximum radius [Figure label: 2]
kNN-grid Algorithm • Add photons within search radius [Figure label: 3]
kNN-grid Algorithm • Add photons within search radius [Figure label: 4]
kNN-grid Algorithm • Don’t expand search radius if enough photons already found [Figure label: 4]
kNN-grid Algorithm • Add photons within search radius [Figure label: 5]
kNN-grid Algorithm • Visit all other voxels accessible within determined search radius • Add photons within search radius [Figure label: 6]
kNN-grid Algorithm • Finds all photons within a sphere centered about sample point • May locate more than requested k-nearest neighbors [Figure label: 6]
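The rules on the preceding slides can be collected into a short CPU sketch. It is an illustration only: the names `knn_grid`, `cell_of`, and `voxel_distance`, the dictionary-based grid, and the choice to measure voxel distance to the nearest point of the voxel's box are assumptions, not details taken from the talk.

```python
import math
from collections import defaultdict
from itertools import product


def cell_of(p, cell_size):
    return tuple(math.floor(x / cell_size) for x in p)


def voxel_distance(voxel, sample, cell_size):
    """Distance from sample to the closest point of the voxel's box."""
    total = 0.0
    for v, s in zip(voxel, sample):
        lo, hi = v * cell_size, (v + 1) * cell_size
        gap = max(lo - s, 0.0, s - hi)
        total += gap * gap
    return math.sqrt(total)


def knn_grid(sample, photons, k, max_radius, cell_size):
    grid = defaultdict(list)
    for p in photons:
        grid[cell_of(p, cell_size)].append(p)
    centre = cell_of(sample, cell_size)
    reach = math.ceil(max_radius / cell_size)
    offsets = product(range(-reach, reach + 1), repeat=3)
    voxels = [tuple(c + o for c, o in zip(centre, off)) for off in offsets]
    voxels.sort(key=lambda v: voxel_distance(v, sample, cell_size))
    estimate, radius = [], 0.0
    for v in voxels:                       # visit voxels in order of distance
        limit = radius if len(estimate) >= k else max_radius
        if voxel_distance(v, sample, cell_size) > limit:
            break                          # no remaining voxel is reachable
        for p in grid.get(v, ()):
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(p, sample)))
            if d > max_radius:
                continue                   # never add photons beyond the maximum radius
            if d <= radius:
                estimate.append(p)         # inside the current search radius
            elif len(estimate) < k:
                radius = d                 # grow the radius to take in this photon
                estimate.append(p)
    return estimate, radius
```

Only a running radius and a photon count are carried between voxels, so no priority queue is needed. The price is that the returned set is every photon inside the final sphere, which can exceed k.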
System Implementation • NVIDIA GeForce FX 5900 Ultra (NV35) • Cg compiler 1.1 [Figure: system flow. Compute Lighting: Trace Photons, Build Photon Map. Render Image: Ray Trace Scene, Compute Radiance Estimate]
Demos
Glass Ball – Bitonic Sort: 18 s @ 512 × 384, 5K photons
Glass Ball – Stencil Routing: 11 s @ 512 × 384, 5K photons
Ring – Bitonic Sort: 9 s @ 512 × 384, 16K photons
Ring – Stencil Routing: 8 s @ 512 × 384, 16K photons
Cornell Box – Bitonic Sort: 64 s @ 512 × 512, 65K photons
Cornell Box – Stencil Routing: 47 s @ 512 × 512, 65K photons
Cornell Box – Increased Search Radius
Open Issues (1) • How to prevent program execution over a subset of pixels? • Non-uniform pixel computation distribution • Radiance estimate • KILL is only a write mask • Early-z occlusion culling • No pixel level control • Compute mask, branching, or stream buffer? • Improve radiance estimate speed by 30-70% over tiling
Open Issues (2) • Scatter • Makes (a programmer’s) life easier • Is it worth implementing? • Gain factor of log² n avoiding sort
Future Work • Kd-trees • Photon power redistribution • Adaptive sampling • Progressive refinement
Conclusions • The GPU can compute an entire global illumination solution • Nearly interactive • Implemented an adaptive k-nearest neighbor query for the GPU • kNN-grid • Shown how to construct sparse data structures on the GPU • Bitonic merge sort and binary search • Stencil routing • Sorting and searching algorithms applicable to other computations
Acknowledgments • Stanford FlashG • Ian Buck, Mike Houston, Kekoa Proudfoot • Stencil routing • Kurt Akeley, Matt Papakipos • Hardware and drivers • David Kirk, Nick Triantos • Funding • NVIDIA, DARPA, NSF, 3Com