Conservative Ray Batching using Geometry Proxies
Mathijs Molenaar and Elmar Eisemann
Delft University of Technology
Path Tracing 12/14/2021
Simple Path Tracer
§ Generate one camera ray
§ Loop
  • Intersect that camera / bounce ray
  • Shade the ray hit
§ Disadvantages
  • Hard to properly utilize SIMD instructions / GPUs
  • No control over memory access patterns
Random Tree Traversal
§ Cache fits 8 nodes
§ Legend: green = in cache, gray = not in cache, red = evicted
Random Tree Traversal (2nd ray)
§ Cache fits 8 nodes
§ Legend: green = in cache, gray = not in cache, red = evicted
Random Tree Traversal (3rd ray)
§ Cache fits 8 nodes
§ Legend: green = in cache, gray = not in cache, red = evicted
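The cache behaviour animated on these slides can be reproduced with a small simulation. This is a minimal sketch, not the authors' code: it assumes a least-recently-used eviction policy (the slides do not name one), a complete binary tree stored in heap order, and rays that each follow a random root-to-leaf path. The names `count_misses` and `random_path` are hypothetical.

```python
import random
from collections import OrderedDict

def count_misses(paths, capacity=8):
    """Replay node accesses through an LRU cache and count the misses."""
    cache = OrderedDict()
    misses = 0
    for path in paths:
        for node in path:
            if node in cache:
                cache.move_to_end(node)        # mark as most recently used
            else:
                misses += 1
                cache[node] = True
                if len(cache) > capacity:
                    cache.popitem(last=False)  # evict the least recently used node
    return misses

def random_path(depth, rng):
    """Root-to-leaf path in a complete binary tree stored in heap order."""
    node, path = 0, [0]
    for _ in range(depth):
        node = 2 * node + 1 + rng.randrange(2)  # pick the left or right child
        path.append(node)
    return path

rng = random.Random(0)
paths = [random_path(5, rng) for _ in range(1000)]
incoherent = count_misses(paths)          # rays processed in random order
coherent = count_misses(sorted(paths))    # same rays, grouped by the path they take
```

Sorting the paths stands in for any reordering that groups similar rays. In this toy setup the sorted order never revisits an evicted node, so it misses at most once per tree node, while the random order misses repeatedly on the deeper levels.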
Out-of-core rendering
§ Not all data fits into system memory
  • A cache miss means a disk access
§ DDR4 RAM vs NVMe SSD
  • Latency: 10 ns vs 100,000 ns
  • Throughput: 60 GB/s vs 3 GB/s
§ Goal: minimize cache misses
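To see why even rare misses dominate, the slide's latency figures can be plugged into a simple expected-value model. This is a rough sketch under a strong assumption: every access costs either the RAM latency or the SSD latency, and throughput, queueing, and prefetching are ignored. The function name is hypothetical.

```python
RAM_LATENCY_NS = 10        # DDR4 latency figure from the slide
SSD_LATENCY_NS = 100_000   # NVMe SSD latency figure from the slide

def average_access_ns(miss_rate):
    """Expected latency of one access when a fraction miss_rate goes to disk."""
    return (1.0 - miss_rate) * RAM_LATENCY_NS + miss_rate * SSD_LATENCY_NS

latency_ratio = SSD_LATENCY_NS / RAM_LATENCY_NS   # SSD latency is 10,000x higher
throughput_ratio = 60 / 3                         # RAM throughput is 20x higher
one_percent = average_access_ns(0.01)             # roughly 1010 ns per access
```

Under this model a miss rate of only 1% already makes the average access about 100 times slower than a pure in-memory workload.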
Wavefront Path Tracer
§ Generate all camera rays
§ Loop
  • Intersect all rays
  • Shade all ray hits
§ Process large arrays of rays / surface hits
  • Maps well to SIMD instructions / GPUs
§ Provides opportunities to make memory accesses more coherent
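The control flow of the wavefront loop can be sketched as follows. This only illustrates the stage structure: the two stage functions are passed in as parameters, and the toy stand-ins below (where a "ray" is just its remaining bounce budget) are hypothetical and perform no real intersection or shading.

```python
def wavefront_trace(camera_rays, intersect_all, shade_all, max_bounces=8):
    """Run whole-array intersect and shade stages until no rays remain."""
    rays = list(camera_rays)
    wave_sizes = []
    for _ in range(max_bounces):
        if not rays:
            break
        wave_sizes.append(len(rays))
        hits = intersect_all(rays)     # stage 1: intersect every ray in the wave
        rays = shade_all(rays, hits)   # stage 2: shade every hit, emit bounce rays
    return wave_sizes

# Toy stand-ins (hypothetical): a "ray" is just its remaining bounce budget.
def toy_intersect_all(rays):
    return [budget > 0 for budget in rays]

def toy_shade_all(rays, hits):
    return [budget - 1 for budget, hit in zip(rays, hits) if hit]

sizes = wavefront_trace([0, 1, 2, 3], toy_intersect_all, toy_shade_all)
```

Because each stage sees the whole array of rays at once, it is free to sort, group, or batch them before touching scene data, which is the opportunity the last bullet refers to.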
Ray Batching
§ Pharr, Matt, et al. "Rendering complex scenes with memory-coherent ray tracing." Proceedings of the 24th annual conference on Computer graphics and interactive techniques. 1997.
§ Two-level hierarchy
  • Top-level: always in memory
  • Bottom-level: loaded on demand
§ Batch of rays at each top-level leaf
  • Store all rays passing through that region of space
Traversal
§ For each ray
  • Insert ray in batch at first top-level leaf node along ray
§ While there are rays in the system
  • Select a top-level leaf node
  • Load geometry from disk and build acceleration structure
  • Intersect all rays in batch
  • Perform shading & forward non-intersecting rays*
* Depends on acceleration structure; BVH traversal does not guarantee perfect front-to-back traversal ordering.
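The traversal loop above can be sketched on a toy scene. Everything here is an assumption made for illustration: top-level leaves are columns visited in order along the ray, a ray is a `(first_leaf, lane)` pair, each leaf's geometry is the set of lanes it blocks, and the leaf with the largest batch is selected first (the slide does not specify a selection policy). Reading `geometry_on_disk[leaf]` stands in for the disk load and acceleration structure build.

```python
from collections import defaultdict

def render_with_batching(rays, geometry_on_disk):
    """rays: list of (first_leaf, lane); geometry_on_disk: list of sets of blocked lanes."""
    num_leaves = len(geometry_on_disk)
    batches = defaultdict(list)
    for ray_id, (leaf, lane) in enumerate(rays):
        batches[leaf].append((ray_id, lane))   # first top-level leaf along the ray

    hits, disk_loads = {}, 0
    while batches:
        # Select a leaf: largest batch first (an assumed policy).
        leaf = max(batches, key=lambda l: len(batches[l]))
        batch = batches.pop(leaf)
        geometry = geometry_on_disk[leaf]      # stands in for disk load + build
        disk_loads += 1
        for ray_id, lane in batch:
            if lane in geometry:
                hits[ray_id] = leaf            # shading would happen here
            elif leaf + 1 < num_leaves:
                batches[leaf + 1].append((ray_id, lane))  # forward the missed ray
    return hits, disk_loads

geometry = [{0}, {5}, {1}]
rays = [(0, 0), (0, 1)]
hits, disk_loads = render_with_batching(rays, geometry)
```

In this example the second ray is batched at all three leaves, so leaf 1 is loaded from disk even though the ray cannot hit anything in it.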
Ray Batching Analysed
§ Improved coherency
  • Group rays traveling through the same region of space
§ Ray is batched when...
  • Top-level leaf node is reached
  • In our case: if a ray hits the node's axis-aligned bounding box (AABB)
§ Storing a ray in a batch unnecessarily slows down traversal
  • Increased number of disk loads & bottom-level traversals
Inefficiency Example
§ The red ray intersects volume of:
  • Node 1, then…
  • Node 3
§ It intersects geometry in node 3
§ System memory can fit only one node at a time
[Figure: four top-level nodes labeled 1 to 4 and the red ray]
Our Contribution
§ Store proxy geometry at top-level leaf nodes
  • Fully encloses actual geometry
  • More detailed than the volume representing leaf nodes
§ Only batch ray if it intersects proxy geometry, otherwise…
§ Continue traversal of top-level acceleration structure
Proxy Geometry - Implementation
§ Voxelize geometry at each top-level leaf node
  • Conservative voxelization (volume completely encloses geometry)
§ Construct Sparse Voxel Octree (SVO) from voxelization
  • Schwarz, Michael, and Hans-Peter Seidel. "Fast parallel surface and solid voxelization on GPUs." ACM Transactions on Graphics (TOG) 29.6 (2010): 110.
§ Compress to Sparse Voxel Directed Acyclic Graph (SVDAG)
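What "conservative" means can be shown with a deliberately simple voxelizer. This sketch is not the method from the cited paper: it swaps in a coarser technique that marks every voxel overlapped by a primitive's bounding box, which over-approximates but still fully encloses the primitive. The function name and parameters are hypothetical, and a cubic grid is assumed.

```python
import math

def voxelize_conservative(boxes, grid_min, grid_size, resolution):
    """Mark every voxel that overlaps any primitive bounding box.

    boxes: iterable of ((x0, y0, z0), (x1, y1, z1)) primitive bounds.
    Returns a set of (i, j, k) voxel coordinates in a resolution^3 grid.
    """
    voxel_size = grid_size / resolution
    occupied = set()
    for lo, hi in boxes:
        # Voxel index range covered by the box on each axis, clamped to the grid.
        first = [max(0, math.floor((lo[a] - grid_min[a]) / voxel_size))
                 for a in range(3)]
        last = [min(resolution - 1, math.floor((hi[a] - grid_min[a]) / voxel_size))
                for a in range(3)]
        for i in range(first[0], last[0] + 1):
            for j in range(first[1], last[1] + 1):
                for k in range(first[2], last[2] + 1):
                    occupied.add((i, j, k))
    return occupied

origin = (0.0, 0.0, 0.0)
inside_one = voxelize_conservative([((0.2, 0.2, 0.2), (0.8, 0.8, 0.8))], origin, 4.0, 4)
straddling = voxelize_conservative([((0.5, 0.5, 0.5), (1.5, 0.5, 0.5))], origin, 4.0, 4)
```

A primitive that straddles a voxel boundary marks both voxels, so a ray that misses every marked voxel is guaranteed to miss the primitive as well.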
Sparse Voxel Directed Acyclic Graph (SVDAG)
§ Kämpe, Viktor, Erik Sintorn, and Ulf Assarsson. "High resolution sparse voxel DAGs." ACM Transactions on Graphics (TOG) 32.4 (2013): 1-13.
§ SVOs may contain duplicate information
  • Equivalent leaf nodes (4^3 region of voxels)
  • Equivalent subtrees
§ Replace duplicates by a single instance
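Replacing duplicates by a single instance can be sketched with hash-consing: each node is keyed by the tuple of its children, and identical keys share one id. This is a simplified illustration under stated assumptions, not the construction from the cited paper: the grid resolution is a power of two, leaves are single voxels rather than the small voxel blocks mentioned on the slide, and the builder visits every voxel of the grid. The function name is hypothetical.

```python
def build_svdag(occupied, resolution):
    """Build an octree over a resolution^3 grid, sharing identical subtrees.

    Returns (root_id, nodes, svo_node_count). nodes[id] is a tuple of 8 children,
    each None (empty), "leaf" (occupied voxel), or the id of a shared child node.
    svo_node_count is the number of inner nodes an unshared SVO would need.
    """
    unique = {}     # node contents -> id
    nodes = []      # id -> node contents
    svo_count = 0

    def intern(key):
        if key not in unique:
            unique[key] = len(nodes)
            nodes.append(key)
        return unique[key]

    def build(x, y, z, size):
        nonlocal svo_count
        if size == 1:
            return "leaf" if (x, y, z) in occupied else None
        half = size // 2
        children = tuple(
            build(x + dx * half, y + dy * half, z + dz * half, half)
            for dx in (0, 1) for dy in (0, 1) for dz in (0, 1)
        )
        if all(child is None for child in children):
            return None                 # empty subtrees are not stored
        svo_count += 1
        return intern(children)         # identical subtrees map to the same id

    root = build(0, 0, 0, resolution)
    return root, nodes, svo_count

full = {(x, y, z) for x in range(8) for y in range(8) for z in range(8)}
root, nodes, svo_count = build_svdag(full, 8)
```

For a completely filled 8^3 grid the unshared octree has 73 inner nodes, while the DAG needs only 3, one per level.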
Sparse Voxel Directed Acyclic Graph (SVDAG)
§ Tree traversal
  • Most SVO algorithms will just work
§ We use a modified version of:
  • Laine, Samuli, and Tero Karras. "Efficient sparse voxel octrees." IEEE Transactions on Visualization and Computer Graphics 17.8 (2010): 1048-1059.
Ray Batching with Geometry Proxies
§ Two-level hierarchy
  • Top-level: always in memory
  • Bottom-level: loaded on demand
§ Traverse top-level tree
  • Test ray against SVDAG at leaf
  • Batch when it hits SVDAG
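The effect of the proxy test can be sketched on a toy scene. All of the following is assumed for illustration: top-level leaves are columns visited in order along the ray, a ray is a `(first_leaf, lane)` pair, each leaf's geometry is the set of lanes it blocks, and the proxy is a coarse occupancy set (lanes grouped into cells of `cell` lanes) that stands in for the SVDAG. Leaves are selected largest batch first, which is an assumed policy.

```python
from collections import defaultdict

def build_proxy(blocked_lanes, cell):
    """Conservative proxy: coarse cells that contain at least one blocked lane."""
    return {lane // cell for lane in blocked_lanes}

def first_proxy_hit(leaf, lane, proxies, cell):
    """Continue top-level traversal until the ray hits some leaf's proxy."""
    while leaf < len(proxies):
        if lane // cell in proxies[leaf]:
            return leaf
        leaf += 1
    return None   # the ray leaves the scene without being batched

def render_with_proxies(rays, geometry_on_disk, cell=2):
    proxies = [build_proxy(g, cell) for g in geometry_on_disk]  # kept in memory
    batches = defaultdict(list)
    batch_ops = disk_loads = 0
    hits = {}

    def enqueue(ray_id, leaf, lane):
        nonlocal batch_ops
        target = first_proxy_hit(leaf, lane, proxies, cell)
        if target is not None:
            batches[target].append((ray_id, lane))
            batch_ops += 1

    for ray_id, (leaf, lane) in enumerate(rays):
        enqueue(ray_id, leaf, lane)
    while batches:
        leaf = max(batches, key=lambda l: len(batches[l]))
        batch = batches.pop(leaf)
        geometry = geometry_on_disk[leaf]   # stands in for disk load + build
        disk_loads += 1
        for ray_id, lane in batch:
            if lane in geometry:
                hits[ray_id] = leaf
            else:
                enqueue(ray_id, leaf + 1, lane)  # the proxy hit was a false positive
    return hits, batch_ops, disk_loads

geometry = [{0}, {5}, {1}]
rays = [(0, 0), (0, 1)]
coarse = render_with_proxies(rays, geometry, cell=2)
exact = render_with_proxies(rays, geometry, cell=1)
```

With the coarse proxy the second ray is still batched once unnecessarily at leaf 0 but skips leaf 1 entirely, so only two leaves are loaded. With a proxy as fine as the geometry it is batched only where it actually hits.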
Results
Implementation
§ Wavefront path tracer (CPU) built from scratch
§ Acceleration structures
  • Top-level
    • Gasparian, T. G. Fast Divergent Ray Traversal by Batching Rays in a BVH. MS thesis. 2016.
  • Bottom-level
    • Intel Embree ray tracing kernels
  • Proxy geometry
    • Sparse Voxel Directed Acyclic Graph (SVDAG)
Source: https://github.com/mathijs727/pandora
Test Scenes
[Figure: renders of the Crown, Landscape, and Island scenes]
Construction Overhead
§ Crown scene (3x subdiv, 95M triangles, 151 top-level leaves)
§ Construction*
  • SVO: ~10 s wall clock
  • SVDAG: ~610 ms wall clock
* Run on an AMD Ryzen 9 3900X (12 cores, 24 threads); voxel grid resolution of 256^3.
Memory Overhead
§ Resolution along one axis
  • E.g. 32 => 32^3 = 32,768 voxels per node
§ Memory usage depends on
  • Resolution
  • Number of top-level leaf nodes
§ Number of top-level leaf nodes
  • Crown: 132
  • Island: 1984
  • Landscape: 335
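The two factors above can be combined into a rough upper bound. This sketch assumes a dense grid with one bit per voxel and no SVO or SVDAG compression, so it is not the memory usage measured by the authors. The leaf counts are the slide's numbers read in order (Crown, Island, Landscape), and the function names are hypothetical.

```python
def voxels_per_node(resolution):
    """Number of voxels in a cubic grid with the given resolution per axis."""
    return resolution ** 3

def dense_grid_bytes(resolution, num_leaves):
    """Upper bound: one bit per voxel, no SVO/SVDAG compression."""
    return num_leaves * voxels_per_node(resolution) // 8

# Leaf counts as listed on the slide (assumed to be in scene order).
leaf_counts = {"Crown": 132, "Island": 1984, "Landscape": 335}
dense_mib = {
    scene: dense_grid_bytes(256, leaves) / (1024 * 1024)
    for scene, leaves in leaf_counts.items()
}
```

At a resolution of 256 a dense bit grid takes 2 MiB per leaf, so the bound grows linearly with the number of top-level leaf nodes. The sparse representations exist to stay well below it.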
Effectiveness
§ Resolution along one axis
  • E.g. 32 => 32^3 = 32,768 voxels
§ Percentage of rays that hit the AABB but not the SVDAG
  • These rays do not need to be batched
Effectiveness (# batches per path)
[Figure: average number of batch operations per path, normal batching vs. proxy geometry. Voxel grid resolution of 256^3.]
Performance Overhead
§ Voxel overhead during rendering (preloaded = all in memory)
  • Normal batching: ~220 s wall clock
  • Proxy geometry (no culling): ~240 s wall clock
  • Proxy geometry: ~210 s wall clock
§ Negative overhead: with culling, proxy geometry is ~10 s faster than normal batching
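Disk Bandwidth - Landscape
[Figure: disk bandwidth for the Landscape scene. Voxel grid resolution of 256^3.]
Disk Bandwidth - Island
[Figure: disk bandwidth for the Island scene. Voxel grid resolution of 256^3.]
Disk Bandwidth - Crown
[Figure: disk bandwidth for the Crown scene. Voxel grid resolution of 256^3.]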
Summary
§ Improve existing ray batching algorithm
§ Add geometry proxies to prevent unnecessary batching
§ Small memory & performance overhead
§ Large savings in disk bandwidth (= render time)
Future Work
§ Could be applied to other out-of-core traversal schemes as well
  • Wald, Ingo, Philipp Slusallek, and Carsten Benthin. "Interactive distributed ray tracing of highly complex models." Rendering Techniques 2001. Springer, Vienna, 2001. 277-288.
§ Proxy geometry intersections could be used to sort rays
  • Moon, Bochang, et al. "Cache-oblivious ray reordering." ACM Transactions on Graphics (TOG) 29.3 (2010): 1-10.
Questions?
Contact
m.l.molenaar@tudelft.nl
e.eisemann@tudelft.nl