A Parallel Algorithm for Construction of Uniform Grids














A Parallel Algorithm for Construction of Uniform Grids
Javor Kalojanov, Philipp Slusallek
Saarland University

Motivation
§ Grids for GPU ray tracing
  – Trade off quality for fast construction
§ Parallel construction
  – Computational power of GPUs
  – Memory bandwidth of GPUs
§ Existing algorithms not “massively parallel”
  – Atomic synchronization
  – Work distribution not scene independent

Data Structure
§ Array of primitive references
§ Cells are ranges in the array
  [0, 2) [2, 3) [3, 4) [4, 5) [0, 0) [0, 0)
[Figure: primitive reference array and the cell ranges into it]
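A minimal Python sketch of this layout, using the cell ranges shown on the slide. The contents of the reference array and the helper name `primitives_in_cell` are assumptions made for illustration.

```python
# Reference array: primitive IDs grouped by cell (values assumed for illustration).
references = [1, 2, 1, 1, 2]

# One half-open range [begin, end) into `references` per grid cell.
cell_ranges = [(0, 2), (2, 3), (3, 4), (4, 5), (0, 0), (0, 0)]


def primitives_in_cell(cell):
    """Return the primitive IDs referenced by one cell (hypothetical helper)."""
    begin, end = cell_ranges[cell]
    return references[begin:end]


for cell in range(len(cell_ranges)):
    print(cell, primitives_in_cell(cell))
```

An empty cell is stored as the empty range [0, 0), so a lookup needs no special case.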

Algorithm Overview
§ Reduce to sorting
  1. Write pairs of references and cell indices
  2. Sort
  3. Extract cell ranges
[Figure: unsorted pairs, sorted pairs, and the resulting cell ranges [0, 2) [2, 3) [3, 4) [4, 5) [0, 0) [0, 0)]
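The three steps can be sketched sequentially in Python. This is an illustration under assumptions, not the GPU implementation: the `overlaps` input is made up, Python's built-in sort stands in for the parallel sort, and the ranges are found here by binary search rather than by the method used in the talk.

```python
from bisect import bisect_left, bisect_right

# Assumed input: for each primitive ID, the indices of the cells it overlaps.
overlaps = {1: [0, 1, 2], 2: [0, 3]}
num_cells = 6

# 1. Write (cell index, primitive reference) pairs.
pairs = [(cell, prim) for prim, cells in overlaps.items() for cell in cells]

# 2. Sort by cell index (Python's stable sort stands in for the GPU sort).
pairs.sort(key=lambda pair: pair[0])

# 3. Extract the range of each cell; empty cells are stored as [0, 0).
keys = [cell for cell, _ in pairs]
cell_ranges = []
for cell in range(num_cells):
    begin, end = bisect_left(keys, cell), bisect_right(keys, cell)
    cell_ranges.append((begin, end) if begin < end else (0, 0))

# The sorted primitive references form the final reference array.
references = [prim for _, prim in pairs]
print(references)
print(cell_ranges)
```

After sorting, all references of one cell are contiguous, which is why a cell can be stored as a single range.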

Implementation
§ Main idea already implemented
  – CUDA SDK, particle demo
§ Here
  – Each primitive overlaps any number of cells
§ Unknown number of references
§ Write conflicts
§ Solution
  – Count references
  – Segment output array

Count Primitive References
[Figure: thread blocks 1 to 4 count primitive references in shared memory; a per-block reduce writes one count per block to global memory (13, 15, 12, 16); an exclusive scan over these counts gives each block's offset into the output array]
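A sequential Python sketch of the count, reduce, and exclusive-scan steps. The per-thread counts are made up for illustration, and the names `counts_per_block`, `block_totals`, and `block_offsets` are assumptions.

```python
from itertools import accumulate

# Assumed per-thread reference counts, one inner list per thread block
# (values are made up for illustration).
counts_per_block = [[2, 1, 3], [0, 4, 1], [2, 2, 2]]

# Each block reduces its threads' counts to a single number.
block_totals = [sum(block) for block in counts_per_block]

# Exclusive scan: entry i is the sum of all totals before block i,
# i.e. the first output slot owned by block i. The extra last entry
# is the total number of references.
block_offsets = [0] + list(accumulate(block_totals))

print(block_totals)   # [6, 5, 6]
print(block_offsets)  # [0, 6, 11, 17]
```

The last scan entry tells the host how large the output array of pairs must be before any pair is written.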

Write Unsorted Pairs
[Figure: thread block 4 reads its start offset from the scan result, writes the (cell index, primitive) pairs of its next primitive into the output array at the next free slot, and advances the slot past them (38 to 42 in the example)]
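A sequential Python sketch of the segmented write. The input data and all names are assumptions made for illustration; on the GPU the outer loop would run as independent thread blocks.

```python
# Assumed input: the cells overlapped by each primitive, grouped by the
# thread block that processes it (all values are made up for illustration).
blocks = [
    [(10, [0, 1]), (11, [1])],   # block 0: (primitive ID, overlapped cells)
    [(12, [2, 3, 4])],           # block 1
    [(13, [0]), (14, [4, 5])],   # block 2
]

# Offsets from an exclusive scan over the per-block reference counts.
block_counts = [sum(len(cells) for _, cells in block) for block in blocks]
block_offsets = [sum(block_counts[:i]) for i in range(len(blocks))]

output = [None] * sum(block_counts)

# Blocks are independent: each one writes only inside its own segment.
for block, start in zip(blocks, block_offsets):
    next_free = start
    for prim, cells in block:
        for cell in cells:
            output[next_free] = (cell, prim)
            next_free += 1

print(output)
```

Because every block owns a disjoint segment of the output array, no two blocks ever write to the same slot.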

Radix Sort
§ Fastest GPU sort algorithm
§ Parallel implementation
§ Linear work complexity
§ Fits the data (integer cell indices)
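To show why radix sort fits integer cell indices, here is a sequential least-significant-digit radix sort in Python. It is a sketch only, not the GPU sort used in the talk; the function name and the parameters `key_bits` and `radix_bits` are assumptions, and keys are assumed to be non-negative and smaller than 2^`key_bits`.

```python
def radix_sort_pairs(pairs, key_bits=16, radix_bits=4):
    """Sort (cell index, primitive) pairs by cell index, lowest digit first."""
    num_buckets = 1 << radix_bits
    mask = num_buckets - 1
    for shift in range(0, key_bits, radix_bits):
        # Histogram of the current digit.
        counts = [0] * num_buckets
        for cell, _ in pairs:
            counts[(cell >> shift) & mask] += 1
        # An exclusive scan turns the histogram into bucket start offsets.
        offsets, running = [], 0
        for count in counts:
            offsets.append(running)
            running += count
        # Stable scatter: pairs with equal digits keep their relative order.
        scattered = [None] * len(pairs)
        for pair in pairs:
            digit = (pair[0] >> shift) & mask
            scattered[offsets[digit]] = pair
            offsets[digit] += 1
        pairs = scattered
    return pairs


unsorted = [(0, 1), (1, 1), (2, 1), (0, 2), (3, 2)]
print(radix_sort_pairs(unsorted))
```

The number of passes depends only on the key width, so the work grows linearly with the number of pairs.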

Cell Ranges
§ Extract from sorted data
§ Load chunk-wise into shared memory
§ Find neighboring pairs with different cell indices
§ Update the corresponding cells
[Figure: a thread block scans the sorted (cell index, primitive) pairs (0, 1) (0, 2) (1, 1) (2, 1) (3, 2) and writes the cell ranges [0, 2) [2, 3) [3, 4) [4, 5) [0, 0) [0, 0)]
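The neighbor comparison can be sketched in Python as follows. The function name is an assumption, and the loop is sequential; on the GPU each iteration would correspond to one thread inspecting one element of a chunk held in shared memory.

```python
def extract_cell_ranges(sorted_pairs, num_cells):
    """Build [begin, end) ranges from pairs sorted by cell index."""
    begins = [0] * num_cells
    ends = [0] * num_cells
    n = len(sorted_pairs)
    for i in range(n):
        cell = sorted_pairs[i][0]
        # A range begins where the previous pair belongs to another cell.
        if i == 0 or sorted_pairs[i - 1][0] != cell:
            begins[cell] = i
        # A range ends where the next pair belongs to another cell.
        if i == n - 1 or sorted_pairs[i + 1][0] != cell:
            ends[cell] = i + 1
    return list(zip(begins, ends))


sorted_pairs = [(0, 1), (0, 2), (1, 1), (2, 1), (3, 2)]
print(extract_cell_ranges(sorted_pairs, num_cells=6))
```

Each cell's begin and end are written by at most one position in the sorted array, so the updates do not conflict. Cells that never appear keep the initial empty range [0, 0).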

Analysis
[Figure: share of build time (0% to 100%) per stage (Count Cell-Prim Pairs, Write Cell-Prim Pairs, Radix Sort, Extract Grid Cells) for Soda Hall, Conference, and Exploding Dragon]
§ Runtime dominated by sorting
§ Linear work complexity
§ No write conflicts

Results

Model (Triangles)      LBVH, GTX 280        Grid, GTX 280        Hybrid BVH, GTX 280
Fairy (174 K)          10.3 ms   1.8 fps    24 ms    3.5 fps     124 ms   11.6 fps
Bunny/Dragon (252 K)   17 ms     7.3 fps    13 ms    7.7 fps     66 ms    7.6 fps
Conference (284 K)     19 ms     6.7 fps    27 ms    7.0 fps     105 ms   22.9 fps
Soda Hall (2.2 M)      66 ms     3.0 fps    130 ms   6.3 fps     445 ms   20.7 fps

Times for primary rays and simple shading (no shadows). Frame rate does not include build time. 1024² window.

Future Work
§ Apply to other acceleration structures
§ Two level grids
  – Top level: uniform grid
  – Each cell is a grid
§ Independent resolution
  – Single sorting pass for level 2

Model (Triangles)     Hybrid BVH, GTX 280    Grid, GTX 280       2 level Grid, GTX 285
Fairy (174 K)         124 ms   11.6 fps      24 ms   3.5 fps     28 ms   9.9 fps
Conference (284 K)    105 ms   22.9 fps      27 ms   7.0 fps     89 ms   11.8 fps

Thank You!