A Parallel Algorithm for Construction of Uniform Grids

Javor Kalojanov, Philipp Slusallek
Saarland University

Motivation
§ Grids for GPU ray tracing
  – Trade off quality for fast construction
§ Parallel construction
  – Computational power of GPUs
  – Memory bandwidth of GPUs
§ Existing algorithms not “massively parallel”
  – Atomic synchronization
  – Work distribution not scene independent

Data Structure
§ Array of primitive references
§ Cells are ranges in the array
[Figure: example grid with two primitives, the array of primitive references, and the resulting cell ranges [0, 2) [2, 3) [3, 4) [4, 5) [0, 0) [0, 0)]
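A minimal sketch of this layout, assuming half-open ranges `[begin, end)` and an empty cell stored as `[0, 0)`, as in the figure. The names `references`, `cell_ranges` and `primitives_in_cell` are hypothetical, and the contents of `references` are one reading of the example (primitives 1 and 2), not values stated explicitly on the slide.

```python
# Hypothetical names; values follow one reading of the slide's example.
# All references live in a single flat array, grouped by cell.
references = [1, 2, 1, 1, 2]

# One half-open range [begin, end) per cell; empty cells are [0, 0).
cell_ranges = [(0, 2), (2, 3), (3, 4), (4, 5), (0, 0), (0, 0)]


def primitives_in_cell(cell):
    """Return the primitive ids referenced by the given cell."""
    begin, end = cell_ranges[cell]
    return references[begin:end]
```

A ray traversal would call something like `primitives_in_cell` for each visited cell; the grid itself stores only two integers per cell.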

Algorithm Overview
§ Reduce to sorting
  1. Write pairs of references and cell indices
  2. Sort
  3. Extract cell ranges
[Figure: the example pairs before and after sorting, and the extracted cell ranges [0, 2) [2, 3) [3, 4) [4, 5) [0, 0) [0, 0)]
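The three steps can be sketched sequentially as follows. This is an illustration under stated assumptions, not the authors' GPU code: the function name `build_grid` and the input format (a mapping from primitive id to the cells it overlaps) are hypothetical, and Python's built-in sort stands in for the GPU sort.

```python
from itertools import groupby


def build_grid(prim_cells, num_cells):
    """Sequential sketch: write pairs, sort, extract cell ranges."""
    # Step 1: one (cell, primitive) pair per overlapped cell.
    pairs = [(cell, prim)
             for prim, cells in prim_cells.items()
             for cell in cells]

    # Step 2: sort by cell index (built-in sort used as a stand-in).
    pairs.sort()

    # Step 3: each run of equal cell indices becomes a range [begin, end).
    ranges = [(0, 0)] * num_cells
    pos = 0
    for cell, group in groupby(pairs, key=lambda p: p[0]):
        count = len(list(group))
        ranges[cell] = (pos, pos + count)
        pos += count

    references = [prim for _, prim in pairs]
    return references, ranges
```

With primitive 1 overlapping cells 0, 1, 2 and primitive 2 overlapping cells 0 and 3 (an assumed input chosen to match the figure), `build_grid({1: [0, 1, 2], 2: [0, 3]}, 6)` yields the cell ranges shown on the slide.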

Implementation
§ Main idea already implemented
  – CUDA SDK, particle demo
§ Here
  – Each primitive overlaps any number of cells
§ Unknown number of references
§ Write conflicts
§ Solution
  – Count references
  – Segment output array

Count Primitive References
[Figure: four thread blocks hold per-thread reference counts in shared memory (2 1 3 2 2 3; 3 2 4 1 2 3; 1 3 1 4 2 1; 4 1 2 4 3 2). Each block reduces its counts to one value in global memory (13 15 12 16). An exclusive scan over the block values follows (0 13 26 38 54).]
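A sequential sketch of the counting stage, assuming each thread counts the cells overlapped by one primitive, each block sums its threads' counts, and an exclusive scan turns the block sums into segment offsets. The helper `count_cells` assumes overlap is tested with an inclusive box of cell coordinates, which the slides do not state. Using the per-thread counts transcribed from the figure, the block sums are 13, 15, 12, 16 and scan to 0, 13, 28, 40 with a total of 56; the scan row in the figure reads 0 13 26 38 54.

```python
def count_cells(lo, hi):
    """Cells covered by an inclusive box of cell coordinates (assumed test)."""
    n = 1
    for a, b in zip(lo, hi):
        n *= b - a + 1
    return n


def exclusive_scan(values):
    """Return (offsets, total): offsets[i] is the sum of values[:i]."""
    offsets, total = [], 0
    for v in values:
        offsets.append(total)
        total += v
    return offsets, total


# Per-thread counts as transcribed from the figure: four blocks of six.
block_counts = [
    [2, 1, 3, 2, 2, 3],
    [3, 2, 4, 1, 2, 3],
    [1, 3, 1, 4, 2, 1],
    [4, 1, 2, 4, 3, 2],
]

# Per-block reduction (done in shared memory on the GPU).
block_totals = [sum(counts) for counts in block_counts]

# Start offset of each block's segment, and the output array size.
offsets, total = exclusive_scan(block_totals)
```

The total gives the size of the pair array to allocate before any pair is written, and each offset marks where one block's segment begins.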

Write Unsorted Pairs
[Figure: thread block 4 writes cell-primitive pairs into its segment of the output array. Labels: Scan result: 0 13 26 38 54; Next primitive: 19; Next free slot: 38, 42; Output array: 3 19 14 19 15 19; slot labels 38 and 42.]
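A sketch of the write stage, assuming each block starts at its own scan offset and advances a private "next free slot" as it emits pairs, so no two blocks ever write to the same slot. The function name `write_pairs` and the input layout are hypothetical; on the GPU the loop over blocks would run concurrently.

```python
def write_pairs(blocks, offsets, total):
    """Write (cell, primitive) pairs into per-block output segments.

    blocks[b] lists the (primitive, overlapped_cells) items of block b;
    offsets[b] is the exclusive-scan result for block b.
    """
    out = [None] * total
    for b, prims in enumerate(blocks):
        slot = offsets[b]          # next free slot in this block's segment
        for prim, cells in prims:
            for cell in cells:
                out[slot] = (cell, prim)
                slot += 1
    return out


# Assumed toy input: block 0 handles primitive 0, block 1 handles 1 and 2.
blocks = [
    [(0, [0, 1])],
    [(1, [1]), (2, [2, 3])],
]
offsets = [0, 2]   # exclusive scan of the per-block counts [2, 3]
pairs = write_pairs(blocks, offsets, total=5)
```

Because the segments are disjoint, the blocks need no atomic operations on the output array.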

Radix Sort
§ Fastest GPU sort algorithm
§ Parallel implementation
§ Linear work complexity
§ Fits the data (integer cell indices)
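To illustrate why radix sort fits integer cell indices and does linear work, here is a sequential least-significant-digit radix sort over (cell, primitive) pairs. It is a sketch, not the GPU implementation used in the talk; the function name and the `key_bits` and `radix_bits` parameters are assumptions, and cell indices must be non-negative and below `2 ** key_bits`.

```python
def radix_sort_pairs(pairs, key_bits=16, radix_bits=4):
    """Stable LSD radix sort of (cell, primitive) pairs by cell index."""
    num_buckets = 1 << radix_bits
    mask = num_buckets - 1
    for shift in range(0, key_bits, radix_bits):
        # Histogram of the current digit.
        counts = [0] * num_buckets
        for cell, _ in pairs:
            counts[(cell >> shift) & mask] += 1

        # Exclusive scan of the histogram gives each bucket's start.
        starts, total = [], 0
        for c in counts:
            starts.append(total)
            total += c

        # Stable scatter into the output for this pass.
        out = [None] * len(pairs)
        for pair in pairs:
            digit = (pair[0] >> shift) & mask
            out[starts[digit]] = pair
            starts[digit] += 1
        pairs = out
    return pairs
```

The number of passes is `key_bits / radix_bits`, a constant for a fixed grid resolution, and each pass touches every pair a fixed number of times.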

Cell Ranges
§ Extract from sorted data
§ Load chunk-wise into shared memory
§ Find neighboring pairs with different cell indices
§ Update the corresponding cells
[Figure: a thread block reads the sorted pairs 0 1 0 2 1 1 2 1 3 2 and writes the cell ranges [0, 2) [2, 3) [3, 4) [4, 5) [0, 0) [0, 0)]
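The boundary test can be sketched as below, assuming the figure's numbers are (cell, primitive) pairs. Each position compares its cell index with its neighbors and writes a range begin or end only where they differ, so every iteration is independent and could be handled by its own thread. The name `extract_ranges` is hypothetical, and the chunk-wise staging through shared memory is omitted.

```python
def extract_ranges(sorted_pairs, num_cells):
    """Derive per-cell ranges [begin, end) from pairs sorted by cell index."""
    begins = [0] * num_cells
    ends = [0] * num_cells
    n = len(sorted_pairs)
    for i in range(n):  # independent per position; one thread each on a GPU
        cell = sorted_pairs[i][0]
        # A range begins where the previous pair belongs to another cell.
        if i == 0 or sorted_pairs[i - 1][0] != cell:
            begins[cell] = i
        # A range ends where the next pair belongs to another cell.
        if i == n - 1 or sorted_pairs[i + 1][0] != cell:
            ends[cell] = i + 1
    return list(zip(begins, ends))


# One reading of the figure: (cell, primitive) pairs after sorting.
example = [(0, 1), (0, 2), (1, 1), (2, 1), (3, 2)]
ranges = extract_ranges(example, num_cells=6)
```

Cells that never appear in the sorted array keep their initial empty range `(0, 0)`, which matches the last two ranges in the figure.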

Analysis
[Figure: stacked bar chart of build time per stage (Count Cell-Prim Pairs, Write Cell-Prim Pairs, Radix Sort, Extract Grid Cells), on a 0% to 100% axis, for Soda Hall, Conference, and Exploding Dragon]
§ Runtime dominated by sorting
§ Linear work complexity
§ No write conflicts

Results

Model (Triangles)      LBVH, GTX 280        Grid, GTX 280        Hybrid BVH, GTX 280
Fairy (174 K)          10.3 ms   1.8 fps    24 ms    3.5 fps     124 ms   11.6 fps
Bunny/Dragon (252 K)   17 ms     7.3 fps    13 ms    7.7 fps     66 ms    7.6 fps
Conference (284 K)     19 ms     6.7 fps    27 ms    7.0 fps     105 ms   22.9 fps
Soda Hall (2.2 M)      66 ms     3.0 fps    130 ms   6.3 fps     445 ms   20.7 fps

Times for primary rays and simple shading (no shadows). Frame rate does not include build time. 1024² window.

Future Work
§ Apply to other acceleration structures
§ Two level grids
  – Top level: uniform grid
  – Each cell is a grid
§ Independent resolution
  – Single sorting pass for level 2

Model (Triangles)     Hybrid BVH, GTX 280    Grid, GTX 280       2 level Grid, GTX 285
Fairy (174 K)         124 ms   11.6 fps      24 ms   3.5 fps     28 ms   9.9 fps
Conference (284 K)    105 ms   22.9 fps      27 ms   7.0 fps     89 ms   11.8 fps

Thank You!