Fast BVH Construction on GPUs (Eurographics 2009) Park, Soonchan KAIST (Korea Advanced Institute of Science and Technology)
Contents ● What is BVH ● Motivation ● Three Algorithms to Construct BVH ● LBVH ● SAH Hierarchy Construction ● Hybrid GPU Construction Algorithm ● Results & Analysis
What is BVH? ● Bounding Volume Hierarchy ● A tree structure over a set of geometric objects ● Enables fast computation for ● Ray tracing ● Collision detection ● Visibility culling
What is BVH? ● Issues of BVH construction ● Construction time ● Effectiveness of the construction ● How much improvement the BVH brings ● Depends on the split strategy: median subdivision vs. Surface Area Heuristic
Motivation ● BVH construction ● Almost all prior work covers purely serial construction algorithms ● Goal: efficient parallel algorithms on manycore processors ● Question: how to restructure the steps of BVH construction so they suit parallel computation
LBVH ● Linear Bounding Volume Hierarchy ● Simplest approach to parallelizing BVH construction ● Sort the input primitives by their Morton codes ● BVH construction reduces to sorting (O(n log n))
Morton Codes (Z-order) ● A space-filling curve ● Preserves spatial locality well ● Expresses a position in space as a string of bits
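The bit interleaving behind a Morton code can be sketched as follows. This is a minimal serial Python illustration, not the GPU implementation from the talk. The function names, the 10 bits per axis, and the assumption that coordinates are already normalized to [0, 1) are all choices made for this example.

```python
def expand_bits(v, bits=10):
    """Spread the low `bits` bits of v so that bit i moves to bit 3*i."""
    out = 0
    for i in range(bits):
        out |= ((v >> i) & 1) << (3 * i)
    return out


def morton3d_from_ints(xi, yi, zi, bits=10):
    """Interleave three quantized coordinates as x2 y2 z2 x1 y1 z1 x0 y0 z0."""
    return ((expand_bits(xi, bits) << 2)
            | (expand_bits(yi, bits) << 1)
            | expand_bits(zi, bits))


def morton3d(x, y, z, bits=10):
    """Morton code of a point whose coordinates are assumed to lie in [0, 1)."""
    scale = 1 << bits

    def quantize(c):
        # Clamp so that inputs slightly outside [0, 1) still give a valid cell.
        return min(max(int(c * scale), 0), scale - 1)

    return morton3d_from_ints(quantize(x), quantize(y), quantize(z), bits)
```

Sorting primitives by the code of, for example, their bounding-box centroid places spatially nearby primitives close together in the sorted order.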
LBVH ● Linear BVH ● Sort the primitives along the curve with a parallel radix sort [SHG 08] ● Each primitive then carries a bit representation of its position ● How do we build the hierarchy from this?
LBVH ● Building the hierarchy ● Compare each primitive i with primitive i+1 ● Find the level at which their codes first differ, i.e. where they are separated ● Emit a list of (primitive index, split level) pairs; a split at one level is also recorded at every deeper level ● Re-sort the list by level ● The result gives the primitive intervals at each level
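To make the sorting step concrete, here is a minimal serial sketch of a least-significant-digit radix sort over Morton codes. It is not the parallel algorithm of [SHG 08]. The function name, the 32-bit key width, and the 8-bit digit size are assumptions made for illustration. The sort returns primitive indices rather than moving any geometry.

```python
def radix_sort_by_key(keys, key_bits=32, digit_bits=8):
    """Return primitive indices ordered by ascending key (stable LSD radix sort)."""
    order = list(range(len(keys)))
    mask = (1 << digit_bits) - 1
    # One stable bucket pass per digit, starting with the least significant one.
    for shift in range(0, key_bits, digit_bits):
        buckets = [[] for _ in range(1 << digit_bits)]
        for idx in order:
            buckets[(keys[idx] >> shift) & mask].append(idx)
        order = [idx for bucket in buckets for idx in bucket]
    return order
```

Because every pass is stable, primitives with equal codes keep their original relative order.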
Example split list of (primitive index, split level) pairs, sorted by level: (6, 1) (3, 2) (6, 2) (2, 3) (3, 3) (4, 3) (5, 3) (6, 3) (7, 3) (1, 4) (2, 4) (3, 4) (4, 4) (5, 4) (6, 4) (7, 4)
Example hierarchy built from the split list (8 primitives):
LEVEL 1: 12345678 splits into 123456 | 78
LEVEL 2: 123456 splits into 123 | 456; 78 is unchanged
LEVEL 3: 123 splits into 12 | 3; 456 splits into 4 | 5 | 6; 78 splits into 7 | 8
LEVEL 4: 12 splits into 1 | 2
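The split-list step can be sketched in a few lines of Python. This is a serial illustration under stated assumptions, not the GPU kernel from the talk. The eight Morton codes are hypothetical values chosen so that the output reproduces the example split list, and the choice of 2 bits per level is likewise an assumption.

```python
def first_split_level(a, b, levels, bits_per_level):
    """Return the 1-based level at which codes a and b first differ, or None."""
    for level in range(1, levels + 1):
        shift = (levels - level) * bits_per_level
        if (a >> shift) != (b >> shift):
            return level
    return None


def split_list(sorted_codes, levels, bits_per_level):
    """Build (primitive index, split level) pairs, sorted by level then index."""
    splits = []
    for i in range(len(sorted_codes) - 1):
        first = first_split_level(sorted_codes[i], sorted_codes[i + 1],
                                  levels, bits_per_level)
        if first is None:
            continue  # identical codes: the two primitives are never separated
        # A split at one level also separates the pair at every deeper level.
        for level in range(first, levels + 1):
            splits.append((i + 1, level))  # 1-based index, as in the example
    splits.sort(key=lambda s: (s[1], s[0]))
    return splits


def intervals_at_level(splits, level, n):
    """Return the 1-based inclusive primitive ranges that form one level."""
    cuts = sorted(i for (i, lvl) in splits if lvl == level)
    return list(zip([1] + [c + 1 for c in cuts], cuts + [n]))


# Hypothetical sorted codes for primitives 1..8 (4 levels, 2 bits per level).
codes = [0, 1, 4, 16, 20, 24, 64, 68]
splits = split_list(codes, levels=4, bits_per_level=2)
```

With these codes, `intervals_at_level(splits, 2, 8)` yields the three level-2 nodes 1-3, 4-6 and 7-8.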
LBVH ● Pros ● Very fast: same complexity as sorting ● Uses the parallel radix sort [SHG 08] ● Cons ● The constructed hierarchy is not optimized ● It subdivides space uniformly at the median ● A leaf can hold multiple primitives
What is SAH ● Surface Area Heuristic ● Guides the construction toward an optimized hierarchy by answering: ● “Which of a number of partitions of the primitives is better?” ● “Which of a number of possible positions to split space is better?”
What is SAH ● SAH-optimized construction can also be achieved in O(n log n) [WH 06] ● Steps of an SAH build ● Recursively split the set of geometric primitives (usually into two parts per step, giving a binary tree) ● Evaluate each candidate split with a cost function (its exact form can be defined as needed) ● Pick the candidate with the lowest cost ● Checking all possible split positions can be costly ● Sampling can be applied instead
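The slides leave the cost function open. A commonly used form weights each child's primitive count by the ratio of the child's surface area to the parent's. The sketch below assumes that form. The box representation as a (min corner, max corner) pair, the function names, and the traversal and intersection constants are placeholders for this example, not values from the talk.

```python
def surface_area(box):
    """Surface area of an axis-aligned box given as (min_corner, max_corner)."""
    (x0, y0, z0), (x1, y1, z1) = box
    dx, dy, dz = x1 - x0, y1 - y0, z1 - z0
    return 2.0 * (dx * dy + dy * dz + dz * dx)


def sah_cost(parent, left, n_left, right, n_right, c_trav=1.0, c_isect=1.0):
    """Estimated cost of splitting `parent` into two children.

    cost = c_trav + c_isect * (SA(left) * n_left + SA(right) * n_right) / SA(parent)
    """
    weighted = surface_area(left) * n_left + surface_area(right) * n_right
    return c_trav + c_isect * weighted / surface_area(parent)
```

For a 2 x 1 x 1 parent box cut into two unit cubes with four primitives each, the estimate is 1 + (6*4 + 6*4)/10 = 5.8, compared with 8 for leaving all eight primitives in a single leaf under the same constants.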
GPU SAH Construction ● Breadth-first construction using work queues ● An input queue holds the active splits; their children go to an output queue ● Splits in the queue can be processed in parallel
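The queue-based, breadth-first structure can be sketched as below. This is a serial Python illustration in which the loop over the input queue stands in for what a GPU would process in parallel. The node layout, the function names, and the median split used as a stand-in for the SAH split are all assumptions for the example.

```python
def build_breadth_first(n_prims, split_fn, max_leaf_size=1):
    """Build a hierarchy level by level using an input and an output queue."""
    nodes = []
    input_queue = [(0, n_prims, -1)]  # (begin, end, parent node index)
    while input_queue:
        output_queue = []
        # Every entry of the input queue is independent of the others,
        # so on a GPU they could all be handled at the same time.
        for begin, end, parent in input_queue:
            node_id = len(nodes)
            nodes.append({"range": (begin, end), "parent": parent, "leaf": True})
            if end - begin > max_leaf_size:
                mid = split_fn(begin, end)
                nodes[node_id]["leaf"] = False
                output_queue.append((begin, mid, node_id))
                output_queue.append((mid, end, node_id))
        # The children produced at this level become the next level's input.
        input_queue = output_queue
    return nodes


def median_split(begin, end):
    """Placeholder split: cut the primitive range in half."""
    return (begin + end) // 2
```

Swapping `median_split` for a function that evaluates the SAH changes the quality of the tree but not the queue structure.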
Data-Parallel SAH Split ● Two steps for performing an SAH split ● Determine the best split position by evaluating the SAH ● Reorder the primitives to match the new split
Data-Parallel SAH Split ● Determine the best split position ● Approximate SAH computation ● Generate k uniformly sampled split candidates for each of the three axes and test all samples in parallel using 3k threads ● Each thread computes the SAH cost for its split candidate ● Pick the split candidate with the lowest cost ● Reorder the primitives ● To match the new split ● Only the indices are reordered ● The geometry is not copied
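The two steps above can be sketched as follows. This is a serial Python illustration: the double loop over axes and samples enumerates the 3k candidates that would each get their own GPU thread. Classifying primitives by bounding-box centroid, the box representation, the function names, and dropping the constant traversal term from the cost are assumptions made for this example.

```python
def union(a, b):
    """Smallest box enclosing boxes a and b, each a (min_corner, max_corner) pair."""
    return (tuple(map(min, a[0], b[0])), tuple(map(max, a[1], b[1])))


def bounds(boxes, ids):
    """Bounding box of the primitives selected by ids (ids must be non-empty)."""
    box = boxes[ids[0]]
    for i in ids[1:]:
        box = union(box, boxes[i])
    return box


def area(box):
    dx, dy, dz = (box[1][a] - box[0][a] for a in range(3))
    return 2.0 * (dx * dy + dy * dz + dz * dx)


def best_sampled_split(boxes, ids, k=8):
    """Return (cost, axis, position, left_ids, right_ids), or None if nothing splits."""
    parent = bounds(boxes, ids)
    parent_area = area(parent) or 1.0  # guard against degenerate parent boxes
    best = None
    for axis in range(3):
        lo, hi = parent[0][axis], parent[1][axis]
        for s in range(1, k + 1):
            pos = lo + (hi - lo) * s / (k + 1)  # uniformly spaced candidate
            left, right = [], []
            for i in ids:
                center = 0.5 * (boxes[i][0][axis] + boxes[i][1][axis])
                (left if center < pos else right).append(i)
            if not left or not right:
                continue  # candidate does not separate anything
            cost = (area(bounds(boxes, left)) * len(left)
                    + area(bounds(boxes, right)) * len(right)) / parent_area
            if best is None or cost < best[0]:
                best = (cost, axis, pos, left, right)
    return best
```

The returned `left_ids + right_ids` is the reordered index list. The `boxes` array itself is never touched, which mirrors the point that only indices move and geometry is not copied.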
Small Split Operation ● Two main bottlenecks ● The initial splits at the top level of the hierarchy are very slow ● Large number of primitives at the top level ● Addressed by the hybrid method (discussed later) ● Large number of small splits at the lower levels ● Higher compaction costs caused by the large number of splits ● Low vector utilization (few primitives per split) ● Solution: use a different split kernel for small splits
Small Split Operation ● Main idea ● Set a threshold that defines a “small split” ● Depends on the geometry data and cache size (32) ● Use the processor's local memory ● To maintain a local work queue ● To keep all the geometric primitives ● Pros ● Reduces memory bandwidth ● Decreases the number of threads ● Maximizes utilization of vector operations ● Avoids waiting for memory accesses ● 15~20% speedup
Small Split Operation [Figure: number of active splits and time, plotted against the level of splits]
Hybrid GPU Construction Algorithm ● LBVH ● Hierarchy is not optimized ● Shallow hierarchy ● Large number of primitives at the leaves ● But fast ● GPU SAH construction ● Relatively slow ● Overhead at the first level ● But it builds an optimized hierarchy ● Solution ● Use LBVH for the top level ● Use GPU SAH construction for the rest
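One way to picture the hybrid scheme: the leading bits of the sorted Morton codes define the top-level clusters, and each cluster is then handed to the SAH builder. The sketch below is a serial Python illustration under that reading. The function name `hybrid_build`, the `build_subtree` callback standing in for the GPU SAH builder, and the `top_bits` parameter are hypothetical names introduced for this example.

```python
from itertools import groupby


def hybrid_build(sorted_codes, code_bits, top_bits, build_subtree):
    """Group sorted Morton codes by their top bits, then build each group.

    Returns a list of (prefix, subtree) pairs, one per top-level cluster.
    """
    shift = code_bits - top_bits
    clusters = []
    start = 0
    # Codes are sorted, so primitives sharing a prefix are contiguous.
    for prefix, group in groupby(sorted_codes, key=lambda c: c >> shift):
        count = len(list(group))
        # Hand the contiguous primitive range to the (hypothetical) SAH builder.
        clusters.append((prefix, build_subtree(start, start + count)))
        start += count
    return clusters
```

Raising `top_bits` moves more of the work to the fast LBVH step and leaves smaller ranges for the slower SAH step, which is one knob for trading construction time against hierarchy quality.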
Results ● Rendered several scenes ● Compared with other environments ● Single-core, non-optimized CPU SAH ● Full SAH ● Standard CPU BVH ray tracer using ray packets ● Compared on ● Construction time, hierarchy quality, fps
Results [Tables: construction time; absolute/relative ray tracing performance]
Results ● GPU SAH ● Shows better performance than CPU SAH ● Well-optimized hierarchy ● LBVH ● Fast, but not optimized ● Scene dependent ● Hybrid ● Falls between GPU SAH and LBVH ● Can be customized
Analysis ● Current GPU architectures offer several features that help hierarchy construction ● Dedicated graphics memory with significantly higher memory bandwidth ● Explicitly managed fast local memory ● Discussed under Small Split Operation ● Memory ● 113 bytes/triangle ● Worst case: one triangle per leaf ● This allows multi-million triangle models on current GPUs
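The memory figure can be turned into a quick capacity estimate. The sketch below only multiplies out the 113 bytes/triangle quoted on the slide. The 1 GiB memory budget is a hypothetical value chosen for the example, not a figure from the talk.

```python
BYTES_PER_TRIANGLE = 113  # worst-case figure quoted on the slide


def bvh_memory_bytes(num_triangles):
    """Worst-case memory footprint for a model of the given size."""
    return num_triangles * BYTES_PER_TRIANGLE


def max_triangles(memory_bytes):
    """Largest model that fits in the given memory budget."""
    return memory_bytes // BYTES_PER_TRIANGLE


one_million = bvh_memory_bytes(1_000_000)   # 113,000,000 bytes
fits_in_1gib = max_triangles(1 << 30)       # hypothetical 1 GiB budget
```

Under that assumed budget, roughly 9.5 million triangles fit, which is consistent with the multi-million triangle claim.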
Analysis ● Bottleneck analysis [Figure: core overhead and memory overhead]
Analysis ● Time distribution [Figure: full SAH build vs. hybrid build] ● Rest = reading/writing BVH node information, setting up splits, and joining the remaining steps ● “Note that Hybrid build is 10 times faster”
Video ● YouTube video
Reference ● [SHG 08] SATISH N., HARRIS M., GARLAND M.: Designing efficient sorting algorithms for manycore GPUs. Under review (2008). ● [WH 06] WALD I., HAVRAN V.: On building fast kd-trees for ray tracing, and on doing that in O(N log N). In Proc. of IEEE Symp. on Interactive Ray Tracing (2006), pp. 61–69.