GH 05 KD-Tree Acceleration Structures for a GPU Raytracer
- Slides: 27
KD-Tree Acceleration Structures for a GPU Raytracer
Tim Foley, Jeremy Sugerman
Stanford University
Motivation
• Accelerated raytracing
  – On commodity HW
  – Production rendering
  – Real-time applications?
• Performance trend
  – 9800 XT: 170 M ray-triangle intersects/s
  – X800 XT PE: 350 M ray-triangle intersects/s
GPU Raytracing
• Promising early results
  – Simple scenes
• Uniform grid
  – Problems with complex scenes
• Hierarchical accelerator (kd-tree)
  – Improve scalability
Outline
• Background
  – GPU Raytracing
  – KD-Tree Algorithm
• KD-Restart, KD-Backtrack
• Results
• Future Work
Background
• Ray Engine [Carr et al. 2002]
  – Parallel ray-triangle intersection
  – Host controls culling
• [Purcell et al. 2002]
  – Entire raytracing pipeline
  – Many rays required for efficiency
  – Uniform grid
Why not KD-Tree?
• Uniform grid acceleration structure
  – Regular structure = efficient traversal
  – Regular structure = poor partitioning
• KD-Trees
  – Adapt to scene complexity
  – Compact storage, efficient traversal
  – "Best" for CPU raytracing [Havran 2000]
KD-Tree
[Figure: kd-tree diagram with split planes X, Y, Z, leaves A, B, C, D, and a ray interval labeled tmin and tmax.]
KD-Tree Traversal
[Figure: kd-tree diagram with split planes X, Y, Z and leaves A, B, C, D.]
Per-Fragment Stacks
• Parallel (per-ray) push
  – No indexed write in fragment program
• Per-ray stack storage
• [Ernst et al. 2004]
  – Emulate push with extra passes
  – Impractical, slow
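For reference, the conventional CPU traversal that needs the per-ray push can be sketched as follows. This is a minimal illustration and not the authors' code: the `Node` layout, the `intersect_leaf` callback, and the four-leaf example tree are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Hypothetical layout: interior nodes set axis/split/below/above, leaves only a name.
    name: str = ""
    axis: int = -1
    split: float = 0.0
    below: Optional["Node"] = None
    above: Optional["Node"] = None


def kd_traverse_stack(root, origin, direction, tmin, tmax, intersect_leaf):
    """Stack-based front-to-back traversal; returns the first hit or None."""
    stack = []  # deferred (far child, tmin, tmax) entries: the per-ray storage
    node = root
    while True:
        while node.below is not None:  # descend until a leaf is reached
            o, d = origin[node.axis], direction[node.axis]
            t_split = (node.split - o) / d if d != 0.0 else float("inf")
            below_first = o < node.split or (o == node.split and d <= 0.0)
            first, second = (node.below, node.above) if below_first else (node.above, node.below)
            if t_split >= tmax or t_split < 0.0:
                node = first  # the ray interval lies entirely on the near side
            elif t_split <= tmin:
                node = second  # the ray interval lies entirely on the far side
            else:
                stack.append((second, t_split, tmax))  # the push a fragment program lacks
                node, tmax = first, t_split
        hit = intersect_leaf(node, tmin, tmax)
        if hit is not None:
            return hit
        if not stack:
            return None
        node, tmin, tmax = stack.pop()


# Hypothetical four-leaf tree over [0,4] x [0,2]: split x=2, then y=1 on each side.
tree = Node(axis=0, split=2.0,
            below=Node(axis=1, split=1.0, below=Node("A"), above=Node("B")),
            above=Node(axis=1, split=1.0, below=Node("C"), above=Node("D")))
visited = []


def record_and_miss(leaf, t0, t1):
    visited.append(leaf.name)  # pretend nothing in the leaf is hit
    return None


kd_traverse_stack(tree, (0.0, 0.5), (1.0, 0.5), 0.0, 3.0, record_and_miss)
```

The `stack.append` call is the indexed write that a fragment program of this hardware generation cannot perform per ray.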
Our Contribution
• Stackless kd-tree traversal algorithms
  – KD-Restart
  – KD-Backtrack
Observation
[Figure: kd-tree diagram with split planes X, Y, Z and leaves A, B, C, D.]
Current leaf's tmax = next leaf's tmin
KD-Restart
[Figure: kd-tree diagram with split planes X, Y, Z and leaves A, B, C, D.]
• Standard traversal
  – Omit stack operations
  – Proceed to 1st leaf
• If no intersection
  – Advance (tmin, tmax)
  – Restart from root
• Proceed to next leaf
KD-Restart
• Restart traversal after each leaf
  – m leaves
  – Average depth d
  – Cost O(m*d)
• Balanced tree of n nodes
  – Upper bound: O(n log(n))
• Standard algorithm: O(n)
  – Expected: O(log(n))
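The restart loop described on these two slides can be sketched as below. It is a minimal sketch under stated assumptions, not the Brook implementation: the `Node` layout, the `intersect_leaf` callback, and the example tree are hypothetical. Each outer iteration costs one root-to-leaf descent, which is where the O(m*d) cost comes from.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Hypothetical layout: interior nodes set axis/split/below/above, leaves only a name.
    name: str = ""
    axis: int = -1
    split: float = 0.0
    below: Optional["Node"] = None
    above: Optional["Node"] = None


def kd_restart(root, origin, direction, scene_tmin, scene_tmax, intersect_leaf):
    """Stackless traversal: after a leaf miss, advance the interval and restart at the root."""
    tmin = tmax = scene_tmin
    while tmax < scene_tmax:
        # The previous leaf's tmax becomes the next leaf's tmin.
        tmin, tmax = tmax, scene_tmax
        node = root
        while node.below is not None:
            o, d = origin[node.axis], direction[node.axis]
            t_split = (node.split - o) / d if d != 0.0 else float("inf")
            below_first = o < node.split or (o == node.split and d <= 0.0)
            first, second = (node.below, node.above) if below_first else (node.above, node.below)
            if t_split >= tmax or t_split < 0.0:
                node = first
            elif t_split <= tmin:
                node = second  # already past this plane: skip the near child
            else:
                node, tmax = first, t_split  # no push: the far child is found again after a restart
        hit = intersect_leaf(node, tmin, tmax)
        if hit is not None:
            return hit
    return None


# Hypothetical four-leaf tree over [0,4] x [0,2]: split x=2, then y=1 on each side.
tree = Node(axis=0, split=2.0,
            below=Node(axis=1, split=1.0, below=Node("A"), above=Node("B")),
            above=Node(axis=1, split=1.0, below=Node("C"), above=Node("D")))
visited = []


def record_and_miss(leaf, t0, t1):
    visited.append((leaf.name, t0, t1))  # pretend nothing in the leaf is hit
    return None


kd_restart(tree, (0.0, 0.5), (1.0, 0.5), 0.0, 3.0, record_and_miss)
print(visited)  # [('A', 0.0, 1.0), ('B', 1.0, 2.0), ('D', 2.0, 3.0)]
```

In this example every restart walks two interior nodes again, so three leaves cost six interior visits instead of the three a stack-based traversal would need.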
Observation
[Figure: kd-tree diagram with split planes X, Y, Z and leaves A, B, C, D.]
Ancestor of A is parent of Z
KD-Backtrack
[Figure: kd-tree diagram with split planes X, Y, Z and leaves A, B, C, D.]
• If no intersection
  – Advance (tmin, tmax)
  – Start backtracking
• If node intersects (tmin, tmax)
  – Resume traversal
• Proceed to next leaf
KD-Backtrack
• Backtrack after leaf
  – Revisits previous nodes
  – At most twice: from left, right
• Within constant factor of standard traversal
  – Upper bound: O(n)
  – Expected: O(log(n))
• Requires additional storage
  – Parent pointers
  – Bounding boxes for internal nodes
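One plausible reading of the backtracking step is sketched below. It is an illustration under assumptions rather than the authors' method: the `Node` class with `parent`, `lo`, and `hi` fields, the slab-test helper `clip_to_box`, and the `intersect_leaf` callback are all hypothetical names. The parent pointers and per-node boxes are the additional storage the slide mentions.

```python
class Node:
    """Hypothetical kd-tree node carrying a bounding box and a parent pointer."""

    def __init__(self, lo, hi, name="", axis=-1, split=0.0, below=None, above=None):
        self.lo, self.hi, self.name = lo, hi, name
        self.axis, self.split, self.below, self.above = axis, split, below, above
        self.parent = None
        for child in (below, above):
            if child is not None:
                child.parent = self


def clip_to_box(node, origin, direction, tmin, tmax):
    """Slab test: the part of [tmin, tmax] inside the node's box, or None."""
    for o, d, lo, hi in zip(origin, direction, node.lo, node.hi):
        if d == 0.0:
            if o < lo or o > hi:
                return None
            continue
        t0, t1 = sorted(((lo - o) / d, (hi - o) / d))
        tmin, tmax = max(tmin, t0), min(tmax, t1)
    return (tmin, tmax) if tmin <= tmax else None


def descend(node, origin, direction, tmin, tmax):
    """Stackless descent from any node to the first leaf overlapping (tmin, tmax)."""
    while node.below is not None:
        o, d = origin[node.axis], direction[node.axis]
        t_split = (node.split - o) / d if d != 0.0 else float("inf")
        below_first = o < node.split or (o == node.split and d <= 0.0)
        first, second = (node.below, node.above) if below_first else (node.above, node.below)
        if t_split >= tmax or t_split < 0.0:
            node = first
        elif t_split <= tmin:
            node = second
        else:
            node, tmax = first, t_split
    return node, tmax


def kd_backtrack(root, origin, direction, scene_tmin, scene_tmax, intersect_leaf):
    node, tmin, tmax = root, scene_tmin, scene_tmax
    while node is not None:
        leaf, tmax = descend(node, origin, direction, tmin, tmax)
        hit = intersect_leaf(leaf, tmin, tmax)
        if hit is not None:
            return hit
        tmin = tmax  # advance the interval past the leaf just tested
        node = leaf.parent
        while node is not None:  # walk up until a box still overlaps the remaining ray
            span = clip_to_box(node, origin, direction, tmin, scene_tmax)
            if span is not None and span[1] > tmin:
                tmax = span[1]
                break
            node = node.parent
    return None


# Hypothetical four-leaf tree over [0,4] x [0,2]: split x=2, then y=1 on each side.
left = Node((0, 0), (2, 2), axis=1, split=1.0,
            below=Node((0, 0), (2, 1), "A"), above=Node((0, 1), (2, 2), "B"))
right = Node((2, 0), (4, 2), axis=1, split=1.0,
             below=Node((2, 0), (4, 1), "C"), above=Node((2, 1), (4, 2), "D"))
tree = Node((0, 0), (4, 2), axis=0, split=2.0, below=left, above=right)
visited = []


def record_and_miss(leaf, t0, t1):
    visited.append(leaf.name)  # pretend nothing in the leaf is hit
    return None


kd_backtrack(tree, (0.0, 0.5), (1.0, 0.5), 0.0, 3.0, record_and_miss)
```

After leaf A the walk stops at A's parent, whose box still covers the ray, so leaf B is reached without returning to the root. Only after B does the walk climb to the root.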
Implementation
• Built GPU raytracer in Brook [Buck et al.]
• 4 intersection schemes:
  – Brute Force
  – Uniform Grid
  – KD-Restart
  – KD-Backtrack
Scenes
Scene            Triangles
Cornell Box      32
Stanford Bunny   69451
BART Robots      71708
BART Kitchen     110561
Results
[Chart: relative speedup over brute-force intersection for the Box, Bunny, Robots, and Kitchen scenes. The only value that survives in this transcript is 12.9.]
Results
             Traverse   Backtrack   Intersect
Ideal        10.86 M    0           5.91 M
Restart      21.80 M    0           5.91 M
Backtrack    10.86 M    7.78 M      5.91 M
Rays in each state throughout traversal.
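A quick arithmetic check on these counts, treating the Ideal row as the baseline (an assumption about how that row is meant), shows how much extra work each stackless scheme adds. The variable names are illustrative only.

```python
# Ray-state counts in millions, copied from this slide: (traverse, backtrack, intersect).
counts = {
    "Ideal": (10.86, 0.0, 5.91),
    "Restart": (21.80, 0.0, 5.91),
    "Backtrack": (10.86, 7.78, 5.91),
}

ideal_total = sum(counts["Ideal"])

# Total ray-states relative to the Ideal row.
relative_work = {name: sum(row) / ideal_total for name, row in counts.items()}

# Traversal steps alone: KD-Restart roughly doubles them.
restart_traverse_ratio = counts["Restart"][0] / counts["Ideal"][0]

for name, ratio in relative_work.items():
    print(f"{name}: {ratio:.2f}x the Ideal ray-state count")
print(f"Restart traversal steps: {restart_traverse_ratio:.2f}x Ideal")
```

On these numbers KD-Restart does about 1.65x the ideal total work and KD-Backtrack about 1.46x, with intersection work identical across all three rows.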
Discussion
• Absolute performance
  – Trails best CPU implementations by 5-6x
• Sources of inefficiency
  – Load balancing
  – Data reuse
Load Balancing
• Subset of rays intersecting, traversing
  – Occlusion queries to select kernel
  – Early-Z to cull inactive rays
• Approximately 5x overhead
  – Query, kernel switch overhead
  – Worse with fewer rays
Data Reuse
• Every kernel
  – Loads ray origin/direction
  – Load/store traversal state
• Consumes streaming bandwidth
  – We are bandwidth-limited
  – CPU implementation stores these in registers
Branching
• Merge multiple passes into larger kernel
  – Fragment branches for load balancing
  – Avoid load/store of reused data
• Current branching has high overhead
• Shifts efficiency burden to HW
Conclusion
• Stackless traversal
  – Allows efficient GPU kd-tree
  – Scales to larger, more complex scenes
• Future work
  – Changes in HW
  – Alternative acceleration structures
  – "Out-of-core" scenes
  – Dynamic scenes
Acknowledgements
• Tim Purcell (NVIDIA)
  – Streaming raytracer
• Mark Segal (ATI)
  – Demo machine
• NVIDIA, ATI: HW
• DARPA, Rambus: Funding
Questions