Coherent Hierarchical Culling: Hardware Occlusion Queries Made Useful

  • Slides: 26

Jiri Bittner (1), Michael Wimmer (1), Harald Piringer (2), Werner Purgathofer (1)
(1) Vienna University of Technology
(2) VRVis Vienna

Motivation: Coherent Hierarchical Culling
  • Typical hardware occlusion culling scenario
[Figure: CPU and GPU timelines of render (R), occlusion query (Q), and cull (C) operations over time, with the waiting time marked]
Michael Wimmer, Vienna University of Technology

Occlusion Culling: Offline vs. Online
  • Offline
      + Global information about visibility (from region)
      - Difficult to implement
      - Accuracy and maintenance problems
      + No runtime overhead
  • Online
      - Local information about visibility (from point)
      + Easier to implement
      + Greater accuracy, easy maintenance
      - Runtime overhead

Online Occlusion Culling
  • Object space methods
      - Need complex geometric calculations (hard to handle detailed scenes)
      + Do not require rasterization
  • Image space methods
      + No geometric calculations (easier to handle detailed scenes)
      - Require rasterization

Hardware Occlusion Culling
  • Hardware is good at rasterization!
  • Hardware counts rasterized fragments
      ◦ But need not update frame buffer
  • NV/ARB_occlusion_query
      ◦ Asynchronous
      ◦ Allows multiple simultaneous occlusion queries
  • General algorithm idea:
      ◦ Render simple approximation first (bbox)
          ▪ invisible: cull object
          ▪ visible: render object
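
The general idea above can be sketched with a toy model. This is not OpenGL code: `ToyGPU`, its 1-D depth buffer, and the `Obj` record are hypothetical stand-ins, and each object serves as its own proxy where a real renderer would query the bounding box. The query counts fragments that would pass the depth test without writing anything, which is the behavior the slide describes.

```python
from dataclasses import dataclass


@dataclass
class Obj:
    name: str
    x0: int       # first covered pixel (inclusive)
    x1: int       # end of the covered pixel range (exclusive)
    depth: float  # constant depth; smaller means closer to the viewer


class ToyGPU:
    """Hypothetical stand-in for a GPU with a 1-D depth buffer."""

    def __init__(self, width):
        self.zbuf = [float("inf")] * width

    def occlusion_query(self, obj):
        # Count fragments that would pass the depth test; nothing is written.
        return sum(1 for x in range(obj.x0, obj.x1) if obj.depth < self.zbuf[x])

    def render(self, obj):
        # Rasterize with depth test and depth write.
        for x in range(obj.x0, obj.x1):
            self.zbuf[x] = min(self.zbuf[x], obj.depth)


def cull_and_render(gpu, objects):
    rendered, culled = [], []
    for obj in sorted(objects, key=lambda o: o.depth):  # front to back
        if gpu.occlusion_query(obj) > 0:
            gpu.render(obj)
            rendered.append(obj.name)
        else:
            culled.append(obj.name)
    return rendered, culled


scene = [Obj("peek", 6, 10, 3.0), Obj("wall", 0, 8, 1.0), Obj("hidden", 2, 5, 2.0)]
rendered, culled = cull_and_render(ToyGPU(10), scene)
print(rendered, culled)
```

In this toy the query result is available immediately. On real hardware the result arrives with a delay, which is the problem the following slides address.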

Hardware Occlusion Culling
  • Advantages
      ◦ Pixel-exact
      ◦ No explicit occluder rendering
      ◦ Exploit rasterization power of GPU
      ◦ Easy to use (API calls)
  • Problems
      ◦ Delay in availability of the results
      ◦ Time to execute queries
      ◦ If fill-bound: only useful if several objects culled

Hierarchical Stop&Wait (S&W)
Front-to-back hierarchy traversal
  1. Issue visibility query for node
  2. Stop and wait for result
      ◦ Invisible: cull the subtree
      ◦ Visible: render or continue with step 1 recursively
  • Advantage:
      ◦ Hierarchy can cull huge subtrees
  • Problems:
      ◦ Waiting causes CPU stalls and GPU starvation
      ◦ Huge rasterization costs (especially for large interior nodes)
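
A minimal sketch of the stop-and-wait traversal, under simplifying assumptions: the `Node` class and its `visible` field are hypothetical, and `visible` acts as a fixed oracle for what a query would return, whereas a real query result depends on what has already been rendered. Every call corresponds to one blocking query.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)  # assumed front-to-back order
    visible: bool = True  # oracle: what the simulated query would report


def stop_and_wait(node, stats):
    stats["queries"] += 1  # issue the query and block until the result arrives
    if not node.visible:
        stats["culled"].append(node.name)  # whole subtree is skipped
        return
    if not node.children:
        stats["rendered"].append(node.name)
        return
    for child in node.children:
        stop_and_wait(child, stats)


tree = Node("root", [
    Node("A", [Node("a1"), Node("a2")]),
    Node("B", [Node("b1"), Node("b2")], visible=False),
])
stats = {"queries": 0, "rendered": [], "culled": []}
stop_and_wait(tree, stats)
print(stats)
```

The single query on B removes b1 and b2 without testing them, which is the advantage named above. The cost is that each of the five queries stalls the traversal, including the ones on the interior nodes root and A.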

CPU Stalls and GPU Starvation
[Figure: CPU and GPU timelines for the operation sequence R1, Q2, R2, Q3, C3, Q4, R4, with the waiting time marked. Legend: Rx = render object x, Qx = query object x, Cx = cull object x]

Solution: Coherent Hierarchical Culling
  • Scheduling based on temporal coherence
      ◦ Skipping certain visibility tests
      ◦ Immediate rendering of certain geometry
  • Clever interleaving of queries and rendering
      ◦ Maintaining a queue of running occlusion queries
  • Design goal: easy implementation

Coherent Hierarchical Culling (CHC)
[Figure: CPU and GPU timelines for the operation sequence R1, Q2, R2, Q3, C3, Q4, R4. Annotations: "visible in previous frame", "assume independent occlusion". Legend: Rx = render object x, Qx = query object x, Cx = cull object x]

CHC Algorithm Outline
  • Front-to-back hierarchy traversal
  1. Node handling
      ◦ Interior node
          ▪ Previously invisible: issue visibility query
          ▪ Previously visible: continue with step 1 recursively
      ◦ Leaf
          ▪ Issue visibility query
          ▪ Previously visible: render immediately
  2. Check availability of query results
      ▪ Invisible: propagate visibility change
      ▪ Visible: render or continue with step 1 recursively
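
The outline can be sketched as a small simulation. It is an illustration under stated assumptions, not the authors' implementation: `Node`, `link`, and `chc_frame` are hypothetical names, `truth` is an oracle naming the nodes whose query would report visible pixels, and `latency` models the delay (in traversal steps) before a result becomes available. The loop waits for a pending result only when the traversal stack is empty.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Node:
    name: str
    children: list = field(default_factory=list)  # stored front to back
    parent: Optional["Node"] = None
    visible: bool = False   # classification from the last visited frame
    last_visited: int = -1
    last_rendered: int = -1


def link(node):
    for child in node.children:
        child.parent = node
        link(child)
    return node


def chc_frame(root, frame, truth, latency=2):
    log, stack, queries, step = [], [root], deque(), 0

    def traverse(n):
        if n.children:
            stack.extend(reversed(n.children))  # nearest child ends on top
        elif n.last_rendered != frame:          # never render a leaf twice
            n.last_rendered = frame
            log.append(("render", n.name))

    while stack or queries:
        step += 1
        # Step 2 of the outline: take finished results; block only when idle.
        while queries and (step - queries[0][1] >= latency or not stack):
            n, _ = queries.popleft()
            if n.name in truth:
                p = n
                while p is not None and not p.visible:  # propagate upward
                    p.visible = True
                    p = p.parent
                traverse(n)
        # Step 1 of the outline: handle the next node in front-to-back order.
        if stack:
            n = stack.pop()
            was_visible = n.visible and n.last_visited == frame - 1
            opened = was_visible and bool(n.children)
            n.visible = False       # re-established by a visible query result
            n.last_visited = frame
            if not opened:          # leaf, or previously invisible node
                log.append(("query", n.name))
                queries.append((n, step))
            if was_visible:         # render or descend without waiting
                traverse(n)
    return log


root = link(Node("root", [Node("A", [Node("a1"), Node("a2")]),
                          Node("B", [Node("b1"), Node("b2")])]))
truth = {"root", "A", "a1", "a2"}
frame0 = chc_frame(root, 0, truth)
frame1 = chc_frame(root, 1, truth)
print(frame0)
print(frame1)
```

In the first frame nothing is known, so every visited node is queried. In the second frame root and A are opened without a query, the leaves a1 and a2 are rendered right after their queries are issued, and only B is still tested as a whole region.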

Why Interleaving Works…
  • Processing a node only depends on…
  1. Front-to-back order
  2. Results of queries for processed nodes, where:
[Table: columns "Previous frame: processed node", "current node", "S&W", "CHC". Rows cover the visible/invisible combinations, with the invisible/invisible case split into "different subtrees" and "parent-child, refinement of visibility". Cells hold yes/no entries.]

CHC: Hierarchy Traversal
  • No queries for previously visible interior nodes
  • Assume no query dependencies
  • Hidden regions: queries depend on parents
[Figure: hierarchy with nodes 1–13 traversed in front-to-back order. Legend: previously visible, previously invisible]

CHC Features
  • Reduction of CPU stalls and GPU starvation
      ◦ Interleaving queries with rendering previously visible geometry
  • Reduction of the number of queries
      ◦ Avoids expensive redundant queries for interior nodes
      ◦ Size of tested regions adapts to visibility
          ▪ pull-up: occluded region growing
          ▪ pull-down: visible region growing
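
How the tested regions adapt can be illustrated by listing which nodes would be queried at the start of a frame, given the previous frame's classification. This is a sketch: the nested-tuple tree and the `queried_nodes` helper are hypothetical, and it covers only the initial set of queries, not the child queries issued later in the same frame when a region turns visible.

```python
def queried_nodes(node, was_visible):
    """Nodes tested first this frame, given last frame's visible node names."""
    name, children = node
    if name not in was_visible:
        return [name]  # previously invisible: one query for the whole region
    if not children:
        return [name]  # previously visible leaf: tested directly
    tested = []        # previously visible interior node: no query, descend
    for child in children:
        tested += queried_nodes(child, was_visible)
    return tested


tree = ("root", [("A", [("a1", []), ("a2", [])]),
                 ("B", [("b1", []), ("b2", [])])])

before = queried_nodes(tree, {"root", "A", "a1", "B", "b1"})
after_pull_up = queried_nodes(tree, {"root", "B", "b1"})
print(before, after_pull_up)
```

Once everything under A is hidden, the two leaf queries are replaced by a single query on A (pull-up). If that query later reports A as visible, testing moves back down to a1 and a2 (pull-down).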

Implementation Issues
  • Front-to-back traversal
      ◦ Priority queue: allows various hierarchical data structures
  • Checking query results
      ◦ glGetOcclusionQueryivNV (GL_PIXEL_COUNT_AVAILABLE_NV)
      ◦ Very cheap operation
  • Queries for previously visible nodes
      ◦ Use actual geometry as occludee (instead of bounding box)
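
A priority queue gives an approximate front-to-back order for any hierarchy whose nodes carry a bounding volume. The sketch below assumes axis-aligned boxes stored as `(lo, hi)` corner tuples in plain dictionaries. The node layout and the `box_distance` and `front_to_back` helpers are illustrative assumptions, not part of the slides.

```python
import heapq
import itertools
import math


def box_distance(point, box):
    """Euclidean distance from a point to an axis-aligned box (0 if inside)."""
    lo, hi = box
    return math.sqrt(sum(max(l - p, 0.0, p - h) ** 2
                         for p, l, h in zip(point, lo, hi)))


def front_to_back(root, eye):
    counter = itertools.count()  # tie-breaker so dict nodes are never compared
    heap = [(box_distance(eye, root["box"]), next(counter), root)]
    while heap:
        _, _, node = heapq.heappop(heap)
        yield node["name"]
        for child in node.get("children", []):
            heapq.heappush(
                heap, (box_distance(eye, child["box"]), next(counter), child))


scene = {
    "name": "root", "box": ((0, 0, 0), (10, 1, 1)),
    "children": [
        {"name": "near", "box": ((0, 0, 0), (1, 1, 1))},
        {"name": "far", "box": ((9, 0, 0), (10, 1, 1))},
    ],
}
print(list(front_to_back(scene, (-1.0, 0.5, 0.5))))
print(list(front_to_back(scene, (12.0, 0.5, 0.5))))
```

Because the order depends only on the distance key, the same loop works for kD-trees, octrees, or bounding volume hierarchies. Moving the viewpoint to the other side reverses the order of the two children.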

Further Optimizations
  • Conservative visibility testing
      ◦ Assume visible node remains visible n frames
      + Saves additional occlusion queries
  • Approximate visibility
      ◦ #visible pixels < threshold → node invisible
      + Saves rendered geometry
      - Produces image errors
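
Both optimizations reduce to small decision rules, sketched below. The function names are hypothetical, and the convention that a node tested visible in frame f is next tested in frame f + n + 1 is an assumption about how "remains visible n frames" is counted.

```python
def is_visible(visible_pixels, threshold=1):
    """Approximate visibility: fewer than `threshold` pixels counts as invisible.

    threshold=1 is the exact test; larger values trade image errors for speed.
    """
    return visible_pixels >= threshold


def count_queries(num_frames, assumed_visible_frames):
    """Queries issued for one node that stays visible over `num_frames` frames."""
    queries, skip_until = 0, -1
    for frame in range(num_frames):
        if frame <= skip_until:
            continue  # still assumed visible: no query this frame
        queries += 1  # this query reports visible in the scenario modeled here
        skip_until = frame + assumed_visible_frames
    return queries


print(count_queries(9, 0), count_queries(9, 2))
print(is_visible(24, threshold=25), is_visible(25, threshold=25))
```

With two assumed-visible frames, a node that stays visible is tested in every third frame, so nine frames need three queries instead of nine. The saving applies only to nodes that were visible, since invisible nodes must still be tested every frame.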

Results – Test Scenes
  Teapots: 11.5 M triangles, 21 k kD-tree nodes
  City: 1 M triangles, 33 k kD-tree nodes
  Power plant: 12.7 M triangles, 18.7 k kD-tree nodes

Results – Speedup
  Ideal: zero overhead – render only visible geometry

Results – Summary
  • Comparison to hierarchical S&W
      ◦ #queries reduced by a factor of almost 2
      ◦ Times for stalls reduced by 20–60x (to 0.18–1.31 ms)
  • Close to ideal algorithm!
      ◦ Only 2–9 ms slower
      ◦ Overhead due to query time

Results – Teapot

Results – City

Results – Powerplant

Optimization Results
  • Conservative culling, 2 frames assumed visible
      ◦ Good for deep hierarchies with simple leaf geometry
      ◦ Further speedup up to 21%
  • Approximate culling, 25 pixels threshold
      ◦ Good for scenes with complex visible geometry
      ◦ Further speedup up to 33%

Conclusion
  • Efficient scheduling of hardware occlusion queries
      ◦ Greatly reduces CPU stalls and GPU starvation
      ◦ Reduces number of required queries
  • Simple to implement
  • Arbitrary hierarchical data structure
  • Speedup ~4 over VFC (view frustum culling)
  • Close to ideal solution for tested scenes
  • Watch out for GPU Gems II

Thanks for Your Attention

CHC: Example
[Figure: animated walk-through of CHC on a hierarchy with nodes 1–11, showing previously visible and previously invisible nodes, issued queries, query results as they become available, pull-up, the final classification, and the query queue. GPU operation sequence: R6, Q7, Q8, R7, Q10, R4, Q5, Q6/R6, Q10/R10, Q11]