N-Buffers for efficient depth map query Xavier Décoret Artis GRAVIR/IMAG INRIA

Context • Real-time rendering • Visibility culling – quickly reject what’s not visible, i.e. what won’t affect any pixel in the final image • Many methods available [COCSD 02, PT 02]

Occlusion maps • Select potential occluders [LG 95, KCCO 00] – project and rasterize them – store distance to closest one at each pixel • Z buffer / occlusion map / depth map • Traverse potential occludees – project and rasterize them – test visibility of each fragment - use bounding volumes - do it hierarchically • depth comparison against depth map

Optimizations • Reduce number of pixels tested – Hierarchical Z Buffer [ZMHH 97] – Lazy Occlusion Grid [HTP 01] – Summed Area Tables [HW 99] • Use hardware Z buffer – implemented for hidden face removal • with optimizations [Mor 00, AMN 03] – exposed through Occlusion Queries

Occlusion queries • # of pixels passing z test if some geometry were rendered in current framebuffer • Hardware-assisted culling [HSLM 02, BWPP 04] • Other applications [TPK 01] – culling & clamping of shadow volumes [LWGM 04] – LOD selection [ASVNB 00]

Motivation for N-Buffers • Query depth map within GPU – Advantages • reduces communication with CPU • allows geometry to be discarded/optimized on GPU – Constraints • limited # of operations • complex datastructures unavailable – no pointers and lists • “complex” algorithms prohibited – branching and indirections costly

Task at hand • For a given object, find the maximum depth covered by its projection • Depth map accessed as a texture – Lookups give information at one pixel – We need information over a region • Use texture to encode depth over a region – proximity grids

The datastructure • Sequence of depth maps (levels) • At level i a texel stores maximum depth in a neighborhood of size i – various neighborhoods/sizes possible – we choose squares • with lower left corner on texel • with size 2^i x 2^i

The datastructure • Sequence of depth maps (levels) • At level i a texel stores maximum depth in a neighborhood of size i [Figure: depth map (level 0) and levels 1 to 3; at each level, a texel stores the maximum depth within its region]

The datastructure • Like an image pyramid but... – all levels have same resolution – level 0 (depth map) can have any dimensions • not limited to power of 2 • # of levels is log2 of largest dimension – but we might build only the first levels
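As a rough illustration of the level count, the sketch below computes how many levels are needed before a single square spans the largest dimension, and how many texels a partially built stack occupies. The function names `full_level_count` and `texel_count` are made up for this example, and rounding the logarithm up is an assumption rather than something stated on the slide.

```python
def full_level_count(width, height):
    """Smallest n such that a 2^n x 2^n square spans the largest dimension."""
    return (max(width, height) - 1).bit_length()  # integer ceil(log2(largest dimension))


def texel_count(width, height, built_levels):
    """Texels stored for level 0 plus `built_levels` further levels,
    all kept at the full resolution of the depth map."""
    return (built_levels + 1) * width * height


print(full_level_count(640, 480))  # levels above level 0 for a 640 x 480 map
print(texel_count(640, 480, 4))    # storage if only levels 1 to 4 are built
```

Because every level keeps the full resolution, the cost grows linearly with the number of levels built, which is why stopping at the first few levels can be attractive.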

Construction • Level i+1 obtained from level i [Figure: level 0, level 1, level 2]

Construction • Can be done on the GPU – render scene offscreen (standard z-buffer) – copy depth to texture L[0] – for i = 1 to n • set up fragment program • render a quad – covering viewport – with unit texcoords – with fragment program • copy depth to texture L[i]
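The loop above can be mimicked on the CPU to see what each pass computes. This is a minimal sketch, not the fragment program from the talk: it assumes that level i+1 takes the maximum of four samples of level i offset by 2^i, that out-of-range samples are clamped to the edge, and that row 0 is the bottom row. The name `build_nbuffers` is hypothetical.

```python
def build_nbuffers(depth, num_levels):
    """Return [L0, L1, ..., L_num_levels]. L_i[y][x] is the maximum depth over
    the 2^i x 2^i square whose lower-left corner is (x, y), clipped to the map."""
    height, width = len(depth), len(depth[0])
    levels = [[row[:] for row in depth]]      # level 0 is a copy of the depth map
    for i in range(num_levels):
        prev = levels[-1]
        step = 1 << i                         # side of the squares stored at level i
        nxt = []
        for y in range(height):
            y2 = min(y + step, height - 1)    # clamp-to-edge (assumption)
            row = []
            for x in range(width):
                x2 = min(x + step, width - 1)
                # Four level-i squares tile the level-(i+1) square.
                row.append(max(prev[y][x], prev[y][x2], prev[y2][x], prev[y2][x2]))
            nxt.append(row)
        levels.append(nxt)
    return levels


depth = [[0.1, 0.5, 0.2],
         [0.9, 0.3, 0.4],
         [0.0, 0.7, 0.6]]
levels = build_nbuffers(depth, 2)
print(levels[1])
```

Each pass reads only the previous level, which matches the "copy depth to texture L[i]" step: one full-resolution pass per level, independent of scene complexity.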

Construction • Similar to matrix reduction... – Buck and Purcell, GPU Gems, p 626 • ...but we keep full resolution – gives us locality

Construction • Complexity – first step depends on scene complexity – other steps depend only on resolution • Computation cost – ~10 ms for 640 x 480 on GeForce FX 6800 – no read back

Query • Naive approach – project occludee – get screen space bbox • extents + zmin – get bounding neighborhood (2^5 x 2^5 in the figure) – do one lookup • in matching level • at lower left corner – compare zmin and zmax [Figure: top view and viewport, with levels 0 to 5; the lookup returns zmax]
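The naive query can be sketched on the CPU as follows. The helper names are hypothetical, and two conventions are assumed: larger depth means farther from the viewer, and an occludee is hidden when its nearest depth zmin lies behind the maximum occluder depth zmax of the region. The construction step is a simple max-of-four stand-in with clamped borders.

```python
def build_nbuffers(depth, num_levels):
    """Level i stores, at (x, y), the max depth over the 2^i x 2^i square
    whose lower-left corner is (x, y), clipped to the map."""
    h, w = len(depth), len(depth[0])
    levels = [depth]
    for i in range(num_levels):
        prev, s = levels[-1], 1 << i
        levels.append([[max(prev[y][x],
                            prev[y][min(x + s, w - 1)],
                            prev[min(y + s, h - 1)][x],
                            prev[min(y + s, h - 1)][min(x + s, w - 1)])
                        for x in range(w)] for y in range(h)])
    return levels


def naive_is_occluded(levels, x0, y0, w, h, zmin):
    """One lookup: bounding square of the bbox, read at its lower-left corner."""
    k = (max(w, h) - 1).bit_length()   # smallest k with 2^k >= max(w, h)
    if k >= len(levels):
        raise ValueError("not enough levels for this bounding box")
    zmax = levels[k][y0][x0]
    return zmin > zmax                 # nearest occludee point is behind every occluder


depth = [[0.2] * 8 for _ in range(8)]
depth[6][6] = 0.9                      # a single far texel, i.e. a gap in the occluders
levels = build_nbuffers(depth, 3)
print(naive_is_occluded(levels, 0, 0, 3, 3, 0.5))
print(naive_is_occluded(levels, 0, 0, 5, 5, 0.5))
```

In the second call the 5 x 5 bounding box itself contains only near texels, but its bounding square is 8 x 8 and picks up the far texel at (6, 6), so the object is not rejected. The answer stays safe but is needlessly pessimistic.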

Query • Naive approach • Overly conservative – (bounding volume of occludee) – screenspace bbox – bounding neighborhood (2^5 x 2^5 in the figure) • Need a tighter coverage [Figure: top view and viewport, with levels 0 to 5]

4 tiles coverage • depthmax in region ≥ depthmax in sub-region – zmax over screenspace bbox ≤ z looked up for bounding neighborhood (2^5 x 2^5) • Cover bbox with four 2^4 x 2^4 tiles instead – zmax = max(z1, z2, z3, z4)

4 tiles coverage • 5 ways of covering with 4 squares • Measure of the gain on over-conservativity
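One possible way to place the four tiles is sketched below. It is an assumption for illustration, not necessarily one of the five coverings mentioned on the slide: the level is chosen so that two tiles side by side span the larger side of the bbox, and the tiles are anchored at the four corners of the bbox. All function names are hypothetical.

```python
def build_nbuffers(depth, num_levels):
    """Level i stores, at (x, y), the max depth over the 2^i x 2^i square
    whose lower-left corner is (x, y), clipped to the map."""
    h, w = len(depth), len(depth[0])
    levels = [depth]
    for i in range(num_levels):
        prev, s = levels[-1], 1 << i
        levels.append([[max(prev[y][x],
                            prev[y][min(x + s, w - 1)],
                            prev[min(y + s, h - 1)][x],
                            prev[min(y + s, h - 1)][min(x + s, w - 1)])
                        for x in range(w)] for y in range(h)])
    return levels


def four_tile_zmax(levels, x0, y0, w, h):
    """Conservative max depth over the bbox (x0, y0, w, h) from four lookups."""
    k = ((max(w, h) + 1) // 2 - 1).bit_length()  # smallest k with 2 * 2^k >= max(w, h)
    s = 1 << k
    xs = (x0, max(x0, x0 + w - s))               # left and right tile columns
    ys = (y0, max(y0, y0 + h - s))               # bottom and top tile rows
    return max(levels[k][y][x] for y in ys for x in xs)


def brute_force_zmax(depth, x0, y0, w, h):
    """Reference answer: scan every texel of the bbox."""
    return max(depth[y][x] for y in range(y0, y0 + h) for x in range(x0, x0 + w))


depth = [[0.2] * 8 for _ in range(8)]
depth[6][6] = 0.9
levels = build_nbuffers(depth, 3)
print(four_tile_zmax(levels, 0, 0, 5, 5), brute_force_zmax(depth, 0, 0, 5, 5))
```

With this placement the result is exact whenever the tile side does not exceed the shorter side of the bbox. For elongated boxes the tiles overhang the bbox, so the result is only an upper bound, which is still safe for culling.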

Applications • Occlusion culling • Particles • Shadow volume clamping

Occlusion Culling • N-Buffer vs. Occlusion Queries – walkthrough in city-like scene – occluders at frame n = visible at frame n-1 • Measured the number of depth tests – testing each building – using a hierarchy of bounding volumes

Occlusion Culling • Occlusion queries are faster – hardware implementation, available API – N-Buffers penalized • computation of 4 tiles coverage on CPU • use of glReadPixels to query levels • Occlusion queries can be interleaved with rendering [BWPP 04]

Occlusion Culling • # of depth tests smaller with N-Buffers – 4 tests/occludee << # of pixels rasterized • N-Buffers always benefit from hierarchy – testing A cheaper than testing children(A) – not the case for OQ [Figure: pixel counts n for a parent and n1, n2 for its children, with n > n1 + n2]

Hardware implementation? • Extra memory to store levels • Dedicated component for level updates – not all levels? – lazy updates? • Faster than OQ for large objects • Fixed number (4) of operations – simple implementation – good for parallelism

Applications • Occlusion culling • Particles • Shadow volume clamping

Particles • Particles rendered using ARB_point_sprite – no need to compute quad on CPU • Particles animated within GPU – up to a million particles in real-time • How to cull unseen particles? – cannot use OQ!

Particles • Using N-Buffers – for 16 x 16 point sprites • compute first 4 levels only • do one texture lookup in vertex program • Not implementable yet – vertex program lookups require LUMINANCE_FLOAT32_ATI – N-Buffers require DEPTH_COMPONENT

Applications • Occlusion culling • Particles • Shadow volume clamping

Shadow volumes clamping • Ignore unseen or fully shadowed casters • Clamp shadow volume to shadowed area [LWGM 04]

Shadow volumes clamping • From light’s view, what part of the (visible) scene does a shadow volume encompass? [Figure: light, camera, scene]

The litmap • Light’s view of what’s seen by the viewer [Figure: light’s view, camera’s view]

Shadow volumes clamping • From light’s view, what part of the (visible) scene does a shadow volume encompass? • Minimum/maximum depth covered by a shadow caster

Shadow volumes clamping • Compute two litmaps – furthest visible parts – closest visible parts • Compute N-Buffers for both • For each shadow caster – use N-Buffers to look up min/max depth of visible parts – cull and clamp accordingly • Can be done in vertex program [BS 03]
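The per-caster step might look like the following CPU sketch. Several things here are assumptions made for illustration: depths are distances from the light, litmap texels that see no visible scene point hold -inf in the "furthest" map and +inf in the "closest" map, a caster is culled when nothing visible lies behind it, and the region query uses four corner-anchored tiles. The names `region_query` and `clamp_caster` are hypothetical.

```python
import math


def build_nbuffers(base, num_levels, combine):
    """N-Buffers with a pluggable reduction: max for the furthest-parts litmap,
    min for the closest-parts litmap. Borders are clamped to the edge."""
    h, w = len(base), len(base[0])
    levels = [base]
    for i in range(num_levels):
        prev, s = levels[-1], 1 << i
        levels.append([[combine(prev[y][x],
                                prev[y][min(x + s, w - 1)],
                                prev[min(y + s, h - 1)][x],
                                prev[min(y + s, h - 1)][min(x + s, w - 1)])
                        for x in range(w)] for y in range(h)])
    return levels


def region_query(levels, x0, y0, w, h, combine):
    """Conservative min or max over the bbox, from four lookups."""
    k = ((max(w, h) + 1) // 2 - 1).bit_length()  # smallest k with 2 * 2^k >= max(w, h)
    s = 1 << k
    xs = (x0, max(x0, x0 + w - s))
    ys = (y0, max(y0, y0 + h - s))
    return combine(levels[k][y][x] for y in ys for x in xs)


def clamp_caster(far_levels, near_levels, bbox, caster_depth):
    """Return None to cull the caster, else the (start, end) depth range
    to which its shadow volume can be clamped."""
    z_far = region_query(far_levels, *bbox, max)
    z_near = region_query(near_levels, *bbox, min)
    if z_far <= caster_depth:          # nothing visible behind the caster
        return None
    return (max(caster_depth, z_near), z_far)


far = [[10.0] * 4 for _ in range(4)]
near = [[6.0] * 4 for _ in range(4)]
far[3][3], near[3][3] = -math.inf, math.inf  # texel where the viewer sees nothing
far_levels = build_nbuffers(far, 2, max)
near_levels = build_nbuffers(near, 2, min)
print(clamp_caster(far_levels, near_levels, (0, 0, 2, 2), 2.0))
print(clamp_caster(far_levels, near_levels, (0, 0, 2, 2), 12.0))
```

Over-covering the footprint can only lower the minimum and raise the maximum, so the clamped range errs on the side of being too long, never too short.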

Shadow volumes clamping

Shadow volumes clamping • Simpler than CC shadow volumes [LWGM 04] – single slice – not optimized (no hardware support) • Reduces the fill rate by more than 80%

Conclusion • Novel representation for depth maps – for encoding depth information over a region – fast to compute • possible implementation on hardware – fixed number of operations for query • queries available in vertex/fragment programs • Applications – can’t beat (yet) hardware optimized approaches – more a proof of concept

Future work • Not limited to culling – depth maps used for relief [OB 00, PNC 05] • Other neighborhood basis – RIP maps [KLK 00] • Link to theory of Wavelet Zoom [Mal 01]

Questions