PARTIALLY RESIDENT TEXTURES ON NEXT-GENERATION GPUS Bill Bilodeau, Graham Sellers, Karl Hillesland - AMD
AGENDA FOR TODAY’S TALK
§ Part 1 – Introduction to HD 7970 and Partially Resident Textures, Bill Bilodeau
§ Part 2 – Implementation in OpenGL, Graham Sellers
§ Part 3 – Ptex, an example PRT application, Karl Hillesland
2 | Partially Resident Textures on Next-Generation AMD GPUs | March 8, 2012
PART 1 INTRODUCTION TO THE RADEON HD 7970 AND PARTIALLY RESIDENT TEXTURES
WHAT ARE PARTIALLY RESIDENT TEXTURES?
§ Partially Resident Textures (PRTs) are textures that have only portions of the texture stored in GPU video memory
§ The best-known example of virtual texturing (a software implementation) is John Carmack’s “MegaTextures”
[Image from id Software’s Rage]
RADEON HD 7970 OVERVIEW
§ World’s first GPU to have dedicated hardware for Partially Resident Textures
§ Completely new shader architecture
§ Improved cache and memory bandwidth
§ World’s first Direct3D® 11.1 GPU
PREVIOUS SHADER ARCHITECTURE
§ Previous AMD GPUs used a VLIW (Very Long Instruction Word) architecture
– Combines instructions into a 4-wide VLIW that gets executed on a SIMD
[Figure: shader instructions packed into VLIW slots X, Y, Z and W and executed by threads 0 to 63; some slots are idle]
NEW SHADER ARCHITECTURE
§ 64-wide SIMD architecture without VLIW instructions
– No need to combine instructions, since multiple threads can run in parallel
[Figure: shader instructions issued one at a time across ALUs S0 to S63. No idle ALUs!]
COMPUTE UNITS ARE THE NEW BASIC BUILDING BLOCK FOR SHADERS
§ Each Compute Unit consists of 4 SIMDs and one scalar unit
§ Higher execution efficiency
§ Simplified logic design
§ Simplified assembly language
§ HD 7970 has 32 Compute Units
– 4 SIMDs per CU
[Figure: Compute Unit]
ADDITIONAL FEATURES OF THE HD 7970
§ Improved tessellation performance
§ Improved geometry shader performance
§ Fast depth accept for fully visible triangles, depth bounds testing support
§ 384-bit memory bus
§ DX 11.1
§ And of course, Partially Resident Texture support!
INTRODUCTION TO PARTIALLY RESIDENT TEXTURES
§ Enables an application to manage more texture data than can physically fit in a fixed footprint
– A.k.a. virtual texturing or sparse texturing
§ The principle behind PRT is that not all of a texture’s contents are likely to be needed at any given time
– The current render view may only require selected portions of the texture to be resident in memory
– Or selected MIP map levels
§ PRT textures only have a portion of their data mapped into GPU-accessible memory at a given time
PRT TILES
§ The PRT texture is chunked into 64 KB tiles
– Fixed memory size
– Not dependent on texture type or format
[Figure: a chunked texture; highlighted areas represent texture data that needs the highest resolution, i.e. the texture tiles that need to be resident in GPU memory. Images from “Sparse Virtual Textures”, Sean Barrett, GDC 2008]
TRANSLATION TABLE
§ The GPU virtual memory page table translates tiles into a resident texture tile pool
[Figure: Texture Map → Page Table (mapped and unmapped page entries) → Texture Tile Pool (video memory, linear storage) of 64 KB tiles. Images from “Sparse Virtual Textures”, Sean Barrett, GDC 2008]
TRANSLATION TABLE - MIPMAPS
§ MIP maps can be included in the Texture Tile Pool
[Figure: Texture Map (with MIP levels) → Page Table (mapped and unmapped page entries) → Texture Tile Pool (video memory) of 64 KB tiles. Images from “Sparse Virtual Textures”, Sean Barrett, GDC 2008]
“FAILED” TEXEL FETCH CONDITION
§ How does the application know which texture tiles to upload?
§ Answer: PRT-specific texture fetch instructions in a shader
– Return a “Failed” texel fetch condition when sampling a PRT pixel whose tile is currently not in the pool
§ This information is then stored in a render target or UAV
– Texel fetch failed for a given (x, y) tile location
§ ...and then copied to the CPU so that the application can upload the required tiles
§ App chooses what to render until the missing data gets uploaded
“LOD WARNING” TEXEL FETCH CONDITION
§ PRT fetch condition code can also indicate an “LOD Warning”
§ The minimum LOD warning is specified by the application on a per-texture basis
§ If a fetched pixel’s LOD is below the specified LOD warning value then the condition code is returned
§ This functionality is typically used to try to predict when higher-resolution MIP levels are going to be needed
– E.g. camera getting closer to PRT-mapped geometry
EXAMPLE USAGE
§ 1) App allocates a PRT (e.g. 16k x 16k DXT1) using the PRT API
§ 2) App uploads MIP levels using API calls
§ 3) Shader fetches PRT data at specified texcoords. Two possibilities:
– 3a) Texel data belongs to a resident (64 KB) tile: valid color returned, no error code
– 3b) Texel data points to a non-resident tile or the specified LOD: Error/LOD Warning code returned, and the shader writes the tile location and error code to an RT or UAV
§ 4) App reads the RT or UAV and uploads/releases tiles as needed
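To put the allocation in step 1 in perspective, the C sketch below counts how many 64 KB tiles the base level of a 16k x 16k DXT1 texture would span. It assumes the standard DXT1 layout of 8 bytes per 4x4 texel block; the function names are illustrative and not part of any PRT API.

```c
#include <assert.h>
#include <stdint.h>

#define PRT_TILE_BYTES 65536u  /* 64 KB tile size stated in the talk */

/* Bytes needed for one DXT1 (BC1) level: 8 bytes per 4x4 texel block. */
static uint64_t dxt1_level_bytes(uint32_t width, uint32_t height)
{
    assert(width > 0 && height > 0);
    uint64_t blocks_x = (width + 3u) / 4u;
    uint64_t blocks_y = (height + 3u) / 4u;
    return blocks_x * blocks_y * 8u;
}

/* Number of 64 KB tiles needed to cover a byte count, rounded up. */
static uint64_t tiles_for_bytes(uint64_t bytes)
{
    return (bytes + PRT_TILE_BYTES - 1u) / PRT_TILE_BYTES;
}
```

For the 16k x 16k case this gives 128 MB of base-level data spread over 2048 tiles, of which only the tiles the current view touches need physical memory.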
PRT ADVANTAGES VS SOFTWARE IMPLEMENTATION
§ Ease of implementation
– Eliminates the complexity and limitations of SW solutions
§ Full filtering support
– Includes anisotropic filtering
§ Full-speed filtering
– SW solution requires “manual” filtering in the pixel shader
– Can be quite costly if anisotropic filtering is used
§ Don’t go overboard with PRT allocation!
– Page table entry size is 4 DWORDs
– Page table entries have to be resident in video memory
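The warning about over-allocating can be made concrete with a little arithmetic. The C sketch below estimates page table overhead for a virtual allocation, assuming (this is not stated in the talk) exactly one 4-DWORD entry per 64 KB tile; the names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define PRT_TILE_BYTES 65536u        /* 64 KB per tile */
#define PAGE_TABLE_ENTRY_BYTES 16u   /* 4 DWORDs x 4 bytes each */

/* Page table bytes for a virtual allocation, ASSUMING one entry per tile. */
static uint64_t page_table_bytes(uint64_t virtual_bytes)
{
    assert(virtual_bytes <= UINT64_MAX - PRT_TILE_BYTES); /* no overflow */
    uint64_t tiles = (virtual_bytes + PRT_TILE_BYTES - 1u) / PRT_TILE_BYTES;
    return tiles * PAGE_TABLE_ENTRY_BYTES;
}
```

Under that assumption, 1 GB of virtual texture space costs 256 KB of page table and 64 GB costs 16 MB, all of which must stay in video memory whether or not any tile is mapped.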
PART 2 IMPLEMENTATION IN OPENGL AMD_SPARSE_TEXTURE EXTENSION
OPENGL EXTENSION | AMD_SPARSE_TEXTURE
§ Partially Resident Textures are exposed in OpenGL via an extension
§ Two design goals for the extension
– Minimally invasive to the API
  § Easy to retrofit into an existing application
  § Plays well with non-sparse textures
– Easy fallback path
  § Most of the same code will work in the absence of the extension
§ Two parts to the extension
– Update to the API: 1 function, a handful of tokens
– Update to the shading language
UPLOAD TEXTURES | EXAMPLE USING EXISTING OPENGL API
§ Use of immutable texture storage
    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexStorage2D(GL_TEXTURE_2D, 10, GL_RGBA8, 1024, 1024);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 1024, 1024, GL_RGBA, GL_UNSIGNED_BYTE, data);
§ This is the existing OpenGL immutable storage API: declare storage, then specify image data
UPLOAD TEXTURES | EXAMPLE USING NEW OPENGL EXTENSION
§ Use of sparse texture storage
    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexStorageSparseAMD(GL_TEXTURE_2D, GL_RGBA, 1024, 1024, 1, 1, GL_TEXTURE_STORAGE_SPARSE_BIT_AMD);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 1024, 1024, GL_RGBA, GL_UNSIGNED_BYTE, data);
§ glTexStorageSparseAMD is the one new function in the extension
– Notice how little differs from the previous API
MAKE PAGES RESIDENT | REUSE EXISTING API
§ Previous example used glTexSubImage2D
– Upload a sub-region of the texture
– Physical pages are allocated on demand by the OpenGL driver
– Unused pages remain free
    glTexStorageSparseAMD(GL_TEXTURE_2D, GL_RGBA, 1024, 1024, 1, 10, GL_TEXTURE_STORAGE_SPARSE_BIT_AMD);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 256, 256, GL_RGBA, GL_UNSIGNED_BYTE, data1);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 768, 768, 256, 256, GL_RGBA, GL_UNSIGNED_BYTE, data2);
§ Only enough storage for the two 256 x 256 regions is allocated
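How many physical pages an upload commits depends on which page-sized cells the region overlaps. The C sketch below counts them for a rectangular upload. The page dimensions are inputs here; 128 x 128 texels for a 32-bit format is an assumption used for illustration, and the function name is hypothetical.

```c
#include <assert.h>

/* Pages overlapped by a w x h texel region at (x, y), given the page
 * dimensions in texels. Page dimensions are an input here; on real
 * hardware they would be queried from the driver. */
static unsigned pages_touched(unsigned x, unsigned y, unsigned w, unsigned h,
                              unsigned page_w, unsigned page_h)
{
    assert(w > 0 && h > 0 && page_w > 0 && page_h > 0);
    unsigned first_px = x / page_w;
    unsigned last_px  = (x + w - 1) / page_w;
    unsigned first_py = y / page_h;
    unsigned last_py  = (y + h - 1) / page_h;
    return (last_px - first_px + 1) * (last_py - first_py + 1);
}
```

With 128 x 128 pages, each page-aligned 256 x 256 upload touches 4 pages. An unaligned region of the same or smaller size can touch just as many, which is a reason to keep uploads page-aligned.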
FREE PHYSICAL PAGES | AGAIN, REUSE EXISTING API
§ Passing NULL to glTexSubImage2D makes pages non-resident
– Driver returns physical pages to the pool
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 256, 256, GL_RGBA, GL_UNSIGNED_BYTE, NULL);
PAGE SIZES | DETERMINING PAGE SIZES
§ Sparse textures rely on the VM subsystem
– Pages are 64 KB in size on Southern Islands
§ Note that size is measured in bytes, not texels
– Texel size of a page depends on texture format

BPP      128    64     32      16      8
Texels   4096   8192   16384   32768   65536

BPP: 128, BC2/3/5/6H/7, 64, BC1/4, 32, 16, 8
Tile Width: 64, 256, 128, 512, 128, 256
Tile Height: 64, 256, 128, 128, 256
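The texel counts per page follow directly from dividing the 64 KB page by the texel size. A minimal C sketch of that arithmetic for uncompressed formats (the function name is illustrative):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_BYTES 65536u  /* 64 KB page on Southern Islands, per the talk */

/* Texels that fit in one page for an uncompressed format. */
static uint32_t texels_per_page(uint32_t bits_per_texel)
{
    assert(bits_per_texel > 0 && bits_per_texel % 8u == 0);
    return PAGE_BYTES / (bits_per_texel / 8u);
}
```

An 8-bit format therefore holds 65536 texels per page, and each doubling of texel size halves the count.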
PAGE SIZE | RETRIEVING PAGE SIZE FROM OPENGL
§ Reuse existing API: glGetInternalformativ
– New OpenGL tokens: GL_VIRTUAL_PAGE_SIZE_{X, Y, Z}_AMD
    GLint page_size_x;
    glGetInternalformativ(GL_TEXTURE_2D, GL_RGBA8, GL_VIRTUAL_PAGE_SIZE_X_AMD, 1, &page_size_x);
§ Given a target (texture dimensionality) and format, returns the page size
– It is not necessary to create a texture to get this information
MIPMAPS | DEALING WITH SMALL TEXTURES
§ Highest resolution LOD requires multiple pages
§ Each LOD requires fewer and fewer pages
§ Eventually, one LOD does not fill a page
– Now what?
§ At some point, we must make all LODs resident
– But which LOD?
§ Use glGetInternalformativ to retrieve the lowest sparse level for a given target/format
    GLint min_sparse_level;
    glGetInternalformativ(GL_TEXTURE_2D, GL_RGBA16F, GL_MIN_SPARSE_LEVEL_AMD, 1, &min_sparse_level);
– All levels below this reside in the same page and share residency
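The shrinking page count per LOD can be sketched in C as follows. This only illustrates the arithmetic: the value an application should actually use is whatever the driver reports for GL_MIN_SPARSE_LEVEL_AMD, which may differ. The function names and the 128 x 128 page size used in the note below are assumptions.

```c
#include <assert.h>

/* Dimension of a mip level, clamped to at least one texel. */
static unsigned mip_dim(unsigned base, unsigned level)
{
    assert(level < 32);
    unsigned d = base >> level;
    return d ? d : 1u;
}

/* Pages needed by one mip level, rounding partial pages up. */
static unsigned pages_for_level(unsigned base_w, unsigned base_h,
                                unsigned level,
                                unsigned page_w, unsigned page_h)
{
    assert(page_w > 0 && page_h > 0);
    unsigned w = mip_dim(base_w, level), h = mip_dim(base_h, level);
    return ((w + page_w - 1) / page_w) * ((h + page_h - 1) / page_h);
}

/* First level whose image is smaller than one page in either dimension. */
static unsigned first_subpage_level(unsigned base_w, unsigned base_h,
                                    unsigned page_w, unsigned page_h)
{
    assert(page_w > 1 && page_h > 1);  /* guarantees the loop terminates */
    unsigned level = 0;
    while (mip_dim(base_w, level) >= page_w &&
           mip_dim(base_h, level) >= page_h)
        ++level;
    return level;
}
```

For a 1024 x 1024 texture with 128 x 128 pages, levels 0 to 3 need 64, 16, 4 and 1 pages, and level 4 (64 x 64) is the first that no longer fills a page.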
LOD WARNING | LOW WATER MARK
§ To assist in streaming we include a per-texture low water mark
– Set this to the highest resolution LOD that’s fully resident
– Once you hit this, you’ll get a signal in the shader
  § Returned data is still valid
  § Signal says it’s time to start streaming the next mip
§ Exposed using the glTexParameter API
    glTexParameteri(GL_TEXTURE_2D, GL_MIN_WARNING_LOD_AMD, 4);
– Here, an LOD warning will be returned to the shader if hardware attempts to access LOD 4 or lower
§ More on residency returns later...
RENDERING TO PRT | ATTACH PRT TO FBO
§ It is possible to render to a PRT using an FBO
    GLuint prt, fbo;
    glGenTextures(1, &prt);
    glBindTexture(GL_TEXTURE_2D, prt);
    glTexStorageSparseAMD(GL_TEXTURE_2D, GL_RGBA, 1024, 1024, 1, 1, GL_TEXTURE_STORAGE_SPARSE_BIT_AMD);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 1024, 1024, GL_RGBA, GL_UNSIGNED_BYTE, data);
    glGenFramebuffers(1, &fbo);
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, prt, 0);
§ Writes to unmapped regions are simply dropped
READING FROM PRT | RETRIEVING DATA FROM PRTS
§ Applications can read PRTs to CPU memory using existing APIs
– Call glGetTexImage to read the entire content back
    glGetTexImage(GL_TEXTURE_2D, 0, GL_RGBA, GL_UNSIGNED_BYTE, data);
– Bind to an FBO and use glReadPixels or glBlitFramebuffer
  § Reads to system memory or into another FBO, respectively
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, prt, 0);
    glReadPixels(0, 0, 1024, 1024, GL_RGBA, GL_UNSIGNED_BYTE, data);
    glBlitFramebuffer(0, 0, 1024, 1024, 0, 0, 128, 128, GL_COLOR_BUFFER_BIT, GL_LINEAR);
RESTRICTIONS | MOSTLY EVERYTHING WORKS
§ There are some restrictions on the use of sparse textures
– Dimensions of the base level must be integer multiples of the page size (GL_VIRTUAL_PAGE_SIZE_{X, Y, Z}_AMD)
  § This means... no sparse textures below this size
– No buffer textures or “TBOs”; another extension is coming for that!
– No depth or stencil textures, nor MSAA textures
MANAGING FAILURE | MEMORY IS NOT UNLIMITED
§ Virtual address space is extremely large
– Tens to hundreds of gigabytes
– You will run out eventually, but it’ll take a while
§ Physical memory is still limited
– glTexSubImage2D etc. may fail
– Draw calls may fail
§ Feel free to create a 4k x 4k volume texture
– Don’t try to make it all resident at the same time!
§ There are no sparse read-backs
– glGetTexImage could read gigabytes of data back
– This will fail
SPARSE TEXTURES IN SHADERS | EXTENDING GLSL
§ First and most important: IT IS NOT NECESSARY TO MAKE SHADER CHANGES TO USE SPARSE TEXTURES
SPARSE TEXTURES IN SHADERS | EXTENDING GLSL
§ Basic type for textures in GLSL is the ‘sampler’
– Several types of samplers exist... sampler2D, sampler3D, samplerCube, sampler2DArray, etc.
– We didn’t add any new sampler types
§ PRTs look like regular textures in the shader
§ Textures are read using the ‘texture’ built-in function, its overloads and variants
    gvec4 texture(gsampler1D sampler, float P [, float bias]);
    gvec4 texture(gsampler2D sampler, vec2 P [, float bias]);
    gvec4 texture(gsampler2DArray sampler, vec3 P [, float bias]);
    gvec4 textureLod(gsampler2D sampler, vec2 P, float lod);
    gvec4 textureProj(gsampler2D sampler, vec4 P [, float bias]);
    gvec4 textureOffset(gsampler2D sampler, vec2 P, ivec2 offset [, float bias]);
    // ... etc.
– We didn’t add any overloads
EXTENDING GLSL | NEW BUILT-IN FUNCTIONS
§ Adding more overloads to existing functions was difficult
– Need to return a status code and a texel
– Need user-specified defaults with conditional-move-like functionality
– Optional parameters in existing overloads made this very difficult
§ Added new built-in functions
– New built-in functions return a status code
– New built-in functions return texel data via inout parameters
    int sparseTexture(gsampler2D sampler, vec2 P, inout gvec4 texel [, float bias]);
    int sparseTextureLod(gsampler2D sampler, vec2 P, float lod, inout gvec4 texel);
    // ... etc.
– Most existing texture functions have a sparseTexture equivalent
§ Non-PRTs work with the new functions
– They appear as fully resident PRTs
EXTENDING GLSL | SPARSETEXTURE FUNCTIONS
§ All sparseTexture functions return two pieces of data:
    int sparseTexture(gsampler2D sampler, vec2 P, inout gvec4 texel [, float bias]);
– Texel data via inout parameter
– Residency status code
§ Texel data is returned in the inout parameter
– If the texel fetch fails, old data remains in the variable
– Think of it as a CMOV-type operation
§ Return code is hardware-dependent bit-field information
– More built-in functions for decoding status codes
– This allows us to extend this further in the future, or to change the implementation
SPARSETEXTURE FUNCTIONS | TEXTURE DATA RETURN
§ Texel data is returned in the inout parameter
– No direct support for ‘default value’ behavior
– This is emulated in the shader:
    vec4 texel = vec4(1.0, 0.7, 1.0, 1.0); // Default value
    sparseTexture(s, texCoord, texel);
    // On success, texel contains texture data. On failure, it has the
    // shader-supplied default value in it (pinkish magenta here).
§ Note that regular texture fetch functions work on PRTs too:
    vec4 texel = texture(s, texCoord);
– Value of texel is undefined if you miss...
§ ...but feel free to use them on known-resident data (atlases, explicit LOD, etc.)
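The conditional-move behaviour of the inout texel can be mimicked on the CPU to make the semantics explicit. The C sketch below is an emulation only: mock_sparse_fetch and the Texel struct are hypothetical stand-ins, not part of GLSL or the extension, and residency is passed in as a flag rather than decoded from a hardware status code.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct { float r, g, b, a; } Texel;

/* Hypothetical stand-in for a sparse fetch: writes *out only when the
 * sampled tile is resident, and reports residency as the return value. */
static bool mock_sparse_fetch(bool tile_resident, Texel stored, Texel *out)
{
    assert(out != 0);
    if (tile_resident)
        *out = stored;     /* success: overwrite the caller's default */
    return tile_resident;  /* failure: *out keeps its previous contents */
}
```

The caller initializes the texel with its default before the fetch, exactly as in the shader snippet, and never needs a separate branch to restore it on a miss.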
SPARSETEXTURE FUNCTIONS | RESIDENCY DATA RETURN
§ Residency data is bit-packed into the return value from the fetch
    vec4 texel = vec4(1.0, 0.7, 1.0, 1.0); // Default value
    int code;
    code = sparseTexture(s, texCoord, texel);
§ After this, code can be interpreted by three additional functions:
    bool sparseTexelResident(int code);
    bool sparseTexelMinLodWarning(int code);
    int sparseTexelLodWarningFetch(int code);
RESIDENCY DATA | SPARSETEXELRESIDENT
§ sparseTexelResident simply indicates whether the data fetched is valid
    bool sparseTexelResident(int code);
§ Returns true if data is valid, false otherwise
§ A texel miss is generated if any required sample is not resident, including:
– Texels required for bilinear or trilinear sampling
– Missing mip maps
– Anisotropic filter taps
§ It is up to the shader to ‘do the right thing’
– Fall back to lower mips
– Write out to an image or framebuffer attachment
– etc.
RESIDENCY DATA | SPARSETEXELMINLODWARNING
§ sparseTexelMinLodWarning returns true if a min LOD warning was generated
    bool sparseTexelMinLodWarning(int code);
– This occurs when generating the returned texel required fetching from an LOD lower than the low-water mark specified by the application
– This can be a signal to the application to start streaming more mip levels
RESIDENCY DATA | SPARSETEXELLODWARNINGFETCH
§ Returns the LOD that caused the low-water-mark warning to be generated
    int sparseTexelLodWarningFetch(int code);
– This also causes sparseTexelMinLodWarning to return true
– sparseTexelLodWarningFetch returns 0 if the warning was not hit
EXAMPLE USE CASES | WHAT CAN I USE THIS FOR?
§ Drop-in replacement for traditional 2D Sparse Virtual Texture (SVT)
– Well, almost: maximum texture size hasn’t increased
§ Very large texture arrays
– Sparsely populate the array
– Can almost eliminate texture binds in some applications
§ Volume textures + ray marching
– Sparse or homogeneous media
– Default value is the maximum step distance for ray marching distance fields
§ Arrays of variable-sized textures
– Make a large array, but populate different mip levels in each slice
– Store LOD bias per array slice in an auxiliary array (UBO, for example)
§ Etc.
PART 3 PRT PTEX USING SPARSE TEXTURES
PTEX | INTRODUCTION
Ptex: Per-face Texture Mapping for Production Rendering [Burley and Lacewell, 2008]
§ No UV setup (it’s implicit)
§ No seams
§ Per-patch resolution control
§ Out-of-core performance advantages
PTEX | INTRODUCTION
Ptex: Per-face Texture Mapping for Production Rendering [Burley and Lacewell, 2008]
§ Per-face textures + MIPs
§ Adjacency for filtering
BORDERS FOR FILTERING
[Figure: Face Texture A and Face Texture B]
MANUAL TRILINEAR FILTERING
[Figure: Resolution (ddx, ddy) → lookups at floor and floor + 1 → Lerp weighted by frac]
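The floor / floor + 1 / frac / Lerp pipeline on this slide can be sketched in C as below. This is a minimal illustration under assumptions: the level of detail is taken as an input rather than derived from ddx/ddy, each level is reduced to a single float via a hypothetical callback, and all names are invented for the example.

```c
#include <assert.h>

/* Hypothetical per-level sample callback: returns the filtered value
 * fetched from one resolution level. */
typedef float (*SampleLevelFn)(int level, void *user);

/* Example callback used for demonstration: each level returns its index. */
static float level_as_value(int level, void *user)
{
    (void)user;
    return (float)level;
}

static float lerp_f(float a, float b, float t) { return a + (b - a) * t; }

/* Blend the two levels that bracket a fractional level of detail. */
static float trilinear(float lod, int max_level,
                       SampleLevelFn sample, void *user)
{
    assert(sample != 0 && max_level >= 0);
    if (lod < 0.0f) lod = 0.0f;
    if (lod > (float)max_level) lod = (float)max_level;
    int lo = (int)lod;                      /* floor, valid since lod >= 0 */
    int hi = lo < max_level ? lo + 1 : lo;  /* floor + 1, clamped */
    float t = lod - (float)lo;              /* frac */
    return lerp_f(sample(lo, user), sample(hi, user), t);
}
```

In a Ptex setting the two lookups would hit different resolution slices, which is why the blend has to be done by hand instead of relying on the hardware MIP chain.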
PRT PTEX
§ Packed in one texture array
– Slice per resolution
– Resolution includes MIPs
– Cannot fit in a standard MIP chain
– Easy lookups
– Easy resolution management
– Still one texture
PRT PTEX PRAGMATICS
§ Better organization possibilities
– Pack pages
– Scaled squares
§ Other methods
– Packed Ptex: all in one texture slice
– Face per slice, array per resolution
MULTIRES SLICES
MIP FALLBACK
[Figure: Resolution (ddx, ddy) → lookups at floor and floor + 1 → Lerp weighted by frac]
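One way to read "MIP fallback" is: when the wanted resolution level is not resident, step toward coarser levels until a resident one is found. The C sketch below shows that search on the CPU side for clarity. It is an assumption about the approach, not the presenters' shader code, and the function name and residency array are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Walk from the wanted level toward coarser levels until one is resident.
 * resident[i] says whether level i can be sampled; level_count - 1 is the
 * coarsest level. Returns -1 if no level at or beyond the wanted one is
 * resident. */
static int fallback_level(const bool *resident, int level_count, int wanted)
{
    assert(resident != 0 && level_count > 0);
    if (wanted < 0) wanted = 0;
    for (int level = wanted; level < level_count; ++level)
        if (resident[level])
            return level;
    return -1;
}
```

Keeping the coarsest level always resident guarantees the search succeeds, so the renderer always has some texel data to show while finer tiles stream in.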
Demo
Trademark Attribution AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. Other names used in this presentation are for identification purposes only and may be trademarks of their respective owners. © 2012 Advanced Micro Devices, Inc. All rights reserved.