
Vertex Shader Tricks: New Ways to Use the Vertex Shader to Improve Performance
Bill Bilodeau, Developer Technology Engineer, AMD

Topics Covered
● Overview of the DX11 front-end pipeline
● Common bottlenecks
● Advanced Vertex Shader Features
● Vertex Shader Techniques
● Samples and Results

DX11 Front-End Pipeline
● VS – vertex data
● HS – control points
● Tessellator
● DS – generated vertices
● GS – primitives
● Write to UAV at all stages
  ● Starting with DX11.1
Diagram (Graphics Hardware): Input Assembler → Vertex Shader → Hull Shader → Tessellator → Domain Shader → Geometry Shader → ... → Stream Out, with CB, SRV, or UAV access from the shader stages

Bottlenecks - VS
● VS Attributes
  ● Limit outputs to 4 attributes (AMD)
    ● This applies to all shader stages (except PS)
● VS Texture Fetches
  ● Too many texture fetches can add latency
    ● Especially dependent texture fetches
    ● Group fetches together for better performance
    ● Hide latency with ALU instructions

Bottlenecks - VS
● Use the caches wisely
  ● Avoid large vertex formats that waste pre-VS cache space
  ● DrawIndexed() allows for reuse of processed vertices saved in the post-VS cache
    ● Vertices with the same index only need to get processed once
Diagram: Input Assembler → Pre-VS Cache (Hides Latency) → Vertex Shader → Post-VS Cache (Vertex Reuse)

Bottlenecks - GS
● Can add or remove primitives
● Adding new primitives requires storing new vertices
  ● Going off chip to store data can be a bandwidth issue
● Using the GS means another shader stage
  ● This means more competition for shader resources
  ● Better if you can do everything in the VS

Advanced Vertex Shader Features
● SV_VertexID, SV_InstanceID
● UAV output (DX11.1)
● NULL vertex buffer
  ● VS can create its own vertex data

SV_VertexID
● Can use the vertex id to decide what vertex data to fetch
● Fetch from SRV, or procedurally create a vertex

    VSOut VertexShader(uint id : SV_VertexID)
    {
        float3 vertex = g_VertexBuffer[id];
        …
    }

UAV buffers
● Write to UAVs from a Vertex Shader
  ● New feature in DX11.1 (UAV at any stage)
● Can be used instead of stream-out for writing vertex data
  ● Triangle output not limited to strips
  ● You can use whatever format you want
● Can output anything useful to a UAV

NULL Vertex Buffer
● DX11/DX10 allows this
  ● Just set the number of vertices in Draw()
  ● VS will execute without a vertex buffer bound
● Can be used for instancing
  ● Call Draw() with the total number of vertices
  ● Bind mesh and instance data as SRVs

Vertex Shader Techniques
● Full Screen Triangle
● Vertex Shader Instancing
  ● Merged Instancing
● Vertex Shader UAVs

Full Screen Triangle
● For post-processing effects
  ● Triangle has better performance than quad
● Fast and easy with VS generated coordinates
  ● No IB or VB is necessary
● Something you should be using for full screen effects
Diagram (Clip Space Coordinates): triangle with vertices (-1, 3, 0), (3, -1, 0), (-1, -1, 0)

Full Screen Triangle: C++ code

    // Null VB, IB (IASetVertexBuffers takes buffers, strides, and offsets)
    pd3dImmediateContext->IASetVertexBuffers( 0, 0, NULL, NULL, NULL );
    pd3dImmediateContext->IASetIndexBuffer( NULL, (DXGI_FORMAT)0, 0 );
    pd3dImmediateContext->IASetInputLayout( NULL );

    // Set Shaders
    pd3dImmediateContext->VSSetShader( g_pFullScreenVS, NULL, 0 );
    pd3dImmediateContext->PSSetShader( … );
    pd3dImmediateContext->PSSetShaderResources( … );
    pd3dImmediateContext->IASetPrimitiveTopology( D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST );

    // Render 3 vertices for the triangle
    pd3dImmediateContext->Draw(3, 0);

Full Screen Triangle: HLSL Code

    VSOutput VSFullScreenTest(uint id : SV_VERTEXID)
    {
        VSOutput output;

        // generate clip space position
        output.pos.x = (float)(id / 2) * 4.0 - 1.0;
        output.pos.y = (float)(id % 2) * 4.0 - 1.0;
        output.pos.z = 0.0;
        output.pos.w = 1.0;

        // texture coordinates
        output.tex.x = (float)(id / 2) * 2.0;
        output.tex.y = 1.0 - (float)(id % 2) * 2.0;

        // color
        output.color = float4(1, 1, 1, 1);

        return output;
    }

Clip Space Coordinates: id 0 → (-1, -1, 0), id 1 → (-1, 3, 0), id 2 → (3, -1, 0)
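The vertex-ID arithmetic in the shader can be checked on the CPU. The following is a minimal C++ sketch that mirrors the HLSL expressions; the struct and function names (FullScreenVertex, fullScreenVertex) are made up for this illustration and are not part of the presentation's sample code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical CPU-side mirror of the shader output (names are illustrative).
struct FullScreenVertex {
    float x, y, z, w;  // clip-space position
    float u, v;        // texture coordinates
};

// Same integer/float arithmetic as the HLSL: id is 0, 1 or 2.
FullScreenVertex fullScreenVertex(std::uint32_t id) {
    assert(id < 3);  // Draw(3, 0) only produces ids 0..2
    FullScreenVertex out;
    out.x = static_cast<float>(id / 2) * 4.0f - 1.0f;
    out.y = static_cast<float>(id % 2) * 4.0f - 1.0f;
    out.z = 0.0f;
    out.w = 1.0f;
    out.u = static_cast<float>(id / 2) * 2.0f;
    out.v = 1.0f - static_cast<float>(id % 2) * 2.0f;
    return out;
}
```

The visible screen covers x and y in [-1, 1], so the triangle overshoots on the right and top and the rasterizer clips the excess. Interpolating the texture coordinates gives (0, 0) at the top-left corner of the screen and (1, 1) at the bottom-right.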

VS Instancing: Point Sprites
● Often done on GS, but can be faster on VS
  ● Create an SRV point buffer and bind to VS
  ● Call Draw or DrawIndexed to render the full triangle list
  ● Read the location from the point buffer and expand to vertex location in quad
● Can be used for particles or Bokeh DOF sprites
● Don’t use DrawInstanced for a small mesh

Point Sprites: C++ Code

    pd3d->IASetIndexBuffer( g_pParticleIndexBuffer, DXGI_FORMAT_R32_UINT, 0 );
    pd3d->IASetPrimitiveTopology( D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST );
    pd3dImmediateContext->DrawIndexed( g_particleCount * 6, 0, 0 );
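The slide does not show how g_pParticleIndexBuffer is filled. One plausible layout, sketched below in C++, uses four vertex ids and six indices per particle so that id / 4 selects the particle and id % 4 selects the corner. The function name and the corner order (0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right, both triangles clockwise) are assumptions chosen to match the corner signs in the vertex shader, not something stated in the presentation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Builds the CPU-side contents of a 32-bit index buffer for quad sprites.
// Assumed layout: particle p owns vertex ids 4p .. 4p+3.
std::vector<std::uint32_t> buildParticleIndices(std::uint32_t particleCount) {
    // Keep particleCount * 6 within 32 bits, matching DrawIndexed's UINT count.
    assert(particleCount <= UINT32_MAX / 6);

    // Two clockwise triangles per quad: (0,1,2) and (2,1,3).
    static const std::uint32_t kQuad[6] = {0, 1, 2, 2, 1, 3};

    std::vector<std::uint32_t> indices;
    indices.reserve(static_cast<std::size_t>(particleCount) * 6);
    for (std::uint32_t p = 0; p < particleCount; ++p) {
        for (std::uint32_t i = 0; i < 6; ++i) {
            indices.push_back(p * 4 + kQuad[i]);
        }
    }
    return indices;
}
```

With this layout each quad references only four distinct vertex ids, so the two shared corners can be served from the post-VS cache instead of being shaded twice.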

Point Sprites: HLSL Code

    VSInstancedParticleDrawOut VSIndexBuffer(uint id : SV_VERTEXID)
    {
        VSInstancedParticleDrawOut output;

        uint particleIndex = id / 4;
        uint vertexInQuad = id % 4;

        // calculate the position of the vertex
        float3 position;
        position.x = (vertexInQuad % 2) ? 1.0 : -1.0;
        position.y = (vertexInQuad & 2) ? -1.0 : 1.0;
        position.z = 0.0;
        position.xy *= PARTICLE_RADIUS;

        position = mul( position, (float3x3)g_mInvView ) + g_bufPosColor[particleIndex].pos.xyz;

        output.pos = mul( float4(position, 1.0), g_mWorldViewProj );
        output.color = g_bufPosColor[particleIndex].color;

        // texture coordinate
        output.tex.x = (vertexInQuad % 2) ? 1.0 : 0.0;
        output.tex.y = (vertexInQuad & 2) ? 1.0 : 0.0;

        return output;
    }

Point Sprite Performance
Bar chart: time in ms (axis 0 to 12), AMD Radeon R9 290x vs. Nvidia Titan. Data labels in chart order: Indexed, 500K sprites: 0.52; Non-Indexed, 500K sprites: 0.77, 0.87; GS, 500K sprites: 1.38, 0.83; DrawInstanced, 500K sprites: 1.77, 5.1; Indexed, 1M sprites: 1.02, 1.5; Non-Indexed, 1M sprites: 1.53, 2.7; GS, 1M sprites: 1.92, 1.6; DrawInstanced, 1M sprites: 3.54, 10.3

Point Sprite Performance
● DrawIndexed() is the fastest method
● Draw() is slower but doesn’t need an IB
● Don’t use DrawInstanced() for creating sprites on either AMD or NVidia hardware
  ● Not recommended for a small number of vertices
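The deck shows shader code only for the indexed path. For the Draw() path without an index buffer, one way to decode the vertex id is sketched below in C++: six vertex ids per particle, with a small lookup table that maps each of the six to a quad corner. The names and the table are assumptions for illustration; the presentation does not show its non-indexed shader.

```cpp
#include <cassert>
#include <cstdint>

// Which particle and which quad corner a vertex id refers to.
struct SpriteVertexRef {
    std::uint32_t particleIndex;
    std::uint32_t vertexInQuad;  // 0..3, same meaning as in the indexed shader
};

// Vertex count to pass to Draw() when no index buffer is bound.
std::uint32_t nonIndexedVertexCount(std::uint32_t particleCount) {
    assert(particleCount <= UINT32_MAX / 6);  // avoid 32-bit overflow
    return particleCount * 6;
}

// Decode a non-indexed vertex id: two triangles (0,1,2) and (2,1,3) per quad.
SpriteVertexRef decodeNonIndexedVertex(std::uint32_t id) {
    static const std::uint32_t kCornerOfVertex[6] = {0, 1, 2, 2, 1, 3};
    return SpriteVertexRef{id / 6, kCornerOfVertex[id % 6]};
}
```

Because every vertex id is unique in this path, the two shared corners of each quad are shaded twice, which is consistent with Draw() being slower than DrawIndexed() in the results above.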

Merge Instancing
● Combine multiple meshes that can be instanced many times
  ● Better than normal instancing which renders only one mesh
  ● Instance nearby meshes for smaller bounding box
● Each mesh is a page in the vertex data
  ● Fixed vertex count for each mesh
    ● Meshes smaller than page size use degenerate triangles

Merge Instancing
Diagram: Mesh Instance Data (Instance 0: Mesh Index 2; Instance 1: Mesh Index 0; ...) points into Mesh Vertex Data (Mesh Data 1, Mesh Data 2, ...). Each mesh occupies a fixed-length page of vertices (Vertex 0, 1, 2, 3, ...), padded with degenerate triangles.

Merged Instancing using VS
● Use the vertex ID to look up the mesh to instance
  ● All meshes are the same size, so (id / SIZE) can be used as an offset to the mesh
● Faster than using DrawInstanced()
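The page lookup can be sketched on the CPU as follows. This C++ example assumes a tiny page size, a per-instance translation as the only instance data, and padding by repeating the last vertex; all names (kPageSize, MeshInstance, padToPage, fetchMergedVertex) are hypothetical and stand in for whatever the real SRV layouts contain.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Float3 { float x, y, z; };

// Assumed per-instance record: which mesh page to draw and where to place it.
struct MeshInstance {
    std::uint32_t meshIndex;
    Float3 offset;
};

constexpr std::uint32_t kPageSize = 6;  // stands in for SIZE (vertices per page)

// Pad a triangle-list mesh to a full page. Repeating the last vertex yields
// zero-area (degenerate) triangles that produce no pixels.
std::vector<Float3> padToPage(std::vector<Float3> mesh) {
    assert(!mesh.empty() && mesh.size() <= kPageSize && mesh.size() % 3 == 0);
    const Float3 last = mesh.back();
    mesh.resize(kPageSize, last);
    return mesh;
}

// What the vertex shader would do with SV_VertexID: id / SIZE picks the
// instance, the instance picks the mesh page, id % SIZE picks the vertex.
Float3 fetchMergedVertex(std::uint32_t id,
                         const std::vector<MeshInstance>& instances,
                         const std::vector<Float3>& meshPages) {
    const std::uint32_t instanceIndex = id / kPageSize;
    const std::uint32_t vertexInPage = id % kPageSize;
    assert(instanceIndex < instances.size());
    const MeshInstance& inst = instances[instanceIndex];
    const std::size_t src =
        static_cast<std::size_t>(inst.meshIndex) * kPageSize + vertexInPage;
    assert(src < meshPages.size());
    const Float3& v = meshPages[src];
    return Float3{v.x + inst.offset.x, v.y + inst.offset.y, v.z + inst.offset.z};
}
```

A single Draw() of instances.size() * kPageSize vertices would then cover every instance, at the cost of shading the padding vertices of meshes smaller than a page.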

Merge Instancing Performance
● Instancing performance test by Cloud Imperium Games for Star Citizen
● Renders 13.5M triangles (~40M verts)
● DrawInstanced version calls DrawInstanced() and uses instance data in a vertex buffer
● Soft Instancing version uses vertex instancing with Draw() calls and fetches instance data from SRV
Bar chart: time in ms (axis 0 to 30), DrawInstanced vs. Soft Instancing, on AMD Radeon R9 290X and Nvidia GTX 780

Vertex Shader UAVs
● Random access Read/Write in a VS
● Can be used to store transformed vertex data for use in multi-pass algorithms
● Can be used for passing constant attributes between any shader stage (not just from VS)

Skinning to UAV
● Skin vertex data then output to UAV
  ● Instance the skinned UAV data multiple times
● Can also be used for non-instanced data
  ● Multiple passes can reuse the transformed vertex data
    – Shadow map rendering
● Performance is about the same as streamout, but you can do more …

Bounding Box to UAV
● Can calculate and store Bbox in the VS
  ● Use a UAV to store the min/max values (6)
  ● InterlockedMin/InterlockedMax determine min and max of the bbox
    ● Need to use integer values with atomics
● Use the stored bbox in later passes
  ● GPU physics (collision)
  ● Tile based processing

Bounding Box: HLSL Code

    void UAVBBoxSkinVS(VSSkinnedIn input, uint id : SV_VERTEXID )
    {
        // skin the vertex
        ...

        // output the max and min for the bounding box
        int x = (int) (vSkinnedPos.x * FLOAT_SCALE);  // convert to integer
        int y = (int) (vSkinnedPos.y * FLOAT_SCALE);
        int z = (int) (vSkinnedPos.z * FLOAT_SCALE);

        InterlockedMin(g_BBoxUAV[0], x);
        InterlockedMin(g_BBoxUAV[1], y);
        InterlockedMin(g_BBoxUAV[2], z);
        InterlockedMax(g_BBoxUAV[3], x);
        InterlockedMax(g_BBoxUAV[4], y);
        InterlockedMax(g_BBoxUAV[5], z);
        ...
    }
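The same fixed-point min/max accumulation can be emulated on the CPU with std::atomic, which makes the data flow easy to test. In this C++ sketch the scale value, the buffer type, and the helper names are assumptions; the compare-exchange loops stand in for HLSL's InterlockedMin and InterlockedMax. It also assumes the six values are reset to INT_MAX (mins) and INT_MIN (maxes) before the pass, which the slide does not show.

```cpp
#include <atomic>
#include <cassert>
#include <climits>
#include <cmath>

constexpr float kFloatScale = 1000.0f;  // hypothetical value for FLOAT_SCALE

// Six integers: [0..2] = min x,y,z and [3..5] = max x,y,z (as on the slide).
struct BBoxBuffer {
    std::atomic<int> v[6];
    BBoxBuffer() { reset(); }
    void reset() {
        for (int i = 0; i < 3; ++i) {
            v[i].store(INT_MAX);      // mins start high
            v[i + 3].store(INT_MIN);  // maxes start low
        }
    }
};

// CPU stand-ins for InterlockedMin / InterlockedMax.
inline void atomicMin(std::atomic<int>& target, int value) {
    int cur = target.load();
    while (value < cur && !target.compare_exchange_weak(cur, value)) {}
}
inline void atomicMax(std::atomic<int>& target, int value) {
    int cur = target.load();
    while (value > cur && !target.compare_exchange_weak(cur, value)) {}
}

// Per-vertex work: convert to fixed point, then fold into the shared box.
void accumulate(BBoxBuffer& box, float x, float y, float z) {
    const float limit = static_cast<float>(INT_MAX) / kFloatScale;
    assert(std::fabs(x) < limit && std::fabs(y) < limit && std::fabs(z) < limit);
    const int ix = static_cast<int>(x * kFloatScale);  // truncates toward zero
    const int iy = static_cast<int>(y * kFloatScale);
    const int iz = static_cast<int>(z * kFloatScale);
    atomicMin(box.v[0], ix); atomicMin(box.v[1], iy); atomicMin(box.v[2], iz);
    atomicMax(box.v[3], ix); atomicMax(box.v[4], iy); atomicMax(box.v[5], iz);
}
```

A later pass would divide the stored integers by the same scale to recover float bounds. The scale trades precision against range: larger values keep more fractional digits but overflow sooner.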

Particle System UAV
● Single pass GPU-only particle system
● In the VS:
  ● Generate sprites for rendering
  ● Do Euler integration and update the particle system state to a UAV

Particle System: HLSL Code

    uint particleIndex = id / 4;
    uint vertexInQuad = id % 4;

    // calculate the new position of the vertex
    float3 oldPosition = g_bufPosColor[particleIndex].pos.xyz;
    float3 oldVelocity = g_bufPosColor[particleIndex].velocity.xyz;

    // Euler integration to find new position and velocity
    float3 acceleration = normalize(oldVelocity) * ACCELLERATION;
    float3 newVelocity = acceleration * g_deltaT + oldVelocity;
    float3 newPosition = newVelocity * g_deltaT + oldPosition;

    g_particleUAV[particleIndex].pos = float4(newPosition, 1.0);
    g_particleUAV[particleIndex].velocity = float4(newVelocity, 0.0);

    // Generate sprite vertices
    ...
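The integration step itself is easy to verify outside the shader. The C++ sketch below reproduces the same update on one particle; the acceleration constant, the struct layout, and the function name are placeholders rather than values from the presentation.

```cpp
#include <cassert>
#include <cmath>

struct Float3 { float x, y, z; };

// Assumed particle layout: position and velocity only.
struct Particle {
    Float3 pos;
    Float3 velocity;
};

constexpr float kAcceleration = 2.0f;  // placeholder for the shader's constant

// One integration step, following the slide: accelerate along the current
// direction of travel, then move using the *new* velocity.
Particle eulerStep(const Particle& p, float deltaT) {
    const float speed = std::sqrt(p.velocity.x * p.velocity.x +
                                  p.velocity.y * p.velocity.y +
                                  p.velocity.z * p.velocity.z);
    assert(speed > 0.0f);  // normalizing a zero vector is undefined

    const float k = kAcceleration / speed;
    const Float3 accel{p.velocity.x * k, p.velocity.y * k, p.velocity.z * k};

    Particle out;
    out.velocity = Float3{accel.x * deltaT + p.velocity.x,
                          accel.y * deltaT + p.velocity.y,
                          accel.z * deltaT + p.velocity.z};
    out.pos = Float3{out.velocity.x * deltaT + p.pos.x,
                     out.velocity.y * deltaT + p.pos.y,
                     out.velocity.z * deltaT + p.pos.z};
    return out;
}
```

Using the updated velocity for the position update makes this the semi-implicit variant of Euler integration. In the shader, the old state is read from one buffer and the new state written to a separate UAV, so the four vertices of a quad all compute and store the same result for their particle.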

Conclusion
● Vertex shader “tricks” can be more efficient than more commonly used methods
● Use SV_VertexID for smarter instancing
  ● Sprites
  ● Merge Instancing
● UAVs add lots of freedom to vertex shaders
  ● Bounding box calculation
  ● Single pass VS particle system

Demos
● Particle System
● UAV Skinning
  ● Bbox

Acknowledgements
● Merge Instancing
  ● Emil Persson, “Graphics Gems for Games”, SIGGRAPH 2011
  ● Brendan Jackson, Cloud Imperium
● Thanks to
  ● Nick Thibieroz, AMD
  ● Raul Aguaviva (particle system UAV), AMD
  ● Alex Kharlamov, AMD

Questions
● bill.bilodeau@amd.com