Firaxis LORE and other uses of D3D11

Low Overhead Rendering Engine
• Or, how I learned to render 15,000+ batches at 60 FPS

Overview
• Civ 5 is a big game, covers 6000 years of history
• The entire map can be populated/polluted with all sorts of things the user creates
• Need to be able to render a huge number of objects of possibly disparate types

Early Goals
• Build a brand new engine for Civilization V
• Like the game, we wanted the graphics engine to be able to ‘stand the test of time’
• Decided while D3D11 was in alpha to build the engine natively for the D3D11 architecture, and map backwards to DX9

Step 1: Cutting the overhead down
• Shaders start in Firaxis Shading Language (FSL), a superset of HLSL
• Compiles into a CPP and header file – all shader constants are mapped to structs, grouped into packages where all packages have the same bindings
• Model code is templated – the FSL-generated header is then bound with template code
• Result is a tiny amount of code that fills out required shading, barely shows up on profiling
[Diagram labels: FSL Files, CPP / H, Template Code, Compile Time Glue Code]
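
The compile-time binding described on this slide can be sketched roughly as follows. This is only an illustration under assumptions: UnitShaderConstants, UnitShader and ModelBinder are hypothetical names, and the real FSL-generated headers are not shown in the slides. The idea is that a generated struct plus a template leaves nothing to do at run time except a memory copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for a struct the FSL compiler might emit:
// one plain struct per package of shader constants.
struct UnitShaderConstants {
    float worldViewProj[16];
    float tintColor[4];
};

// Hypothetical generated shader description that names its constant struct.
struct UnitShader {
    using Constants = UnitShaderConstants;
};

// Templated model code, bound to a generated header at compile time.
template <typename Shader>
struct ModelBinder {
    using Constants = typename Shader::Constants;
    static_assert(std::is_trivially_copyable<Constants>::value,
                  "constant packages must be plain data");

    // Appends the constant payload to a byte buffer with a single memcpy
    // and returns the offset at which it was written.
    static std::size_t WritePayload(const Constants& c,
                                    std::vector<unsigned char>& out) {
        const std::size_t offset = out.size();
        out.resize(offset + sizeof(Constants));
        std::memcpy(out.data() + offset, &c, sizeof(Constants));
        assert(out.size() == offset + sizeof(Constants));
        return offset;
    }
};
```

Calling ModelBinder<UnitShader>::WritePayload(constants, buffer) is all the per-draw work in this sketch, which matches the slide's claim that the glue code barely shows up on profiling.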

Step 2: Abstracting the Rendering
• Still have to support DX9, might have to support consoles in future
• Might have to write a ‘driver’
• Our solution: make DX9 ‘look like’ DX11
• Started with as restricted a design as possible, and expanded as we needed to

Packetized Rendering
• Stateless rendering, much simpler than D3D
• Command based – all rendering is performed by self-contained commands
• A command set may contain a list of surfaces to render, each with a shader constant payload
• A surface is an immutable bundle of an IB, VB, textures, shader def, etc.
• All state is bundled into packages: Alpha State, Z State, etc. Commands reference one of these state packages
• Entire frame is queued up
• Minimal per-frame allocation
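
One possible shape for the data described on this slide is sketched below. Every name (Handle, Surface, Batch, RenderCommand, FrameQueue) is an assumption made for illustration, and plain integer handles stand in for real GPU objects. The point is that each command names all of its state explicitly, so nothing depends on what an earlier command left bound.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical handle type standing in for real GPU objects.
using Handle = std::uint32_t;

// A surface: a bundle of IB, VB, textures and shader definition.
// Batches hold it by const pointer, so it is not modified after creation.
struct Surface {
    Handle indexBuffer;
    Handle vertexBuffer;
    Handle textureBundle;
    Handle shaderDef;
};

// One batch: a surface plus the location of its shader constant payload.
struct Batch {
    const Surface* surface;
    std::uint32_t payloadOffset;
    std::uint32_t payloadSize;
};

// A self-contained command that references its state packages directly.
struct RenderCommand {
    Handle alphaState;
    Handle zState;
    std::vector<Batch> batches;
};

// Queue holding one whole frame of commands.
class FrameQueue {
public:
    void Add(RenderCommand cmd) {
        // In this sketch, handle 0 means "unset"; every command must be complete.
        assert(cmd.alphaState != 0 && cmd.zState != 0);
        commands_.push_back(std::move(cmd));
    }
    std::size_t CommandCount() const { return commands_.size(); }
    std::size_t BatchCount() const {
        std::size_t n = 0;
        for (const RenderCommand& c : commands_) n += c.batches.size();
        return n;
    }
    // clear() keeps the outer vector's capacity, so the command array itself
    // is reused on later frames (the per-command batch lists are not).
    void Reset() { commands_.clear(); }

private:
    std::vector<RenderCommand> commands_;
};
```

A real engine would likely store batches in a flat, preallocated buffer to get closer to the "minimal per-frame allocation" goal; nested vectors are used here only to keep the sketch short.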

Only 5 Types of commands
• COMMAND_RENDER_BATCHES
– A list of surfaces to render into 1 or more render targets, with alpha and Z state bundles
– Surfaces have IB, VB, sampler and texture bundles. All required state is specified
• COMMAND_GENERATE_MIPS
• COMMAND_RESOLVE_RENDERTEXTURE
• COMMAND_COPY_RESOURCE
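
With so few command types, replaying a queued stream can be a single switch. The sketch below uses only the four command names that appear on this slide; the Command struct, its batchCount field and the ReplayResult counters are hypothetical, and the counters stand in for the real D3D calls a backend would issue.

```cpp
#include <cassert>
#include <vector>

// Only the command names visible on the slide are listed here.
enum CommandType {
    COMMAND_RENDER_BATCHES,
    COMMAND_GENERATE_MIPS,
    COMMAND_RESOLVE_RENDERTEXTURE,
    COMMAND_COPY_RESOURCE
};

struct Command {
    CommandType type;
    int batchCount;  // hypothetical: only meaningful for COMMAND_RENDER_BATCHES
};

struct ReplayResult {
    int batchesDrawn = 0;  // stands in for issued draw calls
    int resourceOps = 0;   // stands in for mip/resolve/copy calls
};

// Walks the stream once and dispatches on the command type.
inline ReplayResult Replay(const std::vector<Command>& stream) {
    ReplayResult r;
    for (const Command& c : stream) {
        switch (c.type) {
        case COMMAND_RENDER_BATCHES:
            assert(c.batchCount >= 0);
            r.batchesDrawn += c.batchCount;
            break;
        case COMMAND_GENERATE_MIPS:
        case COMMAND_RESOLVE_RENDERTEXTURE:
        case COMMAND_COPY_RESOURCE:
            ++r.resourceOps;
            break;
        }
    }
    return r;
}
```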

Packetized Rendering
[Diagram labels: Command Stream, Rendering System, Command Stream, Rendering Engine, D3D/Driver]

Step 3: Threading
[Diagram labels: Job Manager, Job (x4), Command Stream, Rendering System]

Why do we queue up the entire frame?
• Would seem like additional overhead, but perf analysis shows it is a net win
– Internal command setup is super-cheap, just some mem copies
– Engine cache coherency is vastly better
– D3D driver cache coherency is much better with one giant dump
– Very low % of total CPU time spent in submission
– Allows us to filter redundant D3D calls. Call overhead adds up
– Fast even in DX9
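
Filtering redundant calls is straightforward once the whole frame is queued, because the submitter sees every batch in order. A minimal sketch, assuming each batch carries three hypothetical state handles and counting calls instead of issuing real D3D ones:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-batch state; real bundles would hold more fields.
struct DrawItem {
    std::uint32_t alphaState;
    std::uint32_t zState;
    std::uint32_t texture;
};

struct SubmitStats {
    int stateCalls = 0;  // state-setting calls actually issued
    int drawCalls = 0;   // draw calls issued (one per item)
};

// Walks the queued items and only "sets" a piece of state when it differs
// from what the previous item left bound.
inline SubmitStats Submit(const std::vector<DrawItem>& items) {
    SubmitStats stats;
    bool haveBound = false;
    DrawItem bound{};
    for (const DrawItem& it : items) {
        if (!haveBound || it.alphaState != bound.alphaState) ++stats.stateCalls;
        if (!haveBound || it.zState != bound.zState) ++stats.stateCalls;
        if (!haveBound || it.texture != bound.texture) ++stats.stateCalls;
        bound = it;
        haveBound = true;
        ++stats.drawCalls;
    }
    return stats;
}
```

For three items where only the Z state of the last one differs, this issues 4 state calls rather than the 9 an unfiltered submitter would make.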

Implementation advantages
• Once the ‘stateless’ concept is grasped, code maintenance is easy
• Next to no state-leaking (flickering alpha, textures etc.)
• Because rendering is packetized, individual jobs need little or no communication between each other
• NO THREADING BUGS
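
The claim that packetized jobs need little or no communication can be illustrated with standard threads. In this sketch (all names hypothetical, and std::thread standing in for the engine's job manager), each job writes only to its own command stream, so no locks are needed, and the streams are concatenated in a fixed order after the jobs finish.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

struct Command {
    int jobIndex;  // which job produced the command
    int sequence;  // position within that job's stream
};
using CommandStream = std::vector<Command>;

// A job touches only the stream it was handed.
inline void BuildJob(int jobIndex, int commandCount, CommandStream& out) {
    for (int i = 0; i < commandCount; ++i) out.push_back(Command{jobIndex, i});
}

// Runs one thread per job, then merges the streams in job order so the
// final frame is deterministic regardless of thread scheduling.
inline CommandStream BuildFrame(int jobCount, int commandsPerJob) {
    assert(jobCount >= 0 && commandsPerJob >= 0);
    std::vector<CommandStream> streams(static_cast<std::size_t>(jobCount));
    std::vector<std::thread> workers;
    for (int j = 0; j < jobCount; ++j) {
        workers.emplace_back(BuildJob, j, commandsPerJob,
                             std::ref(streams[static_cast<std::size_t>(j)]));
    }
    for (std::thread& w : workers) w.join();

    CommandStream frame;
    for (const CommandStream& s : streams) {
        frame.insert(frame.end(), s.begin(), s.end());
    }
    return frame;
}
```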

Threaded D3D11 submission
• Top issues:
– Generally high driver overhead for batch submission
– But: D3D11 has multithreaded submission
– Command streams do not necessarily map 1:1 to command lists
– Civilization V can change how it submits via settings in the config files

Step 4: Gloating over results
• Wildly surpassed commonly held beliefs on # of batches possible, especially with threading

Test        Driver with native CL support   Driver without CL support
Units       1686*                           931
Landmarks   1152*                           673
Lategame    3616*                           2052
*Believed to be GPU limited

Conclusions
• High throughput rendering is possible IF:
– Care taken to reduce application overhead
– Job based, payload based rendering
– Redundant state and calls filtered
– Use D3D11 command lists
– Engine can peg 12 threads at 97% (sans driver)

D3D11 Features: Tessellation
• Major addition to the D3D11 API
[Screenshot]

Terrain
• Civ 5 contains one of the most complex terrain systems ever made
• Complete procedural process
• Use GPU to raytrace and anti-alias shadows
• Caching system to deal with cases where terrain is too big

Tessellation
• Terrain very high detail, roughly 64 x 64 heightmap data per hex
• Triangle count, when zoomed out, can be in the millions
• Used tessellation as a ‘drop-in’

Tessellation Cont.
• Simple bicubic Beta Spline patches
• Adjusted global tessellation as camera moved in and out
• A strict performance increase: 10%-40% faster, on both AMD and Nvidia hardware
• More adaptive techniques would work even better, but didn’t have time to implement them
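
A single global tessellation factor driven by the camera could look something like the sketch below. The slides do not give the actual mapping, so the linear ramp, the parameter names and the default factors are all assumptions; the only fixed fact used is that D3D11 limits tessellation factors to the range 1 to 64.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical mapping from camera height to one global tessellation factor:
// maxFactor at or below nearHeight, minFactor at or above farHeight,
// linear in between. The result is clamped to D3D11's valid range [1, 64].
inline float GlobalTessFactor(float cameraHeight,
                              float nearHeight,
                              float farHeight,
                              float maxFactor = 16.0f,
                              float minFactor = 1.0f) {
    assert(farHeight > nearHeight);
    float t = (cameraHeight - nearHeight) / (farHeight - nearHeight);
    t = std::min(1.0f, std::max(0.0f, t));
    const float factor = maxFactor + t * (minFactor - maxFactor);
    return std::min(64.0f, std::max(1.0f, factor));
}
```

An adaptive scheme would compute a factor per patch or per edge instead of one value for the whole terrain, which is the refinement the slide says there was no time to implement.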

Leaders

Leader Rendering
• Largely done with DX10.1 rendering tech
• New variable bit rate compression technology implemented for D3D11
• 2.5 GB of texture data reduced to 150 MB, can be decompressed on the GPU
• Details forthcoming, research is in publication submission process – extensive use of UAVs

Future Stuff, NO AO

Future Stuff (CS), AO

Q&A
