Hitting 60 Hz with the Unreal Engine: Inside the Tech of Mortal Kombat vs DC Universe
Jon Greenberg, Graphics Programming Lead, MK Team

Why Bother?
• In general, “twitch” games require very high framerate.
• Fast input response demands fast feedback to player.
• Running at 60 Hz is a basic requirement of the fighting genre.

Why Is 60 So Rare?
• Very few games target 60 Hz (< 10% of games).
• Only 16.7 ms in which to do everything vs 33.3 ms at 30 Hz. That seems to imply half the time to do everything, but this is not correct.
• In general, this means you have ~1/3 the time, due to fixed-cost overhead which can’t be removed.
• Customer doesn’t “care” that you have less time to do everything; still wants the game to look great.
• Game must hit 60 Hz on both PS3 and Xbox 360, and both versions must look as close as possible!

Why 1/3rd The Time?
• Game must run at >= 60 Hz; not allowed to drop frames (bog).
• This means we have to set aside headroom that can absorb instantaneous spikes.
• MK vs DC steady state ~= 9.5 ms per frame.
• Allows for a lot of particle effects and variability. Other genres (even other fighting games) likely need a great deal less slack.
• Philosophy: Always address worst-case scenarios up front.
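The budget arithmetic on these two slides can be sketched as below. The function names are illustrative, and reading "~1/3" as the 9.5 ms steady state measured against a 30 Hz frame is one interpretation of the figures, not something the slides state outright.

```cpp
#include <cassert>

// Frame budget in milliseconds for a given refresh rate.
double frameBudgetMs(double hz) {
    assert(hz > 0.0);
    return 1000.0 / hz;
}

// Headroom left to absorb spikes once the steady-state cost is paid.
double spikeHeadroomMs(double hz, double steadyStateMs) {
    return frameBudgetMs(hz) - steadyStateMs;
}

// Steady-state cost expressed as a fraction of some rate's frame budget.
double fractionOfBudget(double steadyStateMs, double hz) {
    return steadyStateMs / frameBudgetMs(hz);
}
```

With the quoted 9.5 ms steady state, `spikeHeadroomMs(60.0, 9.5)` leaves roughly 7.2 ms of slack per frame, and `fractionOfBudget(9.5, 30.0)` is about 0.29 of what a 30 Hz title gets to spend.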

The Problem (part 1)
• Midway had decided to use Unreal Engine 3 (UE3) as basic middleware across all internal games. Using UE3 was required by mgmt.
• UE3 was (is) designed for 30 Hz FPS/3rd-person action genre titles.
• We started with the October 2006 (post Gears of War 1) codebase. Some additional features taken from Epic à la carte. Ex: MITV, file caching, misc fixes.
• About 22 months to develop the game.

The Problem (part 2)
• UE3 brings a lot to the table (nice tools, wide feature set) but imposes a lot of heavy fixed costs.
• There are also some choices made in the engine that have problematic side effects for 60 Hz play (UObject overhead, garbage collection, etc.).
• Out-of-the-box fixed-cost baseline (especially GPU) too high for a 60 Hz title.
• E.g., Oct 06 build GPU baseline ~9 ms.

Breaking it Down
• GPU Overhead
  • GPU fixed costs
  • General rendering overhead
  • Multipass overhead
  • Lighting cost
  • Particle cost
• CPU Overhead
  • Particle cost
  • Cloth & Water
  • Render thread virtual overhead/state caching

GPU Fixed Costs
• Post-processing
  • Usually the biggest fixed cost.
  • Combine as many operations together as possible to hide work (i.e., Bloom+DOF+Gamma+Resolution retarget).
  • Cut as many corners as possible and special-case as necessary, e.g. we use 1 of 3 different DOF methods depending on the case:
    • Normal gameplay: classic blur cross-fade
    • Main Menu/Cinematics: dilating Poisson disc
    • Klose-Kombat: a series of blur planes
  • “Normal” DOF+Bloom effect cost = 1.8 ms

Bloom
• Bloom is done a little strangely to compensate for linear color range and not having a separate downsample/blur:
  • Per-environment thresholding value determines which pixels bloom.
  • Thresholding is done inside the downsample pass and written out into the alpha channel as 0 or 1.
  • This bloom mask is then blurred along with color.
• We had separate thresholding and strength values for characters and the general background to allow the two to be tuned differently.
• Character masks were written to/read from the stencil buffer.
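A CPU-side sketch of the "threshold inside the downsample" idea follows. It assumes a 2x2 box filter and Rec. 709 luminance as the brightness measure; the slides specify neither, and the real pass would be a pixel shader.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Rgba { float r, g, b, a; };

// 2x2 box downsample that also writes a 0/1 bloom mask into alpha.
// The luminance weights are an assumption; the slides do not say which
// brightness measure the threshold was compared against.
std::vector<Rgba> downsampleWithBloomMask(const std::vector<Rgba>& src,
                                          std::size_t width, std::size_t height,
                                          float threshold) {
    assert(width % 2 == 0 && height % 2 == 0);
    assert(src.size() == width * height);
    const std::size_t outW = width / 2, outH = height / 2;
    std::vector<Rgba> dst(outW * outH);
    for (std::size_t y = 0; y < outH; ++y) {
        for (std::size_t x = 0; x < outW; ++x) {
            float r = 0.0f, g = 0.0f, b = 0.0f;
            for (std::size_t dy = 0; dy < 2; ++dy) {
                for (std::size_t dx = 0; dx < 2; ++dx) {
                    const Rgba& p = src[(2 * y + dy) * width + (2 * x + dx)];
                    r += p.r; g += p.g; b += p.b;
                }
            }
            r *= 0.25f; g *= 0.25f; b *= 0.25f;
            const float luma = 0.2126f * r + 0.7152f * g + 0.0722f * b;
            dst[y * outW + x] = {r, g, b, luma > threshold ? 1.0f : 0.0f};
        }
    }
    return dst;
}
```

Because the mask lives in alpha, a later blur over all four channels softens the mask together with the color at no extra pass.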

Distortion
• Normal UE3 distortion effect has 3 ms overhead!
• Instead, fold Distortion into Translucency.
• Sample from a snapshot of the opaque pass, and do a depth-based selection to prevent near distortion.
• Overhead now just “capturing snapshot”: just a copy blit of the color buffer, ~0.4 ms.
• Now usable everywhere!
• Optionally support recapture of the “snapshot” per distorting effect to allow for layered distortion effects as well. Needed for water level.
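The depth-based selection could look roughly like this. It is a single-channel CPU sketch with hypothetical names, and it assumes smaller depth values are nearer the camera; the slides give neither the depth convention nor the exact test.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pick the snapshot value for one distorting pixel. If the texel at the
// offset position is nearer to the camera than the distorting surface,
// sampling it would drag a foreground object into the effect, so fall back
// to the undistorted texel. Smaller depth = nearer (assumed convention).
float sampleDistorted(const std::vector<float>& snapshotColor,
                      const std::vector<float>& sceneDepth,
                      std::size_t pixel, std::size_t offsetPixel,
                      float surfaceDepth) {
    assert(snapshotColor.size() == sceneDepth.size());
    assert(pixel < snapshotColor.size() && offsetPixel < snapshotColor.size());
    const bool offsetIsInFront = sceneDepth[offsetPixel] < surfaceDepth;
    return snapshotColor[offsetIsInFront ? pixel : offsetPixel];
}
```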

Motion Blur
• Very expensive to do full-screen.
• Epic doesn’t support motion blurring of skinned geometry!
• Instead, motion blur effects done via rendering velocity-stretched fading geometry.
• Required changing GPU skinning (PC/360) and Edge (PS3 SPU) to support skinning against previous bone positions.
• Requires localized blur-only Z-prepass to prevent additive blur effects from blending badly.

Shadows and MSAA
• Game made use of MSAA-2x on both platforms.
• Resolving MSAA is very expensive on PS3.
• Combine full-screen modulated shadow blit with MSAA color/depth resolve!
• Hide heavy texture bandwidth operations inside math-heavy shadow work.
• Shadow ALU overhead high enough that we can also hide the Distortion copy blit!
• No self-shadowing: disabled via stencil mask.
• Since there’s no self-shadowing anyway, we use proxy shadow characters.
• Total cost ~= 1.33 ms

Fog
• Fullscreen per-pixel ~2 ms on the GPU.
• Visible vertices < visible pixels! Per-pixel fog is often overkill.
• Replaced with per-vertex fog and per-object fog (characters).
• To keep per-vertex costs low, only support 2 active fog actors.
• Height fog is optional, and controlled via static branching.
• Also added optional undulating height fog, via pulsing sine waves through the fog height.
• Dramatically cheaper!
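A per-vertex fog term with an undulating fog height might be sketched as follows. The exponential distance falloff, the linear height ramp, and every parameter name are assumptions made for illustration; the slides only say that sine waves are pulsed through the fog height.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical fog parameters for one fog actor.
struct HeightFog {
    float density;     // distance falloff rate
    float baseHeight;  // fog is full strength at or below this height
    float fadeRange;   // height over which fog fades to zero
    float waveAmp;     // amplitude of the undulation (0 disables it)
    float waveFreq;    // spatial frequency along x
    float waveSpeed;   // temporal frequency
};

// Fog factor in [0,1] for one vertex: 0 = no fog, 1 = fully fogged.
float vertexFog(const HeightFog& f, float x, float y, float distance, float time) {
    assert(f.fadeRange > 0.0f);
    // Pulse a sine wave through the fog height.
    const float top = f.baseHeight + f.waveAmp * std::sin(f.waveFreq * x + f.waveSpeed * time);
    float heightTerm = 1.0f - (y - top) / f.fadeRange;
    heightTerm = std::fmin(1.0f, std::fmax(0.0f, heightTerm));
    const float distanceTerm = 1.0f - std::exp(-f.density * distance);
    return heightTerm * distanceTerm;
}
```

Evaluating this once per vertex and interpolating the result is what makes it cheaper than a full-screen pass whenever visible vertices are fewer than visible pixels.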

General Rendering
• 8 bpc render targets, linear color scale of 0..2.
• We light in a combination of γ=1.0 and γ=2.2, depending on what we’re lighting, to save cost.
• Opaque: uses MSAA
• Translucent: post-MSAA resolve
• Heavy use of PlayStation Edge library for skinned and world geometry on PS3.
• 3D resolution of the game was 1040 x 624, which was then scaled up to allow the HUD to render at 1280 x 720.
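To put the resolution choice in numbers, the sketch below computes how many pixels the 3D pass shades relative to the 1280 x 720 output. It also shows one plausible way to store a 0..2 linear value in an 8-bit channel (dividing by 2); that encoding is an assumption, since the slides only state the range.

```cpp
#include <cassert>

// Pixel count of a render target.
constexpr long pixelCount(long width, long height) { return width * height; }

// Fraction of the output resolution's pixels that the 3D scene must shade.
constexpr double shadedFraction(long w3d, long h3d, long wOut, long hOut) {
    return static_cast<double>(pixelCount(w3d, h3d)) / pixelCount(wOut, hOut);
}

// Map a linear value in [0,2] to an 8-bit channel (assumed: divide by 2).
unsigned char encodeLinear0to2(float v) {
    assert(v >= 0.0f && v <= 2.0f);
    return static_cast<unsigned char>(v * 0.5f * 255.0f + 0.5f);
}
```

`shadedFraction(1040, 624, 1280, 720)` comes to about 0.70, so the scene pays for roughly 30% fewer pixels than a native 720p target.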

Multipass Overhead
• Pass-per-light overhead is simply too high.
• We’re mostly prelit, so we chose forward rendering.
• Z-prepass? Typical depth complexity < 1.5.
• Loosely sort opaque objects front to back via “rings of detail”. Removing Z-prepass saves ~0.75 ms.
• Touch each pixel only once if possible.
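The slides do not define “rings of detail”. One reading, shown here purely as an assumption, is that objects are bucketed into concentric distance rings around the camera and drawn ring by ring, which gives a loose front-to-back order without a full per-object sort.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct DrawItem {
    int id;
    float distance;  // distance from the camera
};

// Bucket objects into distance rings and order the rings near to far.
// Order inside a ring is left as submitted, so this is only a loose sort.
void sortByRings(std::vector<DrawItem>& items, float ringWidth) {
    assert(ringWidth > 0.0f);
    auto ringOf = [ringWidth](const DrawItem& d) {
        return static_cast<int>(d.distance / ringWidth);
    };
    std::stable_sort(items.begin(), items.end(),
                     [&](const DrawItem& a, const DrawItem& b) {
                         return ringOf(a) < ringOf(b);
                     });
}
```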

World Lighting (static)
• World is prelit using Illuminate Labs’ Beast, with some “dynamic” RNMs built with Turtle. Dynamic RNMs are animated in materials or via MITVs.
• Prelit lighting was a mix of texture and vertex RNM lighting, with a fast path added to support per-vertex diffuse-only RNM evaluation for distant objects.

World Lighting (dynamic)
• Effect point lighting is done via a mix of per-pixel lighting (floors) and per-vertex (the rest of the environment).
• To account for maximum load, shaders are built with three diffuse-only point lights active and burned into the material.
• No branching! All three lights always evaluated.
• These lights are globally assigned and managed in a 3-deep FIFO.
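A minimal sketch of a 3-deep light FIFO feeding a branch-free evaluation is given below. The types and names are hypothetical; the point is that empty slots hold black lights, so summing all three slots unconditionally is always correct.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

struct PointLight {
    float position[3];
    float color[3];  // all zeros = contributes nothing
};

// Fixed set of three light slots. Pushing a fourth light overwrites the
// oldest one; unused slots stay black.
class EffectLightFifo {
public:
    void push(const PointLight& light) {
        slots_[next_] = light;
        next_ = (next_ + 1) % slots_.size();
    }
    const std::array<PointLight, 3>& slots() const { return slots_; }

private:
    std::array<PointLight, 3> slots_{};
    std::size_t next_ = 0;
};

// Diffuse-only red channel summed over all three slots, with no test for
// whether a slot is "active".
float shadeRed(const EffectLightFifo& fifo, const float nDotL[3]) {
    assert(nDotL != nullptr);
    float sum = 0.0f;
    for (std::size_t i = 0; i < 3; ++i)
        sum += fifo.slots()[i].color[0] * nDotL[i];
    return sum;
}
```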

Character Lighting (part 1)
Custom lighting model:
• Irradiance volume of SH coefficient sets.
• Eval gradients to determine an SH-set per object.
• Diffuse light the model using only the first 4 coefficients (“ambient” and “directional” term).
• The 3 effect point lights are evaluated per-vertex and combined into the final diffuse lighting result.
• Spec faked via power-scaling of (E · N) and multiplying by diffuse lighting.
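The diffuse and fake-specular terms can be sketched for a single color channel as follows. The coefficient ordering and the choice to fold the SH basis constants into the stored coefficients are assumptions; E is taken to be the unit vector from the surface toward the eye.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// One channel of a 4-coefficient SH set: sh[0] is the constant ("ambient")
// band, sh[1..3] the linear ("directional") band in y, z, x order. Basis
// constants are assumed to be pre-multiplied into the coefficients.
float shDiffuse(const float sh[4], const Vec3& n) {
    return std::max(0.0f, sh[0] + sh[1] * n.y + sh[2] * n.z + sh[3] * n.x);
}

// Specular approximated as diffuse * (E.N)^power.
float fakeSpecular(float diffuse, const Vec3& eye, const Vec3& n, float power) {
    assert(power > 0.0f);
    const float eDotN = std::max(0.0f, dot(eye, n));
    return diffuse * std::pow(eDotN, power);
}
```

Since the highlight depends only on the view vector and the normal, it needs no per-light loop, which is presumably where the saving comes from.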

Specularity

Particle Effects
• Very large problem. Cascade not very optimal.
• Solution: port Cascade runtime to run async on separate worker threads (on SPU on PS3)!
• All emitters for a particle system updated in a single block of async work (particles, emitter state, system state).
• All particle Modules ported to SPU, except for collision (due to data complexity).

Character Lighting (part 2)
• Skin transmission faked by using (E · N) as lerp factor between diffuse lighting and SH ambient term.
• Rim lighting: power-scaling (1 - E · N) for falloff and then multiplying by a hard thresholding of (1 - E · N).
• If threshold is raised high enough (~0.7), ends up looking like chrome mapping!
• Final rendering cost ~= 0.8 ms per character.
• Character mesh chunks batch rendered.
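The two tricks above can be written out as small scalar functions. The lerp direction (diffuse at E · N = 0, SH ambient at E · N = 1) follows the order in which the slide names the terms and is an assumption, as are the function names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

float saturate(float v) { return std::min(1.0f, std::max(0.0f, v)); }

// Skin: blend from diffuse lighting toward the SH ambient term using E.N.
float skinTransmission(float diffuse, float shAmbient, float eDotN) {
    const float t = saturate(eDotN);
    return diffuse + (shAmbient - diffuse) * t;
}

// Rim: (1 - E.N)^power for falloff, gated by a hard threshold on (1 - E.N).
float rimLight(float eDotN, float power, float threshold) {
    assert(power > 0.0f);
    const float rim = 1.0f - saturate(eDotN);
    const float mask = rim >= threshold ? 1.0f : 0.0f;
    return std::pow(rim, power) * mask;
}
```

Raising `threshold` narrows the lit band to a thin hard-edged silhouette, which is consistent with the chrome-like look the slide describes.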

Skin and Metal

The Story So Far…
• So far costs are:
  • Misc Shadowmaps:      ~0.5 ms
  • Characters:           1.6 ms
  • Environment:          ~4.X ms
  • MSAA Resolve/Shadow:  1.3 ms
  • PostFX:               1.8 ms
  • Total:                ~9.X ms
• What about particle effects?

Particle Effects (CPU load)
• All per-particle overhead removed from Game/Render thread!
• Particle overhead now a simple linear relationship between system count and emitter count.
• On PC/360, vertex data for sprites created JIT by async worker thread.
• No changes/compromises to artist tools or workflow.

Particle Effects (SPU load)
• SPUs extremely fast.
• Just used basic C++ code (including templates and polymorphism). No need to bother with intrinsics or ASM.
• Same module code runs on PS3/360.
• Complex (dependent) DMAs done synchronously. Simpler to deal with and fast enough that it doesn’t matter.
• Update done via SPURS job.

Particle Effects (GPU load)
• GPU overhead less straightforward.
• Attempt 1: Lie to hardware and tell it we’re in MSAA-4x on a non-MSAA target. Looks okay on wispy stuff in general (smoke, fire, etc.), but looks terrible on 360.

Particle Effects (GPU cont…)
• Attempt 2: for somewhat opaque particles, break effect out into masked pass and unmasked pass, sorting particles for a system front to back before rendering to prime Z.
  1. Render particles with alpha-test set to =1.0, front to back
  2. Render particles with alpha-test set to <1.0, back to front
• Didn’t help! Alpha-test disables ZCull writes, negating the benefits of the priming pass.

Particle Effects (GPU cont…)
• Attempt 3: Observation: for flipbook effects, lots of time is wasted rendering alpha-0 space around meaningful content.
• Idea: For flipbook effects, reduce particle dimensions (and UVs) to bound content of the particular flipbook page!
• Works great! Dramatic fillrate improvement from doing this (>50%). Requires artist to identify channel to scan for image bounds.
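An offline scan of one flipbook page for its content bounds might look like the sketch below. The structure names and the cutoff parameter are illustrative; the slides only say that an artist-chosen channel is scanned for the image bounds.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Bounds {
    std::size_t minX, minY, maxX, maxY;  // inclusive texel bounds
    bool empty;                          // true if no texel passed the cutoff
};

// Scan one flipbook page (one channel, row-major) for texels above a
// cutoff and return the tight bounding box of the content.
Bounds scanPage(const std::vector<float>& channel,
                std::size_t width, std::size_t height, float cutoff) {
    assert(channel.size() == width * height);
    Bounds b{width, height, 0, 0, true};
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            if (channel[y * width + x] > cutoff) {
                b.minX = std::min(b.minX, x); b.maxX = std::max(b.maxX, x);
                b.minY = std::min(b.minY, y); b.maxY = std::max(b.maxY, y);
                b.empty = false;
            }
        }
    }
    return b;
}

// Fraction of the full page that the trimmed quad still covers.
double coveredFraction(const Bounds& b, std::size_t width, std::size_t height) {
    if (b.empty) return 0.0;
    const double w = static_cast<double>(b.maxX - b.minX + 1);
    const double h = static_cast<double>(b.maxY - b.minY + 1);
    return (w * h) / (static_cast<double>(width) * static_cast<double>(height));
}
```

At runtime the particle quad and its UVs would be shrunk to these bounds per page, so the GPU never rasterizes the empty border.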

General Render Thread Optimizations
• Lots of work to reduce unnecessary operations.
• Render thread virtuals = death by a thousand paper cuts.
• Cache as much state as possible to reduce redundant virtual calls. E.g., replaced FMaterialRenderProxy’s GetMaterial virtual call with a caching call.
• Remove tons of unneeded repeated calls to GetXXX() (i.e., GetPixelShader) states from inside Shader processing.
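The caching pattern can be illustrated in isolation. Every type below is a hypothetical stand-in, not the engine's actual class layout: a wrapper resolves the material through the virtual once and serves the cached pointer afterwards.

```cpp
#include <cassert>

// Hypothetical stand-ins; not the engine's real classes.
struct Material { int id; };

struct MaterialProxy {
    virtual ~MaterialProxy() = default;
    virtual const Material* resolveMaterial() const = 0;  // the costly virtual
};

// Resolves the material once, then returns the cached pointer until
// invalidate() is called.
class CachedMaterialProxy {
public:
    explicit CachedMaterialProxy(const MaterialProxy& proxy) : proxy_(proxy) {}
    const Material* material() const {
        if (!cached_) {
            cached_ = proxy_.resolveMaterial();
            assert(cached_ != nullptr);
        }
        return cached_;
    }
    void invalidate() { cached_ = nullptr; }

private:
    const MaterialProxy& proxy_;
    mutable const Material* cached_ = nullptr;
};

// Demo proxy that counts how often the virtual is actually hit.
struct CountingProxy : MaterialProxy {
    Material mat{7};
    mutable int calls = 0;
    const Material* resolveMaterial() const override { ++calls; return &mat; }
};
```

The cost of this pattern is that something must call `invalidate()` whenever the underlying material can change, which is the usual trade-off of state caching.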

Misc Further Optimizations
• Cloth simulation moved to run async in another thread (SPU on PS3).
• Epic’s water simulation code ported to run on SPU on PS3.
• Animation still synchronous and Game-thread based, but doesn’t use AnimTrees. Very limited blend options for designers.
• No Occlusion pass: Vis is simple frustum culling.
• Lots of work to reduce amount of memory allocation via pools and isolated heaps. Still, it accounts for 25% of CPU time.

Garbage Collection
• Based on work by the Stranglehold team.
• Not quite as aggressive as they were, but removes all live calling of GC from gameplay; only called when exiting modes.
• Memory management switched to deferred (by a frame) cleanup of UObjects/AActors.
• All “loaded” data trapped via Rootset.
• Introduces UResource class, a reference-counting UObject.
• All USurface-derived classes (i.e., UMaterial, UTexture, etc.) are reference counted via UResource to prevent unwanted deletion.
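Reference counting combined with cleanup deferred by a frame could be sketched as follows. The `Resource` and `DeferredCleanup` types are hypothetical and are not the engine's UResource; the sketch only shows the shape of the idea.

```cpp
#include <cassert>
#include <vector>

// Hypothetical reference-counted resource.
struct Resource {
    int refCount = 0;
    bool destroyed = false;
};

// Resources whose count hits zero are queued and destroyed one frame later.
class DeferredCleanup {
public:
    void addRef(Resource& r) { ++r.refCount; }
    void release(Resource& r) {
        assert(r.refCount > 0);
        if (--r.refCount == 0) pending_.push_back(&r);
    }
    // Call once per frame: destroy what was queued during the previous
    // frame, then promote this frame's queue.
    void endFrame() {
        for (Resource* r : ready_)
            if (r->refCount == 0) r->destroyed = true;  // skip if re-referenced
        ready_.swap(pending_);
        pending_.clear();
    }

private:
    std::vector<Resource*> pending_;
    std::vector<Resource*> ready_;
};
```

Delaying destruction by a frame gives work already in flight (for example, render commands that still reference the object) time to finish before the memory goes away.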

Additional Game Details
• We don’t use UnrealScript. Minimally use Kismet. Use our own scripting engine (C/C++ish) for AI, object management, menu logic, etc.
• Game scripts are expected to manage resource lifetimes.
• Main advantage: dynamically reloadable for fast iteration!
• MKScripts describe resource usage to determine cooked resources that need to be added to characters/backgrounds.

Artist Limitations
• UE3 gives artists a lot of rope to hang themselves with.
• Big thing was to limit who could use the Material Editor.
• All character art uses the same small set of materials.
• Characters budgeted at 20k polys visible at a time.
• Backgrounds budgeted based on visible object count and storage limitations more than polycount.
• Environment material/lighting complexity managed by the background lead to ensure the game hit its GPU performance targets, with various metrics helping to tell them where they were.

General Recommendations for hitting 60 Hz in UE3
• Budget performance up front!
• Given Edge and 360’s unified shaders, geometry is less of a problem than fillrate.
• Predetermine valid PostFx and hardwire the majority of permutations.
• Reduce dynamic critical-sectioned memory allocation as much as possible. Massively stalls all performance.
• Use pool allocators whenever possible, and watch for reallocs.
• Force designers and artists to run with performance metrics on!
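As a concrete picture of the pool-allocator advice, here is a minimal fixed-size block pool. It is a generic sketch, not the game's allocator, and it is not thread-safe: it assumes each pool is owned by a single thread, which is also what lets it avoid a critical section.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fixed-capacity pool of equally sized blocks. After construction,
// allocate() and deallocate() are O(1) free-list operations that never
// touch the general heap.
class FixedPool {
public:
    FixedPool(std::size_t blockSize, std::size_t blockCount)
        : blockSize_(roundUp(blockSize)), storage_(blockSize_ * blockCount) {
        assert(blockCount > 0);
        free_.reserve(blockCount);
        for (std::size_t i = 0; i < blockCount; ++i)
            free_.push_back(storage_.data() + i * blockSize_);
    }
    FixedPool(const FixedPool&) = delete;
    FixedPool& operator=(const FixedPool&) = delete;

    void* allocate() {
        if (free_.empty()) return nullptr;  // pool exhausted
        void* p = free_.back();
        free_.pop_back();
        return p;
    }
    void deallocate(void* p) {
        assert(p != nullptr);
        free_.push_back(static_cast<unsigned char*>(p));
    }
    std::size_t available() const { return free_.size(); }

private:
    // Round block size up so consecutive blocks keep fundamental alignment.
    static std::size_t roundUp(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        return (n + a - 1) / a * a;
    }
    std::size_t blockSize_;
    std::vector<unsigned char> storage_;
    std::vector<unsigned char*> free_;
};
```

Returning `nullptr` on exhaustion instead of growing is deliberate in this sketch: a pool that silently reallocates reintroduces the heap traffic it was meant to remove.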

Recommendations for hitting 60 Hz in UE3 on PS3 (well, and 360)
• Consider what can be deferred and/or can be made to run async and consider moving that work.
• Consider using Edge on PS3.
• Even sync’d work can be done way faster on SPU if divided over multiple SPUs/threads!
• Don’t be intimidated by the SPUs on PS3. Prototype SPU code on 360/PC where it’s easier to debug.
• Template-heavy C++ might not be the ideal performance case for SPUs, but certainly a LOT better than not using them at all.

Things We Have Yet to Address
• Serialization: as we tend to only stream content underneath movie playback or load screens, the CPU impact wasn’t too problematic for us, though it does impact load times.
• Animation: need to explore making it run on worker threads/SPU for deferrable (background and LOD’d) objects.

Acknowledgements
• Nathan Mefford
• Chicago ATG Team

Questions?
• Thanks for listening!
