DirectX 12 Advanced Graphics and Performance

Max McMullen
Direct3D Development Lead, Microsoft

It’s been a busy year…
• API is largely complete, with working drivers
• Over 50% of gamers have DirectX 12 hardware*
• Massive industry support: Early Access, Engines, Titles
• One-year free upgrade to Windows 10 from Windows 7 and 8.x
• And now…
*Based on Steam survey

Agenda
• Refresh on Direct3D 12
• New Feature Levels
• Unity on Direct3D 12
• CPU & GPU Performance Improvements
• Fable – 11 versus 12

Direct3D 12 API
• Reduce CPU overhead
• Increase scalability across multiple CPU cores
• Greater developer control
• Console-level API efficiency and performance
• Superset of D3D11 rendering functionality

CPU Overhead and Multithread Improvements
• Pipeline state objects
• Explicit resource binding management
• Flexible pipeline parameterization
• Explicit CPU/GPU synchronization
• Command reuse

Pipeline State Objects
[Diagram: D3D Vertex Shader, D3D HS/DS/…, Rasterizer, D3D Pixel Shader, and D3D Blend State feed a single Pipeline State Object, shown alongside HW State 1, HW State 2, and HW State 3.]

Explicit Resource Binding Management
[Diagram: Descriptor { Type, Format, Mip Count, pData }; Descriptor Heap; Descriptor Table { Start Index, Size }]
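The relationship between the three boxes in the diagram can be sketched in portable C++. This is a simplified model, not the Direct3D 12 API: the field types, the `DescriptorType` enum, and the `Lookup` helper are all assumptions made for illustration. It only shows that a descriptor table is a (start index, size) window into a descriptor heap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Simplified stand-in for a GPU descriptor (not the real D3D12 layout).
enum class DescriptorType { CBV, SRV, UAV };

struct Descriptor {
    DescriptorType type;
    std::uint32_t  format;    // hypothetical format code
    std::uint32_t  mipCount;
    const void*    pData;     // address of the resource memory
};

// A descriptor heap is modeled as a flat array of descriptors.
struct DescriptorHeap {
    std::vector<Descriptor> slots;
};

// A descriptor table is only a window into the heap: start index plus size.
struct DescriptorTable {
    std::size_t startIndex;
    std::size_t size;
};

// Resolve entry `i` of a table to the heap descriptor it refers to.
// Throws if `i` is outside the table or the table runs past the heap.
const Descriptor& Lookup(const DescriptorHeap& heap,
                         const DescriptorTable& table, std::size_t i) {
    if (i >= table.size || table.startIndex + i >= heap.slots.size())
        throw std::out_of_range("descriptor index outside table or heap");
    return heap.slots[table.startIndex + i];
}
```

With a heap of four descriptors and a table `{1, 2}`, `Lookup(heap, table, 0)` returns the heap's second slot, so rebinding a different set of resources only means pointing the table at a different start index.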

Resource Binding Tiers

                                      Tier 1   Tier 2      Tier 3
Max Descriptor Heap CBV/SRV/UAVs      2^20     2^20        2^20+
Max CBVs per stage                    14       14          full heap
Max SRVs per stage                    128      full heap   full heap
Max UAVs in all stages                8        64          full heap
Max Samplers per stage                16       full heap   full heap
Max SRV Descriptor Tables             5        5           no limit

Binding Tiers in the D3D12 Market
• Tier 3: [PERCENTAGE]
• Tier 2: 44%
• Tier 1: 39%

Explicit Resource Binding: Hazard Resolution
• Resource hazards
  • Render Target to/from Texture
  • Copy Source to/from Copy Destination
  • Tiled Resource Aliasing
  • etc…
• ResourceBarrier API to resolve hazards
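To make the idea of an application-resolved hazard concrete, here is a minimal portable model of state tracking. It is a sketch under stated assumptions: the `ResourceState` names, `TransitionBarrier`, and `RequireState` are hypothetical and are not the Direct3D 12 types. In the real API, transitions are recorded on a command list through `ResourceBarrier`.

```cpp
#include <cassert>
#include <stdexcept>

// Illustrative subset of resource states; the names are hypothetical.
enum class ResourceState { RenderTarget, ShaderResource, CopySource, CopyDest };

struct Resource {
    ResourceState state;
};

// Model of a transition barrier: the caller names both the state the
// resource is in now and the state it needs next.
void TransitionBarrier(Resource& r, ResourceState before, ResourceState after) {
    if (r.state != before)
        throw std::logic_error("barrier 'before' state does not match resource");
    r.state = after;
}

// Model of a use: throws if the resource is not in the required state,
// which is the hazard the application is now responsible for avoiding.
void RequireState(const Resource& r, ResourceState needed) {
    if (r.state != needed)
        throw std::logic_error("hazard: resource used in the wrong state");
}
```

Sampling a texture that was just rendered to corresponds to calling `TransitionBarrier(tex, ResourceState::RenderTarget, ResourceState::ShaderResource)` before `RequireState(tex, ResourceState::ShaderResource)`. Skipping the barrier makes the second call throw in this model.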

Flexible Pipeline Parameterization
• Two parts: Root Signature and Root Arguments
• Contains constants, descriptors, and descriptor tables
• Leverage hardware-specific registers and pipelined renaming paths for highest-frequency parameters
• Remove indirection from a constant descriptor index to an explicit descriptor

Explicit CPU/GPU synchronization
• Application is responsible for managing CPU & GPU race conditions
• Synchronization primitive is a fence
• Application chooses granularity of synchronization
  • One increment per frame is well amortized
  • Increment per command list submission is possible
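The per-frame fence pattern from the bullets above can be sketched as a single-threaded toy model. Everything here (`Fence`, `Queue`, `SubmitFrame`, `RetireOne`, `FrameFinished`) is a hypothetical stand-in, not the Direct3D 12 interface. The point is only that a fence is a monotonically increasing value, and the CPU compares it against the value it signaled for a given frame.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Toy fence: a monotonically increasing counter the "GPU" writes and the CPU reads.
struct Fence {
    std::uint64_t completed = 0;
};

// Toy queue: each submitted frame is followed by a signal value that the
// fence takes on once that frame's work has finished.
class Queue {
public:
    explicit Queue(Fence& f) : fence_(f) {}

    // Submit one frame of work; returns the fence value that marks its completion.
    std::uint64_t SubmitFrame() {
        pending_.push_back(++lastSignaled_);
        return lastSignaled_;
    }

    // Simulate the GPU finishing the oldest pending frame.
    void RetireOne() {
        if (pending_.empty()) return;
        fence_.completed = pending_.front();
        pending_.pop_front();
    }

private:
    Fence& fence_;
    std::deque<std::uint64_t> pending_;
    std::uint64_t lastSignaled_ = 0;
};

// The CPU may reuse a frame's memory once the fence has reached that frame's value.
bool FrameFinished(const Fence& f, std::uint64_t frameValue) {
    return f.completed >= frameValue;
}
```

One increment per frame keeps the bookkeeping small: the application stores a single value per in-flight frame and checks it before overwriting that frame's buffers. Signaling per command list submission would use the same comparison with finer-grained values.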

New Feature Levels
Direct3D 12

New Rendering Features
• Conservative Rasterization
• ROVs
• Typed UAV Loads
• Tiled Resources Tier 3: Volumes
• PS Specified Stencil Ref

New Feature Levels
• Feature Level 12.0
  • Resource Binding Tier 2
  • Tiled Resources Tier 2: Texture2D
  • Typed UAV Tier 1
• Feature Level 12.1
  • Conservative Rasterization Tier 1
  • ROVs

Unity on Direct3D 12
Kasper Engelstoft
Unity Graphics Engineer

Direct3D 12 in Unity
• Porting experience
• Case study: multithreaded shadow rendering
• What’s next for D3D12 in Unity?

D3D12 porting experience
• Started porting in September with SDK 1
• After 2 weeks, we had something rendering
• In October, SDK 2 API changes hit...
• Mid-January, 95% of our tests were passing
• Then SDK 3 hit...

D3D12 optimization case study
• Multi-threading shadow map rendering
• Move work away from main thread
• Generate D3D command lists for each of the shadow maps on their own worker threads
• Command lists executed in parallel with the main scene command list building

Why shadow maps?
• Rendered before the main scene
• Simple render loop
• Extracting receivers & casters is quite CPU intensive
• The shadow jobs don’t require waiting until ID3D12CommandList needs to be executed

Before

After

Future D3D12 work
• Prerecorded command bundles
  • One bundle per material pass
• Bundles for standard operations
  • Mipmap generation
• Use shader model 5.1 features

CPU & GPU Performance Improvements
Direct3D 12

Shader Cache
• Redundant compilation from IL to hardware-specific instructions
• Optimize startup and level load times, reduce glitches
[Chart: CPU Usage (%) over Time (s) across start-up, menu, level load, and play, annotated "Heavy shader compilation during start-up" and "Heavy shader compilation during level-load".]

Shader Cache
• Frames typically have 200 to 400 Pipeline State Objects
• Long traces typically have 300 to 1000 Pipeline State Objects
• Cache operates on fully compiled PSOs, not individual shader stages
• Serialization and deserialization under developer control
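The caching behavior described above can be illustrated with a small portable model. This is a sketch, not the Direct3D 12 cache: `FakeCompiler`, `PsoCache`, and the string-based PSO description are hypothetical stand-ins. It shows a cache keyed by the whole pipeline description, whose contents the application saves and restores itself.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Stand-in for an expensive driver compile of a whole pipeline state.
struct FakeCompiler {
    std::size_t compileCount = 0;
    std::string Compile(const std::string& psoDesc) {
        ++compileCount;
        return "blob(" + psoDesc + ")";
    }
};

// Cache keyed by the full PSO description, not by individual shader stage.
class PsoCache {
public:
    // Return the cached blob, compiling only on a miss.
    const std::string& GetOrCompile(const std::string& psoDesc, FakeCompiler& c) {
        auto it = blobs_.find(psoDesc);
        if (it == blobs_.end())
            it = blobs_.emplace(psoDesc, c.Compile(psoDesc)).first;
        return it->second;
    }

    // The application decides when to save and restore the cache contents.
    std::map<std::string, std::string> Serialize() const { return blobs_; }
    void Deserialize(std::map<std::string, std::string> saved) {
        blobs_ = std::move(saved);
    }

private:
    std::map<std::string, std::string> blobs_;
};
```

A second run that calls `Deserialize` with the map saved by the first run will hit the cache for every previously seen description, so `compileCount` only grows for pipeline states that are new.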

ExecuteIndirect
• Replacement for DrawIndirect and DispatchIndirect
• Can perform multiple draws with a single API call
• Number of draws can be controlled by CPU or GPU
• Can even change bindings between draw calls
• Works on all DirectX 12 hardware, from FL 11.0 and up

ExecuteIndirect Command Signature
• Operations performed by ExecuteIndirect are described by a command signature
• Describes the layout of the argument buffer and the set of commands
• Operations include:
  • Set vertex or index buffer
  • Change root constants
  • Set root resource views (SRV, UAV, CBV)
  • Draw, DrawIndexed, or Dispatch
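How a command signature gives meaning to the bytes of an argument buffer can be sketched with a small CPU-side interpreter. This is a simplified model under stated assumptions: `ArgKind`, `CommandSignature`, `DrawIndexedArgs`, and `ExecuteIndirectModel` are hypothetical, cover only two of the operations listed above, and do not match the real Direct3D 12 structures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Operation kinds a command signature may list (illustrative subset).
enum class ArgKind { RootConstant, DrawIndexed };

// Simplified draw arguments as packed in the argument buffer.
struct DrawIndexedArgs { std::uint32_t indexCount, instanceCount, indexStart; };

struct CommandSignature {
    std::vector<ArgKind> ops;   // performed in order for every command
    std::size_t stride;         // bytes per command in the argument buffer
};

struct RecordedDraw { std::uint32_t rootConstant; DrawIndexedArgs draw; };

// Walk `count` commands in the argument buffer and apply the signature's
// operations to each one, returning the draws that would be issued.
std::vector<RecordedDraw> ExecuteIndirectModel(const CommandSignature& sig,
                                               std::size_t count,
                                               const std::vector<std::uint8_t>& args) {
    if (count * sig.stride > args.size())
        throw std::out_of_range("argument buffer too small for command count");
    std::vector<RecordedDraw> out;
    std::uint32_t currentConstant = 0;
    for (std::size_t i = 0; i < count; ++i) {
        const std::uint8_t* p = args.data() + i * sig.stride;
        for (ArgKind op : sig.ops) {
            if (op == ArgKind::RootConstant) {
                std::memcpy(&currentConstant, p, sizeof currentConstant);
                p += sizeof currentConstant;
            } else {
                DrawIndexedArgs d;
                std::memcpy(&d, p, sizeof d);
                p += sizeof d;
                out.push_back({currentConstant, d});
            }
        }
    }
    return out;
}
```

Because the binding change (here a root constant) lives in the same buffer as the draw arguments, a single call covers what would otherwise be one set-binding call plus one draw call per object. In this model, whatever fills the buffer also chooses `count`.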

ExecuteIndirect versus Draw Loop

Draw loop:

    for (UINT drawIdx = drawStart; drawIdx < drawEnd; ++drawIdx)
    {
        // Set bindings
        cmdLst->SetGraphicsRootConstantBufferView(RT_CBV, constantsPointer);
        constantsPointer += sizeof(DrawConstantBuffer);

        auto textureSRV = textureStartSRV.MakeOffsetted(staticData->textureIndex, handleIncrementSize);
        cmdLst->SetGraphicsRootDescriptorTable(RT_SRV, textureSRV);

        cmdLst->DrawIndexedInstanced(dynamicData->indexCount, 1, dynamicData->indexStart, staticData->vertexStart, 0);
    }

ExecuteIndirect:

    mCmdLst->SetGraphicsRootDescriptorTable(RT_SRV, mTextureStart);
    mCmdLst->ExecuteIndirect(mCommandSignature, settings.numAsteroids, frame->mIndirectArgBuffer->Heap(), 0, nullptr, 0);

ExecuteIndirect Demo
Intel’s Asteroids Demo Updated

ExecuteIndirect Demo

                       CPU        GPU
11                     39.19 ms   34.81 ms
12                     33.41 ms   28.77 ms
12 Bindless            12.85 ms   11.86 ms
12 ExecuteIndirect     5.69 ms    10.59 ms

Flexible Predication and Queries
• Predicates & Queries are now an explicit resource creation on GPU-accessible heaps
• Rendering operations can be predicated based on arbitrary computation performed by the CPU or GPU
• Resolve operation transforms hardware-specific query representation into standardized buffer contents
• Apps that have lots of occlusion queries per frame will see improved performance due to bulk resolves

Multiengine
• Expose multiple parallel queues as explicit API objects
• Queue Types: 3D, Compute, Copy
• Prioritized queues enable new scenarios
  • High priority, latency-sensitive workloads
  • Low priority background tasks
[Diagram: 3D, Compute, and Copy queues]

Multiengine
[Timeline diagram. Labels: 3D Queue (Render, Compute); Copy Queue (Stream textures); Signal Fence 1; Wait Fence 1; Render.]

Multiengine
[Timeline diagram. Labels: 3D Queue (Render, Wait Fence 1, Render); Copy Queue (Stream textures, Signal Fence 1); Compute; Render.]

Multiengine Demo
Compute and Copy Scenario Test

UAV Barriers
• In D3D11, all UAV accesses in one Draw/Dispatch must complete before any UAV accesses in a subsequent Draw/Dispatch
• This results in idle GPU shader cores for small Draw/Dispatch calls
• In D3D12, UAV accesses in multiple Draw/Dispatch calls are truly unordered; applications must use an explicit barrier to enforce ordering
• D3D12 – putting the “U” back in UAV

UAV Barriers
[Timeline diagram comparing Direct3D 11 and Direct3D 12. Labels: Draw+UAV, Dispatch, Wait for Idle, Barrier.]

UAV Barrier – Fable A/B Demo

Fable: 11 versus 12

Summary
• Dramatically reduced CPU overhead
• Great multithreaded scalability
• Expose new GPU capabilities
• Increase GPU performance
• Greater developer control

Resources – Previous Talks
• IDF 2014: https://intel.lanyonevents.com/sf14/connect/sessionDetail.ww?SESSION_ID=1315
• GDC 2014/Build 2014: http://channel9.msdn.com/Events/Build/2014/3-564

Resources
• Check our booths and quick start challenge at the Expo
• Join early access: http://1drv.ms/1dgelm6
• Upcoming GDC 2015 Talks:
  • DirectX Tools: http://schedule.gdconf.com/session/solve-the-tough-graphics-problems-with-your-game-using-directx-tools-presented-by-microsoft
  • Direct3D 12 Power & Performance: http://schedule.gdconf.com/session/better-power-better-performance-your-game-on-directx12-presented-by-microsoft
  • And several talks by hardware partners…

© 2015 Microsoft Corporation. All rights reserved. Microsoft, Xbox, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.