DirectX 12 Advanced Graphics and Performance

![Binding Tiers in the D3D12 Market](https://slidetodoc.com/presentation_image_h/c16ac02d9981fa55a4bc19402d33794e/image-10.jpg)

- Slides: 45

DirectX 12 Advanced Graphics and Performance
Max McMullen, Direct3D Development Lead, Microsoft

It’s been a busy year…
• API is largely complete, with working drivers
• Over 50% of gamers have DirectX 12 hardware*
• Massive industry support: Early Access, Engines, Titles
• 1-year free upgrade to Windows 10 from Windows 7, 8.x
• And now…
*Based on Steam survey

Agenda
• Refresh on Direct3D 12
• New Feature Levels
• Unity on Direct3D 12
• CPU & GPU Performance Improvements
• Fable – 11 versus 12

Direct3D 12 API
• Reduce CPU overhead
• Increase scalability across multiple CPU cores
• Greater developer control
• Console-level API efficiency and performance
• Superset of D3D11 rendering functionality

CPU Overhead and Multithread Improvements
• Pipeline state objects
• Explicit resource binding management
• Flexible pipeline parameterization
• Explicit CPU/GPU synchronization
• Command reuse

Pipeline State Objects
[Diagram: D3D Vertex Shader, D3D HS/DS/…, Rasterizer, D3D Pixel Shader, and D3D Blend State feed one Pipeline State Object, which maps to HW State 1, HW State 2, and HW State 3]
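
The idea behind a pipeline state object can be sketched on the CPU side: the separate state pieces are combined into one immutable description, translated once, and reused afterwards. The sketch below is only a toy model under that assumption. `PipelineDesc`, `PipelineState`, and `PsoCache` are hypothetical names, not Direct3D 12 types, and the "translation" is a stand-in for whatever the driver really does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <tuple>

// Hypothetical stand-ins for the separate D3D state pieces on the slide.
struct PipelineDesc {
    std::uint32_t vertexShaderId;
    std::uint32_t pixelShaderId;
    std::uint32_t rasterizerMode;
    std::uint32_t blendMode;

    bool operator<(const PipelineDesc& o) const {
        return std::tie(vertexShaderId, pixelShaderId, rasterizerMode, blendMode) <
               std::tie(o.vertexShaderId, o.pixelShaderId, o.rasterizerMode, o.blendMode);
    }
};

// Stand-in for the hardware state produced from a full description.
struct PipelineState {
    std::uint32_t hwStateWord;
};

class PsoCache {
public:
    // Returns the state for `desc`, translating it only on first use.
    const PipelineState& getOrCreate(const PipelineDesc& desc) {
        auto it = cache_.find(desc);
        if (it == cache_.end()) {
            ++translations_;
            PipelineState pso{translate(desc)};
            it = cache_.emplace(desc, pso).first;
        }
        return it->second;
    }
    std::size_t translations() const { return translations_; }

private:
    // Toy "driver translation": every input is visible at the same time,
    // so nothing has to be re-resolved at draw time.
    static std::uint32_t translate(const PipelineDesc& d) {
        return (d.vertexShaderId << 24) ^ (d.pixelShaderId << 16) ^
               (d.rasterizerMode << 8) ^ d.blendMode;
    }
    std::map<PipelineDesc, PipelineState> cache_;
    std::size_t translations_ = 0;
};
```

Requesting the same description twice costs one translation in this model; only a description that differs in some field triggers another.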

Explicit Resource Binding Management
• Descriptor { Type, Format, Mip Count, pData }
• Descriptor Heap
• Descriptor Table { Start Index, Size }
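
A minimal sketch of how these three pieces relate, assuming a heap is a flat array of descriptors and a table is a window into it. The struct and function names are illustrative only and do not mirror the real Direct3D 12 structures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Simplified descriptor with the fields listed on the slide.
struct Descriptor {
    std::uint32_t type;
    std::uint32_t format;
    std::uint32_t mipCount;
    const void* pData;
};

// A descriptor heap modeled as a flat array of descriptors.
struct DescriptorHeap {
    std::vector<Descriptor> slots;
};

// A descriptor table is just a (start index, size) window into a heap.
struct DescriptorTable {
    std::size_t startIndex;
    std::size_t size;
};

// Resolves entry `i` of `table` to the descriptor it refers to in `heap`.
inline const Descriptor& resolve(const DescriptorHeap& heap,
                                 const DescriptorTable& table,
                                 std::size_t i) {
    if (i >= table.size || table.startIndex + i >= heap.slots.size()) {
        throw std::out_of_range("descriptor table index out of range");
    }
    return heap.slots[table.startIndex + i];
}
```

Because a table holds no descriptors of its own, pointing a shader at a different set of resources in this model only means changing two integers.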
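
Resource Binding Tiers

| Limit | Tier 1 | Tier 2 | Tier 3 |
| --- | --- | --- | --- |
| Max Descriptor Heap CBV/SRV/UAVs | 2^20 | 2^20 | 2^20+ |
| Max CBVs per stage | 14 | 14 | full heap |
| Max SRVs per stage | 128 | full heap | full heap |
| Max UAVs in all stages | 8 | 64 | full heap |
| Max Samplers per stage | 16 | full heap | full heap |
| Max SRV Descriptor Tables | 5 | 5 | no limit |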

Binding Tiers in the D3D12 Market
• Tier 3: [PERCENTAGE]
• Tier 2: 44%
• Tier 1: 39%

Explicit Resource Binding: Hazard Resolution
• Resource hazards
  • Render Target to/from Texture
  • Copy Source to/from Copy Destination
  • Tiled Resource Aliasing
  • etc…
• ResourceBarrier API to resolve hazards
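
One way to picture explicit hazard resolution is as per-resource state tracking done by the application: a barrier is recorded only when the next use needs a different state. This is a simplified model. `ResourceState`, `Resource`, and `CommandRecorder` are hypothetical names, not the ResourceBarrier API itself.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical usage states, mirroring the hazards listed on the slide.
enum class ResourceState { RenderTarget, ShaderResource, CopySource, CopyDest };

struct Resource {
    std::string name;
    ResourceState state;
};

// A recorded transition: which resource, from which state, to which state.
struct Transition {
    std::string resource;
    ResourceState before;
    ResourceState after;
};

class CommandRecorder {
public:
    // Declares the state the next command needs; records a barrier only when
    // the resource is not already in that state.
    void require(Resource& r, ResourceState needed) {
        if (r.state != needed) {
            barriers_.push_back({r.name, r.state, needed});
            r.state = needed;
        }
    }
    const std::vector<Transition>& barriers() const { return barriers_; }

private:
    std::vector<Transition> barriers_;
};
```

Rendering into a shadow map and then sampling it as a texture produces exactly one transition in this model; repeated uses in the same state produce none.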

Flexible Pipeline Parameterization
• Two parts: Root Signature and Root Arguments
• Contains constants, descriptors, and descriptor tables
• Leverage hardware-specific registers and pipelined renaming paths for highest-frequency parameters
• Remove indirection from a constant descriptor index to an explicit descriptor

Explicit CPU/GPU synchronization
• Application is responsible for managing CPU & GPU race conditions
• Synchronization primitive is a fence
• Application chooses granularity of synchronization
  • One increment per frame is well amortized
  • Increment per command list submission is possible
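
The per-frame fence pattern can be sketched with a deterministic, single-threaded simulation. Everything here is an assumption made for illustration: `Fence`, `GpuQueue`, and `FrameLoop` are toy types, the "GPU" is advanced by hand, and two frames in flight is an arbitrary choice.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Toy fence: a monotonically increasing value the "GPU" writes when it
// reaches a signal command.
struct Fence {
    std::uint64_t completed = 0;
};

// Toy GPU queue: each submission carries the fence value to signal when done.
class GpuQueue {
public:
    explicit GpuQueue(Fence& f) : fence_(f) {}
    void submit(std::uint64_t signalValue) { pending_.push_back(signalValue); }
    // Simulates the GPU finishing its oldest submission.
    bool retireOne() {
        if (pending_.empty()) return false;
        fence_.completed = pending_.front();
        pending_.pop_front();
        return true;
    }

private:
    Fence& fence_;
    std::deque<std::uint64_t> pending_;
};

// Assumed number of per-frame resource sets the CPU cycles through.
constexpr int kFramesInFlight = 2;

struct FrameLoop {
    Fence fence;
    GpuQueue queue{fence};
    std::uint64_t frameFence[kFramesInFlight] = {0, 0};
    std::uint64_t nextValue = 1;
    int stalls = 0;

    // Records and submits one frame, with one fence increment per frame.
    void renderFrame(int frameIndex) {
        const int slot = frameIndex % kFramesInFlight;
        // Before reusing this slot's resources, wait until the GPU has passed
        // the fence value signaled when the slot was last submitted.
        while (fence.completed < frameFence[slot]) {
            ++stalls;
            queue.retireOne();  // stand-in for blocking until the GPU signals
        }
        queue.submit(nextValue);
        frameFence[slot] = nextValue++;
    }
};
```

In this model the CPU never touches a resource set the GPU may still be reading, and it only stalls once it gets a full `kFramesInFlight` frames ahead.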

New Feature Levels
Direct3D 12

New Rendering Features
• Conservative Rasterization
• ROVs
• Typed UAV Loads
• Tiled Resources Tier 3: Volumes
• PS Specified Stencil Ref

New Feature Levels
• Feature Level 12.0
  • Resource Binding Tier 2
  • Tiled Resources Tier 2: Texture2D
  • Typed UAV Tier 1
• Feature Level 12.1
  • Conservative Rasterization Tier 1
  • ROVs
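
The requirements listed for the two levels can be written as a small classification function. This is a sketch that encodes only what the slide lists and assumes Feature Level 12.1 also requires everything in 12.0. `DeviceCaps` and its field names are hypothetical, not the capability structures a real device reports.

```cpp
#include <cassert>

// Hypothetical capability report; field names are illustrative only.
struct DeviceCaps {
    int resourceBindingTier;
    int tiledResourcesTier;
    int typedUavTier;
    int conservativeRasterTier;
    bool rovs;
};

enum class FeatureLevel { Below12_0, Level12_0, Level12_1 };

// Applies only the requirements listed on the slide, lowest level first.
constexpr FeatureLevel classify(const DeviceCaps& c) {
    const bool is12_0 = c.resourceBindingTier >= 2 &&
                        c.tiledResourcesTier >= 2 &&
                        c.typedUavTier >= 1;
    const bool is12_1 = is12_0 && c.conservativeRasterTier >= 1 && c.rovs;
    return is12_1 ? FeatureLevel::Level12_1
                  : (is12_0 ? FeatureLevel::Level12_0 : FeatureLevel::Below12_0);
}
```

A device with ROVs but only Resource Binding Tier 1 still classifies below 12.0 here, because the levels are treated as cumulative.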

Unity on Direct3D 12
Kasper Engelstoft, Unity Graphics Engineer

Direct3D 12 in Unity
• Porting experience
• Case study: multithreaded shadow rendering
• What’s next for D3D12 in Unity?

D3D12 porting experience
• Started porting in September with SDK 1
• After 2 weeks, we had something rendering
• In October, SDK 2 API changes hit...
• Mid-January, 95% of our tests were passing
• Then SDK 3 hit...

D3D12 optimization case study
• Multi-threading shadow map rendering
• Move work away from the main thread
• Generate D3D cmd lists for each of the shadow maps on their own worker threads
• Cmd lists executed in parallel with the main scene cmd list building

Why shadow maps?
• Rendered before the main scene
• Simple render loop
• Extracting receivers & casters is quite CPU intensive
• The shadow jobs don’t need to be waited on until the ID3D12CommandList has to be executed
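
The case study's structure can be sketched with standard threads: one worker records each shadow map's command list while the calling thread records the main scene. This is not Unity's implementation. `CommandList` is a hypothetical stand-in that stores strings, and the command names are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a recorded command list: just a list of commands.
struct CommandList {
    std::vector<std::string> commands;
};

// Records the draw commands for one shadow map. Each call touches only its
// own CommandList, so no locking is needed while recording.
inline void recordShadowMap(std::size_t lightIndex, std::size_t casterCount,
                            CommandList& out) {
    out.commands.push_back("SetShadowTarget " + std::to_string(lightIndex));
    for (std::size_t i = 0; i < casterCount; ++i) {
        out.commands.push_back("DrawCaster " + std::to_string(i));
    }
}

// Builds one command list per shadow map on worker threads while the calling
// ("main") thread builds the scene list, then returns them in submit order.
inline std::vector<CommandList> buildFrame(std::size_t shadowMapCount,
                                           std::size_t castersPerMap) {
    // Sized up front so worker references stay valid (no reallocation).
    std::vector<CommandList> lists(shadowMapCount + 1);
    std::vector<std::thread> workers;
    workers.reserve(shadowMapCount);
    for (std::size_t i = 0; i < shadowMapCount; ++i) {
        workers.emplace_back(recordShadowMap, i, castersPerMap, std::ref(lists[i]));
    }
    // The main thread records the main scene in parallel with the workers.
    lists[shadowMapCount].commands.push_back("DrawMainScene");
    // Join only at the point where the lists are needed for submission.
    for (std::thread& w : workers) w.join();
    // Shadow lists come first so the scene can sample the finished maps.
    return lists;
}
```

The join sits as late as possible, which matches the last bullet above: nothing waits on the shadow jobs until the lists are about to be executed.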
Before
After

Future D3D12 work
• Prerecorded command bundles
• One bundle per material pass
• Bundles for standard operations
• Mipmap generation
• Use shader model 5.1 features

CPU & GPU Performance Improvements
Direct3D 12

Shader Cache
• Redundant compilation from IL to hardware-specific instructions
• Optimize startup and level load times, reduce glitches
[Chart: CPU Usage (%) over Time (s) across start-up, menu, level load, and play, with heavy shader compilation during start-up and during level load]

Shader Cache
• Frames typically have 200 to 400 Pipeline State Objects
• Long traces typically have 300 to 1000 Pipeline State Objects
• Cache operates on fully compiled PSOs, not individual shader stages
• Serialization and deserialization under developer control
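
A rough model of a developer-controlled cache of fully compiled pipeline blobs: compile on a miss, flatten the cache to bytes, and rebuild it on the next run so nothing is recompiled. `PipelineBlobCache`, the 64-bit key, and the byte layout are all assumptions made for this sketch and are not the Direct3D 12 cache format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <vector>

using Blob = std::vector<std::uint8_t>;

class PipelineBlobCache {
public:
    // Returns the compiled blob for `key`, compiling only on a cache miss.
    const Blob& get(std::uint64_t key) {
        auto it = blobs_.find(key);
        if (it == blobs_.end()) {
            ++compiles_;
            it = blobs_.emplace(key, compile(key)).first;
        }
        return it->second;
    }

    // Flattens the cache to bytes: [key][size][payload] per entry.
    Blob serialize() const {
        Blob out;
        for (const auto& kv : blobs_) {
            appendU64(out, kv.first);
            appendU64(out, kv.second.size());
            out.insert(out.end(), kv.second.begin(), kv.second.end());
        }
        return out;
    }

    // Rebuilds a cache from bytes produced by serialize().
    static PipelineBlobCache deserialize(const Blob& in) {
        PipelineBlobCache cache;
        std::size_t pos = 0;
        while (pos + 16 <= in.size()) {
            const std::uint64_t key = readU64(in, pos);
            const std::uint64_t size = readU64(in, pos + 8);
            pos += 16;
            if (size > in.size() - pos) break;  // truncated entry: stop reading
            auto first = in.begin() + static_cast<std::ptrdiff_t>(pos);
            cache.blobs_[key] = Blob(first, first + static_cast<std::ptrdiff_t>(size));
            pos += static_cast<std::size_t>(size);
        }
        return cache;
    }

    int compiles() const { return compiles_; }

private:
    // Toy "compiler": real compilation is the expensive step being avoided.
    static Blob compile(std::uint64_t key) {
        return Blob(4, static_cast<std::uint8_t>(key & 0xFF));
    }
    static void appendU64(Blob& out, std::uint64_t v) {
        std::uint8_t bytes[8];
        std::memcpy(bytes, &v, 8);
        out.insert(out.end(), bytes, bytes + 8);
    }
    static std::uint64_t readU64(const Blob& in, std::size_t pos) {
        std::uint64_t v = 0;
        std::memcpy(&v, in.data() + pos, 8);
        return v;
    }
    std::map<std::uint64_t, Blob> blobs_;
    int compiles_ = 0;
};
```

A cache restored with `deserialize` serves every previously seen key without calling `compile`, which is the start-up and level-load saving the slides describe.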

ExecuteIndirect
• Replacement for DrawIndirect and DispatchIndirect
• Can perform multiple draws with a single API call
• Number of draws can be controlled by CPU or GPU
• Can even change bindings between draw calls
• Works on all D3D12 hardware, from FL 11.0 and up

ExecuteIndirect Command Signature
• Operations performed by ExecuteIndirect are described by a command signature
• Describes the layout of the argument buffer and the set of commands
• Operations include:
  • Set vertex or index buffer
  • Change root constants
  • Set root resource views (SRV, UAV, CBV)
  • Draw, DrawIndexed, or Dispatch
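
The relationship between a command signature and its argument buffer can be modeled on the CPU: the signature fixes the byte layout of each record, and execution walks the buffer record by record. All names below (`ArgKind`, `CommandSignature`, `executeIndirect`, `appendRecord`) are hypothetical, and only two operation kinds are modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical argument kinds a command signature may list, in order.
enum class ArgKind { RootConstant, DrawIndexed };

// Hypothetical draw arguments, packed into the argument buffer.
struct DrawIndexedArgs {
    std::uint32_t indexCount;
    std::uint32_t startIndex;
    std::int32_t baseVertex;
};

// The signature fixes the per-command layout of the argument buffer.
struct CommandSignature {
    std::vector<ArgKind> args;
    std::size_t stride() const {
        std::size_t bytes = 0;
        for (ArgKind k : args) {
            bytes += (k == ArgKind::RootConstant) ? sizeof(std::uint32_t)
                                                  : sizeof(DrawIndexedArgs);
        }
        return bytes;
    }
};

struct ExecutedDraw {
    std::uint32_t rootConstant;
    DrawIndexedArgs draw;
};

// Walks up to `count` records in `buffer`, decoding each according to `sig`.
inline std::vector<ExecutedDraw> executeIndirect(const CommandSignature& sig,
                                                 std::size_t count,
                                                 const std::vector<std::uint8_t>& buffer) {
    std::vector<ExecutedDraw> out;
    const std::size_t stride = sig.stride();
    for (std::size_t i = 0; i < count && (i + 1) * stride <= buffer.size(); ++i) {
        const std::uint8_t* p = buffer.data() + i * stride;
        ExecutedDraw current{};
        for (ArgKind k : sig.args) {
            if (k == ArgKind::RootConstant) {
                std::memcpy(&current.rootConstant, p, sizeof(std::uint32_t));
                p += sizeof(std::uint32_t);
            } else {
                std::memcpy(&current.draw, p, sizeof(DrawIndexedArgs));
                p += sizeof(DrawIndexedArgs);
                out.push_back(current);  // the draw uses the bindings set so far
            }
        }
    }
    return out;
}

// Appends one record matching the layout {RootConstant, DrawIndexed}.
inline void appendRecord(std::vector<std::uint8_t>& buffer, std::uint32_t constant,
                         const DrawIndexedArgs& draw) {
    const std::uint8_t* c = reinterpret_cast<const std::uint8_t*>(&constant);
    const std::uint8_t* d = reinterpret_cast<const std::uint8_t*>(&draw);
    buffer.insert(buffer.end(), c, c + sizeof(constant));
    buffer.insert(buffer.end(), d, d + sizeof(draw));
}
```

Since the records are plain bytes in a buffer, either the CPU or a GPU compute pass could in principle fill them, which is how the number of draws can be decided on either side.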

ExecuteIndirect versus Draw Loop

```cpp
// Draw loop: one set of binding calls and one draw call per object.
for (UINT drawIdx = drawStart; drawIdx < drawEnd; ++drawIdx)
{
    // Set bindings
    cmdLst->SetGraphicsRootConstantBufferView(RT_CBV, constantsPointer);
    constantsPointer += sizeof(DrawConstantBuffer);
    auto textureSRV = textureStartSRV.MakeOffsetted(staticData->textureIndex, handleIncrementSize);
    cmdLst->SetGraphicsRootDescriptorTable(RT_SRV, textureSRV);
    cmdLst->DrawIndexedInstanced(dynamicData->indexCount, 1, dynamicData->indexStart, staticData->vertexStart, 0);
}

// ExecuteIndirect: bind the texture table once, then issue every draw in a single call.
mCmdLst->SetGraphicsRootDescriptorTable(RT_SRV, mTextureStart);
mCmdLst->ExecuteIndirect(mCommandSignature, settings.numAsteroids, frame->mIndirectArgBuffer->Heap(), 0, nullptr, 0);
```

ExecuteIndirect Demo
Intel’s Asteroids Demo Updated

ExecuteIndirect Demo

| | CPU | GPU |
| --- | --- | --- |
| 11 | 39.19 ms | 34.81 ms |
| 12 | 33.41 ms | 28.77 ms |
| 12 Bindless | 12.85 ms | 11.86 ms |
| 12 ExecuteIndirect | 5.69 ms | 10.59 ms |

Flexible Predication and Queries
• Predicates & queries are now explicit resources created on GPU-accessible heaps
• Rendering operations can be predicated based on arbitrary computation performed by the CPU or GPU
• Resolve operation transforms hardware-specific query representation into standardized buffer contents
• Apps that have lots of occlusion queries per frame will see improved performance due to bulk resolves

Multiengine
• Expose multiple parallel queues as explicit API objects
• Queue types: 3D, Compute, Copy
• Prioritized queues enable new scenarios
  • High priority, latency-sensitive workloads
  • Low priority background tasks
[Diagram: 3D, Compute, and Copy queues]

Multiengine
[Diagram labels, in slide order: 3D Queue: Render, Compute; Copy Queue: Stream textures, Signal Fence 1; Wait Fence 1; Render]

Multiengine
[Diagram labels, in slide order: 3D Queue: Render, Wait Fence 1, Render; Copy Queue: Stream textures, Signal Fence 1; Compute; Render]
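
The fence handoff between queues on these slides can be sketched as a small deterministic scheduler: each queue runs its commands in order, and a queue whose next command is a wait makes no progress until another queue signals the fence. The types and the round-robin scheduling are assumptions made for illustration; real queues run concurrently on hardware engines.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

enum class Op { Work, Signal, Wait };

struct Command {
    Op op;
    std::string label;        // used by Work
    std::uint64_t value = 0;  // used by Signal and Wait
};

struct Queue {
    std::string name;
    std::vector<Command> commands;
    std::size_t next = 0;
};

// Runs the queues round-robin, one command per turn. A queue whose next
// command is a Wait makes no progress until the fence reaches that value.
inline std::vector<std::string> run(std::vector<Queue> queues) {
    std::uint64_t fence = 0;
    std::vector<std::string> trace;
    bool progressed = true;
    while (progressed) {
        progressed = false;
        for (Queue& q : queues) {
            if (q.next >= q.commands.size()) continue;
            const Command& c = q.commands[q.next];
            if (c.op == Op::Wait && fence < c.value) continue;  // blocked
            if (c.op == Op::Signal) fence = c.value;
            if (c.op == Op::Work) trace.push_back(q.name + ":" + c.label);
            ++q.next;
            progressed = true;
        }
    }
    return trace;
}
```

With a 3D queue that renders, waits on fence value 1, and renders again, and a copy queue that streams textures and then signals 1, the second render in the trace always comes after the streaming work.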

Multiengine Demo
Compute and Copy Scenario Test

UAV Barriers
• In D3D11, all UAV accesses in one Draw/Dispatch must complete before any UAV accesses in a subsequent Draw/Dispatch
• This results in idle GPU shader cores for small Draws/Dispatches
• In D3D12, UAV accesses in multiple Draws/Dispatches are truly unordered; applications must use an explicit barrier to enforce ordering
• D3D12 – putting the “U” back in UAV
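
The difference between the two models comes down to counting synchronization points. The sketch below is a deliberately crude model under one assumption: each piece of UAV work either does or does not read the previous piece's output. `UavWork` and both functions are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One Draw/Dispatch that accesses a UAV. `readsPreviousOutput` marks a true
// data dependency on the item before it.
struct UavWork {
    bool readsPreviousOutput;
};

// D3D11-style model: every consecutive pair of UAV-accessing items is
// serialized, whether or not they depend on each other.
inline std::size_t implicitSyncPoints(const std::vector<UavWork>& work) {
    return work.empty() ? 0 : work.size() - 1;
}

// D3D12-style model: the application inserts a barrier only before items
// that actually consume the previous item's output.
inline std::size_t explicitBarriers(const std::vector<UavWork>& work) {
    std::size_t barriers = 0;
    for (std::size_t i = 1; i < work.size(); ++i) {
        if (work[i].readsPreviousOutput) ++barriers;
    }
    return barriers;
}
```

For four small dispatches where only the third reads the second's output, the implicit model serializes three times and the explicit model once, leaving the other dispatches free to overlap.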

UAV Barriers
[Diagram: Direct3D 11 versus Direct3D 12 timelines of Draw+UAV and Dispatch work; labels include Wait for Idle and Barrier]
UAV Barrier – Fable A/B Demo
Fable: 11 versus 12

Summary
• Dramatically reduced CPU overhead
• Great multithreaded scalability
• Expose new GPU capabilities
• Increase GPU performance
• Greater developer control

Resources – Previous Talks
• IDF 2014: https://intel.lanyonevents.com/sf14/connect/sessionDetail.ww?SESSION_ID=1315
• GDC 2014/Build 2014: http://channel9.msdn.com/Events/Build/2014/3-564

Resources
• Check our booths and quick start challenge at the Expo
• Join early access: http://1drv.ms/1dgelm6
• Upcoming GDC 2015 Talks:
  • DirectX Tools: http://schedule.gdconf.com/session/solve-the-tough-graphics-problems-with-your-game-using-directx-tools-presented-by-microsoft
  • Direct3D 12 Power & Performance: http://schedule.gdconf.com/session/better-power-better-performance-your-game-on-directx12-presented-by-microsoft
  • And several talks by hardware partners…

© 2015 Microsoft Corporation. All rights reserved. Microsoft, Xbox, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.