Windows Display Driver Model (WDDM) v2 And Beyond
  • Steve Pronovost, Microsoft
  • Henry Moreton, NVIDIA
  • Tim Kelley, ATI

Outline
  • Introduction
  • Trends in use of GPU(s)
  • WDDM v1.0 overview
  • WDDM v2.x overview
  • Scenarios that benefit

Trends In Use Of GPU
  • Windows XP: Single client at a time
      • GDI desktop
      • Video decoding
      • Full screen game
      • CAD/Workstation applications
  • GPUs getting more flexible
      • Direct3D pushing increased programmability, precision and performance
      • Massive processing power, not fully utilized today

Trends In Use Of GPU
  • Windows Vista: Multiple clients together
      • Desktop window manager
      • WinFX APIs based on Direct3D 9
      • Picture, video playback, capture, encode, transcode, edit leverage GPUs
      • In-box games
  • Emerging General-Purpose GPU (GPGPU) trend
      • Physics, image processing, etc.

WDDM v1.0
  • Designed to work on existing GPUs
  • Increase stability, robustness and security
      • GPU scheduling
      • Virtualized video memory
  • Resource virtualization seamless across legacy APIs
      • DDraw, DX3, DX5, DX6, DX7, DX8, DX9, OGL
  • Use new API to take full advantage of resource virtualization
      • Direct3D 9Ex, Direct3D 10

WDDM v2.0
  • New generation of GPUs designed for multi-tasking
      • Mid command buffer preemption
      • Demand faulting of resources
          • Surface fault (preferred mode for v2.0)
          • Page fault (stall the GPU)
      • Per process page tables
  • Better multi-tasking than WDDM v1.0, still some client cooperation required

WDDM v2.1
  • Everything a WDDM v2.0 GPU can do
  • Fine grained context switching
      • Can preempt mid pixel
      • Doesn’t stall GPU on page fault
  • True preemptive multi-tasking
  • Ultimate flexibility for the GPU
      • Can be used for any scenario without impact on the desktop

WDDM Cheat Sheet (WDDM v1.0 through WDDM v2.1)
  • Scheduling: Packet (v1.0); Run List (v2.x)
  • Preemption: Packet (v1.0); Mid Pixel (v2.1)
  • Demand faulting: Not supported (v1.0); Surface/Page (STALL) (v2.0); Page (v2.1)
  • Memory Management: Physical/Contiguous (v1.0); Virtual/Page table (v2.x)
  • Multi-tasking: Cooperative (v1.0); Mostly Preemptive (v2.0); Truly Preemptive (v2.1)

WDDM 2.x Scheduling, Performance And Multi-GPU Support
  • Henry Moreton, NVIDIA

GPUs On The Desktop
  • The power of the GPU is finally tapped
      • Graphics
      • Video
      • Bandwidth and floating point (GPGPU)
  • Applications are vying for this powerful resource
      • The Vista Desktop Window Manager (DWM)
      • Photo editing
      • Video feeds
      • Personal Video Recorder

GPU Management Is Crucial
  • Applications naturally see the processor as their own
  • Great GPU tasks really exploit the power
  • But...
      • Some GPU operations are so massive they take non-trivial time
      • Some GPU operations are time sensitive
  • Management of the GPU is crucial to success (a happy user)

A Typical Situation (For Me)
  • Watching The Daily Show©
  • Doodling with photos
  • I find a great program for creating panoramas...
      • Today I set it up with twelve 6-megapixel images
      • Press go and wait... a long time (minutes)
      • Soon, with GPU acceleration, I press go and wait a second or two

But A Second Or Two Is A Long Time
  • Managed as a shared resource, the GPU
      • Renders my video unaffected
      • Builds my panorama in no time...
  • Unmanaged
      • The Daily Show risks being a slide show...

So Scheduling Is Important
  • How does scheduling vary across
      • WDDM v1.0
      • WDDM v2.1
  • What are the mechanics?
  • What is the context switch behavior?
  • What is expected performance?
      • With varying numbers of active contexts...

WDDM v2.x – The Care And Feeding Of The GPU
  • User Mode Driver (UMD)
      • Creates DMA buffer of commands
  • Kernel Mode Driver (KMD)
      • Appends DMA buffer to GPU context’s queue
  • The GPU Scheduler schedules contexts
      • A Run List of contexts, each with its own ring buffer of DMA buffers

Run Lists
  • List of contexts (box)
  • GPU processes a context until
      • Context is completed (get new run list)
      • Scheduler pre-empts
      • Page fault – WDDM v2.1
      • Protection fault
      • Synchronization event
  • Multiple contexts per Run List
      • Hide latency
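
The run list behavior above can be sketched as a small round-robin simulation. This is a toy model, not the actual GPU scheduler: the `Context` class, the `run` function, the tick costs, and the fixed time slice are all assumptions made for illustration, and only two of the listed switch reasons (context completed, scheduler preempts) are modeled.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Context:
    """Hypothetical GPU context: a name plus a ring of pending DMA buffers."""
    name: str
    ring: deque = field(default_factory=deque)  # each entry: remaining cost in ticks


def run(run_list, time_slice):
    """Round-robin over the run list; return context names in DMA buffer completion order."""
    completed = []
    pending = deque(run_list)
    while pending:
        ctx = pending.popleft()
        budget = time_slice
        while ctx.ring and budget > 0:
            work = min(ctx.ring[0], budget)
            ctx.ring[0] -= work
            budget -= work
            if ctx.ring[0] == 0:          # DMA buffer finished
                ctx.ring.popleft()
                completed.append(ctx.name)
        if ctx.ring:                      # preempted mid-buffer: back of the run list
            pending.append(ctx)
    return completed


run_list = [Context("panorama", deque([6])), Context("video", deque([1, 1]))]
order = run(run_list, time_slice=2)
print(order)  # ['video', 'video', 'panorama']
```

Even though the long panorama buffer is first in the run list, the short video buffers complete first because the panorama context is preempted when its slice expires.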

How Nimble Is Context Switching?
  • XP
      • All queued DP2 buffers must complete (very coarse)
  • WDDM v1.0 – Basic scheduling
      • Current DMA buffer must complete (coarse)
  • WDDM v2.0
      • Switch on command/triangle (fine)
  • WDDM v2.1
      • Switch “immediately” (very fine)

Context Switch Guarantees
  • Pre WDDM v2.1 (XP, v1.0, v2.0)
      • No guarantee
          • VERY long shader, VERY large triangle slow to switch
      • Expected performance
          • Relatively coarse switching for XP and v1.0
          • v2.0: Good average/typical switch time
  • WDDM v2.1
      • Guaranteed to context switch
      • Same average/typical switch time as v2.0
      • Much better switch time on applications with long shaders
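
To make the granularity differences concrete, here is a rough sketch of how long a switch request might wait under each model. Everything in it is an assumption for illustration: the function name, the representation of queued work as lists of per-command durations, and treating the v2.1 "immediate" switch as zero wait.

```python
def worst_case_switch_wait(queued_buffers):
    """Worst-case wait before a context switch can happen, per driver model.

    queued_buffers: list of DMA buffers, each a list of per-command durations
    (hypothetical units, e.g. milliseconds). The first buffer is the one
    currently executing.
    """
    current = queued_buffers[0]
    return {
        "XP": sum(sum(buf) for buf in queued_buffers),  # all queued buffers must complete
        "WDDM v1.0": sum(current),                      # current DMA buffer must complete
        "WDDM v2.0": max(current),                      # current command/triangle must complete
        "WDDM v2.1": 0,                                 # modeled as immediate
    }


# One buffer containing a very long shader (40), followed by a second buffer.
waits = worst_case_switch_wait([[5, 40, 5], [10, 10]])
for model, wait in waits.items():
    print(f"{model}: {wait}")
```

The v2.0 figure is dominated by the single long command, which is the "VERY long shader" case the slide calls out.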

Context Switch Challenge
  • Because GPUs are heavily threaded there is much more state than on a CPU
  • Consider rendering @ 60 fps
      • 17 millisecond frame time
      • With a context switch time of 100µs
      • Three concurrent applications see a ~2% context switch overhead
  • Fast GPU context switching is important and challenging!
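
The ~2% figure above can be reproduced with a short calculation. The sketch assumes one context switch per application per frame, which the slide does not state explicitly; the function name is illustrative.

```python
def context_switch_overhead(fps, switch_time_s, switches_per_frame):
    """Fraction of each frame spent context switching."""
    frame_time_s = 1.0 / fps
    return switches_per_frame * switch_time_s / frame_time_s


# 60 fps, 100 microseconds per switch, three applications (one switch each per frame).
overhead = context_switch_overhead(fps=60, switch_time_s=100e-6, switches_per_frame=3)
print(f"frame time: {1000 / 60:.1f} ms, overhead: {overhead:.1%}")
# frame time: 16.7 ms, overhead: 1.8%
```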

WDDM v2.x Efficiencies
  • WDDM v1.0
      • User Mode Driver (UMD) creates GPU-specific command buffer
      • KMD patches addresses
      • Copies to GPU visible DMA buffer
  • WDDM v2.0 and 2.1
      • UMD creates DMA buffer directly in GPU memory
      • No copy, no patch, fast and efficient
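
The difference between the two paths can be sketched as follows. This is a conceptual model only: the function names, the tuple-based command format, and the 4 KiB page size are assumptions, not actual driver interfaces.

```python
def v1_patch_and_copy(command_buffer, physical_address_of):
    """v1.0-style path: copy each command and patch in a physical address.

    command_buffer: list of (opcode, allocation_handle) pairs (hypothetical format).
    physical_address_of: allocation_handle -> physical address, known only once
    the allocation is resident.
    """
    return [(op, physical_address_of[handle]) for op, handle in command_buffer]


def v2_translate(virtual_address, page_table, page_size=4096):
    """v2.x-style path: commands carry virtual addresses; the GPU translates them
    through the per-process page table at access time, so nothing is patched."""
    page, offset = divmod(virtual_address, page_size)
    return page_table[page] * page_size + offset


dma_buffer = v1_patch_and_copy([("draw", "tex0")], {"tex0": 0x8000})
physical = v2_translate(0x1010, page_table={1: 7})
print(dma_buffer, hex(physical))  # [('draw', 32768)] 0x7010
```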

Performance – Memory Footprint
  • WDDM v1.0
      • No demand fault (page or surface)
      • Entire surfaces resident – coarse grained
      • OS must guarantee residence – CPU overhead
  • WDDM v2.0
      • Surface fault – supports load on bind
          • GPU switches to new context, no stalling
      • Fault and stall – permits partial eviction
          • GPU stalls waiting for missing page
  • WDDM v2.1
      • Page fault – permits partial eviction/residence
          • GPU switches to new context, no stalling

Multi-Engine, Multi-GPU Support
  • GPUs are composed of nodes of engines
      • Homogeneous nodes
      • 3D nodes
      • Video nodes
      • Copy, etc.
  • Run List per engine
  • GPU Device-common address space
  • Multiple GPU Contexts (per engine)
  • Synchronization
      • Fence, Trap, Wait, Signal
  • [Diagram labels: GPU, 3D, video]

Multi-GPU
  • Linked Adapter
  • Split Frame Rendering
  • Single logical adapter
  • Multiple physical adapters
  • Memory mirrored or instanced
  • Broadcast – multiple DMA buffer references

WDDM v2.x Memory Management And Robustness
  • Tim Kelley, ATI

WDDM v1.0 Surface Mgmt
  • All allocations (surfaces) referenced in DMA buffer must be resident at GPU submit
  • Driver tracks every allocation reference in the DMA buffer
  • Contiguous memory for each allocation
  • DMA buffers patched with physical addresses once surfaces are resident
  • Driver defines DMA split points to identify minimal working set
  • Significant risk of graphics memory thrashing
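
As a rough illustration of why split points matter, the sketch below greedily splits a DMA buffer so that each chunk's working set fits in graphics memory. It is not the actual v1.0 algorithm: the function name, the greedy policy, the assumption that every command boundary is a valid split point, and the assumption that any single command's allocations fit in memory are all hypothetical.

```python
def split_dma_buffer(commands, sizes, capacity):
    """Split a DMA buffer into chunks whose referenced allocations fit in memory.

    commands: list of tuples of allocation names referenced by each command.
    sizes: allocation name -> size (hypothetical units).
    capacity: graphics memory available for the working set.
    """
    chunks, current, working_set = [], [], set()
    for cmd_allocs in commands:
        candidate = working_set | set(cmd_allocs)
        if current and sum(sizes[a] for a in candidate) > capacity:
            chunks.append(current)            # split before this command
            current, candidate = [], set(cmd_allocs)
        current.append(cmd_allocs)
        working_set = candidate
    if current:
        chunks.append(current)
    return chunks


sizes = {"a": 4, "b": 4, "c": 4, "d": 8}
chunks = split_dma_buffer([("a", "b"), ("b", "c"), ("d",)], sizes, capacity=12)
print(chunks)  # [[('a', 'b'), ('b', 'c')], [('d',)]]
```

Each split forces the previous working set out and the next one in, which is one way to picture the thrashing risk noted above.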

WDDM v2.0 Surface Faulting
  • A step in the right direction
  • GPU supports per process virtual memory
  • Two faulting behaviors
      • Surface fault and context switch
      • Page fault and stall
  • In surface faulting, GPU probes first page of surface
  • On probe of non-resident surface
      • GPU faults
      • GPU context switches to next run list entry
      • Context switch is coarse grained; graphics pipeline drains
      • OS VidMm issues paging requests

WDDM v2.0 Page Fault And Stall
  • Even if surface probe succeeds, entire surface may not be resident
  • GPU must still support page faulting
  • On access to a non-resident page
      • GPU faults and stalls
      • Driver informs OS of missing pages
      • OS VidMm issues paging requests
      • Driver restarts GPU once pages are resident
  • Entire working set doesn’t have to be resident simultaneously
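
A minimal sketch of why page faulting is still needed after a successful surface probe. The page numbers, function names, and the use of a set for resident pages are assumptions made for illustration.

```python
def probe_surface(surface_pages, resident):
    """v2.0 surface probe: only the first page of the surface is checked."""
    return surface_pages[0] in resident


def first_missing_page(surface_pages, resident):
    """First non-resident page the GPU would fault (and stall) on, or None."""
    return next((p for p in surface_pages if p not in resident), None)


surface = [100, 101, 102, 103]   # hypothetical page numbers backing one surface
resident = {100, 101}            # only part of the surface is resident

probe_ok = probe_surface(surface, resident)
missing = first_missing_page(surface, resident)
print(probe_ok, missing)  # True 102
```

The probe succeeds because page 100 is resident, yet a later access to page 102 would still fault, which is the case the page-fault-and-stall path handles.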

WDDM v2.1 Page Faulting
  • Finally, full fledged page faulting with context switching!
  • GPUs support general page faulting and virtual memory per process
  • On a page fault, GPU context switches to next run list entry
      • Context switch is “immediate”
  • OS can partially populate allocations to reduce an app’s working set
      • GPU faults on non-resident page access
      • GPU context switches to next run list entry
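
The fault-then-switch behavior described above might be modeled like this. It is a toy trace generator, not the real scheduler: the `schedule` function, the string context names, and the instant paging step standing in for VidMm are all assumptions.

```python
def schedule(run_list, resident, needs):
    """Return an execution trace where a faulting context yields to the next entry.

    run_list: context names in run list order.
    resident: set of resident page numbers.
    needs: context name -> set of pages the context touches.
    """
    resident = set(resident)          # do not mutate the caller's set
    pending = list(run_list)
    trace = []
    while pending:
        ctx = pending.pop(0)
        missing = needs[ctx] - resident
        if missing:
            trace.append(f"{ctx}: fault")
            resident |= missing       # stand-in for VidMm paging the pages in
            pending.append(ctx)       # retry after the other run list entries
        else:
            trace.append(f"{ctx}: run")
    return trace


trace = schedule(["A", "B"], resident={1}, needs={"A": {1, 2}, "B": {1}})
print(trace)  # ['A: fault', 'B: run', 'A: run']
```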

Dedicated Paging Engine
  • Addition of high bandwidth copy engine for paging
  • Operates in parallel to 3D engine
  • GPU can perform paging operations for one context in parallel with 3D rendering for another context
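
A back-of-the-envelope model of the benefit of overlapping paging with rendering. The function name, the tick units, and the simplifications (context A faults immediately, context switch cost is ignored) are assumptions, not measurements.

```python
def total_time(work_a, work_b, paging_latency, overlap_paging):
    """Ticks to finish contexts A and B when A faults at the start.

    overlap_paging=False: the GPU stalls while A's pages are brought in.
    overlap_paging=True: the 3D engine renders B while the copy engine pages for A.
    """
    if not overlap_paging:
        return paging_latency + work_a + work_b
    # B runs during paging; A starts once both paging and B's slice are done.
    return max(paging_latency, work_b) + work_a


stalled = total_time(work_a=10, work_b=8, paging_latency=5, overlap_paging=False)
overlapped = total_time(work_a=10, work_b=8, paging_latency=5, overlap_paging=True)
print(stalled, overlapped)  # 23 18
```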

Paging Determination
  • GPU reports faulting address
  • GPU/Driver determine set of pages needed to make further progress
  • GPU maintains a set of page access bits
  • OS VidMm uses the above to determine appropriate paging operations (including evictions)
  • Additionally, OS uses heuristics to preload pages
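
One way page access bits can guide eviction is the classic second-chance (clock) scan, sketched below. The slides do not say which policy VidMm uses, so this is a generic stand-in technique; the function name and data layout are hypothetical, and `pages` is assumed to be non-empty.

```python
def choose_victim(pages, accessed, hand=0):
    """Second-chance (clock) scan over resident pages.

    pages: list of resident page numbers, scanned circularly from index `hand`.
    accessed: page number -> access bit (True if touched since the last scan).
    Returns (victim_page, next_hand). Access bits are cleared as the hand passes.
    """
    n = len(pages)
    while True:
        page = pages[hand]
        if accessed[page]:
            accessed[page] = False       # give the page a second chance
            hand = (hand + 1) % n
        else:
            return page, (hand + 1) % n


accessed = {10: True, 11: False, 12: True}
victim, next_hand = choose_victim([10, 11, 12], accessed)
print(victim, next_hand)  # 11 2
```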

Efficient Memory Management
  • Steady state residency of surface data for applications
      • No texture thrashing for apps whose working set fits into graphics memory
  • No need for entire surface to be resident
      • Apps with large surfaces run fast in smaller local memory if working set fits
  • Page access info guides VidMm eviction and promotion
  • Reduced minimum physical memory requirements

WDDM v2.x Robustness
  • WDDM v2.x increases OS robustness
  • GPU uses virtual addressing instead of physical
      • Kernel mode driver (KMD) no longer patches DMA buffers with physical addresses
  • User Mode Driver (UMD) builds DMA buffer
      • KMD no longer validates command buffer
      • KMD no longer copies cmd buffer to DMA buffer
  • No DMA buffer splitting
      • UMD no longer identifies split points
      • OS no longer splits DMA buffers to fit resources

WDDM v2.1 Robustness
  • Guaranteed sub-triangle context switching
  • Driver processing on fault essentially eliminated
  • No application can hog GPU
      • Better application responsiveness
      • Applications with arbitrarily complex GPU processing do not hinder other applications
      • E.g., complex GPGPU number crunching alongside glitch free video

Security
  • Per-process virtual memory
      • Protection moved to GPU
      • Patching eliminated from driver
  • Privileged Operations
  • Privileged memory
  • More secure platform for future premium content protection

Privileged Operations
  • DMA buffers created in user mode cannot compromise the system
      • Can’t access memory belonging to other processes
      • Can’t interfere with correct and robust operation
  • Certain GPU operations are privileged and only available to KMD-built DMA buffers; examples include
      • Display settings
      • GPU configuration
      • Context switching controls
  • UMD-created DMA buffers cannot perform privileged operations

Privileged Memory
  • Provides secure location for page tables, ring buffers, and other allocations that should be protected
      • Malicious apps cannot compromise system security
  • GPU maintains per-page privilege setting (in page table)
      • Fault occurs on GPU access to privileged memory from limited DMA buffers constructed by UMD
      • GPU access allowed for privileged DMA buffers constructed by KMD
  • [Diagram labels: Bad DMA Buffer, Page Table, Process, Ring, V2.1 GPU, Ring Buffer]
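
The per-page privilege check described above can be sketched as a lookup that faults when a limited (UMD-built) DMA buffer touches a privileged page. The class and function names, the dictionary page table, and the exception type are assumptions for illustration, not hardware or driver interfaces.

```python
from dataclasses import dataclass


class GpuFault(Exception):
    """Raised when a limited DMA buffer touches privileged memory (hypothetical)."""


@dataclass(frozen=True)
class PageTableEntry:
    frame: int
    privileged: bool   # per-page privilege setting kept in the page table


def gpu_access(page_table, page, buffer_is_privileged):
    """Return the physical frame for `page`, enforcing the privilege bit."""
    entry = page_table[page]
    if entry.privileged and not buffer_is_privileged:
        raise GpuFault(f"limited DMA buffer touched privileged page {page}")
    return entry.frame


page_table = {
    0: PageTableEntry(frame=5, privileged=False),  # ordinary surface data
    1: PageTableEntry(frame=9, privileged=True),   # e.g. a page table or ring buffer
}

kmd_frame = gpu_access(page_table, 1, buffer_is_privileged=True)   # allowed
umd_frame = gpu_access(page_table, 0, buffer_is_privileged=False)  # allowed
try:
    gpu_access(page_table, 1, buffer_is_privileged=False)
    faulted = False
except GpuFault:
    faulted = True
print(kmd_frame, umd_frame, faulted)  # 9 5 True
```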

WDDM Future And Conclusion
  • Steve Pronovost, Microsoft

Future: WDDM 3.x
  • All the features of WDDM v2.1
  • Better support for content streaming
  • Virtual machine support

Call To Action
  • Invest in WDDM v2.x GPU
  • Find new interesting ways to use the GPU

Questions Or Feedback?
  • Send e-mail to DirectX@microsoft.com

© 2006 Microsoft Corporation. All rights reserved. Microsoft, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.