A Whirlwind Tour of Vulkan Graham Sellers AMD

A Whirlwind Tour of Vulkan Graham Sellers, AMD @grahamsellers

Architecture
[Diagram boxes: APPLICATION, LOADER, DRIVER, GPU, GPU; labels: Your code, Khronos, IHV / Driver, Hardware]

Overview
• Overview of the Vulkan System
• Outline design goals
• Show example API usage

Goals
• Major Vulkan design goals
  – High performance from a single thread
  – Scalable to many threads
  – Scalable across wide range of architectures
  – Solid foundation for future development
• Solve ecosystem issues

Application Startup
• Vulkan is represented by an “instance”
• Application can have multiple Vulkan instances
• Instance is owned by the loader
  – Aggregates drivers from multiple vendors
  – Responsible for discovery of GPUs
  – Makes multiple drivers look like one big driver supporting many GPUs

Application Startup
• Application specifies to loader:
  – Information about itself
  – Callback interface for memory allocation
    VkApplicationInfo appInfo = { ... };
    VkAllocCallbacks allocCb = { ... };
    VkInstance instance;
    vkCreateInstance(&appInfo, &allocCb, &instance);
• Get back a Vulkan instance
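
The "callback interface for memory allocation" idea can be sketched without any Vulkan headers. The following is a toy model only: ToyAllocCallbacks, ToyInstance and every function name here are hypothetical stand-ins, not the Vulkan types from the slide. It shows how a library can route its own host allocations through application-supplied functions, so the application can count or pool them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical callback table (a stand-in for the idea, not the Vulkan struct). */
typedef struct ToyAllocCallbacks {
    void *user_data;
    void *(*alloc_fn)(void *user_data, size_t size);
    void  (*free_fn)(void *user_data, void *ptr);
} ToyAllocCallbacks;

/* Application-side bookkeeping, reached through user_data. */
typedef struct AllocStats {
    size_t live_allocations;
    size_t total_bytes;
} AllocStats;

static void *counting_alloc(void *user_data, size_t size)
{
    AllocStats *stats = user_data;
    void *ptr = malloc(size);
    if (ptr != NULL) {
        stats->live_allocations++;
        stats->total_bytes += size;
    }
    return ptr;
}

static void counting_free(void *user_data, void *ptr)
{
    AllocStats *stats = user_data;
    if (ptr != NULL) {
        assert(stats->live_allocations > 0);
        stats->live_allocations--;
        free(ptr);
    }
}

/* Toy "library" object that allocates its own storage through the callbacks. */
typedef struct ToyInstance {
    ToyAllocCallbacks cb;
} ToyInstance;

static ToyInstance *toy_create_instance(const ToyAllocCallbacks *cb)
{
    ToyInstance *inst = cb->alloc_fn(cb->user_data, sizeof *inst);
    if (inst != NULL) {
        inst->cb = *cb; /* remember the callbacks for later frees */
    }
    return inst;
}

static void toy_destroy_instance(ToyInstance *inst)
{
    ToyAllocCallbacks cb = inst->cb; /* copy before the storage is released */
    cb.free_fn(cb.user_data, inst);
}
```

Creating and destroying a ToyInstance with these callbacks leaves live_allocations at zero, which is the kind of accounting such an interface makes possible.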

Physical Devices
• Devices are explicitly enumerated in Vulkan
    uint32_t devCount;
    VkPhysicalDevice devices[10];
    vkEnumeratePhysicalDevices(instance, ARRAYSIZE(devices), &devCount, devices);
• This produces a list of devices
  – Integrated + discrete
  – Multiple discrete GPUs in one system
  – Application manages multiple devices
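
Because the application sees every device, it also has to choose among them. A minimal sketch of one possible policy (prefer a discrete GPU, otherwise take the first device) follows. ToyPhysicalDevice and toy_pick_device are hypothetical names for illustration and do not come from the Vulkan API.

```c
#include <assert.h>
#include <stddef.h>

typedef enum ToyDeviceType {
    TOY_DEVICE_INTEGRATED,
    TOY_DEVICE_DISCRETE
} ToyDeviceType;

/* Hypothetical summary of one enumerated device. */
typedef struct ToyPhysicalDevice {
    const char   *name;
    ToyDeviceType type;
} ToyPhysicalDevice;

/* Returns the index of the first discrete device, else index 0,
 * or -1 if the list is empty. */
static int toy_pick_device(const ToyPhysicalDevice *devices, size_t count)
{
    assert(devices != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        if (devices[i].type == TOY_DEVICE_DISCRETE) {
            return (int)i;
        }
    }
    return count > 0 ? 0 : -1;
}
```

A real application might instead rank devices by queried features or memory sizes; the point is only that the choice is made in application code.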

Device Information
• Applications can query information about devices
    VkPhysicalDeviceFeatures features = {};
    vkGetPhysicalDeviceFeatures(physicalDevice, &features);
• Returns lots of information about the device
  – Capabilities, optional features, memory sizes, performance characteristics, etc.

Logical Devices
• Logical device is a software representation of a GPU
  – This is what your application communicates with
    VkDeviceCreateInfo info = { ... };
    VkDevice device;
    vkCreateDevice(physicalDevice, &info, &device);
• Parameters include information about application
  – What features it will use
  – Which queues, extensions, etc.

Queues
• Work is performed on queues
  – Queues run asynchronously to each other
  – Queues have different capabilities
    • Graphics, compute, DMA operations
    • Property of physical device

Queues
• Get queue handle from the device
    VkQueue queue;
    vkGetDeviceQueue(device, 0, 0, &queue);
• Queues are represented as members of families
  – Each family has specific capabilities
  – There are one or more queues in each family
• Family and index are the two parameters above
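
Picking the family index can be sketched as a capability-mask search. This is a toy model: the TOY_QUEUE_* flags, ToyQueueFamily and toy_find_queue_family are assumptions made for illustration, not Vulkan definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability bits for a queue family. */
enum {
    TOY_QUEUE_GRAPHICS = 1 << 0,
    TOY_QUEUE_COMPUTE  = 1 << 1,
    TOY_QUEUE_DMA      = 1 << 2
};

typedef struct ToyQueueFamily {
    uint32_t capabilities; /* bitwise OR of TOY_QUEUE_* flags */
    uint32_t queue_count;  /* number of queues in this family */
} ToyQueueFamily;

/* Returns the index of the first family that offers every required
 * capability and has at least one queue, or -1 if none does. */
static int toy_find_queue_family(const ToyQueueFamily *families,
                                 size_t family_count, uint32_t required)
{
    assert(families != NULL || family_count == 0);
    for (size_t i = 0; i < family_count; ++i) {
        if ((families[i].capabilities & required) == required &&
            families[i].queue_count > 0) {
            return (int)i;
        }
    }
    return -1;
}
```

The returned index plays the role of the family argument; the queue index within the family would then be any value below queue_count.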

Command Buffers
• Commands are sent to a queue in command buffers
    VkCmdBufferCreateInfo info;
    VkCmdBuffer cmdBuffer;
    vkCreateCommandBuffer(device, &info, &cmdBuffer);
• Creation parameters include:
  – Which queue family it will be submitted to
  – How aggressively drivers should optimize?
  – etc.

Command Buffers
• Commands are inserted into command buffers
    VkCmdBufferBeginInfo info = { ... };
    vkBeginCommandBuffer(cmdBuf, &info);
    vkCmdDoThisThing(cmdBuf, ...);
    vkCmdDoSomeOtherThing(cmdBuf, ...);
    vkEndCommandBuffer(cmdBuf);
• Driver heavy lifting happens here
  – State validation, optimization, etc.
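
The begin / record / end pattern, and the fact that a finished buffer can be submitted repeatedly without re-recording, can be illustrated with a toy recorder. Everything below (ToyCmdBuffer, ToyOp, the toy_* functions) is a hypothetical model and not the Vulkan API; "submission" here simply replays the recorded commands on the CPU.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_COMMANDS 16

typedef enum ToyOp { TOY_OP_BIND_PIPELINE, TOY_OP_DRAW } ToyOp;

typedef struct ToyCommand {
    ToyOp op;
    int   arg; /* pipeline id for BIND_PIPELINE, vertex count for DRAW */
} ToyCommand;

typedef struct ToyCmdBuffer {
    ToyCommand commands[TOY_MAX_COMMANDS];
    size_t     count;
    int        recording;
} ToyCmdBuffer;

static void toy_begin(ToyCmdBuffer *cb)
{
    cb->count = 0;
    cb->recording = 1;
}

/* Appends one command; returns 1 on success, 0 if the buffer is not
 * in the recording state or is full. */
static int toy_record(ToyCmdBuffer *cb, ToyOp op, int arg)
{
    if (!cb->recording || cb->count == TOY_MAX_COMMANDS) {
        return 0;
    }
    cb->commands[cb->count].op = op;
    cb->commands[cb->count].arg = arg;
    cb->count++;
    return 1;
}

static void toy_end(ToyCmdBuffer *cb)
{
    cb->recording = 0;
}

/* Replays the recorded commands and returns the total vertex count drawn.
 * The buffer is read-only here, so it can be submitted any number of times. */
static int toy_submit(const ToyCmdBuffer *cb)
{
    int vertices = 0;
    assert(!cb->recording);
    for (size_t i = 0; i < cb->count; ++i) {
        if (cb->commands[i].op == TOY_OP_DRAW) {
            vertices += cb->commands[i].arg;
        }
    }
    return vertices;
}
```

In this model all the per-command work happens in toy_record, and toy_submit is a cheap walk over prebuilt data, which mirrors the cost split the slide describes.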

Pipelines
• Pipelines contain most state
  – Compiled up front, used in command buffers
    VkGraphicsPipelineCreateInfo info = { ... };
    VkPipeline pipeline;
    vkCreateGraphicsPipelines(device, cache, 1, &info, &pipeline);
  – Contains compiled shaders, blend, multisample, etc.
  – Pipelines can be serialized into a cache
    • Improves application load time

Shaders
• Shaders are compiled up front
    VkShaderCreateInfo info = { ... };
    VkShader shader;
    vkCreateShader(device, &info, &shader);
• Primary (only) shading language for Vulkan is SPIR-V
  – Vendor neutral binary intermediate form
  – Same SPIR-V as used in OpenCL 2.1
  – Reference GLSL -> SPIR-V compiler available

Mutable State
• A lot of pipeline state is immutable
• Some state is dynamic
  – Represented by smaller chunks of state
    VkDynamicViewportStateCreateInfo vpInfo = { ... };
    VkDynamicViewportState vpState;
    vkCreateDynamicViewportState(device, &vpInfo, &vpState);
    VkDynamicDepthStencilCreateInfo dsInfo = { ... };
    VkDynamicDepthStencilState dsState;
    vkCreateDynamicDepthStencilState(device, &dsInfo, &dsState);

State Binding
• State is bound to command buffers
    vkCmdBindPipeline(cmdBuffer, VK_PIPELINE_BIND_POINT_GRAPHICS, pipeline);
    vkCmdBindDynamicViewportState(cmdBuffer, vpState);
    vkCmdBindDynamicDepthStencilState(cmdBuffer, dsState);
• State is inherited from draw to draw
  – It is not inherited across command buffer boundaries
  – Incremental update by dynamic state binding

Derivative State
• Pipelines can be derived from other pipelines
  – Create a master pipeline template
  – Modify creation parameters, create derivative
• Provides performance opportunity
  – During creation, drivers can re-use state
  – At runtime, fast to switch between related states

Vulkan Resources
• Resources are data that can be accessed by the device
  – Examples are buffers and images
• Resources represented by API objects
    VkImageCreateInfo imageInfo = { ... };
    VkImage image;
    vkCreateImage(device, &imageInfo, &image);
    VkBufferCreateInfo bufferInfo = { ... };
    VkBuffer buffer;
    vkCreateBuffer(device, &bufferInfo, &buffer);
• Memory for resources is managed by the application

Device Memory
• Applications query objects for their memory needs:
    VkMemoryRequirements reqs;
    vkGetImageMemoryRequirements(device, image, &reqs);
• Application allocates memory for objects:
    VkMemoryAllocInfo memInfo = { ... };
    VkDeviceMemory mem;
    vkAllocMemory(device, &memInfo, &mem);
• Application binds memory to the resource:
    vkBindImageMemory(device, image, mem, 0);

Managing Memory
• Application managed memory:
  – Application does pool management
    • Multiple resources in a single allocation
    • Avoid overhead of allocation per object
    • Recycle memory between objects
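
One simple way to place multiple resources in a single allocation is a linear (bump) sub-allocator that hands out aligned offsets. The sketch below is an illustration under assumptions: ToyPool and the toy_* functions are hypothetical, alignments are assumed to be powers of two, and the pool tracks offsets only, with no real device memory behind it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ALLOC_FAILED UINT64_MAX

/* Bookkeeping for one large allocation that is carved into pieces. */
typedef struct ToyPool {
    uint64_t size;   /* total bytes in the underlying allocation */
    uint64_t offset; /* first byte not yet handed out */
} ToyPool;

/* Rounds value up to the next multiple of a power-of-two alignment. */
static uint64_t toy_align_up(uint64_t value, uint64_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

/* Returns the offset of a suitably aligned block of `size` bytes,
 * or TOY_ALLOC_FAILED if the pool cannot fit it. */
static uint64_t toy_pool_alloc(ToyPool *pool, uint64_t size, uint64_t alignment)
{
    uint64_t start = toy_align_up(pool->offset, alignment);
    if (start > pool->size || size > pool->size - start) {
        return TOY_ALLOC_FAILED;
    }
    pool->offset = start + size;
    return start;
}

/* Recycles the whole pool at once, e.g. when its resources are retired. */
static void toy_pool_reset(ToyPool *pool)
{
    pool->offset = 0;
}
```

Each returned offset is the kind of value an application would pass when binding a resource to a shared memory object, using the size and alignment it queried for that resource.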

Sharing Data
• Unlike OpenGL, memory is mapped, not buffers
  – Bind memory to buffer
  – Map memory for CPU access
    vkMapMemory(device, mem, offset, size, flags, &pData);
• Flags control how memory is allocated and mapped
  – Control over caching, coherency, etc. provided
  – Zero-copy and UMA fully supported

Descriptors
• Vulkan resources are represented by descriptors
  – Descriptors are arranged in sets
  – Sets are allocated from pools
  – Sets have layouts, known at pipeline creation time
    vkCreateDescriptorPool(...);
    vkCreateDescriptorSetLayout(...);
    vkAllocDescriptorSets(...);

Pipeline Layouts
• Layouts represent arrangement of sets used by pipelines
  – Layout is shared between sets and pipelines
  – Layout represented by VkPipelineLayout object
    • Used at pipeline create time
  – Switch pipelines using sets of the same layout
    • Pipelines are considered compatible
    vkCreatePipelineLayout(...);

Render Passes
• Frames logically organized into render passes
    VkRenderPassCreateInfo info = { ... };
    VkRenderPass renderPass;
    vkCreateRenderPass(device, &info, &renderPass);
• Render pass contains a lot of information:
  – Layout and types of framebuffer attachments
  – What to do when the render pass begins and ends
  – Part of the framebuffer that the pass may affect

Merging Passes
• Vulkan has the concept of a “sub-pass”
  – Allows multiple render passes to be merged
  – Intermediate attachments for transient data
    • Data passed from pass to pass
    • Tile-based architectures can keep data on chip
    • Might reuse memory for temporary surfaces

Drawing
• Draws are always inside a render pass
    VkRenderPassBegin beginInfo = { renderPass, ... };
    vkCmdBeginRenderPass(cmdBuffer, &beginInfo);
    vkCmdBindPipeline(cmdBuffer, VK_PIPELINE_BIND_POINT_GRAPHICS, pipeline);
    vkCmdBindDescriptorSets(cmdBuffer, ...);
    vkCmdDraw(cmdBuffer, 0, 100, 1, 0);
    vkCmdEndRenderPass(cmdBuffer, renderPass);
• All draw types supported
  – instancing, indirect, etc.

Compute
• Compute pipelines are special
  – Possible to have (multiple) compute-only queues
  – Queues run asynchronously
    • Yes, asynchronous compute
    VkComputePipelineCreateInfo info = { ... };
    VkPipeline pipeline;
    vkCreateComputePipeline(device, cache, 1, &info, &pipeline);
• Compute launched through dispatches

Synchronization
• Work is synchronized through event primitives
    VkEventCreateInfo info = { ... };
    VkEvent event;
    vkCreateEvent(device, &info, &event);
• Events may be set, reset, polled and waited on
    vkSetEvent(...);
    vkResetEvent(...);
    vkGetEventStatus(...);
    vkCmdSetEvent(...);
    vkCmdResetEvent(...);
    vkCmdWaitEvents(...);

Resource State
• Resources can be in any of many states
  – Renderable, CPU read, shader read or write, etc.
  – Drivers used to track this information
    • Not any more! Now it’s your job…
    VkImageMemoryBarrier imageBarrier = { ... };
    vkCmdPipelineBarrier(cmdBuffer, ..., 1, &imageBarrier);
  – Pass old state + stages, new state + stages
  – Driver will take care of the rest
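
Since tracking is now the application's job, a common approach is to remember each resource's current state and emit a barrier only when the required state differs. The following is a toy sketch of that bookkeeping: ToyState, ToyImage, ToyBarrier and toy_transition are hypothetical names, and the state list is a simplified assumption rather than Vulkan's actual set of states.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical set of resource states. */
typedef enum ToyState {
    TOY_STATE_UNDEFINED,
    TOY_STATE_RENDER_TARGET,
    TOY_STATE_SHADER_READ,
    TOY_STATE_CPU_READ
} ToyState;

/* Application-side record of the state an image is currently in. */
typedef struct ToyImage {
    ToyState state;
} ToyImage;

/* The "old state, new state" pair a barrier would carry. */
typedef struct ToyBarrier {
    ToyState old_state;
    ToyState new_state;
} ToyBarrier;

/* If the image is not already in `needed`, fills *barrier with the
 * old -> new transition, updates the tracked state and returns 1.
 * Otherwise returns 0 and leaves *barrier untouched. */
static int toy_transition(ToyImage *image, ToyState needed, ToyBarrier *barrier)
{
    assert(image != NULL && barrier != NULL);
    if (image->state == needed) {
        return 0;
    }
    barrier->old_state = image->state;
    barrier->new_state = needed;
    image->state = needed;
    return 1;
}
```

A renderer would call something like toy_transition before each use of an image and record a real barrier only when it returns 1.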

Work Submission
• Work is submitted to queues for execution
    VkCmdBuffer commandBuffers[] = { cmdBuffer1, cmdBuffer2, ... };
    vkQueueSubmit(queue, 1, commandBuffers, fence);
• A fence (VkFence) is associated with the submission
  – This is signaled when work completes
  – CPU can wait on this fence
• Queues marshal resource ownership with semaphores
    vkQueueSignalSemaphore(queue, semaphore);
    vkQueueWaitSemaphore(queue, semaphore);

Threading
• Threading is a big consideration
  – API doesn’t lock – that’s the application’s responsibility
    • Concurrent read access to same object
    • Concurrent write access to different objects
• Performance from one thread will still be good

Presentation
• Displaying outputs is optional!
  – We expect some compute-only Vulkan applications
  – No real need to create a window – console mode
  – Each platform is different
• Presentation is an extension
• We define two flavors of the “Window System Interface”
  – One is for compositors, one is for direct-display

Displays
• Vulkan also abstracts some display management
  – Also delegated to WSI extensions
  – Manage display mode
  – Turn vsync on and off
  – Enumerate and take control of displays
• This all depends on platform support, of course!

Teardown
• Application responsible for object destruction
  – Must be correctly ordered
  – No reference counting
  – No implicit object lifetime
• Do not delete objects that are still in use!
  – This includes use by GPU
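
One way an application can keep destruction correctly ordered is to push a destructor for each object as it is created and run them in reverse at shutdown. The sketch below is a toy illustration: ToyDeletionStack, ToyObject, ToyLog and the toy_* functions are hypothetical, and it assumes the caller has already waited for the GPU to finish with every object before calling toy_destroy_all.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_OBJECTS 32

typedef void (*ToyDestroyFn)(void *object);

/* Destructors recorded in creation order. */
typedef struct ToyDeletionStack {
    ToyDestroyFn fns[TOY_MAX_OBJECTS];
    void        *objects[TOY_MAX_OBJECTS];
    size_t       count;
} ToyDeletionStack;

/* Registers a destructor; returns 1 on success, 0 if the stack is full. */
static int toy_defer_destroy(ToyDeletionStack *stack, ToyDestroyFn fn, void *object)
{
    if (stack->count == TOY_MAX_OBJECTS) {
        return 0;
    }
    stack->fns[stack->count] = fn;
    stack->objects[stack->count] = object;
    stack->count++;
    return 1;
}

/* Runs the destructors in reverse creation order, so dependents are
 * destroyed before the objects they were created from. */
static void toy_destroy_all(ToyDeletionStack *stack)
{
    while (stack->count > 0) {
        stack->count--;
        stack->fns[stack->count](stack->objects[stack->count]);
    }
}

/* Example destructor for demonstration: logs the id of each object destroyed. */
typedef struct ToyLog {
    int    ids[TOY_MAX_OBJECTS];
    size_t count;
} ToyLog;

typedef struct ToyObject {
    int     id;
    ToyLog *log;
} ToyObject;

static void toy_log_destroy(void *object)
{
    ToyObject *obj = object;
    obj->log->ids[obj->log->count++] = obj->id;
}
```

If a device, an image and a view are registered in that order, toy_destroy_all destroys the view first and the device last.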

Scalability
• Scalability is an important goal
  – Scales from low power mobile to high end workstation
  – Many features optional
  – Queryable upper limits for most things
• Still considering how to “bundle” features
  – Want to avoid “sea of caps” problem
  – May defer to platform owners

Extensibility
• Vulkan has a first class extension mechanism
  – Extensions are opt-in
    • No more using extensions by accident
    • Don’t pay driver tax for unused features
    • Much easier to validate
  – Still want to expose bleeding edge
    • Vulkan is a platform for innovation
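
Opt-in implies the application names the extensions it wants, and nothing else is enabled. A minimal sketch of the check an application might run before asking for them is shown here. The function name and the extension strings used with it are hypothetical, and how the supported list is obtained is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 only if every requested extension name appears in the
 * supported list; an empty request is trivially satisfied. */
static int toy_extensions_supported(const char *const *supported,
                                    size_t supported_count,
                                    const char *const *requested,
                                    size_t requested_count)
{
    assert(supported != NULL || supported_count == 0);
    assert(requested != NULL || requested_count == 0);
    for (size_t r = 0; r < requested_count; ++r) {
        int found = 0;
        for (size_t s = 0; s < supported_count; ++s) {
            if (strcmp(requested[r], supported[s]) == 0) {
                found = 1;
                break;
            }
        }
        if (!found) {
            return 0; /* at least one requested extension is unavailable */
        }
    }
    return 1;
}
```

An application would typically fall back to a different code path, or refuse to start, when this returns 0.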

Tools and Debugging
• Tools and development are key to success
  – Strong tools mean better applications
  – Vulkan is not simple – tools are a must
• Khronos is looking to build a strong ecosystem
  – Tools, loader and other components open source
  – Well documented hooks for extending API

Tools and Debugging
[Diagram boxes: APPLICATION, LOADER, LAYERS, DRIVER, GPU, GPU, TOOLS]

Layers
• Loader supports layering APIs
  – Formal hooks for debuggers and tools
    • No more interceptors, shims, or stub libraries
  – Validation in intermediate layers
    • Opt-in, very powerful
  – Several layers already developed
    • API trace, parameter validation, API timing, etc.
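
The layering idea can be modeled as a chain of dispatch entries, where each layer does its own work and then forwards to the next one down. This is a toy sketch only: ToyDispatch and both draw functions are hypothetical and do not reflect the actual loader interface.

```c
#include <assert.h>
#include <stddef.h>

/* One entry point of a toy API, plus a link to the next layer down. */
typedef struct ToyDispatch ToyDispatch;
struct ToyDispatch {
    int (*draw)(ToyDispatch *self, int vertex_count); /* 0 on success */
    ToyDispatch *next;  /* next layer, or NULL for the driver itself */
    int calls_seen;     /* how many calls reached this layer */
};

/* Bottom of the chain: the "driver" accepts whatever it is given. */
static int toy_driver_draw(ToyDispatch *self, int vertex_count)
{
    (void)vertex_count;
    self->calls_seen++;
    return 0;
}

/* Validation layer: rejects bad parameters before they reach the
 * driver, otherwise forwards the call unchanged. */
static int toy_validation_draw(ToyDispatch *self, int vertex_count)
{
    self->calls_seen++;
    if (vertex_count <= 0) {
        return -1;
    }
    assert(self->next != NULL);
    return self->next->draw(self->next, vertex_count);
}
```

With the validation entry on top, an invalid draw is stopped before the driver sees it. Pointing the application straight at the driver entry removes that check and its cost, which is the opt-in trade-off.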

Layers
• Multiple types of layer
  – Instance level layers
    • Enabled at instance creation time
    • Globally available to every device in instance
  – Device level layers
    • Specific to device
    • Enable device-specific extensions, for example

Summary
• Not really “low-level”, just a better abstraction
• Very low overhead:
  – Low overhead means more application CPU cycles
  – Explicit threading support means you can go wide without worrying about graphics APIs
  – Building command buffers once and submitting many times means low amortized cost

Summary
• Cross-platform, cross-vendor
  – Not tied to single OS (or OS version)
  – Not tied to single GPU family or vendor
  – Not tied to single architecture
    • Desktop + mobile, forward and deferred, tilers all first class citizens

Summary
• Open, extensible
  – Khronos is an open standards body
    • Collaboration from across the industry, IHVs + ISVs, games, CAD, “Pro” Graphics, AAA + casual
  – Full support for extensions, layering, debuggers, tools
  – SPIR-V fully documented – write your own compiler!

Thanks! @grahamsellers www.khronos.org/vulkan