Optimization with Radeon GPU Profiler: A Vulkan Case Study
Gregory Mitrano
gregory.mitrano@amd.com

About Me
Previous:
- Subatomic Studios - Game Developer / Graphics Programmer
- Demoscene - Youth Uprising
Current:
- AMD - DirectX Driver Engineer
  - Developer Driver
  - Radeon GPU Profiler
- Demoscene - Catalyst
Logo by Pepi Simeonov (https://www.pepisimeonov.com/)

Sands of Time - Catalyst

Vulkan
● Why?
  ○ Consistency
  ○ Performance
  ○ Control
● Challenging
● Learning curve
● Performance is not free
Vulkan and the Vulkan logo are registered trademarks of the Khronos Group Inc.

What is Radeon GPU Profiler?
● Detailed workload information
● DX12 and Vulkan support
● Hardware level profiling features

How does it work?
● Connect Radeon Developer Panel to Radeon Developer Service
● Set up the target application
● Launch the target application (RGP support is built directly into the production driver)
● Capture a trace
● Double click to open in RGP

Using RGP - Where do I start?

Using RGP - Barriers

Using RGP - Most Expensive Events

Using RGP - Context Rolls

Using RGP - Wavefront View
(Screenshot callouts: Occupancy Graph, GPU Events)

What’s a wavefront?
What’s wavefront occupancy?

What’s a wavefront?
● 64 threads
● Smallest unit of GPU work
● Also called a “wave”
(Diagram: AMD Graphics Core Next (GCN) Wavefront, threads numbered 0 to 63)
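
As a rough sketch of what "64 threads per wavefront" means for a workload, the Python snippet below counts how many waves a given number of threads occupies. The function name `wavefronts_needed` and the one-thread-per-pixel 1920 x 1080 example are illustrative assumptions, not something taken from the talk.

```python
WAVE_SIZE = 64  # threads per GCN wavefront, as stated on the slide


def wavefronts_needed(thread_count: int, wave_size: int = WAVE_SIZE) -> int:
    """Return how many wavefronts are needed to cover thread_count threads.

    A partially filled wavefront still occupies a whole wave, so round up.
    """
    if thread_count < 0:
        raise ValueError("thread_count must be non-negative")
    return (thread_count + wave_size - 1) // wave_size


if __name__ == "__main__":
    # Hypothetical example: one thread per pixel of a 1920 x 1080 target.
    print(wavefronts_needed(1920 * 1080))
```

Note that 65 threads already need two waves, so the second wave would run with most of its lanes idle.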

GCN Architecture
● N SEs per chip
● N CUs per SE
● 4 SIMDs per CU
● N waves per SIMD
(Diagram: SE 0 contains CU 0 to CU 8; CU 0 contains SIMD 0 to SIMD 3; SIMD 0 holds Wave 0 to Wave 7)
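
The hierarchy above multiplies out to a total number of wave slots on the chip. The sketch below shows that arithmetic. Only "4 SIMDs per CU" is fixed by the slide; the 9 CUs per SE and 8 waves per SIMD are read off the diagram, and the count of 4 shader engines is purely an assumption for the example. The function name `total_wave_slots` is hypothetical.

```python
WAVE_SIZE = 64    # threads per wavefront
SIMDS_PER_CU = 4  # stated on the slide


def total_wave_slots(shader_engines: int, cus_per_se: int,
                     waves_per_simd: int,
                     simds_per_cu: int = SIMDS_PER_CU) -> int:
    """Wave slots on the whole chip: SEs x CUs per SE x SIMDs per CU x waves per SIMD."""
    return shader_engines * cus_per_se * simds_per_cu * waves_per_simd


if __name__ == "__main__":
    # Assumed configuration: 4 SEs (assumption), 9 CUs per SE and
    # 8 waves per SIMD (as drawn in the diagram).
    slots = total_wave_slots(shader_engines=4, cus_per_se=9, waves_per_simd=8)
    print(slots, "wave slots,", slots * WAVE_SIZE, "threads in flight at most")
```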

What’s wavefront occupancy?
A measure of how close a SIMD is to its maximum wavefront capacity
8 Wave Slots Per SIMD on RX 480
● 8 Waves Per SIMD: 100%
● 4 Waves Per SIMD: 50%
● 2 Waves Per SIMD: 25%
● 1 Wave Per SIMD: 12.5%
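
The percentages above are simply active waves divided by available wave slots. A minimal sketch of that calculation, assuming the 8 slots per SIMD quoted for the RX 480 (the function name `occupancy_percent` is made up for illustration):

```python
WAVE_SLOTS_PER_SIMD = 8  # RX 480 figure from the slide


def occupancy_percent(active_waves: int,
                      wave_slots: int = WAVE_SLOTS_PER_SIMD) -> float:
    """Occupancy of one SIMD as a percentage of its wave slots."""
    if not 0 <= active_waves <= wave_slots:
        raise ValueError("active_waves must be between 0 and wave_slots")
    return 100.0 * active_waves / wave_slots


if __name__ == "__main__":
    for waves in (8, 4, 2, 1):
        print(f"{waves} waves per SIMD: {occupancy_percent(waves)}%")
```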

Latency Hiding - Definition
● ALUs -> Fast
● Memory -> Slow
● Memory latency prevents us from fully utilizing the ALUs.

Latency Hiding Example
(Legend: Executing Wave, Available Wave, Stalled Wave; each step shows SIMD 0 with Wave 0 to Wave 7)
● SIMD executes Wave 0
● Wave 0 stalls, SIMD moves to Wave 1
● Wave 1 stalls, SIMD moves to Wave 2
● Wave 2 stalls, Wave 0 unblocks, SIMD moves to Wave 0
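
The switching behaviour in this example can be imitated with a toy simulation. The sketch below is not how the hardware scheduler works; it assumes every wave does one cycle of ALU work and then stalls on memory for a fixed number of cycles, and that the SIMD always picks the lowest-numbered wave that is ready. The latency of 7 cycles is chosen only so that 8 waves cover it exactly; real memory latencies are far longer.

```python
def alu_utilization(num_waves: int, memory_latency: int, cycles: int) -> float:
    """Fraction of cycles in which the SIMD had a ready wave to execute.

    Toy model: a wave runs for one cycle, then waits memory_latency cycles.
    """
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    ready_at = [0] * num_waves  # first cycle at which each wave may run again
    busy_cycles = 0
    for cycle in range(cycles):
        # Pick the first wave that is not stalled on memory.
        for wave in range(num_waves):
            if ready_at[wave] <= cycle:
                busy_cycles += 1
                # One cycle of ALU work, then the wave stalls on memory.
                ready_at[wave] = cycle + 1 + memory_latency
                break
    return busy_cycles / cycles


if __name__ == "__main__":
    for waves in (1, 2, 4, 8):
        print(waves, "waves:", alu_utilization(waves, memory_latency=7, cycles=800))
```

With a single wave the ALUs sit idle for most of the run, while with enough waves there is always another wave to switch to, which is the point of keeping occupancy high.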

Wavefront occupancy in RGP - Global

Using RGP - Wavefront View
(Screenshot callouts: Occupancy Graph, GPU Events)

Pipeline Bubble - Zoomed In

Pipeline Bubble - Event Timeline

Pipeline Bubble - DCC Decompress
(Timeline callouts: Post Processing Render Pass with Depth Of Field and Motion Blur, DCC Decompress, Bloom Downsample; durations marked on the timeline: 161 us and ~56 us)

The Post Processing Render Pass
Subpasses: Initial, Depth of Field, Motion Blur, Final
Attachments:
● Color: Shader Read -> Color Write -> Transfer Src
● Depth: D/S Read -> Preserve -> D/S Read
● Composite: Undefined -> Color Write -> Shader Read -> Transfer Dst
● Velocity: Shader Read -> Preserve -> Shader Read
Which layout transition is causing the decompress?

The Post Processing Render Pass
Subpasses: Motion Blur, Final
Attachments:
● Color: Color Write -> Transfer Src (Operation: Fast Clear Eliminate)
● Depth: Preserve -> D/S Read
● Composite: Shader Read -> Transfer Dst (Operation: DCC Decompress)
● Velocity: Shader Read

The Post Processing Render Pass
Subpasses: Motion Blur, Final
● Composite: Shader Read -> Transfer Dst (Operation: DCC Decompress)
Subpasses: Motion Blur, Final
● Composite: Shader Read
vkCmdPipelineBarrier : Image Layout [Undefined -> Transfer Dst]

The Post Processing Render Pass
~90 us Gain

Using RGP - Pipeline State View
(Diagram callouts: a SIMD with slots labeled PS Wave and Empty; a SIMD with slots labeled VS Wave, VS Wave, and Empty)

Using RGP - Wavefront View

Low Occupancy - Zoomed In
(Timeline callouts, in order: Low-ish Occupancy, GBuffer Pass, SSAO Pass, Very Low Occupancy, Shadow Pass, No Async Compute Usage, Lighting Pass)

Original - 2046 us (1080p)

Graphics Queue Overlap - 1627 us (1080p)

Asynchronous Compute Queue - 1738 us (1080p)

Original - 6183 us (4k)

Graphics Queue Overlap - 5519 us (4k)

Asynchronous Compute Queue - 5203 us (4k)

SSAO Overlap Results - AMD Radeon RX 480
● 1080p : GFX Overlap (419 us Gain) - 5,896 us Total Original Frametime
● 4k : Async Compute (980 us Gain) - 25,830 us Total Original Frametime
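
The quoted gains follow from the timings on the preceding slides: each gain is the original duration minus the duration of the variant. The short check below recomputes them. The dictionary layout and the name `gains_us` are assumptions for illustration; the numbers are the ones shown in the talk.

```python
# Durations in microseconds, copied from the timing slides.
TIMINGS_US = {
    "1080p": {"original": 2046, "graphics_overlap": 1627, "async_compute": 1738},
    "4k": {"original": 6183, "graphics_overlap": 5519, "async_compute": 5203},
}


def gains_us(timings: dict) -> dict:
    """Time saved by each variant relative to the original, in microseconds."""
    original = timings["original"]
    return {name: original - duration
            for name, duration in timings.items()
            if name != "original"}


if __name__ == "__main__":
    for resolution, timings in TIMINGS_US.items():
        print(resolution, gains_us(timings))
```

At 1080p the graphics queue overlap saves the most, while at 4k the asynchronous compute queue comes out ahead, which is why the results slide reports a different variant per resolution.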

RGP Built-in Help

Links
● GPUOpen RGP Product Page
  ○ https://gpuopen.com/gaming-product/radeon-gpu-profiler-rgp/
● AMD Open Source Vulkan Driver (AMDVLK)
  ○ https://github.com/GPUOpen-Drivers/AMDVLK
● Sands of Time - Catalyst
  ○ YouTube : https://www.youtube.com/watch?v=fS8MQhbnrvQ
  ○ Pouet : http://www.pouet.net/prod.php?which=72282

Questions?

Backup

Compute -> Graphics -> Compute
