Optimization with Radeon GPU Profiler: A Vulkan Case Study
- Slides: 50
Optimization with Radeon GPU Profiler: A Vulkan Case Study
Gregory Mitrano
gregory.mitrano@amd.com
About Me
● Previous:
  ○ Subatomic Studios - Game Developer / Graphics Programmer
  ○ Demoscene - Youth Uprising
● Current:
  ○ AMD - DirectX Driver Engineer, Developer Driver, Radeon GPU Profiler
  ○ Demoscene - Catalyst
Logo by Pepi Simeonov (https://www.pepisimeonov.com/)
Sands of Time - Catalyst
Vulkan
● Why?
  ○ Consistency
  ○ Performance
  ○ Control
● Challenging
● Learning curve
● Performance is not free
Vulkan and the Vulkan logo are registered trademarks of the Khronos Group Inc.
What is Radeon GPU Profiler?
● Detailed workload information
● DX12 and Vulkan support
● Hardware level profiling features
How does it work?
● Connect Radeon Developer Panel to Radeon Developer Service
● Set up the target application
How does it work?
● Launch the target application (RGP support is built directly into the production driver)
● Capture a trace
● Double click to open in RGP
Using RGP - Where do I start?
Using RGP - Barriers
Using RGP - Most Expensive Events
Using RGP - Context Rolls
Using RGP - Wavefront View
[Figure labels: Occupancy Graph, GPU Events]
What’s a wavefront? What’s wavefront occupancy?
What’s a wavefront?
● 64 threads
● Smallest unit of GPU work
● Also called a “wave”
[Figure: AMD Graphics Core Next (GCN) wavefront, threads 0 to 63]
GCN Architecture
● N SEs per chip
● N CUs per SE
● 4 SIMDs per CU
● N waves per SIMD
[Figure: hierarchy diagram of SE 0 with CU 0 to CU 8, a CU with its SIMDs, and a SIMD with its wave slots]
What’s wavefront occupancy?
A measure of how close a SIMD is to its maximum wavefront capacity.
8 Wave Slots Per SIMD on RX 480:
● 8 Waves Per SIMD: 100%
● 4 Waves Per SIMD: 50%
● 2 Waves Per SIMD: 25%
● 1 Wave Per SIMD: 12.5%
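The occupancy percentages on this slide follow from dividing the number of resident waves by the number of wave slots. A minimal sketch of that arithmetic, assuming the 8 wave slots per SIMD quoted for the RX 480 (the function name `wave_occupancy` is hypothetical and not part of RGP):

```python
def wave_occupancy(active_waves: int, wave_slots: int = 8) -> float:
    """Return SIMD occupancy as a percentage of its wave slots.

    wave_slots defaults to 8, the per-SIMD figure the slide gives for RX 480.
    """
    if wave_slots <= 0:
        raise ValueError("wave_slots must be positive")
    if not 0 <= active_waves <= wave_slots:
        raise ValueError("active_waves must be between 0 and wave_slots")
    return 100.0 * active_waves / wave_slots


# Reproduce the four rows shown on the slide.
occupancy_table = {waves: wave_occupancy(waves) for waves in (8, 4, 2, 1)}
for waves, percent in occupancy_table.items():
    print(f"{waves} wave(s) per SIMD -> {percent}% occupancy")
```

Passing a different `wave_slots` value adapts the same calculation to hardware with another slot count.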
Latency Hiding - Definition
● ALUs -> Fast
● Memory -> Slow
● Memory latency prevents us from fully utilizing the ALUs.
Latency Hiding Example
[Figure legend: Executing Wave, Available Wave, Stalled Wave; SIMD 0 holding Wave 0 to Wave 7]
● SIMD executes Wave 0
● Wave 0 stalls, SIMD moves to Wave 1
● Wave 1 stalls, SIMD moves to Wave 2
● Wave 2 stalls, Wave 0 unblocks, SIMD moves to Wave 0
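The switching behaviour in the example above can be imitated with a toy simulation. This is a sketch under stated assumptions, not a model of real GCN scheduling: each wave alternates between a short burst of ALU work and a memory stall, the SIMD issues one instruction per cycle from the lowest-numbered wave that is not stalled, and the cycle counts (4 ALU cycles, 12 stall cycles) are invented for illustration.

```python
def alu_utilization(num_waves: int, alu_cycles: int = 4,
                    mem_latency: int = 12, total_cycles: int = 1600) -> float:
    """Fraction of cycles in which the SIMD had a ready wave to execute.

    alu_cycles and mem_latency are illustrative values, not hardware figures.
    """
    # ALU cycles each wave still needs before it issues its next memory request.
    remaining = [alu_cycles] * num_waves
    # First cycle at which each wave is no longer stalled on memory.
    ready_at = [0] * num_waves
    busy = 0
    for cycle in range(total_cycles):
        for wave in range(num_waves):
            if ready_at[wave] <= cycle:
                busy += 1
                remaining[wave] -= 1
                if remaining[wave] == 0:
                    # The wave requests memory and stalls; the SIMD will
                    # pick another ready wave on the next cycle.
                    remaining[wave] = alu_cycles
                    ready_at[wave] = cycle + 1 + mem_latency
                break
    return busy / total_cycles


for waves in (1, 2, 4, 8):
    print(f"{waves} wave(s): ALU busy {alu_utilization(waves):.0%} of the time")
```

With these made-up numbers a single wave leaves the ALU idle for most of the run, while additional resident waves fill the stall time, which is the point of the slide: higher occupancy gives the SIMD more chances to hide memory latency.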
Wavefront occupancy in RGP - Global
Using RGP - Wavefront View
[Figure labels: Occupancy Graph, GPU Events]
Pipeline Bubble - Zoomed In
Pipeline Bubble - Event Timeline
Pipeline Bubble - DCC Decompress
[Figure: event timeline showing Depth Of Field, Motion Blur, Post Processing Render Pass, DCC Decompress, and Bloom Downsample; durations labeled 161 us and 56 us~]
The Post Processing Render Pass
Subpasses: Initial, Depth of Field, Motion Blur, Final
Attachments (layouts as listed on the slide):
● Color: Shader Read, Color Write, Transfer Src
● Depth: D/S Read, Preserve, D/S Read
● Composite: Undefined, Color Write, Shader Read, Transfer Dst
● Velocity: Shader Read, Preserve, Shader Read
Which layout transition is causing the decompress?
The Post Processing Render Pass
Subpasses: Motion Blur, Final
Attachments (layouts as listed on the slide):
● Color: Color Write, Transfer Src
● Depth: Preserve, D/S Read
● Composite: Shader Read, Transfer Dst
● Velocity: Shader Read
The Post Processing Render Pass
Subpasses: Motion Blur, Final; resulting Operation
Attachments (layouts as listed on the slide):
● Color: Color Write, Transfer Src; Operation: Fast Clear Eliminate
● Depth: Preserve, D/S Read
● Composite: Shader Read, Transfer Dst; Operation: DCC Decompress
● Velocity: Shader Read
The Post Processing Render Pass
● Composite (Motion Blur, Final, Operation): Shader Read, Transfer Dst, DCC Decompress
● Composite (Motion Blur, Final): Shader Read
● vkCmdPipelineBarrier: Image Layout [Undefined -> Transfer Dst]
The Post Processing Render Pass - 90 us~ Gain
Using RGP - Pipeline State View
Using RGP - Pipeline State View
[Figure: SIMD wave slots: PS Wave, Empty]
Using RGP - Pipeline State View
[Figure: SIMD wave slots: VS Wave, VS Wave, Empty]
Using RGP - Wavefront View
Low Occupancy - Zoomed In
[Figure annotations: Low-ish Occupancy, Very Low Occupancy, No Async Compute Usage; passes shown: GBuffer Pass, SSAO Pass, Shadow Pass, Lighting Pass]
Original - 2046 us (1080p)
Graphics Queue Overlap - 1627 us (1080p)
Asynchronous Compute Queue - 1738 us (1080p)
Original - 6183 us (4k)
Graphics Queue Overlap - 5519 us (4k)
Asynchronous Compute Queue - 5203 us (4k)
SSAO Overlap Results - AMD Radeon RX 480
● 1080p: GFX Overlap (419 us Gain) - 5,896 us Total Original Frametime
● 4k: Async Compute (980 us Gain) - 25,830 us Total Original Frametime
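The gains quoted here can be cross-checked against the timings on the six preceding slides by subtracting each variant from the original measurement at the same resolution. A small sketch of that check (the dictionary layout and the helper names `gains_us` and `best_variant` are assumptions made for illustration; the numbers are copied from the slides):

```python
# Timings in microseconds, as reported on the preceding slides.
timings_us = {
    "1080p": {
        "Original": 2046,
        "Graphics Queue Overlap": 1627,
        "Asynchronous Compute Queue": 1738,
    },
    "4k": {
        "Original": 6183,
        "Graphics Queue Overlap": 5519,
        "Asynchronous Compute Queue": 5203,
    },
}


def gains_us(times: dict) -> dict:
    """Microseconds saved by each variant relative to the original timing."""
    baseline = times["Original"]
    return {name: baseline - t for name, t in times.items() if name != "Original"}


def best_variant(times: dict) -> tuple:
    """Return (variant name, gain in us) for the variant that saves the most time."""
    gains = gains_us(times)
    name = max(gains, key=gains.get)
    return name, gains[name]


for resolution, times in timings_us.items():
    name, gain = best_variant(times)
    print(f"{resolution}: {name} saves {gain} us")
```

The best variant differs by resolution: graphics queue overlap wins at 1080p and the asynchronous compute queue wins at 4k, matching the two results listed on the slide.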
RGP Built-in Help
Links
● GPUOpen RGP Product Page
  ○ https://gpuopen.com/gaming-product/radeon-gpu-profiler-rgp/
● AMD Open Source Vulkan Driver (AMDVLK)
  ○ https://github.com/GPUOpen-Drivers/AMDVLK
● Sands of Time - Catalyst
  ○ YouTube: https://www.youtube.com/watch?v=fS8MQhbnrvQ
  ○ Pouet: http://www.pouet.net/prod.php?which=72282
Questions?
Backup
Compute -> Graphics -> Compute