Optimizing Compute Shaders for L2 Locality using Thread-Group ID Swizzling

Originally published at: https://developer.nvidia.com/blog/optimizing-compute-shaders-for-l2-locality-using-thread-group-id-swizzling/

As part of my GDC 2019 session, "Optimizing DX12/DXR GPU Workloads using Nsight Graphics: GPU Trace and the Peak-Performance-Percentage (P3) Method," I presented an optimization technique named thread-group tiling, a type of thread-group ID swizzling. This is an important technique for optimizing 2D, full-screen, compute shader passes that are doing widely spread texture fetches and…
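
For readers who have not opened the full article, here is a rough sketch of what a thread-group ID swizzle of this kind can look like. It is not the article's snippet: the function name `swizzle_group_id` and its parameters are hypothetical, and it assumes thread groups are launched in roughly row-major order, so that remapping the flattened launch index into vertical tiles `tile_w` groups wide keeps groups that run close together in time close together on screen. It is written in Python purely to illustrate the index math; in a shader the flattened index would be computed from the group ID as `grid_w * group_id.y + group_id.x`.

```python
def swizzle_group_id(flat_id, grid_w, grid_h, tile_w):
    """Map a row-major launch index to a tiled (x, y) thread-group coordinate.

    Hypothetical helper for illustration only. The grid is split into
    vertical tiles that are tile_w groups wide and grid_h groups tall;
    groups are visited row by row inside a tile, one tile after another.
    """
    groups_per_full_tile = tile_w * grid_h
    num_full_tiles = grid_w // tile_w

    tile = flat_id // groups_per_full_tile
    local = flat_id % groups_per_full_tile

    # The last tile is narrower when grid_w is not a multiple of tile_w.
    if tile >= num_full_tiles:
        width = grid_w - num_full_tiles * tile_w
    else:
        width = tile_w

    x = tile * tile_w + local % width
    y = local // width
    return x, y


if __name__ == "__main__":
    # Print the remapped order for a small 5 x 3 grid with tiles 2 groups wide.
    grid_w, grid_h, tile_w = 5, 3, 2
    for flat_id in range(grid_w * grid_h):
        print(flat_id, swizzle_group_id(flat_id, grid_w, grid_h, tile_w))
```

The remap is a one-to-one mapping over the whole dispatch grid, so every thread group still processes exactly one screen region; only the order changes. The narrower last tile is the easiest part to get wrong, so it is worth checking against a grid whose width is not a multiple of the tile width.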

Hello, thank you for the great article! I used this approach in a pet project and it really improves performance. I also found an issue in the code snippet. I've already created a pull request, and it would be great if you could merge it.