Optimizing Compute Shaders for L2 Locality using Thread-Group ID Swizzling

Originally published at: https://developer.nvidia.com/blog/optimizing-compute-shaders-for-l2-locality-using-thread-group-id-swizzling/

As part of my GDC 2019 session, "Optimizing DX12/DXR GPU Workloads using Nsight Graphics: GPU Trace and the Peak-Performance-Percentage (P3) Method", I presented an optimization technique named thread-group tiling, a type of thread-group ID swizzling. This is an important technique for optimizing 2D, full-screen compute shader passes that perform widely spread texture fetches and…
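
For anyone who hasn't met the idea before, here is a rough sketch of what a thread-group ID swizzle can look like. It is not the code from the blog post. It is a small Python model of one possible remapping, assuming thread groups launch in row-major order across the dispatch grid and that we want consecutive groups to walk through vertical tiles that are `tile_w` groups wide. The function name and all parameter names are hypothetical and chosen only for this illustration.

```python
def swizzle_group_id(gx, gy, grid_w, grid_h, tile_w):
    """Remap a launch-order group ID (gx, gy) to a tiled (x, y).

    Assumption: groups are launched row-major over a grid_w x grid_h grid.
    Consecutive launch indices are redirected so that they fill one
    vertical tile (tile_w columns, full grid height) before moving on.
    """
    linear = gy * grid_w + gx                 # row-major launch index
    groups_per_full_tile = tile_w * grid_h    # groups in one full-width tile
    full_tiles = grid_w // tile_w             # tiles that are exactly tile_w wide
    tile_index = linear // groups_per_full_tile

    if tile_index < full_tiles:
        width = tile_w
        local = linear % groups_per_full_tile
    else:
        # Narrower last tile when grid_w is not a multiple of tile_w.
        tile_index = full_tiles
        width = grid_w - full_tiles * tile_w
        local = linear - full_tiles * groups_per_full_tile

    x = tile_index * tile_w + local % width   # column inside the grid
    y = local // width                        # row inside the grid
    return x, y


if __name__ == "__main__":
    # Print the visiting order for a small 5 x 2 grid with 2-wide tiles.
    for gy in range(2):
        for gx in range(5):
            print((gx, gy), "->", swizzle_group_id(gx, gy, 5, 2, 2))
```

In a real shader the same arithmetic would be applied to the group ID before computing pixel coordinates, so that groups running at the same time touch neighbouring screen regions. The mapping is a permutation of the grid, so every group is still processed exactly once.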