Originally published at: https://developer.nvidia.com/blog/reading-between-the-threads-shader-intrinsics/
When writing compute shaders, it’s often necessary to communicate values between threads. This is typically done through shared memory. Kepler GPUs introduced shuffle intrinsics, which enable threads of a warp to directly read each other’s registers, avoiding memory access and synchronization. Shared memory is relatively fast, but instructions that operate without using memory of any…
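To make the data movement behind a shuffle concrete, here is a minimal CPU-side sketch in C++ that models a single 32-lane warp as an array, one element per thread's register. It is not GPU or shader code. The names `Warp`, `shuffle_down`, and `warp_reduce_sum` are hypothetical and chosen for illustration; the model assumes a "shuffle down" in which a lane whose source falls outside the warp keeps its own value, as CUDA's `__shfl_down_sync` does.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical CPU model of one warp: regs[i] stands for a register held by lane i.
constexpr std::size_t kWarpSize = 32;
using Warp = std::array<int, kWarpSize>;

// Models a "shuffle down": lane i reads the register of lane i + delta.
// Lanes whose source would lie outside the warp keep their own value.
Warp shuffle_down(const Warp& regs, std::size_t delta) {
    assert(delta < kWarpSize);
    Warp out = regs;
    for (std::size_t lane = 0; lane + delta < kWarpSize; ++lane) {
        out[lane] = regs[lane + delta];
    }
    return out;
}

// Tree reduction over the warp: five shuffle steps (delta = 16, 8, 4, 2, 1).
// Only lane 0 is guaranteed to hold the full sum at the end; no shared
// array and no barrier appear anywhere in the exchange.
int warp_reduce_sum(Warp regs) {
    for (std::size_t delta = kWarpSize / 2; delta > 0; delta /= 2) {
        const Warp neighbor = shuffle_down(regs, delta);
        for (std::size_t lane = 0; lane < kWarpSize; ++lane) {
            regs[lane] += neighbor[lane];
        }
    }
    return regs[0];
}
```

On real hardware each step would be a single register-to-register instruction per thread, whereas a shared-memory version of the same reduction would need a store, a barrier, and a load at every step. The sketch only shows which lane reads from which; it says nothing about performance.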