CUDA Pro Tip: Do The Kepler Shuffle

Originally published at: https://developer.nvidia.com/blog/cuda-pro-tip-kepler-shuffle/

When writing parallel programs, you will often need to communicate values between parallel threads. The typical way to do this in CUDA programming is to use shared memory. But the NVIDIA Kepler GPU architecture introduced a way to directly share data between threads that are part of the same warp. On Kepler, threads of a…
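
As a rough illustration of sharing data within a warp without shared memory, here is a minimal sketch of a warp-level sum built on the shuffle intrinsic. It assumes CUDA 9 or later, where the intrinsic is spelled `__shfl_down_sync` and takes a lane mask; Kepler-era code used the unsuffixed `__shfl_down`. The names `warpReduceSum`, `sumKernel`, and `sumOneWarp` are hypothetical, chosen only for this example, and error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>

// Sum one int per lane across a full warp using register-to-register shuffles.
// Assumes all 32 lanes of the warp are active (hence the full mask).
__inline__ __device__ int warpReduceSum(int val) {
    const unsigned mask = 0xffffffffu;
    // Halve the distance each step: 16, 8, 4, 2, 1.
    for (int offset = warpSize / 2; offset > 0; offset /= 2) {
        // Each lane adds the value held by the lane `offset` positions above it.
        val += __shfl_down_sync(mask, val, offset);
    }
    return val;  // only lane 0 is guaranteed to hold the complete sum
}

// Hypothetical kernel: a single warp reduces 32 inputs to one output.
__global__ void sumKernel(const int *in, int *out) {
    int v = warpReduceSum(in[threadIdx.x]);
    if (threadIdx.x == 0) {
        *out = v;
    }
}

// Hypothetical host helper: copies 32 ints to the device, launches one warp,
// and returns the sum. CUDA error checks are left out to keep the sketch short.
int sumOneWarp(const int *h_in) {
    int *d_in = nullptr;
    int *d_out = nullptr;
    int h_out = 0;
    cudaMalloc(&d_in, 32 * sizeof(int));
    cudaMalloc(&d_out, sizeof(int));
    cudaMemcpy(d_in, h_in, 32 * sizeof(int), cudaMemcpyHostToDevice);
    sumKernel<<<1, 32>>>(d_in, d_out);
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

Calling `sumOneWarp` on an array holding 1 through 32 should return 528. No `__shared__` buffer and no `__syncthreads()` appear in the kernel: the values move directly between the registers of threads in the same warp.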