Using Shared Memory in CUDA C/C++

Originally published at: https://developer.nvidia.com/blog/using-shared-memory-cuda-cc/

In the previous post, I looked at how global memory accesses by a group of threads can be coalesced into a single transaction, and how alignment and stride affect coalescing for various generations of CUDA hardware. For recent versions of CUDA hardware, misaligned data accesses are not a big issue. However, striding through global memory is…