Finite Difference Methods in CUDA C/C++, Part 1

Originally published at: https://developer.nvidia.com/blog/finite-difference-methods-cuda-cc-part-1/

In the previous CUDA C/C++ post we investigated how to use shared memory to optimize a matrix transpose, achieving roughly an order of magnitude improvement in effective bandwidth by coalescing global memory accesses. Today’s post shows how to use shared memory to enhance data reuse…