Controlling Data Movement to Boost Performance on the NVIDIA Ampere Architecture

Originally published at: https://developer.nvidia.com/blog/controlling-data-movement-to-boost-performance-on-ampere-architecture/

The NVIDIA Ampere architecture provides new mechanisms to control data movement within the GPU, and CUDA 11.1 puts those controls into your hands. These mechanisms include asynchronously copying data into shared memory and influencing the residency of data in the L2 cache. This post walks through how to use the asynchronous copy feature, and how to set…