BUG: CUDA Programming Guide memcpy_async pipeline example is incorrect

carlos0105ad · February 23, 2026, 4:24pm

The example in section “Tracking Asynchronous Memory Operations” (https://docs.nvidia.com/cuda/cuda-programming-guide/04-special-topics/pipelines.html#tracking-asynchronous-memory-operations) shows:

cuda::memcpy_async(buffer, in, sizeof(float), pipeline);

The accompanying text claims “every thread fetches one element”, but all threads pass the same buffer and in pointers. That implies an implicit per-thread offset, which doesn’t exist.

Actual behavior: All threads copy the same float to the same shared memory buffer position. The code “works” only because they all write the same value.

Users who follow this pattern for shared memory copies will get incorrect results. An explicit + threadIdx.x offset on both the destination and source pointers is required.
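For reference, here is a minimal sketch of what I mean by the explicit offset. It is not the guide's code: the kernel name copy_reversed, the host helper run_copy_check, and the constant kBlockSize are made up for illustration, and I am assuming a per-thread pipeline from cuda::make_pipeline(). Each thread copies its own element, and after the block-wide barrier it reads a different thread's slot to show that every slot was filled.

```cuda
#include <vector>
#include <cuda/pipeline>

constexpr int kBlockSize = 128;  // hypothetical block size for this sketch

// Each thread stages exactly one float into shared memory.
// The + threadIdx.x / + gid offsets are explicit; memcpy_async adds none.
__global__ void copy_reversed(const float* in, float* out)
{
    __shared__ float buffer[kBlockSize];

    // Per-thread pipeline (assumption: thread scope is enough here).
    cuda::pipeline<cuda::thread_scope_thread> pipeline = cuda::make_pipeline();

    const int gid = blockIdx.x * blockDim.x + threadIdx.x;

    pipeline.producer_acquire();
    cuda::memcpy_async(buffer + threadIdx.x, in + gid, sizeof(float), pipeline);
    pipeline.producer_commit();

    pipeline.consumer_wait();  // this thread's own copy has landed
    __syncthreads();           // now every thread's copy is visible block-wide

    // Read another thread's slot; this only works if each slot was written.
    out[gid] = buffer[blockDim.x - 1 - threadIdx.x];
    pipeline.consumer_release();
}

// Host-side check: returns true if every block was reversed correctly.
bool run_copy_check()
{
    constexpr int kBlocks = 4;
    constexpr int n = kBlocks * kBlockSize;
    std::vector<float> h_in(n), h_out(n, -1.0f);
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i);

    float* d_in = nullptr;
    float* d_out = nullptr;
    if (cudaMalloc(&d_in, n * sizeof(float)) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, n * sizeof(float)) != cudaSuccess) {
        cudaFree(d_in);
        return false;
    }
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    copy_reversed<<<kBlocks, kBlockSize>>>(d_in, d_out);

    // This blocking copy also surfaces any kernel launch or execution error.
    cudaError_t err = cudaMemcpy(h_out.data(), d_out, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    for (int b = 0; b < kBlocks; ++b)
        for (int t = 0; t < kBlockSize; ++t)
            if (h_out[b * kBlockSize + t] !=
                h_in[b * kBlockSize + (kBlockSize - 1 - t)])
                return false;
    return true;
}
```

Calling run_copy_check() from main should return true on a working setup. If the two offsets are dropped so that every thread passes buffer and in unchanged, only buffer[0] is ever written and the check should fail.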

striker159 · February 24, 2026, 5:53am

There are several typos and mistakes left in the reworked programming guide. You can report them here: How to report a bug
