Overlapping computation and data transfers must use pinned memory or UVA?

hijohnny5 · August 13, 2018, 9:05am

The official CUDA C Programming Guide, section 9.1.2, says: “In contrast with cudaMemcpy(), the asynchronous transfer version requires pinned host memory (see Pinned Memory), and it contains an additional argument, a stream ID.”

The official documentation also gives a concurrent copy and execute example, shown below:

cudaStreamCreate(&stream1);
cudaStreamCreate(&stream2);
cudaMemcpyAsync(a_d, a_h, size, cudaMemcpyHostToDevice, stream1);
kernel<<<grid, block, 0, stream2>>>(otherData_d);

In the example above, must a_h be pinned memory, either allocated with cudaHostAlloc or registered with cudaHostRegister? Can normal host memory allocated with malloc also achieve concurrent copy and execute? I tried non-pinned memory and it worked functionally, but I couldn’t tell whether the copy and the kernel actually ran concurrently.

Thanks.

Robert_Crovella · August 13, 2018, 2:25pm

Yes, to witness overlap/concurrency, you would normally want a_h to be allocated with a pinned allocator. If you don’t use a pinned allocator, the operation won’t fail or return an error, but it generally will not overlap with the subsequent kernel call.
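For illustration, here is a minimal sketch of the example from the question with a_h allocated through a pinned allocator. The kernel body, the buffer size, the CHECK macro, and the function name run_pinned_overlap are assumptions made up for this sketch; they are not from the guide. Whether overlap actually happens also depends on the device having a copy engine available.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Report a failed CUDA call and bail out of the enclosing bool function.
// (The early return skips cleanup; acceptable for a sketch only.)
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s failed: %s\n", #call,            \
                         cudaGetErrorString(err_));                   \
            return false;                                             \
        }                                                             \
    } while (0)

// Hypothetical stand-in for the guide's `kernel`; it touches only otherData_d.
__global__ void kernel(float *otherData_d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) otherData_d[i] = otherData_d[i] * 2.0f + 1.0f;
}

// Issues the copy in stream1 and the kernel in stream2, as in the question.
// Returns true if every CUDA call succeeded.
bool run_pinned_overlap(int n) {
    const size_t size = static_cast<size_t>(n) * sizeof(float);
    float *a_h = nullptr, *a_d = nullptr, *otherData_d = nullptr;
    cudaStream_t stream1, stream2;

    // Pinned (page-locked) host buffer instead of malloc().
    CHECK(cudaMallocHost((void **)&a_h, size));
    CHECK(cudaMalloc((void **)&a_d, size));
    CHECK(cudaMalloc((void **)&otherData_d, size));
    CHECK(cudaMemset(otherData_d, 0, size));
    for (int i = 0; i < n; ++i) a_h[i] = 1.0f;

    CHECK(cudaStreamCreate(&stream1));
    CHECK(cudaStreamCreate(&stream2));

    const int block = 256;
    const int grid = (n + block - 1) / block;
    CHECK(cudaMemcpyAsync(a_d, a_h, size, cudaMemcpyHostToDevice, stream1));
    kernel<<<grid, block, 0, stream2>>>(otherData_d, n);
    CHECK(cudaGetLastError());  // catches launch-configuration errors

    // Wait for both streams before releasing anything.
    CHECK(cudaStreamSynchronize(stream1));
    CHECK(cudaStreamSynchronize(stream2));

    CHECK(cudaStreamDestroy(stream1));
    CHECK(cudaStreamDestroy(stream2));
    CHECK(cudaFree(a_d));
    CHECK(cudaFree(otherData_d));
    CHECK(cudaFreeHost(a_h));  // pinned memory is released with cudaFreeHost
    return true;
}

int main() {
    const bool ok = run_pinned_overlap(1 << 22);
    std::printf("%s\n", ok ? "completed" : "failed");
    return ok ? 0 : 1;
}
```

An existing malloc buffer could instead be page-locked in place with cudaHostRegister(ptr, size, cudaHostRegisterDefault) and released with cudaHostUnregister(ptr) before free().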

You should be able to confirm behavior with a profiler.
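As a rough complement to a profiler timeline, the sketch below times the same copy-plus-kernel pair once with a malloc buffer and once with a cudaMallocHost buffer. The kernel busyKernel, its iteration count, the buffer size, and the helper names are all assumptions chosen for this sketch. Wall-clock numbers vary by system and are only a hint; if the transfer and kernel overlap, the pinned run would be expected to take noticeably less than the sum of the two operations, and the profiler remains the more reliable check.

```cuda
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical compute-heavy kernel so the kernel time is not negligible.
__global__ void busyKernel(float *data, int n, int iters) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = data[i];
        for (int k = 0; k < iters; ++k) v = v * 1.0001f + 0.5f;
        data[i] = v;
    }
}

// Times one async host-to-device copy (s1) issued together with one kernel (s2).
// Returns elapsed wall-clock milliseconds, or a negative value on a CUDA error.
double timeCopyPlusKernel(const float *a_h, float *a_d, float *other_d, int n,
                          cudaStream_t s1, cudaStream_t s2) {
    const size_t size = static_cast<size_t>(n) * sizeof(float);
    if (cudaDeviceSynchronize() != cudaSuccess) return -1.0;
    const auto t0 = std::chrono::steady_clock::now();
    if (cudaMemcpyAsync(a_d, a_h, size, cudaMemcpyHostToDevice, s1) != cudaSuccess)
        return -1.0;
    busyKernel<<<(n + 255) / 256, 256, 0, s2>>>(other_d, n, 200);
    if (cudaDeviceSynchronize() != cudaSuccess) return -1.0;
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

// Runs the pageable and pinned variants and prints both timings.
// Returns true if setup and both timed runs completed without a CUDA error.
bool comparePageableVsPinned(int n) {
    const size_t size = static_cast<size_t>(n) * sizeof(float);
    float *pageable_h = static_cast<float *>(std::malloc(size));
    float *pinned_h = nullptr, *a_d = nullptr, *other_d = nullptr;
    cudaStream_t s1, s2;
    if (!pageable_h) return false;
    bool ok = cudaMallocHost((void **)&pinned_h, size) == cudaSuccess &&
              cudaMalloc((void **)&a_d, size) == cudaSuccess &&
              cudaMalloc((void **)&other_d, size) == cudaSuccess &&
              cudaMemset(other_d, 0, size) == cudaSuccess &&
              cudaStreamCreate(&s1) == cudaSuccess &&
              cudaStreamCreate(&s2) == cudaSuccess;
    if (!ok) { std::free(pageable_h); return false; }  // sketch: no full cleanup
    std::memset(pageable_h, 0, size);
    std::memset(pinned_h, 0, size);

    // Warm-up run so one-time initialization costs are not measured.
    timeCopyPlusKernel(pinned_h, a_d, other_d, n, s1, s2);

    const double pageableMs = timeCopyPlusKernel(pageable_h, a_d, other_d, n, s1, s2);
    const double pinnedMs = timeCopyPlusKernel(pinned_h, a_d, other_d, n, s1, s2);
    std::printf("pageable host buffer: %.3f ms\n", pageableMs);
    std::printf("pinned host buffer:   %.3f ms\n", pinnedMs);

    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(a_d);
    cudaFree(other_d);
    cudaFreeHost(pinned_h);
    std::free(pageable_h);
    return pageableMs >= 0.0 && pinnedMs >= 0.0;
}

int main() {
    return comparePageableVsPinned(1 << 24) ? 0 : 1;
}
```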
