cudaMemcpyAsync clarification required & help needed

cirus · October 17, 2009, 5:46pm

The CUDA 2.2 programming guide says the following:

"Two commands from different streams cannot run concurrently if either a page-locked host memory allocation, a device memory allocation, a device memory set, a device ↔ device memory copy, or any CUDA command to stream 0 is called in-between them by the host thread."

Q1. Does the above mean that I cannot use a cudaMemcpy between two cudaMemcpyAsync calls?
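To make Q1 concrete, here is a minimal sketch of the pattern being asked about: a blocking cudaMemcpy issued between two cudaMemcpyAsync calls on different streams. Read against the quoted rule, the mix is allowed and the data arrives intact; what the rule describes is that the stream 0 command in the middle prevents the two async copies from overlapping. The function name `mixed_copy_demo`, the `TRY` macro, and all buffer names and sizes are hypothetical and not taken from the code in this post.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: print the failing call and bail out.
// (Resources are leaked on the error path; this is only a sketch.)
#define TRY(call)                                                     \
    do {                                                              \
        cudaError_t e_ = (call);                                      \
        if (e_ != cudaSuccess) {                                      \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(e_));                          \
            return 1;                                                 \
        }                                                             \
    } while (0)

// Returns 0 if all three copies succeed and the data round-trips.
int mixed_copy_demo()
{
    const int n = 1 << 16;
    const size_t bytes = n * sizeof(float);
    float *h_a = NULL, *h_c = NULL;   // page-locked host buffers
    float *h_b = NULL;                // ordinary pageable host buffer
    float *d_a = NULL, *d_b = NULL, *d_c = NULL;
    cudaStream_t s1, s2;

    TRY(cudaMallocHost((void**)&h_a, bytes));
    TRY(cudaMallocHost((void**)&h_c, bytes));
    h_b = (float*)malloc(bytes);
    if (!h_b) return 1;
    TRY(cudaMalloc((void**)&d_a, bytes));
    TRY(cudaMalloc((void**)&d_b, bytes));
    TRY(cudaMalloc((void**)&d_c, bytes));
    TRY(cudaStreamCreate(&s1));
    TRY(cudaStreamCreate(&s2));

    for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; h_c[i] = 3.0f; }

    // Async copy on stream s1, blocking copy on stream 0, async copy on s2.
    TRY(cudaMemcpyAsync(d_a, h_a, bytes, cudaMemcpyHostToDevice, s1));
    TRY(cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice));
    TRY(cudaMemcpyAsync(d_c, h_c, bytes, cudaMemcpyHostToDevice, s2));

    // Wait for both streams before touching the results.
    TRY(cudaStreamSynchronize(s1));
    TRY(cudaStreamSynchronize(s2));

    // Read d_c back through the pageable buffer and verify it.
    TRY(cudaMemcpy(h_b, d_c, bytes, cudaMemcpyDeviceToHost));
    int bad = (h_b[0] != 3.0f) || (h_b[n - 1] != 3.0f);

    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    cudaFreeHost(h_a); cudaFreeHost(h_c);
    free(h_b);
    return bad;
}
```

Calling `mixed_copy_demo()` from `main` and checking for a zero return is enough to try it. If overlap between the two async copies matters, the sketch suggests issuing the blocking copy before or after the async pair rather than between them.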

I have the following situation:

function XYZ()
{
    //
    .....

    // allocate memory to device variables
    ........

    // generate streams
    cudaStreamCreate(&stream1);
    cudaStreamCreate(&stream2);

    // allocate page-locked memory to host pointers
    cudaMallocHost((void**)&tempFrame, frame_size);
    cudaMallocHost((void**)&h_SamplesData, all_samples_data_size);

    // Sync operation
    err = cudaGetLastError();
    cudaMemcpy(d_SampleAttrib, h_SampleAttrib, attrib_size, cudaMemcpyHostToDevice);
    err = cudaGetLastError();

    // Sync operation
    err = cudaGetLastError();
    cudaMemcpy(d_ProjMat, proj_mat_data, cDim * iSampleLen * sizeof(float), cudaMemcpyHostToDevice);
    err = cudaGetLastError();

    // Async operation
    err = cudaGetLastError();
    cudaMemcpyAsync(d_ImgFrameData, tempFrame, frame_size, cudaMemcpyHostToDevice, stream1);
    err = cudaGetLastError();

    // Query whether all operations in stream 1 are done
    err = cudaGetLastError();
    cudaStreamQuery(stream1);
    err = cudaGetLastError();

    // Kernel call
    foo<<<BLOCKS, THREADS>>>(.....);

    // destroy stream 1
    cudaStreamDestroy(stream1);

    // copy processed data back to another host pointer with page-locked memory
    cudaMemcpyAsync(h_SamplesData, d_SamplesData, all_samples_data_size, cudaMemcpyDeviceToHost, stream2);

    // destroy stream 2
    cudaStreamDestroy(stream2);

    // remaining code: free memory etc.
}

The above code segment is called every time a new frame arrives from a video file. The sequence of operations runs successfully only the first time.

On the second call, I get an exception at the first cudaMemcpy() operation.
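One thing that may help narrow this down: a CUDA runtime call can report an error left behind by earlier asynchronous work, so a failure that shows up at cudaMemcpy does not necessarily originate there. The sketch below checks the return value of every call and synchronizes right after the kernel so that a fault is reported next to the operation that caused it. The kernel `scale`, the helper `check`, and the function `checked_launch_demo` are hypothetical stand-ins, not code from this post. `cudaDeviceSynchronize` is the current name; in the CUDA 2.2 era the equivalent call was `cudaThreadSynchronize`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel standing in for foo in the post.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: name the failing step, return true on success.
static bool check(cudaError_t e, const char *what)
{
    if (e != cudaSuccess)
        fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(e));
    return e == cudaSuccess;
}

// Returns 0 if every step succeeds and the result is as expected.
int checked_launch_demo()
{
    const int n = 1024;
    float h[1024];
    float *d = NULL;
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    if (!check(cudaMalloc((void**)&d, n * sizeof(float)), "cudaMalloc")) return 1;
    if (!check(cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice),
               "cudaMemcpy H2D")) return 1;

    scale<<<(n + 255) / 256, 256>>>(d, n, 2.0f);
    // Launch-time problems (e.g. an invalid configuration) show up here.
    if (!check(cudaGetLastError(), "kernel launch")) return 1;
    // Faults during kernel execution show up once the device has finished.
    if (!check(cudaDeviceSynchronize(), "kernel execution")) return 1;

    if (!check(cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost),
               "cudaMemcpy D2H")) return 1;
    cudaFree(d);
    return (h[0] == 2.0f && h[n - 1] == 2.0f) ? 0 : 1;
}
```

Note that calling cudaGetLastError() before an operation, as in the listing above, clears any pending error, so the value worth inspecting is the one returned by the call itself or fetched immediately after it.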

I tried debugging in device emulation mode and still get an error. I am also checking the values in the device variables, namely d_SampleAttrib, d_ProjMat and d_ImgFrameData.

FYI (this may help in spotting the error):

In the first iteration, the values in each of these device variables are different. In the second iteration, when memory is allocated for the device variables again, I find that the values in all three are the same.

Q2. Can you point out where I am going wrong?

After reading the guide, I moved my async operation after the cudaMemcpy operations, but it still fails.
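For comparison, here is one possible way to structure per-frame processing so that buffers and the stream are created once, the kernel runs in the same stream as the copies, and the host waits on the stream before reading results or releasing anything. This is a sketch under assumptions, not a confirmed fix for the failure described above: the kernel `invert`, the `FrameContext` struct, the helper names, and the 640x480 frame size are all hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the per-frame kernel.
__global__ void invert(const unsigned char *in, unsigned char *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 255 - in[i];
}

struct FrameContext {
    unsigned char *h_in, *h_out;   // page-locked host buffers
    unsigned char *d_in, *d_out;   // device buffers
    cudaStream_t stream;
    int n;                         // bytes per frame
};

static bool ok(cudaError_t e, const char *what)
{
    if (e != cudaSuccess) fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(e));
    return e == cudaSuccess;
}

// Allocate everything once (partial failures leak; this is only a sketch).
bool ctx_init(FrameContext *c, int n)
{
    c->n = n;
    return ok(cudaMallocHost((void**)&c->h_in, n), "cudaMallocHost in")
        && ok(cudaMallocHost((void**)&c->h_out, n), "cudaMallocHost out")
        && ok(cudaMalloc((void**)&c->d_in, n), "cudaMalloc in")
        && ok(cudaMalloc((void**)&c->d_out, n), "cudaMalloc out")
        && ok(cudaStreamCreate(&c->stream), "cudaStreamCreate");
}

// Copy in, run the kernel, copy out: all ordered by one stream.
bool ctx_process(FrameContext *c)
{
    const int threads = 256;
    const int blocks = (c->n + threads - 1) / threads;
    if (!ok(cudaMemcpyAsync(c->d_in, c->h_in, c->n,
                            cudaMemcpyHostToDevice, c->stream), "H2D")) return false;
    invert<<<blocks, threads, 0, c->stream>>>(c->d_in, c->d_out, c->n);
    if (!ok(cudaGetLastError(), "kernel launch")) return false;
    if (!ok(cudaMemcpyAsync(c->h_out, c->d_out, c->n,
                            cudaMemcpyDeviceToHost, c->stream), "D2H")) return false;
    // The host must wait here before reading h_out or freeing buffers.
    return ok(cudaStreamSynchronize(c->stream), "cudaStreamSynchronize");
}

void ctx_destroy(FrameContext *c)
{
    cudaStreamDestroy(c->stream);
    cudaFree(c->d_in);      cudaFree(c->d_out);
    cudaFreeHost(c->h_in);  cudaFreeHost(c->h_out);
}

// Processes three synthetic frames; returns 0 if every frame checks out.
int frame_loop_demo()
{
    FrameContext c;
    if (!ctx_init(&c, 640 * 480)) return 1;
    int bad = 0;
    for (int frame = 0; frame < 3 && !bad; ++frame) {
        for (int i = 0; i < c.n; ++i)
            c.h_in[i] = (unsigned char)((i + frame) & 0xFF);
        if (!ctx_process(&c)) { bad = 1; break; }
        for (int i = 0; i < c.n; ++i)
            if (c.h_out[i] != (unsigned char)(255 - c.h_in[i])) { bad = 1; break; }
    }
    ctx_destroy(&c);
    return bad;
}
```

The design choice worth noting is that cudaStreamQuery only reports whether a stream has finished and does not wait, whereas cudaStreamSynchronize blocks the host until the stream's work is done. Page-locked memory released with cudaFreeHost (not free) after that synchronization is safe to release.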

I wrote another simple program that copies data from host to device and back using cudaMemcpyAsync, and it works well. But I have no experience mixing cudaMemcpy and cudaMemcpyAsync operations.

If more information is needed, please tell me. Thanks, all.
