cudaMemcpy too slow

sakalauskas.andrius · May 11, 2021, 6:34am

Hello,

I’m currently working on a CNN project. The goal is to run the YOLO convolutional neural network in real time on a GPU, and I’ve run into a problem. Overall, the CNN layer calculations on the GPU run fast (~15 ms), but I haven’t found a fast way to copy the final results back to CPU memory. cudaMemcpy takes about 55 seconds!!! even when copying a single float variable:

cudaMemcpy(var1, var, sizeof(float), cudaMemcpyDeviceToHost);

I have tried to use page-locked memory:

float* var1;
cudaHostAlloc((void**)&var1, sizeof(float), 0);
cudaStream_t mystream1;
cudaStreamCreate(&mystream1);
cudaMemcpyAsync(var1, var, sizeof(float), cudaMemcpyDeviceToHost, mystream1);

Now it runs fast, but returns the wrong result. If I add:

cudaStreamSynchronize(mystream1);

the result is correct, but the copy takes about 55 seconds again. I’m new to GPU programming, and it would be great if you could provide some suggestions or examples of how to return results from the device to the host quickly.

Robert_Crovella · May 11, 2021, 2:14pm

You’re getting confused by the timing due to the asynchronous calls you are using. The cudaMemcpy call doesn’t take 55 seconds itself. Instead, the previous asynchronous calls (such as kernel launches, which return control to the CPU before the GPU work has finished) are taking that time to complete, and the cudaMemcpy call is forcing the CPU thread to wait for their completion, so it appears to be taking all that time.

Focusing on “speeding up the copy operation” is the wrong idea here.
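To illustrate the point above, here is a minimal sketch of how the timing can be separated. It is not the original poster's code: `busy_kernel` is a hypothetical stand-in for the CNN layers, and `d_var`, `h_var`, and `ms_since` are names made up for this example. The idea is to call `cudaDeviceSynchronize()` before starting the timer for the copy, so the wait for earlier GPU work is no longer attributed to `cudaMemcpy`.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the CNN layers: it only burns GPU time.
__global__ void busy_kernel(float *out, int iters) {
    float x = 0.0f;
    for (int i = 0; i < iters; ++i) x = x * 0.999f + 1.0f;
    if (threadIdx.x == 0 && blockIdx.x == 0) *out = x;
}

// Host-side helper: milliseconds elapsed since t0.
static double ms_since(std::chrono::steady_clock::time_point t0) {
    return std::chrono::duration<double, std::milli>(
               std::chrono::steady_clock::now() - t0).count();
}

int main() {
    float *d_var = nullptr;
    float h_var = 0.0f;
    if (cudaMalloc(&d_var, sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    // The launch is asynchronous: control returns to the CPU right away.
    auto t0 = std::chrono::steady_clock::now();
    busy_kernel<<<1, 32>>>(d_var, 1 << 24);
    double launch_ms = ms_since(t0);

    // Wait for the kernel here, so its run time is measured on its own.
    t0 = std::chrono::steady_clock::now();
    cudaError_t err = cudaDeviceSynchronize();
    double kernel_ms = ms_since(t0);
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // The GPU is idle now, so this measures only the copy.
    t0 = std::chrono::steady_clock::now();
    cudaMemcpy(&h_var, d_var, sizeof(float), cudaMemcpyDeviceToHost);
    double copy_ms = ms_since(t0);

    printf("launch: %.3f ms, kernel: %.3f ms, copy: %.3f ms\n",
           launch_ms, kernel_ms, copy_ms);

    cudaFree(d_var);
    return 0;
}
```

If the `cudaDeviceSynchronize()` call is removed, the kernel's run time should show up in the copy measurement instead, which matches the behavior described in the question. The same reasoning explains the `cudaMemcpyAsync` result: without `cudaStreamSynchronize`, the host may read `var1` before the copy (and the work it waits on) has finished.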
