Why is cudaMemcpyDeviceToHost so slow?

user51003 · November 16, 2021, 2:36pm

Hi sir,
I have a question about cudaMemcpyDeviceToHost. In figure 1, copying the image data from device to host is very slow, but in figure 2 a host-to-host copy is fast. I am confused. How can I improve the speed of the cudaMemcpyDeviceToHost copy?
Thanks.

Image 10.bmp (5.9 MB)
Image 10.bmp (5.9 MB)

Robert_Crovella · November 16, 2021, 4:56pm

This is a common question. CUDA kernel launches and various CUDA library calls are asynchronous. This means they return control to the host thread before the operation is complete.

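The subsequent cudaMemcpy operation, on the other hand, blocks the host thread until the previous CUDA operations are complete. So you are not timing what you think you are timing: the interval measured around cudaMemcpy includes the time spent waiting for the earlier work to finish, not just the copy itself. To “fix” this, one possible approach would be like this: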

...
cudaDeviceSynchronize();  // add this line
start = clock();
cudaMemcpy(dstCuda, ..., cudaMemcpyDeviceToHost);
end = clock();
...

Note that to make your timing of the previous nppiFilterBoxBorder function “correct”, you might actually want to put that cudaDeviceSynchronize(); call before the previous end=clock(); statement associated with timing of the npp function.
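To make the effect concrete, here is a minimal, self-contained sketch of the same timing pattern. It is not the original poster's code: slowKernel is a hypothetical stand-in for the nppiFilterBoxBorder call, timeKernelAndCopy and the Timing struct are names invented for illustration, and std::chrono is used in place of clock(). Error checking is kept to a minimum.

```cuda
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical stand-in for the NPP filter call: a deliberately slow kernel.
__global__ void slowKernel(float *data, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) {
        float v = data[i];
        for (int k = 0; k < 2000; ++k) v = v * 1.0001f + 0.5f;
        data[i] = v;
    }
}

// Illustrative result type (not part of any CUDA API).
struct Timing {
    double launch_ms;  // time for the launch call to return
    double kernel_ms;  // launch plus execution, measured after a sync
    double copy_ms;    // device-to-host copy only
    bool ok;
};

static double msSince(std::chrono::steady_clock::time_point t0) {
    auto dt = std::chrono::steady_clock::now() - t0;
    return std::chrono::duration<double, std::milli>(dt).count();
}

Timing timeKernelAndCopy(size_t n) {
    Timing t = {0.0, 0.0, 0.0, false};
    size_t bytes = n * sizeof(float);
    float *h = static_cast<float *>(malloc(bytes));
    float *d = nullptr;
    if (h == nullptr) return t;
    if (cudaMalloc(&d, bytes) != cudaSuccess) { free(h); return t; }
    cudaMemset(d, 0, bytes);
    cudaDeviceSynchronize();  // make sure setup work is finished before timing

    auto t0 = std::chrono::steady_clock::now();
    unsigned int blocks = static_cast<unsigned int>((n + 255) / 256);
    slowKernel<<<blocks, 256>>>(d, n);
    t.launch_ms = msSince(t0);  // the launch returns before the kernel finishes
    cudaError_t syncErr = cudaDeviceSynchronize();  // wait for the kernel
    t.kernel_ms = msSince(t0);  // now covers the kernel's execution as well

    auto t1 = std::chrono::steady_clock::now();
    cudaError_t copyErr = cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);
    t.copy_ms = msSince(t1);  // copy only, since the device was already idle

    t.ok = (syncErr == cudaSuccess) && (copyErr == cudaSuccess);
    cudaFree(d);
    free(h);
    return t;
}

int main() {
    Timing t = timeKernelAndCopy(1 << 22);
    if (!t.ok) { printf("CUDA error\n"); return 1; }
    printf("launch only:     %.3f ms\n", t.launch_ms);
    printf("kernel (synced): %.3f ms\n", t.kernel_ms);
    printf("D2H copy:        %.3f ms\n", t.copy_ms);
    return 0;
}
```

If the cudaDeviceSynchronize() call after the launch is removed, the kernel's execution time should show up in copy_ms instead, which is the situation described in the question.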
