How to copy small data from GPU to CPU many times efficiently?

yxc2010 · December 11, 2014, 6:43pm

Hi, I am a newbie in CUDA. I have a small application that needs to copy a small amount of data (128 bits) from the GPU to the CPU many times.

It seems that each cudaMemcpy call incurs API overhead that is larger than my kernel's execution time. Is there any way to avoid this?

Should I use zero-copy memory or some other memory technique?

Thank you very much for any suggestions.

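// CPU-side check; returns true when the loop should stop.
bool cpu_handle(T* data_cpu);

T* data;
cudaMalloc((void **)&data, sizeof(T));

T* data_cpu = (T*)malloc(sizeof(T));

for (int i = 0; i < max_it; i++) {
    kernel1<<<...>>>(..., data);

    // Blocking device-to-host copy of the small result, once per iteration.
    cudaMemcpy(data_cpu, data, sizeof(T), cudaMemcpyDeviceToHost);

    if (cpu_handle(data_cpu)) {
        break;
    }
}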

Robert_Crovella · December 11, 2014, 7:03pm

If your kernel executes faster than the overhead of a single cudaMemcpy call, you may not be utilizing the GPU efficiently. You might want to look into doing more work per kernel launch, or moving more of your algorithm onto the GPU. You haven’t really provided enough of an outline to make a well-formed proposal, but as a simple example, move the for-loop and the cpu_handle test onto the GPU. Call the GPU kernel once, and have it return the desired result.
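To illustrate the suggestion above, here is a minimal sketch of moving the loop and the stop test into a single kernel launch. None of the names come from the original post: Result (a 128-bit stand-in for T), Output, step (a stand-in for the work in kernel1), should_stop (a stand-in for cpu_handle), run_loop, and run_on_gpu are all hypothetical. It also assumes the work fits in one thread block so that __syncthreads() is enough to keep the threads in step; a multi-block version would need a different synchronization scheme.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical 128-bit result type standing in for T in the question.
struct Result { unsigned int v[4]; };

// Result plus the iteration at which the loop stopped, so one copy returns both.
struct Output { Result r; int iters; };

// Hypothetical stand-in for one iteration of kernel1's work.
__device__ void step(Result* r, int it) {
    r->v[threadIdx.x] += it;  // each of the 4 threads updates one word
}

// Hypothetical stand-in for cpu_handle, evaluated on the device instead.
__device__ bool should_stop(const Result* r) {
    return r->v[0] >= 10;
}

// Runs the whole iteration loop on the GPU. Assumes a single block of 4 threads.
__global__ void run_loop(Output* out, int max_it) {
    __shared__ Result r;
    __shared__ bool done;
    if (threadIdx.x == 0) done = false;
    r.v[threadIdx.x] = 0;
    __syncthreads();

    int it = 0;
    for (; it < max_it; ++it) {
        step(&r, it);
        __syncthreads();                       // all words updated before the test
        if (threadIdx.x == 0) done = should_stop(&r);
        __syncthreads();                       // every thread sees the same decision
        if (done) break;
    }
    if (threadIdx.x == 0) { out->r = r; out->iters = it; }
}

// Host wrapper: one kernel launch and one device-to-host copy in total.
Output run_on_gpu(int max_it) {
    Output h_out = {};
    Output* d_out = nullptr;
    cudaMalloc((void**)&d_out, sizeof(Output));
    run_loop<<<1, 4>>>(d_out, max_it);
    // cudaMemcpy waits for the kernel to finish before copying.
    cudaError_t err = cudaMemcpy(&h_out, d_out, sizeof(Output),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    }
    cudaFree(d_out);
    return h_out;
}

int main() {
    Output o = run_on_gpu(1000);
    printf("stopped at iteration %d, v[0] = %u\n", o.iters, o.r.v[0]);
    return 0;
}
```

With this structure the per-iteration cudaMemcpy disappears: the host pays the launch and copy overhead once, no matter how many iterations the loop runs. Whether it applies to your case depends on whether the cpu_handle logic can actually be expressed in device code.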
