Performance issue with memcpy

In my case, copying data takes 40% of the total time; if I comment out the lines that do it, I get a significant speedup. Is that expected?

before = GetTick();

// Here >>>
CHECK(cudaMemcpyAsync(buffers[inputIndex], input, INPSIZE, cudaMemcpyHostToDevice, stream));
context.enqueue(batchSize, buffers, stream, nullptr);
// And here >>>
CHECK(cudaMemcpyAsync(output, buffers[outputIndex], OUTSIZE, cudaMemcpyDeviceToHost, stream));
cudaStreamSynchronize(stream);

after = GetTick();

printTimeInterval(before, after);

Hi,

In general, copying memory between the host and device can be a bottleneck in GPU-accelerated computation, so minimizing and overlapping data transfers is key to good performance. See the memory-optimizations section of the CUDA C++ Best Practices Guide: https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html#memory-optimizations.
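One of the first optimizations that guide recommends is pinned (page-locked) host memory: cudaMemcpyAsync can only run truly asynchronously when the host buffer is pinned, and pinned transfers also reach higher bandwidth. Below is a minimal sketch; the INPSIZE/OUTSIZE values are illustrative placeholders, not your real sizes.

// Minimal sketch: pinned host buffers for the transfer endpoints.
// INPSIZE/OUTSIZE are assumed placeholder sizes, not taken from your code.
#include <cuda_runtime.h>

#define INPSIZE (1 << 20)
#define OUTSIZE (1 << 20)

int main() {
    void *input = nullptr, *output = nullptr;

    // Pinned memory lets cudaMemcpyAsync proceed asynchronously; with
    // pageable buffers the runtime falls back to a staged, synchronous copy.
    cudaHostAlloc(&input,  INPSIZE, cudaHostAllocDefault);
    cudaHostAlloc(&output, OUTSIZE, cudaHostAllocDefault);

    // ... fill `input`, then run the same memcpy/enqueue sequence as in
    // your snippet; the transfers themselves should now be faster ...

    cudaFreeHost(input);
    cudaFreeHost(output);
    return 0;
}

If your current input/output buffers come from plain malloc/new, switching them to pinned allocations alone often recovers a noticeable share of that 40%.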

Can the GPU keep working while another operation is copying data to the device? If so, then a pipeline could be established, right?

Yes, that’s essentially what an asynchronous memcpy enables; see this section: https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html#asynchronous-transfers-and-overlapping-transfers-with-computation
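To make that concrete, here is a minimal sketch of the overlap pattern from that section: split the data into chunks and give each chunk its own stream, so one chunk’s kernel runs while another chunk’s copy is in flight. The process kernel, N, and NSTREAMS are all illustrative placeholders, not from your code.

// Minimal overlap sketch (illustrative kernel and sizes, not your code).
#include <cuda_runtime.h>

__global__ void process(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;   // stand-in for real per-element work
}

int main() {
    const int N = 1 << 22, NSTREAMS = 4;
    const int chunk = N / NSTREAMS;

    float *h = nullptr, *d = nullptr;
    cudaHostAlloc(&h, N * sizeof(float), cudaHostAllocDefault);  // pinned
    cudaMalloc(&d, N * sizeof(float));

    cudaStream_t streams[NSTREAMS];
    for (int k = 0; k < NSTREAMS; ++k) cudaStreamCreate(&streams[k]);

    // Work within one stream stays ordered (copy-in -> kernel -> copy-out),
    // but the streams overlap with each other, forming the pipeline.
    for (int k = 0; k < NSTREAMS; ++k) {
        int off = k * chunk;
        cudaMemcpyAsync(d + off, h + off, chunk * sizeof(float),
                        cudaMemcpyHostToDevice, streams[k]);
        process<<<(chunk + 255) / 256, 256, 0, streams[k]>>>(d + off, chunk);
        cudaMemcpyAsync(h + off, d + off, chunk * sizeof(float),
                        cudaMemcpyDeviceToHost, streams[k]);
    }
    cudaDeviceSynchronize();

    for (int k = 0; k < NSTREAMS; ++k) cudaStreamDestroy(streams[k]);
    cudaFree(d);
    cudaFreeHost(h);
    return 0;
}

The same idea applies to your inference loop: double-buffer the input so the host-to-device copy for batch k+1 is enqueued on a second stream while batch k is still executing.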

There’s also a bit of an explanation in this SO post: How to mitigate host + device memory transfer bottlenecks in OpenCL/CUDA - Stack Overflow