Bug when overlapping transfer & data

fcs · January 11, 2011, 3:57pm

Hi guys,

It seems that I can’t overlap asynchronous memory transfers with computation on my cards.

I have attached the small program I use to reproduce it (it is inspired by the one found here: The Official NVIDIA Forums | NVIDIA)
(It is just for testing purposes and does nothing interesting.)

The results I get look like this:

time launching memcpy h2d =0.0017643
time memcpy h2d =3.72838
time between memcpy h2d and kernel =3.73197
time launching kernel =0.00332499
time kernel =0.436005
time launching memcpy d2h =0.00183749
time memcpy d2h =3.4469
time waiting =7.64547

I interpret time between memcpy h2d and kernel =3.73197 as meaning that the kernel in stream “st1” starts only after the cudaMemcpyAsync in stream “st” has finished. (And that’s what the profiler shows.)

To me this is a huge bug, but maybe I am missing something obvious.

I tested it on a machine with a Tesla S1070 card (driver 256.40) and on a Quadro FX5800 with driver 260.19.21. Both are Linux x86_64, and I tested with CUDA 3.0, 3.1 and 3.2 (3.2 only on the Quadro), with similar results.

I would really appreciate any answer.

Thank you very much
overlap.cu (3.38 KB)
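
For readers without the attachment, here is a minimal sketch of this kind of test. It is not the attached overlap.cu. The kernel `busy`, the buffer sizes, and the helper `looks_overlapped` are hypothetical names chosen for illustration. It assumes a toolkit recent enough to provide `cudaDeviceSynchronize` and the `asyncEngineCount` device property (the CUDA 3.x releases mentioned above exposed `deviceOverlap` instead). The host buffer is page-locked, since `cudaMemcpyAsync` can only overlap with kernels when the host memory is pinned. The idea is to time the copy alone, the kernel alone, and both issued together in different streams; if the combined time is close to the sum, the two were serialized.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::printf("%s: %s\n", #call, cudaGetErrorString(err_)); \
            return 1;                                                 \
        }                                                             \
    } while (0)

// Hypothetical kernel: only keeps the GPU busy for a while.
__global__ void busy(float *d, int n, int iters) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = d[i];
        for (int k = 0; k < iters; ++k) v = v * 1.0001f + 0.5f;
        d[i] = v;
    }
}

// Heuristic (an assumption, not an official criterion): call it overlapped
// when the combined time is closer to max(copy, kernel) than to their sum.
bool looks_overlapped(float t_copy, float t_kernel, float t_both) {
    float t_max = t_copy > t_kernel ? t_copy : t_kernel;
    float t_sum = t_copy + t_kernel;
    return (t_both - t_max) < (t_sum - t_both);
}

static double now_ms() {
    using clock = std::chrono::steady_clock;
    return std::chrono::duration<double, std::milli>(
               clock::now().time_since_epoch()).count();
}

int main() {
    cudaDeviceProp prop;
    CHECK(cudaGetDeviceProperties(&prop, 0));
    // A nonzero asyncEngineCount means the device can copy while a kernel runs.
    std::printf("%s: asyncEngineCount = %d\n", prop.name, prop.asyncEngineCount);

    const int n = 1 << 24;
    const size_t bytes = n * sizeof(float);
    float *h = nullptr, *d_copy = nullptr, *d_work = nullptr;
    CHECK(cudaMallocHost((void **)&h, bytes));  // page-locked host memory
    CHECK(cudaMalloc((void **)&d_copy, bytes));
    CHECK(cudaMalloc((void **)&d_work, bytes));
    CHECK(cudaMemset(d_work, 0, bytes));
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    cudaStream_t st, st1;
    CHECK(cudaStreamCreate(&st));
    CHECK(cudaStreamCreate(&st1));
    const int block = 256, grid = (n + block - 1) / block, iters = 200;

    // Warm-up so one-time initialization does not pollute the timings.
    CHECK(cudaMemcpyAsync(d_copy, h, bytes, cudaMemcpyHostToDevice, st));
    busy<<<grid, block, 0, st1>>>(d_work, n, iters);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    double t0 = now_ms();  // copy alone
    CHECK(cudaMemcpyAsync(d_copy, h, bytes, cudaMemcpyHostToDevice, st));
    CHECK(cudaDeviceSynchronize());
    float t_copy = float(now_ms() - t0);

    t0 = now_ms();  // kernel alone
    busy<<<grid, block, 0, st1>>>(d_work, n, iters);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());
    float t_kernel = float(now_ms() - t0);

    t0 = now_ms();  // copy in st and kernel in st1, issued back to back
    CHECK(cudaMemcpyAsync(d_copy, h, bytes, cudaMemcpyHostToDevice, st));
    busy<<<grid, block, 0, st1>>>(d_work, n, iters);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());
    float t_both = float(now_ms() - t0);

    std::printf("copy %.3f ms, kernel %.3f ms, both %.3f ms, overlapped: %s\n",
                t_copy, t_kernel, t_both,
                looks_overlapped(t_copy, t_kernel, t_both) ? "yes" : "no");

    CHECK(cudaStreamDestroy(st));
    CHECK(cudaStreamDestroy(st1));
    CHECK(cudaFree(d_copy));
    CHECK(cudaFree(d_work));
    CHECK(cudaFreeHost(h));
    return 0;
}
```

Timing is done with a host clock around a full device synchronization, rather than with events recorded in the default stream, so that the measurement itself does not add ordering between `st` and `st1`.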

fcs · February 11, 2011, 9:14am

Since I’ve reported the bug on the Developer web site, I am posting proper reproducers here. overlap_bugreport.cu shows that the kernel waits for the memcopy to finish, and overlap.cu shows that overlapping happens only at the first step.
I still have no answer from NVIDIA about this.
overlap_bugreport.cu (3.57 KB)
overlap.cu (4.05 KB)
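
As an illustration of the multi-step case, here is a rough sketch of a chunked loop in which each step copies a chunk to the device, runs a kernel on it, and copies it back, with steps alternating between two streams. It is not the attached overlap.cu. The kernel `scale`, the helper `stream_index`, and the chunk sizes are hypothetical, and the code assumes a toolkit that provides `cudaDeviceSynchronize`. If overlap works at every step, the copy for step i+1 can proceed while the kernel for step i runs.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::printf("%s: %s\n", #call, cudaGetErrorString(err_)); \
            return 1;                                                 \
        }                                                             \
    } while (0)

// Hypothetical kernel: multiplies every element of a chunk by f.
__global__ void scale(float *d, int n, float f) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= f;
}

// Round-robin assignment of steps to streams.
int stream_index(int step, int num_streams) { return step % num_streams; }

int main() {
    const int steps = 8, chunk = 1 << 20, nstreams = 2;
    const size_t chunk_bytes = chunk * sizeof(float);
    float *h = nullptr, *d = nullptr;
    CHECK(cudaMallocHost((void **)&h, steps * chunk_bytes));  // pinned
    CHECK(cudaMalloc((void **)&d, steps * chunk_bytes));
    for (int i = 0; i < steps * chunk; ++i) h[i] = 1.0f;

    cudaStream_t s[nstreams];
    for (int k = 0; k < nstreams; ++k) CHECK(cudaStreamCreate(&s[k]));

    for (int step = 0; step < steps; ++step) {
        cudaStream_t cur = s[stream_index(step, nstreams)];
        float *hp = h + (size_t)step * chunk;
        float *dp = d + (size_t)step * chunk;
        // Work within one stream runs in order; different streams may overlap.
        CHECK(cudaMemcpyAsync(dp, hp, chunk_bytes, cudaMemcpyHostToDevice, cur));
        scale<<<(chunk + 255) / 256, 256, 0, cur>>>(dp, chunk, 2.0f);
        CHECK(cudaGetLastError());
        CHECK(cudaMemcpyAsync(hp, dp, chunk_bytes, cudaMemcpyDeviceToHost, cur));
    }
    CHECK(cudaDeviceSynchronize());

    // Every element started at 1.0f and was scaled once by 2.0f.
    int bad = 0;
    for (int i = 0; i < steps * chunk; ++i)
        if (h[i] != 2.0f) ++bad;
    std::printf("%d mismatches\n", bad);

    for (int k = 0; k < nstreams; ++k) CHECK(cudaStreamDestroy(s[k]));
    CHECK(cudaFree(d));
    CHECK(cudaFreeHost(h));
    return bad != 0;
}
```

This only checks that the results are correct; whether the steps actually overlap has to be read from a profiler timeline or from timings. On devices with a single copy engine, the order in which copies and kernels are issued may also affect how much overlap is achieved.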
