data transmission time


I’ve implemented a CUDA program.

Now I’m measuring the kernel execution time and the time for transferring data (from GPU to CPU).

To measure the data transfer time, I wrote code like the below.

Here, ‘cuSpatialTransform’ is a struct with 12 float members.

In Case 1, the value of totHandles is 256.

When I run the program with Case 1, the data transfer time (the value of timer2) is 1.6~1.7 ms on average.

In Case 2, the value of totHandles is 229, which is smaller than in Case 1.

But the average data transfer time for Case 2 is 64~65 ms!!

I don’t understand it.

How is this possible?

Try putting cudaThreadSynchronize() right before cutStartTimer(timer2).


Now the data transfer time of Case 2 is 0.02~0.03.

What does cudaThreadSynchronize do to the timer?

Anyway thank you very much!!! :D

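It blocks the CPU until all previously issued asynchronous operations on the GPU have completed. Kernel launches are asynchronous: the launch returns to the CPU right away while the kernel is still running. So if you launched a kernel prior to the memcpy and then timed only the memcpy, you were really timing the kernel + memcpy, because the memcpy has to wait for the kernel to finish before it can copy anything.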
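For anyone who wants to reproduce the effect, here is a minimal sketch, not the original poster's code. Assumptions: SpatialTransform is a stand-in for cuSpatialTransform (12 floats, layout guessed), busyKernel and timeCopyMs are made-up names, the timer is a plain std::chrono host timer instead of the cutil timer, and cudaDeviceSynchronize() is used because cudaThreadSynchronize() is deprecated in current CUDA toolkits.

```cuda
#include <chrono>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Stand-in for the poster's cuSpatialTransform: 12 floats (layout assumed).
struct SpatialTransform { float m[12]; };

// Hypothetical kernel whose only job is to keep the GPU busy for a while.
__global__ void busyKernel(SpatialTransform* out, int n, int iters)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float v = static_cast<float>(i);
    for (int k = 0; k < iters; ++k) v = v * 1.0001f + 0.5f;
    for (int j = 0; j < 12; ++j) out[i].m[j] = v;
}

// Returns the host-measured time (ms) of the device-to-host copy,
// or -1 if a CUDA call fails (for example, no GPU available).
// With syncFirst == false, the measurement also absorbs any kernel
// time that is still pending when the timer starts.
float timeCopyMs(int totHandles, bool syncFirst)
{
    std::vector<SpatialTransform> host(totHandles);
    SpatialTransform* dev = nullptr;
    size_t bytes = totHandles * sizeof(SpatialTransform);
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return -1.0f;

    int threads = 128;
    int blocks = (totHandles + threads - 1) / threads;
    // The launch is asynchronous: control returns to the CPU immediately.
    busyKernel<<<blocks, threads>>>(dev, totHandles, 1 << 20);

    // Wait for the kernel here so the timer below sees only the copy.
    if (syncFirst) cudaDeviceSynchronize();

    auto t0 = std::chrono::steady_clock::now();
    cudaError_t err = cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost);
    auto t1 = std::chrono::steady_clock::now();

    cudaFree(dev);
    if (err != cudaSuccess) return -1.0f;
    return std::chrono::duration<float, std::milli>(t1 - t0).count();
}
```

Calling timeCopyMs(229, false) and then timeCopyMs(229, true) from main and printing both values should show the same pattern as in this thread: the unsynchronized number includes the kernel, the synchronized one is just the copy. The actual values depend on the GPU and on how long the kernel runs.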

I got it!

Thanks a bunch for your kind answer :)