Same kernels, different run times: asking for help

qijiin21c · May 20, 2009, 12:39pm

Hi,
I wrote 3 simple kernels and found that it takes quite a long time to run them 1000 times (the purpose of the kernels is to simulate the logistic map). I then found that the first 300 iterations take 69 ms, but the next 50 take 158 ms. I am using CUDA 2.2 on a 9600GT under Windows XP, but the same thing also happens on RHEL 5.1.
Can anyone tell me why this happens?

Jamie_K · May 20, 2009, 1:16pm

The most likely reason is that you are not using cudaThreadSynchronize(), so your timing measurements are not measuring what you think they are measuring. Kernel launches are asynchronous and can queue up on the device. When you stop the stopwatch, they may not have finished yet. Calling cudaThreadSynchronize() before you stop the timer guarantees that they have finished.

This is not necessarily your problem but it’s the most likely culprit.
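
A minimal sketch of the effect, assuming a hypothetical `logisticStep` kernel in place of the three original kernels (which were not posted). The helper `timeLaunches`, the problem size, and the parameter `r = 3.9f` are illustrative choices, not taken from the question. The sketch uses `cudaDeviceSynchronize()`, the current name for the call; `cudaThreadSynchronize()` does the same job on old toolkits such as CUDA 2.2 but has since been deprecated.

```cuda
#include <chrono>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for the original kernels: one logistic-map step,
// x <- r * x * (1 - x), applied to every element.
__global__ void logisticStep(float *x, int n, float r)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        x[i] = r * x[i] * (1.0f - x[i]);
}

// Launches the kernel `iters` times and returns the elapsed host time in ms.
// With syncBeforeStop == false the timer is read while kernels may still be queued.
static double timeLaunches(float *d_x, int n, int iters, bool syncBeforeStop)
{
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    cudaDeviceSynchronize();  // make sure earlier work is not counted
    auto t0 = std::chrono::steady_clock::now();
    for (int k = 0; k < iters; ++k)
        logisticStep<<<blocks, threads>>>(d_x, n, 3.9f);
    if (syncBeforeStop)
        cudaDeviceSynchronize();  // wait for all queued kernels to finish
    auto t1 = std::chrono::steady_clock::now();

    cudaDeviceSynchronize();  // drain the queue so the next measurement starts clean
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main()
{
    const int n = 1 << 20;
    std::vector<float> h_x(n, 0.5f);

    float *d_x = nullptr;
    cudaMalloc(&d_x, n * sizeof(float));
    cudaMemcpy(d_x, h_x.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    double noSync = timeLaunches(d_x, n, 300, false);
    double withSync = timeLaunches(d_x, n, 300, true);
    printf("300 launches, timer stopped without sync: %.3f ms\n", noSync);
    printf("300 launches, timer stopped after sync:   %.3f ms\n", withSync);

    cudaFree(d_x);
    return 0;
}
```

Without the synchronization, the first figure mostly reflects the cost of enqueuing launches rather than running them. One plausible reading of the numbers in the question is that early iterations only pay the enqueue cost, while later ones stall once the launch queue is full and the host has to wait for the device.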

qijiin21c · May 20, 2009, 1:24pm

I used __syncthreads(). Are the two the same?

Jamie_K · May 20, 2009, 1:40pm

No. __syncthreads() is used on the device side to synchronize threads within a kernel (specifically, within a block).

cudaThreadSynchronize() is used on the host side to make sure the kernel calls have finished. You usually don’t need it, because most operations naturally wait for the previous operations to finish. For example, a cudaMemcpy() will wait for any queued kernels to finish (otherwise it would give wrong results!). But when making timing measurements, you do need to use cudaThreadSynchronize().
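
To make the distinction concrete, here is a small sketch with an illustrative kernel, `reverseBlock`, and a host helper, `reverseOnGpu`; both names and the array size `N` are assumptions made for this example. `__syncthreads()` appears inside the kernel as a barrier for the threads of one block, while the host needs no explicit synchronization because the blocking `cudaMemcpy()` on the default stream waits for the kernel before copying.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int N = 256;  // one block of N threads (illustrative size)

// Reverses an N-element array in place using shared memory.
__global__ void reverseBlock(int *d)
{
    __shared__ int s[N];
    int t = threadIdx.x;
    s[t] = d[t];
    __syncthreads();      // device side: all threads of this block have filled s[]
    d[t] = s[N - 1 - t];
}

// Host helper: copies h to the device, reverses it there, and copies it back.
static void reverseOnGpu(int *h)
{
    int *d = nullptr;
    cudaMalloc(&d, N * sizeof(int));
    cudaMemcpy(d, h, N * sizeof(int), cudaMemcpyHostToDevice);

    reverseBlock<<<1, N>>>(d);  // asynchronous: returns before the kernel is done

    // Host side: this blocking copy waits for the queued kernel to finish,
    // so no explicit cudaThreadSynchronize()/cudaDeviceSynchronize() is needed here.
    cudaMemcpy(h, d, N * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);
}

int main()
{
    int h[N];
    for (int i = 0; i < N; ++i)
        h[i] = i;

    reverseOnGpu(h);
    printf("h[0] = %d, h[%d] = %d\n", h[0], N - 1, h[N - 1]);  // h[0] = 255, h[255] = 0
    return 0;
}
```

If you wrapped a host timer around only the `reverseBlock<<<1, N>>>(d)` line, you would again need an explicit host-side synchronization before stopping the timer, since the launch itself returns immediately.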

qijiin21c · May 20, 2009, 3:51pm

I’ll try that, thanks!

qijiin21c · May 26, 2009, 8:32am

That was indeed the problem. Thank you very much.
