kernel in loop (time explodes)

robOn · June 29, 2009, 12:13am

Hi,

I’m calling my kernel within a loop, but the time I’m measuring seems to explode with the length of the loop.

for (int i = 0; i < 1; i++)
{
    // 1. copy some memory to the device
    // 2. call the kernel
}

This takes 0.26 ms. If I now let the loop run 360 times ( for(int i=0; i<360; i++) ), I measure 5398.12 ms.

Why isn’t it just 0.26 * 360 = 93.6 ms? Is there some invisible thread synchronisation?

Regards, Rob

robOn · June 29, 2009, 12:21am

Currently I’m trying some loop unrolling … is there anything else that helps? Maybe I missed a part in the CUDA manuals.

Nico · June 29, 2009, 7:09am

Did you synchronize the threads before stopping the timer?
If you don’t perform an action after the kernel call that requires the kernel’s result to be available (such as a device-to-host copy), then the time the kernel actually needs is not included in your timing. Kernel calls are asynchronous and return control to the host immediately after the launch.

N.
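To make the point about asynchronous launches concrete, here is a minimal sketch of the "copy, then launch" loop timed with and without a synchronization before the timer stops. The kernel busyKernel, the helper timeLoop, and the buffer size are made up for illustration and are not the original poster's code. It uses cudaDeviceSynchronize(), which replaced the now-deprecated cudaThreadSynchronize() in later CUDA releases. Note also that a blocking cudaMemcpy on the default stream waits for earlier kernels, so in a multi-iteration loop the copies themselves can act as hidden synchronization points.

```cuda
#include <chrono>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real kernel: some arithmetic per element.
__global__ void busyKernel(float *data, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        float v = data[idx];
        for (int k = 0; k < 2000; ++k)
            v = v * 1.0001f + 0.5f;
        data[idx] = v;
    }
}

// Runs "copy, then launch" `iterations` times and returns the elapsed
// host wall-clock time in milliseconds. (Helper name is an assumption.)
double timeLoop(const std::vector<float> &h_data, float *d_data,
                int iterations, bool syncBeforeStop)
{
    const int n = static_cast<int>(h_data.size());
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    cudaDeviceSynchronize();  // start from an idle device
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        // 1. copy some memory to the device
        cudaMemcpy(d_data, h_data.data(), n * sizeof(float),
                   cudaMemcpyHostToDevice);
        // 2. call the kernel (the launch returns before the kernel finishes)
        busyKernel<<<blocks, threads>>>(d_data, n);
    }
    if (syncBeforeStop)
        cudaDeviceSynchronize();  // wait for all queued kernels to finish
    auto stop = std::chrono::steady_clock::now();

    cudaDeviceSynchronize();  // drain the device before the next measurement
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

int main()
{
    const int n = 1 << 20;  // arbitrary problem size for the sketch
    std::vector<float> h_data(n, 1.0f);
    float *d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    timeLoop(h_data, d_data, 1, true);  // warm-up, result discarded

    printf("1 iteration,    no sync: %.3f ms\n", timeLoop(h_data, d_data, 1, false));
    printf("1 iteration,    sync:    %.3f ms\n", timeLoop(h_data, d_data, 1, true));
    printf("360 iterations, sync:    %.3f ms\n", timeLoop(h_data, d_data, 360, true));

    cudaFree(d_data);
    return 0;
}
```

If the unsynchronized single-iteration figure comes out much smaller than the synchronized one, the 0.26 ms in the question was most likely the cost of the copy plus the launch call, not of the kernel itself.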

CUDAkk · June 29, 2009, 7:22am

Use cudaThreadSynchronize() before stopping the timer in your first case, where the kernel is called just once.

tmurray · June 29, 2009, 7:39am

If you fill up the launch queue, the driver will synchronize on you so it can keep queuing things instead of returning some sort of launch failure.
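One way to look for the queue-full behaviour described here is to time each launch call on the host, without any copies in between. The sketch below does that; slowKernel and launchCallTimes are hypothetical names, and the depth of the launch queue is not stated in this thread, so the iteration at which a launch call starts to block (if it does at all within 360 launches) is an assumption to be checked on your own driver and hardware.

```cuda
#include <chrono>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel that keeps the GPU busy long enough for launches to pile up.
__global__ void slowKernel(float *data, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        float v = data[idx];
        for (int k = 0; k < 20000; ++k)
            v = v * 0.999f + 0.25f;
        data[idx] = v;
    }
}

// Returns, for every launch, how long the launch call itself took on the
// host, in microseconds. (Helper name is an assumption.)
std::vector<double> launchCallTimes(float *d_data, int n, int iterations)
{
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    std::vector<double> us(iterations);

    cudaDeviceSynchronize();  // start with an empty queue
    for (int i = 0; i < iterations; ++i) {
        auto t0 = std::chrono::steady_clock::now();
        slowKernel<<<blocks, threads>>>(d_data, n);
        auto t1 = std::chrono::steady_clock::now();
        us[i] = std::chrono::duration<double, std::micro>(t1 - t0).count();
    }
    cudaDeviceSynchronize();  // let all queued kernels finish
    return us;
}

int main()
{
    const int n = 1 << 20;  // arbitrary problem size for the sketch
    float *d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(d_data, 0, n * sizeof(float));

    launchCallTimes(d_data, n, 1);  // warm-up, result discarded

    std::vector<double> us = launchCallTimes(d_data, n, 360);
    for (size_t i = 0; i < us.size(); ++i)
        printf("launch %3zu: %10.1f us\n", i, us[i]);

    cudaFree(d_data);
    return 0;
}
```

A sudden jump in the per-launch time partway through the list would be consistent with the driver blocking the host until queue space frees up. If no jump appears, the queue may simply be deeper than 360 entries on that setup, and raising the iteration count is worth a try.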
