Kernel execution time issue: Execution time increases after several kernel launches


I ported the Gaussian Mixture Model algorithm for video foreground segmentation to CUDA. The CUDA kernel is executed for every frame, so it is launched in rapid succession. Here is a code example:


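while( NewFrame() )
{
   cudaEventRecord(start, 0);
   UpdateBackgroundModel<<<grid, block, size>>>(...);
   cudaEventRecord(stop, 0);

   // Wait for the stop event to complete; the elapsed time is not valid before that.
   cudaEventSynchronize(stop);
   cudaEventElapsedTime(&time, start, stop);
}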




For the first ~300 launches, the execution time of my kernel is about 1.2 ms; then it increases to 2.4 ms. After a few seconds the kernel finally takes about 6.9 ms.
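To pin down at which launch the jumps happen, the per-launch times could be collected and scanned for step changes. Below is a minimal host-side sketch, assuming the values returned by cudaEventElapsedTime are stored in a std::vector<float>. The helper name findTimingJumps and the 1.5x threshold are hypothetical and not part of my original code:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (illustration only): returns the indices of launches
// whose time exceeds `factor` times the current plateau level. The plateau
// starts at the first measurement and is raised whenever a jump is recorded,
// so only increases are reported.
std::vector<std::size_t> findTimingJumps(const std::vector<float>& timesMs,
                                         float factor = 1.5f)
{
    std::vector<std::size_t> jumps;
    if (timesMs.empty()) {
        return jumps;
    }

    float level = timesMs[0];  // current plateau in milliseconds
    for (std::size_t i = 1; i < timesMs.size(); ++i) {
        if (timesMs[i] > factor * level) {
            jumps.push_back(i);   // launch index where the time stepped up
            level = timesMs[i];   // continue from the new plateau
        }
    }
    return jumps;
}
```

The reported indices could then be compared against timestamps or GPU clock readings to see whether the slowdown lines up with a change in clock state.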

This measurement was taken with a release build (without any debug information etc.).

Here is some system information:

  • Windows 7 32-bit

  • GeForce GTX 295 (multi-GPU, only one GPU used for the CUDA kernel)

  • Nsight Runtime API 3.1

I suspect that it might be a power-saving problem of the GPU. The Windows power options are set to highest performance, but this didn’t solve the issue.

I hope someone has a solution for this behavior, because I need the kernel to be as fast as possible.

Best regards


Hi, how are you? I am doing the same thing as you, the mixture of Gaussians in CUDA, so I would like to ask for your ideas about it.