Kernels and For Loops

I have a question that I am hoping someone might be able to answer:

I am using a for loop to execute a kernel multiple times. For the first 16 iterations, the processing time is consistent with the expected average. However, after the 16th iteration the measured time jumps significantly, e.g. from 20 microseconds to 5 milliseconds. This continues to occur even when I use multiple different kernels: after the 16th kernel call, the processing time skyrockets!

Can someone explain why this is happening? Your time is greatly appreciated!

Kernel launches are asynchronous, so each call returns to the host right away: the time you measure for the first 16 calls is essentially just the launch overhead, not the kernel's execution time. The launch queue is 16 entries deep, so once it fills there is an implicit synchronization with the device (waiting for earlier kernels to finish) before the next asynchronous launch can be queued.
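A minimal sketch of what that looks like, assuming a trivial placeholder kernel (dummyKernel here is hypothetical, not from the original post): timing each launch on the host without synchronizing measures only the cost of queuing the launch, until the queue fills and a launch finally blocks.

```cpp
#include <cstdio>
#include <chrono>
#include <cuda_runtime.h>

// Hypothetical kernel used only to illustrate launch queuing.
__global__ void dummyKernel(float *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    out[i] = out[i] * 2.0f;
}

int main()
{
    const int N = 256;
    float *d_out = nullptr;
    cudaMalloc(&d_out, N * sizeof(float));

    // Time each launch on the host WITHOUT synchronizing: the measured time is
    // only the launch overhead, until the driver's queue fills and the next
    // launch has to wait for earlier kernels to drain.
    for (int iter = 0; iter < 32; ++iter) {
        auto t0 = std::chrono::high_resolution_clock::now();
        dummyKernel<<<1, N>>>(d_out);
        auto t1 = std::chrono::high_resolution_clock::now();
        double us = std::chrono::duration<double, std::micro>(t1 - t0).count();
        printf("launch %2d: %8.1f us (host-side launch time only)\n", iter, us);
    }

    cudaDeviceSynchronize();   // wait for all queued kernels before exiting
    cudaFree(d_out);
    return 0;
}
```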

You can synchronize with the device yourself for timing purposes by calling cudaThreadSynchronize (cudaDeviceSynchronize in newer CUDA releases) or by using the events API.
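As a sketch of the events approach (again using a hypothetical dummyKernel), recording an event before and after the launch and synchronizing on the second one gives the actual device execution time rather than the launch overhead:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, stand-in for whatever you are timing.
__global__ void dummyKernel(float *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    out[i] = out[i] * 2.0f;
}

int main()
{
    const int N = 256;
    float *d_out = nullptr;
    cudaMalloc(&d_out, N * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Record events around the launch; the elapsed time between them is the
    // device-side execution time, independent of the asynchronous launch.
    cudaEventRecord(start, 0);
    dummyKernel<<<1, N>>>(d_out);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);          // block until the kernel has finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("kernel time: %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_out);
    return 0;
}
```

Alternatively, calling cudaDeviceSynchronize before reading a host-side timer gives a similar result, at the cost of stalling the CPU until the device is idle.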

Thank you very much, I really appreciate it. I originally had a thread-sync call but removed it; I will need to go back and put it back in.