I have observed some strange behavior in kernel execution times. I repeatedly run a kernel and measure the kernel time using CUDA events. The kernel parameters (number of blocks, threads, shared memory size) do not change between calls.
while (work to do)
{
    ceResult = cudaEventRecord(kernelStartEvent, 0);   // record start on the default stream
    StartCudaKernel();                                 // launches the kernel
    ceResult = cudaEventRecord(kernelStopEvent, 0);
    ceResult = cudaEventSynchronize(kernelStopEvent);  // wait until the kernel has finished
    ceResult = cudaEventElapsedTime(&fCudaKernelTime, kernelStartEvent, kernelStopEvent);
}
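In case it helps to reproduce, here is a minimal self-contained version of the loop. DummyKernel and the fixed iteration count are placeholders standing in for my real workload and the "work to do" condition:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel standing in for the real workload.
__global__ void DummyKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = data[i] * 1.000001f + 0.5f;
}

int main()
{
    const int n = 1 << 20;
    float *dData = nullptr;
    cudaMalloc(&dData, n * sizeof(float));

    cudaEvent_t kernelStartEvent, kernelStopEvent;
    cudaEventCreate(&kernelStartEvent);
    cudaEventCreate(&kernelStopEvent);

    float fCudaKernelTime = 0.0f;
    for (int iter = 0; iter < 10; ++iter)   // stands in for "while (work to do)"
    {
        cudaEventRecord(kernelStartEvent, 0);
        DummyKernel<<<(n + 255) / 256, 256>>>(dData, n);
        cudaEventRecord(kernelStopEvent, 0);
        cudaEventSynchronize(kernelStopEvent);
        cudaEventElapsedTime(&fCudaKernelTime, kernelStartEvent, kernelStopEvent);
        printf("iteration %d: %.3f ms\n", iter, fCudaKernelTime);
    }

    cudaEventDestroy(kernelStartEvent);
    cudaEventDestroy(kernelStopEvent);
    cudaFree(dData);
    return 0;
}
```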
Running the kernel on a GeForce GTX 1050 Ti, I get almost constant kernel times.
When I run the same code on a GeForce GTX 750 Ti, the kernel times vary from 776 ms up to 1449 ms. Running on a notebook with a Quadro M2000M gives the same unsteady results.
I use the kernel time measurement to estimate how much calculation work can be done within 2 seconds, so as not to run into the TDR (Timeout Detection and Recovery) on Windows. With these varying execution times it is very hard to find an optimal workload. The problem occurs with both CUDA 5.5 and CUDA 8.0.
So far, kernel execution time has scaled very linearly: doubling the calculation work doubled the execution time. Have there been any changes to the driver or the architecture that could cause this unsteady behavior?
Is there a workaround for achieving constant kernel times?