Strange Performance Issues at the First Kernel Execution

Hello,

I have written a while loop that executes:

  1. a kernel, and
  2. a host function that performs exactly the same operation as the kernel.
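The code was not posted, so here is a minimal sketch of the kind of loop body described above. It assumes a hypothetical SAXPY-style operation, and the names `saxpyKernel`, `saxpyHost` and `runOnce` are illustrative, not the poster's. The sketch runs the host and device versions on the same input and checks that they agree. This is a cheap way to confirm that the first iteration is not silently failing (for example, because of a bad launch configuration). Timing is deliberately left out.

```cuda
#include <cassert>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical operation: the post does not say what the kernel computes,
// so an element-wise y[i] = a * x[i] + y[i] (SAXPY) stands in for it.
__global__ void saxpyKernel(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // guard: the last block may contain surplus threads
        y[i] = a * x[i] + y[i];
    }
}

// Host version performing exactly the same operation as the kernel.
void saxpyHost(int n, float a, const float* x, float* y) {
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

// One iteration of the benchmark loop: run both versions on the same input
// and return true only if the launch succeeded and all elements match.
bool runOnce(int n, int threadsPerBlock) {
    assert(n > 0 && threadsPerBlock > 0);
    const float a = 2.0f;
    std::vector<float> x(n), yHost(n), yGpu(n);
    for (int i = 0; i < n; ++i) {
        x[i] = static_cast<float>(i);
        yHost[i] = yGpu[i] = 1.0f;
    }

    saxpyHost(n, a, x.data(), yHost.data());

    float *dX = nullptr, *dY = nullptr;
    cudaMalloc(&dX, n * sizeof(float));
    cudaMalloc(&dY, n * sizeof(float));
    cudaMemcpy(dX, x.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dY, yGpu.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;  // round up
    saxpyKernel<<<blocks, threadsPerBlock>>>(n, a, dX, dY);
    cudaError_t err = cudaGetLastError();  // catches launch-configuration errors

    // This blocking cudaMemcpy waits for the kernel to finish before copying.
    cudaMemcpy(yGpu.data(), dY, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dX);
    cudaFree(dY);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < n; ++i) {
        if (std::fabs(yGpu[i] - yHost[i]) > 1e-5f) return false;
    }
    return true;
}
```

Calling `runOnce(1000, 512)` mirrors the N and block size used in the posted runs.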

The objective is to measure how long each function (host and device) takes. I have noticed that the first execution of the kernel (the first loop iteration) is much faster than the others. The results are below:

Type N: 1000
Type numThreadPerBlock (<= 512): 512

elapsedTimeCPU = 1.875000 milliseconds
elapsedTimeGPU = 0.088000 milliseconds
factor = 21.306818 (elapsedTimeCPU/elapsedTimeGPU)

Type N: 1000
Type numThreadPerBlock (<= 512): 512

elapsedTimeCPU = 1.848000 milliseconds
elapsedTimeGPU = 0.267000 milliseconds
factor = 6.921349 (elapsedTimeCPU/elapsedTimeGPU)

Type N: 1000
Type numThreadPerBlock (<= 512): 512

elapsedTimeCPU = 1.847000 milliseconds
elapsedTimeGPU = 0.268000 milliseconds
factor = 6.891791 (elapsedTimeCPU/elapsedTimeGPU)

Type N: 1000
Type numThreadPerBlock (<= 512): 512

elapsedTimeCPU = 1.847000 milliseconds
elapsedTimeGPU = 0.268000 milliseconds
factor = 6.891791 (elapsedTimeCPU/elapsedTimeGPU)

Type N: 1000
Type numThreadPerBlock (<= 512): 512

elapsedTimeCPU = 1.862000 milliseconds
elapsedTimeGPU = 0.269000 milliseconds
factor = 6.921933 (elapsedTimeCPU/elapsedTimeGPU)

Type N: 1000
Type numThreadPerBlock (<= 512): 512

elapsedTimeCPU = 1.850000 milliseconds
elapsedTimeGPU = 0.269000 milliseconds
factor = 6.877324 (elapsedTimeCPU/elapsedTimeGPU)

I have already checked for errors in the first execution, but found nothing. I am using the timer functions of the cutil library, and I call cutilSafeThreadSync() both before starting the timer and before stopping it.
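The cutil timers came with the old SDK samples rather than with the CUDA runtime itself. A different technique, not the one used in the post, is to time GPU work with CUDA events. The events are recorded into the same stream as the kernel, so the interval is measured on the device's own timeline. The sketch below assumes a hypothetical stand-in kernel (`scaleKernel`) and helper (`timeKernelWithEvents`); only the `cudaEvent*` calls are real runtime API.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical stand-in kernel (the post's kernel is not shown).
__global__ void scaleKernel(int n, float a, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] *= a;
}

// Time one launch with CUDA events. Both events are recorded into the
// default stream around the kernel, so the interval covers the kernel's
// execution on the device, not just the time taken to queue the launch.
float timeKernelWithEvents(int n, int threadsPerBlock) {
    assert(n > 0 && threadsPerBlock > 0);
    float* dY = nullptr;
    cudaMalloc(&dY, n * sizeof(float));
    cudaMemset(dY, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    cudaEventRecord(start, 0);               // enqueue the "start" marker
    scaleKernel<<<blocks, threadsPerBlock>>>(n, 2.0f, dY);
    cudaEventRecord(stop, 0);                // enqueue the "stop" marker
    cudaEventSynchronize(stop);              // block the host until "stop" is reached

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);  // elapsed milliseconds between events

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(dY);
    return ms;
}
```

Calling this once per loop iteration gives a per-launch GPU time that does not depend on when the host timer happens to be stopped.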

Has anyone else noticed this, or could someone try to reproduce it with a simple kernel?

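You didn’t post your code, but what you’re timing is most likely the QUEUEING speed, not the execution speed. Kernel launches are asynchronous: the launch call returns to the host as soon as the work has been queued, so a host timer stopped right after the launch only measures how long the queueing took.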

Call cudaThreadSynchronize() before your timer call so that the kernels have finished running before you stop the timer. (In later CUDA releases, cudaThreadSynchronize() is deprecated in favor of cudaDeviceSynchronize().)
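To see the queueing effect directly, the sketch below times the same launch with a host timer twice: once stopping the timer right after the launch returns, and once after waiting for the kernel. It assumes a hypothetical busy-work kernel (`busyKernel`) and helper (`timeLaunchMs`), uses `std::chrono` in place of the cutil timer, and uses `cudaDeviceSynchronize()`, which does the same job as the `cudaThreadSynchronize()` mentioned above.

```cuda
#include <cassert>
#include <chrono>
#include <cuda_runtime.h>

// Hypothetical busy-work kernel: each thread runs a dependent loop so the
// kernel takes long enough for the difference to be visible.
__global__ void busyKernel(int n, int iters, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = y[i];
        for (int k = 0; k < iters; ++k) v = v * 0.999f + 1.0f;
        y[i] = v;
    }
}

using Clock = std::chrono::steady_clock;

// Host-measured milliseconds for one launch.
// sync == false: the timer stops as soon as the launch call returns (queueing).
// sync == true:  cudaDeviceSynchronize() waits for the kernel first (execution).
double timeLaunchMs(bool sync, int n, int threadsPerBlock, int iters) {
    assert(n > 0 && threadsPerBlock > 0);
    float* dY = nullptr;
    cudaMalloc(&dY, n * sizeof(float));
    cudaMemset(dY, 0, n * sizeof(float));
    cudaDeviceSynchronize();  // make sure no earlier work is still running

    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    auto t0 = Clock::now();
    busyKernel<<<blocks, threadsPerBlock>>>(n, iters, dY);
    if (sync) cudaDeviceSynchronize();  // wait for the kernel to complete
    auto t1 = Clock::now();

    cudaDeviceSynchronize();  // unsynchronized case: finish before freeing memory
    cudaFree(dY);
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

For a kernel that does enough work, `timeLaunchMs(false, ...)` should stay small while `timeLaunchMs(true, ...)` grows with `iters`. The exact numbers depend on the GPU and driver.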