Timing the Kernel

BHC · January 14, 2010, 4:17pm

I have a timer which is producing output that is not what I expect. The code I am using goes like this. Basically, I create a timer, run the kernel, output the timer value, copy the results back to the host, and output the timer value again.

[codebox]//start a timer
printf("\nStarting Kernel Now...\n");
cutilCheckError(cutCreateTimer(&timer));
cutilCheckError(cutStartTimer(timer));

//run the kernel
invokeKernel<<<grid, threads>>>( …parameters… );

//output the amount of time elapsed so far
printf("\nKernel Done. Execution time: %f (ms)\n", cutGetTimerValue(timer));

// Copy the output back to main memory
printf("\nGetting Output from GPU.\n");
cutilSafeCall(cudaMemcpy(…, …, …, cudaMemcpyDeviceToHost));

//output the amount of time elapsed so far
cutilCheckError(cutStopTimer(timer));
printf("\nTransfer done. Total time: %f (ms)\n", cutGetTimerValue(timer));[/codebox]

In my application, the kernel is slow. I expect it to take about 30 seconds to complete. The output data is small (2 MB). However, when I run my application, the first timer output is something like 0.05 ms, and the second output is about 30 seconds.

The output of the application is correct, so there is no way my kernel is running in 0.05 ms. Likewise, there is no way it should take 30 seconds to transfer 2 MB of data from the GPU to the host. That leads me to believe I am using the timers incorrectly.

Does the invokeKernel call return immediately after launching the kernel, or does it wait for all threads to complete? Any advice in troubleshooting this would be much appreciated.

Thanks,

Bill

avidday · January 14, 2010, 4:40pm

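Kernel launches are asynchronous, so the 0.05 ms you measure after invokeKernel is only the time to queue the kernel, not the time to execute it. The 30 seconds shows up in the second reading because the cudaMemcpy has to wait for the kernel to finish before it can copy the results. Add a call to cudaThreadSynchronize right after the launch to make the host wait until the kernel finishes execution before you read the timer. That will correct your timing.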

RoBiK · January 14, 2010, 4:41pm

The kernel call only queues the kernel for execution; the kernel itself runs on the device asynchronously.

You need to insert a cudaThreadSynchronize() call right after the kernel call. This will cause your program to wait for the kernel to finish.
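For anyone finding this later, here is a minimal sketch of the fix described above. It is not the original poster's code: `slowKernel` is a hypothetical stand-in for `invokeKernel`, the sizes and iteration count are made up, and `std::chrono` replaces the SDK's cutil timers so the example has no SDK dependency. It also calls `cudaDeviceSynchronize()`, which in later CUDA releases superseded the now-deprecated `cudaThreadSynchronize()`; on an old toolkit the older call should work the same way here.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the thread's invokeKernel: deliberately slow busy work.
__global__ void slowKernel(float *out, int n, int iters)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = 0.0f;
        for (int k = 0; k < iters; ++k)
            x = x * 0.999f + 1.0f;
        out[i] = x;
    }
}

// Milliseconds elapsed on the host clock since t0.
static double elapsedMs(std::chrono::steady_clock::time_point t0)
{
    return std::chrono::duration<double, std::milli>(
               std::chrono::steady_clock::now() - t0).count();
}

int main()
{
    const int n = 1 << 19;                  // 512K floats = 2 MB (illustrative size)
    const size_t bytes = n * sizeof(float);
    float *h_out = new float[n];
    float *d_out = nullptr;

    // Allocate before starting the timer so one-time setup cost is not measured.
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    auto t0 = std::chrono::steady_clock::now();

    // The launch returns as soon as the kernel is queued.
    slowKernel<<<(n + 255) / 256, 256>>>(d_out, n, 100000);
    double queuedMs = elapsedMs(t0);        // launch overhead only

    // Block the host until the kernel has actually finished.
    cudaError_t err = cudaDeviceSynchronize();
    double kernelMs = elapsedMs(t0);        // now includes kernel execution
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // The copy is a blocking call, so no extra synchronization is needed after it.
    err = cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
    double totalMs = elapsedMs(t0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemcpy failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("Launch returned after: %f (ms)\n", queuedMs);
    printf("Kernel done after:     %f (ms)\n", kernelMs);
    printf("Transfer done after:   %f (ms)\n", totalMs);

    cudaFree(d_out);
    delete[] h_out;
    return 0;
}
```

With the synchronize call in place, the second reading should account for nearly all of the time, and the gap between the second and third readings should be just the device-to-host transfer. Without it, the kernel's run time is silently charged to the `cudaMemcpy`, which is what the original timings showed.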

BHC · January 15, 2010, 2:23am

Thank you to both of you. Works perfectly.
