Computation Time Discrepancy


We are using the CUDA implementation of a Support Vector Machine (SVM) from [...]. However, we have an issue with computation time:

Within the CUDA code, we measure the time with cudaEventElapsedTime(elapsed, start, stop). We get about 2 ms, which is pretty fast. However, if we measure the execution time of this CUDA function in the surrounding C code with [...], we get about 200 ms. Hence we have a time discrepancy of almost 200 ms while measuring “the same thing”.

We would appreciate any help or hints!
Thanks in advance,


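Kernel launches are asynchronous: the launch call returns to the host before the kernel has finished. CUDA events time the work on the GPU itself, so they give the most reliable kernel time. A host-side timer only tells you something about the kernel if you synchronize before reading it, and it also counts everything else the surrounding function does.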
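To see the asynchronous launch for yourself, here is a minimal sketch that reads a host clock right after the launch and again after synchronizing. The kernel busyKernel, the helper measureLaunch, and the problem size are made up for illustration and are not from the SVM code; error checking is left out to keep it short.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real kernel: some arithmetic per element.
__global__ void busyKernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = data[i];
        for (int k = 0; k < 1000; ++k) x = x * 1.0001f + 0.5f;
        data[i] = x;
    }
}

// Times one launch with a host clock, before and after synchronizing.
void measureLaunch(double *launchMs, double *totalMs) {
    using Clock = std::chrono::steady_clock;
    const int n = 1 << 20;  // assumed problem size
    float *d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));

    // Warm-up launch so one-time setup cost is not part of the measurement.
    busyKernel<<<(n + 255) / 256, 256>>>(d, n);
    cudaDeviceSynchronize();

    auto t0 = Clock::now();
    busyKernel<<<(n + 255) / 256, 256>>>(d, n);
    auto t1 = Clock::now();   // launch call has returned; kernel may still be running
    cudaDeviceSynchronize();  // block the host until the kernel has finished
    auto t2 = Clock::now();

    *launchMs = std::chrono::duration<double, std::milli>(t1 - t0).count();
    *totalMs  = std::chrono::duration<double, std::milli>(t2 - t0).count();
    cudaFree(d);
}

int main() {
    double launchMs = 0.0, totalMs = 0.0;
    measureLaunch(&launchMs, &totalMs);
    printf("launch only: %.3f ms, launch + wait: %.3f ms\n", launchMs, totalMs);
    return 0;
}
```

The first number only covers the cost of queuing the kernel. The second is the one to compare against an event-based measurement.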



To get a quick answer on which time is correct, just pop in a for loop and run the kernel 100 times. That’ll tell you quite plainly which time is more correct.
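A rough sketch of that loop test, timing the same 100 launches with CUDA events and with a host clock and reporting the average per launch. As before, busyKernel, compareTimers, and the sizes are placeholders I made up, not the original SVM code, and error checking is omitted.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real kernel.
__global__ void busyKernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = data[i];
        for (int k = 0; k < 1000; ++k) x = x * 1.0001f + 0.5f;
        data[i] = x;
    }
}

// Runs the kernel `iterations` times and reports the average time per launch
// as seen by CUDA events and by a host clock.
void compareTimers(int iterations, float *eventMsPerLaunch, double *hostMsPerLaunch) {
    using Clock = std::chrono::steady_clock;
    const int n = 1 << 20;  // assumed problem size
    float *d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Warm-up launch so one-time setup cost is not counted.
    busyKernel<<<(n + 255) / 256, 256>>>(d, n);
    cudaDeviceSynchronize();

    auto t0 = Clock::now();
    cudaEventRecord(start);
    for (int i = 0; i < iterations; ++i) {
        busyKernel<<<(n + 255) / 256, 256>>>(d, n);
    }
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // wait until all launches and the stop event are done
    auto t1 = Clock::now();

    float eventMs = 0.0f;
    cudaEventElapsedTime(&eventMs, start, stop);
    double hostMs = std::chrono::duration<double, std::milli>(t1 - t0).count();

    *eventMsPerLaunch = eventMs / iterations;
    *hostMsPerLaunch = hostMs / iterations;

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d);
}

int main() {
    float eventMs = 0.0f;
    double hostMs = 0.0;
    compareTimers(100, &eventMs, &hostMs);
    printf("per launch: events %.3f ms, host clock %.3f ms\n", eventMs, hostMs);
    return 0;
}
```

If a fixed one-time cost is hiding in the host measurement, averaging over many launches should shrink it, and the two per-launch numbers should move closer together.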

You could try inserting a cudaThreadSynchronize (cudaDeviceSynchronize in newer CUDA versions, where cudaThreadSynchronize is deprecated) between the kernel launch and the finish event. That will make sure the event doesn’t fire until after the kernel is done.
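Roughly what that placement could look like, as a sketch only. It uses cudaDeviceSynchronize, the non-deprecated replacement for cudaThreadSynchronize; busyKernel and kernelMsWithEvents are hypothetical names, and error checking is omitted.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real kernel.
__global__ void busyKernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = data[i];
        for (int k = 0; k < 1000; ++k) x = x * 1.0001f + 0.5f;
        data[i] = x;
    }
}

// Returns the event-measured time in milliseconds for one kernel launch,
// with an explicit synchronize between the launch and the stop event.
float kernelMsWithEvents() {
    const int n = 1 << 20;  // assumed problem size
    float *d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    busyKernel<<<(n + 255) / 256, 256>>>(d, n);
    cudaDeviceSynchronize();     // host waits here until the kernel is done
    cudaEventRecord(stop);       // recorded only after the kernel has finished
    cudaEventSynchronize(stop);  // make sure the stop event itself has completed

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d);
    return ms;
}

int main() {
    printf("kernel time (events): %.3f ms\n", kernelMsWithEvents());
    return 0;
}
```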