To measure the execution time of kernels, one can use cudaEventElapsedTime or the Performance Analysis feature of Nsight (on Windows). In my tests, however, the values returned by cudaEventElapsedTime fluctuate more than those reported by Nsight Performance Analysis.
Some people say the values from Performance Analysis are more accurate. If so, is there any way to improve the accuracy of cudaEventElapsedTime?
According to the official documentation (https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__EVENT.html#group__CUDART__EVENT_1g40159125411db92c835edb46a0989cd6),
cudaEventElapsedTime should be able to compute the elapsed time with a resolution of around 0.5 microseconds.
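
For reference, the event-based measurement follows roughly the pattern below. This is only a minimal sketch: the kernel `dummyKernel`, the problem size, and the number of runs are placeholders chosen for illustration, not the real code, and error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cfloat>
#include <cuda_runtime.h>

// Placeholder kernel (hypothetical); it stands in for the real kernel being timed.
__global__ void dummyKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] = data[i] * 2.0f + 1.0f;
    }
}

int main()
{
    const int n = 1 << 20;   // assumed problem size
    const int runs = 20;     // assumed number of timed launches
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    float *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    float minMs = FLT_MAX;
    float maxMs = 0.0f;

    for (int r = 0; r < runs; ++r) {
        // Record both events in the same (default) stream as the kernel.
        cudaEventRecord(start, 0);
        dummyKernel<<<blocks, threads>>>(d_data, n);
        cudaEventRecord(stop, 0);

        // Block the host until the stop event has actually completed.
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);  // elapsed time in milliseconds

        if (ms < minMs) minMs = ms;
        if (ms > maxMs) maxMs = ms;
        printf("run %2d: %.4f ms\n", r, ms);
    }

    // The spread between min and max is the fluctuation described above.
    printf("min = %.4f ms, max = %.4f ms\n", minMs, maxMs);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return 0;
}
```

Something like this can be built with `nvcc timing.cu -o timing` (the file name is arbitrary). The per-run values printed here are the ones that vary more than the kernel durations shown by Nsight.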