Different results with CUPTI and nvprof

I got different results with CUPTI and nvprof. I ran callback_timestamp.cu and compared its output with the results from nvprof. The kernel times differ greatly: the time reported by nvprof is much shorter. How do I convert the CUPTI result to microseconds? Is it divided by 1000?


GPU time: 76288
nvprof avg: 1.9840us (vceadd)

nvprof uses CUPTI to collect the profiling information, so both tools are expected to produce the same results. CUPTI reports time in nanoseconds, while nvprof can display results in other units such as usec, msec, or sec. Dividing a CUPTI value by 1000 therefore gives microseconds.
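The unit conversion asked about above can be sketched in a few lines of C++. This is only an illustration, assuming the value is a nanosecond count such as the 76288 from the question. The helper names `ns_to_us`, `ns_to_ms`, and `format_us` are hypothetical and are not part of CUPTI or nvprof.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helpers (not part of CUPTI): convert a nanosecond count,
// the unit CUPTI reports, into the larger units nvprof may display.
constexpr double ns_to_us(std::uint64_t ns) {
    return static_cast<double>(ns) / 1e3;  // 1 us = 1000 ns
}

constexpr double ns_to_ms(std::uint64_t ns) {
    return static_cast<double>(ns) / 1e6;  // 1 ms = 1,000,000 ns
}

// Format a nanosecond count as microseconds with three decimals.
inline std::string format_us(std::uint64_t ns) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.3f us", ns_to_us(ns));
    return std::string(buf);
}
```

With these helpers, `format_us(76288)` gives "76.288 us". That is still far from the 1.9840us nvprof average, so the unit conversion alone does not account for the gap in the question.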

The CUPTI sample callback_timestamp doesn’t measure the kernel execution time on the GPU. It shows how to collect a CUDA API trace using CUPTI callbacks. The CUDA API time for a kernel launch is different from the kernel execution time on the GPU: the API timing is collected on the CPU/host, while the kernel execution time is collected on the GPU. Does that help explain the difference in the timings you observed?

I understand now. Thank you for your guidance.