How to measure time in a CUDA kernel? [CUDA 4.0]

Hi,

I am using CUDA 4.0 and I need to accurately measure a time difference inside a CUDA kernel.

Currently I am using the clock() function, but I have found that it is not accurate enough.

How can I measure time inside kernels accurately?

Note that the question is not about measuring time from the main() function, but from inside the kernel.

Thank you.

What do you mean it is not accurate enough?

Your best bet, if you are trying to profile your kernels, depends on the environment you are using. For example, on Windows with Visual Studio you can use the Nsight tool (https://developer.nvidia.com/nvidia-nsight-visual-studio-edition) to get very accurate kernel timings, as well as useful profiling information such as GFLOPS, memory accesses and other metrics for finding performance bottlenecks.
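
If you really do need timestamps taken inside the kernel itself, here is a minimal sketch using clock64(), the 64-bit variant of clock(), which as far as I know needs a compute capability 2.0 or newer device. The names timed_kernel and measure_kernel_cycles and the dummy sqrtf loop are made up for illustration; they are not from your code. The counter is per multiprocessor and counts clock cycles, so only compare values read by the same thread or block.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: times a dummy loop with the per-multiprocessor cycle counter.
__global__ void timed_kernel(float *out, long long *cycles, int iters)
{
    long long start = clock64();   // cycle counter before the work

    float acc = 0.0f;
    for (int i = 0; i < iters; ++i)
        acc += sqrtf((float)i + threadIdx.x);

    long long stop = clock64();    // cycle counter after the work

    // Storing acc keeps the compiler from removing the loop entirely.
    // The compiler may still move instructions around the clock reads,
    // so treat the result as an estimate.
    out[blockIdx.x * blockDim.x + threadIdx.x] = acc;
    if (threadIdx.x == 0)
        cycles[blockIdx.x] = stop - start;
}

// Hypothetical host helper: returns the cycle count seen by block 0,
// or -1 if any CUDA call failed.
long long measure_kernel_cycles(int iters)
{
    const int threads = 128;
    const int blocks = 4;
    float *d_out = NULL;
    long long *d_cycles = NULL;
    long long h_cycles[blocks] = {0};
    long long result = -1;

    if (cudaMalloc((void **)&d_out, blocks * threads * sizeof(float)) == cudaSuccess &&
        cudaMalloc((void **)&d_cycles, blocks * sizeof(long long)) == cudaSuccess) {
        timed_kernel<<<blocks, threads>>>(d_out, d_cycles, iters);
        // A blocking cudaMemcpy waits for the kernel to finish first.
        if (cudaMemcpy(h_cycles, d_cycles, sizeof(h_cycles),
                       cudaMemcpyDeviceToHost) == cudaSuccess)
            result = h_cycles[0];
    }
    cudaFree(d_out);      // cudaFree(NULL) is a harmless no-op
    cudaFree(d_cycles);
    return result;
}
```

Call measure_kernel_cycles(1000) from main() and print the result. The value is in clock cycles, not seconds; to convert it you would divide by the multiprocessor clock rate (reported in kHz by cudaGetDeviceProperties in the toolkits I have used). Because the measurement includes scheduling effects from other warps on the same multiprocessor, expect some variation from run to run.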