More information from the profiler: is there a way to get it?


The CUDA profiler is very helpful, but is there a way to get more detailed information from it? It shows function names, the GPU time spent in each, the number of loads and stores, etc. But is there a way to find out how much time was spent in particular parts of a function, or in subroutines called from it? Otherwise it is mostly guesswork which parts of the code inside the function actually need to be optimized.

Thanks in advance.

I don’t believe you can get anything finer-grained out of the profiler itself. One option is to call clock() inside your kernel (see the clock example in the SDK) to time the different sections of your function.
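
To make that concrete, here is a rough sketch of the idea, not the SDK sample itself. The kernel name, the two "sections" and the sectionCycles buffer are made up for illustration, and I used clock64() rather than clock() only to avoid counter wraparound. Treat the numbers as rough: they are per-SM cycle counts, they include time during which other warps were scheduled, and the compiler may move instructions across the clock reads (for example, the stall from the load in section A can end up being counted in section B, where the value is first used).

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel with two sections that we want to time separately.
__global__ void timedKernel(const float *in, float *out, int n,
                            long long *sectionCycles)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    long long t0 = clock64();

    // Section A: load and scale.
    float v = (i < n) ? in[i] * 2.0f : 0.0f;

    long long t1 = clock64();

    // Section B: some arithmetic.
    for (int k = 0; k < 64; ++k)
        v = v * 1.0001f + 0.5f;

    long long t2 = clock64();

    if (i < n) out[i] = v;

    // Thread 0 of each block records one sample per section.
    if (threadIdx.x == 0) {
        sectionCycles[2 * blockIdx.x]     = t1 - t0;
        sectionCycles[2 * blockIdx.x + 1] = t2 - t1;
    }
}

int main()
{
    const int n = 1 << 16, threads = 256;
    const int blocks = (n + threads - 1) / threads;

    std::vector<float> hIn(n, 1.0f);
    std::vector<long long> hCycles(2 * blocks);

    float *dIn, *dOut;
    long long *dCycles;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dOut, n * sizeof(float));
    cudaMalloc(&dCycles, 2 * blocks * sizeof(long long));
    cudaMemcpy(dIn, hIn.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    timedKernel<<<blocks, threads>>>(dIn, dOut, n, dCycles);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaMemcpy(hCycles.data(), dCycles, 2 * blocks * sizeof(long long),
               cudaMemcpyDeviceToHost);

    // Average the per-block samples for each section.
    double a = 0.0, b = 0.0;
    for (int k = 0; k < blocks; ++k) {
        a += hCycles[2 * k];
        b += hCycles[2 * k + 1];
    }
    printf("section A: %.1f cycles (avg over %d blocks)\n", a / blocks, blocks);
    printf("section B: %.1f cycles (avg over %d blocks)\n", b / blocks, blocks);

    cudaFree(dIn);
    cudaFree(dOut);
    cudaFree(dCycles);
    return 0;
}
```

Build with something like nvcc timed.cu -o timed. Compare the sections relative to each other rather than reading the cycle counts as absolute times, and remove the timing code again once you know where the time goes, since the extra stores and clock reads perturb the kernel.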