Is there a way to view time spent in kernel per line of code?

I was able to profile my kernels and use the lineinfo to correlate the source with the profile, which gave me information about registers, etc. However, I would like per-line timing information, i.e. how much time is spent on a statement such as x = x^2. I haven’t found any information on this. Does anyone know of a way?

There is no notion of “time” from a source line perspective in Nsight Compute. On a highly parallel GPU it is difficult to derive any useful per-line number for this. The closest thing is the “Warp Stall Sampling (All Cycles)” metric on the Source page, but it only represents the proportion of all samples that hit a given line: the longer warps stall on a line, the more likely it is that a sample lands there. So you can use it as a reasonable indicator of “hotspots”, but it isn’t directly related to time.
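
To make the "proportion of samples" reading concrete, here is a minimal Python sketch. It is not part of Nsight Compute and not an Nsight Compute API. The source lines and sample counts are made up for illustration. In practice you would read the counts off the sampling column on the Source page. The helper name `sample_shares` is hypothetical.

```python
# Hypothetical per-line sample counts, standing in for values read from the
# "Warp Stall Sampling (All Cycles)" column. These numbers are invented.
samples_per_line = {
    "x = x * x;": 120,
    "y = a[idx];": 660,
    "out[idx] = x + y;": 220,
}


def sample_shares(samples):
    """Return each line's share of all samples (a proportion, not a time)."""
    total = sum(samples.values())
    if total == 0:
        # No samples collected: every share is zero.
        return {line: 0.0 for line in samples}
    return {line: count / total for line, count in samples.items()}


shares = sample_shares(samples_per_line)

# List lines from the largest share to the smallest to surface hotspots.
for line, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{share:6.1%}  {line}")
```

A line with a large share is a likely hotspot, but the share says nothing about how many nanoseconds that line took, only how often warps were sampled there.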


Thanks for your insight, I will use the stalls to try to figure it out.
