Hey guys
How does one go about profiling the code executed inside of a kernel?
I have a single kernel function that is launched, which in turn calls various static __device__ functions.
Trouble is, the NVIDIA profiler only reports stats for the entire kernel launch. What I want is the GPU time spent between calls inside the kernel code.
Maybe I can use clock() to help me somehow? I’m not sure how though…
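
To make the question concrete, here is a rough sketch of what I imagine the clock() idea would look like. It uses clock64(), the 64-bit variant of the device-side clock(), to read the per-SM cycle counter before and after each call. The names stepA, stepB, timedKernel and the cycles buffer are made up for illustration and are not my real code. I'm also assuming the compiler keeps the work between the clock reads, and that the reported clock rate is close enough to the actual SM clock for a rough conversion to time.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical device functions standing in for the real ones.
static __device__ float stepA(float x) {
    for (int i = 0; i < 100; ++i) x = x * 1.0001f + 0.5f;
    return x;
}

static __device__ float stepB(float x) {
    for (int i = 0; i < 1000; ++i) x = x * 0.9999f + 0.25f;
    return x;
}

__global__ void timedKernel(float *out, long long *cycles) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    float v = (float)tid;

    long long t0 = clock64();   // cycle counter before stepA
    v = stepA(v);
    long long t1 = clock64();   // between the two calls
    v = stepB(v);
    long long t2 = clock64();   // after stepB

    out[tid] = v;               // store the result so the work is not optimized away

    // Record one thread's view only; each thread and each SM has its own timing.
    if (tid == 0) {
        cycles[0] = t1 - t0;
        cycles[1] = t2 - t1;
    }
}

int main() {
    const int threads = 32;
    float *dOut = nullptr;
    long long *dCycles = nullptr;
    cudaMalloc(&dOut, threads * sizeof(float));
    cudaMalloc(&dCycles, 2 * sizeof(long long));

    timedKernel<<<1, threads>>>(dOut, dCycles);

    long long hCycles[2] = {0, 0};
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(hCycles, dCycles, sizeof(hCycles),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Peak clock rate in kHz; the actual SM clock may differ, so times are approximate.
    int clockKHz = 0;
    cudaDeviceGetAttribute(&clockKHz, cudaDevAttrClockRate, 0);

    printf("stepA: %lld cycles (~%.3f us)\n", hCycles[0],
           hCycles[0] * 1000.0 / clockKHz);
    printf("stepB: %lld cycles (~%.3f us)\n", hCycles[1],
           hCycles[1] * 1000.0 / clockKHz);

    cudaFree(dOut);
    cudaFree(dCycles);
    return 0;
}
```

As far as I can tell, this would only give cycle counts as seen by a single thread, not a total across the whole grid, and the extra clock reads and stores would themselves perturb the timing a bit.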