I would like to analyze a CUDA kernel with Compute Visual Profiler (3.2). The profiler seems to trace only down to individual kernel functions and no deeper. My kernel calls several device functions. Can the profiler trace those device functions individually? If so, how? FYI, I have turned on all Profiler Triggers, enabled CUDA API trace, and selected all the available events.
Thanks so much for the help.