Trace __device__ functions in Visual Profiler

Hi All,
I would like to analyze a CUDA kernel with Compute Visual Profiler (3.2). The profiler only traces down to individual kernel functions and no deeper. My kernel calls several __device__ functions. Can the profiler trace them? If so, how? FYI, I have turned on all Profiler Triggers, enabled CUDA API trace, and selected all the available events.
Thanks so much for the help.

Best,
8gpus

Hello,
Could anyone from the NVIDIA Visual Profiler group comment on this, please? I would like to confirm whether this is a current limitation of the profiler.
Thanks a lot.
Best,
8gpus

I think it’s not possible to do this, because __device__ functions are usually inlined (unless __noinline__ is specified), and I doubt Visual Profiler can record individual function calls…
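
If the profiler really does stop at kernel granularity, one possible workaround is to time the call by hand from inside the kernel. The following is only a rough sketch, not something the profiler provides: scale_and_add, timed_kernel, and the cycles buffer are hypothetical names made up for illustration, and it assumes a device that supports clock64() (compute capability 2.0 or later). The __noinline__ qualifier is a request that the compiler keep the function as a real call. The cycle counts are per thread and approximate, since the compiler and the scheduler can still move work around the two clock reads.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical device function standing in for the real work.
// __noinline__ asks the compiler not to inline it into the kernel.
__device__ __noinline__ float scale_and_add(float x, float a, float b)
{
    return a * x + b;
}

// Hypothetical kernel: each thread records the cycles spent around one call.
__global__ void timed_kernel(const float *in, float *out,
                             long long *cycles, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float x = in[i];                      // load before starting the timer

    long long start = clock64();          // per-SM cycle counter
    float r = scale_and_add(x, 2.0f, 1.0f);
    long long stop = clock64();

    out[i] = r;
    cycles[i] = stop - start;             // approximate cost seen by this thread
}

int main()
{
    const int n = 256;
    float h_in[n], h_out[n];
    long long h_cycles[n];
    for (int i = 0; i < n; ++i) h_in[i] = (float)i;

    float *d_in = NULL, *d_out = NULL;
    long long *d_cycles = NULL;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMalloc(&d_cycles, n * sizeof(long long));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

    timed_kernel<<<(n + 127) / 128, 128>>>(d_in, d_out, d_cycles, n);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(h_cycles, d_cycles, n * sizeof(long long),
               cudaMemcpyDeviceToHost);

    // Average the per-thread cycle counts on the host.
    long long total = 0;
    for (int i = 0; i < n; ++i) total += h_cycles[i];
    printf("out[3] = %f, mean cycles per call = %lld\n",
           h_out[3], total / n);

    cudaFree(d_in);
    cudaFree(d_out);
    cudaFree(d_cycles);
    return 0;
}
```

Note that forcing __noinline__ changes the generated code, so the numbers describe the non-inlined version and may differ from what the normally inlined function costs.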