Load .so Files with the Nsight Systems Command Line

In the Nsight Systems GUI, there is an option to collect call stacks of executing threads. As I understand it, enabling that option traces child processes and child threads as a tree from the start of the program. Within this option, we can load .so files so that CUDA kernels called from those .so files are included in the profile. Otherwise, kernel execution information from inside a .so file is not shown in the report.

Now that we are profiling inside a container, my understanding is that the GUI profiler is not usable and that we should use the Nsight Systems command line instead. Is there a similar load-.so option for the command line, so that we can still see kernel execution durations?

The CLI options are listed in the Nsight Systems User Guide (Nsight Systems Documentation).

By default, the CLI traces the entire process tree.

May I ask what you are seeing?

(You can use the GUI inside a container, but it is much simpler to just use the CLI.)
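For reference, a CLI run of the kind described here might be assembled as in the sketch below. It is only an illustration, not an official recipe: the application path `./my_app`, its arguments, the output name `report`, and the helper `build_nsys_command` are hypothetical placeholders. The `--trace` and `--output` options of `nsys profile` are described in the User Guide; check it for the options your version supports.

```python
import shlex


def build_nsys_command(app_argv, output="report", traces=("cuda", "nvtx", "osrt")):
    """Return the argv list for an `nsys profile` run of `app_argv`.

    `app_argv` is the target application and its arguments (hypothetical here).
    No .so paths are passed: per this thread, CUDA tracing covers the whole
    process tree by default, including kernels launched from shared libraries.
    """
    return [
        "nsys", "profile",
        "--trace=" + ",".join(traces),  # APIs to trace
        "--output=" + output,           # base name of the generated report file
        *app_argv,
    ]


if __name__ == "__main__":
    # Placeholder target; replace with the real application inside the container.
    cmd = build_nsys_command(["./my_app", "--iters", "10"])
    # Print the command instead of running it, so the sketch works without nsys installed.
    print(shlex.join(cmd))
```

To actually launch the profile, the same list can be passed to `subprocess.run(cmd, check=True)` inside the container, and the resulting report can then be opened in the GUI on another machine.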

@skottapalli can you take a look at this?

Hi Ziqi,

The CLI’s default behavior is to trace APIs for all processes in the process tree. Are you not seeing the CUDA kernels from your .so when you use the CLI?

I am seeing the kernels when profiling from the CLI this time.

Just curious: in the GUI, .so locations have to be provided explicitly to show kernel behavior within the .so files. However, from what I was told, the CLI does not need .so locations to track kernels within the .so files. Does that mean the CLI can find the .so files automatically as the program executes and resolve kernel symbols from them?

Can you provide a screenshot of the GUI setting where you are explicitly providing .so locations to get kernel behavior to show up in the profile? This would clarify what you mean.

AFAIK there should be no difference between the CLI and the GUI when it comes to tracing CUDA kernels. Adding @atrachenko for help with the GUI question (whether library .so locations have to be explicitly provided to capture kernels within the .so files).

Please see the attached image for reference. There is a button named “Symbol locations”; this is what I meant. Unless the locations of the .so files are provided using this button, kernels inside the .so files are not tracked.

The symbol locations are used to resolve symbols in the call stacks. This is useful when stripped libraries are used on the target. Are you missing CUDA kernels in the profile when you don’t specify the symbol locations? That would be odd and may be a bug.

Yes. I just recalled that a while ago I could not see kernel execution information and did not know why, so I discussed it with another engineer. We tried things blindly, and only after the symbol locations were provided did kernel information appear in the profiler. Is this a bug?

Sounds like a bug. Please share the report files captured with and without the symbol locations so that we can confirm that the missing symbol locations are actually causing the missing kernel information. Would it also be possible for you to share a repro so that we can debug the problem on our end?