See CUDA stream data in Nsight Systems or Nsight Compute

I have a program running on a Jetson NX with JetPack 4.6.0.
I want to profile it and see how the CUDA streams, kernels, etc. are performing. The Nsight Systems version installed via SDK Manager was able to profile the program, but it does not provide detailed information about GPU usage or the individual CUDA kernels.
The Nsight Compute package I installed from here does not show an option to target aarch64 devices.
If you could explain how to view detailed information about GPU and CUDA usage, that would be really helpful. Thanks!

Attaching the output I get with Nsight Systems. I ran the program from a terminal on the NX and provided its PID to Nsight Systems on the host. I had also set the CUDA_INJECTION64_PATH variable in that terminal before launching the application (i.e. the variable was set on the target device), but I get a warning saying that the variable is not set.

Just to add, I went through this post and all the links shared in it.
I probably overlooked something, but if you could guide me in setting up my environment properly, that would be greatly appreciated.

Nsight Systems will provide the required information you are looking for.
Please refer to the CUDA Trace section in the User Guide :: Nsight Systems Documentation.
It will be better to move this post to the Nsight Systems category.

@Andrey_Trachenko to refer for assistance.

Thank you for speaking up about the difficulties that you faced.

Unfortunately this attach scenario is not supported anymore. It was a mistake to keep it in the docs, so I filed a bug internally to remove the information (the section that references the CUDA_INJECTION64_PATH variable). I’m sorry that this information got you confused. Unfortunately, this mechanism was too cumbersome to support, and we had to get rid of it in favor of the CLI on Jetson.

There are two ways to profile applications on Jetson:

  • Remotely, via an SSH connection from a Linux host. Run the Nsight Systems GUI on your Linux desktop, connect to the Jetson, set up the configuration in the UI, then click Start.
  • Locally on the Jetson, using the CLI. The basic command is nsys profile ./myApp. I recommend going this way.
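
As a minimal sketch of the local CLI route, the wrapper below requests CUDA tracing explicitly and names the output report. It assumes nsys is already on the PATH; the function name profile_cuda, the report name, and ./myApp are placeholders, and the set of trace options accepted may differ between Nsight Systems versions.

```shell
#!/bin/sh
# Hypothetical helper: profile_cuda REPORT_NAME COMMAND [ARGS...]
# Runs the given command under nsys with CUDA, NVTX and OS runtime tracing.
profile_cuda() {
    report=$1
    shift
    # --trace selects what is collected; --output names the report file.
    nsys profile --trace=cuda,nvtx,osrt --output="$report" "$@"
}

# Example (placeholder names):
#   profile_cuda my_report ./myApp
```

The resulting report file can then be copied to the host and opened in the Nsight Systems GUI.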

There are two ways the CLI can be installed on your Jetson:

  • Select it during the JetPack installation. In this case the nsight-systems-cli package is installed on the Jetson, and the nsys CLI is automatically available in the shell.
  • Connect to the Jetson once from the GUI. In this case the CLI is installed into /opt/nvidia/nsight_systems/nsys.
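
To check which of the two installations is present, something like the following could be used. It is only a sketch: the function name find_nsys is made up for illustration, and it assumes that /opt/nvidia/nsight_systems/nsys is the executable itself rather than a directory.

```shell
#!/bin/sh
# Hypothetical helper: print the path of the nsys binary, or fail if none is found.
find_nsys() {
    if command -v nsys >/dev/null 2>&1; then
        # Installed via the nsight-systems-cli package, so it is on PATH.
        command -v nsys
    elif [ -f /opt/nvidia/nsight_systems/nsys ] && [ -x /opt/nvidia/nsight_systems/nsys ]; then
        # Assumed location of the copy deployed by the GUI on first connection.
        echo /opt/nvidia/nsight_systems/nsys
    else
        echo "nsys not found" >&2
        return 1
    fi
}

# Example:
#   NSYS=$(find_nsys) && "$NSYS" profile ./myApp
```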

In one of the next releases of Nsight Systems for Jetson, there will also be the full GUI available to run on Jetson itself.

Hope this information helps you!

Thanks for replying!
I want to use the first method, i.e. connect remotely via an SSH connection. As can be seen from the attached image, I’m able to do some basic profiling, but the detailed information about the kernels and CUDA streams is not appearing below the GPU section.

If you could help with getting the timeline information for the various CUDA streams, that would be really helpful.
PS: where should I check the meaning of different colours we see in the nsight-systems gui view?

The CUDA information is collected per process, so if it is present in the report file, please expand the “Processes (1)” row to find it.

We could be more specific if you could let us see your report file Report 6.nsys-rep. Please note that the report file contains information about the profiled processes, as well as the environment where the profiling happened, so only share the report file if you are comfortable that others will see this information.
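
One way to check from a terminal whether any CUDA data made it into a report is to print its summary statistics. This is a sketch that assumes the installed nsys version provides the stats subcommand and can read the report's format; summarize_report is a placeholder name.

```shell
#!/bin/sh
# Hypothetical helper: summarize_report REPORT_FILE
# Prints the default summary tables for a report. If CUDA tracing worked,
# these typically include CUDA API and GPU kernel summaries.
summarize_report() {
    nsys stats "$1"
}

# Example (placeholder file name):
#   summarize_report my_report.nsys-rep
```

If the summaries contain no CUDA tables at all, the trace most likely did not capture CUDA activity, and the timeline will have no CUDA rows to expand.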

where should I check the meaning of different colours we see in the nsight-systems gui view?

There is currently no unified guide on the colors used. In many places you can infer the meaning of the color by looking at the tooltip:

image