Nsight Systems not capturing CUDA functions on Jetson Nano


I’m running JetPack 4.6 on my Jetson Nano dev kit and Nsight Systems 2021.5.4.19-e642d4b (Linux) on a remote Ubuntu 18.04 machine. I’m trying to profile a Python app based on the jetson-utils video-viewer example. I can do this successfully on the Nano itself using nvprof, but when I profile it remotely with Nsight Systems, I only get generic ‘runXYZ’ information from the CUDA cores, whereas nvprof shows me all the individual CUDA functions called, memory copies, etc.

PS: Apologies for the quality of the nvprof screenshot, it was done over VNC.

Thank you!

@Andrey_Trachenko can you have someone answer this one?

Hello, any news on this problem please? Thank you!

Hello @alexx88, I’m sorry you’re facing this issue, and thank you for reporting it.

Nsight Systems 2021.5.4 is quite old, so it’s not easy to understand where the problem is. Based on the screenshot, I can see that CUDA API events were collected, but there are no CUDA GPU workload events. The GPU contexts row is useful for understanding when the GPU is switching contexts, e.g. when multiple processes are competing for time on a single GPU.

I have a couple of questions to ask:

  • Does your application finish correctly when profiled in Nsight Systems? If needed, you can see the captured stdout and stderr in the Files page of the report.
  • There are warning messages in this report - can you please review them and see if there is anything related to CUDA trace?
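
If the report contains many messages, a small filter can help with reviewing them. The following is only a sketch, assuming you have copied the warning messages (or the captured stderr) into a plain text file yourself. The function names and the keyword list are hypothetical and not part of Nsight Systems; CUPTI is included because it is the CUDA profiling interface that CUDA trace relies on.

```python
import re

# Keywords are a guess at what a CUDA-trace-related message might mention;
# adjust them to match the wording in your own report.
CUDA_PATTERN = re.compile(r"cuda|cupti|gpu", re.IGNORECASE)


def cuda_related(lines):
    """Return the non-empty lines that mention CUDA, CUPTI, or GPU."""
    hits = []
    for line in lines:
        text = line.strip()
        if text and CUDA_PATTERN.search(text):
            hits.append(text)
    return hits


def cuda_related_in_file(path):
    """Apply the same filter to a text file of copied messages."""
    with open(path, encoding="utf-8") as fh:
        return cuda_related(fh)
```

For example, `cuda_related_in_file("messages.txt")` (a hypothetical file name) returns only the lines worth reading first; anything it returns should still be checked against the full message in the report.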