Nsys cannot collect CUDA information on DRIVE OS 5.1

Hi @SivaRamaKrishnaNV, I created a VM for this and finally got everything installed on it, but nsight-sys 2019.3.4 cannot launch the daemon. I suspect the target system does not have the corresponding version installed, so maybe we also need to flash the target? But I am not sure whether flashing will erase my other files. I also need to use nsys under Docker.

DaemonStartError (1405) {
OriginalExceptionClass: N5boost16exception_detail10clone_implIN13QuadDAnalysis16DaemonStartErrorEEE
OriginalFile: /build/agent/work/20a3cfcd1c25021d/QuadD/Host/Analysis/SshDevice.cpp
OriginalLine: 1247
OriginalFunction: virtual std::__cxx11::string QuadDAnalysis::SshDevice::StartDaemon(const string&)
ErrorText: Daemon start failed.
ExitCode: 1
}

SDK Manager doesn't support VMs, but you already got it working.
Per Nsys cannot collect cuda information on Drive OS 5.1 - #6 by shangping.guo, your target system is already on the same version, so I don't think you need to reflash it.

Please describe in more detail how you hit the issue. Thanks.

@VickNV Thanks for the reply. My steps:

  • in the VM terminal, go to /opt/nvidia/nsightsystems/nsightsystems-2019.3.4/Host-x86_64
  • run: ./nsight-sys
  • create an SSH connection: user nvidia, host 10.160.66.213, port 22
  • following the instructions in the Nsight Systems User Guide (Nsight Systems Documentation), add ports 22, 45555 and 2222 to the TCP port list
  • connect to this device (pinging 10.160.66.213 from another terminal works, so the host can reach it)
  • enter the password
  • after a while, the error appears (I noticed that on the target machine all the .so files are gone except the nsys binary)
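A successful ping only shows ICMP reachability; it says nothing about whether a particular TCP port gets through the target's firewall. A minimal bash sketch for checking a TCP port from the host, assuming bash's `/dev/tcp` redirection and the coreutils `timeout` command are available; `tcp_port_reachable` is a hypothetical helper name, not part of Nsight Systems:

```bash
#!/usr/bin/env bash
# Hypothetical helper: returns 0 if a TCP connection to HOST:PORT succeeds
# within SECS seconds (default 3), non-zero otherwise.
# Relies on bash's built-in /dev/tcp pseudo-device and coreutils `timeout`.
tcp_port_reachable() {
  local host="$1" port="$2" secs="${3:-3}"
  # Opening fd 3 on /dev/tcp/HOST/PORT performs the TCP connect;
  # the child bash exits right after, which closes the connection.
  timeout "$secs" bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}
```

For example, `tcp_port_reachable 10.160.66.213 22 && echo "SSH port reachable"`. Keep in mind that a port only accepts connections while some process is listening on it, so a failed check on 45555 before the daemon has started does not by itself prove the firewall is blocking it.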

@SivaRamaKrishnaNV @VickNV @kayccc Hello, could you respond to my message? I really need this tool for profiling. Thank you for the help.

I couldn't reproduce the issue. Where on the target system did you see the error messages?
As I said, a VM host hasn't been verified, so please try with a non-VM host system.

@VickNV I don't think that is the problem, since I found all the files in another folder on the target machine. I also tried installing on a Linux desktop; the target components are not fully installed there, but I don't think that matters since we are not going to flash.
The problem is the same on the Linux desktop. Any ideas? Thanks

Where did you see the message? I still suggest trying again after installing (and flashing) successfully.

In bash, I run: /opt/nvidia/nsightsystems/nsightsystems-2019.3.4/Host-x86_64/nsight-sys
After connecting over SSH, it shows:
[screenshot]
Then I click "more info":
[screenshot]
Then I click "more info" again and get this message:


DaemonStartError (1405) {
OriginalExceptionClass: N5boost16exception_detail10clone_implIN13QuadDAnalysis16DaemonStartErrorEEE
OriginalFile: /build/agent/work/20a3cfcd1c25021d/QuadD/Host/Analysis/SshDevice.cpp
OriginalLine: 1247
OriginalFunction: virtual std::__cxx11::string QuadDAnalysis::SshDevice::StartDaemon(const string&)
ErrorText: Daemon start failed.
ExitCode: 1
}

Please check whether the necessary port is open by running "sudo firewall-cmd --list-ports" on your target system.
After adding the port, it's best to power cycle the target and check again before connecting.
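The port check suggested above can be scripted on the target. A minimal bash sketch, assuming the usual space-separated output of `firewall-cmd --list-ports` (e.g. `45555/tcp 22/tcp`); `port_is_listed` is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds if ENTRY (e.g. "45555/tcp") appears as a
# whole item in LIST, the space-separated output of `firewall-cmd --list-ports`.
port_is_listed() {
  local list="$1" entry="$2" item
  for item in $list; do            # word-split the list on whitespace
    [[ "$item" == "$entry" ]] && return 0
  done
  return 1                         # not found (or list was empty)
}
```

Usage on the target: `port_is_listed "$(sudo firewall-cmd --list-ports)" 45555/tcp && echo "45555/tcp is open"`. This sketch does exact string matching only, so port ranges such as `40000-50000/tcp` are not expanded.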

Thanks @VickNV, I forgot to do this on the desktop machine. I have now added 22/45555/2222 to the port list. I don't have access to the desktop right now; I will report back when it is available.

Hi @VickNV, sorry for the late reply. Using firewall-cmd disabled remote access to my desktop, which caused the delay. I have now tested nsight-sys on the desktop, and the problem is still the same.
[screenshot]

Did you try with the host system on which SDK Manager was run to install DRIVE Software 10, and with the target flashed with DRIVE Software 10? Did you run the firewall-cmd commands on the target system (rather than on the host system)?
Please share your detailed steps and commands so I can understand your setup. Thanks.

Hi @VickNV, I only ran firewall-cmd on my host machine. My steps:

  • use SDK Manager to install the host components (the target components are only partially installed on the host)
  • do not flash the target (the target is not changed at all)
  • add the allowed ports (on the host machine)
  • run nsight-sys 2019.3.4 (on the host machine)
  • connect to my target via port 22

You should follow the "Linux-Based Target Device" section of the documentation and run the commands on your target system.

@VickNV Thank you for the information. I added the ports on the target machine:

nvidia@pegasus2a:~$ sudo firewall-cmd --permanent --list-ports
45555/tcp 22/tcp 2222/tcp

However, the issue is the same. I once installed Nsight CLI 2021.1 on the target machine and I'm not sure whether that is the cause (I later removed it out of that concern).

nvidia@pegasus2a:~$ sudo apt list | grep nsight

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

cuda-nsight-compute-addon-l4t-10-2/unknown 10.2.19-1 all
insighttoolkit4-examples/bionic 4.12.2-dfsg1-1ubuntu1 all
libinsighttoolkit4-dev/bionic 4.12.2-dfsg1-1ubuntu1 arm64
libinsighttoolkit4.12/bionic 4.12.2-dfsg1-1ubuntu1 arm64
python-applicationinsights/bionic 0.11.0-1 all
python3-applicationinsights/bionic 0.11.0-1 all
nvidia@pegasus2a:~$ locate nsys
/home/nvidia/nsight_systems/nsys
/opt/nvidia/nsight_systems/nsys
/usr/lib/libnvwinsys.so

Note: /home/nvidia/nsight_systems is just a backup of the target files that I saved.
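One detail worth checking in the `sudo firewall-cmd --permanent --list-ports` output above: with `--permanent`, firewalld lists the saved configuration, not the rules currently in effect, and permanent changes only become active after `firewall-cmd --reload` or a firewalld restart. A minimal bash sketch that prints ports saved permanently but not yet active at runtime; `pending_ports` is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints each entry of PERMANENT (output of
# `firewall-cmd --permanent --list-ports`) that is missing from RUNTIME
# (output of `firewall-cmd --list-ports`), one entry per line.
pending_ports() {
  local permanent="$1" runtime="$2" p r found
  for p in $permanent; do
    found=0
    for r in $runtime; do
      if [[ "$p" == "$r" ]]; then found=1; break; fi
    done
    (( found )) || echo "$p"       # saved but not active yet
  done
}
```

Usage on the target: `pending_ports "$(sudo firewall-cmd --permanent --list-ports)" "$(sudo firewall-cmd --list-ports)"`. Any output means those ports are saved but not active; `sudo firewall-cmd --reload` would apply them.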

Here are the steps I tried last time, for your reference.

  • on the host system, install DRIVE Software 10 successfully (both host and target components).
  • after the target system boots up, ssh to it and run the commands below:
    sudo apt-get update
    sudo apt-get install firewalld
    sudo firewall-cmd --permanent --add-port 45555/tcp
    sudo reboot now
    sudo firewall-cmd --list-ports
  • on the host system, run the command below to launch Nsight Systems and connect to the target via SSH.
    /opt/nvidia/nsightsystems/nsightsystems-2019.3.4/Host-x86_64/nsight-sys

@VickNV Thanks, I followed your procedure, but now I've lost SSH access to the Pegasus and need help from IT again. I will let you know when it is ready.

I tried again and hit an issue similar to yours.
Please try the updated steps below:

  • on the host system, install DRIVE Software 10 successfully (both host and target components).
  • after the target system boots up, ssh to it and run the commands below:
    sudo apt-get update
    sudo apt-get install firewalld
    sudo firewall-cmd --permanent --add-port 45555/tcp
    sudo firewall-cmd --reload
    sudo firewall-cmd --list-ports
  • on the host system, run the command below to launch Nsight Systems and connect to the target via SSH.
    /opt/nvidia/nsightsystems/nsightsystems-2019.3.4/Host-x86_64/nsight-sys
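The updated target-side steps can be wrapped in a single function. A minimal bash sketch, assuming an Ubuntu-based target with `apt-get` and `sudo` as in the steps above; `open_nsys_daemon_port` is a hypothetical name, `-y` is added so apt-get does not prompt, and the port defaults to 45555/tcp:

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the updated target-side steps:
# install firewalld, permanently open the port, apply it to the running
# firewall with --reload (no reboot), then print the active port list.
# Run on the target, e.g. over ssh. PORT defaults to 45555/tcp.
open_nsys_daemon_port() {
  local port="${1:-45555/tcp}"
  sudo apt-get update || return 1
  sudo apt-get install -y firewalld || return 1
  sudo firewall-cmd --permanent --add-port="$port" || return 1
  sudo firewall-cmd --reload || return 1
  sudo firewall-cmd --list-ports
}
```

Run it on the target as `open_nsys_daemon_port` (or, for example, `open_nsys_daemon_port 2222/tcp`). Each command returns early on failure, so later steps are skipped if an earlier one fails.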

@VickNV Finally we have some progress. It now connects but shows this error message:

[screenshot]

on host:
shangping.guo@sm-dub-3e3640:~$ sudo firewall-cmd --list-ports --permanent
45555/tcp 22/tcp 2222/tcp 443/tcp 4172/tcp 60443/tcp 4172/udp

on target:
nvidia@pegasus2a:~$ sudo firewall-cmd --list-ports
45555/tcp 22/tcp 2222/tcp

Did you follow my steps? On my target, the command outputs only 45555/tcp:

$ sudo firewall-cmd --list-ports
45555/tcp