I am using CUDA Toolkit 12.1 and Nsight Systems 2023.1.1 (Nsight Systems is installed in my user directory via the “.run” installer). I am trying to profile a local target GPU on an Ubuntu 22.04 server that has a GUI desktop installed. There are two NVIDIA RTX A5500 GPUs in the system.
When I run nsys-ui and select the localhost target, I get an error “Failed to launch daemon”. In the log file, I find:
I23:31:45:978|quadd_device_base|45758|BaseDevice.cpp:561[CreateProxyInternal]: Start connection attempt to daemon at 127.0.0.1:45555, timeout = 10 seconds
I23:31:45:979|quadd_pbcomm_proxy|45757|ClientProxy.cpp:138[HandleStart]: ClientProxy[0x404df8e0780] is starting.
I23:31:45:979|quadd_pbcomm_tcp|45757|Communicator.cpp:296[Connector]: Connector[0x404de040000] created.
I23:31:45:979|quadd_pbcomm_tcp|45757|Communicator.cpp:301[Start]: Connector[0x404de040000] is connecting to 127.0.0.1:45555 .
I23:31:45:979|quadd_pbcomm_tcp|45757|Communicator.cpp:311[Start]: Connector[0x404de040000] set timeout 10 seconds.
W23:31:45:980|quadd_pbcomm_tcp|45759|Communicator.cpp:384[HandleConnect]: Connector[0x404de040000] failed to connect: Connection refused
If I monitor processes while the host application is trying to connect to the target daemon, I briefly see two instances of the daemon running, but then they disappear:
sscott@demo:~/esat-rx$ /home/sscott/nsight-systems-2023.1.1/target-linux-x64/nsys -v
NVIDIA Nsight Systems version 2023.1.1.127-32365746v0
sscott@demo:~/esat-rx$ /home/sscott/nsight-systems-2023.1.1/target-linux-x64/nsys profile ./esat
The timeout expired.
If I run an older version, which I think was installed with the CUDA 12.1 toolkit:
sscott@demo:~/esat-rx$ which nsys
/usr/local/bin/nsys
sscott@demo:~/esat-rx$ nsys -v
NVIDIA Nsight Systems version 2022.4.2.50-32196742v0
sscott@demo:~/esat-rx$ nsys profile ./esat
Agent launcher failed.
I’m out of ideas. Does anyone know how I can get the tool to run?
sscott@demo:~/esat-rx$ nsys status -e
Timestamp counter supported: Yes
CPU Profiling Environment Check
Root privilege: disabled
Linux Kernel Paranoid Level = 2
Linux Distribution = Ubuntu
Linux Kernel Version = 5.15.0-67-generic: OK
Linux perf_event_open syscall available: OK
Sampling trigger event available: OK
Intel(c) Last Branch Record support: Available
CPU Profiling Environment (process-tree): OK
CPU Profiling Environment (system-wide): Fail
See the product documentation at https://docs.nvidia.com/nsight-systems for more information,
including information on how to set the Linux Kernel Paranoid Level.
sscott@demo:~/esat-rx$ which nsys
/usr/local/bin/nsys
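(Side note: I assume the system-wide "Fail" above is just the paranoid level the docs mention. If it turns out to matter later, I believe something like the line below would lower it, but it doesn't seem related to the daemon failure.)

# lower the perf paranoid level; an even lower value (e.g. 0) may be needed for system-wide sampling
sudo sysctl -w kernel.perf_event_paranoid=1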
Sorry for the delayed response; somehow I didn’t receive a notification for this post.
The log you attached is helpful. I suspect the problem is a longer-than-expected driver initialization time, which can happen if the driver hasn’t been initialized before the collection starts. Could you share the output of nvidia-smi on your system?
Setting the driver to persistence mode may help; there are two ways to do it, sketched below.
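For reference, assuming a standard driver install, the two approaches look roughly like this:

# 1) Preferred: run the persistence daemon (Ubuntu usually ships a systemd unit for it)
sudo systemctl enable --now nvidia-persistenced
# 2) Legacy per-GPU persistence mode via nvidia-smi (does not survive a reboot)
sudo nvidia-smi -pm 1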
I believe nvidia-persistenced was already running as part of the original installation; do I need to do anything else? We do notice that it takes our application about 30 seconds to initialize the GPU every time we run it. We thought this would be faster after the first run following a reboot.
sscott@demo:~$ ps ax | grep persist
1935 ? Ss 0:00 /usr/bin/nvidia-persistenced --verbose
173788 pts/46 S+ 0:00 grep --color=auto persist
sscott@demo:~$ sudo systemctl status nvidia-persistenced
[sudo] password for sscott:
● nvidia-persistenced.service - NVIDIA Persistence Daemon
Loaded: loaded (/lib/systemd/system/nvidia-persistenced.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2023-03-08 17:04:39 UTC; 1 week 0 days ago
Main PID: 1935 (nvidia-persiste)
Tasks: 1 (limit: 308801)
Memory: 812.0K
CPU: 28ms
CGroup: /system.slice/nvidia-persistenced.service
└─1935 /usr/bin/nvidia-persistenced --verbose
Mar 08 17:04:39 demo nvidia-persistenced[1935]: Verbose syslog connection opened
Mar 08 17:04:39 demo nvidia-persistenced[1935]: Started (1935)
Mar 08 17:04:39 demo nvidia-persistenced[1935]: device 0000:31:00.0 - registered
Mar 08 17:04:39 demo nvidia-persistenced[1935]: device 0000:31:00.0 - persistence mode enabled.
Mar 08 17:04:39 demo nvidia-persistenced[1935]: device 0000:31:00.0 - NUMA memory onlined.
Mar 08 17:04:39 demo nvidia-persistenced[1935]: device 0000:4b:00.0 - registered
Mar 08 17:04:39 demo nvidia-persistenced[1935]: device 0000:4b:00.0 - persistence mode enabled.
Mar 08 17:04:39 demo nvidia-persistenced[1935]: device 0000:4b:00.0 - NUMA memory onlined.
Mar 08 17:04:39 demo nvidia-persistenced[1935]: Local RPC services initialized
Mar 08 17:04:39 demo systemd[1]: Started NVIDIA Persistence Daemon.
sscott@demo:~$ nvidia-smi
Wed Mar 15 17:15:48 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02 Driver Version: 530.30.02 CUDA Version: 12.1 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA RTX A5500 On | 00000000:31:00.0 Off | Off |
| 30% 32C P8 17W / 230W| 6MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA RTX A5500 On | 00000000:4B:00.0 Off | Off |
| 30% 33C P8 18W / 230W| 6MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 7007 G /usr/lib/xorg/Xorg 4MiB |
| 1 N/A N/A 7007 G /usr/lib/xorg/Xorg 4MiB |
+---------------------------------------------------------------------------------------+
Thanks for sharing the results. To explain the issue more clearly: in the log you shared we see a 32-second delay while Nsys is calling cuInit() internally to force driver initialization, but Nsys has a 30-second timeout for that call, which is why it bailed out.
That call should normally complete very quickly when the persistence daemon is already running, so I need to discuss internally why that isn’t the case on your system.
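In the meantime, it would be useful to confirm that the delay also happens outside of Nsys by timing cuInit() directly. A minimal probe, assuming the CUDA 12.1 toolkit is installed under the default /usr/local/cuda path, could look like this:

/* time_cuinit.c - rough timing of CUDA driver initialization, independent of Nsys  */
/* build: gcc time_cuinit.c -o time_cuinit -I/usr/local/cuda/include -lcuda         */
/* (add -L/usr/local/cuda/lib64/stubs if libcuda.so is not on the default lib path) */
#include <cuda.h>
#include <stdio.h>
#include <time.h>

int main(void) {
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    CUresult rc = cuInit(0);    /* the same call Nsys issues internally */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("cuInit() returned %d after %.1f seconds\n", (int)rc, secs);
    return rc == CUDA_SUCCESS ? 0 : 1;
}

If this also takes around 30 seconds, the problem is in driver initialization itself rather than in Nsys.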
Thanks for running the experiment; it confirms that the longer-than-expected cuInit() delay is indeed the root cause.
Per internal discussion, we have previously only hit this when there is a bad GPU device. Do you know if that could be the case on your system?
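One quick check, assuming you have root access, is to look for driver-reported problems on either device:

# NVRM Xid messages in the kernel log usually point at a faulty or hung GPU
sudo dmesg | grep -i xid
# ECC and retired-page counters for both GPUs
nvidia-smi -q -d ECC,PAGE_RETIREMENT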
Could you run strace -T -o /tmp/strace.txt matrixMul (where matrixMul can be any simple CUDA program) and send us strace.txt? That will help us find which GPU node causes the long delay.
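With -T, strace appends the time spent in each syscall as a trailing <seconds> value, so a slow device usually shows up as a long openat() or ioctl() on one of the /dev/nvidia* nodes (adding -y to the strace command also resolves file descriptors to those paths). One way to surface the slowest calls:

# sort the traced syscalls by the <seconds> suffix that -T appends
sort -t'<' -k2 -g /tmp/strace.txt | tail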
If we can identify a problematic GPU node, the suggestion would be to disable that node before running Nsys. If that is not possible, we will need to add a way in Nsys to extend the timeout as a workaround, but that won’t be available to you until the next public release, which is a few months away, unless you or your company has an NDA with NVIDIA, in which case we can share an internal build with you.
Unfortunately, this is outside the scope of Nsight Systems development. I suggest reporting the issue in the CUDA category of the NVIDIA Developer Forums to see if they have a suggestion. As long as the long cuInit() delay can be eliminated, Nsys should work well.
I saw that someone from the CUDA forum has been working with you on it. Note that you can point out that the long delay is caused by the cuInit() call, in case they aren’t already aware of it. The fact that every CUDA app needs 32 seconds of initialization time is definitely not ideal, so it’s best to get it fixed.
If that does not work out, we can look into how to get you a customized build.
Yes, I’ve pointed it out to them a couple of times, but there seems to be a tendency to go off track. I’ll keep iterating with them; they have some valid things to consider. But we’re not converging, and I’m feeling some time pressure to get the profiler running, so could we begin the process of getting a customized build that just gets us past the problem for now?
I’ll need to confirm internally what the process is for sharing a customized build with you, or for getting you under NDA first. I’ll give an update tomorrow.
Could you share which country and company you are from? I need to reach out to the legal team for the NDA process. Also, could you share your company email address?