Nsight Systems fails to connect to daemon

I am using CUDA Toolkit 12.1 and Nsight Systems 2023.1.1 (Nsight Systems is installed in my user directory via the “.run” installer). I am trying to profile a local target GPU on an Ubuntu 22.04 server that has a GUI desktop installed. There are two NVIDIA RTX A5500 GPUs in the system.

When I run nsys-ui and select the localhost target, I get an error “Failed to launch daemon”. In the log file, I find:

I23:31:45:978|quadd_device_base|45758|BaseDevice.cpp:561[CreateProxyInternal]: Start connection attempt to daemon at 127.0.0.1:45555, timeout = 10 seconds
I23:31:45:979|quadd_pbcomm_proxy|45757|ClientProxy.cpp:138[HandleStart]: ClientProxy[0x404df8e0780] is starting.
I23:31:45:979|quadd_pbcomm_tcp|45757|Communicator.cpp:296[Connector]: Connector[0x404de040000] created.
I23:31:45:979|quadd_pbcomm_tcp|45757|Communicator.cpp:301[Start]: Connector[0x404de040000] is connecting to 127.0.0.1:45555 .
I23:31:45:979|quadd_pbcomm_tcp|45757|Communicator.cpp:311[Start]: Connector[0x404de040000] set timeout 10 seconds.
W23:31:45:980|quadd_pbcomm_tcp|45759|Communicator.cpp:384[HandleConnect]: Connector[0x404de040000] failed to connect: Connection refused

If I monitor processes while the host application is trying to connect to the target daemon, I briefly see two instances of the daemon running, but then they disappear:

sscott@demo:~$ ps -ax | grep nsys
  45570 ?        S      0:00 /bin/bash /home/sscott/nsight-systems-2023.1.1/host-linux-x64/nsys-ui
  45615 ?        Sl     0:00 /home/sscott/nsight-systems-2023.1.1/host-linux-x64/CrashReporter --hide-stack NVIDIA Nsight Systems NsightSystems 2023.1.1 (Build 2023.1.1.127-32365746v0\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x2023.1.1.127-32365746v0_Q05J870wHW0574h_BUILD_VERSION) /home/sscott/nsight-systems-2023.1.1/host-linux-x64/nsys-ui.bin
  45618 ?        Sl     0:00 /home/sscott/nsight-systems-2023.1.1/host-linux-x64/nsys-ui.bin
  45804 ?        S      0:00 /bin/sh -c LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/home/sscott/nsight-systems-2023.1.1/target-linux-x64  QUADD_INSTALL_DIR=/home/sscott/nsight-systems-2023.1.1/target-linux-x64 /home/sscott/nsight-systems-2023.1.1/target-linux-x64/nsys --daemon --lock_file /run/user/1001/nsys.lock 
  45805 ?        S      0:00 /home/sscott/nsight-systems-2023.1.1/target-linux-x64/nsys --daemon --lock_file /run/user/1001/nsys.lock
  45806 ?        Rsl    0:29 /home/sscott/nsight-systems-2023.1.1/target-linux-x64/nsys --daemon --lock_file /run/user/1001/nsys.lock
  45828 pts/5    S+     0:00 grep --color=auto nsys

sscott@demo:~$ ps -ax | grep nsys
  45570 ?        S      0:00 /bin/bash /home/sscott/nsight-systems-2023.1.1/host-linux-x64/nsys-ui
  45615 ?        Sl     0:00 /home/sscott/nsight-systems-2023.1.1/host-linux-x64/CrashReporter --hide-stack NVIDIA Nsight Systems NsightSystems 2023.1.1 (Build 2023.1.1.127-32365746v0\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x2023.1.1.127-32365746v0_Q05J870wHW0574h_BUILD_VERSION) /home/sscott/nsight-systems-2023.1.1/host-linux-x64/nsys-ui.bin
  45618 ?        Sl     0:00 /home/sscott/nsight-systems-2023.1.1/host-linux-x64/nsys-ui.bin
  45850 pts/5    S+     0:00 grep --color=auto nsys

If I just run the CLI version:

sscott@demo:~/esat-rx$ /home/sscott/nsight-systems-2023.1.1/target-linux-x64/nsys -v
NVIDIA Nsight Systems version 2023.1.1.127-32365746v0

sscott@demo:~/esat-rx$ /home/sscott/nsight-systems-2023.1.1/target-linux-x64/nsys profile ./esat
The timeout expired.

If I run an older version, which I think was installed with the 12.1 toolkit:

sscott@demo:~/esat-rx$ which nsys
/usr/local/bin/nsys
sscott@demo:~/esat-rx$ nsys -v
NVIDIA Nsight Systems version 2022.4.2.50-32196742v0
sscott@demo:~/esat-rx$ nsys profile ./esat
Agent launcher failed.

I’m out of ideas. Does anyone know how I can get the tool to run?

Thanks.

Can you send us the output from running “nsys status -e”?

@liuyis can you take a look at this one?

The nsys installed with the toolkit:

sscott@demo:~/esat-rx$ nsys status -e
Timestamp counter supported: Yes

CPU Profiling Environment Check
Root privilege: disabled
Linux Kernel Paranoid Level = 2
Linux Distribution = Ubuntu
Linux Kernel Version = 5.15.0-67-generic: OK
Linux perf_event_open syscall available: OK
Sampling trigger event available: OK
Intel(c) Last Branch Record support: Available
CPU Profiling Environment (process-tree): OK
CPU Profiling Environment (system-wide): Fail

See the product documentation at https://docs.nvidia.com/nsight-systems for more information,
including information on how to set the Linux Kernel Paranoid Level.
sscott@demo:~/esat-rx$ which nsys
/usr/local/bin/nsys

Hi @sscott2, can you try the following commands and share the results:

/home/sscott/nsight-systems-2023.1.1/target-linux-x64/nsys profile echo 0

/home/sscott/nsight-systems-2023.1.1/target-linux-x64/nsys profile -t none ./esat

/home/sscott/nsight-systems-2023.1.1/target-linux-x64/nsys profile -t none -s none --cpuctxsw=none ./esat

These are not solutions, just to help us identify the issue.

I attached a log in case that helps.

sscott@demo:~/esat-rx$ nsys profile echo 0
The timeout expired.
sscott@demo:~/esat-rx$ nsys profile -t none ./esat
The timeout expired.
nsys profile -t none -s none --cpuctxsw=none ./esat
The timeout expired.
sscott@demo:~/esat-rx$ nsys --version
NVIDIA Nsight Systems version 2023.1.1.127-32365746v0

nsys-ui.log (65.0 KB)

Sorry for the delayed response; somehow I didn’t receive a notification for this post.

The log you attached is helpful. I suspect it’s due to longer-than-expected driver initialization time, which can happen if the driver hasn’t been initialized before the collection. Could you share the result of nvidia-smi on your system?

Setting the driver to persistence mode may help; there are two ways:

  1. (preferred) sudo nvidia-persistenced
  2. nvidia-smi -pm 1

Could you try them and see if either one works?
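If you want to double-check from code, a small NVML query along the following lines should report the current persistence mode of each GPU. This is just a rough sketch (the file name and build line are only suggestions; you need to link against NVML, e.g. nvcc -l nvidia-ml -o checkPersistence checkPersistence.cpp, possibly pointing the linker at the NVML stub library shipped with the toolkit):

#include <nvml.h>

#include <cstdio>

int main()
{
    // Report the persistence mode of every GPU that NVML can see.
    if (nvmlInit() != NVML_SUCCESS)
    {
        printf("nvmlInit failed\n");
        return 1;
    }

    unsigned int count = 0;
    nvmlDeviceGetCount(&count);

    for (unsigned int i = 0; i < count; ++i)
    {
        nvmlDevice_t device;
        nvmlEnableState_t mode;
        if (nvmlDeviceGetHandleByIndex(i, &device) == NVML_SUCCESS &&
            nvmlDeviceGetPersistenceMode(device, &mode) == NVML_SUCCESS)
        {
            printf("GPU %u: persistence mode %s\n", i,
                   mode == NVML_FEATURE_ENABLED ? "enabled" : "disabled");
        }
    }

    nvmlShutdown();
    return 0;
}

The Persistence-M column in the nvidia-smi output shows the same information.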

I believe nvidia-persistenced was already running as part of the original installation - do I need to do anything else? We do notice that it takes our application about 30 seconds to initialize the GPU every time we run it. We thought this would be faster after the first run following a reboot.

sscott@demo:~$ ps ax | grep persist
   1935 ?        Ss     0:00 /usr/bin/nvidia-persistenced --verbose
 173788 pts/46   S+     0:00 grep --color=auto persist

sscott@demo:~$ sudo systemctl status nvidia-persistenced
[sudo] password for sscott: 
● nvidia-persistenced.service - NVIDIA Persistence Daemon
     Loaded: loaded (/lib/systemd/system/nvidia-persistenced.service; enabled; vendor preset: enabled)
     Active: active (running) since Wed 2023-03-08 17:04:39 UTC; 1 week 0 days ago
   Main PID: 1935 (nvidia-persiste)
      Tasks: 1 (limit: 308801)
     Memory: 812.0K
        CPU: 28ms
     CGroup: /system.slice/nvidia-persistenced.service
             └─1935 /usr/bin/nvidia-persistenced --verbose

Mar 08 17:04:39 demo nvidia-persistenced[1935]: Verbose syslog connection opened
Mar 08 17:04:39 demo nvidia-persistenced[1935]: Started (1935)
Mar 08 17:04:39 demo nvidia-persistenced[1935]: device 0000:31:00.0 - registered
Mar 08 17:04:39 demo nvidia-persistenced[1935]: device 0000:31:00.0 - persistence mode enabled.
Mar 08 17:04:39 demo nvidia-persistenced[1935]: device 0000:31:00.0 - NUMA memory onlined.
Mar 08 17:04:39 demo nvidia-persistenced[1935]: device 0000:4b:00.0 - registered
Mar 08 17:04:39 demo nvidia-persistenced[1935]: device 0000:4b:00.0 - persistence mode enabled.
Mar 08 17:04:39 demo nvidia-persistenced[1935]: device 0000:4b:00.0 - NUMA memory onlined.
Mar 08 17:04:39 demo nvidia-persistenced[1935]: Local RPC services initialized
Mar 08 17:04:39 demo systemd[1]: Started NVIDIA Persistence Daemon.

sscott@demo:~$ nvidia-smi
Wed Mar 15 17:15:48 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX A5500                On | 00000000:31:00.0 Off |                  Off |
| 30%   32C    P8               17W / 230W|      6MiB / 24564MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA RTX A5500                On | 00000000:4B:00.0 Off |                  Off |
| 30%   33C    P8               18W / 230W|      6MiB / 24564MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      7007      G   /usr/lib/xorg/Xorg                            4MiB |
|    1   N/A  N/A      7007      G   /usr/lib/xorg/Xorg                            4MiB |
+---------------------------------------------------------------------------------------+

Thanks for sharing the results. To explain the issue more clearly: in the log you shared, we do see a 32-second delay while Nsys is calling “cuInit” internally to force driver initialization, but Nsys has a 30-second timeout for that call, which is why it bailed out.

This should usually be very short if the system already has the persistence daemon running. I need to discuss internally to see why that’s not the case on your system.

In the meantime, could you build and run the following program on your system to see if the long delay also happens independently of Nsys:

#include <cuda.h>

#include <chrono>
#include <cstdio>

int main()
{
    // Time how long driver initialization (cuInit) takes.
    auto start = std::chrono::high_resolution_clock::now();
    CUresult result = cuInit(0);
    auto stop = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(stop - start);

    printf("Time: %ldms, Result: %s\n", duration.count(), result == CUDA_SUCCESS ? "success" : "failed");

    return 0;
}

Save it to cuInitTest.cpp and you can build it with nvcc -l cuda -o cuInitTest cuInitTest.cpp

Here are the results - I ran it twice in a row.

sscott@demo:~/esat-rx$ ./cuInitTest 
Time: 32083ms, Result: success
sscott@demo:~/esat-rx$ ./cuInitTest 
Time: 32096ms, Result: success

Thanks for the experiment; that shows the longer-than-expected delay for cuInit is indeed the root cause.

Per internal discussion, we’ve previously only hit this when there’s a bad GPU device. Do you know if that could be the case on your system?

Can you run strace -T -o /tmp/strace.txt matrixMul (matrixMul can be any simple CUDA program) and send us strace.txt? That can help us find which GPU node causes the long delay.
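If you don’t have the matrixMul sample built, a minimal driver-API program along these lines should also work. It is only a rough sketch, similar in spirit to the cuInitTest program above and built the same way with nvcc -l cuda; it just forces full driver initialization so strace can capture the slow calls:

#include <cuda.h>

#include <cstdio>

int main()
{
    // Force full driver initialization: init the driver, pick device 0,
    // then create and destroy a context on it.
    CUdevice device;
    CUcontext context;

    if (cuInit(0) != CUDA_SUCCESS)
    {
        printf("cuInit failed\n");
        return 1;
    }

    cuDeviceGet(&device, 0);
    cuCtxCreate(&context, 0, device);
    cuCtxDestroy(context);

    printf("done\n");
    return 0;
}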

If we can identify a problematic GPU node, the suggestion would be to disable that node before running Nsys. If that’s not possible, we will need to add a way in Nsys to extend the timeout to work around it, but that won’t be available to you until the next public release, which is a few months away, unless you or your company has an NDA with NVIDIA, in which case we can share an internal build with you.

strace.txt (39.8 KB)

I used your cuInitTest program - let me know if you want a different one.

We have two graphics cards, and we see the long startup delay on both of them.

Thanks for sharing the result. I’m not seeing a problematic GPU device based on the log; however, I do see the system calls that caused the long delay:

ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe4da090b0) = 0 <8.005325>
...
ioctl(4, _IOC(_IOC_NONE, 0, 0x25, 0), 0x7ffe4da0c350) = 0 <24.041903>

Tracking back, I can see that the FDs these ioctls targeted were nvidiactl and nvidia-uvm:

openat(AT_FDCWD, "/dev/nvidiactl", O_RDWR) = 3 <0.000019>
...
openat(AT_FDCWD, "/dev/nvidia-uvm", O_RDWR|O_CLOEXEC) = 4 <0.000020>

Unfortunately this is outside the scope of Nsight Systems development. I suggest reporting the issue to the CUDA - NVIDIA Developer Forums to see if they have a suggestion. As long as the long delay for cuInit() can be eliminated, Nsys should work well.

On our side, we can provide the workaround (WAR) to increase the timeout limit, as mentioned in Nsight Systems fails to connect to daemon - #11 by liuyis. Are you an NDA customer?

We don’t have an NDA - can you point me to the right place to get that process started?

I will post to the developer forum. Do you have a recommended category to post in?

I saw someone from the CUDA forum has been working with you on it. Note that you can point out that the long delay is caused by the cuInit() call, in case they aren’t aware of it. The fact that every CUDA app needs 32 seconds of initialization time is definitely not ideal, so it’s best to get it fixed.

If that does not work out, we can look into how to get you a customized build.

Yes, I’ve pointed it out to them a couple of times, but there seems to be a tendency to go off track. I’ll keep iterating with them; they have some valid things to consider. But we’re not converging, and I’m feeling a bit of time pressure to get the profiler running, so could we begin the process of getting a customized build that just gets us past the problem for now?

I’ll need to confirm internally what the process is for sharing a customized build with you, or for getting you under NDA first. I’ll give you an update tomorrow.

OK. The discussion in the other forum appears to be a dead end.

Could you share which country and company you are from? I need to reach out to the legal team for the NDA process. Also, could you share your company email address?

Company Info:
Serrano Systems Inc
5235 Avenida Encinas, Suite G
Carlsbad, CA 92008
U.S.A.

corporate contact is Neal Riedel, email: nriedel@serranosystems.com