I am using CUDA Toolkit 12.1 and Nsight Systems 2023.1.1 (Nsight Systems is installed in my user directory via the “.run” installer). I am trying to profile a local target GPU on an Ubuntu 22.04 server that has a GUI desktop installed. There are two NVIDIA RTX A5500 GPUs in the system.
When I run nsys-ui and select the localhost target, I get an error “Failed to launch daemon”. In the log file, I find:
I23:31:45:978|quadd_device_base|45758|BaseDevice.cpp:561[CreateProxyInternal]: Start connection attempt to daemon at 127.0.0.1:45555, timeout = 10 seconds
I23:31:45:979|quadd_pbcomm_proxy|45757|ClientProxy.cpp:138[HandleStart]: ClientProxy[0x404df8e0780] is starting.
I23:31:45:979|quadd_pbcomm_tcp|45757|Communicator.cpp:296[Connector]: Connector[0x404de040000] created.
I23:31:45:979|quadd_pbcomm_tcp|45757|Communicator.cpp:301[Start]: Connector[0x404de040000] is connecting to 127.0.0.1:45555 .
I23:31:45:979|quadd_pbcomm_tcp|45757|Communicator.cpp:311[Start]: Connector[0x404de040000] set timeout 10 seconds.
W23:31:45:980|quadd_pbcomm_tcp|45759|Communicator.cpp:384[HandleConnect]: Connector[0x404de040000] failed to connect: Connection refused
If I monitor processes while the host application is trying to connect to the target daemon, I briefly see two instances of the daemon running, but then they disappear:
sscott@demo:~/esat-rx$ /home/sscott/nsight-systems-2023.1.1/target-linux-x64/nsys -v
NVIDIA Nsight Systems version 2023.1.1.127-32365746v0
sscott@demo:~/esat-rx$ /home/sscott/nsight-systems-2023.1.1/target-linux-x64/nsys profile ./esat
The timeout expired.
If I run an older version, which I think was installed with the CUDA 12.1 toolkit:
sscott@demo:~/esat-rx$ which nsys
/usr/local/bin/nsys
sscott@demo:~/esat-rx$ nsys -v
NVIDIA Nsight Systems version 2022.4.2.50-32196742v0
sscott@demo:~/esat-rx$ nsys profile ./esat
Agent launcher failed.
I’m out of ideas. Does anyone know how I can get the tool to run?
sscott@demo:~/esat-rx$ nsys status -e
Timestamp counter supported: Yes
CPU Profiling Environment Check
Root privilege: disabled
Linux Kernel Paranoid Level = 2
Linux Distribution = Ubuntu
Linux Kernel Version = 5.15.0-67-generic: OK
Linux perf_event_open syscall available: OK
Sampling trigger event available: OK
Intel(c) Last Branch Record support: Available
CPU Profiling Environment (process-tree): OK
CPU Profiling Environment (system-wide): Fail
See the product documentation at https://docs.nvidia.com/nsight-systems for more information,
including information on how to set the Linux Kernel Paranoid Level.
sscott@demo:~/esat-rx$ which nsys
/usr/local/bin/nsys
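(Side note: I assume the system-wide "Fail" above is just the paranoid level the docs mention. If it turns out to matter later, I believe something like the line below would lower it, but it doesn't seem related to the daemon failure.)

# lower the perf paranoid level; an even lower value (e.g. 0) may be needed for system-wide sampling
sudo sysctl -w kernel.perf_event_paranoid=1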
Sorry for the delayed response; somehow I didn’t receive a notification for this post.
The log you attached is helpful. I suspect the problem is a longer-than-expected driver initialization time, which can happen if the driver hasn’t been initialized before the collection starts. Could you share the output of nvidia-smi on your system?
Setting the driver to persistence mode may help; there are two ways to do it, sketched below.
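For reference, assuming a standard driver install, the two approaches look roughly like this:

# 1) Preferred: run the persistence daemon (Ubuntu usually ships a systemd unit for it)
sudo systemctl enable --now nvidia-persistenced
# 2) Legacy per-GPU persistence mode via nvidia-smi (does not survive a reboot)
sudo nvidia-smi -pm 1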
I believe nvidia-persistenced was already running as part of the original installation; do I need to do anything else? We do notice that it takes our application about 30 seconds to initialize the GPU every time we run it. We thought this would be faster after the first run following a reboot.
sscott@demo:~$ ps ax | grep persist
1935 ? Ss 0:00 /usr/bin/nvidia-persistenced --verbose
173788 pts/46 S+ 0:00 grep --color=auto persist
sscott@demo:~$ sudo systemctl status nvidia-persistenced
[sudo] password for sscott:
● nvidia-persistenced.service - NVIDIA Persistence Daemon
Loaded: loaded (/lib/systemd/system/nvidia-persistenced.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2023-03-08 17:04:39 UTC; 1 week 0 days ago
Main PID: 1935 (nvidia-persiste)
Tasks: 1 (limit: 308801)
Memory: 812.0K
CPU: 28ms
CGroup: /system.slice/nvidia-persistenced.service
└─1935 /usr/bin/nvidia-persistenced --verbose
Mar 08 17:04:39 demo nvidia-persistenced[1935]: Verbose syslog connection opened
Mar 08 17:04:39 demo nvidia-persistenced[1935]: Started (1935)
Mar 08 17:04:39 demo nvidia-persistenced[1935]: device 0000:31:00.0 - registered
Mar 08 17:04:39 demo nvidia-persistenced[1935]: device 0000:31:00.0 - persistence mode enabled.
Mar 08 17:04:39 demo nvidia-persistenced[1935]: device 0000:31:00.0 - NUMA memory onlined.
Mar 08 17:04:39 demo nvidia-persistenced[1935]: device 0000:4b:00.0 - registered
Mar 08 17:04:39 demo nvidia-persistenced[1935]: device 0000:4b:00.0 - persistence mode enabled.
Mar 08 17:04:39 demo nvidia-persistenced[1935]: device 0000:4b:00.0 - NUMA memory onlined.
Mar 08 17:04:39 demo nvidia-persistenced[1935]: Local RPC services initialized
Mar 08 17:04:39 demo systemd[1]: Started NVIDIA Persistence Daemon.
sscott@demo:~$ nvidia-smi
Wed Mar 15 17:15:48 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02 Driver Version: 530.30.02 CUDA Version: 12.1 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA RTX A5500 On | 00000000:31:00.0 Off | Off |
| 30% 32C P8 17W / 230W| 6MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA RTX A5500 On | 00000000:4B:00.0 Off | Off |
| 30% 33C P8 18W / 230W| 6MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 7007 G /usr/lib/xorg/Xorg 4MiB |
| 1 N/A N/A 7007 G /usr/lib/xorg/Xorg 4MiB |
+---------------------------------------------------------------------------------------+
Thanks for sharing the results. To explain the issue more clearly: in the log you shared we see a 32-second delay while Nsys is calling cuInit() internally to force driver initialization, but Nsys has a 30-second timeout for that call, which is why it bailed out.
That call should normally complete very quickly when the persistence daemon is already running, so I need to discuss internally why that isn’t the case on your system.
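In the meantime, it would be useful to confirm that the delay also happens outside of Nsys by timing cuInit() directly. A minimal probe, assuming the CUDA 12.1 toolkit is installed under the default /usr/local/cuda path, could look like this:

/* time_cuinit.c - rough timing of CUDA driver initialization, independent of Nsys  */
/* build: gcc time_cuinit.c -o time_cuinit -I/usr/local/cuda/include -lcuda         */
/* (add -L/usr/local/cuda/lib64/stubs if libcuda.so is not on the default lib path) */
#include <cuda.h>
#include <stdio.h>
#include <time.h>

int main(void) {
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    CUresult rc = cuInit(0);    /* the same call Nsys issues internally */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("cuInit() returned %d after %.1f seconds\n", (int)rc, secs);
    return rc == CUDA_SUCCESS ? 0 : 1;
}

If this also takes around 30 seconds, the problem is in driver initialization itself rather than in Nsys.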
Thanks for running the experiment; it confirms that the longer-than-expected cuInit() delay is indeed the root cause.
Per internal discussion, we have previously only hit this when there is a bad GPU device. Do you know if that could be the case on your system?
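One quick check, assuming you have root access, is to look for driver-reported problems on either device:

# NVRM Xid messages in the kernel log usually point at a faulty or hung GPU
sudo dmesg | grep -i xid
# ECC and retired-page counters for both GPUs
nvidia-smi -q -d ECC,PAGE_RETIREMENT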
Could you run strace -T -o /tmp/strace.txt matrixMul (where matrixMul can be any simple CUDA program) and send us strace.txt? That will help us find which GPU node causes the long delay.
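With -T, strace appends the time spent in each syscall as a trailing <seconds> value, so a slow device usually shows up as a long openat() or ioctl() on one of the /dev/nvidia* nodes (adding -y to the strace command also resolves file descriptors to those paths). One way to surface the slowest calls:

# sort the traced syscalls by the <seconds> suffix that -T appends
sort -t'<' -k2 -g /tmp/strace.txt | tail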
If we can identify a problematic GPU node, the suggestion would be to disable that node before running Nsys. If that is not possible, we will need to add a way in Nsys to extend the timeout as a workaround, but that won’t be available to you until the next public release, which is a few months away, unless you or your company has an NDA with NVIDIA, in which case we can share an internal build with you.
Unfortunately, this is outside the scope of Nsight Systems development. I suggest reporting the issue in the CUDA category of the NVIDIA Developer Forums to see if they have a suggestion. As long as the long cuInit() delay can be eliminated, Nsys should work well.
I saw that someone from the CUDA forum has been working with you on it. Note that you can point out that the long delay is caused by the cuInit() call, in case they aren’t already aware of it. The fact that every CUDA app needs 32 seconds of initialization time is definitely not ideal, so it’s best to get it fixed.
If that does not work out, we can look into how to get you a customized build.
Yes, I’ve pointed it out to them a couple of times, but there seems to be a tendency to go off track. I’ll keep iterating with them; they have some valid things to consider. But we’re not converging, and I’m feeling some time pressure to get the profiler running, so could we begin the process of getting a customized build that just gets us past the problem for now?
I’ll need to confirm internally what the process is for sharing a customized build with you, or for getting you under NDA first. I’ll give an update tomorrow.
Could you share which country and company you are from? I need to reach out to the legal team for the NDA process. Also, could you share your company email address?