Ubuntu 22.04 Server with Desktop installed
I have been working with @liuyis in the Nsight Systems group on a problem with nsys daemon startup in Nsight Systems.
After several experiments, it seems the failure happens because the call to cuInit takes too long and nsys times out, even though nvidia-persistenced is running.
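To reproduce the delay outside of nsys, something like the following can time cuInit on its own. This is only a minimal sketch: it assumes the driver library can be loaded as libcuda.so.1 through ctypes, and the helper names time_call and time_cuinit are made up for illustration.

```python
import ctypes
import time


def time_call(fn, *args):
    """Call fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def time_cuinit(libname="libcuda.so.1"):
    """Time a single cuInit(0) call from the CUDA driver API.

    Returns None if the driver library cannot be loaded (assumed name:
    libcuda.so.1), otherwise (CUresult, elapsed_seconds).
    """
    try:
        libcuda = ctypes.CDLL(libname)
    except OSError:
        return None
    # CUresult cuInit(unsigned int Flags); 0 means CUDA_SUCCESS.
    libcuda.cuInit.argtypes = [ctypes.c_uint]
    libcuda.cuInit.restype = ctypes.c_int
    return time_call(libcuda.cuInit, 0)


if __name__ == "__main__":
    outcome = time_cuinit()
    if outcome is None:
        print("libcuda.so.1 could not be loaded")
    else:
        status, seconds = outcome
        print(f"cuInit returned {status} after {seconds:.3f} s")
```

If the driver is the bottleneck, this should show a delay of the same order as the application startup time, with no profiler involved.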
Running strace while starting a CUDA application revealed the system calls causing the long delay. @liuyis replied:
Thanks for sharing the result. I’m not seeing a problematic GPU device in the log; however, I do see the system calls that caused the long delay:
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe4da090b0) = 0 <8.005325>
...
ioctl(4, _IOC(_IOC_NONE, 0, 0x25, 0), 0x7ffe4da0c350) = 0 <24.041903>
and tracing back, I can see that the file descriptors they were operating on were opened by:
openat(AT_FDCWD, "/dev/nvidiactl", O_RDWR) = 3 <0.000019>
...
openat(AT_FDCWD, "/dev/nvidia-uvm", O_RDWR|O_CLOEXEC) = 4 <0.000020>
Unfortunately, this is out of scope for Nsight Systems development. I suggest reporting the issue to CUDA - NVIDIA Developer Forums.
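For anyone who wants to repeat that analysis on their own trace, here is a rough sketch of how the slow calls can be picked out of `strace -T` output and matched back to the device files. It assumes the plain one-line-per-call format shown above (optionally with a pid prefix from `-f`), skips anything it cannot parse, and does not track close() or fd reuse. The function name slow_syscalls and the 1-second threshold are arbitrary choices for illustration.

```python
import re

# Matches e.g.: openat(AT_FDCWD, "/dev/nvidiactl", O_RDWR) = 3 <0.000019>
LINE_RE = re.compile(
    r'^(?:\[pid\s+\d+\]\s+|\d+\s+)?'           # optional pid prefix from -f
    r'(?P<name>\w+)\((?P<args>.*)\)\s+=\s+'    # syscall name and arguments
    r'(?P<ret>\S+)[^<]*<(?P<secs>\d+\.\d+)>$'  # return value and -T duration
)
PATH_RE = re.compile(r'"([^"]*)"')

# The four lines quoted in this thread, in the order they occur in a trace.
SAMPLE = [
    'openat(AT_FDCWD, "/dev/nvidiactl", O_RDWR) = 3 <0.000019>',
    'openat(AT_FDCWD, "/dev/nvidia-uvm", O_RDWR|O_CLOEXEC) = 4 <0.000020>',
    'ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe4da090b0) = 0 <8.005325>',
    'ioctl(4, _IOC(_IOC_NONE, 0, 0x25, 0), 0x7ffe4da0c350) = 0 <24.041903>',
]


def slow_syscalls(lines, threshold=1.0):
    """Return (syscall, path_or_None, seconds) for calls taking >= threshold."""
    fd_paths = {}  # fd number -> path it was last opened on
    slow = []
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if m is None:
            continue  # unfinished/resumed lines, signals, etc. are ignored
        name, args, ret = m["name"], m["args"], m["ret"]
        secs = float(m["secs"])
        if name in ("open", "openat") and ret.isdigit():
            path = PATH_RE.search(args)
            if path:
                fd_paths[int(ret)] = path.group(1)
        if secs >= threshold:
            first_arg = args.split(",", 1)[0].strip()
            path = fd_paths.get(int(first_arg)) if first_arg.isdigit() else None
            slow.append((name, path, secs))
    return slow


if __name__ == "__main__":
    for name, path, secs in slow_syscalls(SAMPLE):
        print(f"{secs:10.6f} s  {name}  on {path}")
```

On the lines quoted here, this flags the two ioctl calls on /dev/nvidiactl and /dev/nvidia-uvm. Together they account for about 32 seconds, which is in line with the startup delay described in this thread.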
Is there someone here who can pick up this thread and help us resolve the issue? We have also noticed that all our CUDA applications take about 30 seconds to start up, which is driving us crazy…
strace.txt (39.8 KB)