The first time a process interacts with CUDA, it hangs for tens of seconds. For instance, cupy.cuda.runtime.getDeviceCount() takes over 60 seconds the first time it is called, but subsequent calls within the same process are fast. A basic hello-world CUDA example shows the same symptom. Using Nsight Compute, I determined that cuInit is the culprit: it takes 30-60 seconds, although it returns a success code. The only exception is nvidia-smi, which shows status immediately.
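To separate the delay from CuPy and the runtime API, the driver call can be timed on its own. The following is a minimal sketch, not a definitive diagnostic: it assumes libcuda.so.1 is on the loader path, and time_call and load_libcuda are hypothetical helper names introduced only for this illustration. It calls cuInit(0) twice through ctypes and prints how long each call takes, so the slow first call and the fast second call can be compared directly.

```python
import ctypes
import time


def time_call(fn, *args):
    """Return (result, elapsed_seconds) for a single call to fn(*args)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def load_libcuda(name="libcuda.so.1"):
    """Load the CUDA driver library, or return None if it cannot be loaded.

    The default library name is an assumption; adjust it for your system.
    """
    try:
        return ctypes.CDLL(name)
    except OSError:
        return None


if __name__ == "__main__":
    libcuda = load_libcuda()
    if libcuda is None:
        print("libcuda.so.1 not found; nothing to time")
    else:
        # cuInit(unsigned int flags) returns a CUresult; 0 means CUDA_SUCCESS.
        libcuda.cuInit.argtypes = [ctypes.c_uint]
        libcuda.cuInit.restype = ctypes.c_int
        for attempt in (1, 2):
            status, elapsed = time_call(libcuda.cuInit, 0)
            print(f"cuInit call {attempt}: status={status}, {elapsed:.2f} s")
```

If the first call takes tens of seconds here as well, the delay is in the driver initialization itself rather than in CuPy or the CUDA runtime.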
This is not a headless node. It is running Ubuntu 20.04. I have tried both nvidia-smi -pm 0 and nvidia-smi -pm 1 (persistence mode off and on), with no effect. The nvidia-smi output is below:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.86       Driver Version: 470.86       CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTX A5000    On   | 00000000:19:00.0 Off |                  Off |
| 30%   29C    P8    16W / 230W |     10MiB / 24256MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA RTX A5000    On   | 00000000:1A:00.0 Off |                  Off |
| 30%   35C    P8    16W / 230W |     10MiB / 24256MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA RTX A5000    On   | 00000000:67:00.0 Off |                  Off |
| 30%   39C    P8    18W / 230W |     10MiB / 24256MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA RTX A5000    On   | 00000000:68:00.0 Off |                  Off |
| 30%   40C    P8    21W / 230W |    214MiB / 24253MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1808      G   /usr/lib/xorg/Xorg                  4MiB |
|    0   N/A  N/A     11327      G   /usr/lib/xorg/Xorg                  4MiB |
|    1   N/A  N/A      1808      G   /usr/lib/xorg/Xorg                  4MiB |
|    1   N/A  N/A     11327      G   /usr/lib/xorg/Xorg                  4MiB |
|    2   N/A  N/A      1808      G   /usr/lib/xorg/Xorg                  4MiB |
|    2   N/A  N/A     11327      G   /usr/lib/xorg/Xorg                  4MiB |
|    3   N/A  N/A      1808      G   /usr/lib/xorg/Xorg                 57MiB |
|    3   N/A  N/A     11327      G   /usr/lib/xorg/Xorg                121MiB |
|    3   N/A  N/A     11481      G   /usr/bin/gnome-shell               24MiB |
+-----------------------------------------------------------------------------+