Nvidia-smi No Devices Found

I have an Ubuntu 20.04 VM under ESXi with a Quadro P400 passed through, and for a couple of years it worked beautifully for Plex transcoding. It suddenly stopped working, I suspect because automatic updates are configured for this VM, and the output of nvidia-smi is now simply “No devices were found”. After hours of troubleshooting, I decided to spin up a Debian 12 VM and start fresh, following this guide: https://phoenixnap.com/kb/nvidia-drivers-debian

Unfortunately, it doesn’t work and the result is the same.

lspci -nn | egrep -i "3d|display|vga"

00:0f.0 VGA compatible controller [0300]: VMware SVGA II Adapter [15ad:0405]
13:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP107GL [Quadro P400] [10de:1cb3] (rev a1)

nvidia-detect
Detected NVIDIA GPUs:
13:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP107GL [Quadro P400] [10de:1cb3] (rev a1)

Checking card:  NVIDIA Corporation GP107GL [Quadro P400] (rev a1)
Your card is supported by all driver versions.
Your card is also supported by the Tesla 470 drivers series.
It is recommended to install the
    nvidia-driver

lsmod | grep nvidia
nvidia_uvm           1540096  0
nvidia_drm             77824  0
nvidia_modeset       1314816  1 nvidia_drm
video                  65536  1 nvidia_modeset
nvidia              56795136  2 nvidia_uvm,nvidia_modeset
drm_kms_helper        212992  4 vmwgfx,nvidia_drm
drm                   614400  8 vmwgfx,drm_kms_helper,nvidia,drm_ttm_helper,nvidia_drm,ttm

sudo dmesg | grep -i nvidia
[    2.130300] nvidia: loading out-of-tree module taints kernel.
[    2.130313] nvidia: module license 'NVIDIA' taints kernel.
[    2.251731] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[    2.447507] nvidia-nvlink: Nvlink Core is being initialized, major device number 245
[    2.456455] nvidia 0000:13:00.0: enabling device (0000 -> 0003)
[    2.457208] nvidia 0000:13:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[    2.581646] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  535.216.01  Tue Sep 17 16:54:04 UTC 2024
[    2.601294] audit: type=1400 audit(1743860140.695:2): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=490 comm="apparmor_parser"
[    2.601298] audit: type=1400 audit(1743860140.695:3): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=490 comm="apparmor_parser"
[    2.963338] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  535.216.01  Tue Sep 17 16:46:49 UTC 2024
[    3.309517] [drm] [nvidia-drm] [GPU ID 0x00001300] Loading driver
[    3.309520] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:13:00.0 on minor 1
[   26.703549] nvidia_uvm: module uses symbols nvUvmInterfaceDisableAccessCntr from proprietary module nvidia, inheriting taint.
[   26.737104] nvidia-uvm: Loaded the UVM driver, major device number 243.

nvidia-smi
No devices were found


Could this be hardware related even though the host sees the GPU?

Logs:

nvidia-bug-report.log (870.9 KB)

The key entry in your logs is this one:

Apr 05 09:35:45 debian kernel: NVRM: GPU 0000:13:00.0: RmInitAdapter failed! (0x23:0x65:1438)

Unfortunately, the meaning of these codes (0x23:0x65:1438) is not public, and only NVIDIA engineers can interpret them…
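
For anyone comparing boots, here is a minimal sketch for pulling these failure lines out of the kernel log. It only extracts the PCI address and the raw code triple; it cannot decode them. The function name extract_rminit_failures is made up for illustration, and the pattern is an assumption based on the single log line quoted above.

```shell
#!/bin/sh
# Hypothetical helper, not an NVIDIA tool: reads kernel log lines on stdin and
# prints "<PCI address> <code>" for each "RmInitAdapter failed!" entry.
# The pattern is modelled on the one log line quoted in this thread.
extract_rminit_failures() {
    sed -nE 's/.*NVRM: GPU ([0-9a-fA-F:.]+): RmInitAdapter failed! \(([^)]*)\).*/\1 \2/p'
}

# Example, run inside the VM after a boot where nvidia-smi finds no devices:
#   sudo journalctl -k -b --no-pager | extract_rminit_failures
```

If the same code shows up on every failing boot and never on a working one, that is at least a consistent symptom to quote in a bug report.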

Debian 12 is a fairly old release with a dated 6.1 kernel. Better to try Debian 13 “trixie” combined with the newest 570 driver (just follow the instructions for 12).

I have made some progress since I started troubleshooting. I discovered that if I reboot the hypervisor host, everything works flawlessly, but if I reboot the VM, it breaks again. I found a few threads from people complaining about the exact same issue.
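
To make that pattern easier to show, a rough sketch like the following could record one line per guest boot. The function names and the default log path are assumptions for illustration only; the classification relies on the two nvidia-smi outputs pasted in this thread ("No devices were found" versus the normal table header with "Driver Version:").

```shell
#!/bin/sh
# Hypothetical helper: classify nvidia-smi output read on stdin.
# Prints "missing" when the driver reports no GPU, "present" when the normal
# table header is there, and "unknown" for anything else (e.g. tool not installed).
classify_smi_output() {
    output="$(cat)"
    case "$output" in
        *"No devices were found"*) echo missing ;;
        *"Driver Version:"*)       echo present ;;
        *)                         echo unknown ;;
    esac
}

# Hypothetical helper: append a timestamp, the guest boot ID and the GPU state
# to a log file. The default path is an assumption; pass another as $1.
# /proc/sys/kernel/random/boot_id changes on every boot of the guest.
record_gpu_state() {
    log_file="${1:-$HOME/gpu-state.log}"
    state="$(nvidia-smi 2>&1 | classify_smi_output)"
    printf '%s boot=%s gpu=%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
        "$(cat /proc/sys/kernel/random/boot_id)" "$state" >> "$log_file"
}
```

Calling record_gpu_state once after each boot and noting which entries follow a host reboot would give a small table of working and failing boots.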

I decided to spin up a fresh Ubuntu 22.04 VM, and after installing the 535 driver I still get “No devices were found”.

● nvidia-persistenced.service - NVIDIA Persistence Daemon
     Loaded: loaded (/lib/systemd/system/nvidia-persistenced.service; static)
     Active: active (running) since Sat 2025-04-05 21:24:04 UTC; 1min 15s ago
    Process: 797 ExecStart=/usr/bin/nvidia-persistenced --user nvidia-persistenced --no-persistence-mode --verbose (cod>
   Main PID: 799 (nvidia-persiste)
      Tasks: 1 (limit: 2121)
     Memory: 840.0K
        CPU: 2ms
     CGroup: /system.slice/nvidia-persistenced.service
             └─799 /usr/bin/nvidia-persistenced --user nvidia-persistenced --no-persistence-mode --verbose

Apr 05 21:24:04 u2204olex systemd[1]: Starting NVIDIA Persistence Daemon...
Apr 05 21:24:04 u2204olex nvidia-persistenced[799]: Verbose syslog connection opened
Apr 05 21:24:04 u2204olex nvidia-persistenced[799]: Now running with user ID 114 and group ID 119
Apr 05 21:24:04 u2204olex nvidia-persistenced[799]: Started (799)
Apr 05 21:24:04 u2204olex nvidia-persistenced[799]: device 0000:13:00.0 - registered
Apr 05 21:24:04 u2204olex nvidia-persistenced[799]: Local RPC services initialized
Apr 05 21:24:04 u2204olex systemd[1]: Started NVIDIA Persistence Daemon.

nvidia-smi
No devices were found

lspci -nn | egrep -i "3d|display|vga"
00:0f.0 VGA compatible controller [0300]: VMware SVGA II Adapter [15ad:0405]
13:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP107GL [Quadro P400] [10de:1cb3] (rev a1)

@morgwai666 After rebooting the ESXi Host:

nvidia-smi
Sat Apr  5 22:37:55 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.120                Driver Version: 550.120        CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Quadro P400                    Off |   00000000:13:00.0 Off |                  N/A |
| 34%   37C    P8             N/A /  N/A  |       2MiB /   2048MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found  

After rebooting the VM:

nvidia-smi
No devices were found

● nvidia-persistenced.service - NVIDIA Persistence Daemon
     Loaded: loaded (/lib/systemd/system/nvidia-persistenced.service; static)
     Active: active (running) since Sat 2025-04-05 22:38:28 UTC; 1min 12s ago
    Process: 791 ExecStart=/usr/bin/nvidia-persistenced --user nvidia-persistenced --no-persistence-mode --verbose (cod>
   Main PID: 798 (nvidia-persiste)
      Tasks: 1 (limit: 2125)
     Memory: 968.0K
        CPU: 3ms
     CGroup: /system.slice/nvidia-persistenced.service
             └─798 /usr/bin/nvidia-persistenced --user nvidia-persistenced --no-persistence-mode --verbose

Apr 05 22:38:28 u2204olex systemd[1]: Starting NVIDIA Persistence Daemon...
Apr 05 22:38:28 u2204olex nvidia-persistenced[798]: Verbose syslog connection opened
Apr 05 22:38:28 u2204olex nvidia-persistenced[798]: Now running with user ID 114 and group ID 119
Apr 05 22:38:28 u2204olex nvidia-persistenced[798]: Started (798)
Apr 05 22:38:28 u2204olex nvidia-persistenced[798]: device 0000:13:00.0 - registered
Apr 05 22:38:28 u2204olex nvidia-persistenced[798]: Local RPC services initialized
Apr 05 22:38:28 u2204olex systemd[1]: Started NVIDIA Persistence Daemon.

Strange, don’t you agree?