VMware | Ubuntu 22.04 LTS | Quadro P600 | nvidia-smi: no devices found

nvidia-bug-report.log.gz (140.8 KB)
Hi, I’ve noticed that NVENC-powered transcoding now fails in my self-hosted media server (Jellyfin running in Docker on Ubuntu 22.04 LTS, which itself runs inside a VMware VM). I assume some update broke my setup, as it was working fine not long ago.
During initial setup I used nvidia-driver-515.

So far I’ve tried uninstalling/purging and then reinstalling all nvidia and libnvidia packages, both the 515 and 525 versions (I never used the “-open” drivers):
apt remove --purge '^nvidia-.*'
apt remove --purge '^libnvidia-.*'
apt install nvidia-driver-525
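
For reference, a quick sanity check that the reinstall actually produced a loaded kernel module (exact output varies with the driver version, so treat the comments as rough expectations):
dkms status                      # should list the nvidia module as installed for the running kernel
lsmod | grep nvidia              # should show nvidia, nvidia_modeset, nvidia_drm, nvidia_uvm
cat /proc/driver/nvidia/version  # only exists once the module is loaded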

Tried setting the kernel module parameter nvidia.NVreg_OpenRmEnableUnsupportedGpus=1:
sudo vi /etc/modprobe.d/nvidia.conf
options nvidia NVreg_OpenRmEnableUnsupportedGpus=1
sudo update-initramfs -u
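
As far as I can tell this parameter only applies to the “-open” kernel modules, so with the proprietary driver it is probably a no-op; either way, a quick way to check whether the loaded module actually picked the option up:
sudo dmesg | grep -i nvrm                           # module load messages, including unsupported-GPU complaints
grep . /sys/module/nvidia/parameters/* 2>/dev/null  # parameters the loaded nvidia module accepted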

Tried blacklisting nouveau
sudo vi /etc/modprobe.d/blacklist-nvidia-nouveau.conf
blacklist nouveau
options nouveau modeset=0
sudo update-initramfs -u
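
To confirm the blacklist took effect after a reboot, something like:
lsmod | grep nouveau                              # should print nothing
lspci -k -d 10de: | grep "Kernel driver in use"   # should report nvidia, not nouveau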

sudo lspci -d "10de:*" -v -xxx
03:00.0 VGA compatible controller: NVIDIA Corporation GP107GL [Quadro P600] (rev a1) (prog-if 00 [VGA controller])
DeviceName: pciPassthru0
Subsystem: NVIDIA Corporation GP107GL [Quadro P600]
Physical Slot: 160
Flags: bus master, fast devsel, latency 64, IRQ 18
Memory at fc000000 (32-bit, non-prefetchable) [size=16M]
Memory at d0000000 (64-bit, prefetchable) [size=256M]
Memory at e4000000 (64-bit, prefetchable) [size=32M]
I/O ports at 4000 [size=128]
Capabilities: [60] Power Management version 3
Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Endpoint, MSI 00
Capabilities: [100] Virtual Channel
Capabilities: [128] Power Budgeting <?>
Capabilities: [420] Advanced Error Reporting
Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
Capabilities: [900] Secondary PCI Express
Kernel driver in use: nvidia
Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia

systemctl status nvidia-persistenced.service
● nvidia-persistenced.service - NVIDIA Persistence Daemon
Loaded: loaded (/lib/systemd/system/nvidia-persistenced.service; static)
Active: active (running) since Sun 2023-04-09 22:48:25 CEST; 7min ago
Process: 834 ExecStart=/usr/bin/nvidia-persistenced --user nvidia-persistenced --no-persistence-mode --verbose (code=exited, status=0/SUCCESS)
Main PID: 835 (nvidia-persiste)
Tasks: 1 (limit: 4466)
Memory: 804.0K
CPU: 3ms
CGroup: /system.slice/nvidia-persistenced.service
└─835 /usr/bin/nvidia-persistenced --user nvidia-persistenced --no-persistence-mode --verbose
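
Since the daemon runs with --verbose, its journal is also worth a look, for example:
journalctl -b -u nvidia-persistenced --no-pager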

sudo nvidia-smi
No devices were found
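
When nvidia-smi reports “No devices were found” even though lspci shows the card bound to the nvidia driver, the kernel log usually contains the underlying error (NVRM / RmInitAdapter failures, Xid messages and so on), e.g.:
sudo dmesg | grep -iE 'nvrm|xid'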

Can somebody point out what I am missing?

TL;DR: reboot your hypervisor.

Hi all, replying to myself. I managed to fix this today by:

  1. shutting down all running VMs on the host
  2. enabling maintenance mode in ESXi
  3. rebooting the server
  4. powering the VM back on after the reboot, at which point the GPU was recognized OK.

I’m unsure whether the maintenance mode step was necessary; maybe all it needed was a host reboot.
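
For anyone who would rather do this from an SSH session on the ESXi host instead of the web UI, a rough sketch of the same sequence (commands from memory, please double-check them against your ESXi version):
vim-cmd vmsvc/getallvms                           # list VM IDs
vim-cmd vmsvc/power.shutdown <vmid>               # clean guest shutdown, repeat for each running VM
esxcli system maintenanceMode set --enable true   # esxcli normally refuses to reboot outside maintenance mode
esxcli system shutdown reboot --reason "reset GPU passthrough"
esxcli system maintenanceMode set --enable false  # once the host is back up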

Cheers,
Peter
