Nvidia-smi - no devices were found - p620 - ubuntu 22.04

This is a fresh bare-metal Ubuntu install.

lspci | grep -e VGA
01:00.0 VGA compatible controller: NVIDIA Corporation GP107GL [Quadro P620] (rev a1)

I installed the drivers automatically:

sudo ubuntu-drivers autoinstall

Is my card dead?
nvidia-bug-report.log (301.0 KB)

Hi j1mur,

Please use a dedicated installation method on Ubuntu instead of autoinstall. It is very likely that the kernel module compilation did not work correctly, or that something similar went wrong.

Please remove any currently installed NVIDIA drivers and kernel modules, then install either through apt or through the “Software & Updates -> Additional Drivers” tab, using one of the proprietary NVIDIA drivers, NOT Open and NOT Server.
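A minimal sketch of what the remove-and-reinstall step could look like with apt. The driver branch (515), the `RUN` dry-run switch, and the function name are assumptions for illustration, not something stated in the reply. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: purge existing NVIDIA packages, then install one proprietary driver via apt.
# DRIVER_VERSION is an assumption; the plain "nvidia-driver-NNN" package is the
# proprietary desktop variant (not "-open", not "-server").
DRIVER_VERSION="${DRIVER_VERSION:-515}"

# Dry run by default: every command is prefixed with "echo".
# Set RUN to an empty string to execute the commands for real.
RUN="${RUN-echo}"

reinstall_nvidia() {
    # Remove every installed nvidia-* and libnvidia-* package (apt accepts regexes here).
    $RUN sudo apt-get purge -y '^nvidia-.*' '^libnvidia-.*'
    # Drop dependencies that are no longer needed.
    $RUN sudo apt-get autoremove -y
    # Refresh package lists, then install the chosen driver branch.
    $RUN sudo apt-get update
    $RUN sudo apt-get install -y "nvidia-driver-${DRIVER_VERSION}"
}

reinstall_nvidia
```

Saved as a script, running it with `RUN=` set to an empty value executes the commands instead of printing them. A reboot afterwards is usually needed so the freshly built kernel module gets loaded.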

Follow instructions exactly!

I am sure that will resolve your issues.

Thanks

Thanks for the tip, I used apt install nvidia-driver-515 on baremetal ubuntu desktop 22.04 and I got nvidia-smi to actually work! This is great, first time I feel like I got somewhere, and at least I feel like my card isn’t dead.

So I went back to the setup I actually need, which is GPU passthrough under ESXi. I installed the exact same distro there and used the exact same install command, but nvidia-smi reports that no devices were found.

The device is detected:

$ sudo lshw -C display
  *-display
       description: VGA compatible controller
       product: GP107GL [Quadro P620]
       vendor: NVIDIA Corporation
       physical id: 2
       bus info: pci@0000:02:02.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress vga_controller bus_master cap_list
       configuration: driver=nvidia latency=0
       resources: irq:19 memory:fd000000-fdffffff memory:c0000000-cfffffff memory:d0000000-d1ffffff ioport:a80(size=128)

I realize there seems to be an issue with passthrough for virtualization. I got it working a few years ago, but I recently updated something and it broke, and I’ve just been trying to get it working again.
nvidia-bug-report.log (1.4 MB)

I’ve tried going through this thread but still get stuck with the same issue

Good to hear that you got some progress!

But something is still off. Your log file shows

Oct 13 00:36:15 plex kernel: [    2.350673] nvidia-nvlink: Nvlink Core is being initialized, major device number 238
Oct 13 00:36:15 plex kernel: [    2.469391] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  525.125.06  Tue May 30 05:11:37 UTC 2023
Oct 13 00:36:15 plex kernel: [    2.488974] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  525.125.06  Tue May 30 04:58:48 UTC 2023
Oct 13 00:36:15 plex kernel: [    2.492219] [drm] [nvidia-drm] [GPU ID 0x00000202] Loading driver
Oct 13 00:36:19 plex kernel: [    6.502401] NVRM: GPU 0000:02:02.0: RmInitAdapter failed! (0x23:0x65:1413)
Oct 13 00:36:19 plex kernel: [    6.502475] NVRM: GPU 0000:02:02.0: rm_init_adapter failed, device minor number 0
Oct 13 00:36:19 plex kernel: [    6.505738] [drm:nv_drm_load [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000202] Failed to allocate NvKmsKapiDevice

That is exactly the same error as before, and the loaded driver is 525, not 515.
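The two things read off the log above (which driver version NVRM announces, and whether RmInitAdapter failed) can be pulled out of a saved kernel log with a couple of small helpers. This is only a sketch: the function names and the idea of saving the log to a file first are assumptions.

```shell
#!/bin/sh
# Sketch: inspect a saved kernel log (e.g. from "sudo dmesg > dmesg.txt").

# Print each distinct driver version announced by "NVRM: loading NVIDIA UNIX ...".
nvrm_loaded_versions() {
    grep 'NVRM: loading NVIDIA UNIX' "$1" |
        awk '{ for (i = 1; i < NF; i++) if ($i == "Module") print $(i + 1) }' |
        sort -u
}

# Print how many times the adapter initialisation failed.
nvrm_init_failures() {
    grep -c 'RmInitAdapter failed' "$1"
}
```

For example, `nvrm_loaded_versions dmesg.txt` followed by `nvrm_init_failures dmesg.txt`. On a running system where the module has loaded, `cat /proc/driver/nvidia/version` also shows the loaded version directly.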

Alright, so I tried spinning up Pop!_OS. I ran sudo apt-get install nvidia-driver-515 and it installed 535 instead. I then tried to download the .run installer directly but couldn’t get it installed due to this error: Unable to load the kernel module ‘nvidia.ko’
535bugreport.log (1.9 MB)
I was able to install 470, but I still get the same issue: no devices were found.
470bugreport.log (1.5 MB)

Fresh install of Ubuntu 22.04.3 LTS (GNU/Linux 6.2.0-34-generic x86_64):

$ nvidia-smi
No devices were found
$ modinfo /usr/lib/modules/6.2.0-34-generic/updates/dkms/nvidia.ko | grep ^version
version:        470.199.02
$ nvidia-smi
No devices were found
$ modinfo /usr/lib/modules/6.2.0-34-generic/updates/dkms/nvidia.ko | grep ^version
version:        525.125.06
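Since apt does not always install the branch that was asked for, it can help to compare the built module's major version against the one you expected. A rough sketch, with hypothetical helper names, that reads `modinfo` output on stdin:

```shell
#!/bin/sh
# Sketch: extract the driver branch (major version) from modinfo output on stdin.
driver_branch() {
    sed -n 's/^version:[[:space:]]*//p' | cut -d. -f1
}

# Compare the branch found on stdin with the expected one given as $1.
check_branch() {
    actual=$(driver_branch)
    if [ "$actual" = "$1" ]; then
        echo "ok: module is branch $actual"
    else
        echo "mismatch: wanted $1, module is $actual"
        return 1
    fi
}
```

Usage, assuming the DKMS module lives under the running kernel's directory as in the paths above: `modinfo /usr/lib/modules/"$(uname -r)"/updates/dkms/nvidia.ko | check_branch 515`.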

nvidia-bug-report.log (1.4 MB)

I also tried the original OS and kernel that it was working on before this issue began: Ubuntu 20.04.3 LTS (GNU/Linux 5.4.0-164-generic x86_64)

$ modinfo /usr/lib/modules/5.4.0-164-generic/updates/dkms/nvidia.ko | grep ^version
version:        470.199.02

When this was last working for me, I used sudo apt install --no-install-recommends nvidia-cuda-toolkit nvidia-headless-460 nvidia-utils-460 libnvidia-encode-460
Now when I use this, it installs 470 automatically and nvidia-smi does not work…

nvidia-bug-report.log (1.3 MB)

I got it working on baremetal so it must be something with ESXi passthrough…

Things I’ve tried:

  • Enable IOMMU
  • Advanced configuration:
    • hypervisor.cpuid.v0=FALSE
    • pciPassthru0.msiEnabled=FALSE
  • Updated to ESXi 8u2
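To double-check that the advanced configuration entries listed above actually ended up in the VM's .vmx file, something like the following could be run against a copy of that file. It is a sketch only: the helper name is made up, the file path is a placeholder, and it assumes the usual `key = "value"` layout of .vmx files.

```shell
#!/bin/sh
# Sketch: print the value of one key from a .vmx file (lines look like: key = "value").
# $1 = key name (matched case-insensitively), $2 = path to the .vmx file.
vmx_get() {
    awk -F' *= *' -v k="$1" '
        tolower($1) == tolower(k) { gsub(/"/, "", $2); print $2 }
    ' "$2"
}

# Example (placeholder path):
#   vmx_get hypervisor.cpuid.v0 /path/to/vm.vmx
#   vmx_get pciPassthru0.msiEnabled /path/to/vm.vmx
```

An empty result means the key is not present in the file, which would suggest the setting was never saved for that VM.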