Ubuntu 22.04.3 VM nvidia-smi shows no devices found

Hello,
I’m trying to install the NVIDIA drivers on my Ubuntu Server 22.04.3 build. Attached is my bug-report log; let me know what other info is needed. I had the device working at one point, then moved to passing the GPU to a Docker container, and things stopped working there.

I’ve tried a Windows VM, and passthrough works there without issue, so I know the GPU and passthrough themselves are working.
I also tried an Ubuntu 20.04.6 LTS build and still have the same issue.

Build:
ESXi host
Ubuntu VM 22.04.3 LTS
1070 Ti passed through to VM
Nouveau is blacklisted
Drivers: 535-server, installed using: sudo apt install nvidia-driver-535-server
Now purged with sudo apt purge *nvidia* -y

nvidia-bug-report.log.gz (74.7 KB)

I can see the GPU within my VM and it appears as though the drivers are installed.
sudo lshw -c display

~$ sudo lshw -c display
  *-display                 
       description: VGA compatible controller
       product: SVGA II Adapter
       vendor: VMware
       physical id: f
       bus info: pci@0000:00:0f.0
       logical name: /dev/fb0
       version: 00
       width: 32 bits
       clock: 33MHz
       capabilities: vga_controller bus_master cap_list rom fb
       configuration: depth=32 driver=vmwgfx latency=64 resolution=1176,885
       resources: irq:16 ioport:840(size=16) memory:f0000000-f7ffffff memory:ff000000-ff7fffff memory:c0000-dffff
  *-display
       description: VGA compatible controller
       product: GP104 [GeForce GTX 1070 Ti]
       vendor: NVIDIA Corporation
       physical id: 1e
       bus info: pci@0000:02:05.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress vga_controller bus_master cap_list
       configuration: driver=nvidia latency=64
       resources: irq:18 memory:fd000000-fdffffff memory:c0000000-cfffffff memory:d0000000-d1ffffff ioport:a80(size=128)

sudo dmesg | grep nvidia

$ sudo dmesg | grep nvidia
[    1.643175] nvidia: loading out-of-tree module taints kernel.
[    1.644985] nvidia: module license 'NVIDIA' taints kernel.
[    1.673698] nvidia-nvlink: Nvlink Core is being initialized, major device number 236
[    1.676848] nvidia 0000:02:05.0: enabling device (0000 -> 0003)
[    1.678196] nvidia 0000:02:05.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[    1.871644] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  535.129.03  Thu Oct 19 18:42:12 UTC 2023
[    1.873920] [drm] [nvidia-drm] [GPU ID 0x00000205] Loading driver
[    1.873922] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:02:05.0 on minor 1
[    4.042793] nvidia_uvm: module uses symbols from proprietary module nvidia, inheriting taint.
[    4.054637] nvidia-uvm: Loaded the UVM driver, major device number 234.
[    4.316376] audit: type=1400 audit(1703699386.464:2): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=756 comm="apparmor_parser"
[    4.316380] audit: type=1400 audit(1703699386.464:3): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=756 comm="apparmor_parser"
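
One note on the grep above: it is case-sensitive, so it can miss kernel log lines that the driver writes with an upper-case NVRM: prefix, and those are usually where an initialization problem would be reported. A minimal sketch of a broader filter follows; the function name gpu_log_lines is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: keep only kernel log lines that mention the
# NVIDIA driver, matching case-insensitively so "NVRM:" lines are kept.
gpu_log_lines() {
    grep -iE 'nvrm|nvidia'
}

# Typical use on the VM (root is needed to read the kernel log):
#   sudo dmesg | gpu_log_lines
```

If nvidia-smi reports no devices even though the modules load, any NVRM lines in that output are a good thing to include alongside the bug report.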

Edits: Extra clarity and detail

Friend, may I ask whether you have solved this issue? I am encountering a similar issue. My case:
ESXi 7.0, Ubuntu 20.04, RTX 3080. Passthrough works fine with driver version 460.91.03, but nvidia-smi shows “no devices were found” with driver versions 535.129.03, 535.146.02, 535.154.05, and 550.40.07.

If anyone has ideas on how to fix it, I would appreciate the help. Thanks a lot.

nvidia-bug-report.log.gz (141.9 KB)
This is the bug report.

Please try the nvidia-open driver version and set the kernel parameter nvidia.NVreg_OpenRmEnableUnsupportedGpus=1.
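
For anyone unsure how to apply that parameter, here is a minimal sketch. It assumes the open kernel modules are already installed (on Ubuntu they are packaged separately from the proprietary ones, and the package name depends on the driver branch). The parameter can go on the kernel command line exactly as written above, or equivalently as a modprobe option. The file name nvidia-open.conf and the function name below are arbitrary choices for illustration. Note that the open kernel modules only support Turing and newer GPUs, so this applies to the RTX 3080 and not to the GTX 1070 Ti.

```shell
#!/bin/sh
# Hypothetical helper: write the modprobe option for the nvidia module
# to the file given as $1.
write_open_gpu_option() {
    printf 'options nvidia NVreg_OpenRmEnableUnsupportedGpus=1\n' > "$1"
}

# On the VM this would be run as root, followed by an initramfs rebuild
# and a reboot so the option is seen when the module loads early in boot:
#   write_open_gpu_option /etc/modprobe.d/nvidia-open.conf
#   update-initramfs -u
#   reboot
```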

I ended up moving to a Quadro P1000. @billsen.xu let me know if generix’s answer works for you and I’ll flag that as the solution.

On VMware ESXi 8:

  • pciPassthru.use64bitMMIO = TRUE
  • pciPassthru.64bitMMIOSizeGB = 64 # change this value (64): if your GPU memory is 12 GB, the value must be 12*2 = 24
  • hypervisor.cpuid.v0 = FALSE
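
These settings go into the VM's advanced configuration (the .vmx file) while the VM is powered off. Below is a small sketch of the sizing rule described in this post, twice the GPU memory in GB. The function names are invented for illustration, and the doubling rule is this post's rule of thumb rather than something verified here.

```shell
#!/bin/sh
# Sizing rule from the post: MMIO size in GB = GPU memory in GB * 2.
mmio_size_gb() {
    echo $(( $1 * 2 ))
}

# Print the three advanced settings in .vmx syntax for a GPU with
# $1 GB of memory.
vmx_passthrough_lines() {
    printf 'pciPassthru.use64bitMMIO = "TRUE"\n'
    printf 'pciPassthru.64bitMMIOSizeGB = "%s"\n' "$(mmio_size_gb "$1")"
    printf 'hypervisor.cpuid.v0 = "FALSE"\n'
}

# Example: a 12 GB card, as in the comment above
vmx_passthrough_lines 12
```

By the same rule, the 8 GB GTX 1070 Ti from the original post would get a value of 16.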

#-------------------------------------------------------------

  • sudo apt update
  • sudo apt upgrade
  • sudo apt-get install alsa-utils

Create instagp1.sh and instagp2.sh:

  • sudo nano instagp1.sh # copy and paste the content …
  • sudo nano instagp2.sh # copy and paste the content …

After executing, choose this password: ddd123456

  • chmod +x ./instagp1.sh ./instagp2.sh

# Begin by running only:

  • ./instagp1.sh

# After reboot …

On the blue screen … change the disable_lockdown mode (change state roll…)

Enter the password one character at a time: the system shows an index and you must enter the corresponding character.

Set disable lockdown mode to yes.

---------

# Install part two:

  • ./instagp2.sh

Normally everything then works perfectly.

instagp1.txt (1.3 KB)
instagp2.txt (4.4 KB)

I’ve tried this, but it does not work =(