Nvidia-smi "No devices were found" - VMWare ESXI Ubuntu Server 20.04.03 with RTX3070

It seems to work on workstation-class cards:


A4000 + Ubuntu 20 + ESXi 7.0U3

A5000 + Ubuntu 18 + ESXi 7.0U3

Hello, I ran into the same problem using a 3080 on Ubuntu 20.04.4 under ESXi 6.7 U2. The NVIDIA driver is 470.57.01. I want to use Ubuntu 20.04.3 and nvidia-driver-460. Do you have the installation packages for ubuntu 20.04.3.iso and nvidia-driver-460? Could you email them to me (qq:1210586191)?

Hi,

I have just the same problem with my 1650. Trying right now with the 460 driver.

Update: the 460 driver works (disable nouveau first). I hope there will be a fix soon for the newer drivers.

Any updates?
No driver above 470.57 works with ESXi → Ubuntu and my RTX 30-series card.
Windows VMs have no problems.
I need the new drivers (5XX).

I have the same issue and need working 510 drivers for ESXi.

What changes were made in 470.86 because 470.82.01 works but 470.86 upwards doesn’t?

I’m just here to throw some wood on the fire.

Guess what? The beta release of 515.43.04 open source driver works with Ampere cards, but the proprietary driver shipped in the same installer does not. Hmmmmmmmmmmmmmmmmmmm

I am in the same position as others in the thread, trying to use a 3090 in ESXi 7.0U3c with DirectPath I/O (PCIe passthrough). I confirm that the 470.82 drivers were the last proprietary drivers to work until 515.43.04/kernel-open. I also want to reiterate that the proprietary kernel driver 515.43.04/kernel still fails with the same errors as discussed in this thread.
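The version boundaries reported in this post can be captured in a small shell helper. This is only a sketch: the function names are hypothetical, the cut-offs (470.82.01 as the last working proprietary release, 515.43.04 as the first working open kernel module) come from posts in this thread and not from NVIDIA documentation, other posters report trouble with releases as old as 470.57.01, and it relies on sort -V from GNU coreutils.

```shell
# version_le A B: succeed if version A <= version B (needs GNU sort -V).
version_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# thread_report FLAVOR VERSION
# FLAVOR is "proprietary" or "open". Prints what posters in this thread
# reported for ESXi passthrough: reported-working, reported-broken, unknown.
# The cut-off versions are assumptions taken from the thread.
thread_report() {
    flavor=$1
    version=$2
    case $flavor in
        proprietary)
            if version_le "$version" 470.82.01; then
                echo reported-working
            else
                echo reported-broken
            fi ;;
        open)
            if version_le 515.43.04 "$version"; then
                echo reported-working
            else
                echo unknown
            fi ;;
        *)
            echo unknown ;;
    esac
}
```

For example, thread_report proprietary 470.86 prints reported-broken, and thread_report open 525.89.02 prints reported-working.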

Per the instructions here, install with: ./NVIDIA-Linux-x86_64-515.43.04.run -m=kernel-open and make sure to include options nvidia NVreg_OpenRmEnableUnsupportedGpus=1 in /etc/modprobe.d/nvidia.conf (modprobe only reads files in that directory whose names end in .conf).

I lowkey suspect this whole “bug” is making it difficult to do this by design, and whatever chicanery NVidia is up to in the proprietary driver isn’t gonna fly on open source. Hooray for open source!

$ dmesg | grep vmware
[ 0.000000] vmware: TSC freq read from hypervisor : 2992.968 MHz
[ 0.000000] vmware: Host bus clock speed read from hypervisor : 66000000 Hz
[ 0.000000] vmware: using clock offset of 18001067827 ns
[ 5.414545] systemd[1]: Detected virtualization vmware.

$ nvidia-smi
Thu May 19 16:38:19 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.43.04    Driver Version: 515.43.04    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:03:00.0 Off |                  N/A |
| 30%   38C    P8    35W / 350W |      5MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2942      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+


Had the same issue running an RTX 3060 Ti with Ubuntu 20.04 on ESXi. Hardware is an Asus X370 Crosshair with a 3700X. It was working fine, then I updated and no longer had a GPU. As of 8/15/22 I tried updating to the latest ESXi patch (I was on 7.0 Update 1 and went to Update 3, U3f): no change. Tried 22.04 and Manjaro with the preconfigured 515.65.01 drivers: no luck. I was able to downgrade to 460.84 in Manjaro and get that to work. I can also confirm that the open-source driver 515.65.01 does work on Manjaro with kernel 5.4.210-1, using the manual install path from the guide https://linuxconfig.org/how-to-install-the-nvidia-drivers-on-manjaro-linux and -m=kernel-open with the .run download. I then added options nvidia NVreg_OpenRmEnableUnsupportedGpus=1 in /etc/modprobe.d/nvidia.conf,
rebooted, and now have working nvidia-smi output.

Hi! Could you explain how you installed the (latest) open-source drivers on an Ubuntu server? Thanks!

So this is the full solution: use NVIDIA's open-source drivers. Here is how to do it (on my Ubuntu server).
First disable nouveau and enable unsupported GPUs for the open-source drivers:
1. Go to /etc/modprobe.d/
2. Create a file: blacklist-nvidia-nouveau.conf
3. Put this in the file:
blacklist nouveau
options nouveau modeset=0
4. Create another file, nvidia.conf, and put this in it: options nvidia NVreg_OpenRmEnableUnsupportedGpus=1
5. Update the kernel initramfs: sudo update-initramfs -u
6. Reboot
7. Go to the NVIDIA site and open the download page for the driver you want.
8. Copy the URL of the download button and paste it after wget to download the file to the current folder.
9. Make the file executable: sudo chmod 700 filename.run
10. Run the installer: sudo ./filename.run
a. For open source: sudo ./filename.run -m=kernel-open
b. Watch out! Make sure the nvidia.conf file exists in the modprobe.d folder and that you have run update-initramfs and rebooted. Only then will the GTX/RTX/Quadro cards work!
11. After the installation, reboot the server.
12. Test with nvidia-smi
13. Great success!
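Step 10b is easy to get wrong, so a quick pre-flight check before running the installer can help. The following is a minimal sketch, not part of the original instructions: check_modprobe_dir is a hypothetical helper name, and it only looks for the two settings from steps 3 and 4 in the *.conf files of a directory (default /etc/modprobe.d). It does not verify that update-initramfs was run or that the machine was rebooted.

```shell
# check_modprobe_dir [DIR]: report whether the nouveau blacklist and the
# NVreg_OpenRmEnableUnsupportedGpus option are present in DIR/*.conf.
# Returns 0 only if both settings are found.
check_modprobe_dir() {
    dir=${1:-/etc/modprobe.d}
    status=0
    if grep -qsE '^[[:space:]]*blacklist[[:space:]]+nouveau' "$dir"/*.conf; then
        echo "ok: nouveau is blacklisted"
    else
        echo "missing: blacklist nouveau"
        status=1
    fi
    if grep -qsE '^[[:space:]]*options[[:space:]]+nvidia[[:space:]].*NVreg_OpenRmEnableUnsupportedGpus=1' "$dir"/*.conf; then
        echo "ok: NVreg_OpenRmEnableUnsupportedGpus=1 is set"
    else
        echo "missing: options nvidia NVreg_OpenRmEnableUnsupportedGpus=1"
        status=1
    fi
    return "$status"
}
```

Calling check_modprobe_dir with no argument checks /etc/modprobe.d; passing another directory makes it easy to try out on a scratch copy first.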


Thanks for the excellent answer. blacklist-nvidia-nouveau.conf and sudo update-initramfs -u are needed on Ubuntu 22.04, but not on 20.04.

Successfully got it running on Ubuntu 22.04 using the following commands:

# The files under /etc/modprobe.d are root-owned, so write them via sudo tee.
sudo tee /etc/modprobe.d/blacklist-nvidia-nouveau.conf > /dev/null << EOF
blacklist nouveau
options nouveau modeset=0
EOF
sudo tee /etc/modprobe.d/nvidia.conf > /dev/null << EOF
options nvidia NVreg_OpenRmEnableUnsupportedGpus=1
EOF
# Rebuild the initramfs so the nouveau blacklist applies at boot.
sudo update-initramfs -u
sudo apt install axel
sudo reboot
# After the reboot: download the installer and the build prerequisites.
axel -n 2 https://download.nvidia.com/XFree86/Linux-x86_64/525.89.02/NVIDIA-Linux-x86_64-525.89.02.run
chmod u+x NVIDIA-Linux-x86_64-525.89.02.run
sudo apt install build-essential
sudo apt install pkg-config libglvnd-dev
# The installer needs root; -m=kernel-open selects the open kernel module.
sudo ./NVIDIA-Linux-x86_64-525.89.02.run -m=kernel-open
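Because the .run installer compiles a kernel module, it can be worth confirming that the build tools are on the PATH before starting it. This is a small sketch under assumptions: missing_tools is a hypothetical helper, and the list of tools to check (gcc, make, pkg-config) is inferred from the packages installed above, not taken from NVIDIA's requirements.

```shell
# missing_tools TOOL...: print the name of every TOOL that is not on PATH.
# Prints nothing when all of them are available.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

For example, missing_tools gcc make pkg-config prints nothing once build-essential and pkg-config are installed.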

It was a major pain to get the drivers to work in the first place, but I eventually got them “working.” However, I cannot get the GPU fans to spin, and I’m wondering if anyone else has run into this issue.
I'm running an RTX 4080 in ESXi with PCIe passthrough to an Ubuntu Server VM.

  • CPU - EPYC 7302P
  • Motherboard - Supermicro MBDH12SSLNTO

nvidia-smi
Thu Apr 13 20:05:21 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.89.02    Driver Version: 525.89.02    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:0B:00.0 Off |                  N/A |
|ERR!   28C    P0    32W / 320W |      1MiB / 16376MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

This was installed with the open-source kernel module. With the proprietary kernel module, the GPU was never recognized.

cat /proc/driver/nvidia/version

NVRM version: NVIDIA UNIX Open Kernel Module for x86_64  525.89.02  Release Build  (dvs-builder@U16-F11-34-6)  Wed Feb  1 23:19:51 UTC 2023
GCC version:  gcc version 11.3.0 (Ubuntu 11.3.0-1ubuntu1~22.04)
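The NVRM line is a convenient way to tell which kernel module flavor actually loaded. Below is a minimal sketch: nvrm_flavor is a hypothetical helper, and the two strings it matches ("Open Kernel Module" and "Kernel Module") are taken only from the logs quoted in this thread, so other driver releases may word the line differently.

```shell
# nvrm_flavor: read text containing an NVRM version line on stdin and print
# "open", "proprietary", or "unknown". Matching is based on the wording seen
# in this thread's logs (an assumption, not a documented format).
nvrm_flavor() {
    line=$(grep -m 1 'NVIDIA UNIX' || true)
    case $line in
        *"Open Kernel Module"*) echo open ;;
        *"Kernel Module"*)      echo proprietary ;;
        *)                      echo unknown ;;
    esac
}
```

Typical use would be nvrm_flavor < /proc/driver/nvidia/version, or piping dmesg output into it.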

nvcc -V

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Jan__6_16:45:21_PST_2023
Cuda compilation tools, release 12.0, V12.0.140
Build cuda_12.0.r12.0/compiler.32267302_0

It was installed with the following options

cat /etc/modprobe.d/blacklist-nvidia-nouveau.conf

blacklist nouveau
options nouveau modeset=0

cat /etc/modprobe.d/nvidia.conf

options nvidia NVreg_OpenRmEnableUnsupportedGpus=1

The ESXi VM has the following advanced options set:

pciPassthru.use64bitMMIO = TRUE
pciPassthru.64bitMMIOSizeGB = 64
pciPassthru.msiEnabled = FALSE
hypervisor.cpuid.v0 = FALSE
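To compare these settings across VMs, the values can be read back from the VM's configuration file. This is a sketch under assumptions: vmx_value is a hypothetical helper, and it assumes the advanced options are stored in the VM's .vmx file as lines of the form key = "value"; the path to that file depends on the datastore and is not given in this thread.

```shell
# vmx_value FILE KEY: print the value of KEY from a .vmx-style file whose
# lines look like: key = "value". Prints nothing if KEY is absent.
# If KEY appears more than once, the last occurrence wins.
vmx_value() {
    awk -v key="$2" -F= '
        {
            k = $1
            gsub(/^[ \t]+|[ \t]+$/, "", k)          # trim the key
            if (k == key) {
                v = substr($0, index($0, "=") + 1)  # everything after "="
                gsub(/^[ \t]*"?|"?[ \t\r]*$/, "", v) # trim spaces and quotes
                found = v
            }
        }
        END { if (found != "") print found }
    ' "$1"
}
```

For example, vmx_value myvm.vmx pciPassthru.64bitMMIOSizeGB would print 64 for the configuration above, where myvm.vmx stands in for the real file name.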

I’m able to run programs that use the GPU, but the GPU fans refuse to spin, as you can see from the ERR! in the nvidia-smi output above. I’ve tried:
  • Spoofing Xorg using various versions of coolgpus, with the modifications needed to get it to run. It always errors out when setting the fan speed, with an Unknown Error.
  • Controlling the fans through IPMI, both with superfans-gpu-controller and manually. This does not work either. FANA, which I’m guessing is the peripheral (GPU) fan, remains at 0 RPM even with manual raw commands and with the IPMI GUI set to full speed.

The GPU fans spin when the VM is OFF. I’m out of ideas on how to fix the fan issue.
nvidia-bug-report.log.gz (495.7 KB)
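While the fans are not under control, it may help to at least watch the fan and temperature readings in a script-friendly form. The sketch below assumes input produced by nvidia-smi --query-gpu=index,fan.speed,temperature.gpu --format=csv,noheader; fan_report is a hypothetical helper, and the exact text nvidia-smi prints for an unreadable fan is an assumption, so anything that is not a plain number is simply flagged.

```shell
# fan_report: read "index, fan.speed, temperature.gpu" CSV lines on stdin
# (no header) and print one summary line per GPU. A fan field that is not
# a plain number is reported as unreadable.
fan_report() {
    awk -F', *' '{
        fan = $2
        sub(/ *%$/, "", fan)    # drop the trailing " %"
        if (fan ~ /^[0-9]+$/)
            printf "gpu %s: fan %s%%, %s C\n", $1, fan, $3
        else
            printf "gpu %s: fan unreadable (%s), %s C\n", $1, $2, $3
    }'
}
```

Usage would be along the lines of: nvidia-smi --query-gpu=index,fan.speed,temperature.gpu --format=csv,noheader | fan_report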

I followed the steps from nvidia1855 and got this error while installing the driver downloaded from NVIDIA.

I'm using an EPYC 75551p and a Supermicro h1ssl-i.

ERROR: Unable to load the kernel module ‘nvidia-modeset.ko’. This happens most frequently when this kernel module was built against the wrong or
improperly configured kernel sources, with a version of gcc that differs from the one used to build the target kernel, or if another driver,
such as nouveau, is present and prevents the NVIDIA kernel module from obtaining ownership of the NVIDIA device(s), or no NVIDIA device
installed in this system is supported by this NVIDIA Linux graphics driver release.

     Please see the log entries 'Kernel module load error' and 'Kernel messages' at the end of the file '/var/log/nvidia-installer.log' for more
     information.
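The error message points at two entries in the installer log. A small helper can pull them out with some context. This is a sketch: installer_log_errors is a hypothetical name, the default path is the one named in the message, and the number of context lines is an arbitrary choice.

```shell
# installer_log_errors [FILE] [LINES]: print the 'Kernel module load error'
# and 'Kernel messages' entries from the installer log, each followed by
# LINES lines of context (default 10), with line numbers.
installer_log_errors() {
    grep -n -A "${2:-10}" \
        -e 'Kernel module load error' \
        -e 'Kernel messages' \
        "${1:-/var/log/nvidia-installer.log}"
}
```

Running installer_log_errors with no arguments reads /var/log/nvidia-installer.log, which may require sudo depending on the file's permissions.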

Any idea how to fix this and pass an RTX 3060 through to an Ubuntu Linux VM?


So the latest driver from Ubuntu 22.04 for the RTX 3060 won't work at all on VMware ESXi?

Same issue here: Ubuntu 22.04 with an RTX 3060 on VMware ESXi is not working.

How did you get that working? I have an RTX 3060 with Ubuntu 22.04 on VMware ESXi but can't make it work :(

I followed all of that and I'm getting this:

nvidia-smi

NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Has anyone got this working on any Linux VM under ESXi? A Windows VM works fine!

Hi, I am having issues getting my Quadro P2200 working via GPU passthrough on ESXi 7.0 U3n.

I have built a fresh Ubuntu 22.04 VM,
blacklisted the nouveau driver with a config file in /etc/modprobe.d,
created the nvidia config as mentioned above,
and installed the driver via sudo apt install nvidia-driver-535.
It seems to work, but after rebooting the VM it stops working and the errors below appear in dmesg.
I tried an older driver, which seems to show my card in /dev/dri/,
but I get “No devices were found” when I run nvidia-smi.

[    1.309793] nvidia: loading out-of-tree module taints kernel.
[    1.309814] nvidia: module license 'NVIDIA' taints kernel.
[    1.309831] Disabling lock debugging due to kernel taint
[    1.326807] nvidia: unknown parameter 'NVreg_OpenRmEnableUnsupportedGpus' ignored
[    1.447568] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  470.199.02  Thu May 11 11:46:56 UTC 2023
[    3.690310] nvidia_uvm: module uses symbols from proprietary module nvidia, inheriting taint.
[   24.901054] ACPI Warning: \_SB.PCI0.PE50.S1F0._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20210730/nsarguments-61)
[   25.112573] NVRM: GPU 0000:0b:00.0: RmInitAdapter failed! (0x23:0xffff:1195)
[   25.112634] NVRM: GPU 0000:0b:00.0: rm_init_adapter failed, device minor number 0
[   25.237637] NVRM: GPU 0000:0b:00.0: RmInitAdapter failed! (0x23:0xffff:1195)
[   25.237698] NVRM: GPU 0000:0b:00.0: rm_init_adapter failed, device minor number 0
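Two lines in this log stand out: the module that loaded reports itself as the regular x86_64 Kernel Module 470.199.02 (not the Open Kernel Module), and it ignores NVreg_OpenRmEnableUnsupportedGpus as an unknown parameter. A small filter can flag both symptoms in any kernel log. This is a sketch only: diagnose_nvidia_log is a hypothetical helper, and the patterns are copied from the messages in this thread.

```shell
# diagnose_nvidia_log: read kernel log text on stdin and print one line for
# each of the two symptoms seen in this thread. Prints nothing if neither
# pattern is present.
diagnose_nvidia_log() {
    log=$(cat)
    if printf '%s\n' "$log" |
        grep -q "unknown parameter 'NVreg_OpenRmEnableUnsupportedGpus' ignored"; then
        echo "option-ignored: loaded nvidia module does not know NVreg_OpenRmEnableUnsupportedGpus"
    fi
    if printf '%s\n' "$log" | grep -q 'RmInitAdapter failed'; then
        echo "init-failed: RmInitAdapter failed for at least one GPU"
    fi
}
```

Usage would be along the lines of: sudo dmesg | diagnose_nvidia_log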