Issues loading driver on VMware virtualised Ubuntu 18.04

marc.richter · March 15, 2019, 2:57pm

Hi,

We have a system with two GV100GL (Tesla V100 PCIe 16GB) cards running VMware ESXi 6.7. In the hypervisor, we configured the GPU for “PCI Passthrough” and assigned one of the cards to a VM running Ubuntu 18.04 LTS. Inside that VM, the card is recognized:

# lspci | grep NVIDIA
13:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe 16GB] (rev a1)

I downloaded the driver NVIDIA-Linux-x86_64-418.43.run and installed it like this:

# ./NVIDIA-Linux-x86_64-418.43.run --no-opengl-files --dkms -s

At the end of that process, I see the following error:

ERROR: Unable to load the 'nvidia-drm' kernel module.
ERROR: Installation has failed.  Please see the file '/var/log/nvidia-installer.log' for details.  You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.

That log file contains nothing about the issue beyond these two lines.

dmesg has additional information, but I don't understand what it means and can't find anything about it online:

[  291.353568] nvidia-nvlink: Nvlink Core is being initialized, major device number 243
[  291.354057] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
               NVRM: BAR1 is 0M @ 0x0 (PCI:0000:13:00.0)
[  291.354058] NVRM: The system BIOS may have misconfigured your GPU.
[  291.354062] nvidia: probe of 0000:13:00.0 failed with error -1
[  291.354076] NVRM: The NVIDIA probe routine failed for 1 device(s).
[  291.354076] NVRM: None of the NVIDIA graphics adapters were initialized!
[  291.354210] nvidia-nvlink: Unregistered the Nvlink Core, major device number 243

I could not find anything online that matches this virtualization setup and issue.
Please assist.

BR,
Marc
nvidia-bug-report.log.gz (46.8 KB)
nvidia-installer.log (2.2 KB)

generix · March 15, 2019, 3:34pm

I don’t think this is virtualization-specific; the same problem has been reported several times recently for bare-metal installs. At some point, the kernel introduced a bug in resource allocation:

[    0.274904] pci 0000:13:00.0: BAR 1: no space for [mem size 0x400000000 64bit pref]
[    0.274956] pci 0000:13:00.0: BAR 1: failed to assign [mem size 0x400000000 64bit pref]

In your case, the kernel is trying to map a 16GB region for BAR 1 (size 0x400000000) and fails to find space for it.
The reason and circumstances are unknown, at least to me. All you can do is try upgrading or downgrading the kernel.
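
To check whether a given kernel shows the same allocation failure, the boot log can be filtered for the two BAR messages quoted above, and the reported size converted to GiB. This is only a minimal sketch: the helper names bar_failures and bar_size_gib are made up for illustration, and the pattern is taken from the two log lines in this thread, so other kernel versions may word the message differently.

```shell
# Hypothetical helpers (names invented for this example).

# Keep only the kernel messages that report a failed BAR assignment.
# Reads log text on stdin, e.g.:  dmesg | bar_failures
bar_failures() {
    grep -E 'BAR [0-9]+: (no space for|failed to assign)'
}

# Convert a BAR size as printed by the kernel (hex, in bytes) to GiB.
bar_size_gib() {
    echo $(( $1 / 1024 / 1024 / 1024 ))
}

# Size taken from the log lines above.
bar_size_gib 0x400000000
```

Running `dmesg | bar_failures` after booting a different kernel shows whether the BAR 1 assignment still fails; if it prints nothing, the allocation messages are gone and loading the driver is worth retrying.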

generix · March 15, 2019, 3:36pm

NB: On an Ubuntu system, you shouldn’t use the .run installer; instead, add the Ubuntu graphics PPA and install the driver from there.
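
A rough sketch of what that could look like on Ubuntu 18.04 follows. The package name nvidia-driver-418 is an assumption chosen to match the 418 series mentioned above, so check what the PPA actually offers before installing. The function name is made up for this example, and the steps are wrapped in a function so that nothing runs until it is called explicitly.

```shell
# Hypothetical wrapper (name invented for this example); run as a user with sudo rights.
install_nvidia_from_ppa() {
    # Remove the driver previously installed with the .run file, if present.
    if command -v nvidia-uninstall >/dev/null 2>&1; then
        sudo nvidia-uninstall
    fi

    # Add the Ubuntu graphics drivers PPA and refresh the package lists.
    sudo add-apt-repository -y ppa:graphics-drivers/ppa
    sudo apt-get update

    # Assumed package name for the 418 series; verify with:
    #   apt-cache search nvidia-driver
    sudo apt-get install -y nvidia-driver-418
}

# Call it explicitly when ready:
# install_nvidia_from_ppa
```

Note that this only changes how the driver is installed. If the kernel still cannot assign BAR 1, the module will fail to load in the same way.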
