Hi,
We have a system with two GV100GL GPUs (Tesla V100 PCIe 16GB) running VMware ESXi 6.7. In the hypervisor, we configured the GPUs for “PCI Passthrough” and assigned one of the cards to a VM running Ubuntu 18.04 LTS. Inside the VM, the card is recognized:
# lspci | grep NVIDIA
13:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe 16GB] (rev a1)
I downloaded the driver NVIDIA-Linux-x86_64-418.43.run and installed it like this:
# ./NVIDIA-Linux-x86_64-418.43.run --no-opengl-files --dkms -s
At the end of that process, I see the following error:
ERROR: Unable to load the 'nvidia-drm' kernel module.
ERROR: Installation has failed. Please see the file '/var/log/nvidia-installer.log' for details. You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.
The log file contains nothing about the issue beyond these two lines.
dmesg shows additional information, but I do not understand what it means and cannot find anything about it online:
[ 291.353568] nvidia-nvlink: Nvlink Core is being initialized, major device number 243
[ 291.354057] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR1 is 0M @ 0x0 (PCI:0000:13:00.0)
[ 291.354058] NVRM: The system BIOS may have misconfigured your GPU.
[ 291.354062] nvidia: probe of 0000:13:00.0 failed with error -1
[ 291.354076] NVRM: The NVIDIA probe routine failed for 1 device(s).
[ 291.354076] NVRM: None of the NVIDIA graphics adapters were initialized!
[ 291.354210] nvidia-nvlink: Unregistered the Nvlink Core, major device number 243
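To cross-check the "BAR1 is 0M @ 0x0" message from inside the guest, the address ranges the kernel assigned to the device can be read from sysfs. The following is only a minimal sketch: the function name `bar_sizes` is made up for illustration, and it assumes the usual Linux layout of the sysfs `resource` file, where each line holds a start address, an end address, and flags as hexadecimal values. An all-zero line means no range was assigned to that slot (this can also be an unused slot or the upper half of a 64-bit BAR, so it is not proof of a fault on its own).

```shell
# Hypothetical helper: print the size of each PCI region listed in a
# sysfs "resource" file (one "start end flags" line per region, in hex).
bar_sizes() {
    res_file="$1"
    idx=0
    while read -r start end flags; do
        if [ $((start)) -eq 0 ] && [ $((end)) -eq 0 ]; then
            # No address range was assigned to this slot.
            echo "region $idx: unassigned"
        else
            # The range is inclusive, hence the +1.
            size=$(( end - start + 1 ))
            echo "region $idx: $size bytes ($(( size / 1048576 )) MiB)"
        fi
        idx=$((idx + 1))
    done < "$res_file"
}
```

For the card in this post it could be called as `bar_sizes /sys/bus/pci/devices/0000:13:00.0/resource`, using the PCI address from the dmesg output above.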
I could not find anything online that matches this virtualization setup and issue.
Please assist.
BR,
Marc
nvidia-bug-report.log.gz (46.8 KB)
nvidia-installer.log (2.2 KB)