Centos 7.7 Installation Tesla v100 graphics card driver failed

jeanClaude · April 10, 2020, 10:20am

I have a CentOS 7.7 VM, hosted on an ESXi host that already has the NVIDIA driver installed, and I want to install NVIDIA driver 440 in the VM. When I try to install the driver in the VM, I get this:
Error: Unable to load the ‘nvidia-drm’ kernel module.

generix · April 10, 2020, 3:23pm

You’ll have to hide the hypervisor.

jeanClaude · April 13, 2020, 1:36pm

Hello! What do you mean? What do I have to do to hide the hypervisor?

generix · April 13, 2020, 2:51pm

Forget about that, Teslas should work without.
Please run nvidia-bug-report.sh as root and attach the resulting nvidia-bug-report.log.gz file to your post. You will have to rename the file ending to something else since the forum software doesn’t accept .gz files (nifty!).

jeanClaude · April 13, 2020, 3:26pm

nvidia-bug-report.log.log (54.3 KB)

generix · April 13, 2020, 3:35pm

You’re running into this:
This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR1 is 0M @ 0x0 (PCI:0000:02:00.0)

Please check your BIOS for an option like “above 4G decoding” or “Large/64bit BARs” and enable it.
What kind of host mainboard/server are you using?
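
For anyone who wants to look for the same symptom in their own logs, here is a minimal sketch that scans kernel log text for BARs reported as 0M @ 0x0. The regular expression is modelled only on the single NVRM line quoted above, and the function name find_unassigned_bars is made up for this example; other driver versions may format the message differently.

```python
import re

# Pattern modelled on the one line quoted in this thread:
#   NVRM: BAR1 is 0M @ 0x0 (PCI:0000:02:00.0)
# Assumption: other BAR messages from the driver follow the same layout.
BAR_LINE = re.compile(
    r"NVRM: BAR(?P<bar>\d+) is (?P<size>\d+)M @ (?P<addr>0x[0-9a-fA-F]+) "
    r"\(PCI:(?P<pci>[0-9a-fA-F:.]+)\)"
)


def find_unassigned_bars(log_text):
    """Return (bar_index, pci_address) for every BAR logged with size 0M at address 0x0."""
    hits = []
    for match in BAR_LINE.finditer(log_text):
        size_mb = int(match.group("size"))
        address = int(match.group("addr"), 16)
        if size_mb == 0 and address == 0:
            hits.append((int(match.group("bar")), match.group("pci")))
    return hits


# Sample text copied from the post above.
sample_log = (
    "NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:\n"
    "NVRM: BAR1 is 0M @ 0x0 (PCI:0000:02:00.0)\n"
)

for bar, pci in find_unassigned_bars(sample_log):
    print("BAR%d of device %s has no memory region assigned" % (bar, pci))
```

In practice the text passed to the function would be the saved output of dmesg or the unpacked nvidia-bug-report.log from inside the VM.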

jeanClaude · April 13, 2020, 3:40pm

Dell PowerEdge R740

generix · April 13, 2020, 3:50pm

Might be tricky:
https://www.dell.com/community/PowerEdge-Hardware-General/Enabling-Memory-Mapped-IO-gt-4GB-has-issues-on-R720/td-p/4468413

jeanClaude · April 13, 2020, 4:03pm

I checked, and above 4G decoding is enabled. The problem isn’t on my host: the host detects the NVIDIA card, I’ve installed the NVIDIA driver on it, and everything is OK there. The problem shows up only in a VM on that host. The VM detects the NVIDIA PCI device, but installing the driver fails.

generix · April 13, 2020, 6:37pm

Then please check if you enabled the correct options for the vm:
https://kb.vmware.com/s/article/2139299
Edit: updated article on that:
https://kb.vmware.com/s/article/2142307

jeanClaude · April 13, 2020, 7:13pm

This is not working. I don’t want to use passthrough. Thanks!

generix · April 13, 2020, 7:30pm

This is also valid for vgpu setups, please see:
https://docs.nvidia.com/grid/latest/grid-vgpu-release-notes-vmware-vsphere/index.html
-> "Requirements for Using C-Series Virtual Compute Server vGPUs"

jeanClaude · April 15, 2020, 3:11pm

Hello! I made the changes you suggested, but it is still not working. If I choose Quadro vDWS or GRID Virtual Application, everything is fine. The problem appears only when I want to use Virtual Compute Server.

generix · April 15, 2020, 5:40pm

Which ESXi version are you running?

jeanClaude · April 15, 2020, 6:03pm

6.7

jeanClaude · April 15, 2020, 6:04pm

Server

• GRID card model(s) = Tesla V100-PCIE-16GB
• Server Brand / Model / Memory per server = Dell PowerEdge R740
• Number of GRID cards and models installed per server = 1 GRID card → Tesla V100-PCIE-16GB
• Virtualization platform / Hypervisor = vSphere 6.7.0, 15160138 Hypervisor
• Patches applied over host hypervisor (if any) = No patches;
• vSGA, vGPU, VDA, DDA, HDX 3D Pro, RemoteFX, Bare Metal or Pass-through = vGPU
• vGPU Manager driver version (vib/rpm) = NVIDIA-VMware_ESXi_6.7_Host_Driver-430.83-1OEM.670.0.0.8169922x86_64.vib installed
• vGPU profile used for each GPU = 1
• Type of Profile used / Number of VMs using each vGPU profile = 1 VM
• DRS Enabled (if part of ESXi cluster) = No

VM

• Display driver version = NVIDIA-Linux-x86_64-430.83-grid.run
• OS / Version = CentOS 7.7
• System Memory = 32 GB
• Number of vCPUs = 16
• Number of displays / Display resolution = -
• Remoting Solution / Method of connecting to VM = ssh
• Version or Release of Remoting Solution = -
• Name of VM having issue (if applicable) = vGPU-AI-ML-01

License Server

• License Manager Software version = 2019.11.0.27609837; Build Number:27609831
• OS / Version = CentOS Linux release 7.7
• VM or physical PC = VM

nvidia-bug-report.log.gz

Guide used for installation: 430.83-432.33-grid-vgpu-user-guide.pdf

Steps for installation:

NVIDIA Virtual GPU Manager Package for vSphere → done
Verifying the Installation of the NVIDIA vGPU Software Package for vSphere → done
Configuring VMware vMotion with vGPU for VMware vSphere → done
Changing the Default Graphics Type in VMware vSphere 6.7 → done
Configuring a vSphere VM with NVIDIA GPU → done
And now the problem:
After configuring the vSphere VM with the GPU, I started the CentOS 7.7 VM. Once the VM had booted, the NVIDIA driver installation failed.
The error: ERROR: Unable to load the ‘nvidia-drm’ kernel module.
On the VM, I have checked and configured the following:
-> The NVIDIA graphics card model is displayed.
-> Disabled the nouveau driver by editing /etc/default/grub: added nouveau.modeset=0 to the line starting with GRUB_CMDLINE_LINUX.
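
The /etc/default/grub edit described above can be sketched roughly as follows. This is only an illustration: the helper name add_kernel_param is made up, and it assumes the GRUB_CMDLINE_LINUX value is a single double-quoted string on one line, as in a default CentOS 7 install.

```python
import re


def add_kernel_param(grub_text, param="nouveau.modeset=0"):
    """Append `param` to the GRUB_CMDLINE_LINUX="..." line unless it is already present."""

    def patch(match):
        args = match.group(2).split()
        if param not in args:
            args.append(param)
        return match.group(1) + '"' + " ".join(args) + '"'

    # Only the exact GRUB_CMDLINE_LINUX= key is matched, not GRUB_CMDLINE_LINUX_DEFAULT=.
    return re.sub(
        r'^(GRUB_CMDLINE_LINUX=)"([^"]*)"', patch, grub_text, flags=re.MULTILINE
    )


# Example input; the existing kernel arguments here are illustrative, not taken from the thread.
example = 'GRUB_TIMEOUT=5\nGRUB_CMDLINE_LINUX="crashkernel=auto rhgb quiet"\n'
print(add_kernel_param(example))
```

The function only transforms text. After the real file is changed, the GRUB configuration still has to be regenerated (for example with grub2-mkconfig on CentOS 7) and the VM rebooted before the parameter takes effect.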

generix · April 15, 2020, 6:50pm

6.7 should handle that by itself.
Looking at the early dmesg output again, you’re booting with CSM (legacy BIOS), not EFI. With legacy BIOS boot, this is not going to work: no 64-bit resources are available. Please properly configure and install your VM with EFI boot.
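
A quick way to confirm how a Linux guest was booted is to check whether the kernel exposes /sys/firmware/efi, which is present only after an EFI boot. A minimal sketch follows; the function name booted_with_efi and its sysfs_root parameter are made up for this example, the parameter being there only so the check can be pointed at a different directory.

```python
import os


def booted_with_efi(sysfs_root="/sys"):
    """Return True if <sysfs_root>/firmware/efi exists, i.e. the kernel was started via EFI."""
    return os.path.isdir(os.path.join(sysfs_root, "firmware", "efi"))


if booted_with_efi():
    print("Guest booted with EFI")
else:
    print("Guest booted with legacy BIOS/CSM (or this is not a Linux system)")
```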

jeanClaude · April 21, 2020, 5:05am

I reinstalled the VM with EFI boot and set firmware = "efi" and pciPassthru.64bitMMIOSizeGB = "128", and now everything is fine. Thanks again!
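
For reference, a small sketch that checks a VM's .vmx text for the two settings mentioned in this post. The helper names parse_vmx and check_vgpu_vm are made up, and several things are assumptions rather than facts from the thread: that keys can be compared case-insensitively, that a missing firmware entry means legacy BIOS, and that 128 is a sensible threshold (it is simply the value that worked here for a single 16 GB V100).

```python
def parse_vmx(text):
    """Parse `key = "value"` lines of a .vmx file into a dict with lower-cased keys."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip().lower()] = value.strip().strip('"')
    return settings


def check_vgpu_vm(settings, min_mmio_gb=128):
    """Return a list of problems; an empty list means both settings look as described."""
    problems = []
    # Assumption: no firmware entry means the VM boots with legacy BIOS.
    if settings.get("firmware", "bios").lower() != "efi":
        problems.append('firmware is not set to "efi"')
    size = settings.get("pcipassthru.64bitmmiosizegb")
    if size is None or not size.isdigit() or int(size) < min_mmio_gb:
        problems.append(
            "pciPassthru.64bitMMIOSizeGB is missing or below %d" % min_mmio_gb
        )
    return problems


example_vmx = 'firmware = "efi"\npciPassthru.64bitMMIOSizeGB = "128"\n'
print(check_vgpu_vm(parse_vmx(example_vmx)) or "settings look fine")
```

The same two entries can also be set through the VM's advanced configuration parameters in the vSphere client instead of editing the file directly.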
