Unable to load the 'nvidia-drm' kernel module on CentOS 7

I want to set up a Tesla K80 GPU on a Dell T7500 workstation (144 GB RAM, CentOS 7 64-bit). When I install the driver NVIDIA-Linux-x86_64-384.145.run, it shows “ERROR: Unable to load the ‘nvidia-drm’ kernel module”.
Here is what I did:
Compared the running kernel version.
sudo systemctl isolate multi-user.target
yum -y install dkms
In /etc/modprobe.d/blacklist.conf:
    blacklist nouveau
    nouveau modeset=0
In /etc/default/grub (kernel command line):
    rd.driver.blacklist=nouveau nouveau.modeset=0
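
For reference, the nouveau blacklist steps listed above are commonly written along the following lines. This is only a sketch under assumptions, not something taken from the poster's system: the file name blacklist-nouveau.conf and the helper name write_nouveau_blacklist are made up for illustration, and the grub.cfg path assumes a legacy BIOS boot. Note that modprobe.d files normally expect the "options" keyword in front of "nouveau modeset=0".

```shell
#!/bin/sh
# Hypothetical helper: write a modprobe.d file that blacklists nouveau.
# The target path is a parameter so the function can be tried on a scratch file first.
write_nouveau_blacklist() {
    conf="${1:-/etc/modprobe.d/blacklist-nouveau.conf}"
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' > "$conf"
}

# Typical follow-up on CentOS 7, run as root. These are comments on purpose,
# so that sourcing this file changes nothing on the system:
#   write_nouveau_blacklist
#   dracut --force                            # rebuild the initramfs without nouveau
#   grub2-mkconfig -o /boot/grub2/grub.cfg    # apply edits made to /etc/default/grub (BIOS boot assumed)
#   reboot
#   lsmod | grep nouveau                      # no output means nouveau is not loaded
```

If lsmod still lists nouveau after the reboot, the NVIDIA kernel modules are unlikely to load, so that is worth ruling out first.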

Please help me figure out the problem. Thanks.

nvidia-installer.log (8.99 KB)

Please run nvidia-bug-report.sh as root and attach the resulting .gz file to your post. Hovering the mouse over an existing post of yours will reveal a paperclip icon.
https://devtalk.nvidia.com/default/topic/1043347/announcements/attaching-files-to-forum-topics-posts/

Hello Generix,
I ran the command as you suggested, but the directory it looks for does not exist. The result is below. I have also attached the log file from the NVIDIA driver installation.
[wge@NCU0124369 Downloads]$ sudo sh nvidia-bug-report.sh
[sudo] password for wge:

nvidia-bug-report.sh will now collect information about your
system and create the file 'nvidia-bug-report.log.gz' in the current
directory. It may take several seconds to run. In some
cases, it may hang trying to capture data generated dynamically
by the Linux kernel and/or the NVIDIA kernel module. While
the bug report log file will be incomplete if this happens, it
may still contain enough data to diagnose your problem.

Please include the 'nvidia-bug-report.log.gz' log file when reporting
your bug via the NVIDIA Linux forum (see devtalk.nvidia.com)
or by sending email to 'linux-bugs@nvidia.com'.

Running nvidia-bug-report.sh... ls: cannot access /proc/driver/nvidia/./gpus/: No such file or directory

If the bug report script hangs after this point consider running with
--safe-mode and --extra-system-data command line arguments.

complete.

[wge@NCU0124369 Downloads]$ dir /proc/driver/
nvram rtc

nvidia-installer.log (8.99 KB)

From the installer log:

WARNING: The NVIDIA Quadro NVS 420 GPU installed in this system is supported through the NVIDIA 340.xx legacy Linux graphics drivers.

So while the Tesla would work with the 384 driver, your graphics card won’t.

Thanks for pointing that out. This computer has a separate video card for the monitor. Maybe I have to remove that card before installing the NVIDIA driver. I found the output of nvidia-bug-report.sh and have attached it.
nvidia-bug-report.log (563 KB)

I removed the video card, rebooted the computer, and tried to install the NVIDIA driver again. It still shows “ERROR: Unable to load the ‘nvidia-drm’ kernel module”.
nvidia-installer.log (8.68 KB)

Make sure that secure boot is disabled.
Please run nvidia-bug-report.sh as root and attach the resulting nvidia-bug-report.log.gz file.
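
One way to check the Secure Boot state from the running system is the mokutil tool, if it is installed. The sketch below is an illustration only: the helper name secure_boot_enabled is hypothetical, and it assumes mokutil prints "SecureBoot enabled" when Secure Boot is on. On a machine booted in legacy BIOS mode, mokutil reports that EFI variables are not supported, which the helper treats as "not enabled".

```shell
#!/bin/sh
# Hypothetical helper: read the output of `mokutil --sb-state` on stdin and
# succeed (exit 0) only if it reports Secure Boot as enabled.
secure_boot_enabled() {
    grep -q 'SecureBoot enabled'
}

# Usage on the affected machine:
#   if mokutil --sb-state 2>&1 | secure_boot_enabled; then
#       echo "Secure Boot is on: unsigned kernel modules may be rejected"
#   else
#       echo "Secure Boot is off or not supported on this system"
#   fi
```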

Here is the new report from nvidia-bug-report.sh.
nvidia-bug-report.log.gz (72.4 KB)
nvidia-installer.log (8.68 KB)

The kernel fails to properly assign resources to the Teslas:

[    1.530702] pci 0000:06:00.0: BAR 1: no space for [mem size 0x400000000 64bit pref]
[    1.530704] pci 0000:06:00.0: BAR 1: trying firmware assignment [mem 0xf490000000000000-0xf4900003ffffffff 64bit pref]
[    1.530707] pci 0000:06:00.0: BAR 1: [mem 0xf490000000000000-0xf4900003ffffffff 64bit pref] conflicts with PCI mem [mem 0x00000000-0xffffffffff]
[    1.530708] pci 0000:06:00.0: BAR 1: failed to assign [mem size 0x400000000 64bit pref]
[    1.530711] pci 0000:06:00.0: BAR 3: no space for [mem size 0x02000000 64bit pref]
[    1.530713] pci 0000:06:00.0: BAR 3: trying firmware assignment [mem 0xdcfc00000000-0xdcfc01ffffff 64bit pref]
[    1.530715] pci 0000:06:00.0: BAR 3: [mem 0xdcfc00000000-0xdcfc01ffffff 64bit pref] conflicts with PCI mem [mem 0x00000000-0xffffffffff]
[    1.530716] pci 0000:06:00.0: BAR 3: failed to assign [mem size 0x02000000 64bit pref]
[    1.530732] pci 0000:07:00.0: BAR 1: no space for [mem size 0x400000000 64bit pref]
[    1.530734] pci 0000:07:00.0: BAR 1: trying firmware assignment [mem 0xf4a0000000000000-0xf4a00003ffffffff 64bit pref]
[    1.530736] pci 0000:07:00.0: BAR 1: [mem 0xf4a0000000000000-0xf4a00003ffffffff 64bit pref] conflicts with PCI mem [mem 0x00000000-0xffffffffff]
[    1.530737] pci 0000:07:00.0: BAR 1: failed to assign [mem size 0x400000000 64bit pref]
[    1.530740] pci 0000:07:00.0: BAR 3: no space for [mem size 0x02000000 64bit pref]
[    1.530742] pci 0000:07:00.0: BAR 3: trying firmware assignment [mem 0xccfc00000000-0xccfc01ffffff 64bit pref]
[    1.530744] pci 0000:07:00.0: BAR 3: [mem 0xccfc00000000-0xccfc01ffffff 64bit pref] conflicts with PCI mem [mem 0x00000000-0xffffffffff]
[    1.530745] pci 0000:07:00.0: BAR 3: failed to assign [mem size 0x02000000 64bit pref]

Please try the kernel parameter
pci=realloc
Otherwise, check if it works with a different kernel.
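
To see at a glance which devices are affected, the failed assignments can be filtered out of the kernel log. The sketch below is a rough aid with a hypothetical helper name (list_bar_failures), and it assumes the log lines have the same shape as the excerpt above. For scale, the BAR 1 size 0x400000000 is 16 GiB and the BAR 3 size 0x02000000 is 32 MiB.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print one line per
# failed BAR assignment in the form "<pci address> BAR <n> <size>".
list_bar_failures() {
    sed -n 's/.*pci \([0-9a-f:.]*\): BAR \([0-9]*\): failed to assign \[mem size \(0x[0-9a-f]*\).*/\1 BAR \2 \3/p'
}

# Usage on the affected machine:
#   dmesg | list_bar_failures
# An empty result after a reboot would suggest the kernel managed to place
# all BARs.
```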

See this article for a BIOS setting that is probably needed:
https://www.dell.com/support/article/de/de/debsdt1/sln289189/precision-workstations-with-multiple-graphic-cards-or-pcie-cards-may-not-finish-post?lang=en

Hello generix,
Thanks for your suggestion. My computer is a T7500, and its BIOS configuration differs from the one described in the link you provided. I have three kernels on this computer:
kernel-devel-3.10.0-957.el7.x86_64
kernel-devel-3.10.0-957.1.3.el7.x86_64
kernel-devel-3.10.0-957.5.1.el7.x86_64
The last one is the default running kernel. I tried installing the NVIDIA driver under this kernel and under 3.10.0-957.1.3.el7.x86_64. Both attempts failed with “Unable to load the ‘nvidia-drm’ kernel module”.
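
Since the installer builds its modules against the kernel-devel package of the running kernel, it can help to confirm that the two match exactly. The following is a minimal sketch with a hypothetical helper name (has_matching_kernel_devel). It assumes the package list comes from "rpm -q kernel-devel" and the release string from "uname -r".

```shell
#!/bin/sh
# Hypothetical helper: succeed if the list of installed kernel-devel packages
# (stdin, one per line, as printed by `rpm -q kernel-devel`) contains the
# package for the given kernel release (as printed by `uname -r`).
has_matching_kernel_devel() {
    grep -qxF "kernel-devel-$1"
}

# Usage on the affected machine:
#   rpm -q kernel-devel | has_matching_kernel_devel "$(uname -r)" \
#       || echo "kernel-devel-$(uname -r) is not installed"
```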

Did you try the pci=realloc kernel parameter?
Otherwise, please check whether your mainboard at least supports a single Tesla by removing the other one.

I have not tried the pci=realloc kernel parameter yet. Could you tell me how to do that?

I removed the video card used for the monitor and installed the Tesla K80. It still has the same issue.

See this for how to add a kernel parameter on CentOS:
https://www.thegeekdiary.com/centos-rhel-7-how-to-modify-the-kernel-command-line/
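
As a short illustration of the same idea, CentOS 7 ships the grubby tool, which can append a parameter to the existing boot entries. The sketch below is one possible approach, not necessarily the method described at the link. The helper name cmdline_has_param is hypothetical and only checks whether a parameter is present on the kernel command line after the reboot.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a kernel command line (read from a file,
# /proc/cmdline by default) contains the given parameter as a whole word.
cmdline_has_param() {
    param="$1"
    file="${2:-/proc/cmdline}"
    tr ' ' '\n' < "$file" | grep -qxF -- "$param"
}

# Run as root on the affected machine. These are comments on purpose, so that
# sourcing this file changes nothing on the system:
#   grubby --update-kernel=ALL --args="pci=realloc"   # add the parameter to every boot entry
#   reboot
#   cmdline_has_param pci=realloc && echo "pci=realloc is active"
```

Checking /proc/cmdline after the reboot confirms that the parameter actually reached the kernel, which is worth doing before re-running the driver installer.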

I added the kernel parameter following the instructions at https://www.thegeekdiary.com/centos-rhel-7-how-to-modify-the-kernel-command-line/. The driver installation still shows “Unable to load the ‘nvidia-drm’ kernel module”.

You’re running a very old BIOS; please update it to the latest version.

You are right, the BIOS is very old. But I installed the Tesla driver on the same computer months ago without any issue. I will try updating the BIOS. Thanks.

Which OS or kernel were you running when the Teslas used to work?

CentOS 7; the running kernel was 3.10.0-862.14.4.el7.x86_64.

You could try kernel-lt from ELRepo. In any case, you should report this to CentOS/Red Hat.