Unable to load the 'nvidia-drm' kernel module on Ubuntu 16.04

I get the error "Unable to load the 'nvidia-drm' kernel module" when I install the driver from 'NVIDIA-Linux-x86_64-387.34.run'. (I have also tried versions 375 and 384, with the same error.)
My machine is a ThinkPad S5 (with a GTX 1050 Ti) running Ubuntu 16.04. I have disabled Secure Boot in the BIOS.
nvidia-bug-report.log.gz (56.7 KB)
nvidia-installer.log (1.91 KB)

When I run NVIDIA-Linux-x86_64-387.34.run with --no-drm, the installation completes, but nvidia-smi reports:
NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
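
A quick way to narrow this message down is to check whether any nvidia kernel module is loaded at all. The sketch below assumes the usual lsmod output format (module name in the first column); the helper name nvidia_modules is made up for illustration and is not part of any NVIDIA tool.

```shell
# List loaded kernel modules whose name starts with "nvidia".
# Reads lsmod-style text on stdin, so it also works on a saved log.
# (nvidia_modules is a hypothetical helper name.)
nvidia_modules() {
    awk '$1 ~ /^nvidia/ { print $1 }'
}

# Usage on a live system:
#   lsmod | nvidia_modules
# Empty output means no nvidia module is loaded; the kernel log may say why:
#   sudo dmesg | grep -i -E 'nvidia|NVRM'
```

If nothing is printed, nvidia-smi has no driver to talk to, which would match the message above.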

lspci | grep -E 'VGA|3D'
00:02.0 VGA compatible controller: Intel Corporation Device 591b (rev 04)
02:00.0 3D controller: NVIDIA Corporation Device 1c8c (rev a1)

The driver is complaining about resource conflicts. Upgrade your BIOS; it's quite outdated.

I have updated the BIOS to the latest version, but that did not solve my problem.

-> Driver file installation is complete.
-> Installing DKMS kernel module:
-> done.
ERROR: Unable to load the ‘nvidia-drm’ kernel module.
nvidia-bug-report.log.gz (92.4 KB)

OK, the resource conflicts are still there, so you will have to upgrade your kernel to 4.13/4.14. If this is a fresh install, maybe try Ubuntu 17.10 first.

Thanks

I installed Ubuntu 17.10 as you suggested. There are some changes in the 'lspci' output:
lspci | grep -E 'VGA|3D'
00:02.0 VGA compatible controller: Intel Corporation Device 591b (rev 04)
02:00.0 3D controller: NVIDIA Corporation GP107M [GeForce GTX 1050 Ti Mobile] (rev ff)

But the problem remains:
-> Driver file installation is complete.
-> Installing DKMS kernel module:
-> done.
ERROR: Unable to load the ‘nvidia-drm’ kernel module.
ERROR: Installation has failed. Please see the file ‘/var/log/nvidia-installer.log’ for details. You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.

What can I do next? Is this a bug in the kernel or in the driver? This device runs properly under Windows 10 (NVIDIA driver version 382.05).

nvidia-bug-report.log.gz (46.5 KB)
nvidia-installer.log (1.63 KB)

OK. Kernel 4.13 gets along better with your hardware, but the resource conflict is still there.

[    0.597420] pci 0000:02:00.0: BAR 6: no space for [mem size 0x00080000 pref]
[    0.597423] pci 0000:02:00.0: BAR 6: failed to assign [mem size 0x00080000 pref]
...
[  183.714840] NVRM: This is a 64-bit BAR mapped above 4GB by the system
               NVRM: BIOS or the Linux kernel, but the PCI bridge
               NVRM: immediately upstream of this GPU does not define
               NVRM: a matching prefetchable memory window.

The NVIDIA card doesn't get the memory region the BIOS tells the kernel it wants; it gets remapped, and then the driver fails.
This is an incompatibility between the kernel and the BIOS and should be reported to the Ubuntu/kernel bugzilla. Please try
pci=nocrs
as a kernel parameter as a workaround.
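
For anyone unsure how to set a kernel parameter persistently on Ubuntu, here is a minimal sketch. It assumes the stock layout where /etc/default/grub contains a single double-quoted GRUB_CMDLINE_LINUX_DEFAULT line and GNU sed is available; add_kernel_param is a hypothetical helper name, and it only handles parameters without spaces or quotes.

```shell
# Append one kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT in a grub
# defaults file, unless it is already present.
# $1 = parameter (no spaces, no quotes)
# $2 = file to edit (defaults to /etc/default/grub)
# (add_kernel_param is a hypothetical helper, not a GRUB command.)
add_kernel_param() {
    param=$1
    file=${2:-/etc/default/grub}
    # Do nothing if the parameter is already on the line.
    if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=.*[\" ]${param}[\" ]" "$file"; then
        return 0
    fi
    # Insert the parameter just before the closing double quote.
    sed -i "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 ${param}\"|" "$file"
}

# Usage, in a root shell and after backing up /etc/default/grub:
#   add_kernel_param pci=nocrs
#   update-grub
# After a reboot, cat /proc/cmdline should list pci=nocrs.
```

For a one-off test, the parameter can instead be typed into the GRUB menu entry (press 'e' on the entry) so nothing is changed on disk.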

I have added pci=nocrs to the boot parameters, but the problem remains. Do you have any other suggestions for solving this problem?
Thank you

Please run sudo dmesg > dmesg.txt and attach the file so I can take a look at it.
The only thing left to try is a newer kernel, to see whether this has been fixed there. Download the kernel image and headers from Ubuntu and install them manually. You don't need to install the NVIDIA drivers; switch to Intel (sudo prime-select intel) before installing the kernels, then check that
cat /proc/version
returns the correct kernel version and whether
sudo dmesg | grep "BAR 6"
still contains the 'failed' message.
Test twice, with pci=nocrs set and unset.
Holding Shift on reboot gets you to the GRUB boot menu, where you can boot your old kernels in case of a boot failure.
Start with kernel 4.14.12:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.14.12/linux-image-4.14.12-041412-generic_4.14.12-041412.201801051649_amd64.deb
http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.14.12/linux-headers-4.14.12-041412-generic_4.14.12-041412.201801051649_amd64.deb

If that doesn't work, move on to kernel 4.15-rc7:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.15-rc7/linux-image-4.15.0-041500rc7-generic_4.15.0-041500rc7.201801072330_amd64.deb
http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.15-rc7/linux-headers-4.15.0-041500rc7-generic_4.15.0-041500rc7.201801072330_amd64.deb
If that doesn't work either, the only thing left is to file a bug report at the kernel bugzilla.
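
The check described above can be wrapped in a small helper so the result is easy to compare across kernels. This is only a sketch: bar6_status is a made-up name, and it looks for the exact "BAR 6: failed to assign" wording quoted earlier in this thread.

```shell
# Report whether a kernel log (read from stdin) still contains a failed
# BAR 6 assignment, using the message text quoted earlier in the thread.
# (bar6_status is a hypothetical helper name.)
bar6_status() {
    if grep -q 'BAR 6: failed to assign'; then
        echo "BAR 6 still failing"
    else
        echo "no BAR 6 failure found"
    fi
}

# Usage after booting each test kernel:
#   cat /proc/version           # confirm which kernel is running
#   sudo dmesg | bar6_status    # run once with pci=nocrs, once without
```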

Sorry to say, but after further research I can say that the BAR 6 issue is a red herring, unrelated to the NVIDIA GPU not working. BAR 6 is just the option ROM, which is not needed anyway and fails to get mapped on most systems. I even think the NVRM message about the BAR is a red herring. So erase and rewind; the only error left then is:

nvidia 0000:02:00.0: Refused to change power state, currently in D3

This points to an ACPI problem. Please run
sudo acpidump > acpidump.txt
and attach the file. Then try the kernel parameters
acpi_osi=! acpi_osi="Windows 2009"
Reboot and attach dmesg output.
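
To compare the kernel log before and after the change, one option is to count how often the power-state error appears. A minimal sketch, assuming the message wording shown above; count_power_state_errors is a hypothetical helper name.

```shell
# Count kernel log lines (read from stdin) that contain the
# "Refused to change power state" error quoted above.
# (count_power_state_errors is a hypothetical helper name.)
count_power_state_errors() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so the exit status is neutralised here.
    grep -c 'Refused to change power state' || true
}

# Usage:
#   cat /proc/cmdline                         # confirm the parameters took effect
#   sudo dmesg | count_power_state_errors     # 0 means the error is gone
```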

Wow, this method really works! After adding 'acpi_osi=! acpi_osi="Windows 2009"' to the kernel parameters, the NVIDIA driver loaded successfully. Why does that work? Is this approach safe and stable?

The attachment 'acpidump.txt' was generated before adding the kernel parameters, and 'dmesg.log' after adding them.
acpidump.txt (826 KB)
dmesg.log (67.4 KB)

Basically, the parameter instructs the kernel to tell the BIOS it is Windows 7 instead of Windows 10. This changes the settings and methods used for power management etc. It can have adverse effects such as backlight control not working, the touchpad not working, or slightly higher power draw on battery. If none of those show up, it's fine.
I’ve taken a look at the acpidump and it looks like a variant of this bug:
https://bugs.acpica.org/show_bug.cgi?id=1333#c32
Unfortunately the first fix (which was incorporated in kernel 4.13) didn't fix it. I hope the next attempt will land in 4.17 or so. Until then, use the workaround.
Side note: your Bluetooth is missing the firmware it needs to work (rtl_bt/rtl8822b_fw.bin), and your webcam throws an error; I don't know whether that affects it.

Another observation from the acpidump: I think that instead of acpi_osi=! acpi_osi="Windows 2009", using just acpi_osi=Linux would also work.

You are right, 'acpi_osi=Linux' also works. I have reported this bug to the Ubuntu kernel team.

Thank you very much!