Driver 390.42 installation failed - Unable to load the 'nvidia-drm' kernel module

Hi,

Profile

Graphic card - Nvidia GTX 1050 ti
OS - Ubuntu 16.04 LTS
Kernel - Linux 4.4.0-119-generic
Processor and memory - Intel i7-6700 CPU, 16 GB RAM
BIOS details - Legacy with disabled secure boot (not UEFI)

Current status:

lspci | grep -i nvidia
01:00.0 VGA compatible controller: NVIDIA Corporation Device 1c82 (rev a1)
01:00.1 Audio device: NVIDIA Corporation Device 0fb9 (rev a1)

Background/Issue

The GPU was working perfectly with driver 370.26 (installed with the .run installer), together with CUDA and cuDNN. Last week, after some regular system updates, I got into a login loop.
It was not my first login loop, and after some struggle, following https://gist.github.com/wangruohui/df039f0dc434d6486f5d4d098aa52d07#file-install-nvidia-driver-and-cuda-md, I managed to log in. However, the driver installation failed with "Unable to load the 'nvidia-drm' kernel module".

Installation steps:

sudo apt-get purge nvidia*
sudo apt-get autoremove 
sudo dpkg -P cuda-repo-ubuntu1604
sudo apt-get install build-essential gcc-multilib dkms
sudo service lightdm stop
cd ~
chmod +x NVIDIA-Linux-x86_64-390.42.run
sudo ./NVIDIA-Linux-x86_64-390.42.run --dkms -s --no-opengl-files
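Since the installer builds the kernel module through DKMS, it can help to confirm beforehand that a build is possible at all. The function below is a minimal sketch, assuming an Ubuntu-style layout where the kernel headers are reachable through /lib/modules/<release>/build; check_build_prereqs is a hypothetical helper name, not part of the NVIDIA installer.

```shell
# Sketch: pre-flight checks before running NVIDIA-Linux-x86_64-*.run --dkms.
# check_build_prereqs is a hypothetical helper, not part of the installer.
# $1: kernel release (default: the running kernel)
# $2: modules root (default: /lib/modules), overridable for testing
check_build_prereqs() {
    kver=${1:-$(uname -r)}
    modroot=${2:-/lib/modules}
    missing=0
    # DKMS compiles against the headers that /lib/modules/<release>/build points to
    if [ ! -d "$modroot/$kver/build" ]; then
        echo "missing: kernel headers for $kver (e.g. package linux-headers-$kver)"
        missing=1
    fi
    # Tools the DKMS build expects on PATH
    for tool in gcc make dkms; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

Usage: check_build_prereqs && sudo ./NVIDIA-Linux-x86_64-390.42.run --dkms. It only checks that the pieces exist; it cannot tell whether the compiler matches the one the kernel was built with.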

Note: I've also blacklisted nouveau.
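For reference, a nouveau blacklist in the style of the NVIDIA README usually holds two lines, blacklist nouveau and options nouveau modeset=0. The sketch below writes them to a file; write_nouveau_blacklist is a hypothetical name, and the .conf check reflects that kmod's modprobe ignores files in /etc/modprobe.d whose names do not end in .conf.

```shell
# Sketch: write a nouveau blacklist file (hypothetical helper name).
# Rewrites the whole file, so running it twice does not duplicate lines.
# $1: target file (default: /etc/modprobe.d/blacklist-nouveau.conf)
write_nouveau_blacklist() {
    target=${1:-/etc/modprobe.d/blacklist-nouveau.conf}
    # kmod's modprobe skips files in modprobe.d that lack a .conf suffix
    case $target in
        *.conf) ;;
        *) echo "refusing: $target does not end in .conf" >&2; return 1 ;;
    esac
    printf 'blacklist nouveau\noptions nouveau modeset=0\n' > "$target"
}
```

Run it as root, then rebuild the initramfs so the blacklist also applies during early boot: sudo update-initramfs -u on Ubuntu, or sudo dracut --force on CentOS.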

Output:
-> Installing ‘NVIDIA Accelerated Graphics Driver for Linux-x86_64’ (390.42):
executing: ‘/sbin/ldconfig’…
-> done.
-> Driver file installation is complete.
-> Installing DKMS kernel module:
-> done.
ERROR: Unable to load the ‘nvidia-drm’ kernel module.
ERROR: Installation has failed. Please see the file ‘/var/log/nvidia-installer.log’ for details. You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.

Previously, I tried installing from the PPA using

sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update

and ended up in a login loop.

nvidia-bug-report:
When running as root

/usr/bin/nvidia-bug-report.sh --safe-mode --extra-system-data

I received:
Running nvidia-bug-report.sh…ls: cannot access ‘/proc/driver/nvidia/./gpus/’: No such file or directory
complete.

Please help me get back on the nvidia horse ;)
Thanks

nvidia-bug-report.log.gz (85.7 KB)
nvidia-installer.log (1.83 KB)
acpidump.txt (857 KB)

Your setup is really confusing. It seems the monitor is connected to the Intel GPU, which doesn't work properly because you're using the kernel parameter 'nomodeset'. Using the installer option --no-opengl-files suggests that you're indeed using the iGPU for graphics and the NVIDIA GPU for CUDA only. The kernel driver is installed, but it seems to be blacklisted. Run
grep -i nvidia /etc/modprobe.d/*
to look for unusual entries.
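A slightly more targeted version of that grep, as a sketch: it prints only blacklist, alias and install lines that mention an nvidia module, and skips Ubuntu's stock blacklist nvidiafb entry (the old nvidia framebuffer driver). The function name and the filtering rule are assumptions for illustration.

```shell
# Sketch: list modprobe.d lines that blacklist, alias or override an nvidia
# module, skipping Ubuntu's stock "blacklist nvidiafb" entry.
# $1: directory to scan (default: /etc/modprobe.d)
find_nvidia_modprobe_entries() {
    dir=${1:-/etc/modprobe.d}
    # -H: prefix each hit with its file name; -i: ignore case; -E: extended regex.
    # Comment lines never match because the pattern is anchored at ^.
    grep -HiE '^[[:space:]]*(blacklist|alias|install)[[:space:]]+[^[:space:]]*nvidia' \
        "$dir"/*.conf 2>/dev/null \
        | grep -v 'blacklist nvidiafb$'
}
```

On the output posted below, this would leave the nvidia-installer-disable-nouveau.conf lines, including the alias nvidia nvidia_current_updates entry.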

You are right, the monitor is indeed connected to the iGPU. I thought this would prevent login loops during installation. Would you advise something else? Should I drop --no-opengl-files and change the installation commands?

The command returns:

grep -i nvidia /etc/modprobe.d/*
/etc/modprobe.d/blacklist-framebuffer.conf:blacklist nvidiafb
/etc/modprobe.d/nvidia-installer-disable-nouveau.conf:# generated by nvidia-installer
/etc/modprobe.d/nvidia-installer-disable-nouveau.conf:blacklist nvidia-173
/etc/modprobe.d/nvidia-installer-disable-nouveau.conf:blacklist nvidia-96
/etc/modprobe.d/nvidia-installer-disable-nouveau.conf:blacklist nvidia-current
/etc/modprobe.d/nvidia-installer-disable-nouveau.conf:blacklist nvidia-173-updates
/etc/modprobe.d/nvidia-installer-disable-nouveau.conf:blacklist nvidia-96-updates
/etc/modprobe.d/nvidia-installer-disable-nouveau.conf:alias nvidia nvidia_current_updates

Update:
I tried to reinstall the driver with the monitor connected to the NVIDIA card. I got into a login loop with the same error:

ERROR: Unable to load the 'nvidia-drm' kernel module.
ERROR: Installation has failed. Please see the file '/var/log/nvidia-installer.log' for details. You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.

Eventually, I had to run

sudo apt-get remove nvidia-*
sudo apt-get autoremove
sudo nvidia-uninstall

and went back to the iGPU, still unable to use the monitor's speaker.

Please remove the file
/etc/modprobe.d/nvidia-installer-disable-nouveau.conf
and reinstall using the Ubuntu package.
If the module still doesn't load (check with dmesg | grep -i nvidia), run
sudo modprobe -v nvidia to load it and post any errors.
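The two questions behind these checks (is a module built for the running kernel, and is it loaded) can be combined into one small sketch. It assumes modules live under /lib/modules/<release>/ (DKMS on Ubuntu typically places them in updates/dkms) and reads /proc/modules instead of calling lsmod; nvidia_module_status is a hypothetical name.

```shell
# Sketch: is an nvidia module built for a kernel, and is it loaded right now?
# $1: kernel release (default: running kernel); $2: modules root (default: /lib/modules)
nvidia_module_status() {
    kver=${1:-$(uname -r)}
    modroot=${2:-/lib/modules}
    # Search the whole tree for this release; DKMS on Ubuntu usually installs
    # into updates/dkms. The pattern also matches compressed nvidia.ko.xz files.
    found=$(find "$modroot/$kver" -name 'nvidia.ko*' 2>/dev/null || true)
    if [ -n "$found" ]; then
        echo "built: yes ($kver)"
    else
        echo "built: no ($kver)"
    fi
    # /proc/modules lists currently loaded modules, one per line, name first.
    if grep -q '^nvidia ' /proc/modules 2>/dev/null; then
        echo "loaded: yes"
    else
        echo "loaded: no"
    fi
}
```

"built: no" corresponds to the "Module nvidia not found in directory /lib/modules/..." error from modprobe; "built: yes" with "loaded: no" points at the dmesg output from modprobe -v instead.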

Sorry, but what do you mean by the Ubuntu package: the ppa:graphics-drivers repo? nvidia-current? A .deb? The .run file?
Also, does it matter whether the monitor is connected to the Intel GPU or the NVIDIA GPU during the install?

It shouldn't matter which GPU you're using during the install; the kernel module should load either way.

I removed the nvidia-installer-disable-nouveau.conf file (BTW, there is still a blacklist-nouveau.conf that I created following the installation instructions mentioned above)
and tried:

sudo service lightdm stop
chmod +x NVIDIA-Linux-x86_64-390.42.run
sudo ./NVIDIA-Linux-x86_64-390.42.run --dkms

and again got "ERROR: Unable to load the 'nvidia-drm' kernel module." along with a login loop.
Next, I uninstalled the driver and followed your suggestions:

dmesg |grep -i nvidia
[    3.588359] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input14
[    3.588495] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input15
[    3.588545] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input16

Trying to load the module failed:

sudo modprobe -v nvidia
modprobe: FATAL: Module nvidia not found in directory /lib/modules/4.4.0-119-generic

Should I perhaps use UEFI instead of Legacy mode in the BIOS configuration?
Also, in the unfiltered dmesg output, I got the following errors:

[    3.016684] snd_hda_intel 0000:00:1f.3: failed to add i915_bpo component master (-19)
[    3.307765] vboxdrv: version magic '4.4.0-119-generic SMP mod_unload modversions ' should be '4.4.0-119-generic SMP mod_unload modversions retpoline '
[   12.706358] vboxdrv: version magic '4.4.0-119-generic SMP mod_unload modversions ' should be '4.4.0-119-generic SMP mod_unload modversions retpoline '

nvidia-installer.log (2.66 KB)

Your gcc is too old. All new Ubuntu kernels are compiled with the retpoline mitigation, so gcc needs retpoline support too in order to compile a loadable module. Upgrade your system/HWE stack.
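One way to test whether a compiler has retpoline support (an assumption for illustration, not an official NVIDIA or Ubuntu check) is to compile an empty program with the flags retpoline-enabled x86 kernels pass to GCC, -mindirect-branch=thunk-extern and -mindirect-branch-register; a compiler without support rejects them. The second helper extracts the compiler string from /proc/version, assuming the older "(gcc version ...)" format that 4.4 kernels print. Both function names are hypothetical.

```shell
# Sketch: does a C compiler accept the retpoline flags that retpoline-enabled
# x86 kernels build with? Returns 0 if yes, non-zero otherwise.
# $1: compiler command (default: gcc)
compiler_has_retpoline() {
    cc=${1:-gcc}
    # Compile an empty program from stdin; the object file is discarded.
    echo 'int main(void) { return 0; }' \
        | "$cc" -mindirect-branch=thunk-extern -mindirect-branch-register \
            -x c -c -o /dev/null - >/dev/null 2>&1
}

# Print the compiler string the running kernel was built with, for comparison
# with `gcc --version`. Prints nothing if the format is not recognised.
# $1: file to read (default: /proc/version)
kernel_build_compiler() {
    sed -n 's/.*(\(gcc version [^)]*)\).*/\1/p' "${1:-/proc/version}"
}
```

Usage: compiler_has_retpoline || echo "gcc lacks retpoline support"; kernel_build_compiler; gcc --version.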

Thanks!!!

I followed the rollback commands in https://devtalk.nvidia.com/default/topic/1030665/linux/387-26-on-ubuntu-16-04-4-lts-4-4-0-116-kernel-unable-to-load-the-nvidia-drm-kernel-module/ and rolled back to kernel 4.4.0-57-generic; the installation now works great.

Still, I'm not sure the rollback is the right thing to do. Would you recommend an update instead?

Changing to an old kernel is just a temporary workaround; you should update your system. Something is wrong there: your system reports version 16.04.4, which is current, but your kernel, Xorg and gcc are the ones from 16.04.0, the initial release. So you might have some software that blocks proper updates. Run
sudo apt update
sudo apt upgrade
and read the output carefully, watching for things like deferred (kept-back) packages.
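Kept-back packages are easy to miss in scrolling output. A minimal sketch, assuming English (LC_ALL=C) apt output: list held packages with apt-mark showhold, and extract the names under "The following packages have been kept back:" from a simulated upgrade (apt-get -s, which needs no root). The function names are hypothetical. Packages kept back because they need new dependencies (kernel metapackages are a common case) are typically installed by sudo apt-get dist-upgrade.

```shell
# Sketch: surface packages that "apt-get upgrade" silently leaves alone.
# kept_back_packages reads apt-get output on stdin and prints one name per line
# from the "The following packages have been kept back:" list.
kept_back_packages() {
    awk '
        /have been kept back:/ { inlist = 1; next }
        inlist && /^ /         { for (i = 1; i <= NF; i++) print $i; next }
        { inlist = 0 }         # first unindented line ends the list
    '
}

# Runs without root: apt-mark only reads state and apt-get -s only simulates.
report_upgrade_blockers() {
    echo "Held packages:"
    apt-mark showhold
    echo "Kept back by apt-get upgrade:"
    LC_ALL=C apt-get -s upgrade | kept_back_packages
}
```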

Hello,

I have a similar issue:

[ 1521.439154] vboxdrv: version magic '4.4.0-119-generic SMP mod_unload modversions ' should be '4.4.0-119-generic SMP mod_unload modversions retpoline '
root@stefanozinna-RC530-RC730:/home/stefanozinna# sudo modprobe -v nvidia
modprobe: FATAL: Module nvidia not found in directory /lib/modules/4.4.0-119-generic

When I run:
sudo apt update
sudo apt upgrade

Everything seems fine.

root@stefanozinna-RC530-RC730:/home/stefanozinna# sudo apt update
Hit:1 http://ppa.launchpad.net/makehuman-official/makehuman-11x/ubuntu xenial InRelease
Hit:2 http://archive.ubuntu.com/ubuntu xenial InRelease
Hit:3 http://ppa.launchpad.net/rebuntu16/avidemux+unofficial/ubuntu xenial InRelease
Hit:4 http://ppa.launchpad.net/ubuntu-toolchain-r/test/ubuntu xenial InRelease
Get:5 http://archive.ubuntu.com/ubuntu xenial-updates InRelease [102 kB]
Get:6 http://cz.archive.ubuntu.com/ubuntu trusty-updates InRelease [65.9 kB]
Ign:7 http://archive.canonical.com/ubuntu precise InRelease
Get:8 http://archive.ubuntu.com/ubuntu xenial-backports InRelease [102 kB]
Hit:9 http://archive.canonical.com/ubuntu xenial InRelease
Get:10 http://archive.ubuntu.com/ubuntu xenial-security InRelease [102 kB]
Ign:11 http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64 InRelease
Hit:12 http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64 Release
Hit:13 http://archive.canonical.com/ubuntu precise Release
Get:14 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages [755 kB]
Hit:15 https://repo.skype.com/deb stable InRelease
Hit:16 https://download.virtualbox.org/virtualbox/debian xenial InRelease
Get:18 http://archive.ubuntu.com/ubuntu xenial-updates/main i386 Packages [699 kB]
Get:19 http://cz.archive.ubuntu.com/ubuntu trusty-updates/main amd64 Packages [1,069 kB]
Get:21 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 DEP-11 Metadata [316 kB]
Get:22 http://archive.ubuntu.com/ubuntu xenial-updates/main DEP-11 64x64 Icons [232 kB]
Get:23 http://archive.ubuntu.com/ubuntu xenial-updates/universe amd64 Packages [620 kB]
Get:24 http://archive.ubuntu.com/ubuntu xenial-updates/universe i386 Packages [573 kB]
Get:25 http://archive.ubuntu.com/ubuntu xenial-updates/universe amd64 DEP-11 Metadata [242 kB]
Get:26 http://archive.ubuntu.com/ubuntu xenial-updates/universe DEP-11 64x64 Icons [328 kB]
Get:27 http://archive.ubuntu.com/ubuntu xenial-updates/multiverse amd64 DEP-11 Metadata [5,972 B]
Get:28 http://archive.ubuntu.com/ubuntu xenial-backports/main amd64 DEP-11 Metadata [3,324 B]
Get:29 http://archive.ubuntu.com/ubuntu xenial-backports/universe amd64 DEP-11 Metadata [5,088 B]
Get:30 http://archive.ubuntu.com/ubuntu xenial-security/main amd64 DEP-11 Metadata [67.6 kB]
Get:31 http://archive.ubuntu.com/ubuntu xenial-security/main DEP-11 64x64 Icons [81.7 kB]
Get:32 http://archive.ubuntu.com/ubuntu xenial-security/universe amd64 DEP-11 Metadata [107 kB]
Get:33 http://archive.ubuntu.com/ubuntu xenial-security/universe DEP-11 64x64 Icons [142 kB]
Get:34 http://cz.archive.ubuntu.com/ubuntu trusty-updates/main i386 Packages [1,009 kB]
Ign:35 http://dl.google.com/linux/talkplugin/deb stable InRelease
Hit:36 http://dl.google.com/linux/talkplugin/deb stable Release
Fetched 6,628 kB in 5s (1,256 kB/s)
Reading package lists… Done
Building dependency tree
Reading state information… Done
All packages are up to date.
W: http://archive.canonical.com/ubuntu/dists/precise/Release.gpg: Signature by key 630239CC130E1A7FD81A27B140976EAF437D05B5 uses weak digest algorithm (SHA1)

Any suggestions?

stezi, please run nvidia-bug-report.sh as root and attach the resulting .gz file to your post. Hovering the mouse over an existing post will reveal a paperclip icon.

Hi,

I uploaded the resulting .gz to this message.
nvidia-bug-report.log.gz (64.3 KB)

stezi, your install is outdated, too. Upgrade to the latest HWE stack:
https://wiki.ubuntu.com/Kernel/LTSEnablementStack
and upgrade gcc as well;
gcc --version
should then report version
Ubuntu 5.4.0-6ubuntu1~16.04.9
https://packages.ubuntu.com/de/xenial/gcc-5
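A sketch for checking the installed gcc-5 package against the version quoted above, assuming a Debian/Ubuntu system with dpkg-query and dpkg --compare-versions available. dpkg's comparison is used because it handles the ~ in Ubuntu version strings correctly, which plain string comparison does not. The variable and function names are hypothetical.

```shell
# Sketch: is the installed gcc-5 at least the xenial version quoted above?
# Requires dpkg; its version comparison understands the "~" in Ubuntu versions.
MIN_GCC5='5.4.0-6ubuntu1~16.04.9'

# $1: version to check (default: the installed gcc-5 package version)
gcc5_is_current() {
    have=$1
    if [ -z "$have" ]; then
        have=$(dpkg-query -W -f='${Version}' gcc-5 2>/dev/null || true)
    fi
    [ -n "$have" ] && dpkg --compare-versions "$have" ge "$MIN_GCC5"
}
```

Usage: gcc5_is_current || echo "gcc-5 is older than $MIN_GCC5". For 16.04, the LTSEnablementStack page linked above installs the HWE stack with sudo apt-get install --install-recommends linux-generic-hwe-16.04 xserver-xorg-hwe-16.04.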

hi,

Thanks. It worked.

Hi,

I'm facing the same issue and just can't find a solution. I've been trying for about a week, and nothing seems to work. I've tried following this bug report (https://bugs.launchpad.net/ubuntu/+source/nvidia-graphics-drivers/+bug/1611635) but with no success.

I tried to attach the nvidia-bug-report.sh output here, but the forum no longer seems to accept a zip file, so here is the link (https://drive.google.com/file/d/1w-PkEzobL4CFlHCfCbb3e9birYMuQv3d/view?usp=sharing). I really hope you can give me a hand. I can't get the nvidia-persistenced service to run =/

You have mixed drivers:

[    9.112473] NVRM: API mismatch: the client has the version 440.82, but
               NVRM: this kernel module has the version 390.132.  Please
               NVRM: make sure that this kernel module and all NVIDIA driver
               NVRM: components have the same version.

Please remove all driver packages:
sudo apt remove nvidia*
then reinstall the 440 driver.
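To spot this kind of mismatch without reading the whole log, the sketch below pulls both versions out of an "NVRM: API mismatch" message like the dmesg excerpt above. It assumes the message wording shown here; other driver releases may phrase it differently. The function name is hypothetical.

```shell
# Sketch: pull the client (user-space) and kernel-module versions out of an
# "NVRM: API mismatch" message such as the dmesg excerpt above.
# Reads log text on stdin; prints nothing if no complete message is found.
nvrm_mismatch_versions() {
    awk '
        /the client has the version/ {
            for (i = 1; i < NF; i++) if ($i == "version") client = $(i + 1)
            sub(/,$/, "", client)          # "440.82," -> "440.82"
        }
        /this kernel module has the version/ {
            for (i = 1; i < NF; i++) if ($i == "version") kmod = $(i + 1)
            sub(/\.$/, "", kmod)           # "390.132." -> "390.132"
        }
        END { if (client != "" && kmod != "") print "client=" client " kernel=" kmod }
    '
}
```

Usage: dmesg | nvrm_mismatch_versions. After reinstalling, the two numbers should no longer differ; when the driver is loaded, cat /proc/driver/nvidia/version also shows the kernel module version directly.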

Hi, I'm also getting this error but haven't been able to figure out the cause in my case. I'm using CentOS 7 and followed these instructions to try to install the correct driver for my GTX 1050. I'm uploading the nvidia-bug-report.sh output: nvidia-bug-report.log (406.6 KB).

@gowbrian, please disable Secure Boot in the BIOS.

What happened is that nvidia-detect suggests the legacy 390 driver series, and when I install nvidia-driver it automatically installs module version 390.132. After removing all NVIDIA components and blacklisting nouveau (echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.config), I got the best resolution on my GPU so far, but in the end I'm not using any NVIDIA driver. Should I install the 440.82 driver manually? Would I get any benefit from that?