I have servers running Ubuntu Linux 16.04.2 that have (4) GeForce GTX 1080 Ti GPUs installed in each server. I brought them up installing driver v378.13 via the official NVIDIA-Linux-x86_64-378.13.run script. They have been working fine, except today when one server experienced a strange hang (‘nvidia-smi’ as well as a Python script that pulls GPU watts via use of the ‘pynvml’ wrapper module both hung.) So I looked at the driver page, and saw there is a new driver available (v381.22). However, when I went to run the newer NVIDIA-Linux-x86_64-381.22.run file, it terminated with an error “Unable to load the kernel module ‘nvidia.ko’.” I tried many things to overcome this, but nothing has worked to resolve this issue. I will attach the ‘nvidia-installer.log’ and the ‘nvidia-bug-report.log.gz’ to this post, in hopes that someone more knowledgeable can assist me…
You should never install NVIDIA drivers directly unless you totally understand what they do to your system - and they do quite a lot.
They replace libglx and various OpenGL libraries, so whenever you update the X.org server package or Mesa* libraries you risk totally breaking your system unless you reinstall the said packages.
There are NVIDIA drivers already packaged for your distro. Please do use them.
These servers are not running a GUI/X interface - the GPUs are solely used for computation. The distro-provided drivers are too old to support these newer GPUs, hence installing the ones from NVIDIA (which should and mostly do work fine, excepting this bug…)
Hi WillDennis,
I see the nouveau driver is loaded on your system. Please blacklist it:
You can add the nouveau driver to the /etc/modprobe.d/blacklist.conf file, or create a file such as /etc/modprobe.d/disable-nouveau.conf with the entries below:
blacklist nouveau
options nouveau modeset=0
And add these kernel boot parameters: vga=0 rdblacklist=nouveau nouveau.modeset=0
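To confirm that the blacklist and kernel parameters took effect after a reboot, something like the following Python sketch may help. It assumes the standard Linux /proc/modules and /proc/cmdline files; the function names `loaded_modules` and `nouveau_status` are made up for illustration and are not part of any NVIDIA tool.

```python
from pathlib import Path


def loaded_modules(proc_modules_text):
    """Return the module names found in /proc/modules-style text."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])  # first column is the module name
    return names


def nouveau_status(proc_modules_text, cmdline_text):
    """Report whether nouveau is loaded and whether the boot
    command line carries nouveau.modeset=0 (hypothetical helper)."""
    return {
        "nouveau_loaded": "nouveau" in loaded_modules(proc_modules_text),
        "modeset_disabled": "nouveau.modeset=0" in cmdline_text.split(),
    }


if __name__ == "__main__":
    modules = Path("/proc/modules")
    cmdline = Path("/proc/cmdline")
    # Only read the files when they exist (i.e. on a Linux host).
    if modules.exists() and cmdline.exists():
        print(nouveau_status(modules.read_text(), cmdline.read_text()))
```

If `nouveau_loaded` is still true after rebooting, the blacklist has probably not been picked up yet (for example, the initramfs may still contain the module).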
Make sure no application is using the nvidia module. To do that, you can uninstall the earlier driver with nvidia-uninstall, reboot the system, and then try a fresh driver installation.
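One rough way to see whether anything still holds the nvidia module is to read its use count from /proc/modules, as in the sketch below. It assumes the usual column layout (name, size, use count, dependents, ...); `module_use_count` is a hypothetical helper name. Note that the count also includes dependent modules such as nvidia_uvm, not only user processes.

```python
from pathlib import Path


def module_use_count(proc_modules_text, name="nvidia"):
    """Return the use count of `name` from /proc/modules-style text,
    or None if the module is not listed (i.e. not loaded)."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        # Columns: name, size, use count, dependents, state, address
        if len(fields) >= 3 and fields[0] == name:
            return int(fields[2])
    return None


if __name__ == "__main__":
    modules = Path("/proc/modules")
    if modules.exists():
        count = module_use_count(modules.read_text())
        if count is None:
            print("nvidia module is not loaded")
        else:
            print("nvidia module use count:", count)
```

A count of 0 or "not loaded" suggests the old module can be removed or replaced; a nonzero count means something is still holding it.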
If you don’t need any GUI/X or OpenGL libraries on your system, you can install the driver with the --no-opengl-files option.
I think you are running kernel 4.4.0-77-generic. Make sure the matching linux-headers-4.4.0-77-generic and linux-headers-4.4.0-77 packages are installed.
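The header check can be sketched in Python as below. This is only an illustration: it assumes the Ubuntu convention that header packages unpack into /usr/src/linux-headers-<version>, and the helper names `expected_header_packages` and `missing_header_dirs` are invented for this example.

```python
import platform
from pathlib import Path


def expected_header_packages(release):
    """Map a kernel release such as '4.4.0-77-generic' to the two
    header package names mentioned above."""
    base = release.rsplit("-", 1)[0]  # drop the flavour suffix, e.g. '-generic'
    return ["linux-headers-" + release, "linux-headers-" + base]


def missing_header_dirs(release, src_root="/usr/src"):
    """Return the expected header directories that are absent under
    src_root (assumed Ubuntu layout)."""
    return [
        name
        for name in expected_header_packages(release)
        if not (Path(src_root) / name).is_dir()
    ]


if __name__ == "__main__":
    release = platform.release()  # same value as `uname -r`
    print("running kernel:", release)
    print("missing header dirs:", missing_header_dirs(release))
```

An empty "missing" list suggests the headers for the running kernel are present; anything listed there is worth installing before re-running the NVIDIA installer.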