NVIDIA driver installation fails on CentOS 7.3-1611 (GTX 1080)

We are trying to install the NVIDIA drivers on CentOS 7.3-1611. The currently loaded kernel is 3.10.0-862.9.1.el7.x86_64.

We tried to install the drivers from the file “NVIDIA-Linux-x86_64-384.98.run”, and our graphical.target is now broken (the system only boots to a text console and GNOME won’t start).

I was also unable to find the file “nvidia-bug-report.sh” when I followed the sticky thread.

The installer failed with this error message:

ERROR: Failed to run `/usr/sbin/dkms build -m nvidia -v 304.137 -k 3.10.0-862.9.1.el7.x86_64`:
kernel preparation unnecessary for this kernel. Skipping...

Building module:
cleaning build area...
make -j20 KERNELRELEASE=3.10.0-862.9.1.el7.x86_64 module SYSSRC=/lib/modules/3.10.0-862.9.1.el7.x86_64/build......(bad exit status: 2)
Error! Bad return status for module build on kernel: 3.10.0-862.9.1.el7.x86_64 (x86_64)
Consult /var/lib/dkms/nvidia/304.137/build/make.log for more information.

The contents of “/var/log/nvidia-installer.log” are:

nvidia-installer log file '/var/log/nvidia-installer.log'
creation time: Wed Jul 18 17:51:57 2018
installer version: 304.137

PATH: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/home/hasan/arm-cortex_a9-linux-gnueabihf-linaro_4.9-master/bin:/root/bin

nvidia-installer command line:
    ./nvidia-installer

Unable to load: nvidia-installer ncurses v6 user interface

Using: nvidia-installer ncurses user interface
-> Tagging shared libraries with chcon -t textrel_shlib_t.
-> License accepted.
-> Installing NVIDIA driver version 304.137.
-> There appears to already be a driver installed on your system (version: 304.137).  As part of installing this driver (version: 304.137), the existing driver will be uninstalled.  Are you sure you want to continue? ('no' will abort installation) (Answer: Yes)
-> Would you like to register the kernel module sources with DKMS? This will allow DKMS to automatically build a new module, if you install a different kernel later. (Answer: Yes)
-> Installing both new and classic TLS OpenGL libraries.
-> Installing both new and classic TLS 32bit OpenGL libraries.
-> Install NVIDIA's 32-bit compatibility OpenGL libraries? (Answer: Yes)
-> Uninstalling the previous installation with /usr/bin/nvidia-uninstall.
-> Searching for conflicting X files:
ERROR: Unable to open '/usr/lib64/xorg/modules/nvidia-396.37/libglx.so' for reading (No such file or directory)
ERROR: Unable to open '/usr/lib64/nvidia/xorg/libglx.so' for reading (No such file or directory)
ERROR: Unable to open '/usr/lib64/xorg/modules/nvidia-396.37/libglx.so' for reading (No such file or directory)
ERROR: Unable to open '/usr/lib64/nvidia/xorg/libglx.so' for reading (No such file or directory)
ERROR: Unable to open '/usr/lib64/xorg/modules/nvidia-396.37/libglx.so' for reading (No such file or directory)
ERROR: Unable to open '/usr/lib64/xorg/modules/nvidia-396.37/libglx.so' for reading (No such file or directory)
ERROR: Unable to open '/usr/lib64/nvidia/xorg/libglx.so' for reading (No such file or directory)
-> done.
-> Searching for conflicting OpenGL files:
-> done.
-> Installing 'NVIDIA Accelerated Graphics Driver for Linux-x86_64' (304.137):
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib64/libnvidia-glcore.so.304.137'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib64/libGL.so.304.137'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib64/xorg/modules/extensions/libglx.so.304.137'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib64/libnvidia-tls.so.304.137'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib64/tls/libnvidia-tls.so.304.137'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib64/xorg/modules/drivers/nvidia_drv.so'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib64/xorg/modules/libnvidia-wfb.so.304.137'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib64/libXvMCNVIDIA.so.304.137'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib64/libnvidia-ml.so.304.137'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib64/libnvidia-cfg.so.304.137'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib64/libcuda.so.304.137'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib64/libnvidia-opencl.so.304.137'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib64/libOpenCL.so.1.0.0'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib64/libnvidia-compiler.so.304.137'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib64/vdpau/libvdpau_nvidia.so.304.137'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib64/libnvcuvid.so.304.137'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib/libcuda.so.304.137'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib/libnvidia-ml.so.304.137'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib/libOpenCL.so.1.0.0'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib/libnvidia-compiler.so.304.137'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib/libnvidia-opencl.so.304.137'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib/libGL.so.304.137'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib/libnvidia-glcore.so.304.137'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib/libnvidia-tls.so.304.137'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib/tls/libnvidia-tls.so.304.137'...
   executing: '/usr/bin/chcon -t textrel_shlib_t /usr/lib/vdpau/libvdpau_nvidia.so.304.137'...
   executing: '/usr/sbin/ldconfig'...
   /usr/sbin/ldconfig: /lib/libaperi_min.so is for unknown machine 40.

   /usr/sbin/ldconfig: /lib/liblowLevelApi.so.0 is not a symbolic link

   /usr/sbin/ldconfig: /lib/libuv.so.1 is not a symbolic link

   /usr/sbin/ldconfig: /lib/libtv.so.0 is not a symbolic link

   /usr/sbin/ldconfig: /lib/libtestHwRpcApi.so.0 is not a symbolic link

   /usr/sbin/ldconfig: /lib/libtestHwApi.so.0 is not a symbolic link

   /usr/sbin/ldconfig: /lib/liblinear.so.0 is not a symbolic link

   /usr/sbin/ldconfig: /lib/libhwRpcApi.so.0 is not a symbolic link

   /usr/sbin/ldconfig: /lib/libhwApi.so.0 is not a symbolic link

   executing: '/usr/sbin/depmod -aq'...
-> done.
-> Driver file installation is complete.
-> Installing DKMS kernel module:
ERROR: Failed to run `/usr/sbin/dkms build -m nvidia -v 304.137 -k 3.10.0-862.9.1.el7.x86_64`:
Kernel preparation unnecessary for this kernel.  Skipping...

Building module:
cleaning build area...
make -j20 KERNELRELEASE=3.10.0-862.9.1.el7.x86_64 module SYSSRC=/lib/modules/3.10.0-862.9.1.el7.x86_64/build......(bad exit status: 2)
Error! Bad return status for module build on kernel: 3.10.0-862.9.1.el7.x86_64 (x86_64)
Consult /var/lib/dkms/nvidia/304.137/build/make.log for more information.
-> error.
ERROR: Failed to install the kernel module through DKMS. No kernel module was installed; please try installing again without DKMS, or check the DKMS logs for more information.
ERROR: Installation has failed.  Please see the file '/var/log/nvidia-installer.log' for details.  You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.

The contents of “/var/lib/dkms/nvidia/304.137/build/make.log” are (cropped; the full log is too long to paste here):

In file included from /var/lib/dkms/nvidia/304.137/build/nv-linux.h:82:0,
                 from /var/lib/dkms/nvidia/304.137/build/nv-chrdev.c:15:
include/linux/mm.h:1377:6: note: expected ‘struct page **’ but argument is of type ‘struct vm_area_struct **’
 long get_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
      ^
In file included from /var/lib/dkms/nvidia/304.137/build/nv-chrdev.c:15:0:
/var/lib/dkms/nvidia/304.137/build/nv-linux.h:1919:45: error: too few arguments to function ‘get_user_pages_remote’
                                             pages, vmas);
                                             ^
In file included from /var/lib/dkms/nvidia/304.137/build/nv-linux.h:82:0,
                 from /var/lib/dkms/nvidia/304.137/build/nv-chrdev.c:15:
include/linux/mm.h:1377:6: note: declared here
 long get_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
      ^
make[3]: *** [/var/lib/dkms/nvidia/304.137/build/nv-procfs.o] Error 1
make[3]: *** [/var/lib/dkms/nvidia/304.137/build/nv-chrdev.o] Error 1
make[3]: *** [/var/lib/dkms/nvidia/304.137/build/os-registry.o] Error 1
make[2]: *** [_module_/var/lib/dkms/nvidia/304.137/build] Error 2
NVIDIA: left KBUILD.
nvidia.ko failed to build!
make[1]: *** [module] Error 1
make: *** [module] Error 2

I understand that the nvidia.ko module could not be built. What can be done to get the drivers working on CentOS 7?

CentOS 7.3 is dead.

304.137 is unmaintained.

3.10.0-862.9.1.el7.x86_64 is the latest kernel for CentOS 7.5.

384.98 predates the 7.5 kernel.

What are you expecting from this hodgepodge?
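
To make the kernel/release mismatch above concrete, here is a minimal Python sketch that maps a CentOS 7 kernel string to the minor release it shipped with. Only the 862 → 7.5 pairing comes from this thread; the 514 and 693 entries, the table name and the function name are assumptions for illustration and should be checked against the CentOS release notes.

```python
import re

# Build number of the stock kernel for some CentOS 7 minor releases.
# Only 862 -> 7.5 is stated in this thread; the other rows are assumptions.
KERNEL_BUILD_TO_RELEASE = {
    514: "7.3 (1611)",
    693: "7.4 (1708)",
    862: "7.5 (1804)",
}

def centos_release_for_kernel(kernel):
    """Return the CentOS 7 release for a 3.10.0-<build> kernel string, or None."""
    match = re.match(r"3\.10\.0-(\d+)", kernel)
    if match is None:
        return None
    return KERNEL_BUILD_TO_RELEASE.get(int(match.group(1)))

# The kernel reported in the question belongs to 7.5, not 7.3.
print(centos_release_for_kernel("3.10.0-862.9.1.el7.x86_64"))
```

On a live system the kernel string would come from `uname -r`, and the installed release can be read from /etc/centos-release.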

I wouldn’t know. I am not that savvy with Linux yet. Thanks for telling me.

Furthermore, your logs also contain traces of nvidia-396.37, which presumably came with a CUDA 9.2 install. If you want to use that, a 384 driver won’t work. You shouldn’t use the .run installer unless you know exactly what you are doing; it is better to use the rpm packages. Find out what NVIDIA software got installed, how and why, clean it up, upgrade your system and use the rpms.
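
One way to spot the mixed driver versions mentioned above is to scan the installer log for version-like strings. The following is a rough Python sketch, not an NVIDIA tool: the regular expression and the function name are assumptions chosen to fit the log lines quoted in this thread, and the pattern may need adjusting for other logs.

```python
import re
from collections import Counter

# Matches driver-style versions such as 304.137, 384.98 or 396.37 while
# skipping kernel strings like 3.10.0-862.9.1 (pattern is an assumption).
VERSION_RE = re.compile(r"(?<!\d)(?<!\d\.)(\d{3}\.\d{2,3})(?!\d)(?!\.\d)")

def driver_versions(log_text):
    """Count each distinct driver version string found in log_text."""
    return Counter(VERSION_RE.findall(log_text))

# Sample lines taken from the question; a real run would read
# /var/log/nvidia-installer.log instead.
sample = (
    "-> Installing NVIDIA driver version 304.137.\n"
    "ERROR: Unable to open '/usr/lib64/xorg/modules/nvidia-396.37/libglx.so'\n"
    "make -j20 KERNELRELEASE=3.10.0-862.9.1.el7.x86_64 module\n"
    "NVIDIA-Linux-x86_64-384.98.run\n"
)
print(sorted(driver_versions(sample)))
```

More than one version in the result suggests leftovers from several installs, which is the situation to clean up before installing the rpms.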

Thank you for your response generix. I appreciate it.