Problem installing NVIDIA-Linux-x86_64-331.89 in CentOS 7.6

Environments:

  1. OS info, CentOS 7.6 VM:

cat /etc/centos-release

CentOS Linux release 7.6.1810 (Core)

  2. Kernel info:

uname -r

3.10.0-957.el7.x86_64

  3. Two RPMs installed with rpm:
    kernel-devel-3.10.0-957.el7.x86_64.rpm and kernel-headers-3.10.0-957.el7.x86_64.rpm

  4. GPU card info:

lspci -nn |grep NVIDIA

00:09.0 3D controller [0302]: NVIDIA Corporation GK110BGL [Tesla K40m] [10de:1023] (rev a1)

  5. Driver info:
    TESLA DRIVER FOR LINUX X64
    Link: https://www.nvidia.com/Download/driverResults.aspx/146683/en-us
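
The module build needs a kernel-devel tree for exactly the running kernel, which is the case here (both are 3.10.0-957.el7.x86_64). A minimal sketch of that check, assuming kernel-devel installs its tree under /usr/src/kernels/<release>, as the path in the installer log suggests; have_kernel_devel is a hypothetical helper name, not an existing tool:

```shell
# Hypothetical helper: succeeds if a kernel-devel tree exists for a release.
#   $1 = kernel release string (e.g. the output of `uname -r`)
#   $2 = root of the kernel-devel trees (assumed default: /usr/src/kernels)
have_kernel_devel() {
    [ -d "${2:-/usr/src/kernels}/$1" ]
}

# Usage on the affected machine:
#   have_kernel_devel "$(uname -r)" || echo "kernel-devel for $(uname -r) is missing"
```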

I had already removed the nouveau driver, then ran the following command:

./NVIDIA-Linux-x86_64-331.89.run
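
Before running the installer, it may be worth confirming that nouveau is really no longer loaded. A small sketch; nouveau_loaded is a hypothetical helper that reads lsmod-style output on standard input:

```shell
# Hypothetical helper: reads `lsmod`-style output on stdin and succeeds
# (exit status 0) only if a module named exactly "nouveau" is listed.
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

# Usage:
#   lsmod | nouveau_loaded && echo "nouveau is still loaded"
```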

Here is part of the installer log:
/tmp/selfgz13567/NVIDIA-Linux-x86_64-331.89/kernel/nv-drm.c:89:25: error: dereferencing pointer to incomplete type
int page_count = obj->size >> PAGE_SHIFT;
^
/tmp/selfgz13567/NVIDIA-Linux-x86_64-331.89/kernel/nv-drm.c: At top level:
/tmp/selfgz13567/NVIDIA-Linux-x86_64-331.89/kernel/nv-drm.c:107:13: error: ‘drm_gem_mmap’ undeclared here (not in a function)
.mmap = drm_gem_mmap,
^
/tmp/selfgz13567/NVIDIA-Linux-x86_64-331.89/kernel/nv-drm.c:116:5: warning: initialization from incompatible pointer type [enabled by default]
.unload = nv_drm_unload,
^
/tmp/selfgz13567/NVIDIA-Linux-x86_64-331.89/kernel/nv-drm.c:116:5: warning: (near initialization for ‘nv_drm_driver.unload’) [enabled by default]
/tmp/selfgz13567/NVIDIA-Linux-x86_64-331.89/kernel/nv-drm.c: In function ‘nv_drm_init’:
/tmp/selfgz13567/NVIDIA-Linux-x86_64-331.89/kernel/nv-drm.c:142:5: error: implicit declaration of function ‘drm_pci_init’ [-Werror=implicit-function-declaration]
ret = drm_pci_init(&nv_drm_driver, pci_driver);
^
/tmp/selfgz13567/NVIDIA-Linux-x86_64-331.89/kernel/nv-drm.c: In function ‘nv_drm_exit’:
/tmp/selfgz13567/NVIDIA-Linux-x86_64-331.89/kernel/nv-drm.c:152:5: error: implicit declaration of function ‘drm_pci_exit’ [-Werror=implicit-function-declaration]
drm_pci_exit(&nv_drm_driver, pci_driver);
^
/tmp/selfgz13567/NVIDIA-Linux-x86_64-331.89/kernel/nv-drm.c: In function ‘nv_alloc_os_descriptor_handle’:
/tmp/selfgz13567/NVIDIA-Linux-x86_64-331.89/kernel/nv-drm.c:204:5: error: implicit declaration of function ‘drm_gem_private_object_init’ [-Werror=implicit-function-declaration]
drm_gem_private_object_init(nvl->drm, &nv_obj->base, size);
^
/tmp/selfgz13567/NVIDIA-Linux-x86_64-331.89/kernel/nv-drm.c:206:5: error: implicit declaration of function ‘drm_gem_handle_create’ [-Werror=implicit-function-declaration]
ret = drm_gem_handle_create(file_priv, &nv_obj->base, handle);
^
/tmp/selfgz13567/NVIDIA-Linux-x86_64-331.89/kernel/nv-drm.c:213:5: error: implicit declaration of function ‘drm_gem_object_unreference_unlocked’ [-Werror=implicit-function-declaration]
drm_gem_object_unreference_unlocked(&nv_obj->base);
^
cc1: some warnings being treated as errors
make[3]: *** [/tmp/selfgz13567/NVIDIA-Linux-x86_64-331.89/kernel/nv-drm.o] Error 1
make[2]: *** [_module_/tmp/selfgz13567/NVIDIA-Linux-x86_64-331.89/kernel] Error 2
make[1]: *** [sub-make] Error 2
make[1]: Leaving directory `/usr/src/kernels/3.10.0-957.el7.x86_64'
NVIDIA: left KBUILD.
nvidia.ko failed to build!
make: *** [nvidia.ko] Error 1
-> Error.
ERROR: Unable to build the NVIDIA kernel module.
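
The full log is long, and the compiler errors alone are easier to scan. A sketch of one way to pull them out, assuming the log sits at the installer's default location, /var/log/nvidia-installer.log; build_errors is a hypothetical helper name:

```shell
# Hypothetical helper: reads a build log on stdin, keeps only the compiler
# "error:" lines, and strips everything up to and including "/kernel/"
# (the temporary extraction path) from each of them.
build_errors() {
    grep ': error: ' | sed 's|^.*/kernel/||'
}

# Usage (assumed default log path; adjust if it was overridden):
#   build_errors < /var/log/nvidia-installer.log
```

Every error it keeps from the excerpt above is in nv-drm.c.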

Driver v331 is about 5 years old; it doesn’t work on newer kernels and is incompatible with CUDA newer than v6 or so. The link you posted was for a v418 driver, so why didn’t you install that?
Apart from that, I’d recommend using a repo driver like rpmfusion instead of the .run installer to keep your system sane.
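
If in doubt about which driver branch a .run file belongs to, the version can be read off the file name. A rough sketch, assuming the NVIDIA-Linux-x86_64-<version>.run naming used in this thread; run_file_version is a hypothetical helper:

```shell
# Hypothetical helper: prints the driver version embedded in a .run file name.
#   $1 = path to a file named NVIDIA-Linux-x86_64-<version>.run
run_file_version() {
    v=${1##*/}                    # drop any leading directory
    v=${v#NVIDIA-Linux-x86_64-}   # drop the fixed prefix
    printf '%s\n' "${v%.run}"     # drop the .run suffix
}

# Usage:
#   run_file_version ./NVIDIA-Linux-x86_64-331.89.run
```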

Thanks, I had posted the wrong link. I found newer drivers at https://docs.nvidia.com/datacenter/tesla/index.html, and it works for me now. Thanks!