CUDA 9.2 install on RHEL 7.5 results in a driver version mismatch: nvidia.ko (396.26) vs. nvidia-modeset.ko (396.24)

This is baffling.

I carefully made sure that nouveau is blacklisted and not loaded at all:

# lsmod | grep "nouveau"

That returns nothing, which is exactly what I want.

The kernel boot options in the grub2 config also make sure nouveau is
out of the picture:

modprobe.blacklist=nouveau rd.driver.blacklist=nouveau nouveau.modeset=0

So no issues there.
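Both checks can be scripted so they are easy to repeat after each reboot. This is a minimal sketch, assuming bash and that the three options shown above are the ones expected on the kernel command line; the function name check_nouveau_off is hypothetical. It reads the command line from /proc/cmdline and the module list from /proc/modules (the file lsmod reads) by default, so it can also be pointed at saved copies.

```bash
#!/usr/bin/env bash
# Hypothetical helper: verify nouveau is blacklisted on the kernel
# command line and absent from the loaded-module list.
check_nouveau_off() {
    local cmdline_file=${1:-/proc/cmdline}
    local modules_file=${2:-/proc/modules}
    local opt status=0

    # Each option from the grub2 config must appear as a whole word
    # (-F: fixed string, so the dots are not regex wildcards).
    for opt in modprobe.blacklist=nouveau rd.driver.blacklist=nouveau nouveau.modeset=0; do
        if ! grep -qwF -- "$opt" "$cmdline_file"; then
            echo "missing kernel option: $opt"
            status=1
        fi
    done

    # /proc/modules has one loaded module per line, name first.
    if grep -q '^nouveau ' "$modules_file"; then
        echo "nouveau is loaded"
        status=1
    fi
    return "$status"
}
```

Run with no arguments on the live system; it prints nothing and returns 0 when all three options are present and nouveau is not loaded.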

I used systemctl to switch the default boot target to console mode, so X is not started:

# systemctl set-default multi-user.target

Great … reboot … and nothing but a black screen.

Next I checked that the libvdpau and DKMS packages are present:

# rpm -qa | grep -i "libvdpau" 
# rpm -qa | grep -i "dkms" 
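The two greps above only show that something matched. A slightly stricter sketch, assuming the packages of interest are literally named dkms and libvdpau (on RHEL 7 dkms usually comes from EPEL, which is an assumption about your repos); require_pkgs is a hypothetical name. It reads rpm -qa output on stdin and reports every name with no matching package.

```bash
#!/usr/bin/env bash
# Hypothetical helper: given `rpm -qa` output on stdin, report which of the
# named packages are absent. rpm -qa lines look like name-version-release.arch.
# Names are used as regex fragments, so keep them to plain package names.
require_pkgs() {
    local installed name missing=0
    installed=$(cat)
    for name in "$@"; do
        # Anchor on "name-<digit>" so libvdpau does not match libvdpau-devel.
        if ! grep -q "^${name}-[0-9]" <<<"$installed"; then
            echo "missing: $name"
            missing=1
        fi
    done
    return "$missing"
}
```

Usage: rpm -qa | require_pkgs dkms libvdpau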

Looks good, so I followed the CUDA 9.2 install instructions:

# rpm --install /root/nvidia/cuda-repo-rhel7-9-2-local-9.2.88-1.x86_64.rpm 
# yum install cuda
Loaded plugins: langpacks, product-id, search-disabled-repos, subscription-manager
Resolving Dependencies
--> Running transaction check
---> Package cuda.x86_64 0:9.2.88-1 will be installed
--> Processing Dependency: cuda-9-2 >= 9.2.88 for package: cuda-9.2.88-1.x86_64
--> Running transaction check
---> Package cuda-9-2.x86_64 0:9.2.88-1 will be installed
[... install output trimmed: great stuff happens here, no warnings, no errors ...]

The drivers were pulled in as a dependency during that install:

# yumdb info cuda-drivers-396.26-1.x86_64
Loaded plugins: langpacks, product-id, subscription-manager
     checksum_data = 60f2ad911fdc80613ff413dc4d2e7561d1a03398
     checksum_type = sha
     command_line = install cuda
     from_repo = cuda-9-2-local
     from_repo_revision = 1525131274
     from_repo_timestamp = 1525131286
     installed_by = 1641
     reason = dep
     releasever = 7Workstation
     var_uuid = 9ad4d18b-f055-4d9b-a838-b981569e755b

However, things are a mess after a reboot:

# systemctl reboot

The console log then shows:

[    3.356513] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  396.26  Mon Apr 30 18:01:39 PDT 2018 (using threaded interrupts)
[    3.396218] nvidia-modeset: Version mismatch: nvidia.ko(396.26) nvidia-modeset.ko(396.24)
[   16.559419] nvidia-modeset: Version mismatch: nvidia.ko(396.26) nvidia-modeset.ko(396.24)
[  189.498429] nvidia-modeset: Version mismatch: nvidia.ko(396.26) nvidia-modeset.ko(396.24)
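To pull the two versions out of the kernel log (for example from dmesg output), a small parser helps. This is a sketch that assumes the message format shown above; modeset_mismatch is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read kernel log text on stdin and print the versions
# from the first nvidia-modeset "Version mismatch" line as
# "<nvidia.ko version> <nvidia-modeset.ko version>". Prints nothing if absent.
modeset_mismatch() {
    sed -n 's/.*Version mismatch: nvidia\.ko(\([0-9.]*\)) nvidia-modeset\.ko(\([0-9.]*\)).*/\1 \2/p' | head -n 1
}
```

For the log above, dmesg | modeset_mismatch should print "396.26 396.24". Running modinfo -n nvidia-modeset then shows which nvidia-modeset.ko file modprobe would pick up, which is a reasonable next place to look.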

Not much works:

# nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
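This NVML error usually means the user-space driver libraries and the loaded kernel module come from different driver versions. A sketch for comparing the two sides, assuming /proc/driver/nvidia/version has the usual "NVRM version: ... Kernel Module <version> ..." first line and that libnvidia-ml lives under /usr/lib64 (typical on RHEL, but an assumption); all three function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for comparing kernel-module and library versions.

# Print the version from /proc/driver/nvidia/version-style text on stdin.
# Expected first line:
#   NVRM version: NVIDIA UNIX x86_64 Kernel Module  396.26  Mon Apr 30 ...
kernel_nvrm_version() {
    sed -n 's/^NVRM version:.*Kernel Module *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Print the version suffix of a path such as /usr/lib64/libnvidia-ml.so.396.26.
lib_version_from_path() {
    local base=${1##*/}
    echo "${base#libnvidia-ml.so.}"
}

# Compare the two versions; return non-zero on mismatch.
report_nvml_versions() {
    local kver=$1 lver=$2
    if [ "$kver" = "$lver" ]; then
        echo "match: $kver"
    else
        echo "mismatch: kernel module $kver, libnvidia-ml $lver"
        return 1
    fi
}
```

On the machine, kernel_nvrm_version < /proc/driver/nvidia/version should print 396.26 given the log above, and ls /usr/lib64/libnvidia-ml.so.* shows which library versions are installed.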

Sure enough, only the base nvidia module is loaded, and nothing is using it:

# lsmod | grep "^nvidia"
nvidia              14019833  0

I'm baffled: the driver offered for download appears to be NVIDIA-Linux-x86_64-396.24, but
the CUDA 9.2 kit installs 396.26, and the two versions conflict, leaving me with this mess.

Is there a trivial way out of this?

Minor update: it seems that a standalone 396.26 driver package does exist:

Version: 396.26
Release Date: 2018.5.17
Operating System: Linux 64-bit
CUDA Toolkit: 9.2

Perhaps I need to install that separately from the CUDA 9.2 kit.


I solved this problem on Ubuntu with:

sudo rm /lib/modules/4.15.0-36-generic/updates/dkms/nvidia-modeset.ko

Then reinstall the driver.
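Before deleting anything, it may help to list every NVIDIA module file in the running kernel's module tree so a stale copy like the one above stands out. A sketch, assuming bash, find, and kmod's modinfo; the function names are hypothetical, and the version column assumes the NVIDIA modules carry a "version" field.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list every nvidia*.ko (possibly compressed) under a
# kernel module tree. Defaults to the running kernel's tree.
list_nvidia_modules() {
    local root=${1:-/lib/modules/$(uname -r)}
    find "$root" -type f -name 'nvidia*.ko*' | sort
}

# Print "<version> <path>" for each module file found. The version is empty
# if modinfo fails or the module has no "version" field.
show_nvidia_module_versions() {
    local f
    list_nvidia_modules "$@" | while IFS= read -r f; do
        printf '%s %s\n' "$(modinfo -F version "$f" 2>/dev/null)" "$f"
    done
}
```

Given the log in the question, an nvidia-modeset.ko reporting 396.24 would be the stale copy. After removing it, depmod -a rebuilds the module index before the driver is reinstalled (on RHEL with the CUDA repo, presumably yum reinstall cuda-drivers).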