This is baffling.
I carefully ensured that nouveau is blacklisted and not loaded at all :
# lsmod | grep "nouveau"
That reports nothing, which is exactly what I want.
The kernel options in the grub2 config also ensure nouveau is out of the
picture at boot :
modprobe.blacklist=nouveau rd.driver.blacklist=nouveau nouveau.modeset=0
So no issues there.
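For anyone repeating this check, here is a minimal sketch that tests whether a kernel command line carries the two blacklist options shown above. The function name has_nouveau_blacklist is made up for illustration, and it assumes each option appears as its own word (it does not handle a comma-separated modprobe.blacklist list).

```shell
# Hypothetical helper: report whether the nouveau blacklist options are on a
# kernel command line. Takes the command line as $1, else reads /proc/cmdline.
has_nouveau_blacklist() {
    cmdline="${1:-$(cat /proc/cmdline)}"
    for opt in modprobe.blacklist=nouveau rd.driver.blacklist=nouveau; do
        case " $cmdline " in
            *" $opt "*) ;;                          # option found as a whole word
            *) echo "missing: $opt"; return 1 ;;    # stop at the first missing option
        esac
    done
    echo "nouveau blacklist options present"
}
```

Calling has_nouveau_blacklist with no argument checks the running kernel's own command line.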
I was careful to use systemctl to boot into console mode, and thus no X windows :
# systemctl set-default multi-user.target
Great … reboot … and nothing but a black screen.
Then I carefully checked for the libvdpau and DKMS bits :
# rpm -qa | grep -i "libvdpau"
libvdpau-1.1.1-3.el7.x86_64
# rpm -qa | grep -i "dkms"
dkms-2.6.1-1.el7.noarch
Looks good … I followed the install instructions for CUDA 9.2 and installed :
# rpm --install /root/nvidia/cuda-repo-rhel7-9-2-local-9.2.88-1.x86_64.rpm
# yum install cuda
Loaded plugins: langpacks, product-id, search-disabled-repos, subscription-manager
Resolving Dependencies
--> Running transaction check
---> Package cuda.x86_64 0:9.2.88-1 will be installed
--> Processing Dependency: cuda-9-2 >= 9.2.88 for package: cuda-9.2.88-1.x86_64
--> Running transaction check
---> Package cuda-9-2.x86_64 0:9.2.88-1 will be installed
.
.
. great stuff happens here .. no warnings .. no errors
.
Complete!
#
I see drivers were installed in that process :
# yumdb info cuda-drivers-396.26-1.x86_64
Loaded plugins: langpacks, product-id, subscription-manager
cuda-drivers-396.26-1.x86_64
checksum_data = 60f2ad911fdc80613ff413dc4d2e7561d1a03398
checksum_type = sha
command_line = install cuda
from_repo = cuda-9-2-local
from_repo_revision = 1525131274
from_repo_timestamp = 1525131286
installed_by = 1641
reason = dep
releasever = 7Workstation
var_uuid = 9ad4d18b-f055-4d9b-a838-b981569e755b
#
However we have a mess at reboot :
# systemctl reboot
Then the console logs say :
[ 3.356513] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 396.26 Mon Apr 30 18:01:39 PDT 2018 (using threaded interrupts)
[ 3.396218] nvidia-modeset: Version mismatch: nvidia.ko(396.26) nvidia-modeset.ko(396.24)
[ 16.559419] nvidia-modeset: Version mismatch: nvidia.ko(396.26) nvidia-modeset.ko(396.24)
[ 189.498429] nvidia-modeset: Version mismatch: nvidia.ko(396.26) nvidia-modeset.ko(396.24)
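To pull the two conflicting versions out of a long kernel log, something like the following sketch may help. The function name modeset_mismatch is made up for illustration, and the pattern assumes the message keeps exactly the wording shown in the lines above.

```shell
# Hypothetical helper: read kernel log text on stdin and print each distinct
# nvidia.ko / nvidia-modeset.ko version pair reported as a mismatch.
modeset_mismatch() {
    sed -n 's/.*Version mismatch: nvidia\.ko(\([^)]*\)) nvidia-modeset\.ko(\([^)]*\)).*/nvidia.ko=\1 nvidia-modeset.ko=\2/p' |
        sort -u    # the message repeats, so collapse duplicates
}
```

It can be fed from the live log with: dmesg | modeset_mismatch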
Not much works :
# nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
Sure enough, the kernel module is loaded but isn’t doing much for me :
# lsmod | grep "^nvidia"
nvidia 14019833 0
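One way to see which module files on disk disagree is to compare the version field that modinfo reports for each of them. The sketch below is only that: versions_agree is a made-up helper name, and the module list in the usage comment is an assumption about what the driver package installs.

```shell
# Hypothetical helper: read "module version" pairs on stdin. Prints OK when all
# versions match, otherwise prints each version followed by the modules using it.
versions_agree() {
    awk '{ seen[$2] = seen[$2] " " $1 }
         END {
             n = 0
             for (v in seen) n++
             if (n <= 1) { print "OK"; exit 0 }
             for (v in seen) print v ":" seen[v]
             exit 1
         }'
}

# Usage sketch (module names are an assumption, adjust to what is installed):
# for m in nvidia nvidia-modeset nvidia-drm nvidia-uvm; do
#     printf '%s %s\n' "$m" "$(modinfo -F version "$m" 2>/dev/null)"
# done | versions_agree
```

A non-zero exit status means at least two different versions were seen, which would match the mismatch in the console log.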
I am baffled here : the driver offered for download appears to be
NVIDIA-Linux-x86_64-396.24, but the CUDA 9.2 kit pulls in cuda-drivers 396.26,
so the two conflict in version and what I get is a mess.
Is there a trivial way out of this ?
Minor update : it seems that a driver package for 396.26 does exist :
http://www.nvidia.com/download/driverResults.aspx/134377/en-us
Version: 396.26
Release Date: 2018.5.17
Operating System: Linux 64-bit
CUDA Toolkit: 9.2
Perhaps I need to install that driver separately from the CUDA 9.2 kit.
Dennis
ps: sorry about the line numbers but I don’t see a way to post fixed width
font for easy readability