I have a Red Hat 7.4 server with a Tesla P100 running driver version 384. Recently the users updated Xorg, and now I keep getting this message:
This server has a video driver ABI version of 24.0 that this driver does not officially support. Please check http://www.nvidia.com for driver updates or downgrade to an X server with a supported driver ABI.
I downloaded the new driver, version 418, and updated, but nvidia-smi still shows the old driver. Why is the new driver not being recognized, and how can I bring the server back up in GUI mode?
The old driver is probably just left in the initrd, rebuild it as root with
dracut -f
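To check whether a stale module really is the culprit, something like the following may help. It is only a sketch: loaded_nvidia_version is a hypothetical helper name, and it assumes /proc/driver/nvidia/version contains its usual "NVRM version: ... Kernel Module <version>" line. The commands in the trailing comments are standard tools and need root.

```shell
# Extract the driver version from the text of /proc/driver/nvidia/version (read on stdin).
# Assumption: the file has a line of the form "NVRM version: ... Kernel Module  <version> ...".
loaded_nvidia_version() {
    sed -n 's/^NVRM version:.*Kernel Module  *\([0-9][0-9.]*\).*/\1/p'
}

# Usage on the affected server:
#   loaded_nvidia_version < /proc/driver/nvidia/version   # version the running kernel loaded
#   modinfo -F version nvidia                             # version of the module on disk
#   lsinitrd | grep -i nvidia                             # any copy packed into the initramfs
#   dracut -f                                             # rebuild the initramfs for the running kernel
```

If the loaded version and the on-disk version differ, a stale initramfs is a plausible explanation; if both still report 384, the new driver was probably never installed at all.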
Running the .run installer on top of an earlier, probably packaged, install is not a good idea, and it will probably require reinstalling the driver after every kernel update. You should rather remove the .run install using the --uninstall option and switch to a repo driver such as rpmfusion, which requires a fully updated RHEL (currently 7.7?), or at least re-run the .run installer with the --dkms option.
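A quick way to see which install methods left traces on the machine might look like this. It is a sketch under assumptions: detect_nvidia_install is a hypothetical helper, and it relies on the .run installer having placed nvidia-uninstall on the PATH and on packaged drivers showing up in rpm -qa.

```shell
# Report how the NVIDIA driver appears to have been installed.
# Prints "run-installer" and/or "rpm"; prints nothing if neither trace is found.
detect_nvidia_install() {
    # The .run installer ships an nvidia-uninstall tool alongside the driver.
    if command -v nvidia-uninstall >/dev/null 2>&1; then
        echo "run-installer"
    fi
    # Packaged drivers are registered in the rpm database.
    if command -v rpm >/dev/null 2>&1 && rpm -qa 2>/dev/null | grep -qi nvidia; then
        echo "rpm"
    fi
}

# Usage (as root); the .run file name below is only an example for 384.81:
#   detect_nvidia_install
#   sh NVIDIA-Linux-x86_64-384.81.run --uninstall   # remove a .run install
#   sh NVIDIA-Linux-x86_64-384.81.run --dkms        # or reinstall it with DKMS support
```

If both lines are printed, the two methods have been mixed, which is the situation described above.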
If you have further problems, please run nvidia-bug-report.sh as root and attach the resulting .gz file to your post. Hovering the mouse over an existing post of yours will reveal a paperclip icon.
https://devtalk.nvidia.com/default/topic/1043347/announcements/attaching-files-to-forum-topics-posts/
Thanks for your reply. I ran dracut -f and it is still the same. I have attached the bug report for review. How can I obtain the .run installer, or where can I find it? The 418 driver is an rpm, and when I ran rpm -ivh I got a message that the driver is already installed. Please let me know what you find out from the bug report.
It looks a bit messy: it seems different 384.x drivers were installed around 12/2017 using different methods (.run/rpm), but there’s no trace of any 418 driver being installed. Stick to the .rpm for now.
Please post the output of
dkms status
Thanks generix. Here is the output from dkms status.
nvidia, 384.81, 3.10.0-693.11.1.el7.x86_64, x86_64: installed
nvidia, 384.81, 3.10.0-693.11.6.el7.x86_64, x86_64: installed (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!)
nvidia, 384.81, 3.10.0-693.5.2.el7.x86_64, x86_64: installed (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!)
How can I find the diff between the built and installed module?
The difference doesn’t matter; it’s just that the same driver (384.81) is installed three times, once per kernel, so you can ignore it.
More noteworthy is that there’s no trace of the 418 driver. Please reinstall it and post the complete output.
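The "no trace of 418" observation can be checked mechanically. The snippet below is a sketch: dkms_nvidia_versions is a hypothetical helper, and it assumes dkms status prints the "module, version, kernel, arch: state" form shown in the output above (other dkms releases may format this differently).

```shell
# Print each distinct NVIDIA driver version registered with DKMS.
# Reads `dkms status` text on stdin in the "module, version, kernel, arch: state" form.
dkms_nvidia_versions() {
    awk -F', ' '$1 == "nvidia" { print $2 }' | sort -u
}

# Usage (as root):
#   dkms status | dkms_nvidia_versions          # for the output above this prints only 384.81
#   rpm -qa | grep -i -E 'nvidia|cuda' | sort   # packaged driver and CUDA bits, if any
```

A 418 driver installed with DKMS support would be expected to show up as a second version in the first command.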
Here is the output of reinstall.
[rajaya@SRHS /]$ sudo rpm -ivh nvidia-diag-driver-local-repo-rhel7-418.67-1.0-1.x86_64.rpm
[sudo] password for rajaya:
Preparing... ################################# [100%]
package nvidia-diag-driver-local-repo-rhel7-418.67-1.0-1.x86_64 is already installed
--> Running transaction check
---> Package kmod-nvidia-3.10.0-957.el7.x86_64.x86_64 3:430.26-1.el7 will be installed
--> Processing Dependency: nvidia-kmod-common >= 3:430.26 for package: 3:kmod-nvidia-3.10.0-957.el7.x86_64-430.26-1.el7.x86_64
Package xorg-x11-drv-nvidia is obsoleted by nvidia-driver, but obsoleting package does not provide for requirements
---> Package libselinux-python.x86_64 0:2.5-11.el7 will be updated
---> Package libselinux-python.x86_64 0:2.5-14.1.el7 will be an update
---> Package nvidia-driver-cuda.x86_64 3:418.67-4.el7 will be installed
--> Processing Dependency: nvidia-persistenced = 3:418.67 for package: 3:nvidia-driver-cuda-418.67-4.el7.x86_64
--> Processing Conflict: 3:dkms-nvidia-418.67-1.el7.x86_64 conflicts nvidia-kmod
--> Processing Conflict: 3:dkms-nvidia-418.67-1.el7.x86_64 conflicts nvidia-kmod
--> Finished Dependency Resolution
--> Running transaction check
---> Package kernel.x86_64 0:3.10.0-693.5.2.el7 will be erased
---> Package kmod-nvidia-3.10.0-957.el7.x86_64.x86_64 3:430.26-1.el7 will be installed
--> Processing Dependency: nvidia-kmod-common >= 3:430.26 for package: 3:kmod-nvidia-3.10.0-957.el7.x86_64-430.26-1.el7.x86_64
Package xorg-x11-drv-nvidia is obsoleted by nvidia-driver, but obsoleting package does not provide for requirements
---> Package nvidia-driver-cuda.x86_64 3:418.67-4.el7 will be installed
--> Processing Dependency: nvidia-persistenced = 3:418.67 for package: 3:nvidia-driver-cuda-418.67-4.el7.x86_64
--> Processing Conflict: 3:dkms-nvidia-418.67-1.el7.x86_64 conflicts nvidia-kmod
--> Processing Conflict: 3:dkms-nvidia-418.67-1.el7.x86_64 conflicts nvidia-kmod
--> Finished Dependency Resolution
Error: Package: 3:nvidia-driver-cuda-418.67-4.el7.x86_64 (cuda-10-1-local-10.1.168-418.67)
Requires: nvidia-persistenced = 3:418.67
Available: 3:nvidia-persistenced-418.67-1.el7.x86_64 (cuda-10-1-local-10.1.168-418.67)
nvidia-persistenced = 3:418.67-1.el7
Installing: 3:nvidia-persistenced-430.26-1.el7.x86_64 (rpmfusion-nonfree-updates)
nvidia-persistenced = 3:430.26-1.el7
Error: Package: 3:kmod-nvidia-3.10.0-957.el7.x86_64-430.26-1.el7.x86_64 (rpmfusion-nonfree-updates)
Requires: nvidia-kmod-common >= 3:430.26
Installing: 3:nvidia-driver-418.67-4.el7.x86_64 (cuda-10-1-local-10.1.168-418.67)
nvidia-kmod-common = 3:418.67
Available: 3:xorg-x11-drv-nvidia-430.26-1.el7.x86_64 (rpmfusion-nonfree-updates)
nvidia-kmod-common = 3:430.26
Error: dkms-nvidia conflicts with 3:kmod-nvidia-430.26-1.el7.x86_64
Error: dkms-nvidia conflicts with 3:kmod-nvidia-3.10.0-957.el7.x86_64-430.26-1.el7.x86_64
You could try using --skip-broken to work around the problem
You could try running: rpm -Va --nofiles --nodigest
Ok, this looks like the rpmfusion repo was already added at some point but the system never got updated. I wonder how somebody managed to upgrade the X server then, though.
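To see at a glance which repositories are pulling against each other in an error dump like the one above, a small filter can help. This is a sketch: repos_in_yum_errors is a hypothetical helper, it assumes the repository name appears in parentheses at the end of the "Error: Package", "Available" and "Installing" lines as it does above, and yum-errors.txt is just an example file name.

```shell
# List the repositories named in yum dependency-error output.
# Reads the error text on stdin and prints each repository id once.
repos_in_yum_errors() {
    sed -nE 's/^ *(Error: Package|Available|Installing): .*\(([^()]*)\)$/\2/p' | sort -u
}

# Usage:
#   repos_in_yum_errors < yum-errors.txt             # saved copy of the failing yum output
#   yum --showduplicates list nvidia-persistenced    # every version yum can see, with its repo
```

For the output above this names both the local CUDA repo (418.67) and rpmfusion-nonfree-updates (430.26), which is the mix that cannot be resolved.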
Please post the output of
yum repolist enabled
Here is the output of ‘yum repolist enabled’
(base) [root@SRHS /]# yum repolist enabled
Loaded plugins: langpacks, product-id, rhnplugin, search-disabled-repos, subscription-manager
This system is receiving updates from RHN Classic or Red Hat Satellite.
repo id                           repo name                                                 status
cuda-10-1-local-10.1.168-418.67   cuda-10-1-local-10.1.168-418.67                           79
epel/x86_64                       Extra Packages for Enterprise Linux 7 - x86_64            13,343
nux-dextop/x86_64                 Nux.Ro RPMs for general desktop use                       2,710
nvidia-diag-driver-local-418.67   nvidia-diag-driver-local-418.67                           26
rhel-x86_64-server-7              Red Hat Enterprise Linux Server (v. 7 for 64-bit x86_64)  26,158
rpmfusion-free-updates/x86_64     RPM Fusion for EL 7 - Free - Updates                      247
rpmfusion-nonfree-updates/x86_64  RPM Fusion for EL 7 - Nonfree - Updates                   75
repolist: 42,638
(base) [root@SRHS /]#
Thanks. I tried the command and here is the output.
---> Package nvidia-modprobe.x86_64 3:430.40-1.el7 will be installed
---> Package nvidia-persistenced.x86_64 3:430.40-1.el7 will be installed
---> Package nvidia-settings.x86_64 3:430.40-1.el7 will be installed
---> Package nvidia-xconfig.x86_64 3:430.40-1.el7 will be installed
---> Package xorg-x11-drv-nvidia-cuda.x86_64 3:430.40-1.el7 will be installed
--> Processing Dependency: opencl-filesystem for package: 3:xorg-x11-drv-nvidia-cuda-430.40-1.el7.x86_64
--> Processing Dependency: ocl-icd(x86-64) for package: 3:xorg-x11-drv-nvidia-cuda-430.40-1.el7.x86_64
---> Package xorg-x11-drv-nvidia-cuda-libs.x86_64 3:430.40-1.el7 will be installed
---> Package xorg-x11-drv-nvidia-kmodsrc.x86_64 3:430.40-1.el7 will be installed
---> Package xorg-x11-drv-nvidia-libs.x86_64 3:430.40-1.el7 will be installed
--> Processing Dependency: egl-wayland >= 1.0.0 for package: 3:xorg-x11-drv-nvidia-libs-430.40-1.el7.x86_64
--> Finished Dependency Resolution
Error: Package: 3:xorg-x11-drv-nvidia-libs-430.40-1.el7.x86_64 (rpmfusion-nonfree-updates)
Requires: egl-wayland >= 1.0.0
Error: Package: 3:xorg-x11-drv-nvidia-cuda-430.40-1.el7.x86_64 (rpmfusion-nonfree-updates)
Requires: ocl-icd(x86-64)
Error: Package: 3:xorg-x11-drv-nvidia-cuda-430.40-1.el7.x86_64 (rpmfusion-nonfree-updates)
Requires: opencl-filesystem
You could try using --skip-broken to work around the problem
You could try running: rpm -Va --nofiles --nodigest
Yes, this looks like someone tried to install cuda 10, which is a metapackage consisting of cuda-toolkit and the nvidia driver. I guess you should remove all cuda/nvidia packages/repos using the instructions in the Installation Guide Linux :: CUDA Toolkit Documentation.
Afterwards, reinstall the driver from the rpmfusion repo, then download the cuda 10.1 rpm and add the repo to your system (the first three instruction steps on the download page). After that, don’t install the nvidia driver or “cuda”; install only “cuda-toolkit-10-1”.
This should give you a clean, updatable system.
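The order of operations could be sketched roughly as below. Treat it as an outline, not a tested recipe: run and reinstall_plan are hypothetical helpers, the rpmfusion package names are taken from the yum output earlier in this thread and may differ on your system, and the initial yum update is there only because the rpmfusion driver expects a fully updated RHEL. It assumes the old cuda/nvidia packages are already removed and the CUDA 10.1 repo has been added as described on the download page.

```shell
# Dry-run by default: commands are printed, not executed. Set APPLY=1 to run them as root.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

reinstall_plan() {
    # 1. Bring the base system up to date first.
    run yum -y update
    # 2. Install the driver from rpmfusion (package names assumed from the yum output above).
    run yum -y install kmod-nvidia xorg-x11-drv-nvidia xorg-x11-drv-nvidia-cuda
    # 3. Install only the toolkit from the CUDA 10.1 repo, not the "cuda" metapackage.
    run yum -y install cuda-toolkit-10-1
}

reinstall_plan
```

Run as is, it only prints the three yum commands so they can be reviewed before anything is changed.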
I uninstalled all the NVIDIA and CUDA drivers by following the instructions in the document. I looked up the rpmfusion repo on the Howto/NVIDIA - RPM Fusion page.
Are these the commands I need to run to install the NVIDIA drivers and then the CUDA 10.1 toolkit? I just want to verify before I proceed.
On another note… After uninstalling the drivers, yum update was able to download all the packages fine. Should I just update the server first before installing the nvidia driver and cuda 10.1 toolkit?