Video driver goes south after a kernel update

I support several software developers - all of them have dual- or triple-head workstations running either NVIDIA 315, NVIDIA 510, or NVIDIA 620 video cards. Ever since we upgraded from CentOS 6 to CentOS 7, we have had to reinstall the NVIDIA video driver after any yum update that included a new kernel. Reinstalling the video driver just became a given, the expected last step every time I did a yum update that included a new kernel.
Ever since the CentOS 3.10.0-1160.15.2.el7.x86_64 kernel came out, reinstalling the video driver really goes south maybe 20% of the time - it spawns a couple thousand “sh” processes and maxes out the RAM and CPU until the machine slows to a crawl and eventually crashes.
To get the machine back to a working state I have to drop it down to run level three, re-enable the nouveau video driver, delete /etc/X11/xorg.conf, and reboot. I tried to uninstall the NVIDIA driver using the --uninstall switch, but running the installer at all kills the machine again.
After I can boot into X windows with the CentOS nouveau driver, I can then reinstall the NVIDIA video driver once again. Any customizations the user had in their video settings will be lost and they’ll have to start from scratch again.
These are the video drivers I’m using for each card:
315 card: NVIDIA-Linux-x86_64-390.138.run
510 card: NVIDIA-Linux-x86_64-410.73.run
620 card: NVIDIA-Linux-x86_64-450.66.run

Any suggestions how to make this process less painful?
Thanks, PG

Please try switching to a driver repo like rpmfusion:
https://rpmfusion.org/Configuration
https://rpmfusion.org/Howto/NVIDIA


It would be preferable to get the NVIDIA drivers to work correctly. These systems are on a network that isn’t connected to the internet, and external repos make things a lot more complicated than they should be.
Do you have any tricks to get the NVIDIA drivers to work better?

Three possibilities:

  1. Since you’re reinstalling the same driver, you can use the -K switch to rebuild just the kernel modules.
  2. Download and install dkms manually, e.g.
    https://centos.pkgs.org/7/epel-x86_64/dkms-2.8.4-1.el7.noarch.rpm.html
    then set up the systemd service (https://github.com/shawfdong/hyades/wiki/DKMS-on-CentOS-7) and run the installer with the --dkms option so the driver gets autocompiled on every kernel change.
  3. Build and use your own akmod packages: https://blog.christophersmart.com/tag/akmod/
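
For the first possibility, a minimal sketch of what the invocation could look like. The helper name `kernel_module_only_cmd` is made up for illustration, and it only prints the command instead of running it. The optional `--kernel-name` argument (for building against a kernel that is installed but not currently running) is an assumption about how you might use it, so check `sh NVIDIA-Linux-x86_64-*.run --advanced-options` for your driver version before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: print the installer command that rebuilds only the
# kernel module (-K) without touching the rest of the installed driver.
#   $1 = path to the NVIDIA .run file
#   $2 = optional kernel release (as printed by `uname -r`) to build for
kernel_module_only_cmd() {
    runfile=$1
    kernel=${2:-}
    if [ -n "$kernel" ]; then
        printf 'sh %s -K --kernel-name=%s\n' "$runfile" "$kernel"
    else
        printf 'sh %s -K\n' "$runfile"
    fi
}

# Example: command for the 620 card's runfile against the running kernel.
kernel_module_only_cmd NVIDIA-Linux-x86_64-450.66.run
```

Run the printed command as root from run level three, the same way you would run the full installer.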

I tried rpmfusion on a test system that was connected to the internet and could use the rpmfusion repos, and the installation worked fine. That particular system had kernels going back to 3.10.0-1127, plus the most recent 3.10.0-1160 kernels.
When I tried it on the production network, which only has the 3.10.0-1160 kernels (nothing prior to that), the install choked with an error about needing a kernel dependency of less than 3.10.0-1128.
Does that imply that the rpmfusion video driver is meant for CentOS 7.8 or earlier? (3.10.0-1160 is the CentOS 7.9 kernel.) Is there a newer version coming out?

I’m testing the “Three possibilities” response - the 3rd is kind of over my head - I tried the other two but had sketchy results. I want to try a few more times before claiming success or failure. If I got this to be more reliable than what’s been happening, that in itself would be a success of sorts.

Looking at the rpmfusion repo contents, it looks like they have indeed dropped support for CentOS 7, which is a pity.
Though if you’re getting “sketchy results” when only compiling the kernel modules, that makes me wonder what exactly is going on.

One pattern I’m seeing between the updates that fail and the ones that work: the workstations that were updated from CentOS 7.8 to 7.9 seem likely to fail (‘rpmquery kernel’ shows both 3.10.0-1127 and 3.10.0-1160 kernels), while the ones where the initial install was 7.9 seem to work fine. I just tried a different tactic on a machine updated from 7.8 - I removed the NVIDIA driver before doing the yum update and reinstalled the driver after it finished, and that one went fine. Not sure if that’s a one-off or a trend. I’ll have to try some more and see how it goes.
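
To sort machines into those two groups quickly, something like the sketch below could help. The function name `classify_host` and its three labels are invented for illustration; it only looks at the kernel package names and assumes that a 3.10.0-1127 kernel means the box started life on 7.8.

```shell
#!/bin/sh
# Hypothetical helper: read kernel package names (one per line, as printed
# by `rpm -q kernel`) on stdin and report which group the host falls into.
classify_host() {
    has_78=0
    has_79=0
    while IFS= read -r pkg; do
        case "$pkg" in
            kernel-3.10.0-1127*) has_78=1 ;;   # CentOS 7.8 kernel series
            kernel-3.10.0-1160*) has_79=1 ;;   # CentOS 7.9 kernel series
        esac
    done
    if [ "$has_78" = 1 ] && [ "$has_79" = 1 ]; then
        echo upgraded      # updated from 7.8 to 7.9
    elif [ "$has_79" = 1 ]; then
        echo fresh-7.9     # only 7.9 kernels present
    else
        echo other
    fi
}

# Only query the package database where rpm is actually available.
if command -v rpm >/dev/null 2>&1; then
    rpm -q kernel | classify_host
fi
```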

Ok, here’s what should be the final update.
DKMS was not installed on any of these machines before. I tried installing DKMS on some test machines and then doing a yum update, but they still crapped out.
I tried running the NVIDIA driver installer with the --dkms switch before running yum update and that seemed to do the trick. The machines I did that on came up without the need to reinstall the video driver after a yum update.
So, it looks like the key is that the driver installer has to be run with the --dkms switch prior to doing an update - then all is well.
Thanks generix for your help - you gave very useful suggestions, and it worked out in the end.

OK, it seems that wasn’t clear. The procedure is:

  1. install dkms
  2. run the runfile installer with the --dkms option to register the driver with dkms
  3. on any subsequent kernel update, the driver gets autocompiled (no further driver installation necessary)
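
As a rough sketch of those steps for an offline machine: the script below defaults to a dry run and only prints what it would execute. The `run` and `setup_dkms_driver` helpers are made up for illustration, the file names are the ones mentioned earlier in this thread, and it assumes the dkms RPM and its build dependencies (gcc, make, the matching kernel-devel package) have already been copied onto the machine.

```shell
#!/bin/sh
# Dry run by default: set DRY_RUN=0 to actually execute the commands (as root).
DRY_RUN=${DRY_RUN:-1}

# Hypothetical wrapper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical helper covering steps 1 and 2:
#   $1 = locally copied dkms RPM, $2 = NVIDIA .run file
setup_dkms_driver() {
    dkms_rpm=$1
    runfile=$2
    run yum -y localinstall "$dkms_rpm"   # step 1: install dkms
    run sh "$runfile" --dkms              # step 2: register the driver with dkms
    run dkms status                       # check that the nvidia module is listed
}

setup_dkms_driver dkms-2.8.4-1.el7.noarch.rpm NVIDIA-Linux-x86_64-450.66.run
```

After a later kernel update, `dkms status` should show the nvidia module built for the new kernel. If it does not, that is the point to investigate before rebooting into X.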