A "yum update" to take the system from CentOS 7.6 to CentOS 7.7 fails, ending with this:
--> Finished Dependency Resolution
Error: Could not find suitable Nvidia kernel module version for kernel kernel-3.10.0-862.el7.x86_64 and driver 3:nvidia-driver-latest-418.87.00-2.el7.x86_64
Currently installed versions of key packages:
cuda-10.1.243-1.x86_64
dkms-2.7.1-1.el7.noarch
kmod-nvidia-latest-dkms-418.87.00-2.el7.x86_64
nvidia-driver-latest-418.87.00-2.el7.x86_64
kernel-3.10.0-862.el7.x86_64
kernel-3.10.0-957.27.2.el7.x86_64
Currently running kernel 3.10.0-957.27.2.el7.x86_64
The CentOS 7.7 kernel will be kernel.x86_64 0:3.10.0-1062.1.1.el7
nvidia-smi
Mon Sep 23 16:02:58 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  Off  | 00000000:00:08.0 Off |                    0 |
| N/A   33C    P0    36W / 250W |      0MiB / 32480MiB |      3%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
That is generally expected. The GPU driver's kernel module is compiled against the specific kernel you are running, so installing a new kernel leaves the driver without a matching module until it is rebuilt or reinstalled.
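As a rough sketch of how to see that coupling: every kernel module records the kernel release it was built for in its vermagic string, and that can be compared with the running kernel. The helper names below are hypothetical, and the module name nvidia is an assumption about how the driver package names its module.

```shell
# Hypothetical helpers; call nvidia_matches_running_kernel on the GPU machine.

# The first field of a vermagic string is the kernel release the module was
# built for, e.g. "3.10.0-957.27.2.el7.x86_64 SMP mod_unload modversions".
vermagic_release() {
    printf '%s\n' "${1%% *}"
}

# Succeeds only if an nvidia module (assumed name) is installed for the
# running kernel and its vermagic names that same kernel release.
nvidia_matches_running_kernel() {
    vermagic=$(modinfo -F vermagic nvidia 2>/dev/null) || return 1
    [ "$(vermagic_release "$vermagic")" = "$(uname -r)" ]
}
```

A non-zero return from nvidia_matches_running_kernel after booting a new kernel would suggest the module has not been rebuilt for it yet.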
The simplest approach at this point is to reinstall the driver, or reinstall CUDA.
If the GPU driver is registered with dkms, dkms may fix this automatically, but several things can trip it up, such as an incompatibility between the driver and the new kernel that prevents the module from compiling. It might also just require a restart to trigger dkms to rebuild the GPU driver interface.
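If waiting for a reboot is not an option, the dkms rebuild can also be triggered by hand. This is a minimal sketch, not a definitive procedure: the function names are hypothetical, and it assumes the matching kernel-devel package is available from the configured yum repositories.

```shell
# Hypothetical helpers; run rebuild_nvidia_dkms as root on the GPU machine.

# Strip the package-name prefix so an RPM name such as
# kernel-3.10.0-957.27.2.el7.x86_64 matches the form printed by `uname -r`.
kernel_release_from_rpm() {
    printf '%s\n' "${1#kernel-}"
}

# Build and install the registered DKMS modules for one kernel release
# (defaults to the running kernel). Stops at the first failing step.
rebuild_nvidia_dkms() {
    rel=${1:-$(uname -r)}
    dkms status &&                         # what DKMS has built, and for which kernels
    yum install -y "kernel-devel-$rel" &&  # headers must match the target kernel exactly
    dkms autoinstall -k "$rel"             # build any registered module missing for $rel
}
```

For example, once the 7.7 kernel is installed, one might run rebuild_nvidia_dkms "$(kernel_release_from_rpm kernel-3.10.0-1062.1.1.el7.x86_64)", reboot into that kernel, and check the result with nvidia-smi. If the build step fails, the dkms build log is the place to look for a driver/kernel incompatibility.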