NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running

Hello, I am having a problem with nvidia-smi. I know this has been posted about several times before, but I can’t find an answer that solves my problem.

We had everything set up and running fine and then we rebooted the machine and now it can’t find the NVIDIA driver.

nvidia-smi returns “NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running”.

My GPUs, as reported by lspci | grep -i nvidia, are:
17:00.0 VGA compatible controller: NVIDIA Corporation Device 2204 (rev a1)
17:00.1 Audio device: NVIDIA Corporation Device 1aef (rev a1)
65:00.0 VGA compatible controller: NVIDIA Corporation Device 2204 (rev a1)
65:00.1 Audio device: NVIDIA Corporation Device 1aef (rev a1)

dpkg -l | grep nvidia gives:

ii  gpustat                                    0.6.0-1                               all          pretty nvidia device monitor
ii  libnvidia-cfg1-455:amd64                   455.45.01-0ubuntu1                    amd64        NVIDIA binary OpenGL/GLX configuration library
ii  libnvidia-common-450                       450.172.01-0ubuntu1                   all          Shared files used by the NVIDIA libraries
ii  libnvidia-common-455                       455.45.01-0ubuntu1                    all          Shared files used by the NVIDIA libraries
ii  libnvidia-common-460                       460.106.00-0ubuntu1                   all          Shared files used by the NVIDIA libraries
rc  libnvidia-compute-450:amd64                450.51.05-0ubuntu1                    amd64        NVIDIA libcompute package
ii  libnvidia-compute-455:amd64                455.45.01-0ubuntu1                    amd64        NVIDIA libcompute package
ii  libnvidia-container-tools                  1.8.1-1                               amd64        NVIDIA container runtime library (command-line tools)
ii  libnvidia-container1:amd64                 1.8.1-1                               amd64        NVIDIA container runtime library
ii  libnvidia-decode-455:amd64                 455.45.01-0ubuntu1                    amd64        NVIDIA Video Decoding runtime libraries
ii  libnvidia-encode-455:amd64                 455.45.01-0ubuntu1                    amd64        NVENC Video Encoding runtime library
ii  libnvidia-extra-455:amd64                  455.45.01-0ubuntu1                    amd64        Extra libraries for the NVIDIA driver
ii  libnvidia-fbc1-455:amd64                   455.45.01-0ubuntu1                    amd64        NVIDIA OpenGL-based Framebuffer Capture runtime library
ii  libnvidia-gl-455:amd64                     455.45.01-0ubuntu1                    amd64        NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
ii  libnvidia-ifr1-455:amd64                   455.45.01-0ubuntu1                    amd64        NVIDIA OpenGL-based Inband Frame Readback runtime library
ii  libnvidia-ml-dev                           10.1.243-3                            amd64        NVIDIA Management Library (NVML) development files
rc  nvidia-compute-utils-450                   450.51.05-0ubuntu1                    amd64        NVIDIA compute utilities
ii  nvidia-compute-utils-455                   455.45.01-0ubuntu1                    amd64        NVIDIA compute utilities
ii  nvidia-container-runtime                   3.8.1-1                               all          NVIDIA container runtime
ii  nvidia-container-toolkit                   1.8.1-1                               amd64        NVIDIA container runtime hook
ii  nvidia-cuda-dev                            10.1.243-3                            amd64        NVIDIA CUDA development files
ii  nvidia-cuda-doc                            10.1.243-3                            all          NVIDIA CUDA and OpenCL documentation
ii  nvidia-cuda-gdb                            10.1.243-3                            amd64        NVIDIA CUDA Debugger (GDB)
ii  nvidia-cuda-toolkit                        10.1.243-3                            amd64        NVIDIA CUDA development toolkit
rc  nvidia-dkms-450                            450.51.05-0ubuntu1                    amd64        NVIDIA DKMS package
ii  nvidia-dkms-455                            455.45.01-0ubuntu1                    amd64        NVIDIA DKMS package
ii  nvidia-driver-455                          455.45.01-0ubuntu1                    amd64        NVIDIA driver metapackage
rc  nvidia-kernel-common-450                   450.51.05-0ubuntu1                    amd64        Shared files used with the kernel module
ii  nvidia-kernel-common-455                   455.45.01-0ubuntu1                    amd64        Shared files used with the kernel module
ii  nvidia-kernel-source-455                   455.45.01-0ubuntu1                    amd64        NVIDIA kernel source package
ii  nvidia-modprobe                            510.47.03-0ubuntu1                    amd64        Load the NVIDIA kernel driver and create device files
ii  nvidia-opencl-dev:amd64                    10.1.243-3                            amd64        NVIDIA OpenCL development files
ii  nvidia-prime                               0.8.14                                all          Tools to enable NVIDIA's Prime
ii  nvidia-profiler                            10.1.243-3                            amd64        NVIDIA Profiler for CUDA and OpenCL
ii  nvidia-settings                            510.47.03-0ubuntu1                    amd64        Tool for configuring the NVIDIA graphics driver
ii  nvidia-utils-455                           455.45.01-0ubuntu1                    amd64        NVIDIA driver support binaries
ii  nvidia-visual-profiler                     10.1.243-3                            amd64        NVIDIA Visual Profiler for CUDA and OpenCL
ii  screen-resolution-extra                    0.18build1                            all          Extension for the nvidia-settings control panel
ii  xserver-xorg-video-nvidia-455              455.45.01-0ubuntu1                    amd64        NVIDIA binary Xorg driver

I can’t work out why it would break just on reboot, without updating/installing anything.
Thanks!

The output shows that driver version 455.45.01 is installed along with CUDA 10.1. However, there is also an actively installed library, libnvidia-common-460, that is meant to be used with the newer 460 driver series. You may need to upgrade your driver or remove that library. *)

*) Removing that single library does not always work, so upgrading the driver is the better bet.
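
To make the mismatch easier to spot, here is a rough sketch of a shell helper that lists the driver series found among the fully installed ("ii") NVIDIA packages. The function name nvidia_driver_series is made up for illustration, and it assumes the series is the three-digit suffix at the end of the package name, as in the listing above. More than one series in the output suggests mixed driver packages.

```shell
# Hypothetical helper: print the distinct driver series (e.g. 455, 460)
# among fully installed ("ii") NVIDIA packages.
# Reads `dpkg -l` style output on stdin.
# Assumption: the series is the three-digit suffix of the package name.
nvidia_driver_series() {
    awk '$1 == "ii" && $2 ~ /nvidia/ {
        name = $2
        sub(/:.*$/, "", name)   # drop an architecture suffix such as :amd64
        if (match(name, /-[0-9][0-9][0-9]$/))
            print substr(name, RSTART + 1)
    }' | sort -u
}

# Usage:
#   dpkg -l | nvidia_driver_series
```

Packages without a numeric suffix (nvidia-settings, nvidia-modprobe, the CUDA toolkit packages) are ignored by this sketch, so their versions still have to be checked by eye.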