NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

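If you installed the NVIDIA driver from a .run file or from the driver bundled with the CUDA Toolkit, the driver may be lost when you upgrade your Linux kernel, because the kernel module was built only for the kernel in use at install time. In that case, reinstall the NVIDIA driver. To have the module rebuilt automatically for future kernels, run the installer with the --dkms option: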

    sudo sh NVIDIA-Linux-x86_64-470.xx.xx.run --dkms

--dkms
nvidia-installer can optionally register the NVIDIA kernel module sources, if installed, with DKMS, then build and install a kernel module using the DKMS-registered sources. This will allow the DKMS infrastructure to automatically build a new kernel module when changing kernels. During installation, if DKMS is detected, nvidia-installer will ask the user if they wish to register the module with DKMS; the default response is ‘no’. This option will bypass the detection of DKMS, and cause the installer to attempt a DKMS-based installation regardless of whether DKMS is present.
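
To tell whether a kernel upgrade is what broke the driver, you can check whether an nvidia module exists for the running kernel. The following is only a rough sketch: `nvidia_module_built` is a helper name made up for this example, and it assumes `modinfo` is available and that the module is named `nvidia`.

```shell
#!/bin/sh
# Sketch: report whether an "nvidia" kernel module is available for a kernel.
# nvidia_module_built is a hypothetical helper, not part of any NVIDIA tool.
nvidia_module_built() {
    target_kernel="${1:-$(uname -r)}"   # default: the running kernel
    if modinfo -k "$target_kernel" nvidia >/dev/null 2>&1; then
        return 0
    fi
    return 1
}

running_kernel="$(uname -r)"
if nvidia_module_built "$running_kernel"; then
    echo "nvidia module found for kernel $running_kernel"
else
    echo "no nvidia module for kernel $running_kernel; rebuild or reinstall the driver"
fi

# If DKMS is installed, list the modules it manages and the kernels they were built for.
if command -v dkms >/dev/null 2>&1; then
    dkms status || true
fi
```

If the module is missing for the running kernel but `dkms status` lists it for an older one, the driver most likely was not rebuilt after the upgrade.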


Simpler solution (Ubuntu only): install the NVIDIA driver from the graphics-drivers PPA:

  1. Uninstall any NVIDIA driver that was installed from a .run file or bundled with the CUDA Toolkit (a .run installation normally provides the nvidia-uninstall command for this).

  2. Add the graphics-drivers PPA:

    sudo add-apt-repository ppa:graphics-drivers/ppa --yes
    sudo apt update
    
  3. Install the NVIDIA driver from the PPA:

    sudo apt install nvidia-driver-470  # or nvidia-driver-495
    
  4. (Optional) Put the driver packages on hold to prevent automatic upgrades (useful since it is a server):

    dpkg-query -W --showformat='${Package} ${Status}\n' | grep -v deinstall | awk '{ print $1 }' | \
        grep -E 'nvidia.*-[0-9]+$' | \
        xargs -r -L 1 sudo apt-mark hold
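
Before holding anything, it may help to see which package names the filter in step 4 selects. This is a minimal sketch that runs the same `grep -E` pattern over a made-up list of package names; `select_versioned_nvidia` is a hypothetical helper name, and the names in the list are illustrative, not taken from a real system.

```shell
#!/bin/sh
# Sketch: preview which package names the hold pipeline would select.
# select_versioned_nvidia is a hypothetical helper used only in this example.
select_versioned_nvidia() {
    # Keep names that contain "nvidia" and end in "-<digits>", e.g. nvidia-driver-470.
    grep -E 'nvidia.*-[0-9]+$'
}

# Made-up package list, for illustration only.
printf '%s\n' nvidia-driver-470 libnvidia-gl-470 nvidia-settings nvidia-prime |
    select_versioned_nvidia
# prints:
#   nvidia-driver-470
#   libnvidia-gl-470
```

On a real system, `apt-mark showhold` lists the packages currently on hold, and `sudo apt-mark unhold <package>` releases one when you do want to upgrade.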
    

A driver installed this way persists when you change your Linux kernel.
