"Nvidia-smi has failed " on Ubuntu 20.04

Hi everyone,

I was recently asked to install the latest NVIDIA driver (510) on Ubuntu 20.04 so that nvidia-smi could be used from the command line.

Running nvidia-smi gives this error:

$ nvidia-smi

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

$ sudo modprobe nvidia

modprobe: ERROR: could not insert 'nvidia': No such device

$ grep nvidia /etc/modprobe.d/* /lib/modprobe.d/*
/etc/modprobe.d/blacklist-framebuffer.conf:blacklist nvidiafb
/etc/modprobe.d/blacklist-nouveau-nvidiafb.conf:blacklist nvidiafb
/lib/modprobe.d/nvidia-kms.conf:# This file was generated by nvidia-prime
/lib/modprobe.d/nvidia-kms.conf:options nvidia-drm modeset=1

$ dkms status
nvidia, 510.54, 5.14.0-1029-oem, x86_64: installed
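
The dkms line above names the kernel the module was built for. Here is a rough sketch for comparing that against the running kernel. The helper name dkms_built_for is made up for this post, and it only does a substring match on whatever text dkms status prints, since the exact format differs between dkms versions. It can rule out a build mismatch, nothing more.

```shell
# Hypothetical helper: reads `dkms status` output on stdin and succeeds if
# some line mentions the given kernel release and also says "installed".
# This is a plain substring match, not a parser for the dkms format.
dkms_built_for() {
    awk -v k="$1" 'index($0, k) && /installed/ { found = 1 } END { exit !found }'
}

# Usage:
#   dkms status | dkms_built_for "$(uname -r)" \
#       && echo "module is built for the running kernel" \
#       || echo "no installed module for the running kernel"
```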

Following suggestions from other forum threads, I have tried all of the following, without success:

  • reinstalling other driver versions
  • removing some blacklist files from /etc/modprobe.d
  • updating the initramfs
  • running prime-select nvidia
  • switching gcc to an older version (11 → 9.4)
  • rebooting after each of the steps above

Could anyone help me, please? Here is the bug report:

nvidia-bug-report.log.gz (16.4 MB)

In the bug report, this error seems to be printed over and over:

Mar 23 06:23:06 apprenti kernel: [73379.624826] nvidia-nvlink: Unregistered the Nvlink Core, major device number 508
Mar 23 06:23:07 apprenti kernel: [73380.109484] nvidia-nvlink: Nvlink Core is being initialized, major device number 508
Mar 23 06:23:07 apprenti kernel: [73380.110910] NVRM: The NVIDIA GPU 0000:01:00.0
Mar 23 06:23:07 apprenti kernel: [73380.110910] NVRM: (PCI ID: 10de:2204) installed in this system has
Mar 23 06:23:07 apprenti kernel: [73380.110910] NVRM: fallen off the bus and is not responding to commands.
Mar 23 06:23:07 apprenti kernel: [73380.111022] NVRM: The NVIDIA probe routine failed for 1 device(s).
Mar 23 06:23:07 apprenti kernel: [73380.111024] NVRM: None of the NVIDIA devices were initialized.

The log is flooded with this message, so the actual cause is not visible. Please check whether using a 5.10 kernel helps:
https://forums.developer.nvidia.com/t/ubuntu-20-04-4-hp-zbook-studio-g8-mobile-workstation-driver-fails/208836/3?u=generix
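
One way to look past a flood like this is to drop the timestamps and keep only the first occurrence of each distinct kernel message, in the order they appeared. This is a minimal sketch. The function name first_occurrences is invented here, and the pattern assumes lines shaped like the ones quoted above ("... kernel: [uptime] message"). Lines in any other shape pass through unchanged.

```shell
# Hypothetical helper: reads kernel log lines on stdin, strips everything up
# to and including the "[uptime] " stamp, then prints each distinct message
# once, preserving the order of first appearance.
first_occurrences() {
    sed -E 's/^.*kernel: \[ *[0-9]+\.[0-9]+\] //' | awk '!seen[$0]++'
}

# Usage (assuming the messages are in /var/log/kern.log, as is typical on Ubuntu):
#   first_occurrences < /var/log/kern.log | less
```

Whatever was logged before the repeating block started then stays visible near the top of the output.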

Hi,

It finally works! I replaced my 5.14.0 Linux kernel with a 5.10.0 one.
I then updated the NVIDIA drivers, with no errors this time.
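
For anyone following the same route: if the older kernel is installed next to the newer one rather than replacing it, the machine may still boot the newer one, so it is worth confirming which kernel is running after the reboot. A small sketch follows (the function name is_kernel_series is made up for this post):

```shell
# Hypothetical helper: succeeds if the kernel release in $2 belongs to the
# series given in $1, e.g. series "5.10" matches "5.10.0-..." but not "5.100.0".
is_kernel_series() {
    case "$2" in
        "$1".*|"$1"-*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   is_kernel_series 5.10 "$(uname -r)" && echo "running a 5.10 kernel"
```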

The link you gave was very clear. Thank you so much!