"Nvidia-smi has failed ..." after new installation of nvidia drivers on Ubuntu 18.04

Hi,

I recently updated my NVIDIA drivers (version 510) on my Ubuntu 18.04 laptop. Since then, I get this error when running nvidia-smi:

$ nvidia-smi

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

As suggested on forums, I have tried the following, without success:

  • check if Secure Boot was disabled
  • reinstall other driver versions
  • remove some blacklist files from /etc/modprobe.d
  • update initramfs
  • run prime-select nvidia
  • reboot after each of the previous steps.

Can anyone help me fix this error, please?

nvidia-bug-report.log.gz (243.6 KB)

Please try to load the module manually with
sudo modprobe nvidia
and post any errors it reports.

Thanks for your quick response! Here is the output:

modprobe: ERROR: could not insert 'nvidia': Invalid argument

You switched your system compiler to gcc 8.4, but it should be gcc 7.5. Please switch back to the correct gcc version, then purge and reinstall the driver.
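
To see the mismatch for yourself, here is a minimal sketch that compares the gcc version recorded for the running kernel with the current default gcc. It assumes /proc/version contains a "gcc version X.Y.Z" string, as it typically does on Ubuntu 18.04 kernels; the two function names are hypothetical helpers, not part of any tool.

```shell
# Sketch only: compare the gcc that built the running kernel with the default gcc.
# Assumption: /proc/version contains "gcc version X.Y.Z" (other kernels may differ).

kernel_gcc_version() {
    # Reads a /proc/version-style line on stdin and prints the gcc version, if found.
    sed -n 's/.*gcc version \([0-9][0-9.]*\).*/\1/p'
}

current_gcc_version() {
    # Reads `gcc --version` output on stdin; the version is the last field of line 1.
    awk 'NR==1 {print $NF}'
}

if [ -r /proc/version ] && command -v gcc >/dev/null 2>&1; then
    k=$(kernel_gcc_version < /proc/version)
    c=$(gcc --version | current_gcc_version)
    echo "kernel built with gcc: ${k:-unknown}"
    echo "current default gcc:   ${c:-unknown}"
fi
```

If the two versions differ, the DKMS-built nvidia module may have been compiled with a different compiler than the kernel, which is one possible cause of the "Invalid argument" error.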

You are right. I have now switched gcc back to the older version:

$ gcc --version
gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0

Then I went through the following steps:

sudo apt-get purge nvidia-*
sudo apt-get update
sudo apt-get autoremove
sudo update-initramfs -u
sudo reboot
sudo apt install nvidia-driver-510
sudo reboot

Unfortunately, I still get the same error with nvidia-smi. Here is the new bug report.

nvidia-bug-report.log.gz (245.4 KB)

According to the logs, the driver loaded once after being compiled and was then unloaded again:

Feb 28 16:52:21 laptop-monnier kernel: [ 589.888979] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
Feb 28 16:52:21 laptop-monnier kernel: [ 589.888980] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 1
Feb 28 16:52:21 laptop-monnier kernel: [ 589.890758] [drm] [nvidia-drm] [GPU ID 0x00000100] Unloading driver

Do you still get the same error when loading it manually?

Excuse me, by “loading it manually,” do you mean installing the driver with a .run file?

Also, please post the output of

grep nvidia /etc/modprobe.d/* /lib/modprobe.d/*

By loading manually, I mean running:
sudo modprobe nvidia

Thank you. I still get the same output:

$ sudo modprobe nvidia

modprobe: ERROR: could not insert 'nvidia': Invalid argument

$ grep nvidia /etc/modprobe.d/* /lib/modprobe.d/*

/etc/modprobe.d/nsight-privilege.conf:options nvidia "NVreg_RestrictProfilingToAdminUsers=0
/lib/modprobe.d/nvidia-kms.conf:# This file was generated by nvidia-prime
/lib/modprobe.d/nvidia-kms.conf:options nvidia-drm modeset=1

Please blacklist nvidiafb
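
One way to do this is to add a "blacklist" line in a .conf file under /etc/modprobe.d. The sketch below is an illustration only: the helper name write_blacklist and the file name blacklist-nvidiafb.conf are assumptions, and the helper takes the target directory as an argument so it can be tried on a scratch directory first.

```shell
# Hypothetical helper: write a modprobe blacklist entry for a module into a directory.
# The file name blacklist-<module>.conf is an arbitrary choice for this sketch.
write_blacklist() {
    dir=$1
    module=$2
    printf 'blacklist %s\n' "$module" > "$dir/blacklist-$module.conf"
}

# On the real system this needs root, followed by an initramfs rebuild, for example:
#   sudo sh -c 'echo "blacklist nvidiafb" > /etc/modprobe.d/blacklist-nvidiafb.conf'
#   sudo update-initramfs -u
#   sudo reboot
```

The resulting file contains the single line "blacklist nvidiafb", which tells modprobe not to load that module automatically by alias.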

Furthermore, the file /etc/modprobe.d/nsight-privilege.conf seems to be corrupt. Please remove it.
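
In the grep output above, the options line in that file opens a double quote and never closes it, which is presumably the corruption in question. As a rough sketch, lines like that can be spotted by counting double quotes per line; find_unbalanced_quotes is a hypothetical helper name, and an odd quote count is only a heuristic, not a full syntax check.

```shell
# Hypothetical helper: print lines containing an odd number of double quotes,
# prefixed with the file name and line number.
find_unbalanced_quotes() {
    awk '{
        n = gsub(/"/, "&")   # gsub returns the number of matches; "&" leaves the text unchanged
        if (n % 2 == 1) printf "%s:%d: %s\n", FILENAME, FNR, $0
    }' "$@"
}

# Example use on the real configuration directory:
#   find_unbalanced_quotes /etc/modprobe.d/*.conf
```

Any file it reports can then be inspected by hand before deciding whether to fix or remove it.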

nvidia-smi is working now!

So the error came from:

  • an incorrect version of gcc
  • a corrupted file and a missing blacklist entry in the /etc/modprobe.d folder

Thank you so much for your great help and advice!