Nvidia-driver-470: nvidia-smi shows "No devices were found" on PC Ubuntu 18.04

== The problem ==
After upgrading to nvidia-driver-470, our CUDA applications are not working, and nvidia-smi is not finding any GPU device:

me@bimba:~$ nvidia-smi
No devices were found

The exact same issue happened on 2 separate PCs.
Previously we were using nvidia-driver-465 and it was working fine.

After upgrading to nvidia-driver-470, I couldn’t downgrade back to 465, because the 465 package in the apt repo is a transitional package for 470:

me@bimba:~$ apt show nvidia-driver-465
Package: nvidia-driver-465
Version: 470.57.02-0ubuntu0.18.04.1
Source: nvidia-graphics-drivers-470
APT-Sources: http://il.archive.ubuntu.com/ubuntu bionic-updates/restricted amd64 Packages
Description: Transitional package for nvidia-driver-470

To downgrade, I followed these NVIDIA instructions to add another apt repo that contains nvidia-driver-465. After the downgrade to 465, everything was working again.
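
For anyone hitting the same thing, here is a minimal sketch of how to tell whether a package name is only a transitional stub before trying to install it. The helper name `is_transitional` is my own (it is not part of apt), and the sample text is a shortened copy of the `apt show` output above.

```shell
#!/usr/bin/env bash
# is_transitional: hypothetical helper, not an apt command.
# Reads `apt show` or `apt-cache show` output on stdin and succeeds when the
# description line marks the package as a transitional stub.
is_transitional() {
    # `apt-cache show` may print "Description-en:" instead of "Description:".
    grep -qiE '^Description(-[A-Za-z_]+)?: Transitional package'
}

# Sample input: a shortened copy of the output shown in this post.
sample='Package: nvidia-driver-465
Version: 470.57.02-0ubuntu0.18.04.1
Description: Transitional package for nvidia-driver-470'

if printf '%s\n' "$sample" | is_transitional; then
    echo "transitional"
else
    echo "real package"
fi
```

On a real machine the same check would be `apt show nvidia-driver-465 2>/dev/null | is_transitional`, where the redirect only hides apt's "stable CLI interface" warning.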

== Nvidia Bug Report ==
nvidia-bug-report.log.gz (534.1 KB)

== System information ==

  • OS:
me@bimba:~$ uname -a
Linux bimba 5.4.0-80-generic #90~18.04.1-Ubuntu SMP Tue Jul 13 19:40:02 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

me@bimba:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.5 LTS
Release:        18.04
Codename:       bionic
  • GPU:
me@bimba:~$ lspci | grep -i VGA
01:00.0 VGA compatible controller: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] (rev a1)
  • Installed Nvidia packages:
me@bimba:~$ apt list --installed | grep nvidia

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

libnvidia-cfg1-470/unknown,now 470.57.02-0ubuntu1 amd64 [installed,automatic]
libnvidia-common-465/unknown,now 465.19.01-0ubuntu1 all [installed,auto-removable]
libnvidia-common-470/unknown,now 470.57.02-0ubuntu1 all [installed,automatic]
libnvidia-compute-470/unknown,now 470.57.02-0ubuntu1 amd64 [installed,automatic]
libnvidia-decode-470/unknown,now 470.57.02-0ubuntu1 amd64 [installed,automatic]
libnvidia-encode-470/unknown,now 470.57.02-0ubuntu1 amd64 [installed,automatic]
libnvidia-extra-470/unknown,now 470.57.02-0ubuntu1 amd64 [installed,automatic]
libnvidia-fbc1-470/unknown,now 470.57.02-0ubuntu1 amd64 [installed,automatic]
libnvidia-gl-470/unknown,now 470.57.02-0ubuntu1 amd64 [installed,automatic]
libnvidia-ifr1-470/unknown,now 470.57.02-0ubuntu1 amd64 [installed,automatic]
nvidia-compute-utils-470/unknown,now 470.57.02-0ubuntu1 amd64 [installed,automatic]
nvidia-dkms-470/unknown,now 470.57.02-0ubuntu1 amd64 [installed,automatic]
nvidia-driver-470/unknown,now 470.57.02-0ubuntu1 amd64 [installed]
nvidia-kernel-common-470/unknown,now 470.57.02-0ubuntu1 amd64 [installed,automatic]
nvidia-kernel-source-470/unknown,now 470.57.02-0ubuntu1 amd64 [installed,automatic]
nvidia-prime/bionic-updates,bionic-updates,now 0.8.16~0.18.04.1 all [installed,automatic]
nvidia-settings/unknown,now 470.57.02-0ubuntu1 amd64 [installed,automatic]
nvidia-utils-470/unknown,now 470.57.02-0ubuntu1 amd64 [installed,automatic]
xserver-xorg-video-nvidia-470/unknown,now 470.57.02-0ubuntu1 amd64 [installed,automatic]

== Questions ==

  1. Please let me know if any additional information is needed
  2. Why was nvidia-driver-465 removed from the http://il.archive.ubuntu.com/ubuntu apt repo and replaced with a 470 transitional package?
  3. Most importantly - what should I do to make nvidia-driver-470 work?

For comparison, here is an Nvidia Bug Report from the same PC, after downgrading back to nvidia-driver-465:
nvidia-bug-report.log.gz (279.4 KB)

Everything is working now (with 465):

me@bimba:~$ nvidia-smi 
Sun Aug 22 18:30:14 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.19.01    Driver Version: 465.19.01    CUDA Version: 11.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0  On |                  N/A |
|  0%   28C    P8    N/A /  72W |     91MiB /  4032MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1528      G   /usr/lib/xorg/Xorg                 36MiB |
|    0   N/A  N/A      4517      G   /usr/bin/sddm-greeter              52MiB |
+-----------------------------------------------------------------------------+

I am still experiencing this problem.
Driver nvidia-driver-470 is not working (nvidia-smi shows "No devices were found").
And nvidia-driver-465 redirects to the (non-working) nvidia-driver-470:

Package: nvidia-driver-465
Version: 470.103.01-0ubuntu0.18.04.1
APT-Sources: http://us.archive.ubuntu.com/ubuntu bionic-updates/restricted amd64 Packages
Description: Transitional package for nvidia-driver-470

I would very much appreciate any help.

That’s a known bug; it was fixed in 470.63-470.86 and then broke again. Please check whether the 460 server driver is still available in the Ubuntu 18.04 repos. Otherwise, I guess you’ll have to install a working version using the runfile installer.

Also, please check whether a BIOS update fixes it.
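
A rough way to check a version string against that range, taking the 470.63-470.86 boundaries at face value, is sketched below. The helper names `version_le` and `in_reported_fixed_range` are made up for this example, the comparison relies on GNU `sort -V`, and 470.74 is only an illustrative value inside the range.

```shell
#!/usr/bin/env bash
# version_le A B: succeeds when A <= B under GNU `sort -V` ordering.
version_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$1" ]
}

# in_reported_fixed_range VERSION: hypothetical helper.
# Accepts strings such as "470.57.02-0ubuntu1" and compares only the
# first two components (e.g. "470.57") against 470.63 .. 470.86.
in_reported_fixed_range() {
    local upstream=${1%%-*}    # drop the "-0ubuntu1" packaging suffix
    local branch
    branch=$(printf '%s' "$upstream" | cut -d. -f1,2)
    version_le 470.63 "$branch" && version_le "$branch" 470.86
}

# The first and last versions are the ones quoted earlier in this thread.
for v in 470.57.02-0ubuntu1 470.74 470.103.01-0ubuntu0.18.04.1; do
    if in_reported_fixed_range "$v"; then
        echo "$v: inside the reported fixed range"
    else
        echo "$v: outside the reported fixed range"
    fi
done
```

Both versions quoted in this thread (470.57.02 and 470.103.01) fall outside the range, which matches the reports above.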

Thanks @generix.

So far, there is a repo with 465.19.01 available, and that’s been working for me.
However, our project is being developed in an nvidia-driver-470 environment (and we’ll probably want to upgrade to a newer version soon), and having some of the PCs held back on an older version is causing complications for us.

The BIOS update is an interesting idea. I’ll try it.
Thanks again.

I have a problem, and NVIDIA does not allow me to post.

Mine is this:
I installed AlmaLinux 9.3 (which I guess is very similar to CentOS),
and now I have a problem with NVIDIA.

I tried installing epel-release and nvidia-detect,
but when I run nvidia-detect, it shows nothing.

nvidia-smi:
NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
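
One first check for this message is whether the nvidia kernel module is loaded at all. That is only an assumption about the cause (a missing or unloaded module is a common reason, not the only one). Below is a minimal sketch that reads `/proc/modules`-style text; `module_loaded` and `check_modules` are hypothetical helper names, and the sample line is made up for illustration.

```shell
#!/usr/bin/env bash
# module_loaded NAME: reads /proc/modules-style text on stdin and succeeds
# when a module with exactly that name is listed in the first column.
module_loaded() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# check_modules: hypothetical helper that reports which GPU module is listed.
check_modules() {
    local modules
    modules=$(cat)
    if printf '%s\n' "$modules" | module_loaded nvidia; then
        echo "nvidia module loaded"
    elif printf '%s\n' "$modules" | module_loaded nouveau; then
        echo "nouveau loaded instead of nvidia"
    else
        echo "no nvidia or nouveau module loaded"
    fi
}

# Made-up sample line in /proc/modules format, for illustration only.
printf 'nouveau 2400000 1 - Live 0x0000000000000000\n' | check_modules
```

On the affected machine the call would be `check_modules < /proc/modules`.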