NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running

Hi,
Last week I installed several conda envs with different versions of PyTorch and CUDA, using the commands from the official PyTorch website. Now I can’t run any NVIDIA Docker containers or access the GPUs. I removed all the conda envs and rebooted.

After the reboot I am still running into this:

nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Most of the solutions I found didn’t help. Here are the outputs of the commands that other forum threads asked for:

lspci -v | grep VGA
01:00.0 VGA compatible controller: NVIDIA Corporation GA102 [GeForce RTX 3090] (rev a1) (prog-if 00 [VGA controller])
21:00.0 VGA compatible controller: NVIDIA Corporation GA102 [GeForce RTX 3090] (rev a1) (prog-if 00 [VGA controller])
4b:00.0 VGA compatible controller: NVIDIA Corporation GA102 [GeForce RTX 3090] (rev a1) (prog-if 00 [VGA controller])
4c:00.0 VGA compatible controller: NVIDIA Corporation GA102 [GeForce RTX 3090] (rev a1) (prog-if 00 [VGA controller])
(base) igor@myserver:~$ sudo dkms status
(base) igor@myserver:~$ sudo dpkg -l | grep nvidia
ii  libnvidia-compute-535-server:amd64     535.161.08-0ubuntu2.22.04.1             amd64        NVIDIA libcompute package
ii  libnvidia-container-tools              1.14.6-1                                amd64        NVIDIA container runtime library (command-line tools)
ii  libnvidia-container1:amd64             1.14.6-1                                amd64        NVIDIA container runtime library
ii  nvidia-container-toolkit               1.14.6-1                                amd64        NVIDIA Container toolkit
ii  nvidia-container-toolkit-base          1.14.6-1                                amd64        NVIDIA Container Toolkit Base
ii  nvidia-utils-535-server                535.161.08-0ubuntu2.22.04.1             amd64        NVIDIA Server Driver support binaries

The log.
nvidia-bug-report.log.gz (158.2 KB)

Nvidia installer log
nvidia-installer.log (41.7 KB)

Please advise.

Running

 sudo apt-get install nvidia-driver-535

and rebooting fixed the problem.
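
For anyone landing here with the same symptom: the dpkg listing in the question shows only user-space packages (libnvidia-compute, nvidia-utils, the container toolkit), and dkms status printed nothing. That is consistent with the kernel driver package being missing, which would explain why installing nvidia-driver-535 helped. Below is a rough sketch of a check for that situation. The function name is hypothetical, and the package-name prefixes are an assumption based on Ubuntu's usual naming, so adjust them for other distributions.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of any NVIDIA or Ubuntu tool).
# Reads `dpkg -l` output on stdin and succeeds (exit 0) only if at least one
# fully installed package (status "ii") has a name that usually provides or
# pulls in the NVIDIA kernel module on Ubuntu.
# Assumed prefixes: nvidia-driver-, nvidia-dkms-, linux-modules-nvidia-
has_nvidia_kernel_driver() {
    awk '
        $1 == "ii" && $2 ~ /^(nvidia-driver-|nvidia-dkms-|linux-modules-nvidia-)/ { found = 1 }
        END { exit !found }
    '
}
```

Typical use would be something like `dpkg -l | has_nvidia_kernel_driver || echo "no NVIDIA kernel driver package found"`. It only looks at package names, so it cannot tell whether the module actually loaded; treat a failure as a hint to check the driver package, not as proof.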
