NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver

maxzapletin · August 7, 2022, 7:45am

Hey, after a reboot I received this message: NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver.
5.15.0-43-generic #46~20.04.1-Ubuntu
x86_64 GNU/Linux

nvidia-bug-report.log (343.1 KB)

You might want to run 'apt --fix-broken install' to correct these.
The following packages have unmet dependencies:
nvidia-dkms-510 : Depends: nvidia-kernel-common-510 (>= 510.85.02) but 510.73.08-0ubuntu1 is to be installed
nvidia-docker2 : Depends: nvidia-container-toolkit (>= 1.10.0-1) but 1.9.0-1 is to be installed
nvidia-driver-510 : Depends: nvidia-kernel-common-510 (>= 510.85.02) but 510.73.08-0ubuntu1 is to be installed
E: Unmet dependencies. Try 'apt --fix-broken install' with no packages (or specify a solution).
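
To make the mismatch in the output above easier to read, here is a minimal Python sketch that pulls the package, the dependency, the required version and the version apt wants to install out of each "Depends:" line. The names `APT_OUTPUT`, `parse_unmet` and `stale_dependencies` are made up for this illustration, and the regular expression is only assumed to cover the line shape shown in this thread, not every message apt can print.

```python
import re

# Sample lines copied from the apt output in this post.
APT_OUTPUT = """\
nvidia-dkms-510 : Depends: nvidia-kernel-common-510 (>= 510.85.02) but 510.73.08-0ubuntu1 is to be installed
nvidia-docker2 : Depends: nvidia-container-toolkit (>= 1.10.0-1) but 1.9.0-1 is to be installed
nvidia-driver-510 : Depends: nvidia-kernel-common-510 (>= 510.85.02) but 510.73.08-0ubuntu1 is to be installed
"""

# Assumed line shape: "<pkg> : Depends: <dep> (<op> <version>) but <version> is to be installed"
UNMET = re.compile(
    r"^\s*(?P<package>\S+)\s*:\s*Depends:\s*(?P<dependency>\S+)\s*"
    r"\((?P<op>[<>=]+)\s*(?P<required>[^)]+)\)\s*"
    r"but\s+(?P<candidate>\S+)\s+is to be installed"
)


def parse_unmet(text):
    """Return one dict per line that matches the unmet-dependency shape."""
    return [m.groupdict() for line in text.splitlines() if (m := UNMET.match(line))]


def stale_dependencies(text):
    """Map each lagging dependency to (operator, required version, candidate version)."""
    return {
        d["dependency"]: (d["op"], d["required"], d["candidate"])
        for d in parse_unmet(text)
    }


if __name__ == "__main__":
    for dep, (op, required, candidate) in stale_dependencies(APT_OUTPUT).items():
        print(f"{dep}: needs {op} {required}, but apt would install {candidate}")
```

Run on the three lines above, this reduces the report to two lagging packages, nvidia-kernel-common-510 and nvidia-container-toolkit, which is the half-finished upgrade that 'apt --fix-broken install' is meant to complete.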

maxzapletin · August 7, 2022, 7:54am

I would like to know whether you can see anything in the log that would help me prevent the same problem in the future.

generix · August 7, 2022, 12:46pm

It’s a package manager issue, none of that will be caught in the nvidia-bug-report.log.

maxzapletin · August 7, 2022, 1:33pm

So, it happened after a reboot. I ran apt --fix-broken install and it solved the problem.
So the problem was caused just by the Ubuntu package manager?

maxzapletin · August 7, 2022, 1:50pm

I found the package manager's log file. In the log I saw that every so often it starts unpacking the NVIDIA drivers.
The question is why this happened and how I can prevent it.
term.log (28.8 KB)

generix · August 7, 2022, 4:52pm

 trying to overwrite '/usr/bin/nvidia-powerd', which is also in package nvidia-compute-utils-510 510.73.08-0ubuntu1

Looks like a packaging bug, two packages contain the same file.
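
For anyone who wants to search their own package manager log for the same kind of conflict, here is a small Python sketch that extracts the file path and the owning package from a "trying to overwrite" message. The names `LOG_LINE` and `find_file_conflicts` are hypothetical, and the pattern is only assumed to match the wording quoted above.

```python
import re

# The line quoted above from term.log.
LOG_LINE = (
    " trying to overwrite '/usr/bin/nvidia-powerd', which is also in package "
    "nvidia-compute-utils-510 510.73.08-0ubuntu1"
)

# Assumed wording: "trying to overwrite '<path>', which is also in package <name> <version>"
OVERWRITE = re.compile(
    r"trying to overwrite '(?P<path>[^']+)', "
    r"which is also in package (?P<package>\S+) (?P<version>\S+)"
)


def find_file_conflicts(log_text):
    """Return (path, owning package, owning version) for every overwrite conflict found."""
    return [(m["path"], m["package"], m["version"]) for m in OVERWRITE.finditer(log_text)]


if __name__ == "__main__":
    for path, package, version in find_file_conflicts(LOG_LINE):
        print(f"{path} is already owned by {package} {version}")
```

On Ubuntu the same function could be fed the contents of /var/log/apt/term.log, assuming that is the log attached here. Each hit names a file that two packages both try to install.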

maxzapletin · August 9, 2022, 8:14am

Could you please check the log files from some other computers as well?
nvidia-bug-report (1).log (252.6 KB)

maxzapletin · August 9, 2022, 8:18am

nvidia-bug-report.log (264.6 KB)

maxzapletin · August 9, 2022, 8:46am

Also, another server shows this error: watchdog: BUG: soft lockup - CPU#10 stuck for 52s! [irq/145-nvidia:802]
nvidia-bug-report.log (2.6 MB)

generix · August 9, 2022, 8:55am

The first two are missing the kernel modules; check the output of
dkms status
The third is crashing due to GPU errors. This might be due to overheating, or the GPU may be damaged.
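
As a rough illustration of what to look for in the dkms status output, the Python sketch below checks whether an nvidia module is listed as installed for a given kernel. The sample text is hypothetical (it is not taken from the attached logs), the two line layouts are assumptions about how different dkms versions print their status, and `installed_kernels` and `module_missing` are names invented for this example.

```python
import re

# Hypothetical sample, NOT from the attached logs. The two layouts
# ("nvidia, <version>, ..." and "nvidia/<version>, ...") are assumed
# to correspond to older and newer dkms releases.
SAMPLE_STATUS = """\
nvidia, 510.85.02, 5.15.0-41-generic, x86_64: installed
nvidia/510.85.02, 5.15.0-43-generic, x86_64: built
"""

STATUS_LINE = re.compile(
    r"^(?P<module>[^,/\s]+)[,/]\s*(?P<version>[^,\s]+),\s*"
    r"(?P<kernel>[^,\s]+),\s*(?P<arch>[^:\s]+):\s*(?P<state>\w+)"
)


def installed_kernels(status_text, module="nvidia"):
    """Return the kernels for which `module` is reported as installed."""
    kernels = set()
    for line in status_text.splitlines():
        m = STATUS_LINE.match(line.strip())
        if m and m["module"] == module and m["state"] == "installed":
            kernels.add(m["kernel"])
    return kernels


def module_missing(status_text, running_kernel, module="nvidia"):
    """True when `module` is not installed for `running_kernel`."""
    return running_kernel not in installed_kernels(status_text, module)


if __name__ == "__main__":
    for kernel in ("5.15.0-41-generic", "5.15.0-43-generic"):
        state = "missing" if module_missing(SAMPLE_STATUS, kernel) else "installed"
        print(f"nvidia module for {kernel}: {state}")
```

On a real machine the running kernel can be read with Python's platform.release() and compared against the actual dkms status text. A module that is only "built" or "added" for the running kernel, or not listed at all, would match the "missing kernel modules" diagnosis.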

maxzapletin · August 9, 2022, 9:05am

So, on the first two computers I installed the NVIDIA drivers with the run file, and that solved the problem.
But this issue happened when I ran a new AI algorithm on the GPU.

generix · August 9, 2022, 9:27am

Please use cuda gpu memtest to check the video memory
https://github.com/ComputationalRadiationPhysics/cuda_memtest

maxzapletin · August 10, 2022, 1:10pm

Hello again, the third one has probably crashed. nvidia-smi reports:
"Unable to determine the device handle for GPU 0000:01:00.0: Unknown Error"
nvidia-bug-report.log (1.2 MB)

maxzapletin · August 21, 2022, 7:47pm

Any suggestions? The server still doesn't work.

generix · August 23, 2022, 12:37pm

Of course not. Please check if the gpu works in another system, if not, replace.
