I got the error “NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.”
When I checked the drivers with ~$ ubuntu-drivers list, the result was as follows.
nvidia-driver-535-open, (kernel modules provided by linux-modules-nvidia-535-open-oem-22.04c)
nvidia-driver-525-server, (kernel modules provided by nvidia-dkms-525-server)
nvidia-driver-525-open, (kernel modules provided by linux-modules-nvidia-525-open-oem-22.04c)
nvidia-driver-525, (kernel modules provided by linux-modules-nvidia-525-oem-22.04c)
nvidia-driver-535-server, (kernel modules provided by nvidia-dkms-535-server)
nvidia-driver-535, (kernel modules provided by linux-modules-nvidia-535-oem-22.04c)
nvidia-driver-535-server-open, (kernel modules provided by nvidia-dkms-535-server-open)
When I checked the current compilation tools with ~$ nvcc -V, the result was as follows.
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
My current setup is Ubuntu 22.04.3 LTS with this GPU: VGA compatible controller: NVIDIA Corporation Device 27bb (rev a1)
Could you tell me how to get “nvidia-smi” to work?
Hello @user36321 and welcome to the NVIDIA developer forums.
Please run nvidia-bug-report.sh and attach the output to this thread.
Thank you for the reply.
When I ran “sudo nvidia-bug-report.sh”, the output was as follows.
nvidia-bug-report.sh will now collect information about your
system and create the file ‘nvidia-bug-report.log.gz’ in the current
directory. It may take several seconds to run. In some
cases, it may hang trying to capture data generated dynamically
by the Linux kernel and/or the NVIDIA kernel module. While
the bug report log file will be incomplete if this happens, it
may still contain enough data to diagnose your problem.
If nvidia-bug-report.sh hangs, consider running with the --safe-mode
and --extra-system-data command line arguments.
Please include the ‘nvidia-bug-report.log.gz’ log file when reporting
your bug via the NVIDIA Linux forum (see forums.developer.nvidia.com)
or by sending email to ‘email@example.com’.
By delivering ‘nvidia-bug-report.log.gz’ to NVIDIA, you acknowledge
and agree that personal information may inadvertently be included in
the output. Notwithstanding the foregoing, NVIDIA will use the
output only for the purpose of investigating your reported issue.
Running nvidia-bug-report.sh… complete.
Can you find anything in this that helps solve the problem?
nvidia-bug-report.log.gz (134.8 KB)
Sorry for my misunderstanding.
I have attached the output file.
Can you tell anything from this?
Have a look at the output of :
Aug 22 10:35:10 taihi-Precision-5680 kernel: [ 3.899959] nvidia-nvlink: Nvlink Core is being initialized, major device number 505
Aug 22 10:35:10 taihi-Precision-5680 kernel: [ 3.951518] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 525.105.17 Tue Mar 28 18:02:59 UTC 2023
Aug 22 10:35:10 taihi-Precision-5680 kernel: [ 3.977321] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 525.105.17 Tue Mar 28 22:18:37 UTC 2023
Aug 22 10:35:10 taihi-Precision-5680 kernel: [ 4.003195] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
Aug 22 10:35:11 taihi-Precision-5680 kernel: [ 4.947435] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 1
Aug 22 10:35:11 taihi-Precision-5680 kernel: [ 5.149596] nvidia-uvm: Loaded the UVM driver, major device number 503.
Aug 22 10:44:54 taihi-Precision-5680 kernel: [ 588.333187] NVRM: API mismatch: the client has the version 525.125.06, but
Aug 22 10:44:54 taihi-Precision-5680 kernel: [ 588.333187] NVRM: this kernel module has the version 525.105.17. Please
Aug 22 10:44:54 taihi-Precision-5680 kernel: [ 588.333187] NVRM: make sure that this kernel module and all NVIDIA driver
Aug 22 10:44:54 taihi-Precision-5680 kernel: [ 588.333187] NVRM: components have the same version.
Aug 22 10:58:46 taihi-Precision-5680 kernel: [ 5.127135] nvidia-nvlink: Nvlink Core is being initialized, major device number 505
Aug 22 10:58:46 taihi-Precision-5680 kernel: [ 5.176767] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 535.86.05 Fri Jul 14 20:46:33 UTC 2023
Aug 22 10:58:46 taihi-Precision-5680 kernel: [ 5.197558] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 535.86.05 Fri Jul 14 20:20:58 UTC 2023
Aug 22 10:58:46 taihi-Precision-5680 kernel: [ 5.230160] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
Aug 22 10:58:47 taihi-Precision-5680 kernel: [ 6.144563] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 1
Aug 22 10:58:47 taihi-Precision-5680 kernel: [ 6.454275] nvidia-uvm: Loaded the UVM driver, major device number 503.
Aug 22 10:58:49 taihi-Precision-5680 kernel: [ 7.624670] NVRM: API mismatch: the client has the version 535.98, but
Aug 22 10:58:49 taihi-Precision-5680 kernel: [ 7.624670] NVRM: this kernel module has the version 535.86.05. Please
Aug 22 10:58:49 taihi-Precision-5680 kernel: [ 7.624670] NVRM: make sure that this kernel module and all NVIDIA driver
Aug 22 10:58:49 taihi-Precision-5680 kernel: [ 7.624670] NVRM: components have the same version.
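The key detail in the log above is the pair of NVRM “API mismatch” messages: the user-space client and the loaded kernel module report different driver versions. As a minimal sketch, the two version numbers can be pulled out of a kernel log like this. The function name nvrm_mismatch is hypothetical and is not part of any NVIDIA tool; it only assumes the message wording shown in the log above.

```shell
# Hypothetical helper (not part of any NVIDIA tool): reads kernel log text
# on stdin and prints one "client=<version> module=<version>" line for each
# NVRM "API mismatch" message it finds.
nvrm_mismatch() {
    sed -n \
        -e 's/.*NVRM: API mismatch: the client has the version \([0-9.]*\), but.*/client=\1/p' \
        -e 's/.*NVRM: this kernel module has the version \([0-9.]*\)\. Please.*/module=\1/p' |
        paste -d ' ' - -
}

# Demo on two lines copied from the log above.
nvrm_mismatch <<'EOF'
Aug 22 10:58:49 taihi-Precision-5680 kernel: [ 7.624670] NVRM: API mismatch: the client has the version 535.98, but
Aug 22 10:58:49 taihi-Precision-5680 kernel: [ 7.624670] NVRM: this kernel module has the version 535.86.05. Please
EOF
# prints: client=535.98 module=535.86.05
```

On a live system the same function could be fed from the kernel log, for example with sudo dmesg | nvrm_mismatch or journalctl -k | nvrm_mismatch. No output would suggest that no mismatch has been logged.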
You have 4 different NVIDIA drivers installed. I think you should fix that first.
Please purge ALL NVIDIA drivers cleanly. That means reboot into console mode, unload all NVIDIA related kernel modules, then remove ALL NVIDIA related driver packages. The README file of the downloadable driver packages has detailed instructions and a list of files to look for.
Then reboot again, again into console mode.
Then install ONE correct driver for your system. Either install the recommended proprietary NVIDIA driver through the Ubuntu Software Center’s Third Party application tab, or download it as a .run file directly from Official Drivers | NVIDIA; it should be 535.104.05 as we speak.
During installation make sure to follow instructions exactly, especially if you have secure boot enabled and need to authenticate the driver.
I hope this will help you resolve your issues.
Sorry for the late reply. Is “sudo apt-get purge nvidia-*” the command to clean the Nvidia driver?
After running this, when I run “cat /proc/driver/nvidia/version”, it returns that no such file or directory exists.
Does this mean the driver has been cleanly removed?
Sorry for the amateur question.
No worries, and definitely not an amateur question.
First make sure you are not using any driver modules. The Linux installation guide has a paragraph “Before you begin” which you should follow, even for the purge.
Then make sure no NVIDIA modules are loaded anymore; use lsmod | grep nvidia to check that. If there are still modules loaded, unload them with modprobe -r or an equivalent command.
After that, run the purge command you mentioned.
I hope that helps!
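The module check described above can be sketched as a small filter over lsmod output. The function name loaded_nvidia_modules is hypothetical, and the sample listing in the demo is illustrative only (the sizes and use counts are made up, not taken from this system).

```shell
# Hypothetical helper: reads `lsmod` output on stdin and prints the names of
# loaded modules whose name starts with "nvidia". The first line of lsmod
# output is a column header, so it is skipped.
loaded_nvidia_modules() {
    awk 'NR > 1 && $1 ~ /^nvidia/ { print $1 }'
}

# Demo on an illustrative listing (values are invented for the example).
loaded_nvidia_modules <<'EOF'
Module                  Size  Used by
nvidia_uvm           1363968  0
nvidia_drm             69632  4
nvidia_modeset       1241088  2 nvidia_drm
nvidia              56475648  89 nvidia_uvm,nvidia_modeset
video                  65536  1 nvidia_modeset
EOF
# prints nvidia_uvm, nvidia_drm, nvidia_modeset and nvidia, one per line
```

On a real system this would be run as lsmod | loaded_nvidia_modules. Empty output matches the "no response" case from lsmod | grep nvidia, meaning nothing needs to be unloaded. The "video" line is not reported even though it mentions nvidia_modeset, because only the module name column is checked.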
When I run “lsmod | grep nvidia”, there is no response, so I understand that the NVIDIA module is not loaded.
“sudo apt update”
“sudo apt install nvidia-driver-535”
Should I avoid installing the driver with these commands?
After running “sudo apt update”, it says that there are 36 packages that can be upgraded. Will it be a problem if I leave it as is?
I saw a post that says it’s better not to run .run files, but is that a problem?
In fact, I tried running the downloaded .run file using “sudo sh”, but an error message popped up asking me to stop the x server before installation, so I couldn’t complete it. It would be helpful if you could tell me the installation steps.
You can use the sudo apt version to install the driver. Either that or the installation through the Third-Party Apps tab of the Software Center in Ubuntu. Both will install the driver package included as part of the distribution, which should work without issues.
If you use the software center option, make sure to use the proprietary driver and NOT Open Source kernel modules.
If you use the sudo apt version, you need to be in terminal console mode only; otherwise you will likely get the same X server error message.
After installation, make sure to reboot! But I already mentioned this on Aug 29th.
・When I ran “ubuntu-drivers devices”, the following was returned.
It seems that the extra drivers have not been deleted yet. Could you please tell me the correct way to delete them?
== /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 ==
vendor: NVIDIA Corporation
driver : nvidia-driver-525-open - third-party non-free
driver : nvidia-driver-525 - third-party non-free
driver : nvidia-driver-535 - third-party non-free recommended
driver : nvidia-driver-535-open - third-party non-free
driver : xserver-xorg-video-nouveau - distro free builtin
== /sys/devices/pci0000:00/0000:00:14.3 ==
vendor : Intel Corporation
driver : oem-fix-misc-cnl-backport-iwlwifi-helper - third-party free
== /sys/devices/pci0000:00/0000:00:1f.4 ==
modalias : pci:v00008086d000051A3sv00001028sd00000C11bc0Csc05i00
vendor : Intel Corporation
driver : oem-somerville-muk-meta - third-party free
== /sys/devices/pci0000:00/0000:00:05.0 ==
vendor : Intel Corporation
driver : libcamhal-ipu6ep0 - third-party free
・“You can use the sudo apt version to install the driver.”
→Does this refer to the command “sudo apt install nvidia-driver-535”?
・“If you use the software center option, make sure to use the proprietary driver and NOT Open Source kernel modules.”
→I don’t understand this part, so could you please explain it in more detail?
・Is terminal console mode a screen that can be opened with Ctrl+Alt+F4?
No, the command ubuntu-drivers devices only lists the drivers that are available as part of the Ubuntu distribution and that can be installed using the Ubuntu package manager. It does not tell you which driver version is actually installed. If nvidia-smi works as expected, it will show the currently installed driver version.
If you installed the driver through the Ubuntu package manager, you should be able to check with sudo apt list --installed to see which driver version was installed.
If you used a different installation option, then one way to check that only one driver version is present is to look in /usr/lib/x86_64-linux-gnu and check whether the files and symbolic links with libnvidia in their names contain only one version number. For example, on one of my systems it looks like this:
lrwxrwxrwx 1 root root 26 libnvidia-cfg.so.1 -> libnvidia-cfg.so.525.85.05
lrwxrwxrwx 1 root root 26 libnvidia-cfg.so.525.85.05
Here, 525.85.05 is the only file version present.
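As a rough sketch of that library check, the distinct version suffixes can be listed with a short function. The name libnvidia_versions is hypothetical, and the pattern assumes the libnvidia-<name>.so.<version> naming shown in the listing above.

```shell
# Hypothetical helper: prints the distinct version suffixes found on
# libnvidia-*.so.<version> files and symlinks in a directory
# (default: /usr/lib/x86_64-linux-gnu). Plain sonames such as ".so.1" are
# ignored because only suffixes containing a dot (e.g. 525.85.05) are kept.
libnvidia_versions() {
    dir=${1:-/usr/lib/x86_64-linux-gnu}
    for f in "$dir"/libnvidia-*.so.*; do
        # Skip the unexpanded pattern when nothing matches; keep symlinks.
        [ -e "$f" ] || [ -L "$f" ] || continue
        printf '%s\n' "${f##*/}"
    done |
        sed -n 's/^libnvidia-.*\.so\.\([0-9][0-9]*\.[0-9][0-9.]*\)$/\1/p' |
        sort -u
}

# Usage (on the system being checked):
#   libnvidia_versions
#   libnvidia_versions /some/other/lib/dir
```

A single line of output would suggest that only one driver version is present in that directory. More than one line would point to leftover files from another version.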
Yes. But the recommended way is through “Software & Updates - Additional Drivers”.
There are driver packages with open in their names. Do NOT install those. Use the one marked “proprietary, tested”.
Yes and no. It will look the same, but if you simply switch from the window manager to the console window the graphics driver is still loaded and cannot easily be replaced without issues.
You need to reboot directly into this terminal-only, text-only mode, not into graphical mode, so that no graphics drivers are loaded. You can find lots of instructions on that topic online.
Sorry for the late reply.
I tried to install nvidia-driver-535 through Software & Updates as shown in the attachment, but I am still not able to run “nvidia-smi”.
In console mode, I ran “sudo apt-get purge nvidia-*”, but the result was as follows.
Loading package list… Done
Creating dependency tree… Done
Reading status information… Done
E: Package nvidia-bug-report.log not found
E: No matching packages found for ‘nvidia-bug-report.log’
E: No packages found with regular expression ‘nvidia-bug-report.log’
E: Package nvidia-bug-report.log.gz not found
E: No matching packages found for ‘nvidia-bug-report.log.gz’
E: No packages were found with regular expression ‘nvidia-bug-report.log.gz’
Is this the wrong way to purge the NVIDIA driver?
I am lost. The apt-get command will check the known package names for any that match the expression nvidia-*, but it will never match the name of a local file. Are you sure you ran the command as you wrote it, and did not accidentally list the local file names? You could also try this instead:
sudo apt-get remove --purge '^nvidia-.*'
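One likely explanation for the “Package nvidia-bug-report.log not found” errors, offered as a reading of the output and not something confirmed in this thread: when nvidia-* is typed without quotes, the shell itself expands it to any matching file names in the current directory before apt-get ever sees it. The sketch below reproduces that behaviour in a scratch directory; the function name glob_demo is hypothetical, and echo stands in for apt-get so nothing is installed or removed.

```shell
# Illustration only: shows how an unquoted pattern is expanded by the shell
# when matching files exist in the current directory. The function body is
# a subshell, so the directory change does not affect the caller.
glob_demo() (
    dir=$(mktemp -d) || exit 1
    cd "$dir" || exit 1
    touch nvidia-bug-report.log nvidia-bug-report.log.gz

    echo nvidia-*       # unquoted: replaced by the matching file names
    echo 'nvidia-*'     # quoted: the pattern is passed through unchanged

    cd / && rm -rf "$dir"
)

glob_demo
# prints:
#   nvidia-bug-report.log nvidia-bug-report.log.gz
#   nvidia-*
```

Quoting the pattern, as in the command above, or running the purge from a directory without matching files avoids this expansion.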
When I ran “sudo apt-get remove --purge '^nvidia-.*'”, I didn’t see any error messages, so I think the purge went well. However, I saw the following message after rebooting.
Command 'nvidia-smi' not found, but can be installed with:
sudo apt install nvidia-utils-390 # version 390.157-0ubuntu0.22.04.1, or
sudo apt install nvidia-utils-418-server # version 418.226.00-0ubuntu5~0.22.04.1
sudo apt install nvidia-utils-450-server # version 450.236.01-0ubuntu0.22.04.1
sudo apt install nvidia-utils-470 # version 470.182.03-0ubuntu0.22.04.1
sudo apt install nvidia-utils-470-server # version 470.182.03-0ubuntu0.22.04.1
sudo apt install nvidia-utils-510 # version 510.108.03-0ubuntu0.22.04.1
sudo apt install nvidia-utils-515 # version 515.105.01-0ubuntu0.22.04.1
sudo apt install nvidia-utils-515-server # version 515.105.01-0ubuntu0.22.04.1
sudo apt install nvidia-utils-525 # version 525.105.17-0ubuntu0.22.04.1
sudo apt install nvidia-utils-525-server # version 525.105.17-0ubuntu0.22.04.1
sudo apt install nvidia-utils-530 # version 530.41.03-0ubuntu0.22.04.2
sudo apt install nvidia-utils-510-server # version 510.47.03-0ubuntu3
sudo apt install nvidia-340 # version 340.108-0ubuntu2
sudo apt install nvidia-utils-435 # version 435.21-0ubuntu7
sudo apt install nvidia-utils-440 # version 440.82+really.440.64-0ubuntu6
After that, I tried again to install nvidia-driver-535 through Software & Updates by the same method as before, but I saw the same message again, as follows.
NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
Is my procedure wrong?