I have finally bought a computer with an NVIDIA graphics card, and I cannot figure out why the drivers are not working, even though the card is recognized by the OS and the recommended driver 525 is installed.
$ nvidia-smi
NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
ERROR: nvidia-settings could not find the registry key file or the X server is
not accessible. This file should have been installed along with this
driver at
/usr/share/nvidia/nvidia-application-profiles-key-documentation. The
application profiles will continue to work, but values cannot be
prepopulated or validated, and will not be listed in the help text.
Please see the README for possible values and descriptions.
I have already tried purging and reinstalling the drivers, and switching with prime-select.
If needed, here is the bug report: nvidia-bug-report.log (598.7 KB)
Is there anything I can do?
Thank you in advance for your help.
There are no kernel modules installed, and Secure Boot is enabled.
Please disable Secure Boot in the BIOS, then try reinstalling the kernel headers:
sudo apt install --reinstall linux-headers-$(uname -r)
afterwards, please post the output of
dkms status
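For anyone following along, the two conditions mentioned above (Secure Boot state and a missing kernel module) can be checked before and after the fix. This is only a minimal sketch: it assumes `mokutil` is available, as it usually is on Ubuntu, and the function names `secure_boot_state` and `nvidia_module_listed` are made up for illustration.

```shell
#!/bin/sh
# Sketch: report the Secure Boot state and whether the nvidia kernel module is loaded.

# Prints the firmware Secure Boot state, or a fallback message when mokutil is
# missing or cannot read the state (for example on a non-UEFI system).
secure_boot_state() {
    if command -v mokutil >/dev/null 2>&1; then
        mokutil --sb-state 2>/dev/null || echo "SecureBoot state unknown"
    else
        echo "mokutil not found"
    fi
}

# Reads lsmod-style text on stdin; succeeds only if a module named
# exactly "nvidia" appears in the first column.
nvidia_module_listed() {
    awk '$1 == "nvidia" { found = 1 } END { exit !found }'
}

secure_boot_state
if lsmod 2>/dev/null | nvidia_module_listed; then
    echo "nvidia kernel module is loaded"
else
    echo "nvidia kernel module is NOT loaded"
fi
```

If Secure Boot is reported as enabled and the module is not loaded, that matches the situation described in this thread.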
Thank you for your answer. I have reinstalled the kernel headers and disabled secure boot.
The output is:
$ dkms status
nvidia-srv/525.105.17, 5.19.0-40-generic, x86_64: installed
The outputs of nvidia-smi and nvidia-settings are normal now.
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:0B:00.0 Off |                  N/A |
| 33%   31C    P0    33W / 170W |      0MiB / 12288MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
However, I can only access the desktop environment through TeamViewer. The screen plugged into the HDMI port of the graphics card stays black at boot. The screen isn’t even detected by Ubuntu, which was not the case before.
Do you have any idea why? Should I downgrade to a previous version of the driver?
What worked for me on Ubuntu 22.04 with kernel 6.2.0 and CUDA 11.8 was:
Reinstalling the kernel headers:
sudo apt reinstall linux-headers-$(uname -r)
Manually installing nvidia-driver-525 with apt:
sudo apt install nvidia-driver-525
For some reason, driver version 520, which comes with the CUDA 11.8 toolkit, doesn’t work with the upgraded kernel.
Then, optionally (only for those who need the CUDA toolkit), manually installing the toolkit from the runfile instead of the Debian package.
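The two apt steps above can be sketched as a small script. This is a hedged illustration rather than a tested installer: the helper names `run` and `install_driver_steps` and the `RUN` environment variable are hypothetical, and by default the script only prints what it would do. The optional CUDA runfile step is left out because the post gives no file name for it.

```shell
#!/bin/sh
# Sketch of the steps above. Prints the commands by default; set RUN=1 in the
# environment to actually execute them (needs sudo, and a reboot afterwards).

# Runs the given command when RUN=1, otherwise only prints it.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

install_driver_steps() {
    kernel="$(uname -r)"
    # Step 1: reinstall the headers for the running kernel.
    run sudo apt reinstall "linux-headers-$kernel"
    # Step 2: install the 525 driver from the Ubuntu repositories.
    run sudo apt install nvidia-driver-525
}

install_driver_steps
```

Printing first makes it easy to confirm that the header package name matches the running kernel before anything is changed.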
How can we install NVIDIA drivers, CUDA packages, and cuDNN packages on an Azure NC A100 v4 VM running Ubuntu 22.04 with GPU capabilities? I have tried installing different versions on my VM, including nvidia-driver-535, nvidia-driver-550, nvidia-driver-535-server, etc., but each time I hit the same issue:
$ nvidia-smi
NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
I have read all the blogs about reinstalling kernels and disabling Secure Boot, and I have already taken care of all these steps. I look forward to some support and guidance from the NVIDIA team.
Nope, I remember clearly that I rebooted after disabling Secure Boot. Here is the blog that I followed. Is there any other way to disable Secure Boot in a VM?
Hi, I have a similar problem. I have disabled Secure Boot, but nvidia-smi isn’t working and the GPU is listed as “Nvidia corporation device”. nvidia-bug-report.log.gz (89.9 KB)
The kernel modules are missing. Please try reinstalling the kernel headers:
sudo apt install --reinstall linux-headers-$(uname -r)
Afterwards, please post the output of
dkms status
and
dpkg -l | grep nvidia
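Before posting, the `dkms status` output can also be checked locally to see whether an nvidia module was built for the running kernel. The sketch below assumes the output format shown earlier in this thread (`module/version, kernel, arch: status`). Older DKMS releases print the fields differently, so treat a negative result as a hint rather than proof. The function name `nvidia_dkms_installed_for` is hypothetical.

```shell
#!/bin/sh
# Sketch: check whether `dkms status` lists an nvidia module as installed
# for a given kernel. Assumes lines of the form
#   nvidia-srv/525.105.17, 5.19.0-40-generic, x86_64: installed

# Reads `dkms status` text on stdin; $1 is the kernel release to look for.
# Succeeds only if an nvidia* module is reported as installed for it.
nvidia_dkms_installed_for() {
    awk -F'[,:] *' -v k="$1" '
        $1 ~ /^nvidia/ && $2 == k && $4 ~ /^installed/ { ok = 1 }
        END { exit !ok }'
}

if dkms status 2>/dev/null | nvidia_dkms_installed_for "$(uname -r)"; then
    echo "nvidia DKMS module is installed for $(uname -r)"
else
    echo "no nvidia DKMS module found for $(uname -r)"
fi
```

If nothing is found for the running kernel, that is consistent with the missing kernel modules described above, and the header reinstall is the step to retry.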