Nvidia driver is not working on Ubuntu 22.04

Hello,

I have finally bought a computer with an NVIDIA graphics card, but I cannot figure out why the drivers are not working, even though the card is recognized by the OS and the recommended driver (525) is installed.

$ nvidia-smi
NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

$ uname -r
5.19.0-40-generic

$ nvidia-settings

ERROR: NVIDIA driver is not loaded

(nvidia-settings:5756): GLib-GObject-CRITICAL **: 23:57:54.696: g_object_unref: assertion ‘G_IS_OBJECT (object)’ failed

** (nvidia-settings:5756): CRITICAL **: 23:57:54.697: ctk_powermode_new: assertion ‘(ctrl_target != NULL) && (ctrl_target->h != NULL)’ failed

ERROR: nvidia-settings could not find the registry key file or the X server is
not accessible. This file should have been installed along with this
driver at
/usr/share/nvidia/nvidia-application-profiles-key-documentation. The
application profiles will continue to work, but values cannot be
prepopulated or validated, and will not be listed in the help text.
Please see the README for possible values and descriptions.

I have already tried purging and reinstalling the drivers and using prime-select.
If needed, here is the bug report:
nvidia-bug-report.log (598.7 KB)

Is there anything I can do?
Thank you in advance for your help.


There are no kernel modules installed, and secure boot is enabled.
Please disable secure boot in the BIOS, then try reinstalling the kernel headers
sudo apt install --reinstall linux-headers-$(uname -r)
afterwards, please post the output of
dkms status
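
To confirm both points before and after the change, something like the following may help. It is only a sketch: the helper names are made up for illustration, and it assumes `mokutil --sb-state` prints "SecureBoot enabled" or "SecureBoot disabled" and that the loaded driver shows up as `nvidia` in `lsmod`.

```shell
# Classify the text printed by `mokutil --sb-state` (read from stdin).
# Hypothetical helper; the two strings matched are assumed output formats.
secure_boot_state() {
    out=$(cat)
    case "$out" in
        *"SecureBoot enabled"*)  echo enabled ;;
        *"SecureBoot disabled"*) echo disabled ;;
        *)                       echo unknown ;;
    esac
}

# Succeed if lsmod-style text on stdin lists the core "nvidia" module.
nvidia_module_loaded() {
    awk '$1 == "nvidia" { found = 1 } END { exit (found ? 0 : 1) }'
}

# On the affected machine this would be used as:
#   mokutil --sb-state | secure_boot_state
#   lsmod | nvidia_module_loaded && echo "module loaded" || echo "module not loaded"
```

If the first command reports `enabled` and the second reports that the module is not loaded, that matches the situation described above.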


Thank you for your answer. I have reinstalled the kernel headers and disabled secure boot.
The output is:

$ dkms status
nvidia-srv/525.105.17, 5.19.0-40-generic, x86_64: installed
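
For anyone reading along, the point of this output is that the module is marked "installed" for the same kernel that `uname -r` reports. A small sketch of that check follows; the function name is hypothetical, and the parsing assumes the comma and colon separated layout shown above.

```shell
# Succeed if dkms-status text on stdin has an nvidia entry that is
# "installed" for the kernel release given as the first argument.
# Hypothetical helper; assumes the "module/version, kernel, arch: state" layout.
dkms_nvidia_installed_for() {
    awk -v k="$1" '
        /^nvidia/ && /: installed/ {
            n = split($0, f, /[,:] */)
            for (i = 1; i <= n; i++) if (f[i] == k) ok = 1
        }
        END { exit (ok ? 0 : 1) }'
}

# On a real system:
#   dkms status | dkms_nvidia_installed_for "$(uname -r)" \
#       && echo "module built for running kernel" \
#       || echo "no module for running kernel"
```

A mismatch here usually means the headers for the running kernel were missing when the driver package was installed.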

The outputs of nvidia-smi and nvidia-settings are normal now.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:0B:00.0 Off |                  N/A |
| 33%   31C    P0    33W / 170W |      0MiB / 12288MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

However, I can only access the desktop environment through TeamViewer. The screen plugged into the HDMI port of the graphics card stays black at boot. Ubuntu doesn’t even detect the screen, which was not the case before.
Do you have any idea why? Should I downgrade to a previous version of the driver?


It rather looks like the driver is loading too late. Please create a new nvidia-bug-report.log

Here it is:
nvidia-bug-report.log (3.2 MB)

Thank you again for your help!

You have installed a dummy driver. Please delete /usr/share/X11/xorg.conf.d/xorg.conf
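
If you would rather keep a copy than delete the file outright, a sketch like this moves it aside instead. The function name is made up, and it relies on the X server only reading files ending in .conf from xorg.conf.d, so the renamed file should be ignored.

```shell
# Move an X config file aside (to <file>.bak) instead of deleting it.
# Hypothetical helper; run with root privileges for files under /usr/share.
disable_xorg_conf() {
    conf=$1
    if [ -f "$conf" ]; then
        mv -- "$conf" "$conf.bak" && echo "moved $conf to $conf.bak"
    else
        echo "no file at $conf"
    fi
}

# Real usage would need root, for example:
#   sudo mv /usr/share/X11/xorg.conf.d/xorg.conf /usr/share/X11/xorg.conf.d/xorg.conf.bak
```

Reboot, or restart the display manager, afterwards so the X server starts without that file.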

What worked for me on Ubuntu 22.04 with kernel 6.2.0 and CUDA 11.8 was:
reinstalling the kernel headers using

sudo apt reinstall linux-headers-$(uname -r)

then manually installing nvidia-driver-525 with apt

sudo apt install nvidia-driver-525

For some reason, driver version 520, which comes with the CUDA 11.8 toolkit, doesn’t work with the newly upgraded kernel.
Then I manually installed the CUDA toolkit from the run file instead of the Debian package (this is optional, only for those who need the CUDA toolkit).


How can we install the NVIDIA drivers, CUDA packages, and cuDNN packages on an Azure NC A100 v4 GPU VM running Ubuntu 22.04? I have tried installing different versions on my VM, including nvidia-driver-535, nvidia-driver-550, nvidia-driver-535-server, etc., but each time I am facing this issue:
nvidia-smi
NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

I have read all the blog posts about reinstalling kernels and disabling secure boot, and I have already taken all of those steps. I look forward to getting some support and guidance from the NVIDIA team.

Please run nvidia-bug-report.sh as root and attach the resulting nvidia-bug-report.log.gz file to your post.

Sure, here is the file for your reference

The VM has secure boot enabled but you don’t have a signed driver. Please disable secure boot for the VM.

Secure Boot is already disabled. See this

Then you forgot to reboot. The kernel says
[ 0.016764] secureboot: Secure boot enabled
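
You can check this yourself after each reboot by looking for that line in the kernel log. A rough sketch follows; the function name is hypothetical, and the "disabled" wording is an assumption based on the "enabled" line above.

```shell
# Report the secure boot state recorded in kernel log text read from stdin.
# Hypothetical helper; only the "enabled" message is confirmed by the log above.
secure_boot_from_kernel_log() {
    line=$(grep -i 'secureboot:' | tail -n 1)
    case "$line" in
        *"Secure boot enabled"*)  echo enabled ;;
        *"Secure boot disabled"*) echo disabled ;;
        *)                        echo unknown ;;
    esac
}

# On the VM itself (reading the kernel log may need root):
#   sudo dmesg | secure_boot_from_kernel_log
```

As long as this still prints `enabled`, the setting changed in the portal has not taken effect for the running kernel.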

Nope, I clearly remember rebooting after disabling secure boot. Here is the blog that I followed. Is there any other way to disable secure boot in the VM?

What you disabled are boot diagnostics and snapshots. Those have nothing to do with secure boot, so you should turn them back on.
Secure boot:
https://learn.microsoft.com/en-us/azure/virtual-machines/trusted-launch-existing-vm?tabs=portal

Hi, I have a similar problem. I have disabled secure boot, but nvidia-smi isn’t working and the GPU gets listed as “Nvidia corporation device”.
nvidia-bug-report.log.gz (89.9 KB)

The kernel modules are missing. Please try reinstalling the kernel headers
sudo apt install --reinstall linux-headers-$(uname -r)
afterwards, please post the output of
dkms status
and
dpkg -l | grep nvidia

I reinstalled the kernel headers. dkms status doesn’t show anything, and dpkg gives the output I attached:
dpkg.txt (4.9 KB)

You have installed the no-dkms server driver, which won’t work. Please use Software & Updates to switch to a normal driver.
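
To spot this in the `dpkg -l` output yourself, a rough sketch like the one below can help. The function name and labels are made up, and it assumes Ubuntu's usual package naming (`nvidia-dkms-*` for DKMS builds, `linux-modules-nvidia-*` for pre-built modules, `*no-dkms*` for packages that ship no module source).

```shell
# Classify `dpkg -l`-style text on stdin by how the NVIDIA kernel module
# is supposed to be provided. Hypothetical helper; package name patterns
# are assumptions about Ubuntu's naming scheme.
classify_nvidia_packages() {
    awk '
        $1 == "ii" && $2 ~ /no-dkms/                { nodkms = 1 }
        $1 == "ii" && $2 ~ /^nvidia-dkms-/          { dkms = 1 }
        $1 == "ii" && $2 ~ /^linux-modules-nvidia-/ { prebuilt = 1 }
        END {
            if (dkms)          print "dkms"
            else if (prebuilt) print "prebuilt-modules"
            else if (nodkms)   print "no-dkms-without-modules"
            else               print "none"
        }'
}

# On a real system:
#   dpkg -l | classify_nvidia_packages
```

A result of `no-dkms-without-modules` would correspond to the case here: a no-dkms package is installed, but nothing provides the kernel module, which fits the empty `dkms status` output.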


Thanks, that helped