Unable to install NVIDIA drivers for 3090 on Ubuntu 20.04


I have an RTX 3090 installed in a machine running Ubuntu 20.04. I have been trying to install the NVIDIA drivers (both manually, using the .run file, and through “Software and Updates”), but cannot get them to work.

My kernel version is 5.15.0-53-generic.
I have tried the 520, 515 and 510 versions (open-kernel and metapackage variants), as well as the 515.76 .run file from the official drivers website.
With the 515 and 520 open-kernel versions, nvidia-smi outputs this:

NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

With the other, non-open-kernel versions, the system boots to a black screen.

I have tried several things, including the solution from the thread “Cannot get nvidia driver (520, 515, 515-open, or 510) working in Ubuntu 22.10”, but to no avail.

I have blacklisted Nouveau and followed all the other steps mentioned in that thread, but none of them worked. (I have not yet disabled Secure Boot, though.)
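
Since I have not disabled Secure Boot, its current state may be worth confirming before changing anything else. Below is a minimal Python sketch, assuming the machine boots via UEFI and exposes the standard `SecureBoot` EFI variable through efivarfs. The helper names `secure_boot_enabled` and `read_secure_boot_state` are hypothetical, made up for illustration.

```python
from pathlib import Path
from typing import Optional

# Standard EFI global-variable GUID; the path assumes a UEFI boot with efivarfs mounted.
SECURE_BOOT_VAR = Path(
    "/sys/firmware/efi/efivars/SecureBoot-8be4df61-93ca-11d2-aa0d-00e098032b8c"
)


def secure_boot_enabled(raw: bytes) -> bool:
    """Decode an efivarfs SecureBoot value: 4 attribute bytes, then 1 data byte."""
    if len(raw) < 5:
        raise ValueError("expected at least 5 bytes from efivarfs")
    return raw[4] == 1


def read_secure_boot_state(path: Path = SECURE_BOOT_VAR) -> Optional[bool]:
    """Return True/False, or None if the variable cannot be read (e.g. legacy BIOS boot)."""
    try:
        return secure_boot_enabled(path.read_bytes())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    print("Secure Boot enabled:", read_secure_boot_state())
```

Running `mokutil --sb-state` should report the same information from the command line.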

What process should I follow to install NVIDIA driver version 515 or above on my system?

Here is some additional information.

Currently, I have the 515 open-kernel drivers installed and the system boots correctly, but running nvidia-smi gives the error below. The same happens with the 520 open-kernel version.

Unable to determine the device handle for GPU 0000:01:00.0: Not Found

There is no /etc/modprobe.d/nvidia-graphics-drivers-kms.conf on my system.

These are the contents of /lib/modprobe.d/nvidia-kms.conf:

# This file was generated by nvidia-prime
# Set value to 1 to enable modesetting
options nvidia-drm modeset=1
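
To double-check which module options a file like this actually sets, here is a small Python sketch that parses `options` lines. It assumes the two descriptive lines in the file are `#` comments, and the function name `parse_modprobe_options` is hypothetical. It only reads text and does not query the loaded module.

```python
from typing import Dict

# Sample text mirroring the file above (comment markers are assumed).
SAMPLE_CONF = """\
# This file was generated by nvidia-prime
# Set value to 1 to enable modesetting
options nvidia-drm modeset=1
"""


def parse_modprobe_options(text: str) -> Dict[str, Dict[str, str]]:
    """Collect 'options <module> key=value ...' lines into a nested dict."""
    options: Dict[str, Dict[str, str]] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        fields = line.split()
        if len(fields) < 3 or fields[0] != "options":
            continue  # not an options directive
        module = fields[1]
        for item in fields[2:]:
            key, _, value = item.partition("=")
            options.setdefault(module, {})[key] = value
    return options


print(parse_modprobe_options(SAMPLE_CONF))
```

The value the running module actually uses can be compared against `/sys/module/nvidia_drm/parameters/modeset`, if that file exists on the system.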

The output of nvidia-settings:

ERROR: A query to find an object was unsuccessful
ERROR: Unable to load info from any available system
(nvidia-settings:9774): GLib-GObject-CRITICAL **: 10:06:17.428: g_object_unref: assertion ‘G_IS_OBJECT (object)’ failed
** Message: 10:06:17.430: PRIME: Requires offloading
** Message: 10:06:17.430: PRIME: is it supported? yes
** Message: 10:06:17.447: PRIME: Usage: /usr/bin/prime-select nvidia|intel|on-demand|query
** Message: 10:06:17.447: PRIME: on-demand mode: “1”
** Message: 10:06:17.447: PRIME: is “on-demand” mode supported? yes

The output of dmesg | grep nvidia:

[    1.759777] nvidia: loading out-of-tree module taints kernel.
[    1.762247] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[    1.804698] nvidia-nvlink: Nvlink Core is being initialized, major device number 510
[    1.805227] nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[    1.862602] nvidia-modeset: Loading NVIDIA UNIX Open Kernel Mode Setting Driver for x86_64  515.65.01  Release Build  (dvs-builder@U16-T11-05-2)  Wed Jul 20 13:43:59 UTC 2022
[    1.946579] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[    4.143063] NVRM: Open nvidia.ko is only ready for use on Data Center GPUs.
[    4.143070] NVRM: To force use of Open nvidia.ko on other GPUs, see the
[    4.424820] [drm:nv_drm_load [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NvKmsKapiDevice
[    4.424912] [drm:nv_drm_probe_devices [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to register device
[    4.442684] audit: type=1400 audit(1670014530.519:5): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=897 comm="apparmor_parser"
[    4.442688] audit: type=1400 audit(1670014530.519:6): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=897 comm="apparmor_parser"
[    4.503574] nvidia-uvm: Loaded the UVM driver, major device number 507.
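
To make the relevant lines in this log easier to spot, here is a rough Python sketch that scans dmesg text for three messages that appear above. The labels and the function name `scan_dmesg` are my own (hypothetical), and the pattern list is only an assumption about which lines matter. It is not an official diagnostic.

```python
from typing import List

# Substrings taken from the log above, each mapped to a made-up label.
PATTERNS = [
    ("module verification failed", "unsigned-module"),
    ("Open nvidia.ko is only ready for use on Data Center GPUs", "open-module-unsupported-gpu"),
    ("Failed to allocate NvKmsKapiDevice", "drm-device-init-failed"),
]

SAMPLE_DMESG = """\
[    1.762247] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[    4.143063] NVRM: Open nvidia.ko is only ready for use on Data Center GPUs.
[    4.424820] [drm:nv_drm_load [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NvKmsKapiDevice
[    4.503574] nvidia-uvm: Loaded the UVM driver, major device number 507.
"""


def scan_dmesg(text: str) -> List[str]:
    """Return labels for known messages, in order of first appearance."""
    found: List[str] = []
    for line in text.splitlines():
        for needle, label in PATTERNS:
            if needle in line and label not in found:
                found.append(label)
    return found


print(scan_dmesg(SAMPLE_DMESG))
```

On the log above, the second label is the one that stands out to me: the open kernel module itself reports that it is only ready for data center GPUs, and the 3090 is a GeForce card.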

Any help would be appreciated!

I am attaching the nvidia-bug-report and nvidia-installer logs here:
nvidia-installer.log (42.5 KB)
nvidia-bug-report.log (3.9 MB)

I recently purged my system and tried to reinstall the drivers. This is the procedure I followed:

  1. Purge everything NVIDIA-related using apt-get
  2. Uninstall the Nouveau drivers
  3. Install nvidia-driver-525 using apt

I rebooted the machine and the GUI seems to be broken. This is what I can see on boot:

(Please note that this also happened with the 520 and 510 non-open-kernel versions.)

SSH is enabled on the machine, so I can log in remotely and use the terminal.
nvidia-smi runs properly; this is the output:

| NVIDIA-SMI 525.60.11    Driver Version: 525.60.11    CUDA Version: 12.0     |
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  Off |
|  0%   39C    P8    23W / 450W |      1MiB / 24564MiB |      0%      Default |
|                               |                      |                  N/A |

| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|  No running processes found                                                 |

Unfortunately, any operation I run on the GPU leaves the machine hanging, whether I run it on the base system or inside an NGC Docker container (even though torch, tensorflow, etc. report that CUDA and the GPU are available).
For example, a simple torch.rand(1).to("cuda") runs indefinitely.
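
To reproduce the hang without tying up the SSH session, the test can be run in a child process with a timeout. This is a minimal Python sketch. The function name `run_with_timeout` and the `GPU_PROBE` string are hypothetical, and it assumes the stuck process can still be killed, which may not hold if it is blocked inside the driver.

```python
import subprocess
import sys
from typing import Optional, Tuple

# The one-liner from above, wrapped so it can run in a separate interpreter.
GPU_PROBE = 'import torch; print(torch.rand(1).to("cuda"))'


def run_with_timeout(code: str, timeout_s: float) -> Optional[Tuple[int, str]]:
    """Run Python code in a child process; return (exit code, stdout) or None on timeout."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,  # the child is killed if it exceeds this
        )
    except subprocess.TimeoutExpired:
        return None
    return proc.returncode, proc.stdout.strip()


# Example call (not executed here): run_with_timeout(GPU_PROBE, 60)
```

A return value of `None` would mean the probe did not finish within the limit, which is the behaviour I am seeing.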

I am attaching the nvidia-bug-report and installer logs here:
nvidia-bug-report.log (2.3 MB)
nvidia-installer.log (42.5 KB)

Thank you!