Unable to load Nvidia Driver for Ubuntu 20.04 LTS

$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
$ ubuntu-drivers devices
== /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 ==
modalias : pci:v000010DEd000024B8sv00001028sd00000A69bc03sc00i00
vendor   : NVIDIA Corporation
driver   : nvidia-driver-515-open - distro non-free recommended
driver   : nvidia-driver-510-server - distro non-free
driver   : nvidia-driver-470-server - distro non-free
driver   : nvidia-driver-460 - third-party non-free
driver   : nvidia-driver-495 - third-party non-free
driver   : nvidia-driver-470 - third-party non-free
driver   : nvidia-driver-515 - third-party non-free
driver   : nvidia-driver-515-server - distro non-free
driver   : nvidia-driver-510 - third-party non-free
driver   : nvidia-driver-520 - third-party non-free
driver   : xserver-xorg-video-nouveau - distro free builtin

== /sys/devices/pci0000:00/0000:00:1f.4 ==
modalias : pci:v00008086d000043A3sv00001028sd00000A69bc0Csc05i00
vendor   : Intel Corporation
driver   : oem-somerville-blastoise-meta - third-party free

== /sys/devices/virtual/dmi/id ==
modalias : dmi:bvnDellInc.:bvr1.15.2:bd09/08/2022:br1.15:svnDellInc.:pnPrecision7560:pvr:rvnDellInc.:rn01C06K:rvrA01:cvnDellInc.:ct10:cvr:sku0A69:
driver   : oem-somerville-meta - third-party free
driver   : oem-release - third-party free
$ systemctl status nvidia-persistenced
● nvidia-persistenced.service - NVIDIA Persistence Daemon
     Loaded: loaded (/lib/systemd/system/nvidia-persistenced.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Fri 2022-10-14 09:19:21 CEST; 5min ago
    Process: 2153 ExecStart=/usr/bin/nvidia-persistenced --verbose (code=exited, status=1/FAILURE)
    Process: 2435 ExecStopPost=/bin/rm -rf /var/run/nvidia-persistenced/* (code=exited, status=0/SUCCESS)

From what I found on the internet, I have tried the following:

  1. Uninstalled and reinstalled the driver on multiple kernels → did not work
     5.11.20-051120-generic  # Whole system had issues
     5.13.19-051319-generic
     5.15.25-051525-generic
     5.17.0-051700-generic
  2. Disabled Secure Boot and reinstalled → did not work
  3. Upgraded the BIOS to the latest version → did not work
  4. sudo prime-select nvidia → did not work

Please find the nvidia-bug-report attached. Any assistance is highly appreciated.
Thanks in advance!
nvidia-bug-report.log.gz (205.8 KB)

You’re running a non-standard kernel with missing or incompatible headers so the driver doesn’t compile. Please return to the standard 5.15 kernel.

Like I said, I used 5.15 too…

That’s not the stock kernel. The stock kernel would look like 5.15.0-50-generic. To install it:
sudo apt install --install-recommends linux-generic-hwe-20.04
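
To make the naming difference easier to spot, here is a rough sketch that classifies a kernel release string. It is only a heuristic based on the names in this thread: the function name kernel_flavor and both patterns are my own assumptions, not an official Ubuntu check.

```shell
# Heuristic sketch: guess whether a kernel release string is an Ubuntu
# archive ("stock") build or a mainline PPA build.
# kernel_flavor is a hypothetical helper name; the patterns are assumptions
# derived from the release strings quoted in this thread.
kernel_flavor() {
    # Mainline builds here carry a zero-padded six-digit field,
    # e.g. 5.15.25-051525-generic.
    if printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+-[0-9]{6}-[a-z]+$'; then
        echo mainline
    # Ubuntu archive kernels use a short ABI number, e.g. 5.15.0-50-generic.
    elif printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+\.0-[0-9]{1,4}-[a-z]+$'; then
        echo stock
    else
        echo unknown
    fi
}
```

Running kernel_flavor "$(uname -r)" on the affected machine should then show which kind of kernel is currently booted.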

Hi there, even after installing the above kernel, I still have the same issue.
Please find the bug report attached.
nvidia-bug-report.log.gz (428.2 KB)

Hi there, could you please advise on the next steps?

The driver is installed and working fine now; the NVIDIA GPU is currently in on-demand mode.
prime-select should also work now.

You are right. It is weird: last I checked, I was still getting the same error. Now I can see the proper output. Anyway, thank you very much for your help!

Hi, I have a similar issue on 18.04.6 LTS (Bionic Beaver). The problem surfaced a few days ago when a user upgraded GCC; now the NVIDIA driver does not load.

This is a Lambda Labs workstation, and I have tried uninstalling and reinstalling the Lambda Stack for deep learning, which is supposed to cover the NVIDIA drivers along with the deep learning modules such as CUDA, TensorFlow, and PyTorch.

I have been researching this issue for a couple of days, and any help would be greatly appreciated. I have checked that Secure Boot is disabled and that nvidia is not blacklisted in modprobe.d, and I have tried suggestions from other posts, but nothing has worked so far. As far as I can tell, the compiled driver is 515.65.01 and GCC is 9.4.0:

NVRM version: NVIDIA UNIX x86_64 Kernel Module 515.65.01 Wed Jul 20 14:00:58 UTC 2022
GCC version: gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~18.04)

Here is the output from /var/log/gpu-manager.log

log_file: /var/log/gpu-manager.log
last_boot_file: /var/lib/ubuntu-drivers-common/last_gfx_boot
new_boot_file: /var/lib/ubuntu-drivers-common/last_gfx_boot
can’t access /opt/amdgpu-pro/bin/amdgpu-pro-px
Looking for nvidia modules in /lib/modules/5.0.0-37-generic/updates/dkms
Found nvidia module: nvidia.ko
Looking for amdgpu modules in /lib/modules/5.0.0-37-generic/updates/dkms
Is nvidia loaded? yes
Was nvidia unloaded? no
Is nvidia blacklisted? no
Is intel loaded? no
Is radeon loaded? no
Is radeon blacklisted? no
Is amdgpu loaded? no
Is amdgpu blacklisted? no
Is amdgpu versioned? no
Is amdgpu pro stack? no
Is nouveau loaded? no
Is nouveau blacklisted? yes
Is nvidia kernel module available? yes
Is amdgpu kernel module available? no
Vendor/Device Id: 10de:1e04
BusID “PCI:104@0:0:0”
Is boot vga? yes
Vendor/Device Id: 10de:1e04
BusID “PCI:26@0:0:0”
Is boot vga? no
can’t access /etc/u-d-c-nvidia-runtimepm-override file
Found json file: /usr/share/doc/nvidia-driver-495-server/supported-gpus.json
File /usr/share/doc/nvidia-driver-495-server/supported-gpus.json not found
Is nvidia runtime pm supported for “0x1e04”? yes
Trying to create new file: /run/nvidia_runtimepm_supported
Checking power status in /proc/driver/nvidia/gpus/0000:1a:00.0/power
Runtime D3 status: ?
Is nvidia runtime pm enabled for “0x1e04”? no
Vendor/Device Id: 10de:1e04
BusID “PCI:25@0:0:0”
Is boot vga? no
can’t access /etc/u-d-c-nvidia-runtimepm-override file
Found json file: /usr/share/doc/nvidia-driver-495-server/supported-gpus.json
File /usr/share/doc/nvidia-driver-495-server/supported-gpus.json not found
Is nvidia runtime pm supported for “0x1e04”? yes
Trying to create new file: /run/nvidia_runtimepm_supported
Checking power status in /proc/driver/nvidia/gpus/0000:19:00.0/power
Runtime D3 status: Disabled by default
Is nvidia runtime pm enabled for “0x1e04”? no
Vendor/Device Id: 10de:1e04
BusID “PCI:103@0:0:0”
Is boot vga? no
can’t access /etc/u-d-c-nvidia-runtimepm-override file
Found json file: /usr/share/doc/nvidia-driver-495-server/supported-gpus.json
File /usr/share/doc/nvidia-driver-495-server/supported-gpus.json not found
Is nvidia runtime pm supported for “0x1e04”? yes
Trying to create new file: /run/nvidia_runtimepm_supported
Checking power status in /proc/driver/nvidia/gpus/0000:67:00.0/power
Runtime D3 status: ?
Is nvidia runtime pm enabled for “0x1e04”? no
Skipping “/dev/dri/card3”, driven by “nvidia-drm”
Skipping “/dev/dri/card2”, driven by “nvidia-drm”
Skipping “/dev/dri/card1”, driven by “nvidia-drm”
Skipping “/dev/dri/card0”, driven by “nvidia-drm”
Skipping “/dev/dri/card3”, driven by “nvidia-drm”
Skipping “/dev/dri/card2”, driven by “nvidia-drm”
Skipping “/dev/dri/card1”, driven by “nvidia-drm”
Skipping “/dev/dri/card0”, driven by “nvidia-drm”
Skipping “/dev/dri/card3”, driven by “nvidia-drm”
Skipping “/dev/dri/card2”, driven by “nvidia-drm”
Skipping “/dev/dri/card1”, driven by “nvidia-drm”
Skipping “/dev/dri/card0”, driven by “nvidia-drm”
Skipping “/dev/dri/card3”, driven by “nvidia-drm”
Skipping “/dev/dri/card2”, driven by “nvidia-drm”
Skipping “/dev/dri/card1”, driven by “nvidia-drm”
Skipping “/dev/dri/card0”, driven by “nvidia-drm”
Does it require offloading? no
last cards number = 4
Has amd? no
Has intel? no
Has nvidia? yes
How many cards? 4
Has the system changed? No
Unsupported discrete card vendor: 10de
Nothing to do

Here is the output from ubuntu-drivers devices:
WARNING:root:_pkg_get_support nvidia-driver-515-server: package has invalid Support PBheader, cannot determine support level
WARNING:root:_pkg_get_support nvidia-driver-510-server: package has invalid Support PBheader, cannot determine support level
WARNING:root:_pkg_get_support nvidia-driver-515: package has invalid Support PBheader, cannot determine support level
== /sys/devices/pci0000:16/0000:16:00.0/0000:17:00.0/0000:18:10.0/0000:1a:00.0 ==
modalias : pci:v000010DEd00001E04sv00001462sd00003712bc03sc00i00
vendor : NVIDIA Corporation
driver : nvidia-driver-418-server - distro non-free
driver : nvidia-driver-515-server - distro non-free
driver : nvidia-driver-470-server - distro non-free
driver : nvidia-driver-450-server - distro non-free
driver : nvidia-driver-520 - distro non-free recommended
driver : nvidia-driver-510-server - distro non-free
driver : nvidia-driver-515 - third-party non-free
driver : xserver-xorg-video-nouveau - distro free builtin

Here is the output from modinfo nvidia

filename: /lib/modules/5.0.0-37-generic/updates/dkms/nvidia.ko
firmware: nvidia/515.65.01/gsp.bin
alias: char-major-195-*
version: 515.65.01
supported: external
license: NVIDIA
srcversion: 8049D44E2C1B08F41E1B8A6
alias: pci:v000010DEd*sv*sd*bc06sc80i00*
alias: pci:v000010DEd*sv*sd*bc03sc02i00*
alias: pci:v000010DEd*sv*sd*bc03sc00i00*
depends: drm
retpoline: Y
name: nvidia
vermagic: 5.0.0-37-generic SMP mod_unload
parm: NvSwitchRegDwords:NvSwitch regkey (charp)
parm: NvSwitchBlacklist:NvSwitchBlacklist=uuid[,uuid…] (charp)
parm: NVreg_ResmanDebugLevel:int
parm: NVreg_RmLogonRC:int
parm: NVreg_ModifyDeviceFiles:int
parm: NVreg_DeviceFileUID:int
parm: NVreg_DeviceFileGID:int
parm: NVreg_DeviceFileMode:int
parm: NVreg_InitializeSystemMemoryAllocations:int
parm: NVreg_UsePageAttributeTable:int
parm: NVreg_EnablePCIeGen3:int
parm: NVreg_EnableMSI:int
parm: NVreg_TCEBypassMode:int
parm: NVreg_EnableStreamMemOPs:int
parm: NVreg_RestrictProfilingToAdminUsers:int
parm: NVreg_PreserveVideoMemoryAllocations:int
parm: NVreg_EnableS0ixPowerManagement:int
parm: NVreg_S0ixPowerManagementVideoMemoryThreshold:int
parm: NVreg_DynamicPowerManagement:int
parm: NVreg_DynamicPowerManagementVideoMemoryThreshold:int
parm: NVreg_EnableGpuFirmware:int
parm: NVreg_EnableGpuFirmwareLogs:int
parm: NVreg_OpenRmEnableUnsupportedGpus:int
parm: NVreg_EnableUserNUMAManagement:int
parm: NVreg_MemoryPoolSize:int
parm: NVreg_KMallocHeapMaxSize:int
parm: NVreg_VMallocHeapMaxSize:int
parm: NVreg_IgnoreMMIOCheck:int
parm: NVreg_NvLinkDisable:int
parm: NVreg_EnablePCIERelaxedOrderingMode:int
parm: NVreg_RegisterPCIDriver:int
parm: NVreg_EnableDbgBreakpoint:int
parm: NVreg_RegistryDwords:charp
parm: NVreg_RegistryDwordsPerDevice:charp
parm: NVreg_RmMsg:charp
parm: NVreg_GpuBlacklist:charp
parm: NVreg_TemporaryFilePath:charp
parm: NVreg_ExcludedGpus:charp
parm: NVreg_DmaRemapPeerMmio:int
parm: rm_firmware_active:charp
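
One thing this output lets you verify is whether the module was built for the kernel that is actually running, since the first field of vermagic is the kernel release. A minimal sketch follows; the helper names vermagic_release and module_matches_kernel are made up for illustration.

```shell
# Sketch: compare the kernel release recorded in a module's vermagic
# with a given kernel release. Both helper names are hypothetical.

# Print the first whitespace-separated field of a vermagic value,
# e.g. "5.0.0-37-generic SMP mod_unload" gives "5.0.0-37-generic".
vermagic_release() {
    printf '%s\n' "${1%% *}"
}

# Succeed only if the module's vermagic release equals the given release.
# $1: vermagic value, $2: kernel release (as printed by uname -r)
module_matches_kernel() {
    [ "$(vermagic_release "$1")" = "$2" ]
}
```

On the affected machine this could be called as module_matches_kernel "$(modinfo -F vermagic nvidia)" "$(uname -r)". A non-zero exit status would suggest the module was built for a different kernel than the one booted.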

Here is the output from systemctl status nvidia-persistenced

● nvidia-persistenced.service - NVIDIA Persistence Daemon
     Loaded: loaded (/lib/systemd/system/nvidia-persistenced.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Thu 2022-10-27 15:15:29 PDT; 18h ago
    Process: 1699 ExecStopPost=/bin/rm -rf /var/run/nvidia-persistenced (code=exited, status=0/SUCCESS)
    Process: 1697 ExecStart=/usr/bin/nvidia-persistenced --user nvidia-persistenced --persistence-mode --verbose (code=exited, status=1/FAILURE)

Oct 27 15:15:29 arvand.usc.edu nvidia-persistenced[1698]: Started (1698)
Oct 27 15:15:29 arvand.usc.edu nvidia-persistenced[1697]: nvidia-persistenced failed to initialize. Check syslog for more details.
Oct 27 15:15:29 arvand.usc.edu nvidia-persistenced[1698]: Failed to query NVIDIA devices. Please ensure that the NVIDIA device files (/dev/nvidia*) exist, and that user 122 has read and write permissions for those files.
Oct 27 15:15:29 arvand.usc.edu systemd[1]: nvidia-persistenced.service: Control process exited, code=exited status=1
Oct 27 15:15:29 arvand.usc.edu nvidia-persistenced[1698]: PID file unlocked.
Oct 27 15:15:29 arvand.usc.edu nvidia-persistenced[1698]: PID file closed.
Oct 27 15:15:29 arvand.usc.edu nvidia-persistenced[1698]: The daemon no longer has permission to remove its runtime data directory /var/run/nvidia-persistenced
Oct 27 15:15:29 arvand.usc.edu nvidia-persistenced[1698]: Shutdown (1698)
Oct 27 15:15:29 arvand.usc.edu systemd[1]: nvidia-persistenced.service: Failed with result ‘exit-code’.
Oct 27 15:15:29 arvand.usc.edu systemd[1]: Failed to start NVIDIA Persistence Daemon.
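
Since the daemon complains about the NVIDIA device files, a quick way to follow up on that message is to list whatever nvidia* nodes exist, together with their owners and permissions. This is only a sketch: nvidia_device_nodes is a hypothetical helper, and the directory argument exists only so it can be pointed somewhere other than /dev.

```shell
# Sketch: list nvidia* device nodes with owner and mode, and fail if none exist.
# nvidia_device_nodes is a hypothetical helper name.
# $1: directory to scan (assumed default: /dev)
nvidia_device_nodes() {
    nvidia_nodes_found=0
    for node in "${1:-/dev}"/nvidia*; do
        # An unmatched glob stays literal, so skip anything that does not exist.
        [ -e "$node" ] || continue
        ls -ld "$node"
        nvidia_nodes_found=1
    done
    [ "$nvidia_nodes_found" -eq 1 ]
}
```

Calling nvidia_device_nodes with no argument scans /dev. If it prints nothing and returns a non-zero status, the device files the daemon is asking for are missing.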

nvidia-bug-report.log (931.6 KB)

Please open a new thread; this one is about to be closed.
In your case, the NVIDIA driver is installed fine and loads without errors, but is then inaccessible for some reason.
