ThinkPad P16s NVIDIA RTX A500 Linux ERROR: NVIDIA driver is not loaded

Hi,
I bought a new ThinkPad P16s laptop with an NVIDIA RTX A500 graphics card,
and I can’t get it to work.

OS: Fresh installation of Kubuntu 24.04
Driver version (now): 555.58.02 from the Ubuntu repository

$ uname -r
6.8.0-40-generic

$ nvidia-settings 

ERROR: NVIDIA driver is not loaded

(nvidia-settings:8016): GLib-GObject-CRITICAL **: 13:08:38.574: g_object_unref: assertion 'G_IS_OBJECT (object)' failed


** (nvidia-settings:8016): CRITICAL **: 13:08:38.575: ctk_powermode_new: assertion '(ctrl_target != NULL) && (ctrl_target->h != NULL)' failed

ERROR: nvidia-settings could not find the registry key file or the X server is not accessible. This file should have been installed along with this driver at
       /usr/share/nvidia/nvidia-application-profiles-key-documentation. The application profiles will continue to work, but values cannot be prepopulated or validated, and will not be
       listed in the help text. Please see the README for possible values and descriptions.

** Message: 13:08:38.599: PRIME: Requires offloading
** Message: 13:08:38.599: PRIME: is it supported? yes
** Message: 13:08:38.639: PRIME: Usage: /usr/bin/prime-select nvidia|intel|on-demand|query
** Message: 13:08:38.639: PRIME: on-demand mode: "1"
** Message: 13:08:38.639: PRIME: is "on-demand" mode supported? yes

$ nvidia-smi
NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

$ lspci
00:00.0 Host bridge: Intel Corporation Raptor Lake-P 6p+8e cores Host Bridge/DRAM Controller
00:02.0 VGA compatible controller: Intel Corporation Raptor Lake-P [Iris Xe Graphics] (rev 04)
00:04.0 Signal processing controller: Intel Corporation Raptor Lake Dynamic Platform and Thermal Framework Processor Participant
00:06.0 PCI bridge: Intel Corporation Raptor Lake PCIe 4.0 Graphics Port
00:06.2 PCI bridge: Intel Corporation Device a73d
00:07.0 PCI bridge: Intel Corporation Raptor Lake-P Thunderbolt 4 PCI Express Root Port #0
00:07.2 PCI bridge: Intel Corporation Raptor Lake-P Thunderbolt 4 PCI Express Root Port #2
00:0d.0 USB controller: Intel Corporation Raptor Lake-P Thunderbolt 4 USB Controller
00:0d.2 USB controller: Intel Corporation Raptor Lake-P Thunderbolt 4 NHI #0
00:0d.3 USB controller: Intel Corporation Raptor Lake-P Thunderbolt 4 NHI #1
00:14.0 USB controller: Intel Corporation Alder Lake PCH USB 3.2 xHCI Host Controller (rev 01)
00:14.2 RAM memory: Intel Corporation Alder Lake PCH Shared SRAM (rev 01)
00:14.3 Network controller: Intel Corporation Raptor Lake PCH CNVi WiFi (rev 01)
00:15.0 Serial bus controller: Intel Corporation Alder Lake PCH Serial IO I2C Controller #0 (rev 01)
00:16.0 Communication controller: Intel Corporation Alder Lake PCH HECI Controller (rev 01)
00:16.3 Serial controller: Intel Corporation Alder Lake AMT SOL Redirection (rev 01)
00:1f.0 ISA bridge: Intel Corporation Raptor Lake LPC/eSPI Controller (rev 01)
00:1f.3 Audio device: Intel Corporation Raptor Lake-P/U/H cAVS (rev 01)
00:1f.4 SMBus: Intel Corporation Alder Lake PCH-P SMBus Host Controller (rev 01)
00:1f.5 Serial bus controller: Intel Corporation Alder Lake-P PCH SPI Controller (rev 01)
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (23) I219-LM (rev 01)
02:00.0 Non-Volatile memory controller: KIOXIA Corporation NVMe SSD Controller XG8 (rev 01)
03:00.0 3D controller: NVIDIA Corporation GA107GLM [RTX A500 Laptop GPU] (rev a1)

$ sudo dmesg | grep 0000:03:00.0
[    1.028175] pci 0000:03:00.0: [10de:25bb] type 00 class 0x030200 PCIe Endpoint
[    1.028185] pci 0000:03:00.0: BAR 0 [mem 0xbd000000-0xbdffffff]
[    1.028191] pci 0000:03:00.0: BAR 1 [mem 0x6000000000-0x60ffffffff 64bit pref]
[    1.028198] pci 0000:03:00.0: BAR 3 [mem 0x6100000000-0x6101ffffff 64bit pref]
[    1.028201] pci 0000:03:00.0: BAR 5 [io  0x2000-0x207f]
[    1.028205] pci 0000:03:00.0: ROM [mem 0xfff80000-0xffffffff pref]
[    1.028251] pci 0000:03:00.0: PME# supported from D0 D3hot
[    1.028310] pci 0000:03:00.0: 63.012 Gb/s available PCIe bandwidth, limited by 16.0 GT/s PCIe x4 link at 0000:00:06.2 (capable of 252.048 Gb/s with 16.0 GT/s PCIe x16 link)
[    1.551004] pci 0000:03:00.0: ROM [mem 0xfff80000-0xffffffff pref]: can't claim; no compatible bridge window
[    1.551442] pci 0000:03:00.0: ROM [mem size 0x00080000 pref]: can't assign; no space
[    1.551443] pci 0000:03:00.0: ROM [mem size 0x00080000 pref]: failed to assign
[    1.553090] pci 0000:03:00.0: Adding to iommu group 14
[    4.916863] nvidia 0000:03:00.0: enabling device (0000 -> 0003)
[    7.821785] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:03:00.0 on minor 0
[    9.298368] bbswitch: Found discrete VGA device 0000:03:00.0: \_SB_.PC00.PEG2.PEGP
[    9.298549] bbswitch: Succesfully loaded. Discrete card 0000:03:00.0 is on

I often see these messages in the journal:

libvirtd[1875]: internal error: Unknown PCI header type ‘127’ for device ‘0000:03:00.0’

(udev-worker)[19138]: nvidia: Process ‘/sbin/modprobe -r nvidia-modeset’ failed with exit code 1.

and, if intel (or on-demand) is selected in prime-select:

kernel: nvidia 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible

When nvidia is selected, I don’t see these messages.

kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 509
kernel: 
kernel: nvidia 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
kernel: nvidia 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
kernel: NVRM: The NVIDIA GPU 0000:03:00.0
        NVRM: (PCI ID: 10de:25bb) installed in this system has
        NVRM: fallen off the bus and is not responding to commands.
kernel: nvidia: probe of 0000:03:00.0 failed with error -1
kernel: NVRM: The NVIDIA probe routine failed for 1 device(s).
kernel: NVRM: None of the NVIDIA devices were initialized.
kernel: nvidia-nvlink: Unregistered Nvlink Core, major device number 509
systemd[1]: bumblebeed.service: Deactivated successfully.
systemd[1]: Stopped bumblebeed.service - Bumblebee C Daemon.
(udev-worker)[30239]: nvidia: Process '/sbin/modprobe nvidia-modeset' failed with exit code 1.
kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 509
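
Since the kernel reports that it cannot move the device from D3cold to D0, it may help to look at what sysfs says about the device's power state. The following is only a sketch: the function name `show_pci_power` is made up for illustration, and it assumes a kernel recent enough to expose the `power_state` attribute next to the usual `power/control` and `power/runtime_status` files. Attributes that are missing are reported as such instead of causing an error.

```shell
# Print selected power-related sysfs attributes of a PCI device.
# $1: sysfs directory of the device (hypothetical helper, not part of any driver package)
show_pci_power() {
    dev_dir=$1
    for attr in power_state power/control power/runtime_status; do
        if [ -r "$dev_dir/$attr" ]; then
            # Attribute exists: print its name and current value.
            printf '%s: %s\n' "$attr" "$(cat "$dev_dir/$attr")"
        else
            # Older kernels or removed devices may not expose every attribute.
            printf '%s: (not available)\n' "$attr"
        fi
    done
}

# Example call, using the bus address from the logs above:
#   show_pci_power /sys/bus/pci/devices/0000:03:00.0
```

Comparing the output with prime-select set to intel, on-demand and nvidia might show whether the card is already powered off before the nvidia module tries to probe it.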

What I tried:

  1. add the ibt=off kernel option
     prime-select intel
     prime-select nvidia
     reboot
  2. purge all nvidia packages and install them again
  3. install different driver versions: 535, 540, 550, 555, 560 (cuda)
  4. ubuntu-drivers autoinstall
  5. apt install --fix-broken
     dkms remove -m nvidia/<version>
     dkms install -m nvidia/<version>
  6. enable and disable Secure Boot in UEFI, and:
     sudo mokutil --import /var/lib/shim-signed/mok/MOK.der

For some reason, I can’t upload nvidia-bug-report. Strange…

Could you help me find a solution?

Best regards,

I tried to install the NVIDIA-Linux-x86_64-550.107.02.run file from the official site. /var/log/nvidia-installer.log contains this:

ERROR: Unable to load the kernel module 'nvidia.ko'.  This happens most frequently when this kernel module was built against the wrong or improperly configured kernel sources, with a version of gcc that differs from the one used to build the target kernel, or if another driver, such as nouveau, is present and prevents the NVIDIA kernel module from obtaining ownership of the NVIDIA device(s), or no NVIDIA device installed in this system is supported by this NVIDIA Linux graphics driver release.

Please see the log entries 'Kernel module load error' and 'Kernel messages' at the end of the file '/var/log/nvidia-installer.log' for more information.
-> Kernel module load error: No such device
-> Kernel messages:
                exe="/usr/bin/dbus-daemon" sauid=101 hostname=? addr=? terminal=?'
[   27.421785] audit: type=1107 audit(1724057584.244:373): pid=1520 uid=101 auid=4294967295 ses=4294967295 subj=unconfined msg='apparmor="DENIED" operation="dbus_signal"  bus="system" path="/org/freedesktop/login1" interface="org.freedesktop.login1.Manager" member="UserRemoved" name=":1.9" mask="receive" pid=3040 label="snap.teams-for-linux.teams-for-linux" peer_pid=1550 peer_label="unconfined"
                exe="/usr/bin/dbus-daemon" sauid=101 hostname=? addr=? terminal=?'
[   27.482870] audit: type=1326 audit(1724057584.306:374): auid=1000 uid=1000 gid=1000 ses=4 subj=snap.slack.slack pid=3510 comm="slack" exe="/snap/slack/158/usr/lib/slack/slack" sig=0 arch=c000003e syscall=92 compat=0 ip=0x7cc54c08d3b7 code=0x50000
[   44.413592] audit: type=1326 audit(1724057601.236:375): auid=1000 uid=1000 gid=1000 ses=4 subj=snap.slack.slack pid=3510 comm="slack" exe="/snap/slack/158/usr/lib/slack/slack" sig=0 arch=c000003e syscall=92 compat=0 ip=0x7cc54c08d3b7 code=0x50000
[  223.312588] workqueue: acpi_ec_event_processor hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND
[  465.450777] audit: type=1400 audit(1724058022.356:376): apparmor="DENIED" operation="capable" class="cap" profile="/usr/lib/snapd/snap-confine" pid=6497 comm="snap-confine" capability=12  capname="net_admin"
[  465.450808] audit: type=1400 audit(1724058022.356:377): apparmor="DENIED" operation="capable" class="cap" profile="/usr/lib/snapd/snap-confine" pid=6497 comm="snap-confine" capability=38  capname="perfmon"
[  465.521262] audit: type=1400 audit(1724058022.427:378): apparmor="DENIED" operation="open" class="file" profile="snap-update-ns.firmware-updater" name="/proc/6525/maps" pid=6525 comm="5" requested_mask="r" denied_mask="r" fsuid=1000 ouid=0
[  465.643900] audit: type=1400 audit(1724058022.549:379): apparmor="DENIED" operation="open" class="file" profile="snap.firmware-updater.firmware-notifier" name="/proc/sys/vm/max_map_count" pid=6497 comm="firmware-notifi" requested_mask="r" denied_mask="r" fsuid=1000 ouid=0
[  484.809188] workqueue: pm_runtime_work hogged CPU for >10000us 8 times, consider switching to WQ_UNBOUND
[  533.960926] workqueue: acpi_ec_event_processor hogged CPU for >10000us 8 times, consider switching to WQ_UNBOUND
[  813.213814] audit: type=1326 audit(1724058370.134:380): auid=1000 uid=1000 gid=1000 ses=4 subj=snap.slack.slack pid=3510 comm="slack" exe="/snap/slack/158/usr/lib/slack/slack" sig=0 arch=c000003e syscall=92 compat=0 ip=0x7cc54c08d3b7 code=0x50000
[ 1453.549058] workqueue: acpi_ec_event_processor hogged CPU for >10000us 16 times, consider switching to WQ_UNBOUND
[ 1958.828760] VFIO - User Level meta-driver version: 0.3
[ 1958.946431] nvidia-nvlink: Nvlink Core is being initialized, major device number 507

[ 1959.503987] nvidia 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
[ 1959.504060] NVRM: The NVIDIA GPU 0000:03:00.0
               NVRM: (PCI ID: 10de:25bb) installed in this system has
               NVRM: fallen off the bus and is not responding to commands.
[ 1959.504094] nvidia: probe of 0000:03:00.0 failed with error -1
[ 1959.504126] NVRM: The NVIDIA probe routine failed for 1 device(s).
[ 1959.504127] NVRM: None of the NVIDIA devices were initialized.
[ 1959.504655] nvidia-nvlink: Unregistered Nvlink Core, major device number 507
ERROR: Installation has failed.  Please see the file '/var/log/nvidia-installer.log' for details.  You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.

/var/log/gpu-manager.log:

log_file: /var/log/gpu-manager.log
last_boot_file: /var/lib/ubuntu-drivers-common/last_gfx_boot
new_boot_file: /var/lib/ubuntu-drivers-common/last_gfx_boot
can't access /run/u-d-c-nvidia-was-loaded file
can't access /opt/amdgpu-pro/bin/amdgpu-pro-px
Looking for nvidia modules in /lib/modules/6.8.0-1011-nvidia/kernel
Looking for nvidia modules in /lib/modules/6.8.0-1011-nvidia/updates/dkms
Looking for amdgpu modules in /lib/modules/6.8.0-1011-nvidia/kernel
Looking for amdgpu modules in /lib/modules/6.8.0-1011-nvidia/updates/dkms
Is nvidia loaded? no
Was nvidia unloaded? no
Is nvidia blacklisted? no
Is intel loaded? yes
Is radeon loaded? no
Is radeon blacklisted? no
Is amdgpu loaded? no
Is amdgpu blacklisted? no
Is amdgpu versioned? no
Is amdgpu pro stack? no
Is nouveau loaded? no
Is nouveau blacklisted? yes
Is nvidia kernel module available? no
Is amdgpu kernel module available? no
Vendor/Device Id: 8086:a7a0
BusID "PCI:0@0:2:0"
Is boot vga? yes
Vendor/Device Id: 10de:25bb
BusID "PCI:3@0:0:0"
can't open /sys/bus/pci/devices/0000:03:00.0/boot_vga
Is boot vga? no
Error: can't access /sys/bus/pci/devices/0000:03:00.0/driver
The device is not bound to any driver.
can't open /sys/bus/pci/devices/0000:03:00.0/boot_vga
Chassis type: "10"
Laptop detected
/etc/u-d-c-nvidia-runtimepm-override found. Will try runtimepm if the kernel supports it.
Linux 6.8 detected.
Is nvidia runtime pm supported for "0x25bb"? yes
Trying to create new file: /run/nvidia_runtimepm_supported
Checking power status in /proc/driver/nvidia/gpus/0000:03:00.0/power
Error while opening /proc/driver/nvidia/gpus/0000:03:00.0/power
Is nvidia runtime pm enabled for "0x25bb"? no
Skipping "/dev/dri/card1", driven by "i915"
Skipping "/dev/dri/card1", driven by "i915"
Skipping "/dev/dri/card1", driven by "i915"
Found "/dev/dri/card1", driven by "i915"
output 0:
	card1-HDMI-A-1
output 1:
	card1-eDP-1
Number of connected outputs for /dev/dri/card1: 2
Does it require offloading? yes
last cards number = 2
Has amd? no
Has intel? yes
Has nvidia? yes
How many cards? 2
Loading nvidia with "no" parameters
Has the system changed? No
can't access /run/u-d-c-nvidia-drm-was-loaded file
can't open /sys/module/nvidia/version
Takes 10000ms to wait for nvidia udev rules completed.
Intel IGP detected
Desktop system detected
or laptop with open drivers
Nothing to do

Finally, I got it working. What helped:

Remove all packages with “nvidia” in their names.

This command

sudo apt purge '^nvidia-.*'

is not enough. It leaves many packages on the system, for example the 32-bit ones such as “libnvidia-decode-555:i386”.

I just got the list of packages this way:

dpkg -l | grep -i nvidia

and removed them all.
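
For anyone who wants to script this step, here is a rough sketch. The helper name `list_nvidia_packages` is made up for illustration. It reads `dpkg -l` output on stdin and prints only the package-name column for rows whose name contains "nvidia", so entries like `libnvidia-decode-555:i386` are included. Unlike the plain `grep -i`, it does not match packages that only mention NVIDIA in their description.

```shell
# Read `dpkg -l` style output on stdin and print the name of every package
# whose name contains "nvidia" (case-insensitive). Hypothetical helper.
list_nvidia_packages() {
    # Field 1 is the dpkg status (e.g. "ii" or "rc"), field 2 is the package name.
    # The header lines do not have a two- or three-letter first field, so they are skipped.
    awk '$1 ~ /^[a-zA-Z][a-zA-Z][a-zA-Z]?$/ && tolower($2) ~ /nvidia/ { print $2 }'
}

# Typical use (left as comments so nothing is removed by accident):
#   dpkg -l | list_nvidia_packages                          # review the list first
#   sudo apt-get purge $(dpkg -l | list_nvidia_packages)    # then purge, confirming the prompt
```

Reviewing the list before purging is the important part, because anything with "nvidia" in its name gets removed.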

Then

sudo apt autoremove

and install a new driver:

sudo apt install nvidia-driver-550

And now it works. So if you have a ThinkPad P16s Gen 2 (Intel i7 + RTX A500): NVIDIA driver version 550 works on Ubuntu 24.04.
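
To double-check the result after a reboot, one option is to look at which GPU-related kernel modules are actually loaded. This is only a sketch with a made-up helper name, `loaded_gpu_modules`. It filters `lsmod` output for module names starting with nvidia, nouveau or bbswitch; bbswitch is included only because it showed up in the dmesg output earlier in this thread, and whether it played any part in the problem is not established.

```shell
# Read `lsmod` style output on stdin and print the names of loaded modules
# that start with nvidia, nouveau or bbswitch. Hypothetical helper.
loaded_gpu_modules() {
    # Field 1 is the module name; the "Module Size Used by" header does not match.
    awk '$1 ~ /^(nvidia|nouveau|bbswitch)/ { print $1 }'
}

# Typical use:
#   lsmod | loaded_gpu_modules
```

If nvidia and its companion modules are listed, `nvidia-smi` should be able to talk to the driver as well.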