Intermittent Nvidia failures using Optimus Manager+bbswitch (Manjaro)

Hi,

For the last few months, Optimus Manager has intermittently failed to switch to hybrid/Nvidia mode on my ThinkPad P72.

When I switch, a blank screen comes up (this is normal). When the switch fails, it is followed by this text:

Jan 12 09:42:57 Putty4-7manjaro kernel: [drm:nv_drm_load [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NvKmsKapiDevice
Jan 12 09:42:57 Putty4-7manjaro kernel: [drm:nv_drm_probe_devices [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to register device

This comes with an ‘RmInitAdapter failed!’ error buried in journalctl.

My understanding is that this kind of error is usually caused by a low-level driver problem or a hardware failure. Since the GPU continues to work consistently well on Windows, and works fine on Manjaro when it does load properly, I don't think this is a hardware problem.

Given the intermittent nature of the problem on Manjaro, I suspect some kind of race condition, but I'm not sure.

This happens on all kernels from 4.19 to 5.15. I am currently using the 5.10 kernel as my home kernel. I’ve attached the bug report file.
nvidia-bug-report.log.gz (138.5 KB)

I believe the problem actually first appears at boot, well before the user requests a switch. On boots where later attempts to switch turn out to fail, the logs contain entries like these:

Jan 14 19:44:12 Putty4-7manjaro kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module  495.44  Fri Oct 22 06:13:12 UTC 2021
Jan 14 19:44:12 Putty4-7manjaro systemd-udevd[276]: nvidia: Process '/usr/bin/bash -c '/usr/bin/mknod -Z -m 666 /dev/nvidiactl c $(grep n>
Jan 14 19:44:13 Putty4-7manjaro systemd-udevd[309]: nvidia: Process '/usr/bin/bash -c '/usr/bin/mknod -Z -m 666 /dev/nvidiactl c $(grep n>
Jan 14 19:44:13 Putty4-7manjaro systemd-udevd[276]: nvidia: Process '/usr/bin/bash -c 'for i in $(cat /proc/driver/nvidia/gpus/*/informat>
Jan 14 19:44:13 Putty4-7manjaro systemd-udevd[309]: nvidia: Process '/usr/bin/bash -c 'for i in $(cat /proc/driver/nvidia/gpus/*/informat>
Jan 14 19:44:13 Putty4-7manjaro systemd-udevd[276]: nvidia: Process '/bin/mknod -m 666 /dev/nvidiactl c 195 255' failed with exit code 1.
Jan 14 19:44:13 Putty4-7manjaro systemd-udevd[276]: nvidia: Process '/bin/mknod -m 666 /dev/nvidia0   c 195 0' failed with exit code 1.

...

Jan 14 19:44:17 Putty4-7manjaro kernel: NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x31:0xffff:2449)
Jan 14 19:44:17 Putty4-7manjaro kernel: NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
Jan 14 19:44:22 Putty4-7manjaro kernel: NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x31:0xffff:2449)
Jan 14 19:44:22 Putty4-7manjaro kernel: NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
Jan 14 19:44:26 Putty4-7manjaro kernel: NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x31:0xffff:2449)
Jan 14 19:44:26 Putty4-7manjaro kernel: NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
Jan 14 19:44:27 Putty4-7manjaro systemd[1]: NetworkManager-dispatcher.service: Deactivated successfully.
Jan 14 19:44:27 Putty4-7manjaro audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=NetworkManager-dispatcher com>
Jan 14 19:44:27 Putty4-7manjaro kernel: audit: type=1131 audit(1642189467.579:60): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=N>
Jan 14 19:44:31 Putty4-7manjaro kernel: NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x31:0xffff:2449)
Jan 14 19:44:31 Putty4-7manjaro kernel: NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
Jan 14 19:44:32 Putty4-7manjaro lightdm[996]: [12] INFO: # Xorg post-start hook
Jan 14 19:44:32 Putty4-7manjaro lightdm[996]: [26] INFO: Running /etc/optimus-manager/xsetup-integrated.sh
Jan 14 19:44:32 Putty4-7manjaro lightdm[996]: [36] INFO: Writing state {'type': 'done', 'switch_id': '20220114T194406', 'current_mode': '>
Jan 14 19:44:32 Putty4-7manjaro lightdm[996]: [36] INFO: Xorg post-start hook completed successfully.
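
Since the failure seems to show up at boot, one way to tell early whether a later switch is likely to fail is to count the `RmInitAdapter failed!` lines in the kernel log. Below is a minimal Python sketch of that idea. The function name `count_rminit_failures` and the regular expression are my own, written against the log lines above; they are not part of Optimus Manager or the Nvidia driver.

```python
import re
from collections import Counter

# Matches kernel log lines of the form seen above, e.g.
#   NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x31:0xffff:2449)
# The pattern is an assumption based only on the lines in this post.
RMINIT_RE = re.compile(
    r"NVRM: GPU (?P<pci>[0-9a-fA-F:.]+): RmInitAdapter failed! \((?P<code>[^)]*)\)"
)


def count_rminit_failures(lines):
    """Count RmInitAdapter failures per (PCI address, error code) pair."""
    counts = Counter()
    for line in lines:
        match = RMINIT_RE.search(line)
        if match:
            counts[(match.group("pci"), match.group("code"))] += 1
    return counts


# Two lines copied from the log above, used as sample input.
sample = [
    "Jan 14 19:44:17 Putty4-7manjaro kernel: NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x31:0xffff:2449)",
    "Jan 14 19:44:17 Putty4-7manjaro kernel: NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0",
]
for (pci, code), n in count_rminit_failures(sample).items():
    print(f"{pci}: RmInitAdapter failed {n} time(s), code {code}")
```

To use it on a real system, feed it the lines of `journalctl -k -b` (kernel messages from the current boot) instead of `sample`. A non-empty result right after boot would suggest that this is one of the "bad" boots.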

To complicate things further, Optimus Manager sometimes loads the integrated profile, but when I check the bbswitch status (cat /proc/acpi/bbswitch) it reports ON, and I know the Nvidia card is draining power. Most often, though, the Nvidia module is not loaded.
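
The "card powered on but module not loaded" state described above can be detected with a small check. This is only a sketch: it assumes `/proc/acpi/bbswitch` contains a single line ending in `ON` or `OFF` (typically preceded by the PCI address), and that a loaded driver shows up as a line starting with `nvidia` in `/proc/modules`. The function names are hypothetical.

```python
from pathlib import Path


def bbswitch_state(text):
    """Return 'ON' or 'OFF' from the content of /proc/acpi/bbswitch, else None."""
    tokens = text.split()
    if tokens and tokens[-1] in ("ON", "OFF"):
        return tokens[-1]
    return None


def nvidia_module_loaded(proc_modules_text):
    """True if some line of /proc/modules names the 'nvidia' module itself."""
    return any(
        line.split(" ", 1)[0] == "nvidia"
        for line in proc_modules_text.splitlines()
    )


def gpu_powered_but_unused(bbswitch_text, proc_modules_text):
    """True when bbswitch reports ON although the nvidia module is not loaded."""
    return (
        bbswitch_state(bbswitch_text) == "ON"
        and not nvidia_module_loaded(proc_modules_text)
    )


def read_text_or_empty(path):
    """Read a file, returning '' if it is missing or unreadable."""
    try:
        return Path(path).read_text()
    except OSError:
        return ""


if gpu_powered_but_unused(
    read_text_or_empty("/proc/acpi/bbswitch"),
    read_text_or_empty("/proc/modules"),
):
    print("bbswitch reports ON but the nvidia module is not loaded")
```

On a machine without bbswitch the file is simply missing, so the check prints nothing.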

Any help or pointers would be much appreciated.

In case it's helpful to others: I solved the problem. TLP, the power-saving utility, was somehow interfering and causing intermittent problems with the Nvidia card. I have now disabled TLP.
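
For anyone who wants to check whether TLP is running before suspecting the driver, here is a rough Python sketch. It assumes a systemd-based system where TLP runs as the unit `tlp.service`; the function name `tlp_is_active` and its `run` parameter (there only so the check can be exercised without systemd) are my own.

```python
import subprocess


def tlp_is_active(run=subprocess.run):
    """Return True if `systemctl is-active tlp.service` prints 'active'.

    Assumes the unit is named tlp.service; returns False when systemctl
    is not available at all.
    """
    try:
        result = run(
            ["systemctl", "is-active", "tlp.service"],
            capture_output=True,
            text=True,
            check=False,  # is-active exits non-zero when the unit is inactive
        )
    except FileNotFoundError:
        return False
    return result.stdout.strip() == "active"
```

If this returns `True` on a machine showing the same intermittent `RmInitAdapter failed!` errors, temporarily stopping TLP and rebooting a few times is a cheap way to see whether it is involved.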