Failing to load Nvidia driver

Got a hybrid laptop (Dell Precision 5570) with both integrated graphics and an NVIDIA GPU.
I’m using Arch Linux installed from scratch. The Intel drivers work, the NVIDIA one doesn’t.

One thing I’m noticing is that for the life of me I can’t modprobe the nvidia module. The first attempt to modprobe crashes, and any subsequent attempt just hangs.

$ cat /proc/modules | grep -e nvidia 
nvidia 46981120 1 - Loading 0x0000000000000000 (POE+)

$ lsmod  | grep nvidia
nvidia              45367296  1
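
For anyone reproducing this, the state column in /proc/modules is the part that matters: the module is stuck in "Loading" rather than "Live". A minimal sketch for pulling out just that field, assuming the usual /proc/modules column order (name, size, refcount, dependencies, state, address, taint). `module_state` is a made-up helper name, not an existing tool.

```shell
# Print the load state of a kernel module as recorded in /proc/modules.
# Assumed column order: name size refcount dependencies state address [taint]
# module_state is a hypothetical helper; the optional second argument points
# it at a saved copy of /proc/modules instead of the live file.
module_state() {
    name="$1"
    file="${2:-/proc/modules}"
    awk -v m="$name" '
        $1 == m { print $5; found = 1 }
        END     { if (!found) print "not loaded" }
    ' "$file"
}
```

With the /proc/modules line shown above, `module_state nvidia` would print `Loading`.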

$ sudo modprobe nvidia -vv
modprobe: INFO: custom logging function 0x5625b6a76af0 registered
insmod /lib/modules/5.18.12-arch1-1/extramodules/nvidia.ko.xz 
<HANGS HERE>

Looking at the kernel messages:

$ sudo dmesg | grep -e nvidia -e gpu 
[    2.082063] nvidia: loading out-of-tree module taints kernel.
[    2.082068] nvidia: module license 'NVIDIA' taints kernel.
[    2.091968] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[    2.180020] nvidia-nvlink: Nvlink Core is being initialized, major device number 235
[    2.180047] traps: Missing ENDBR: _nv011433rm+0x0/0x10 [nvidia]
[    2.180481] RIP: 0010:_nv011433rm+0x0/0x10 [nvidia]
[    2.180781]  ? _nv034913rm+0x20/0x20 [nvidia 0ed3e84150ca9bfc7e573a7fd9bcd2632fd3bcda]
[    2.180997]  _nv011431rm+0x24/0xe0 [nvidia 0ed3e84150ca9bfc7e573a7fd9bcd2632fd3bcda]
[    2.181215]  _nv034914rm+0xe/0xa0 [nvidia 0ed3e84150ca9bfc7e573a7fd9bcd2632fd3bcda]
[    2.181422]  _nv034917rm+0x1d/0x30 [nvidia 0ed3e84150ca9bfc7e573a7fd9bcd2632fd3bcda]
[    2.181632]  _nv034919rm+0x2f/0x40 [nvidia 0ed3e84150ca9bfc7e573a7fd9bcd2632fd3bcda]
[    2.181843]  _nv015567rm+0x15/0x70 [nvidia 0ed3e84150ca9bfc7e573a7fd9bcd2632fd3bcda]
[    2.181994]  _nv000642rm+0x9/0x20 [nvidia 0ed3e84150ca9bfc7e573a7fd9bcd2632fd3bcda]
[    2.182152]  rm_init_rm+0x17/0x60 [nvidia 0ed3e84150ca9bfc7e573a7fd9bcd2632fd3bcda]
[    2.182315]  nvidia_init_module+0x242/0x613 [nvidia 0ed3e84150ca9bfc7e573a7fd9bcd2632fd3bcda]
[    2.182470]  ? nvidia_init_module+0x613/0x613 [nvidia 0ed3e84150ca9bfc7e573a7fd9bcd2632fd3bcda]
[    2.182622]  nvidia_frontend_init_module+0x50/0x91 [nvidia 0ed3e84150ca9bfc7e573a7fd9bcd2632fd3bcda]
[    2.182774]  ? nvidia_init_module+0x613/0x613 [nvidia 0ed3e84150ca9bfc7e573a7fd9bcd2632fd3bcda]
[    2.183063] Modules linked in: pcc_cpufreq(-) acpi_cpufreq(-) nvidia(POE+) uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev mc dm_multipath dm_mod sg crypto_user fuse bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 rtsx_pci_sdmmc serio_raw atkbd mmc_core libps2 vivaldi_fmap nvme crc32c_intel xhci_pci rtsx_pci nvme_core xhci_pci_renesas i8042 serio

Not sure what to make of this…

$ lspci -k | grep -A 2 -E "(VGA|3D)"
00:02.0 VGA compatible controller: Intel Corporation Alder Lake-P Integrated Graphics Controller (rev 0c)
	Subsystem: Dell Device 0b1a
	Kernel driver in use: i915
--
01:00.0 3D controller: NVIDIA Corporation GA107GLM [RTX A2000 8GB Laptop GPU] (rev a1)
	Subsystem: Dell Device 0b1a
	Kernel modules: nouveau, nvidia_drm, nvidia

$ pacman -Q | grep -i -e mesa -e nvidia -e "^linux" -e "intel "
lib32-mesa 22.1.3-1
lib32-nvidia-utils 515.57-1
linux 5.18.12.arch1-1
linux-api-headers 5.17.5-2
linux-firmware 20220708.be7798e-1
linux-firmware-whence 20220708.be7798e-1
linux-headers 5.18.12.arch1-1
mesa 22.1.3-1
mesa-utils 8.5.0-2
nvidia 515.57-6
nvidia-utils 515.57-1
vulkan-intel 22.1.3-1
xf86-video-intel 1:2.99.917+916+g31486f40-2

Attaching partial nvidia-bug-report.sh:
nvidia-bug-report.log.gz (81.1 KB)

Please set kernel parameter
ibt=off
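
The `traps: Missing ENDBR` line in the dmesg output comes from the kernel's Indirect Branch Tracking (IBT) enforcement, which is what `ibt=off` disables. How you add the parameter depends on your bootloader, which isn't stated in this thread. Once it is added and you have rebooted, a rough sketch like the following can confirm that the running kernel actually received it. `has_ibt_off` is a hypothetical helper name.

```shell
# Report whether the exact parameter "ibt=off" is on the kernel command line.
# has_ibt_off is a hypothetical helper; the optional argument lets you check
# a saved copy of /proc/cmdline instead of the live file.
has_ibt_off() {
    file="${1:-/proc/cmdline}"
    # Put one parameter per line, then require an exact whole-line match.
    if tr ' ' '\n' < "$file" | grep -qx 'ibt=off'; then
        echo "ibt=off is set"
    else
        echo "ibt=off is NOT set"
    fi
}
```

The match is exact on purpose, so a misspelled or differently valued parameter is reported as not set.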

I’m facing the same problem with kernel parameter bti=off

Made bug report in --safe-mode because nvidia-smi hangs up
nvidia-bug-report.log.gz (939.3 KB)

Edited to add:

Switching from nvidia-installer-dkms to nvidia-open-dkms makes the driver load in combination with bti=off, but the system is still hardly usable. For example, starting “ardour” and trying to maximize the window always makes the display go completely black, and the only way I can recover is to power down the laptop and reboot.

ibt, not bti
kernel: NVRM objClInitPcieChipset: *** Chipset Setup Function Error!
Never seen that before

Also, please see this:
https://forums.developer.nvidia.com/t/470-141-03-1-xorg-1-20-13-3-segfault-after-update/224253/9?u=generix

Ok, I’ve cleaned the old build files, and used ibt instead of bti (sorry about that!).

The driver at first does seem to load, but the display going black when making some application windows “too large” remains.

With the nvidia-dkms, dmesg contains an error about DRM only being ready for Data Center GPUs.
(sorry I didn’t copy it)

nvidia-smi doesn’t seem to work at all:

Unable to determine the device handle for GPU 0000:01:00.0: Not Found

With nvidia-open-dkms, nvidia-smi does seem to work, but I still get a completely black screen that requires powerdown/reboot when making a window too large in some programs. (Could it be related to the drm error? I know very little about graphics cards unfortunately…)

This is the log of the-system-that-appears-to-work-until-it-doesn’t:
nvidia-bug-report.log.gz (312.9 KB)

Ugh… I see the build again contains all these weird errors. I must triple-check…

I think the black screen is a bug in Xfce instead.
I switched to KDE, and so far everything seems to work now. Sorry for all the noise!