Summary
I always get a kernel NULL pointer dereference when unloading the `nvidia-drm` module.
Description
This is a laptop configured in an Optimus arrangement with PRIME Render Offload, with Intel as the main GPU.
When `nvidia-drm` is loaded with modesetting, the system never reaches deep enough C-states and wastes 2-5 watts of power regardless of the power state (idle, plugged in, on battery, etc.), even if the NVIDIA driver is (said to be) turned off. Hence, I always load it with no modesetting (i.e., default parameters).
But when I use the NVIDIA GPU as the graphics processor with modesetting off, it always shows screen tearing in fullscreen mode, as if there were no VSYNC, and there is no support for the Vulkan extension `VK_EXT_external_memory_dma_buf`. So my plan was to let the laptop run with modesetting off most of the time and, when an application needs NVIDIA as the graphics processor, unload the module and reload it with modesetting on. When the application is done, I would unload it again and reload it with modesetting off.
Additional information
This being CachyOS, they provide a few more flavors of their kernel builds, one of which is built from the latest release candidate of the kernel and compiled with either clang or gcc.
A kernel name such as `6.12.0-rc6-1-cachyos-rc-gcc` means it is based on the 6th rc of the 6.12 Linux kernel, first package release, built with CachyOS patches, release candidate version, built with gcc.
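As a rough illustration of that naming scheme, the sketch below splits such a release string on `-` and labels each field. The function name and the field labels are made up for this example, and it assumes the six-field form shown above; other CachyOS kernel names may not follow it.

```shell
# Split a CachyOS rc kernel release string on "-" and label each field.
# describe_release and the labels are illustrative, not an official tool.
# The body runs in a subshell so the IFS change does not leak to the caller.
describe_release() (
    IFS=-
    set -- $1                     # unquoted on purpose: split on "-"
    if [ "$#" -ne 6 ]; then
        echo "unexpected release format" >&2
        exit 1
    fi
    echo "base=$1"                # upstream kernel version
    echo "rc=$2"                  # release candidate number
    echo "pkgrel=$3"              # package release
    echo "patchset=$4"            # distribution patches
    echo "flavor=$5"              # release candidate flavor
    echo "compiler=$6"            # gcc or clang
)

describe_release "6.12.0-rc6-1-cachyos-rc-gcc"
```

On a running system the input could come from `uname -r` instead of a literal string.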
I used the prepackaged dkms module from the `linux-cachyos-rc-gcc-nvidia-open` package.
This bug is reproducible even with the clang-built versions.
This bug is NOT reproducible on the proprietary variant of the kernel module.
This bug is NOT reproducible on the latest LTS kernel version (as of this writing, 6.6.59).
Reproduction Steps
- Use the NVIDIA open GPU kernel modules. This bug does not occur on the proprietary version.
- Blacklist `nvidia-drm` to prevent it from being loaded at boot.
- Load `nvidia-drm` with default parameters (i.e., `modeset=0 fbdev=0`).
- Wait for a minute.
- Unload `nvidia-drm` (e.g., `modprobe -r nvidia-drm`).
- Wait for a minute.
- Load `nvidia-drm` with modesetting on (i.e., `modeset=1 fbdev=0`).
- Wait for a minute.
- Unload `nvidia-drm` (e.g., `modprobe -r nvidia-drm`).
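The load/unload sequence above can be sketched as a small script. This is only an approximation of what I typed by hand: the `DRY_RUN` and `WAIT_SECONDS` variables and the helper names are invented for this example. It defaults to printing the commands, because actually running them (as root, with `DRY_RUN=0`) is expected to trigger the oops on the last unload.

```shell
#!/bin/sh
# Sketch of the reproduction steps. DRY_RUN defaults to 1, so commands are
# only printed. DRY_RUN, WAIT_SECONDS, and the function names are illustrative.
DRY_RUN=${DRY_RUN:-1}
WAIT_SECONDS=${WAIT_SECONDS:-60}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# One load / wait / unload pass; $1 is the modeset value (fbdev stays 0).
cycle() {
    run modprobe nvidia-drm modeset="$1" fbdev=0
    run sleep "$WAIT_SECONDS"
    run modprobe -r nvidia-drm
}

reproduce() {
    cycle 0                       # first pass: default parameters
    run sleep "$WAIT_SECONDS"
    cycle 1                       # second pass: modesetting on, oops on unload
}

reproduce
```

An explicit `modprobe nvidia-drm` still loads the module even though it is blacklisted, since the blacklist only stops automatic loading.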
Nvidia Module Version
565.57.01 (NVIDIA Open GPU Kernel Modules)
Does not occur with the Proprietary variant
Other Information
Key | Value |
---|---|
NVIDIA GPU | NVIDIA GeForce RTX 3050 Laptop GPU |
Linux Distro | CachyOS |
Linux Kernel version | 6.12.0-rc6-1-cachyos-rc-gcc |
Architecture | x86_64 |
Hardware | GIGABYTE G5 GD (11th Gen Intel) |
Desktop Environment | KDE Plasma Wayland 6.2.80 |
Kernel Log:
[ 196.807698] nvidia_modeset: module uses symbols nvidia_get_rm_ops from proprietary module nvidia, inheriting taint.
[ 196.815581] nvidia-modeset: Loading NVIDIA UNIX Open Kernel Mode Setting Driver for x86_64 565.57.01 Release Build (notroot@7db9e82b58f7) Mon Nov 4 15:48:40 UTC 2024
[ 196.828019] nvidia_drm: module uses symbols nvKmsKapiF32ToF16 from proprietary module nvidia_modeset, inheriting taint.
[ 196.830152] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[ 198.737455] [drm] Initialized nvidia-drm 0.0.0 for 0000:01:00.0 on minor 1
[ 198.737782] nvidia 0000:01:00.0: [drm] Cannot find any crtc or sizes
[ 198.738113] Registered the nv-hotplug-helper DRM client.
[ 241.344496] [drm] [nvidia-drm] [GPU ID 0x00000100] Unloading driver
[ 241.747380] nvidia-modeset: Unloading
[ 241.827699] nvidia_modeset: module uses symbols nvidia_get_rm_ops from proprietary module nvidia, inheriting taint.
[ 241.847639] nvidia-modeset: Loading NVIDIA UNIX Open Kernel Mode Setting Driver for x86_64 565.57.01 Release Build (notroot@7db9e82b58f7) Mon Nov 4 15:48:40 UTC 2024
[ 241.858849] nvidia_drm: module uses symbols nvKmsKapiF32ToF16 from proprietary module nvidia_modeset, inheriting taint.
[ 241.861882] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[ 241.861886] [drm] Initialized nvidia-drm 0.0.0 for 0000:01:00.0 on minor 1
[ 241.861890] Failed to initialize the nv-hotplug-helper DRM client.
[ 268.583029] BUG: kernel NULL pointer dereference, address: 00000000000000a8
[ 268.583035] #PF: supervisor read access in kernel mode
[ 268.583036] #PF: error_code(0x0000) - not-present page
[ 268.583037] PGD 0 P4D 0
[ 268.583039] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 268.583041] CPU: 8 UID: 0 PID: 70806 Comm: modprobe Tainted: P S U OE 6.12.0-rc6-1-cachyos-rc-gcc #1 4192acde4edb66f2f1f68c607ab66f58138591f5
[ 268.583044] Tainted: [P]=PROPRIETARY_MODULE, [S]=CPU_OUT_OF_SPEC, [U]=USER, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[ 268.583045] Hardware name: GIGABYTE G5 GD/G5 GD, BIOS FB10 03/22/2022
[ 268.583046] RIP: 0010:drm_client_dev_unregister+0xd/0xf0
[ 268.583051] Code: dd 4c 00 49 c7 c4 f4 ff ff ff eb 87 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 48 8b 47 30 <8b> 80 a8 00 00 00 23 47 68 a8 02 75 05 c3 cc cc cc cc 41 57 41 56
[ 268.583052] RSP: 0018:ffffa9b2a9d8bc70 EFLAGS: 00010246
[ 268.583054] RAX: 0000000000000000 RBX: ffff998cee754000 RCX: 0000000000000002
[ 268.583055] RDX: 0000000000000000 RSI: ffffa9b2a9d8bcd0 RDI: ffff998cee754000
[ 268.583056] RBP: ffff998cee754000 R08: 000000000000006d R09: ffffa9b2a9d8bcc8
[ 268.583057] R10: fefefefefefefeff R11: 0000000000000037 R12: 0000000000000800
[ 268.583057] R13: 00000000000000b0 R14: 0000000000000000 R15: 0000000000000000
[ 268.583058] FS: 00007dc0ba996740(0000) GS:ffff998eef600000(0000) knlGS:0000000000000000
[ 268.583059] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 268.583060] CR2: 00000000000000a8 CR3: 0000000298c76002 CR4: 0000000000f72ef0
[ 268.583062] PKRU: 55555554
[ 268.583063] Call Trace:
[ 268.583064] <TASK>
[ 268.583067] ? __die_body.cold+0x8/0x12
[ 268.583069] ? page_fault_oops+0x15a/0x2e0
[ 268.583072] ? exc_page_fault+0x81/0x190
[ 268.583075] ? asm_exc_page_fault+0x26/0x30
[ 268.583079] ? drm_client_dev_unregister+0xd/0xf0
[ 268.583081] drm_dev_unregister+0x21/0x1c0
[ 268.583084] nv_drm_remove_devices+0x2d/0x60 [nvidia_drm 713ad65fe3ef08e6e23794e19a16790721d8c08f]
[ 268.583097] __do_sys_delete_module+0x1d1/0x310
[ 268.583100] do_syscall_64+0x82/0x190
[ 268.583103] ? __x64_sys_openat+0x1f5/0x230
[ 268.583105] ? syscall_exit_to_user_mode+0x10/0x210
[ 268.583107] ? do_syscall_64+0x8e/0x190
[ 268.583109] ? __x64_sys_openat+0x1f5/0x230
[ 268.583110] ? syscall_exit_to_user_mode+0x10/0x210
[ 268.583112] ? do_syscall_64+0x8e/0x190
[ 268.583113] ? syscall_exit_to_user_mode+0x10/0x210
[ 268.583115] ? do_syscall_64+0x8e/0x190
[ 268.583117] ? exc_page_fault+0x81/0x190
[ 268.583118] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 268.583120] RIP: 0033:0x7dc0ba2fe26b
[ 268.583167] Code: 73 01 c3 48 8b 0d bd 4a 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8d 4a 0c 00 f7 d8 64 89 01 48
[ 268.583169] RSP: 002b:00007ffcdd9e1e98 EFLAGS: 00000246 ORIG_RAX: 00000000000000b0
[ 268.583170] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007dc0ba2fe26b
[ 268.583171] RDX: 000000000000000a RSI: 0000000000000800 RDI: 00005ef675df0f38
[ 268.583172] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 268.583173] R10: 00007dc0ba36f900 R11: 0000000000000246 R12: 0000000000000000
[ 268.583174] R13: 00007ffcdd9e1ec0 R14: 00005ef675df0ed0 R15: 0000000000000000
[ 268.583175] </TASK>
[ 268.583176] Modules linked in: nvidia_drm(POE-) nvidia_modeset(POE) uhid ccm blowfish_generic blowfish_x86_64 blowfish_common des_generic des3_ede_x86_64 libdes cast5_avx_x86_64 cast5_generic cast_common lrw camellia_generic camellia_aesni_avx2 camellia_aesni_avx_x86_64 camellia_x86_64 twofish_generic twofish_avx_x86_64 twofish_x86_64_3way twofish_x86_64 twofish_common serpent_avx2 serpent_avx_x86_64 serpent_sse2_x86_64 serpent_generic xts snd_seq_dummy snd_hrtimer snd_seq rfcomm snd_seq_device cmac algif_hash algif_skcipher af_alg bnep vfat fat ext4 mbcache jbd2 pkcs8_key_parser nvidia_uvm(POE) snd_sof_pci_intel_tgl snd_sof_pci_intel_cnl snd_sof_intel_hda_generic soundwire_intel soundwire_cadence snd_sof_intel_hda_common snd_soc_hdac_hda snd_sof_intel_hda_mlink snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_soc_acpi_intel_match soundwire_generic_allocation snd_soc_acpi soundwire_bus snd_soc_avs snd_soc_hda_codec snd_hda_ext_core snd_soc_core snd_compress ac97_bus snd_pcm_dmaengine
[ 268.583211] snd_hda_codec_hdmi intel_rapl_msr intel_rapl_common iwlmvm intel_uncore_frequency snd_hda_codec_realtek intel_uncore_frequency_common snd_hda_codec_generic intel_tcc_cooling snd_hda_scodec_component x86_pkg_temp_thermal intel_powerclamp joydev coretemp mac80211 mousedev kvm_intel libarc4 snd_hda_intel ptp pps_core snd_intel_dspcfg uvcvideo snd_intel_sdw_acpi btusb kvm videobuf2_vmalloc btrtl snd_hda_codec uvc btintel videobuf2_memops iwlwifi hid_multitouch videobuf2_v4l2 snd_hda_core btbcm hid_generic videobuf2_common rapl btmtk snd_hwdep mei_pxp mei_hdcp ee1004 snd_pcm videodev r8169 intel_cstate bluetooth snd_timer cfg80211 realtek i2c_i801 mc intel_lpss_pci intel_pmc_core spi_nor mdio_devres mei_me i2c_smbus snd intel_lpss i2c_hid_acpi intel_hid pmt_telemetry crc16 intel_uncore psmouse pcspkr mtd i2c_mux libphy soundcore intel_vsec mei rfkill idma64 i2c_hid sparse_keymap pmt_class pinctrl_tigerlake acpi_pad mac_hid nvidia(POE) i2c_dev crypto_user loop nfnetlink zram 842_decompress 842_compress
[ 268.583258] lz4hc_compress lz4_compress ip_tables x_tables btrfs blake2b_generic xor raid6_pq xe drm_ttm_helper gpu_sched drm_suballoc_helper drm_gpuvm drm_exec xfs libcrc32c crc32c_generic dm_crypt cbc encrypted_keys trusted asn1_encoder tee dm_mod crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel serio_raw sha512_ssse3 atkbd sha256_ssse3 libps2 sha1_ssse3 sdhci_pci aesni_intel vivaldi_fmap gf128mul nvme cqhci sdhci crypto_simd nvme_core i8042 spi_intel_pci cryptd mmc_core spi_intel nvme_auth serio i915 i2c_algo_bit drm_buddy video mxm_wmi wmi ttm intel_gtt drm_display_helper cec
[ 268.583288] Unloaded tainted modules: nvidia_modeset(POE):1 nvidia_drm(POE):1 [last unloaded: nvidia_modeset(POE)]
[ 268.583293] CR2: 00000000000000a8
[ 268.583294] ---[ end trace 0000000000000000 ]---
[ 268.583295] RIP: 0010:drm_client_dev_unregister+0xd/0xf0
[ 268.583297] Code: dd 4c 00 49 c7 c4 f4 ff ff ff eb 87 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 48 8b 47 30 <8b> 80 a8 00 00 00 23 47 68 a8 02 75 05 c3 cc cc cc cc 41 57 41 56
[ 268.583298] RSP: 0018:ffffa9b2a9d8bc70 EFLAGS: 00010246
[ 268.583300] RAX: 0000000000000000 RBX: ffff998cee754000 RCX: 0000000000000002
[ 268.583300] RDX: 0000000000000000 RSI: ffffa9b2a9d8bcd0 RDI: ffff998cee754000
[ 268.583301] RBP: ffff998cee754000 R08: 000000000000006d R09: ffffa9b2a9d8bcc8
[ 268.583302] R10: fefefefefefefeff R11: 0000000000000037 R12: 0000000000000800
[ 268.583303] R13: 00000000000000b0 R14: 0000000000000000 R15: 0000000000000000
[ 268.583304] FS: 00007dc0ba996740(0000) GS:ffff998eef600000(0000) knlGS:0000000000000000
[ 268.583305] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 268.583306] CR2: 00000000000000a8 CR3: 0000000298c76002 CR4: 0000000000f72ef0
[ 268.583307] PKRU: 55555554
[ 268.583307] note: modprobe[70806] exited with irqs disabled
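For anyone comparing their own logs against this one, a quick way to pull out the faulting kernel function is to look for the kernel-mode `RIP:` line. The helper below is a minimal sketch with a made-up name; it assumes the log uses the same `RIP: 0010:symbol+0x...` layout as the trace above.

```shell
# Read kernel log text on stdin and print the symbol from the first
# kernel-mode "RIP: 0010:" line. faulting_symbol is an illustrative name.
faulting_symbol() {
    sed -n 's/.*RIP: 0010:\([A-Za-z0-9_.]*\)+0x.*/\1/p' | head -n 1
}

# Example with one line taken from the log above:
printf '%s\n' '[ 268.583046] RIP: 0010:drm_client_dev_unregister+0xd/0xf0' | faulting_symbol
```

On a live system the input would typically be piped in from `dmesg` or `journalctl -k`.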
Contents of `/etc/modprobe.d/nvidia.conf`:
options nvidia NVreg_EnableGpuFirmware=1
options nvidia NVreg_EnablePCIeGen3=1
options nvidia NVreg_UsePageAttributeTable=1
options nvidia NVreg_InitializeSystemMemoryAllocations=0
options nvidia NVreg_DynamicPowerManagementVideoMemoryThreshold=2097152
options nvidia NVreg_DynamicPowerManagement=2
options nvidia NVreg_EnableS0ixPowerManagement=1
options nvidia NVreg_EnableResizableBar=1
blacklist nvidia_drm
blacklist nvidia_modeset