RTX 3060 Ti; Driver 550.54.14; Fedora Kinoite 39 6.7.6-200.fc39.x86_64; Kernel oops on boot and no display

I have an RTX 3060 Ti that works fine with the 545.29.06-3.fc39.x86_64 driver. A recent update offers the 550.54.14 driver. After upgrading, the system comes up with no display available, and all three of my monitors shut off for lack of signal. I left the system running for several hours with no change. Since I use an atomic distribution, I was able to boot back into a working snapshot and look at the boot logs. They show that the Nvidia driver hit a kernel oops during a page fault. This looks like a driver issue, but I am not sure, since the driver works for most other people. I tried enabling SSH and booting into the faulty update, but the SSH server does not seem to start, so I could not run nvidia-bug-report.sh remotely. I’m happy to provide additional information on request.

Boot log

See this pastebin.

I’ve booted into the broken update with the kernel arguments that disable nouveau removed. The system boots into SDDM, so I switched to a TTY and ran nvidia-bug-report.sh. I’m not sure how useful it is, but hopefully it provides some additional context.

nvidia-bug-report.log.gz (107.8 KB)

My laptop started randomly freezing during shutdown after upgrading to 550.54.14.

Environment:

  • Arch Linux
  • Kernel 6.7.6
  • 3080 Ti
Feb 29 21:43:40 Arch kernel: BUG: unable to handle page fault for address: ffff9e6ba9515fe8
Feb 29 21:43:40 Arch kernel: #PF: supervisor write access in kernel mode
Feb 29 21:43:40 Arch kernel: #PF: error_code(0x0003) - permissions violation
Feb 29 21:43:40 Arch kernel: PGD d7e801067 P4D d7e801067 PUD 101e63063 PMD 129415063 PTE 8000000129515121
Feb 29 21:43:40 Arch kernel: Oops: 0003 [#1] PREEMPT SMP NOPTI
Feb 29 21:43:40 Arch kernel: CPU: 0 PID: 10 Comm: kworker/0:1 Tainted: P     U     OE      6.7.6-arch1-1 #1 92d1e939a2710641cdadd5e5b8601f67b3474c0a
Feb 29 21:43:40 Arch kernel: Hardware name: LENOVO 21DECTO1WW/21DECTO1WW, BIOS N3JET37W (1.21 ) 11/07/2023
Feb 29 21:43:40 Arch kernel: Workqueue: kacpi_notify acpi_os_execute_deferred
Feb 29 21:43:40 Arch kernel: RIP: 0010:_nv044009rm+0x10/0x30 [nvidia]
Feb 29 21:43:40 Arch kernel: Code: 00 00 00 00 00 0f 1f 44 00 00 66 0f 1f 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 66 0f 1f 00 48 83 ec 08 48 83 ed 10 48 8d 7d 08 <48> c7 45 08 00 00 00 00 e8 b3 4d 6f ff 48 8b 45 08 48 83 c4 08 48
Feb 29 21:43:40 Arch kernel: RSP: 0018:ffffb30980117d18 EFLAGS: 00010282
Feb 29 21:43:40 Arch kernel: RAX: 0000000000000000 RBX: ffffb30982a6f8e8 RCX: ffff9e7abfa33b68
Feb 29 21:43:40 Arch kernel: RDX: ffff9e6b8217cfc8 RSI: 00000000000000c0 RDI: ffff9e6ba9515fe8
Feb 29 21:43:40 Arch kernel: RBP: ffff9e6ba9515fe0 R08: 6e6d5e686f62606a R09: ffff9e6b801b5e80
Feb 29 21:43:40 Arch kernel: R10: 000000000000000d R11: fefefefefefefeff R12: 0000000000000004
Feb 29 21:43:40 Arch kernel: R13: 0000000000000000 R14: ffffb30982a31008 R15: ffff9e6b9afd8008
Feb 29 21:43:40 Arch kernel: FS:  0000000000000000(0000) GS:ffff9e7abfa00000(0000) knlGS:0000000000000000
Feb 29 21:43:40 Arch kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 29 21:43:40 Arch kernel: CR2: ffff9e6ba9515fe8 CR3: 0000000d7da20000 CR4: 0000000000f50ef0
Feb 29 21:43:40 Arch kernel: PKRU: 55555554
Feb 29 21:43:40 Arch kernel: Call Trace:
Feb 29 21:43:40 Arch kernel:  <TASK>
Feb 29 21:43:40 Arch kernel:  ? __die+0x23/0x70
Feb 29 21:43:40 Arch kernel:  ? page_fault_oops+0x171/0x4e0
Feb 29 21:43:40 Arch kernel:  ? exc_page_fault+0x175/0x180
Feb 29 21:43:40 Arch kernel:  ? asm_exc_page_fault+0x26/0x30
Feb 29 21:43:40 Arch kernel:  ? _nv044009rm+0x10/0x30 [nvidia a7c378ddf345fae4282d34224554cf31cbf56665]
Feb 29 21:43:40 Arch kernel:  _nv014559rm+0x4d/0x90 [nvidia a7c378ddf345fae4282d34224554cf31cbf56665]
Feb 29 21:43:40 Arch kernel:  _nv049696rm+0x18/0x60 [nvidia a7c378ddf345fae4282d34224554cf31cbf56665]
Feb 29 21:43:40 Arch kernel:  _nv026805rm+0x61/0x90 [nvidia a7c378ddf345fae4282d34224554cf31cbf56665]
Feb 29 21:43:40 Arch kernel:  rm_acpi_nvpcf_notify+0x1c/0xe0 [nvidia a7c378ddf345fae4282d34224554cf31cbf56665]
Feb 29 21:43:40 Arch kernel:  ? psi_task_switch+0xd6/0x230
Feb 29 21:43:40 Arch kernel:  ? __switch_to_asm+0x3e/0x70
Feb 29 21:43:40 Arch kernel:  ? finish_task_switch.isra.0+0x94/0x2f0
Feb 29 21:43:40 Arch kernel:  ? __schedule+0x3ef/0x1410
Feb 29 21:43:40 Arch kernel:  acpi_ev_notify_dispatch+0x4b/0x70
Feb 29 21:43:40 Arch kernel:  acpi_os_execute_deferred+0x17/0x30
Feb 29 21:43:40 Arch kernel:  process_one_work+0x178/0x350
Feb 29 21:43:40 Arch kernel:  worker_thread+0x30f/0x450
Feb 29 21:43:40 Arch kernel:  ? __pfx_worker_thread+0x10/0x10
Feb 29 21:43:40 Arch kernel:  kthread+0xe5/0x120
Feb 29 21:43:40 Arch kernel:  ? __pfx_kthread+0x10/0x10
Feb 29 21:43:40 Arch kernel:  ret_from_fork+0x31/0x50
Feb 29 21:43:40 Arch kernel:  ? __pfx_kthread+0x10/0x10
Feb 29 21:43:40 Arch kernel:  ret_from_fork_asm+0x1b/0x30
Feb 29 21:43:40 Arch kernel:  </TASK>
Feb 29 21:43:40 Arch kernel: Modules linked in: typec_displayport rfcomm ccm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device cmac algif_hash algif_skcipher af_alg bnep nls_iso8859_1 snd_ctl_led snd_soc_skl_hda_dsp snd_soc_hdac_hdmi snd_soc_intel_hda_dsp_common snd_sof_probes snd_hda_codec_realtek snd_hda_codec_generic snd_soc_dmic snd_sof_pci_intel_tgl snd_sof_intel_hda_common ext4 soundwire_intel snd_sof_intel_hda_mlink soundwire_cadence crc32c_generic snd_sof_intel_hda mbcache snd_sof_pci jbd2 snd_sof_xtensa_dsp snd_sof snd_sof_utils iwlmvm snd_soc_hdac_hda intel_uncore_frequency snd_hda_ext_core intel_uncore_frequency_common snd_soc_acpi_intel_match intel_tcc_cooling snd_soc_acpi soundwire_generic_allocation mac80211 soundwire_bus x86_pkg_temp_thermal intel_powerclamp snd_soc_core libarc4 btusb coretemp btrtl snd_compress ac97_bus uvcvideo snd_hda_codec_hdmi btintel snd_pcm_dmaengine videobuf2_vmalloc iwlwifi kvm_intel btbcm iTCO_wdt snd_hda_intel uvc intel_pmc_bxt btmtk snd_intel_dspcfg videobuf2_memops thinkpad_acpi
Feb 29 21:43:40 Arch kernel:  iTCO_vendor_support snd_intel_sdw_acpi videobuf2_v4l2 bluetooth cfg80211 kvm igc processor_thermal_device_pci ledtrig_audio snd_hda_codec hid_multitouch videodev i2c_i801 mei_wdt mei_hdcp spi_nor mei_pxp ptp think_lmi pmt_telemetry processor_thermal_device ecdh_generic platform_profile irqbypass intel_rapl_msr pmt_class rapl intel_cstate videobuf2_common processor_thermal_wt_hint intel_uncore psmouse pcspkr firmware_attributes_class mtd wmi_bmof thunderbolt pps_core i2c_smbus snd_hda_core mei_me intel_lpss_pci processor_thermal_rfim mc snd_hwdep ucsi_acpi processor_thermal_rapl intel_lpss mei intel_rapl_common snd_pcm typec_ucsi idma64 crc16 processor_thermal_wt_req snd_timer processor_thermal_power_floor typec processor_thermal_mbox rfkill joydev roles wacom mousedev igen6_edac intel_vsec snd i2c_hid_acpi int3403_thermal soundcore i2c_hid int340x_thermal_zone int3400_thermal intel_pmc_core acpi_tad pinctrl_tigerlake
 acpi_thermal_rel acpi_pad mac_hid vfat fat uinput nfsv3 nfs_acl nfs lockd grace sunrpc
Feb 29 21:43:40 Arch kernel:  fscache netfs ipheth i2c_dev sg fuse crypto_user loop nfnetlink ip_tables x_tables zfs(POE) spl(OE) sr_mod cdrom hid_generic usbhid usb_storage dm_crypt cbc encrypted_keys trusted asn1_encoder tee dm_mod crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic gf128mul ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 aesni_intel rtsx_pci_sdmmc serio_raw mmc_core atkbd crypto_simd nvme libps2 cryptd spi_intel_pci vivaldi_fmap nvme_core spi_intel xhci_pci rtsx_pci nvme_auth xhci_pci_renesas i8042 serio nvidia_drm(POE) nvidia_uvm(POE) nvidia_modeset(POE) nvidia(POE) i915 i2c_algo_bit drm_buddy video wmi ttm intel_gtt drm_display_helper cec
Feb 29 21:43:40 Arch kernel: CR2: ffff9e6ba9515fe8
Feb 29 21:43:40 Arch kernel: ---[ end trace 0000000000000000 ]---
Feb 29 21:43:40 Arch kernel: RIP: 0010:_nv044009rm+0x10/0x30 [nvidia]
Feb 29 21:43:40 Arch kernel: Code: 00 00 00 00 00 0f 1f 44 00 00 66 0f 1f 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 66 0f 1f 00 48 83 ec 08 48 83 ed 10 48 8d 7d 08 <48> c7 45 08 00 00 00 00 e8 b3 4d 6f ff 48 8b 45 08 48 83 c4 08 48
Feb 29 21:43:40 Arch kernel: RSP: 0018:ffffb30980117d18 EFLAGS: 00010282
Feb 29 21:43:40 Arch kernel: RAX: 0000000000000000 RBX: ffffb30982a6f8e8 RCX: ffff9e7abfa33b68
Feb 29 21:43:40 Arch kernel: RDX: ffff9e6b8217cfc8 RSI: 00000000000000c0 RDI: ffff9e6ba9515fe8
Feb 29 21:43:40 Arch kernel: RBP: ffff9e6ba9515fe0 R08: 6e6d5e686f62606a R09: ffff9e6b801b5e80
Feb 29 21:43:40 Arch kernel: R10: 000000000000000d R11: fefefefefefefeff R12: 0000000000000004
Feb 29 21:43:40 Arch kernel: R13: 0000000000000000 R14: ffffb30982a31008 R15: ffff9e6b9afd8008
Feb 29 21:43:40 Arch kernel: FS:  0000000000000000(0000) GS:ffff9e7abfa00000(0000) knlGS:0000000000000000
Feb 29 21:43:40 Arch kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 29 21:43:40 Arch kernel: CR2: ffff9e6ba9515fe8 CR3: 0000000d7da20000 CR4: 0000000000f50ef0
Feb 29 21:43:40 Arch kernel: PKRU: 55555554
Feb 29 21:43:40 Arch kernel: note: kworker/0:1[10] exited with irqs disabled
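
For anyone trying to read the `#PF` lines in a trace like the one above, here is a rough Python sketch that decodes the x86 page-fault error code and a few flag bits of the PTE value. The function names (`decode_pf_error_code`, `error_code_from_log`, `pte_flags`) are made up for illustration and are not part of any Nvidia or kernel tooling; the bit positions are the standard x86-64 ones. It only restates what the kernel already printed and says nothing about why the driver wrote to that address.

```python
import re
from typing import Dict, List

# (bit, meaning if set, meaning if clear) for the low bits of the
# x86 page-fault error code. Higher bits are ignored in this sketch.
PF_BITS = [
    (0, "protection violation", "page not present"),
    (1, "write access", "read access"),
    (2, "user mode", "supervisor mode"),
    (3, "reserved bit set in page table", None),
    (4, "instruction fetch", None),
    (5, "protection-key violation", None),
]


def decode_pf_error_code(code: int) -> List[str]:
    """Translate a page-fault error code into human-readable facts."""
    facts = []
    for bit, if_set, if_clear in PF_BITS:
        if code & (1 << bit):
            facts.append(if_set)
        elif if_clear is not None:
            facts.append(if_clear)
    return facts


def error_code_from_log(line: str) -> int:
    """Pull the hex value out of a '#PF: error_code(0x....)' log line."""
    match = re.search(r"error_code\((0x[0-9a-fA-F]+)\)", line)
    if match is None:
        raise ValueError("no error_code(...) field in line")
    return int(match.group(1), 16)


def pte_flags(pte: int) -> Dict[str, bool]:
    """Report a few x86-64 page-table-entry flag bits."""
    return {
        "present": bool(pte & (1 << 0)),
        "writable": bool(pte & (1 << 1)),
        "user": bool(pte & (1 << 2)),
        "no_execute": bool(pte & (1 << 63)),
    }


if __name__ == "__main__":
    line = "Feb 29 21:43:40 Arch kernel: #PF: error_code(0x0003) - permissions violation"
    print(decode_pf_error_code(error_code_from_log(line)))
    print(pte_flags(0x8000000129515121))  # PTE value from the trace
```

Fed the values from the trace, this reports a supervisor-mode write to a page that is present but not writable, which matches the "supervisor write access" and "permissions violation" lines the kernel printed.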

I tried downgrading to 535.129.03-1.fc39.x86_64 just to make sure that nothing else is causing ANY Nvidia driver upgrade to fail. It works fine, so it’s definitely the 550 driver that’s at fault. Today there was a kernel upgrade to 6.7.7, and the bug is still present.

I spoke a bit too soon. Games on the 535 driver are PAINFULLY slow, so it’s not great. At least it boots, though, so what I said before still stands.

On Fedora, please try setting nvidia-drm.fbdev=0 with the 550 driver.
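
After adding the argument (on an rpm-ostree based system this is typically done through `rpm-ostree kargs`), it is worth confirming that it actually reached the running kernel. A minimal Python sketch, assuming the parameter is passed on the kernel command line; `kernel_param` and `running_kernel_param` are hypothetical helper names, and the parser ignores quoted values containing spaces:

```python
from pathlib import Path
from typing import Optional


def kernel_param(cmdline: str, name: str) -> Optional[str]:
    """Return the value of `name` on a kernel command line.

    Returns "" for a bare flag, None if the parameter is absent.
    If the parameter appears more than once, the last one is returned.
    """
    wanted = name.replace("-", "_")
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        # The kernel treats '-' and '_' in parameter names as equivalent.
        if key.replace("-", "_") == wanted:
            value = val if sep else ""
    return value


def running_kernel_param(name: str) -> Optional[str]:
    """Look the parameter up in the command line of the booted kernel."""
    return kernel_param(Path("/proc/cmdline").read_text(), name)


# Example usage on the affected machine:
#   running_kernel_param("nvidia-drm.fbdev")
# A result of None would mean the argument never made it onto the command line.
```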

Thanks for the suggestion. Unfortunately it didn’t work :(

I submitted a bug report to Nvidia via their email address, linux-bugs@nvidia.com. It turns out the bug was related to my monitor, an Acer XZ272.

The issue seems to have been fixed as of 550.67. I am currently using 550.76, so this issue is effectively resolved.
