560 release feedback & discussion

That is correct. This bug was introduced with the first 560 beta version, as far as I know.

Hi @emerg.reanimator
Please confirm whether the system crashes every time you resume it, or whether it happens randomly.
@alyxk
Could you please confirm the same, and also share an nvidia-bug-report log and your system model information?

Hi @traceld
Thanks for reporting the issue. I will try to reproduce it locally and will get back to you if I need any additional information.

Can confirm same results here with Steam under X11. Programs fail to open their windows when using 560 from graphics drivers PPA (both the version they packaged on the 24th, as well as the version from the 30th of last month). Apps like vkcube and vkgears continue to work though.

Backleveling to 555 allows the games to run fine again.

I didn’t report it because I assumed the package had introduced its own 32-bit issues, but maybe it warrants further investigation/reporting.

@amrits

Yes, I can confirm that this issue is easily reproducible on multiple machines equipped with different Nvidia GPUs.
nvidia-bug-report.log.gz (1.0 MB)

dmesg output

[ 521.603390] ------------[ cut here ]------------
[ 521.603395] WARNING: CPU: 3 PID: 19425 at include/linux/rwsem.h:80 follow_pte+0x1f0/0x220
[ 521.603411] Modules linked in: xfs rfcomm snd_seq_dummy snd_hrtimer vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) nvidia_drm(POE) nvidia_modeset(POE) nvidia_uvm(POE) nvidia(POE) nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables qrtr bnep snd_sof_pci_intel_cnl snd_sof_intel_hda_generic soundwire_intel sunrpc soundwire_cadence snd_sof_intel_hda_common snd_sof_intel_hda_mlink snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp binfmt_misc snd_sof snd_sof_utils soundwire_generic_allocation soundwire_bus snd_soc_avs snd_hda_codec_hdmi iwlmvm snd_soc_hda_codec snd_soc_skl snd_soc_hdac_hda snd_hda_ext_core snd_soc_sst_ipc snd_soc_sst_dsp snd_ctl_led snd_soc_acpi_intel_match intel_uncore_frequency snd_soc_acpi intel_uncore_frequency_common snd_hda_codec_realtek mac80211 spi_nor snd_soc_core intel_tcc_cooling snd_hda_codec_generic lis3lv02d_i2c mtd lis3lv02d ee1004 snd_hda_scodec_component
[ 521.603486] mei_wdt iTCO_wdt mei_pxp mei_hdcp snd_compress intel_pmc_bxt ac97_bus x86_pkg_temp_thermal intel_powerclamp snd_pcm_dmaengine iTCO_vendor_support r8153_ecm dell_laptop coretemp cdc_ether intel_rapl_msr libarc4 snd_hda_intel usbnet kvm_intel uvcvideo snd_intel_dspcfg snd_intel_sdw_acpi dell_smm_hwmon snd_usb_audio uvc kvm snd_hda_codec btusb videobuf2_vmalloc snd_usbmidi_lib videobuf2_memops snd_hda_core btrtl videobuf2_v4l2 iwlwifi snd_ump snd_seq btintel videobuf2_common snd_hwdep snd_rawmidi btbcm rapl btmtk snd_seq_device r8152 dell_wmi intel_cstate bluetooth typec_displayport cdc_acm videodev mii snd_pcm cfg80211 intel_uncore mc dell_smbios pcspkr dell_wmi_sysman snd_timer dcdbas processor_thermal_device_pci_legacy firmware_attributes_class dell_wmi_descriptor wmi_bmof intel_wmi_thunderbolt snd processor_thermal_device i2c_i801 vfat spi_intel_pci processor_thermal_wt_hint soundcore fat spi_intel rfkill i2c_smbus processor_thermal_rfim processor_thermal_rapl mei_me intel_rapl_common thunderbolt
[ 521.603562] processor_thermal_wt_req processor_thermal_power_floor mei processor_thermal_mbox idma64 intel_pch_thermal intel_soc_dts_iosf int3403_thermal int340x_thermal_zone intel_pmc_core intel_vsec pmt_telemetry int3400_thermal pmt_class acpi_thermal_rel dell_smo8800 acpi_pad intel_hid sparse_keymap joydev loop nfnetlink zram uas usb_storage i915 i2c_algo_bit drm_buddy nvme crct10dif_pclmul ttm rtsx_pci_sdmmc crc32_pclmul crc32c_intel polyval_clmulni mmc_core drm_display_helper nvme_core polyval_generic video ghash_clmulni_intel mxm_wmi hid_multitouch sha512_ssse3 ucsi_acpi sha256_ssse3 typec_ucsi sha1_ssse3 rtsx_pci nvme_auth cec typec i2c_hid_acpi i2c_hid wmi pinctrl_cannonlake serio_raw ip6_tables ip_tables fuse i2c_dev
[ 521.603620] CPU: 3 PID: 19425 Comm: nvidia-sleep.sh Tainted: P OE 6.10.6-200.fc40.x86_64 #1
[ 521.603623] Hardware name: Dell Inc. XPS 15 7590/0T8KGX, BIOS 1.28.0 04/08/2024
[ 521.603625] RIP: 0010:follow_pte+0x1f0/0x220
[ 521.603631] Code: cc cc cc 48 81 e2 00 00 00 c0 48 09 c2 48 f7 d2 48 85 fa 75 20 e8 a0 f4 ff ff 48 8b 35 59 28 86 01 48 81 e6 00 00 00 c0 eb 89 <0f> 0b 48 3b 1f 0f 83 42 fe ff ff bd ea ff ff ff eb b2 49 8b 3c 24
[ 521.603635] RSP: 0018:ffffa5b890c3b790 EFLAGS: 00010246
[ 521.603638] RAX: 0000000000000000 RBX: 00007febdc025000 RCX: ffffa5b890c3b7d8
[ 521.603639] RDX: ffffa5b890c3b7d0 RSI: 00007febdc025000 RDI: ffff96cd428fa4d0
[ 521.603640] RBP: ffffa5b890c3b818 R08: ffffa5b890c3b970 R09: 0000000000000000
[ 521.603641] R10: 0000000000000002 R11: 0000000000000042 R12: ffffa5b890c3b7d8
[ 521.603643] R13: ffffa5b890c3b7d0 R14: ffff96cccd33f900 R15: 0000000000000000
[ 521.603644] FS: 00007f6587089740(0000) GS:ffff96dbfc180000(0000) knlGS:0000000000000000
[ 521.603646] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 521.603647] CR2: 00007f6587274650 CR3: 0000000373a78006 CR4: 00000000003706f0
[ 521.603649] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 521.603650] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 521.603651] Call Trace:
[ 521.603655]
[ 521.603656] ? follow_pte+0x1f0/0x220
[ 521.603658] ? __warn.cold+0x8e/0xe8
[ 521.603667] ? follow_pte+0x1f0/0x220
[ 521.603678] ? report_bug+0xff/0x140
[ 521.603685] ? handle_bug+0x3c/0x80
[ 521.603689] ? exc_invalid_op+0x17/0x70
[ 521.603691] ? asm_exc_invalid_op+0x1a/0x20
[ 521.603696] ? follow_pte+0x1f0/0x220
[ 521.603698] follow_phys+0x49/0x110
[ 521.603704] untrack_pfn+0x55/0x120
[ 521.603706] unmap_single_vma+0xa6/0xe0
[ 521.603710] zap_page_range_single+0x122/0x1d0
[ 521.603713] unmap_mapping_range+0x116/0x140
[ 521.603717] nv_revoke_gpu_mappings_locked+0x47/0x70 [nvidia]
[ 521.606799] nv_set_system_power_state+0x1cd/0x470 [nvidia]
[ 521.609041] nv_procfs_write_suspend+0x105/0x1b0 [nvidia]
[ 521.610648] proc_reg_write+0x5a/0xa0
[ 521.610661] vfs_write+0xf5/0x460
[ 521.610668] ksys_write+0x6d/0xf0
[ 521.610670] do_syscall_64+0x82/0x160
[ 521.610675] ? do_syscall_64+0x8e/0x160
[ 521.610677] ? avc_has_perm_noaudit+0x6b/0xf0
[ 521.610710] ? _copy_to_user+0x24/0x40
[ 521.610737] ? cp_new_stat+0x131/0x170
[ 521.610744] ? __do_sys_newfstat+0x68/0x70
[ 521.610748] ? syscall_exit_to_user_mode+0x72/0x220
[ 521.610757] ? do_syscall_64+0x8e/0x160
[ 521.610779] ? __count_memcg_events+0x75/0x130
[ 521.610787] ? count_memcg_events.constprop.0+0x1a/0x30
[ 521.610792] ? handle_mm_fault+0x1f0/0x300
[ 521.610802] ? do_user_addr_fault+0x36c/0x620
[ 521.610807] ? clear_bhb_loop+0x25/0x80
[ 521.610814] ? clear_bhb_loop+0x25/0x80
[ 521.610816] ? clear_bhb_loop+0x25/0x80
[ 521.610818] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 521.610843] RIP: 0033:0x7f658719a984
[ 521.610927] Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 80 3d c5 06 0e 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 48 89 e5 48 83 ec 20 48 89
[ 521.610930] RSP: 002b:00007fff50df7a68 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[ 521.610934] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007f658719a984
[ 521.610936] RDX: 0000000000000008 RSI: 00005632d155d1a0 RDI: 0000000000000001
[ 521.610937] RBP: 00007fff50df7a90 R08: 0000000000000410 R09: 0000000000000001
[ 521.610938] R10: 0000000000000004 R11: 0000000000000202 R12: 0000000000000008
[ 521.610940] R13: 00005632d155d1a0 R14: 00007f65872745c0 R15: 00007f6587271f00
[ 521.610945]
[ 521.610946] ---[ end trace 0000000000000000 ]---

I’m also experiencing this issue when attempting to run Fallout: Capital Punishment (a modpack for Fallout: New Vegas) using Proton experimental. Same error message and line number. The game usually boots all the way to the main menu but crashes with the error within the first few seconds of user input.

Vanilla Fallout: New Vegas seems to run fine. The game runs fine under Windows.

Hardware Information

  • CPU: AMD Ryzen 5950X
  • GPU: NVIDIA GeForce RTX 3080 12GB (EVGA FTW3)

Software Information

  • OS: Debian testing
  • Kernel: Linux 6.10.6-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.10.6-1 (2024-08-19) x86_64 GNU/Linux
  • Driver version: 560.35.03

I tested on openSUSE Tumbleweed with the 560 drivers from the CUDA repo provided by Nvidia and it works fine. Everything I tested works, including games (D4, PoE, Helldivers 2). That is with a 4090.

I also tested on a system with two 3090s in SLI, also running openSUSE Tumbleweed with the 560 driver installed from the CUDA repo, and there it does not work.

I have 2x 4K monitors (PG42UQ, 27GN950) connected over DisplayPort. When I try to connect my TV, a 75un81006lb.beufljp, over HDMI, all 3 screens go black. If I plug in the HDMI cable while the computer is on, both monitors freeze. If I have a video playing in the background, I can still hear its sound, but the screens stay frozen.

This is on an RTX 4090 OC edition from Zotac, so HDMI 1.4a as far as I know. The TV has HDMI 2.0. The cable is a Supra High Speed HDMI.

I’m starting to believe this is a driver issue at this point. I don’t see a reason for all screens to either freeze or go black.

The computer POSTs and I can see everything until Linux boots; then I sit there with 3 black screens if the HDMI cable is connected.

Running CachyOS with the latest kernel and NVIDIA driver.

Xorg or Wayland?
I use an HDMI 2.1 TV and four 4K monitors. Works in Xorg but not Wayland currently.

Interesting. Going to check Xorg, even though it won’t help me at all, since I use Wayland as my daily driver.

Ever since the first 560 Driver I have begun to have intermittent crashes that completely freeze the screen. The system is accessible over SSH at first, but will eventually completely lock up with all fans running at maximum. This is with an NVIDIA 3060Ti, running on OpenSUSE Tumbleweed, Gnome+Wayland.

The following consistently shows up in the kernel logs:

Sep 03 15:56:54 localhost.localdomain kernel: sched: RT throttling activated
Sep 03 15:56:55 localhost.localdomain kernel: NVRM: GPU at PCI:0000:06:00: GPU-ddbd4b65-668c-effe-5b8e-3d22c291c61c
Sep 03 15:56:55 localhost.localdomain kernel: NVRM: Xid (PCI:0000:06:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.
Sep 03 15:56:55 localhost.localdomain kernel: NVRM: GPU 0000:06:00.0: GPU has fallen off the bus.
Sep 03 15:56:55 localhost.localdomain kernel: NVRM: _issueRpcAndWait: rpcSendMessage failed with status 0x0000000f for fn 78!
Sep 03 15:56:55 localhost.localdomain kernel: NVRM: nvCheckOkFailedNoLog: Check failed: GPU lost from the bus [NV_ERR_GPU_IS_LOST] (0x0000000F) returned fro>
Sep 03 15:56:55 localhost.localdomain kernel: NVRM: _issueRpcAndWait: rpcSendMessage failed with status 0x0000000f for fn 78!
Sep 03 15:56:55 localhost.localdomain kernel: NVRM: nvCheckOkFailedNoLog: Check failed: GPU lost from the bus [NV_ERR_GPU_IS_LOST] (0x0000000F) returned fro>
Sep 03 15:56:55 localhost.localdomain kernel: NVRM: _issueRpcAndWait: rpcSendMessage failed with status 0x0000000f for fn 78!
Sep 03 15:56:55 localhost.localdomain kernel: NVRM: nvCheckOkFailedNoLog: Check failed: GPU lost from the bus [NV_ERR_GPU_IS_LOST] (0x0000000F) returned fro>
Sep 03 15:56:55 localhost.localdomain kernel: NVRM: RmLogGpuCrash: RmLogGpuCrash: failed to save GPU crash data
Sep 03 15:56:55 localhost.localdomain kernel: NVRM: _kgspLogRpcSanityCheckFailure: GPU0 sanity check failed 0xf waiting for RPC response from GSP. Expected >
Sep 03 15:56:55 localhost.localdomain kernel: NVRM: GPU0 GSP RPC buffer contains function 78 (DUMP_PROTOBUF_COMPONENT) and data 0x0000000000000000 0x0000000>

Additionally, it looks like I’m also experiencing the “GPU Progress” issue being spammed in the logs after this error:

Sep 03 16:05:25 localhost.localdomain kernel: nvidia-modeset: ERROR: GPU:0: Error while waiting for GPU progress: 0x0000c67d:0 2:0:4048:4040

Full kernel logs and nvidia-bug-report.sh output attached.
nvidia_crash_logs.txt (143.9 KB)
nvidia-bug-report.log.gz (434.4 KB)
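
For anyone who wants to check their own kernel log for the same Xid event, here is a minimal Python sketch. It assumes the log lines have the same shape as the `NVRM: Xid (...)` line quoted above; the function name `find_xid_events` and the regular expression are my own illustration and are not part of any NVIDIA tool.

```python
import re

# Assumed line shape, taken from the log excerpt in this thread:
#   NVRM: Xid (PCI:0000:06:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.
XID_RE = re.compile(
    r"NVRM: Xid \((?P<pci>PCI:[0-9a-fA-F:.]+)\): (?P<xid>\d+), (?P<rest>.*)"
)


def find_xid_events(log_text):
    """Return a list of (pci_address, xid_number, message) tuples found in log_text."""
    events = []
    for line in log_text.splitlines():
        match = XID_RE.search(line)
        if match:
            events.append(
                (match.group("pci"), int(match.group("xid")), match.group("rest").strip())
            )
    return events


sample = """\
Sep 03 15:56:54 localhost.localdomain kernel: sched: RT throttling activated
Sep 03 15:56:55 localhost.localdomain kernel: NVRM: Xid (PCI:0000:06:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.
Sep 03 15:56:55 localhost.localdomain kernel: NVRM: GPU 0000:06:00.0: GPU has fallen off the bus.
"""

for pci, xid, message in find_xid_events(sample):
    print(pci, xid, message)
```

The same function should work on saved kernel log text (for example the output of `journalctl -k` written to a file), provided the lines are not truncated before the Xid number.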

Is HDR enabled? This sounds related to my issue here

Can confirm. I have the latest 560 drivers on my laptop running Fedora KDE 40. I get that exact error trying to launch Wayland native Vulkan applications on my iGPU.

There’s a regression in the 560 driver. I was not having this issue on the previous driver version, but just to check, I’ll go back to the previous driver version and update this post with my findings.

The attached logs show the issue; briefly, a program I’m using now crashes.

nvidia-bug-report.log.gz (274.8 KB)

Thanks @emerg.reanimator
We have a similar issue reported internally with bug ID [4675920] and are actively working on it.

Hello,

I’ve been running into an intermittent issue with a dual monitor setup running in Wayland on CachyOS, using DisplayPort on both monitors.
The system, when returning from sleep, will either:

A: Only initialize the 1080p monitor and not the 1440p one, which can be corrected by unplugging and replugging the non-detected monitor while in SDDM, or
B: Not initialize either monitor, which forces a hard reset of the system

Additionally, enabling HDR can cause system freezes.

Here are my system specs:

Operating System: CachyOS Linux
KDE Plasma Version: 6.1.4
KDE Frameworks Version: 6.5.0
Qt Version: 6.7.2
Kernel Version: 6.10.8-2-cachyos (64-bit)
Graphics Platform: Wayland
Processors: 16 × AMD Ryzen 7 7800X3D 8-Core Processor
Memory: 31.0 GiB of RAM
Graphics Processor: NVIDIA GeForce RTX 3080/PCIe/SSE2
Manufacturer: Gigabyte Technology Co., Ltd.
Product Name: B650 GAMING X AX

Attached are the NVIDIA logs immediately after a hard reset.

nvidia-bug-report.log.gz (367.7 KB)

Can you give that repo link, please? I didn’t find CUDA builds for Tumbleweed.

Given my findings, I felt it deserved another follow-up post instead of an edit.

So, after going between different driver versions, I’ve found that these driver versions have this regression: 560.35.03 (latest), 560.31.02 (beta), 560.28.03, and 555.58.02. There could be other drivers with the regression, but these are the only ones I have tested.

This means that my previous assumption that the last version of the driver was fine is wrong. However, the latest production version of the driver, 550.107.02, does not have this issue. This issue eludes me because I could’ve sworn that before, this was fine. The crash in dmesg is being reported as being caused by a segfault in libnvidia-glcore.so, so I don’t think this is an issue with the application that is crashing or any of the other programs on my system.

Does anyone have any ideas?
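
In case it helps narrow things down, the test results listed above can be reduced to a regression window with a few lines of Python. This is only a sketch: the `results` table copies the versions reported in this post, and the helper names `parse_version` and `regression_window` are made up for illustration.

```python
def parse_version(version):
    """Turn '560.35.03' into (560, 35, 3) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


# Results as reported in this post; True means the crash reproduced.
results = {
    "560.35.03": True,
    "560.31.02": True,
    "560.28.03": True,
    "555.58.02": True,
    "550.107.02": False,
}


def regression_window(results):
    """Return (newest_good, oldest_bad) among the tested versions.

    Either element is None if no version of that kind was tested.
    """
    good = [v for v, crashed in results.items() if not crashed]
    bad = [v for v, crashed in results.items() if crashed]
    newest_good = max(good, key=parse_version) if good else None
    oldest_bad = min(bad, key=parse_version) if bad else None
    return newest_good, oldest_bad


print(regression_window(results))
```

Comparing parsed tuples rather than raw strings matters here because a string comparison would order a component like `107` before `58`. Any release between the two versions returned would be the next candidate to test.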

Ever since switching to KDE Wayland with the 555 drivers, I’ve been seeing stuttery fullscreen video playback in Chromium browsers when the Ozone Wayland backend in chrome://flags is enabled. In journalctl I see the following messages get spammed:

Sep 05 13:09:33 brave[154471]: [154471:3:0905/130933.968186:ERROR:client_shared_image.cc(132)] ScopedMapping init failed.
Sep 05 13:09:33 brave[154471]: [154471:3:0905/130933.968211:ERROR:client_shared_image.cc(278)] Unable to create ScopedMapping
Sep 05 13:09:33 brave[154471]: [154471:3:0905/130933.970982:ERROR:client_native_pixmap_dmabuf.cc(45)] Failed to mmap dmabuf: Invalid argument (22)
Sep 05 13:09:33 brave[154471]: [154471:3:0905/130933.971037:ERROR:client_shared_image.cc(146)] Failed to map the buffer.
Sep 05 13:09:33 brave[154471]: [154471:3:0905/130933.971070:ERROR:client_shared_image.cc(132)] ScopedMapping init failed.
Sep 05 13:09:33 brave[154471]: [154471:3:0905/130933.971100:ERROR:client_shared_image.cc(278)] Unable to create ScopedMapping
Sep 05 13:09:33 brave[154471]: [154471:3:0905/130933.972983:ERROR:client_native_pixmap_dmabuf.cc(45)] Failed to mmap dmabuf: Invalid argument (22)
Sep 05 13:09:33 brave[154471]: [154471:3:0905/130933.973021:ERROR:client_shared_image.cc(146)] Failed to map the buffer.
Sep 05 13:09:33 brave[154471]: [154471:3:0905/130933.973052:ERROR:client_shared_image.cc(132)] ScopedMapping init failed.
Sep 05 13:09:33 brave[154471]: [154471:3:0905/130933.973082:ERROR:client_shared_image.cc(278)] Unable to create ScopedMapping
Sep 05 13:09:38 brave[154471]: [154471:3:0905/130938.483222:ERROR:client_native_pixmap_dmabuf.cc(45)] Failed to mmap dmabuf: Invalid argument (22)
Sep 05 13:09:38 brave[154471]: [154471:3:0905/130938.483331:ERROR:client_shared_image.cc(146)] Failed to map the buffer.

This does not happen with my AMD and Intel GPU devices, which leads me to believe it’s an Nvidia Wayland issue.

Anyone else seeing this? I’ve been using this Mario Kart 60fps video as a test case but it is very visible when playing any fast paced 60fps video in fullscreen mode: https://www.youtube.com/watch?v=_zPm3SSj6W8
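
If you want to compare how often each of these messages shows up on your own system, a rough Python sketch like the one below can tally them. It assumes the bracketed Chromium-style prefix seen in the journal excerpt above (`[pid:tid:date/time:LEVEL:file.cc(line)]`); the regular expression and the name `count_errors` are my own and only illustrative.

```python
import re
from collections import Counter

# Assumed prefix shape, based on the journal lines quoted in this post:
#   [154471:3:0905/130933.968186:ERROR:client_shared_image.cc(132)] message
CHROMIUM_RE = re.compile(
    r"\[\d+:\d+:\d{4}/\d{6}\.\d+:(?P<level>[A-Z]+):"
    r"(?P<src>[\w.]+)\((?P<line>\d+)\)\]\s*(?P<msg>.*)"
)


def count_errors(log_text):
    """Count ERROR lines, keyed by (source file, line number, message)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = CHROMIUM_RE.search(line)
        if match and match.group("level") == "ERROR":
            key = (match.group("src"), int(match.group("line")), match.group("msg").strip())
            counts[key] += 1
    return counts


sample = """\
Sep 05 13:09:33 brave[154471]: [154471:3:0905/130933.968186:ERROR:client_shared_image.cc(132)] ScopedMapping init failed.
Sep 05 13:09:33 brave[154471]: [154471:3:0905/130933.968211:ERROR:client_shared_image.cc(278)] Unable to create ScopedMapping
Sep 05 13:09:33 brave[154471]: [154471:3:0905/130933.970982:ERROR:client_native_pixmap_dmabuf.cc(45)] Failed to mmap dmabuf: Invalid argument (22)
Sep 05 13:09:33 brave[154471]: [154471:3:0905/130933.971037:ERROR:client_shared_image.cc(146)] Failed to map the buffer.
Sep 05 13:09:33 brave[154471]: [154471:3:0905/130933.971070:ERROR:client_shared_image.cc(132)] ScopedMapping init failed.
"""

for (src, line_no, msg), n in count_errors(sample).most_common():
    print(n, src, line_no, msg)
```

Running it against the same capture on a working and a non-working setup would show whether the `Failed to mmap dmabuf` lines are unique to the failing one.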

I can now confirm this issue is reproducible consistently on my system (NVIDIA 3060Ti, running on OpenSUSE Tumbleweed, Gnome+Wayland). The length of time sometimes varies, but with the 560.35.03 Driver my system will ALWAYS eventually crash within an hour or so with the same “GPU has fallen off the bus” error. I am forced to roll back to the 555 driver series.

EDIT: After reading that XID 79 is mostly associated with hardware errors, I re-seated my GPU PCIe and power supply cables. I have had to do this once before, and I thought that the issue was fixed… somehow, something is working its way loose over time. I have now gone a couple hours without a crash, ran a stress test, and everything seems fine…