560 release feedback & discussion

I already posted a screenshot showing XWayland is where the memory leak is, at least on GNOME. Some code is clearly reserving 10% of VRAM.

It was also verified by nvidia staff. At least for this driver.

I still have an issue that happens on all NVIDIA drivers on Wayland.
Once a game takes all the VRAM (4 GB), every app that was open freezes in its last state.
Closing the game that took all the VRAM sometimes does not unfreeze the apps.
GPU: GTX 1650

dmesg says

[  851.173882] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object
[  851.174122] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object
[  851.187355] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object
[  851.187417] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object
[  851.206799] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object
[  851.206847] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object
[  851.221745] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object
[  851.221822] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object
[  851.235929] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object
[  851.235981] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object
[  851.249697] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object
[  851.249754] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object
[  851.266059] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object
[  851.267035] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object
[  851.288792] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object
[  851.288872] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object
[  851.303589] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object
[  851.303667] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object
[  851.318999] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object
[  851.319098] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object
[  851.335089] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object
[  851.335147] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object

sudo nvidia-bug-report.sh
nvidia-bug-report.log.gz (464.1 KB)

I tried this version on Arch with LTS and non-LTS kernels, and suspend still isn't working.

The computer suspends fine, but every time I try to resume, it restarts instead of resuming.

I will search this thread more to see if I can find any solutions… but can anyone offer any ideas to fix this?

I did have success with an older driver version (535), but even the older version isn't working completely either (I have to suspend multiple times until it finally suspends, but then it resumes OK).

Thank you!

Not sure if this has been mentioned before but there's a general performance issue with Nvidia and KDE. There's a workaround but it needs fixing in the drivers.
On X11 with NVIDIA GPU, window and panel resizing is very laggy when any panels are set to Floating or Adapting transparency

Nvidia, isn't it time for a bugfix release at least? In their current state the drivers need fixes, for example for the VRAM leak…

You have a workaround. Use that for the processes you want until it's fixed.

Thanks for the advice, and the reports from others. I couldn't get 560.35.03 to build for
Linux 6.12-rc1 (of course), so rolled back and will wait until 6.12 is released before
trying again. While I'm here, I'm going to report an error I've been seeing that doesn't
seem to affect the driver's usability.
Thank you NVIDIA for maintaining this driver for Linux. I'm on 6.11 now and 560.35.03 is
running smoothly with one error, appended below. I use X11 and run Elite Dangerous Odyssey
with GE-proton9-7. Using Linux Mint 21.3.

[Sun Sep 29 23:12:18 2024] [drm:nv_drm_master_set [nvidia_drm]] ERROR [nvidia-drm] [GPU ID 0x00004b00] Failed to grab modeset ownership

This doesn't affect the driver's operation, seems to work fine.

Bug report attached.

nvidia-bug-report.log.gz (607.8 KB)

Well, surely give a workaround for the kernel panics too :) Thanks. Oh yes, I forgot to mention the explicit sync fixes (workaround) when using KMS.

Sorry, I misread. I was on the phone :(

The GitHub link for the RAM workaround won't load (timeline error). Does anyone know what the workaround was, and can they post it?

Thank you for all the reports and attempts to narrow down the issue. I believe there are actually two separate issues tracked here:

* Excessive memory consumption by Xwayland.

* Excessive memory consumption by Wayland compositors, e.g., kwin_wayland.

I've looked into the latter issue, and at this point it is well understood. We do not need additional information or reports of reproductions for that issue. See below for more information.

We have not been able to reproduce the issues with Xwayland/X applications with the latest version of Xwayland and latest drivers. If you are still experiencing that particular issue, please share reproduction steps (ideally starting from a clean boot), the amount of persistent memory usage you are seeing and how you are measuring it, and your system details (Run nvidia-bug-report.sh, attach the log it generates, list your Xwayland and compositor version numbers and ideally distro package versions if you're using distro packages).
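
For the "how you are measuring it" part, one repeatable option is to sample total VRAM usage with `nvidia-smi` before and after the reproduction steps. This is a minimal sketch, assuming `nvidia-smi` is on the PATH and reports numeric values for both fields. The function names are illustrative, and this is only one possible way to measure, not a method requested by NVIDIA.

```python
import subprocess


def parse_memory_csv(csv_text: str) -> list[tuple[int, int]]:
    """Parse 'used, total' pairs in MiB, one line per GPU."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def sample_gpu_memory() -> list[tuple[int, int]]:
    """Run nvidia-smi once and return (used_mib, total_mib) per GPU."""
    output = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_memory_csv(output)
```

Calling `sample_gpu_memory()` after a clean boot, after starting the application, and again after closing it gives three numbers that can go straight into a report. If a field comes back as `[N/A]`, the `int()` conversion raises `ValueError`, so that case would need separate handling.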

For the Wayland compositor memory usage issue, there isn't a leak per se, but the heuristics that decide which memory to retain for performance reasons aren't working optimally when presented with the OpenGL API usage typical of a Wayland compositor. While we work to develop and deploy a driver fix, I can offer this workaround:

* Download this JSON file: [50-limit-free-buffer-pool-in-wayland-compositors.txt](https://github.com/user-attachments/files/17168731/50-limit-free-buffer-pool-in-wayland-compositors.txt).

* Edit it to replace 'kwin_wayland' with the name of your Wayland compositor if necessary.

* Create the directory '/etc/nvidia/nvidia-application-profiles-rc.d' if it doesn't already exist on your system, and place the file there.

* Restart your compositor (Reboot or log out/log back in).

That should resolve this class of memory usage issues within the named application. You can also duplicate the entire rule in the JSON file if you regularly switch between multiple Wayland compositors, e.g.:

    {
        "pattern": {
            "feature": "procname",
            "matches": "kwin_wayland"
        },
        "profile": "Limit Free Buffer Pool On Wayland Compositors"
    },
    {
        "pattern": {
            "feature": "procname",
            "matches": "gnome-shell"
        },
        "profile": "Limit Free Buffer Pool On Wayland Compositors"
    }
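
Duplicating the rule by hand is easy to get wrong (a missing comma is enough), so the same step can be scripted. This is a minimal sketch in Python under two assumptions that the post does not confirm: the downloaded file is strict JSON, and its rules live in a top-level array named `rules`. The function name `add_compositor_rules` is illustrative.

```python
import copy

PROFILE_NAME = "Limit Free Buffer Pool On Wayland Compositors"


def add_compositor_rules(config: dict, procnames: list[str]) -> dict:
    """Return a copy of config with one procname rule per compositor name.

    Names that already have a rule for PROFILE_NAME are skipped.
    """
    result = copy.deepcopy(config)
    # Assumption: rules are stored in a top-level "rules" array.
    rules = result.setdefault("rules", [])
    already = {
        rule["pattern"].get("matches")
        for rule in rules
        if isinstance(rule.get("pattern"), dict)
        and rule.get("profile") == PROFILE_NAME
    }
    for name in procnames:
        if name not in already:
            rules.append(
                {
                    "pattern": {"feature": "procname", "matches": name},
                    "profile": PROFILE_NAME,
                }
            )
            already.add(name)
    return result
```

Load the downloaded file with `json.load`, pass the result through `add_compositor_rules(config, ["gnome-shell"])`, and write it back with `json.dump(..., indent=4)` into `/etc/nvidia/nvidia-application-profiles-rc.d`. The input is left unmodified, so the original can be kept as a backup.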

Does anyone have sleep / resume issues with the 560 drivers?

Specs:

  • Arch xfce
  • Ryzen
  • GeForce 1070
  • Up to date updates. Tried LTS kernel and non-LTS.

Here are my nvidia kernel params.

cat /proc/driver/nvidia/params:

cat /proc/driver/nvidia/params 
ResmanDebugLevel: 4294967295
RmLogonRC: 1
ModifyDeviceFiles: 1
DeviceFileUID: 0
DeviceFileGID: 0
DeviceFileMode: 438
InitializeSystemMemoryAllocations: 1
UsePageAttributeTable: 4294967295
EnableMSI: 1
EnablePCIeGen3: 0
MemoryPoolSize: 0
KMallocHeapMaxSize: 0
VMallocHeapMaxSize: 0
IgnoreMMIOCheck: 0
TCEBypassMode: 0
EnableStreamMemOPs: 0
EnableUserNUMAManagement: 1
NvLinkDisable: 0
RmProfilingAdminOnly: 1
PreserveVideoMemoryAllocations: 1
EnableS0ixPowerManagement: 1
S0ixPowerManagementVideoMemoryThreshold: 1024
DynamicPowerManagement: 3
DynamicPowerManagementVideoMemoryThreshold: 200
RegisterPCIDriver: 1
EnablePCIERelaxedOrderingMode: 0
EnableResizableBar: 0
EnableGpuFirmware: 0
EnableGpuFirmwareLogs: 2
EnableDbgBreakpoint: 0
OpenRmEnableUnsupportedGpus: 0
DmaRemapPeerMmio: 1
RegistryDwords: ""
RegistryDwordsPerDevice: ""
RmMsg: ""
GpuBlacklist: ""
TemporaryFilePath: "/nvme/tmp"
ExcludedGpus: ""

and cat /proc/cmdline:

cat /proc/cmdline 
BOOT_IMAGE=/boot/vmlinuz-linux-lts root=UUID=a86b02e9-f9b6-4050-86ee-533ed8b3b2cf rw crashkernel=256M resume=UUID=ab965bb8-72f1-4d4a-964a-5cd84ebb515e transparent_hugepage=never amd_iommu=on iommu=pt rd.driver.pre=vfio-pci video=efifb:off nvidia_drm.modeset=1 nvidia_drm.fbdev=1 nvidia.NVreg_EnableGpuFirmware=0
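
When comparing setups across posts it helps to pull only the NVIDIA module options out of a long command line like the one above. This is a minimal sketch in Python. The function name `nvidia_cmdline_params` is illustrative, and it assumes module options use the usual `module.param=value` form.

```python
import shlex


def nvidia_cmdline_params(cmdline: str) -> dict[str, str]:
    """Extract nvidia*.param=value module options from a kernel command line."""
    params = {}
    for token in shlex.split(cmdline):
        key, equals, value = token.partition("=")
        module, dot, _param = key.partition(".")
        # Keep only tokens shaped like nvidia.X=Y, nvidia_drm.X=Y, and so on.
        if equals and dot and module.startswith("nvidia"):
            params[key] = value
    return params
```

Feeding it the contents of `/proc/cmdline` from this post yields `nvidia_drm.modeset`, `nvidia_drm.fbdev` and `nvidia.NVreg_EnableGpuFirmware`. The last one agrees with the `EnableGpuFirmware: 0` line in the params dump above, which is a quick check that the option actually took effect.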

On the 560 drivers:

  • Behavior is intermittent.
  • Sometimes it will suspend, sometimes it won't
  • If it does suspend, when I try to resume, it freezes with black screen and reboots after some seconds
  • Whether it suspends or not, it freezes and reboots
  • Doesn't work at all.
  • 100% failure rate

I'm still stuck on the 535 drivers to use suspend / resume!
On the 535 drivers:

  • Doesn't freeze or reboot at all
  • It can take one try or multiple tries for suspend to work
  • If suspend doesn't work, it brings me back to the desktop. I have to suspend until it works
  • Resume works fine 100% of the time

Does anyone have any ideas on how I can troubleshoot this or see any parameters / boot options that might help?

I've tried several different kernel params / nvidia options, and nothing is working with the 560 drivers.

I also tried 555 once, and it had similar behavior as the 560 drivers.

Thank you!

GPU fallen off the bus while idle… strikes again, and I'm affected too unfortunately :/ I have attached the standard nvidia-bug-report log and the output of journalctl -x -b -1 | grep --after-context=10 --before-context=10 NVRM as a txt file for context.

Relevant snippet of journalctl from that txt file so that others may find this post more easily:

Oct 01 16:30:46 victus-ted anacron[25586]: Anacron 2.3 started on 2024-10-01
Oct 01 16:30:46 victus-ted anacron[25586]: Normal exit (0 jobs run)
Oct 01 16:30:46 victus-ted systemd[1]: anacron.service: Deactivated successfully.
░░ Subject: Unit succeeded
░░ Defined-By: systemd
░░ Support: http://www.ubuntu.com/support
░░
░░ The unit anacron.service has successfully entered the 'dead' state.
Oct 01 16:55:25 victus-ted kernel: workqueue: pm_runtime_work hogged CPU for >10000us 32 times, consider switching to WQ_UNBOUND
Oct 01 16:55:28 victus-ted wpa_supplicant[729]: wlo1: WPA: Group rekeying completed with fc:34:97:50:5b:c4 [GTK=CCMP]
Oct 01 16:57:59 victus-ted kernel: NVRM: GPU at PCI:0000:01:00: GPU-5f88870d-224e-6331-b9b8-c9f2aca2f673
Oct 01 16:57:59 victus-ted kernel: NVRM: Xid (PCI:0000:01:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.
Oct 01 16:57:59 victus-ted kernel: NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.
Oct 01 16:57:59 victus-ted kernel: NVRM: _issueRpcAndWait: rpcSendMessage failed with status 0x0000000f for fn 78!
Oct 01 16:57:59 victus-ted kernel: NVRM: nvCheckOkFailedNoLog: Check failed: GPU lost from the bus [NV_ERR_GPU_IS_LOST] (0x0000000F) returned from nvdEngineDumpCallbackHelper(pGpu, pPrbEnc, pNvDumpState, pEngineCallback) @ nv_debug_dump.c:274
Oct 01 16:57:59 victus-ted kernel: NVRM: _issueRpcAndWait: rpcSendMessage failed with status 0x0000000f for fn 78!
Oct 01 16:57:59 victus-ted kernel: NVRM: nvCheckOkFailedNoLog: Check failed: GPU lost from the bus [NV_ERR_GPU_IS_LOST] (0x0000000F) returned from nvdEngineDumpCallbackHelper(pGpu, pPrbEnc, pNvDumpState, pEngineCallback) @ nv_debug_dump.c:274
Oct 01 16:57:59 victus-ted kernel: NVRM: _issueRpcAndWait: rpcSendMessage failed with status 0x0000000f for fn 78!
Oct 01 16:57:59 victus-ted kernel: NVRM: nvCheckOkFailedNoLog: Check failed: GPU lost from the bus [NV_ERR_GPU_IS_LOST] (0x0000000F) returned from nvdEngineDumpCallbackHelper(pGpu, pPrbEnc, pNvDumpState, pEngineCallback) @ nv_debug_dump.c:274
Oct 01 16:57:59 victus-ted kernel: NVRM: RmLogGpuCrash: RmLogGpuCrash: failed to save GPU crash data
Oct 01 16:57:59 victus-ted kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Generic Error: Invalid state [NV_ERR_INVALID_STATE] (0x00000040) returned from pKernelGsp->pSRMetaDescriptor != NULL @ kernel_gsp_tu102.c:1119
Oct 01 16:57:59 victus-ted kernel: NVRM: nvCheckOkFailedNoLog: Check failed: Generic Error: Invalid state [NV_ERR_INVALID_STATE] (0x00000040) returned from kgspRestorePowerMgmtState_HAL(pGpu, pKernelGsp) @ gpu_suspend.c:195
Oct 01 16:57:59 victus-ted kernel: NVRM: _gpuGc6ExitStateLoad: GPU is unable to transition from GC6 to D0 state.
Oct 01 16:58:04 victus-ted kernel: NVRM: Error in service of callback 
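
The line that matters most in a dump like this is the Xid event (79 here). This is a minimal sketch in Python for pulling Xid numbers out of saved journal text, assuming the `NVRM: Xid (PCI:...): N,` shape seen in the snippet above. The regular expression and the function name `extract_xids` are illustrative.

```python
import re

# Pattern derived from the journalctl lines in this thread; it is an
# assumption about the message shape, not an official format definition.
XID_LINE = re.compile(
    r"NVRM: Xid \(PCI:(?P<pci>[0-9a-fA-F:.]+)\): (?P<xid>\d+),"
)


def extract_xids(log_text: str) -> list[tuple[str, int]]:
    """Return (pci_address, xid_number) for every Xid line, in log order."""
    return [
        (match.group("pci"), int(match.group("xid")))
        for match in XID_LINE.finditer(log_text)
    ]
```

Run against the snippet above this finds a single event, Xid 79 on `0000:01:00`. Listing the Xid numbers at the top of a report makes it easier for others with the same code to find the post.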

Output of fastfetch --logo none:

tedliosu@victus-ted
-------------------
OS: Ubuntu jammy 22.04 x86_64
Host: Victus by HP Laptop 16-d0xxx
Kernel: Linux 6.8.0-40-generic
Uptime: 1 hour, 17 mins
Packages: 4092 (dpkg), 5 (flatpak)
Shell: bash 5.1.16
Display (CMN1606): 1920x1080 @ 60 Hz in 16″ [Built-in] *
Display (ASUS VP247): 1920x1080 @ 60 Hz in 24″ [External]
DE: Xfce4 4.16
WM: Xfwm4 (X11)
WM Theme: Greybird-dark-accessibility
Theme: Greybird-dark [GTK2/3/4]
Icons: elementary-xfce-darker [GTK2/3/4]
Font: Sans (12pt) [GTK2/3/4]
Cursor: DMZ-White
Terminal: xfce4-terminal 0.8.10
Terminal Font: Noto Mono (12pt)
CPU: 11th Gen Intel(R) Core(TM) i5-11400H (12) @ 4.50 GHz
GPU 1: NVIDIA GeForce RTX 3050 Mobile [Discrete]
GPU 2: Intel UHD Graphics @ 1.45 GHz [Integrated]
Memory: 3.75 GiB / 30.99 GiB (12%)
Swap: 0 B / 32.00 GiB (0%)
Disk (/): 400.67 GiB / 883.33 GiB (45%) - ext4
Local IP (wlo1): 192.168.50.155/24
Battery (Primary): 100% [AC Connected]
Locale: en_US.UTF-8

nvidia-bug-report.log.gz (1.1 MB)
journalctl_crash_output_fallen_off_bus_10-1-24.txt (37.3 KB)

I'm having the same issues with suspend. I've tried a lot and can't find an option that just works.

Using the LTS kernel, it seems to get me back into Linux but some programs crash in the background.
Using rolling 6.11, I can't resume at all.
I've tried nvidia-dkms, nvidia-open-dkms, setting acpi_osi to multiple different Windows versions, changing between S3 and s2idle suspend, and changing BIOS settings to wake using BIOS/OS. These made no difference.

With fbdev=1 and PreserveVideoMemoryAllocations=1, as well as without them, programs still crashed and there was corruption on resume.

There are no logs in the kernel. It causes a whole system freeze when it doesn't wake on 6.11, and I have to hard restart.
Disabling suspend until this is fixed.

For reference, I'm using a 4080 on the Arch Linux kernel, running Hyprland (Wayland).

In your '/etc/mkinitcpio.conf' do you have anything related to nvidia?

Hi,

Today I tried running Ubisoft Connect.
2060s, driver: 560.35.03
I got:

NVRM: Xid (PCI:0000:01:00): 31, pid='<unknown>', name=<unknown>, Ch 00000025, intr 00000000. MMU Fault: ENGINE GRAPHICS GPC1 GPCCLIENT_PROP_0 faulted @ 0x0_00000000. Fault is of type FAULT_PDE ACCESS_TYPE_VIRT_WRITE

Will try with the 555 driver to see whether it was caused by upgrading to 560.
edit: also happens on 555.58.02

I'm running openSUSE Tumbleweed with the 6.11 kernel, and right after updating the kernel I couldn't get into Wayland without adding nvidia_drm.modeset=1 nvidia_drm.fbdev=1 to the GRUB kernel parameters. The problem is that my in-game performance is now worse than with the 6.10 kernels. I don't know what kind of concrete info might help with this, but I'll try to update with logs and benchmarks when I'm at home. Just wondering if anyone else has encountered anything similar.