570 release feedback & discussion

NVIDIA, please fix this at last. In the repository index at /opensuse/tumbleweed/x86_64, the 64-bit (non-32-bit) nvidia-gl-G06 package is missing. The build of the latest driver is broken: one RPM is missing, so the driver cannot be installed.

Pageflip errors and "fallen off the bus" errors.
nvidia-bug-report.log.gz (1.8 MB)
plasmakwin.txt (17 Bytes)
dmesg.txt (105.4 KB)
journalctl2.txt (126.0 KB)

| NVIDIA-SMI 570.133.07 Driver Version: 570.133.07 CUDA Version: 12.8

I have a modprobe conf file with the following options, which helps somewhat:

options nvidia_drm modeset=1
options nvidia_drm fbdev=1
options nvidia NVreg_EnableGpuFirmware=0
options nvidia NVreg_PreserveVideoMemoryAllocations=1

I also have this in my sdboot-manage.conf:

LINUX_OPTIONS="zswap.enabled=0 nowatchdog splash pcie_aspm=off"

This helps sometimes. I read that it could work in the Arch Linux Forums thread "Nvidia GPU has fallen off the bus" (Kernel & Hardware), but GPU usage drops in games.
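
For anyone comparing configs, here is a minimal Python sketch that parses `options` lines like the ones above into a dictionary, so the settings from two machines can be diffed. The function name `parse_modprobe_options` is made up for illustration; it only reads text and does not check what the loaded modules are actually using.

```python
from collections import defaultdict


def parse_modprobe_options(text):
    """Collect `options <module> key=value ...` lines into {module: {key: value}}."""
    result = defaultdict(dict)
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        parts = line.split()
        if len(parts) < 3 or parts[0] != "options":
            continue
        module = parts[1]
        for item in parts[2:]:
            key, sep, value = item.partition("=")
            if sep:  # ignore tokens that are not key=value pairs
                result[module][key] = value
    return dict(result)


# The option lines quoted in the post above.
CONF = """\
options nvidia_drm modeset=1
options nvidia_drm fbdev=1
options nvidia NVreg_EnableGpuFirmware=0
options nvidia NVreg_PreserveVideoMemoryAllocations=1
"""

opts = parse_modprobe_options(CONF)
for module, params in opts.items():
    print(module, params)
```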

NixOS 25.05.
GPU: Nvidia RTX 3090
Driver: Nvidia 570.133.07 Tested both with open + GSP on and proprietary + GSP off
Kernel: Linux 6.13.7
Compositor: KWin 6.3.3.1 (Wayland session)
DP 2560x1440 @ 240Hz (VRR enabled)
HDMI 1920x1080 @ 60Hz (no VRR)

Scenario:
World of Warcraft running via Wine 10.3 in native Wayland mode (also reproducible on XWayland) on the 1440p (DP) monitor. Firefox with a Twitch stream playing on the HDMI monitor. I’m using nvidia-vaapi for accelerated decode of web content.

Issue:
After playing for a while, the second (HDMI) monitor freezes on a single frame, while the audio from the Twitch stream continues to play. When this happens, journalctl is flooded with repeated pageflip timeout errors.

 Nix kernel: NVRM: nvAssertFailedNoLog: Assertion failed: CliGetEventInfo(rpc_params->hClient, rpc_params->hEvent, &pEvent) @ kernel_gsp.c:466
 Nix kernel: NVRM: _kgspProcessRpcEvent: Failed to process received event 0x1003 (POST_EVENT) from GPU0: status=0x57
 Nix kwin_wayland[1662]: kwin_wayland_drm: Pageflip timed out! This is a bug in the nvidia-drm kernel driver

This issue is rare and usually happens after 3+ hours of continuous playing, making it hard to reproduce quickly.
This issue has also followed me since at least the 565 driver. My last report is here

nvidia-bug-report.log.gz (787.1 KB)
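
A rough sketch for pulling the two signatures from this report out of a saved journal, assuming the message formats stay exactly as shown in the post. The regexes and the `summarize` name are my own and not anything provided by the driver or KWin.

```python
import re
from collections import Counter

# Patterns modelled on the log lines quoted in the post; treat them as assumptions.
PAGEFLIP = re.compile(r"kwin_wayland_drm: Pageflip timed out!")
GSP_EVENT = re.compile(
    r"_kgspProcessRpcEvent: Failed to process received event "
    r"(0x[0-9a-fA-F]+) \((\w+)\).*status=(0x[0-9a-fA-F]+)"
)


def summarize(lines):
    """Count pageflip timeouts and failed GSP RPC events (keyed by event name and status)."""
    counts = Counter()
    for line in lines:
        if PAGEFLIP.search(line):
            counts["pageflip_timeout"] += 1
        match = GSP_EVENT.search(line)
        if match:
            _code, name, status = match.groups()
            counts[f"gsp_{name}_status_{status}"] += 1
    return counts


SAMPLE = [
    "Nix kernel: NVRM: _kgspProcessRpcEvent: Failed to process received event 0x1003 (POST_EVENT) from GPU0: status=0x57",
    "Nix kwin_wayland[1662]: kwin_wayland_drm: Pageflip timed out! This is a bug in the nvidia-drm kernel driver",
]

summary = summarize(SAMPLE)
print(dict(summary))
```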

Same error here: “GPU falls off the bus”. I’m using a TB3 PCI interface to an eGPU. Driver 565 worked without issue. 570.86 also has the issue; I was hoping the point release would solve this. I think it’s an ASPM issue: ASPM aims to save power on a laptop and inadvertently powers off the PCIe bus, which then loses the GPU, and the screen freezes.
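
If ASPM is the suspect, the active policy can be checked from sysfs. This is a small sketch, assuming the kernel exposes `/sys/module/pcie_aspm/parameters/policy` with the active entry shown in square brackets; the helper names are hypothetical.

```python
import re
from pathlib import Path

# Assumed location of the ASPM policy file on kernels built with PCIe ASPM support.
POLICY_PATH = Path("/sys/module/pcie_aspm/parameters/policy")


def active_aspm_policy(text):
    """Return the bracketed (active) policy name, or None if none is marked."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


def read_active_policy(path=POLICY_PATH):
    """Read the policy file; returns None if it is missing or unreadable."""
    try:
        return active_aspm_policy(path.read_text())
    except OSError:
        return None


print(active_aspm_policy("default performance [powersave] powersupersave\n"))
```

Calling `read_active_policy()` before and after adding a kernel parameter such as `pcie_aspm=off` gives a quick way to see whether the setting changed anything on a given machine.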

It happened again:

[128219.370798] Huh VM_FAULT_OOM leaked out to the #PF handler. Retrying PF
[128224.371532] pagefault_out_of_memory: 10573962 callbacks suppressed

From then on, random processes freeze and can only be hard-killed. Usually, plasmashell is the first victim of this problem, Chrome often follows (and usually kills itself instead of freezing). The system usually doesn’t reboot then because processes just become stuck on shutdown, requiring a hard reboot.

It’s getting worse and worse with each driver. My other desktops without NVIDIA and identical OS configuration don’t show this problem.

nvidia-smi shows around 500-700 MB more VRAM used than I can account for by summing the app allocations. It looks like the driver just forgets about memory allocations (both in VRAM and sysmem) and messes up the kernel with that.

nvidia-bug-report.log.gz (1,5 MB)
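
To make the "unaccounted VRAM" comparison repeatable, here is a hedged sketch of the arithmetic. It assumes the total comes from `nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits` and the per-process rows from `nvidia-smi --query-compute-apps=pid,used_memory --format=csv,noheader,nounits`; note that the latter lists compute clients only, so graphics-only processes would have to be added from the regular process table. The function name and the sample numbers are invented for illustration.

```python
def unaccounted_mib(total_used_mib, per_process_csv):
    """Total VRAM in use minus the sum of the per-process figures, in MiB."""
    accounted = 0
    for row in per_process_csv.strip().splitlines():
        _pid, _, used = row.partition(",")
        used = used.strip()
        if used.isdigit():  # skip rows such as "[N/A]" or empty fields
            accounted += int(used)
    return total_used_mib - accounted


# Made-up sample values, not taken from the attached bug report.
SAMPLE_ROWS = """\
1662, 1500
2210, 900
3301, [N/A]
"""

print(unaccounted_mib(3000, SAMPLE_ROWS), "MiB not attributed to any listed process")
```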

No longer happening to me now on 570.133.07, kernel version 6.13.7!

This bug was reintroduced with 570.133.07-1

Any news on what is going on with the drivers? Performance has been really bad for a while now. We are currently around 25-40% behind Windows, as an example. A lot of benchmarking was done on Friday, on stream. I have known about this for a while, but it looks like nothing has changed. I was hoping to see some improvement with the new driver, but performance is pretty much the same as with the previous one. It did not use to be this bad: I think we were around 5% behind Windows in November, when we did the last benchmarks. So the community was pretty much in shock. I have been talking about this for a while, but I think it was misunderstood as being about RT performance.

I tend to notice performance changes early because I play games at 4K. You can get away with it at 1440p on a 4090, but not at 4K. Losing a quarter or more of the GPU performance at 4K can make or break playing games on Linux right now, depending on whether I can achieve over 60 fps. Take a game where you now even need to add FG: it is not playable. Then I’m kind of forced over to Windows to have a blast playing the games I want to play.

Even the AMD RX 9070 performs better than an RTX 4090 in many of the tests we have done.

Hi @Enzi, regarding NVBug #4640985, engineering feedback is that this is not an NVIDIA driver bug but an issue in Unity. The fix will have to come from the Unity team.

Hi @phoenix91140, thank you for the report. I will check on openSUSE systems and file a bug. Can you please provide the package name, the command, and the error message you see on your systems?

Hi @RyzenDew, the Monster Hunter Wilds vertex explosion issue is being tracked internally as NVBug #5124110. Unfortunately, we do not have a fix yet.

Hi @matt-schwartz, NVBug #5089016 remains under investigation.
Thanks

Hi @tobias_rudin, do you mind checking whether this is also an issue with 570.133.07? The driver includes a few fixes for system stalls with multiple monitors after resuming from sleep.

Hi all, For the VRAM leaks issue, I will file an internal bug based on @hurikhan77’s report. Please let me know if you see similar error messages in your logs.

The Xwayland excessive VRAM usage when resizing X11 apps has a workaround at "Xwayland VRAM usage is still excessive when resizing X11 apps under wayland." (Issue #126, NVIDIA/egl-wayland).

I got my 5090 FE this weekend and it seems like maybe the NV driver isn’t hooked up for full DisplayPort 2.1 bandwidth? My Samsung G95NC, a 7680x2160p@240hz monitor, only shows 7680x2160p@60hz on Linux while it shows the correct resolution and refresh rate on Windows using the same cable.

I filed a report here: "RTX 5090 cannot do 7680x2160p@240hz via DP2.1 on Linux" (Issue #816, NVIDIA/open-gpu-kernel-modules on GitHub), as I wasn’t sure of the best place to file it. Both the latest stable and the latest Vulkan developer releases are affected.

Linux: [screenshot]

Windows: [screenshot]

nvidia-bug-report.log.gz (1.2 MB)
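
Some back-of-the-envelope numbers for why this mode is demanding. The sketch computes the uncompressed data rate of the visible pixels only (blanking is ignored, so real requirements are somewhat higher) and compares it with the four-lane payload rates of HBR3 (8.1 Gbit/s per lane, 8b/10b coding) and UHBR20 (20 Gbit/s per lane, 128b/132b coding). The 24 bits per pixel figure is an assumption; the monitor may well run at 30.

```python
def active_pixel_rate_gbps(width, height, refresh_hz, bits_per_pixel=24):
    """Uncompressed data rate of the visible pixels only, in Gbit/s."""
    return width * height * refresh_hz * bits_per_pixel / 1e9


# Four-lane payload rates after line coding, in Gbit/s (other overheads ignored).
HBR3_PAYLOAD_GBPS = 4 * 8.1 * 8 / 10        # 8b/10b coding
UHBR20_PAYLOAD_GBPS = 4 * 20 * 128 / 132    # 128b/132b coding

for hz in (60, 240):
    rate = active_pixel_rate_gbps(7680, 2160, hz)
    print(
        f"{hz:>3} Hz: {rate:6.2f} Gbit/s, "
        f"fits HBR3: {rate <= HBR3_PAYLOAD_GBPS}, "
        f"fits UHBR20: {rate <= UHBR20_PAYLOAD_GBPS}"
    )
```

Under these assumptions the 240 Hz mode exceeds even the UHBR20 payload without compression, so it would depend on DSC as well as the higher link rate, while 60 Hz is within what an HBR3-class link can carry. That is consistent with the fallback described in this post, but it does not by itself show where the limit is being applied.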

Thanks a lot. Sure, there is an RPM file missing:

File './x86_64/nvidia-gl-G06-570.133.07-33.1.x86_64.rpm' not found on medium 'https://download.nvidia.com/opensuse/tumbleweed'

It seems the file was either not built at all or was left out of the repository.
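
A quick way for anyone to confirm whether the package has appeared in the repository yet, sketched in Python using only the standard library. The base URL and relative path are taken from the error message above; the helper names `repo_url` and `url_exists` are made up, and the network check is deliberately not executed in the snippet itself.

```python
import urllib.error
import urllib.request


def repo_url(base, relative_path):
    """Join a repository base URL and a './'-style relative package path."""
    if relative_path.startswith("./"):
        relative_path = relative_path[2:]
    return base.rstrip("/") + "/" + relative_path


def url_exists(url, timeout=10):
    """Return True if a HEAD request succeeds, False on an HTTP error such as 404.

    Connection problems (urllib.error.URLError) are left to propagate.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except urllib.error.HTTPError:
        return False


RPM_URL = repo_url(
    "https://download.nvidia.com/opensuse/tumbleweed",
    "./x86_64/nvidia-gl-G06-570.133.07-33.1.x86_64.rpm",
)
print(RPM_URL)
# url_exists(RPM_URL) performs a network request, so it is not called here.
```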

For the first issue in this comment, I am still seeing issues with multiple monitors and waking from sleep on 570.133.07. A lot of the time my main monitor does not wake, and I have to unplug it and plug it back in to restore the display.

Have you tried switching to a console with Ctrl+Alt+F3 or F4, both with the monitor on and off? I am still trying to reproduce it, but this seems to work in some circumstances (it is still only a workaround).

I just had a wake from sleep that took minutes and crashed everything that was open, with the following in dmesg:

[ 8331.178538] [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Flip event timeout on head 0
[ 8334.186527] [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Flip event timeout on head 0

nvidia-bug-report.log.gz (1.0 MB)

About every 1-6 hours, the rightmost of my three monitors freezes, followed shortly by either the entire system crashing or a recovery.

RTX 3080, 570.133.07, CachyOS Kernel

Mar 24 13:29:02 p-cachyos kwin_wayland[1218]: kwin_wayland_drm: Pageflip timed out! This is a bug in the nvidia-drm kernel driver
Mar 24 13:29:02 p-cachyos kwin_wayland[1218]: kwin_wayland_drm: Please report this at https://forums.developer.nvidia.com/c/gpu-graphics/linux
Mar 24 13:29:02 p-cachyos kwin_wayland[1218]: kwin_wayland_drm: With the output of 'sudo dmesg' and 'journalctl --user-unit plasma-kwin_wayland --boot 0'
Mar 24 13:29:03 p-cachyos kwin_wayland[1218]: kwin_wayland_drm: Pageflip timed out! This is a bug in the nvidia-drm kernel driver
Mar 24 13:29:03 p-cachyos kwin_wayland[1218]: kwin_wayland_drm: Please report this at https://forums.developer.nvidia.com/c/gpu-graphics/linux
Mar 24 13:29:03 p-cachyos kwin_wayland[1218]: kwin_wayland_drm: With the output of 'sudo dmesg' and 'journalctl --user-unit plasma-kwin_wayland --boot 0'
Mar 24 13:29:04 p-cachyos kwin_wayland[1218]: kwin_wayland_drm: Pageflip timed out! This is a bug in the nvidia-drm kernel driver
Mar 24 13:29:04 p-cachyos kwin_wayland[1218]: kwin_wayland_drm: Please report this at https://forums.developer.nvidia.com/c/gpu-graphics/linux
Mar 24 13:29:04 p-cachyos kwin_wayland[1218]: kwin_wayland_drm: With the output of 'sudo dmesg' and 'journalctl --user-unit plasma-kwin_wayland --boot 0'
Mar 24 13:29:05 p-cachyos kwin_wayland[1218]: kwin_wayland_drm: Pageflip timed out! This is a bug in the nvidia-drm kernel driver
Mar 24 13:29:05 p-cachyos kwin_wayland[1218]: kwin_wayland_drm: Please report this at https://forums.developer.nvidia.com/c/gpu-graphics/linux
Mar 24 13:29:05 p-cachyos kwin_wayland[1218]: kwin_wayland_drm: With the output of 'sudo dmesg' and 'journalctl --user-unit plasma-kwin_wayland --boot 0'
Mar 24 12:37:27 p-cachyos systemd[1159]: Starting KDE Window Manager...
Mar 24 12:37:27 p-cachyos systemd[1159]: Started KDE Window Manager.
Mar 24 12:37:27 p-cachyos kwin_wayland[1218]: No backend specified, automatically choosing drm
Mar 24 12:37:28 p-cachyos kwin_wayland_wrapper[1307]: The XKEYBOARD keymap compiler (xkbcomp) reports:
Mar 24 12:37:28 p-cachyos kwin_wayland_wrapper[1307]: > Warning:          Could not resolve keysym XF86RefreshRateToggle
Mar 24 12:37:28 p-cachyos kwin_wayland_wrapper[1307]: > Warning:          Could not resolve keysym XF86Accessibility
Mar 24 12:37:28 p-cachyos kwin_wayland_wrapper[1307]: > Warning:          Could not resolve keysym XF86DoNotDisturb
Mar 24 12:37:28 p-cachyos kwin_wayland_wrapper[1307]: Errors from xkbcomp are not fatal to the X server
Mar 24 12:37:28 p-cachyos kwin_wayland_wrapper[1312]: The XKEYBOARD keymap compiler (xkbcomp) reports:
Mar 24 12:37:28 p-cachyos kwin_wayland_wrapper[1312]: > Warning:          Unsupported maximum keycode 708, clipping.
Mar 24 12:37:28 p-cachyos kwin_wayland_wrapper[1312]: >                   X11 cannot support keycodes above 255.
Mar 24 12:37:28 p-cachyos kwin_wayland_wrapper[1312]: > Warning:          Could not resolve keysym XF86RefreshRateToggle
Mar 24 12:37:28 p-cachyos kwin_wayland_wrapper[1312]: > Warning:          Could not resolve keysym XF86Accessibility
Mar 24 12:37:28 p-cachyos kwin_wayland_wrapper[1312]: > Warning:          Could not resolve keysym XF86DoNotDisturb
Mar 24 12:37:28 p-cachyos kwin_wayland_wrapper[1312]: Errors from xkbcomp are not fatal to the X server
Mar 24 12:37:28 p-cachyos kcminit[1314]: Initializing  "/usr/lib/qt6/plugins/plasma/kcms/systemsettings/kcm_fonts.so"
Mar 24 12:37:28 p-cachyos kcminit[1314]: Initializing  "/usr/lib/qt6/plugins/plasma/kcms/systemsettings/kcm_style.so"
Mar 24 12:37:31 p-cachyos kwin_wayland[1218]: kf.config.core: "\"fsrestore1\" - conversion of \"0,0,0,0\" to QRect failed"
Mar 24 12:37:31 p-cachyos kwin_wayland[1218]: kf.windowsystem: static bool KX11Extras::mapViewport() may only be used on X11
Mar 24 12:37:31 p-cachyos kwin_wayland[1218]: kf.windowsystem: static bool KX11Extras::mapViewport() may only be used on X11
Mar 24 12:37:31 p-cachyos kwin_wayland[1218]: kf.windowsystem: static bool KX11Extras::mapViewport() may only be used on X11
Mar 24 12:37:34 p-cachyos kwin_wayland[1218]: kf.windowsystem: static bool KX11Extras::mapViewport() may only be used on X11
Mar 24 12:37:48 p-cachyos kwin_wayland[1218]: kwin_scene_opengl: 0x502: GL_INVALID_OPERATION error generated.
Mar 24 12:37:48 p-cachyos kwin_wayland[1218]: kwin_scene_opengl: 0x502: GL_INVALID_OPERATION error generated.
Mar 24 12:37:48 p-cachyos kwin_wayland[1218]: kwin_scene_opengl: 0x502: GL_INVALID_OPERATION error generated.
Mar 24 12:37:48 p-cachyos kwin_wayland[1218]: kwin_scene_opengl: 0x502: GL_INVALID_OPERATION error generated.
Mar 24 12:37:48 p-cachyos kwin_wayland[1218]: kwin_scene_opengl: 0x502: GL_INVALID_OPERATION error generated.
Mar 24 12:37:48 p-cachyos kwin_wayland[1218]: kwin_scene_opengl: 0x502: GL_INVALID_OPERATION error generated.
Mar 24 12:49:04 p-cachyos kwin_wayland[1218]: kwin_scene_opengl: 0x502: GL_INVALID_OPERATION error generated.
Mar 24 12:49:05 p-cachyos kwin_wayland[1218]: kwin_scene_opengl: 0x502: GL_INVALID_OPERATION error generated.
Mar 24 12:49:05 p-cachyos kwin_wayland[1218]: kwin_scene_opengl: 0x502: GL_INVALID_OPERATION error generated.
Mar 24 12:49:05 p-cachyos kwin_wayland[1218]: kwin_scene_opengl: 0x502: GL_INVALID_OPERATION error generated.
Mar 24 12:49:05 p-cachyos kwin_wayland[1218]: kwin_scene_opengl: 0x502: GL_INVALID_OPERATION error generated.
Mar 24 13:05:23 p-cachyos kwin_wayland[1218]: kwin_core: Cannot grant a token to KWin::ClientConnection(0x61deba3066a0)
Mar 24 13:06:44 p-cachyos kwin_wayland[1218]: kwin_core: Cannot grant a token to KWin::ClientConnection(0x61deba3066a0)
Mar 24 13:22:25 p-cachyos kwin_wayland[1218]: kwin_wayland_drm: Pageflip timed out! This is a bug in the nvidia-drm kernel driver
Mar 24 13:22:25 p-cachyos kwin_wayland[1218]: kwin_wayland_drm: Please report this at https://forums.developer.nvidia.com/c/gpu-graphics/linux
Mar 24 13:22:25 p-cachyos kwin_wayland[1218]: kwin_wayland_drm: With the output of 'sudo dmesg' and 'journalctl --user-unit plasma-kwin_wayland --boot 0'
Mar 24 13:22:26 p-cachyos kwin_wayland[1218]: kwin_wayland_drm: Pageflip timed out! This is a bug in the nvidia-drm kernel driver
Mar 24 13:22:26 p-cachyos kwin_wayland[1218]: kwin_wayland_drm: Please report this at https://forums.developer.nvidia.com/c/gpu-graphics/linux
Mar 24 13:22:26 p-cachyos kwin_wayland[1218]: kwin_wayland_drm: With the output of 'sudo dmesg' and 'journalctl --user-unit plasma-kwin_wayland --boot 0'
Mar 24 13:22:27 p-cachyos kwin_wayland[1218]: kwin_wayland_drm: Pageflip timed out! This is a bug in the nvidia-drm kernel driver
Mar 24 13:22:27 p-cachyos kwin_wayland[1218]: kwin_wayland_drm: Please report this at https://forums.developer.nvidia.com/c/gpu-graphics/linux
Mar 24 13:22:27 p-cachyos kwin_wayland[1218]: kwin_wayland_drm: With the output of 'sudo dmesg' and 'journalctl --user-unit plasma-kwin_wayland --boot 0'
Mar 24 13:22:28 p-cachyos kwin_wayland[1218]: kwin_wayland_drm: Pageflip timed out! This is a bug in the nvidia-drm kernel driver
Mar 24 13:22:28 p-cachyos kwin_wayland[1218]: kwin_wayland_drm: Please report this at https://forums.developer.nvidia.com/c/gpu-graphics/linux
Mar 24 13:22:28 p-cachyos kwin_wayland[1218]: kwin_wayland_drm: With the output of 'sudo dmesg' and 'journalctl --user-unit plasma-kwin_wayland --boot 0'
Mar 24 13:22:29 p-cachyos kwin_wayland[1218]: kwin_wayland_drm: Pageflip timed out! This is a bug in the nvidia-drm kernel driver
Mar 24 13:22:29 p-cachyos kwin_wayland[1218]: kwin_wayland_drm: Please report this at https://forums.developer.nvidia.com/c/gpu-graphics/linux
Mar 24 13:22:29 p-cachyos kwin_wayland[1218]: kwin_wayland_drm: With the output of 'sudo dmesg' and 'journalctl --user-unit plasma-kwin_wayland --boot 0'
Mar 24 13:22:30 p-cachyos kwin_wayland[1218]: kwin_wayland_drm: Pageflip timed out! This is a bug in the nvidia-drm kernel driver
Mar 24 13:22:30 p-cachyos kwin_wayland[1218]: kwin_wayland_drm: Please report this at https://forums.developer.nvidia.com/c/gpu-graphics/linux
Mar 24 13:22:30 p-cachyos kwin_wayland[1218]: kwin_wayland_drm: With the output of 'sudo dmesg' and 'journalctl --user-unit plasma-kwin_wayland --boot 0'
Mar 24 13:22:31 p-cachyos kwin_wayland[1218]: kwin_wayland_drm: Pageflip timed out! This is a bug in the nvidia-drm kernel driver
Mar 24 13:22:31 p-cachyos kwin_wayland[1218]: kwin_wayland_drm: Please report this at https://forums.developer.nvidia.com/c/gpu-graphics/linux
Mar 24 13:22:31 p-cachyos kwin_wayland[1218]: kwin_wayland_drm: With the output of 'sudo dmesg' and 'journalctl --user-unit plasma-kwin_wayland --boot 0'
Mar 24 13:22:32 p-cachyos kwin_wayland[1218]: kwin_wayland_drm: Pageflip timed out! This is a bug in the nvidia-drm kernel driver
Mar 24 13:22:32 p-cachyos kwin_wayland[1218]: kwin_wayland_drm: Please report this at https://forums.developer.nvidia.com/c/gpu-graphics/linux
Mar 24 13:22:32 p-cachyos kwin_wayland[1218]: kwin_wayland_drm: With the output of 'sudo dmesg' and 'journalctl --user-unit plasma-kwin_wayland --boot 0'
Mar 24 13:22:33 p-cachyos kwin_wayland[1218]: kwin_wayland_drm: Pageflip timed out! This is a bug in the nvidia-drm kernel driver
Mar 24 13:22:33 p-cachyos kwin_wayland[1218]: kwin_wayland_drm: Please report this at https://forums.developer.nvidia.com/c/gpu-graphics/linux
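
Since the timeouts above arrive in runs of one per second, it may help to report them as bursts with a start time, end time, and count. Here is a minimal sketch, assuming journal lines in the format shown and that all lines come from the same day (only the time of day is parsed); `pageflip_bursts` and the 60-second gap are my own choices.

```python
import re

# Matches syslog-style lines such as "Mar 24 13:22:25 host kwin_wayland[...]: ... Pageflip timed out!"
LINE = re.compile(r"^\w{3}\s+\d+\s+(\d{2}):(\d{2}):(\d{2}) .*Pageflip timed out!")


def pageflip_bursts(lines, gap=60):
    """Group pageflip timeouts into bursts separated by more than `gap` seconds.

    Returns (start, end, count) tuples with times in seconds since midnight.
    """
    times = []
    for line in lines:
        match = LINE.match(line)
        if match:
            hours, minutes, seconds = map(int, match.groups())
            times.append(hours * 3600 + minutes * 60 + seconds)
    times.sort()

    bursts = []
    for t in times:
        if bursts and t - bursts[-1][1] <= gap:
            bursts[-1][1] = t      # extend the current burst
            bursts[-1][2] += 1
        else:
            bursts.append([t, t, 1])
    return [tuple(b) for b in bursts]


SAMPLE = [
    "Mar 24 13:29:02 p-cachyos kwin_wayland[1218]: kwin_wayland_drm: Pageflip timed out! This is a bug in the nvidia-drm kernel driver",
    "Mar 24 13:22:25 p-cachyos kwin_wayland[1218]: kwin_wayland_drm: Pageflip timed out! This is a bug in the nvidia-drm kernel driver",
    "Mar 24 13:22:26 p-cachyos kwin_wayland[1218]: kwin_wayland_drm: Pageflip timed out! This is a bug in the nvidia-drm kernel driver",
    "Mar 24 12:37:27 p-cachyos systemd[1159]: Starting KDE Window Manager...",
]

bursts = pageflip_bursts(SAMPLE)
for start, end, count in bursts:
    print(f"{start // 3600:02d}:{start % 3600 // 60:02d}:{start % 60:02d} "
          f"lasting {end - start}s, {count} timeouts")
```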
@abchauhan, any news about the GPU clock issue ([BUG Report] Idle Power Draw is ASTRONOMICAL with RTX 3090 and GPU is stuck to maximun power state at idle when using multiple monitors)?

I just tested suspend with 570.133.07, kernel 6.13.8, Fedora 41; still broken:

mar 24 15:56:34 systemd-sleep[46994]: User sessions remain unfrozen on explicit request ($SYSTEMD_SLEEP_FREEZE_USER_SESSIONS=0).
mar 24 15:56:34 systemd-sleep[46994]: This is not recommended, and might result in unexpected behavior, particularly
mar 24 15:56:34 systemd-sleep[46994]: in suspend-then-hibernate operations or setups with encrypted home directories.
mar 24 15:56:34 systemd-sleep[46994]: Performing sleep operation 'suspend'...
mar 24 15:56:34 kernel: PM: suspend entry (deep)
mar 24 15:56:34 kernel: Filesystems sync: 0.110 seconds
mar 24 15:56:55 kernel: Freezing user space processes
mar 24 15:56:55 kernel: Freezing user space processes failed after 20.006 seconds (1 tasks refusing to freeze, wq_busy=0):
mar 24 15:56:55 kernel: task:gnome-shell     state:R  running task     stack:0     pid:16534 tgid:16534 ppid:16398  flags:0x00004006
mar 24 15:56:55 kernel: Call Trace:
mar 24 15:56:55 kernel:  <TASK>
mar 24 15:56:55 kernel:  ? srso_return_thunk+0x5/0x5f
mar 24 15:56:55 kernel:  ? __schedule+0x2b5/0x5f0
mar 24 15:56:55 kernel:  ? os_acquire_spinlock+0x12/0x30 [nvidia]
mar 24 15:56:55 kernel:  ? srso_return_thunk+0x5/0x5f
mar 24 15:56:55 kernel:  ? sysvec_reschedule_ipi+0x28/0xf0
mar 24 15:56:55 kernel:  ? srso_return_thunk+0x5/0x5f
mar 24 15:56:55 kernel:  ? asm_sysvec_reschedule_ipi+0x1a/0x20
mar 24 15:56:55 kernel:  ? srso_return_thunk+0x5/0x5f
mar 24 15:56:55 kernel:  ? sysvec_apic_timer_interrupt+0xe/0x90
mar 24 15:56:55 kernel:  ? srso_return_thunk+0x5/0x5f
mar 24 15:56:55 kernel:  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
mar 24 15:56:55 kernel:  ? srso_return_thunk+0x5/0x5f
mar 24 15:56:55 kernel:  ? srso_return_thunk+0x5/0x5f
mar 24 15:56:55 kernel:  ? _nv029019rm+0x10/0x10 [nvidia]
mar 24 15:56:55 kernel:  ? _nv029557rm+0x7a/0x130 [nvidia]
mar 24 15:56:55 kernel:  ? srso_return_thunk+0x5/0x5f
mar 24 15:56:55 kernel:  ? _nv046880rm+0xd/0x20 [nvidia]
mar 24 15:56:55 kernel:  ? _nv011587rm+0x119/0x280 [nvidia]
mar 24 15:56:55 kernel:  ? _nv056423rm+0x80/0x170 [nvidia]
mar 24 15:56:55 kernel:  ? _nv038479rm+0x8a/0xd0 [nvidia]
mar 24 15:56:55 kernel:  ? _nv038682rm+0x10a/0x360 [nvidia]
mar 24 15:56:55 kernel:  ? _nv032795rm+0xed/0x1f0 [nvidia]
mar 24 15:56:55 kernel:  ? _nv032795rm+0xbd/0x1f0 [nvidia]
mar 24 15:56:55 kernel:  ? _nv032763rm+0x6f0/0x1240 [nvidia]
mar 24 15:56:55 kernel:  ? _nv024664rm+0x13a0/0x1e80 [nvidia]
mar 24 15:56:55 kernel:  ? _nv013141rm+0x161/0x290 [nvidia]
mar 24 15:56:55 kernel:  ? _nv037426rm+0x1e5/0x4a0 [nvidia]
mar 24 15:56:55 kernel:  ? _nv037426rm+0x18f/0x4a0 [nvidia]
mar 24 15:56:55 kernel:  ? _nv041200rm+0xb67/0xf00 [nvidia]
mar 24 15:56:55 kernel:  ? _nv053350rm+0x28d/0x3a0 [nvidia]
mar 24 15:56:55 kernel:  ? _nv051354rm+0xfd/0x160 [nvidia]
mar 24 15:56:55 kernel:  ? _nv051352rm+0x5c/0x90 [nvidia]
mar 24 15:56:55 kernel:  ? _nv051352rm+0x32/0x90 [nvidia]
mar 24 15:56:55 kernel:  ? _nv013403rm+0x67/0xa0 [nvidia]
mar 24 15:56:55 kernel:  ? _nv013403rm+0x28/0xa0 [nvidia]
mar 24 15:56:55 kernel:  ? rm_kernel_rmapi_op+0x92/0x273 [nvidia]
mar 24 15:56:55 kernel:  ? nvkms_call_rm+0x4d/0x80 [nvidia_modeset]
mar 24 15:56:55 kernel:  ? _nv003120kms+0x4c/0x60 [nvidia_modeset]
mar 24 15:56:55 kernel:  ? _nv000585kms+0xb4/0x110 [nvidia_modeset]
mar 24 15:56:55 kernel:  ? _nv000585kms+0x8e/0x110 [nvidia_modeset]
mar 24 15:56:55 kernel:  ? __nv_drm_gem_nvkms_map+0x6f/0xd0 [nvidia_drm]
mar 24 15:56:55 kernel:  ? __nv_drm_gem_nvkms_mmap+0x16/0x40 [nvidia_drm]
mar 24 15:56:55 kernel:  ? nv_drm_mmap+0xdd/0x160 [nvidia_drm]
mar 24 15:56:55 kernel:  ? __mmap_new_vma+0xe7/0x2f0
mar 24 15:56:55 kernel:  ? vma_merge_new_range+0x75/0x190
mar 24 15:56:55 kernel:  ? __mmap_region+0x8f4/0xb30
mar 24 15:56:55 kernel:  ? kfence_protect+0xa5/0xd0
mar 24 15:56:55 kernel:  ? arch_get_unmapped_area_topdown+0x166/0x3e0
mar 24 15:56:55 kernel:  ? mmap_region+0x78/0xa0
mar 24 15:56:55 kernel:  ? do_mmap+0x499/0x690
mar 24 15:56:55 kernel:  ? ima_file_mmap+0x44/0xe0
mar 24 15:56:55 kernel:  ? vm_mmap_pgoff+0xec/0x1c0
mar 24 15:56:55 kernel:  ? ksys_mmap_pgoff+0x14b/0x220
mar 24 15:56:55 kernel:  ? srso_return_thunk+0x5/0x5f
mar 24 15:56:55 kernel:  ? drm_ioctl_kernel+0xb0/0x100
mar 24 15:56:55 kernel:  ? do_syscall_64+0x82/0x160
mar 24 15:56:55 kernel:  ? drm_ioctl+0x2b7/0x530
mar 24 15:56:55 kernel:  ? __pfx_nv_drm_gem_alloc_nvkms_memory_ioctl+0x10/0x10 [nvidia_drm]
mar 24 15:56:55 kernel:  ? srso_return_thunk+0x5/0x5f
mar 24 15:56:55 kernel:  ? rseq_get_rseq_cs+0x1d/0x220
mar 24 15:56:55 kernel:  ? srso_return_thunk+0x5/0x5f
mar 24 15:56:55 kernel:  ? rseq_ip_fixup+0x8d/0x1d0
mar 24 15:56:55 kernel:  ? srso_return_thunk+0x5/0x5f
mar 24 15:56:55 kernel:  ? objects_lookup+0xa1/0xd0
mar 24 15:56:55 kernel:  ? srso_return_thunk+0x5/0x5f
mar 24 15:56:55 kernel:  ? drm_vma_offset_add+0x33/0x70
mar 24 15:56:55 kernel:  ? srso_return_thunk+0x5/0x5f
mar 24 15:56:55 kernel:  ? __nv_drm_gem_map_nvkms_memory_offset+0x1d/0x70 [nvidia_drm]
mar 24 15:56:55 kernel:  ? srso_return_thunk+0x5/0x5f
mar 24 15:56:55 kernel:  ? nv_drm_gem_map_offset_ioctl+0x4c/0xd0 [nvidia_drm]
mar 24 15:56:55 kernel:  ? srso_return_thunk+0x5/0x5f
mar 24 15:56:55 kernel:  ? drm_ioctl_kernel+0xb0/0x100
mar 24 15:56:55 kernel:  ? srso_return_thunk+0x5/0x5f
mar 24 15:56:55 kernel:  ? __check_object_size.part.0+0x35/0xc0
mar 24 15:56:55 kernel:  ? srso_return_thunk+0x5/0x5f
mar 24 15:56:55 kernel:  ? srso_return_thunk+0x5/0x5f
mar 24 15:56:55 kernel:  ? drm_ioctl+0x2b7/0x530
mar 24 15:56:55 kernel:  ? __pfx_nv_drm_gem_map_offset_ioctl+0x10/0x10 [nvidia_drm]
mar 24 15:56:55 kernel:  ? srso_return_thunk+0x5/0x5f
mar 24 15:56:55 kernel:  ? syscall_exit_to_user_mode+0x10/0x210
mar 24 15:56:55 kernel:  ? srso_return_thunk+0x5/0x5f
mar 24 15:56:55 kernel:  ? do_syscall_64+0x8e/0x160
mar 24 15:56:55 kernel:  ? do_user_addr_fault+0x55a/0x7b0
mar 24 15:56:55 kernel:  ? srso_return_thunk+0x5/0x5f
mar 24 15:56:55 kernel:  ? exc_page_fault+0x7e/0x180
mar 24 15:56:55 kernel:  ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
mar 24 15:56:55 kernel:  </TASK>
mar 24 15:56:55 kernel: OOM killer enabled.
mar 24 15:56:55 kernel: Restarting tasks ... done.
mar 24 15:56:55 kernel: random: crng reseeded on system resumption
mar 24 15:56:55 rtkit-daemon[1127]: The canary thread is apparently starving. Taking action.
mar 24 15:56:55 kernel: PM: suspend exit
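
To get a quick overview of a trace like this one, the frames can be tallied per kernel module. This is a rough sketch, assuming the journal format shown here; the regex and the `frames_per_module` name are my own. Keep in mind that frames prefixed with `?` are addresses found on the stack that the unwinder could not verify, so the tally is only a hint about where the task was busy.

```python
import re
from collections import Counter

# "<symbol>+0x<offset>/0x<size>" optionally followed by "[module]".
FRAME = re.compile(r"kernel:\s+\?\s+(\S+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?")


def frames_per_module(lines):
    """Count call-trace frames per module; frames without a module tag count as 'kernel'."""
    counts = Counter()
    for line in lines:
        match = FRAME.search(line)
        if match:
            counts[match.group(2) or "kernel"] += 1
    return counts


SAMPLE = [
    "mar 24 15:56:55 kernel: Call Trace:",
    "mar 24 15:56:55 kernel:  ? os_acquire_spinlock+0x12/0x30 [nvidia]",
    "mar 24 15:56:55 kernel:  ? nvkms_call_rm+0x4d/0x80 [nvidia_modeset]",
    "mar 24 15:56:55 kernel:  ? nv_drm_mmap+0xdd/0x160 [nvidia_drm]",
    "mar 24 15:56:55 kernel:  ? __check_object_size.part.0+0x35/0xc0",
    "mar 24 15:56:55 kernel:  ? do_mmap+0x499/0x690",
]

module_counts = frames_per_module(SAMPLE)
print(module_counts.most_common())
```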

I ran nvidia-bug-report.sh, but for the first time I saw some errors during its processing:

Skipped Component                   | Details
================================================================================
ibstat output                       | ibstat not found 
--------------------------------------------------------------------------------
mst output                          | mst not found 
--------------------------------------------------------------------------------
nvlsm-bug-report.sh output          | nvlsm-bug-report.sh not found 
--------------------------------------------------------------------------------

nvidia-bug-report.log.gz (1.5 MB)

Hello? I literally provided an nvidia-smi output showing excessive and unaccounted VRAM usage. Are you intentionally ignoring the issue?

I’ve seen this too, but the VRAM usage is still excessive even after factoring this in.