590 release: Failed to allocate NVKMS memory for GEM object on Wayland

The graphics session suddenly dies under minimal graphics load, apparently because the NVKMS memory allocation ioctl fails at runtime.
Symptoms: the screen freezes and every process relying on the GPU dies. A hard reboot is needed to get the system working again.

No specific task or process (at least none that I have found) triggers the problem.

The journalctl logs are in log.txt.

log.txt (3.1 KB)

nvidia-bug-report.log.gz (515.9 KB)

I have been experiencing this exact same issue on my fresh install of CachyOS. It keeps happening an indeterminate amount of time after boot, and it is likewise not tied to heavy graphics load: nothing more than a browser or simple desktop applications is in use at the time of each screen freeze. I also cannot continue at all without a hard reboot, which at best seems to restart the clock until the issue happens again.

Hi @got.mule603 @avgalphauser403

Thank you for reporting the issue. Please confirm whether this issue was absent on the 580 release drivers.

Hi @got.mule603

Can you please also share an NVIDIA bug report? It would also be good to know whether you can find reliable repro steps, which will help us reproduce the issue locally.

I can confirm that the problem did not happen on the 580 release.
I tried downgrading, but the driver could not be loaded on the first attempt and I did not have time for further debugging.

I’ll be on standby if you need more information about the issue, and I’m happy to share details with you.

Thank you.

Hi @avgalphauser403

I went through the log attached to the original post. It looks like you encountered the errors below in sequence, and I suspect the Xid 79 error is leading to the other issues.

Feb 28 23:32:06 archlinux kernel: NVRM: GPU at PCI:0000:01:00: GPU-1dddfcf3-b0aa-f287-068f-78737c6c0009

  1. Feb 28 23:32:06 archlinux kernel: NVRM: Xid (PCI:0000:01:00): 79, GPU has fallen off the bus.
    Feb 28 23:32:06 archlinux kernel: NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.

  2. Xid (PCI:0000:01:00): 154, GPU recovery action changed from 0x0 (None) to 0x1 (GPU Reset Required)

  3. Failed to allocate NVKMS memory for GEM object on Wayland
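For anyone who wants to check their own logs for the same sequence, here is a minimal Python sketch that pulls Xid events out of kernel log text. It is not an NVIDIA tool: the function name `find_xid_events` and the regular expression are assumptions based only on the line format quoted above, and other driver versions may format these lines differently.

```python
import re

# Pattern assumed from the lines quoted in this thread, e.g.
#   NVRM: Xid (PCI:0000:01:00): 79, GPU has fallen off the bus.
XID_RE = re.compile(r"Xid \((PCI:[^)]+)\): (\d+), (.*)")


def find_xid_events(log_text):
    """Return (pci_address, xid_code, message) tuples in log order."""
    events = []
    for line in log_text.splitlines():
        match = XID_RE.search(line)
        if match:
            pci, code, message = match.groups()
            events.append((pci, int(code), message.strip()))
    return events


if __name__ == "__main__":
    # Sample lines copied from the log excerpt in this thread.
    sample = "\n".join([
        "Feb 28 23:32:06 archlinux kernel: NVRM: Xid (PCI:0000:01:00): 79, "
        "GPU has fallen off the bus.",
        "Feb 28 23:32:06 archlinux kernel: NVRM: GPU 0000:01:00.0: "
        "GPU has fallen off the bus.",
    ])
    for pci, code, message in find_xid_events(sample):
        print(f"{pci}  Xid {code}: {message}")
```

To run it on a real system, the text passed to `find_xid_events` could be the saved output of `journalctl -k -b -1` (kernel messages from the previous boot), since the session is usually gone by the time the freeze is noticed.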

The most common causes of Xid 79 are overheating, insufficient power, or an outdated SBIOS.

Please update the SBIOS, monitor temperatures, reseat the power connectors and the card in its slot, and check or replace the PSU.
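One way to monitor temperatures is to sample `nvidia-smi` periodically and keep the readings from just before a freeze. The sketch below is only an illustration: the helper names `parse_sample` and `read_sample` are hypothetical, it reads only the first GPU, and it assumes `nvidia-smi` is on the PATH and reports the `temperature.gpu` and `power.draw` query fields for this card (some laptops report power as not available).

```python
import shutil
import subprocess

# Fields requested from nvidia-smi's --query-gpu interface.
QUERY = "timestamp,temperature.gpu,power.draw"


def parse_sample(csv_line):
    """Split one 'csv,noheader,nounits' line into (timestamp, temp_c, power_w).

    Fields that are not numeric (for example "[N/A]") come back as None.
    """
    timestamp, temp, power = [field.strip() for field in csv_line.split(",")]

    def to_float(text):
        try:
            return float(text)
        except ValueError:
            return None

    return timestamp, to_float(temp), to_float(power)


def read_sample():
    """Run nvidia-smi once and return the parsed reading for the first GPU."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_sample(result.stdout.splitlines()[0])


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found on PATH")
    else:
        print(read_sample())
```

Calling `read_sample()` every few seconds and appending the result to a file would leave a record of temperature and power draw leading up to the Xid 79, which could help separate a thermal or power cause from a driver regression.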

Here are a handful of logs from a similar event that I experienced (though it’s not exactly the same now, so I think I will create my own bug report shortly). I have done a system update since I previously commented on this.

❯ journalctl --since “5 minutes ago”.txt (8.6 KB)

sudo dmesg.txt (71.8 KB)

❯ journalctl --user-unit plasma-kwin_way.txt (4.1 KB)

Here is another log file that shows the same issues happening again.

journalctl --since “4 minutes ago”.txt (61.6 KB)