The graphics session suddenly dies under minimal graphics load. It appears to be caused by a failure to allocate NVKMS memory (via ioctl) at runtime.
The symptoms: the screen freezes and all processes relying on the GPU die. A hard reboot is needed to get the system working properly again.
No specific task or process (at least none that I have found) triggers the problem.
I have been experiencing this exact same issue on my fresh install of CachyOS. It happens repeatedly, after an indeterminate amount of time following boot, and is similarly not tied to heavy graphics load: nothing more than a browser or simple desktop applications is in use at the time of each screen freeze. I also cannot continue at all without a hard reboot, which at best seems to restart the clock until the issue happens again.
Can you please also share an NVIDIA bug report? It would also be good to know whether you can identify reliable repro steps, which would help us reproduce the issue locally.
I can confirm that the problem did not happen with the 580 release.
I tried downgrading, but the driver could not be loaded on the first attempt and I did not have time for further debugging.
I’ll be standing by in case you need more information about the issue, and I am happy to share details with you.
I went through the attached log from the original post. It looks like you encountered the errors below, in sequence, and I suspect the Xid 79 error is leading to the other issues.
Feb 28 23:32:06 archlinux kernel: NVRM: GPU at PCI:0000:01:00: GPU-1dddfcf3-b0aa-f287-068f-78737c6c0009
Feb 28 23:32:06 archlinux kernel: NVRM: Xid (PCI:0000:01:00): 79, GPU has fallen off the bus.
Feb 28 23:32:06 archlinux kernel: NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.
Xid (PCI:0000:01:00): 154, GPU recovery action changed from 0x0 (None) to 0x1 (GPU Reset Required)
Failed to allocate NVKMS memory for GEM object on Wayland
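For anyone who wants to check their own logs for the same sequence, here is a minimal Python sketch that pulls Xid events out of kernel log text. The regular expression is an assumption inferred only from the lines quoted in this thread, not an official log format, and `parse_xid_events` and `SAMPLE_LOG` are hypothetical names used for illustration.

```python
import re

# Pattern inferred from the log lines quoted in this thread (an assumption,
# not an official format): "Xid (PCI:<address>): <code>, <message>".
XID_PATTERN = re.compile(
    r"Xid \((?P<pci>PCI:[0-9A-Fa-f:.]+)\): (?P<code>\d+), (?P<message>.*)"
)


def parse_xid_events(log_text):
    """Return a list of (pci_address, xid_code, message) tuples found in log_text."""
    events = []
    for line in log_text.splitlines():
        match = XID_PATTERN.search(line)
        if match:
            events.append(
                (match["pci"], int(match["code"]), match["message"].strip())
            )
    return events


# Two of the lines from the log excerpt above, copied as-is.
SAMPLE_LOG = """\
Feb 28 23:32:06 archlinux kernel: NVRM: Xid (PCI:0000:01:00): 79, GPU has fallen off the bus.
Xid (PCI:0000:01:00): 154, GPU recovery action changed from 0x0 (None) to 0x1 (GPU Reset Required)
"""

for pci, code, message in parse_xid_events(SAMPLE_LOG):
    print(f"{pci} Xid {code}: {message}")
```

Feeding it the saved kernel log text from a previous boot should show whether an Xid 79 comes before the NVKMS allocation failures on your system as well.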
The most common causes of Xid 79 are overheating, insufficient power, or an outdated SBIOS version.
Please update the SBIOS, monitor temperatures, reseat the power connectors and the card in its slot, and check or replace the PSU.
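To make the temperature monitoring concrete, below is a rough Python sketch that reads temperature and power draw through `nvidia-smi` query mode. It assumes `nvidia-smi` is on the PATH and that the GPU reports the `temperature.gpu` and `power.draw` fields; some GPUs return `[N/A]` for power, which is mapped to `None` here. The function names are hypothetical.

```python
import subprocess

# nvidia-smi query mode: one CSV line per GPU, without header or units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=temperature.gpu,power.draw",
    "--format=csv,noheader,nounits",
]


def _to_float(field):
    """Convert one CSV field to float, or None when the value is not numeric."""
    try:
        return float(field.strip())
    except ValueError:
        return None


def parse_gpu_sample(line):
    """Parse one line such as '45, 32.10' into a dict of readings."""
    temperature, power = line.split(",")
    return {"temperature_c": _to_float(temperature), "power_w": _to_float(power)}


def read_gpu_samples():
    """Run nvidia-smi once and return one parsed sample per GPU."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return [parse_gpu_sample(line) for line in result.stdout.splitlines() if line.strip()]
```

Calling `read_gpu_samples()` every few seconds and appending the results to a file gives a record of the readings leading up to a freeze. Once the GPU has fallen off the bus, `nvidia-smi` itself is likely to fail, so the last successful sample is the one to look at.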
Here are a handful of logs from a similar event that I experienced (though it’s not exactly the same now, so I think I will create my own bug report shortly). I have done a system update since my previous comment.