Steam games freeze / crash when using the RTX 4090

I am on Ubuntu 22.10 with the 5.19 kernel, using the 525 Nvidia driver. Every time CSGO crashes, I check dmesg and see this:

NVRM: GPU at PCI:0000:01:00: GPU-db60072d-5d03-b319-8c6c-ff6fc28b962c
[ 1588.693616] NVRM: Xid (PCI:0000:01:00): 13, pid=‘’, name=, Graphics Exception: Class 0x0 Subchannel 0x0 Mismatch
[ 1588.693622] NVRM: Xid (PCI:0000:01:00): 13, pid=‘’, name=, Graphics Exception: ESR 0x4041b0=0x0
[ 1588.693629] NVRM: Xid (PCI:0000:01:00): 13, pid=‘’, name=, Graphics Exception: ESR 0x404000=0x80000002
[ 1588.694186] NVRM: Xid (PCI:0000:01:00): 13, pid=15806, name=csgo_linux64, Graphics Exception: ChID 0033, Class 0000c997, Offset 00000100, Data deaddead
[ 2402.940321] traps: csgo_linux64[15806] general protection fault ip:7f78cf5ec65e sp:7fff45bda3df error:0 in client_client.so[7f78ce200000+1e10000]

When this happens, it looks like the monitor disconnects (or the video card stops or goes to sleep) and then comes back: the monitor shows its resolution notification again, as if the system had rebooted. The screen then blinks black and returns to the game, but ONLY if I alt-tab away and back. The whole process takes about 10 seconds, so anyone looking at me in-game thinks I am AFK. It looks like a crash, but it recovers after several seconds; it feels like the video card went to sleep (I am not on a laptop, just in case).

The only changes I have made are these GRUB parameters:

GRUB_CMDLINE_LINUX="pcie_aspm=off mitigations=off split_lock_detect=off intel_idle.max_cstate=1"

These ended up being a collection of workarounds from trying to solve other issues with the previous Nvidia driver version.
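Since changes to the GRUB configuration only take effect after regenerating it (`update-grub` on Ubuntu) and rebooting, it can be worth confirming that the running kernel actually received these parameters. Below is a minimal Python sketch that checks a kernel command line against the parameters listed above. The helper names (`parse_cmdline`, `missing_params`, `read_running_cmdline`) are made up for illustration, and the parser assumes no quoted values containing spaces.

```python
from pathlib import Path

# Parameters copied from the GRUB_CMDLINE_LINUX line in this post.
EXPECTED = {
    "pcie_aspm": "off",
    "mitigations": "off",
    "split_lock_detect": "off",
    "intel_idle.max_cstate": "1",
}


def parse_cmdline(text):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in text.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def missing_params(text, expected=EXPECTED):
    """Return the expected parameters that are absent or set to a different value."""
    active = parse_cmdline(text)
    return {k: v for k, v in expected.items() if active.get(k) != v}


def read_running_cmdline():
    """Read the command line of the currently running kernel (Linux only)."""
    return Path("/proc/cmdline").read_text()
```

Calling `missing_params(read_running_cmdline())` on the affected machine returns an empty dict when all four parameters are active, and otherwise the ones that did not make it onto the command line.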

What can be done to try to avoid the problem?
nvidia-bug-report.log.gz (440.9 KB)

Here is the latest one, from a couple of seconds ago:

[ 1588.693613] NVRM: GPU at PCI:0000:01:00: GPU-db60072d-5d03-b319-8c6c-ff6fc28b962c
[ 1588.693616] NVRM: Xid (PCI:0000:01:00): 13, pid=‘’, name=, Graphics Exception: Class 0x0 Subchannel 0x0 Mismatch
[ 1588.693622] NVRM: Xid (PCI:0000:01:00): 13, pid=‘’, name=, Graphics Exception: ESR 0x4041b0=0x0
[ 1588.693629] NVRM: Xid (PCI:0000:01:00): 13, pid=‘’, name=, Graphics Exception: ESR 0x404000=0x80000002
[ 1588.694186] NVRM: Xid (PCI:0000:01:00): 13, pid=15806, name=csgo_linux64, Graphics Exception: ChID 0033, Class 0000c997, Offset 00000100, Data deaddead
[ 2402.940321] traps: csgo_linux64[15806] general protection fault ip:7f78cf5ec65e sp:7fff45bda3df error:0 in client_client.so[7f78ce200000+1e10000]
[ 3525.507037] traps: Compositor[21663] trap invalid opcode ip:560c3ff9e236 sp:7f9af99f8a58 error:0 in chrome[560c3f9df000+9fc8000]
[ 3549.668229] NVRM: Xid (PCI:0000:01:00): 32, pid=21747, name=csgo_linux64, Channel ID 00000033 intr0 00040000
[ 3549.668919] NVRM: Xid (PCI:0000:01:00): 32, pid=21747, name=csgo_linux64, Channel ID 00000033 intr0 00040000
[ 3584.011681] NVRM: Going over RM unhandled interrupt threshold for irq 236
[ 3584.743438] NVRM: Going over RM unhandled interrupt threshold for irq 236
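To see at a glance which Xid codes occur and which process triggers them, the `NVRM: Xid` lines can be pulled out of the dmesg text and counted. The following is a rough Python sketch, not an official tool: the regular expression is based only on the line format visible in the logs in this thread, and the function names are hypothetical.

```python
import re
from collections import Counter

# Matches lines shaped like the ones pasted in this thread, e.g.:
# [ 3549.668229] NVRM: Xid (PCI:0000:01:00): 32, pid=21747, name=csgo_linux64, ...
XID_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+NVRM: Xid \((?P<pci>PCI:[0-9a-fA-F:.]+)\): (?P<xid>\d+),"
    r"(?: pid=(?P<pid>[^,]*), name=(?P<name>[^,]*),)?"
)


def parse_xid_events(dmesg_text):
    """Return one dict per Xid line found in the given dmesg text."""
    events = []
    for line in dmesg_text.splitlines():
        m = XID_RE.search(line)
        if m:
            events.append({
                "time": float(m.group("ts")),   # seconds since boot
                "pci": m.group("pci"),
                "xid": int(m.group("xid")),
                "pid": m.group("pid") or None,  # kept as text; may be empty in the log
                "name": m.group("name") or None,
            })
    return events


def count_by_xid(events):
    """Count how many log lines were reported for each Xid code."""
    return Counter(e["xid"] for e in events)
```

The text to parse can come from a saved log file or from the output of `dmesg` (which may require root on recent Ubuntu releases). Note that a single fault can produce several Xid lines, as with the four Xid 13 lines above, so the counts are log lines rather than distinct crashes.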

The bug report is the one attached to the original question.

Okay, it looks like all games have this issue after a couple of minutes of playtime. For example, Cyberpunk was working fine, but after the 525.60 update I get this after about 3 minutes:

[ 4123.266915] NVRM: GPU at PCI:0000:01:00: GPU-db60072d-5d03-b319-8c6c-ff6fc28b962c
[ 4123.266917] NVRM: Xid (PCI:0000:01:00): 109, pid=31275, name=GameThread, Ch 00000036, errorString CTX SWITCH TIMEOUT, Info 0xac01a

I can confirm this on an RTX 4070 Ti. I get these errors over and over again:

[ 1229.609217] NVRM: Xid (PCI:0000:26:00): 109, pid=5071, name=GameThread, Ch 0000005e, errorString CTX SWITCH TIMEOUT, Info 0x13c02d

[ 1692.448837] NVRM: Xid (PCI:0000:26:00): 109, pid=16165, name=GameThread, Ch 00000056, errorString CTX SWITCH TIMEOUT, Info 0x13c02a

[ 1697.111028] NVRM: Going over RM unhandled interrupt threshold for irq 73
[ 1697.927829] NVRM: Going over RM unhandled interrupt threshold for irq 73
[ 2338.493966] NVRM: Xid (PCI:0000:26:00): 109, pid=25249, name=GameThread, Ch 00000056, errorString CTX SWITCH TIMEOUT, Info 0x2c02a

[ 2339.765164] NVRM: Going over RM unhandled interrupt threshold for irq 73

Hi ext73. Actually, with the last 2 updates since December, Nvidia ended up fixing this. You can see my newest tests here: https://youtube.com/@xtremelinux

You will find that with both the latest Ubuntu kernel and the latest Nvidia drivers, I have stopped getting several of these Xid errors. To give you an idea, the last time CSGO crashed for me was at the beginning of January. Before that, I could not play even a single round lasting more than 3 minutes.

I have also included other videos on optimizing the video card, improving the gaming (or rendering) experience, and benchmarking the results, so we can learn together and help each other with this new hardware.

Hi, thanks for the info. Unfortunately, I still get these errors on the RTX 4070 Ti, mainly in Cyberpunk 2077…

This is under my optimized kernel builds, built with Clang/LLVM plus Maple + LRU patches, etc. …

Linux version 6.1.11-ext73-101.11-ryzen-3 (root@ext73-kernel) (Ubuntu clang version 15.0.7, Ubuntu LLD 15.0.7) #8 SMP PREEMPT_DYNAMIC Thu Feb 9 23:21:16 CET
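When comparing results between machines, it helps to state precisely which kernel release and compiler each one runs. As a small sketch (the function name `describe_kernel` is hypothetical), the banner above, which has the same layout as `/proc/version`, can be reduced to those two facts in Python. The compiler check is a simple substring test and would be fooled by a release string that itself contains "clang" or "gcc".

```python
import re

# The release is the first whitespace-delimited token after "Linux version".
RELEASE_RE = re.compile(r"^Linux version (?P<release>\S+)")


def describe_kernel(banner):
    """Extract the kernel release and a coarse compiler label from a version banner."""
    m = RELEASE_RE.match(banner.strip())
    release = m.group("release") if m else None
    lowered = banner.lower()
    if "clang" in lowered:
        compiler = "clang"
    elif "gcc" in lowered:
        compiler = "gcc"
    else:
        compiler = "unknown"
    return {"release": release, "compiler": compiler}
```

On a running system the banner can be read from `/proc/version`; posting the resulting release and compiler alongside the Xid logs makes it easier to tell whether a report comes from a stock or a custom kernel build.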

That could be related to your custom kernel build, then. The tests in my videos used the default Ubuntu mainline kernels from

https://kernel.ubuntu.com/~kernel-ppa/mainline/

The parameters are covered in the video "Linux Gaming | Gaming Performance Impact of Kernel Parameters" on YouTube.

The Nvidia driver is either from the Nvidia PPA or, in my current setup, from the .run file from Nvidia.com, if that helps.

I don’t think the problem is solved, and it gets worse with the latest linux-firmware and the latest VKD3D.

I still get Xid 13 and 109.