Multiple CUDA/RTX/Vulkan applications crashing with Xid (13,109) errors

With 530.41.03, Metro Exodus is still crashing. C’mon…

“Current releases in branch 525 and 530 do have the fix incorporated, hence issue is still observed.”

Do you mean it is incorporated and not working, or that it isn’t incorporated yet?
Just asking for clarification.

Honestly, I think he meant “Current releases in branch 525 and 530 do NOT have the fix incorporated, hence issue is still observed.”


My bad, I meant to say that current releases in branches 525 and 530 do NOT have the fix incorporated, hence the issue is still observed.


I’ve tried out 520.56.06 with DXVK and CoD MW2 Remastered, and the game is still crashing with Xid 109 ‘CTX SWITCH TIMEOUT’.
Either the fix isn’t working or there is more than one issue causing this.

This appears to me to be the same crash, impacting the same games, as the bug detailed here:

and a second thread created when the first was closed without a fix (but I can only post one link)… search “fault-pde-access-type-read-bug-still-not-fixed/199123”

Only the dmesg output has changed slightly from earlier drivers.

I am in the process of switching to AMD as this has not been fixed since 2019.

Perhaps the Nvidia representative here would be kind enough to tell us what became of the internal bug number assigned to these issues (bug number 2432712), mentioned in the previous posts.


This is the only mention of Xid 109 that I can find.

I am running a CUDA application in the background that uses ~10-20% of my GPU and 2 GB of the RTX 4080’s 16 GB of memory. My entire computer will freeze for 60+ seconds. My computer has an AMD 5950X, 128 GB RAM, a 980 Pro 2 TB NVMe, and an RTX 4080.

Arch Linux, latest driver; the previous driver also caused this. Some of the older drivers in Ubuntu don’t do this.

Apr 02 01:07:51 arch5950x kernel: NVRM: Xid (PCI:0000:0b:00): 109, pid=215308, name=Renderer, Ch 00000010, errorString CTX SWITCH TIMEOUT, Info 0x3c006

Desktop environment: Xfce
nvidia-bug-report.log.gz (340.6 KB)

nvidia-bug-report.log.gz (341.8 KB)

In the Resident Evil remake, I’m seeing the error
“Xid (PCI:0000:2d:00): 109, pid=550752, name=re4.exe, Ch 00000046, errorString CTX SWITCH TIMEOUT, Info 0x2c02e”
followed by a soft crash (I can alt-tab and kill the process).

This is on driver version 530.41.03.


I am also seeing the error pop up randomly when playing through The Last of Us Part 1:

kernel: NVRM: Xid (PCI:0000:0c:00): 109, pid=13929, name=, Ch 000000c6, errorString CTX SWITCH TIMEOUT, Info 0x1c067

The error is particularly prevalent when using the flashlight in the game. I’ve had numerous crashes in the “Alone and Forsaken” and “Hotel Lobby” missions. The crash is usually preceded by a screen of technicolored barf:

Specs:
Arch Linux
Kernel 6.2.9
Ryzen 3900x
Nvidia 3090 with 530.41.03 drivers

@amrits Do we have a timeline on when this fix will be incorporated into the main driver releases? It’s really causing a lot of poor experiences for a lot of people.

I can also reproduce this issue almost 100% of the time when launching Cyberpunk 2077 with raytracing enabled. I used the following launch options in Steam:

VKD3D_CONFIG=dxr11 PROTON_ENABLE_NVAPI=1 %command%

Same logs:

kernel: NVRM: Xid (PCI:0000:0c:00): 109, pid=14152, name=GameThread, Ch 0000005e, errorString CTX SWITCH TIMEOUT, Info 0x43c065

It’s really unacceptable that this bug has lasted as long as it has.


Same here, really frustrating

[100069.955912] NVRM: Xid (PCI:0000:01:00): 109, pid=98691, name=eurotrucks2.exe, Ch 00000098, errorString CTX SWITCH TIMEOUT, Info 0x3c098

Running ETS2 via Steam Proton Experimental.

Specs:
Arch Linux (Linux archlinux-pc 6.2.10-arch1-1 #1 SMP PREEMPT_DYNAMIC Fri, 07 Apr 2023 02:10:43 +0000 x86_64 GNU/Linux)
RTX 2070 SUPER
Driver version: nvidia 530.41.03-3
Kernel boot parameters (just in case):

loglevel=5 cryptdevice=*my device UUID*:root root=/dev/mapper/root resume=/dev/mapper/swap lsm=landlock,lockdown,yama,integrity,apparmor,bpf

This bug basically renders the GPU unusable for gaming


Recently bought an Asus ROG SCAR 17 (2023) laptop and am having Xid 109 errors occur randomly, with no discernable correlation to any particular app (other than the DE). i.e., in my case there is no way to repro, sorry. Problem has occurred both with and without an external monitor connected via HDMI. Happens roughly once or twice a day.

The behaviour is always roughly the same: the mouse pointer freezes, then unfreezes a few seconds later, then freezes again, permanently. Keyboard input doesn’t work, can’t Ctrl+Alt+F2 or anything like that. Have to hard reset.

After reboot, /var/log/syslog always contains something like this:

Apr 14 18:36:00 blksqr kernel: [17972.441770] NVRM: GPU at PCI:0000:01:00: GPU-c7dfb3fc-68aa-0023-a035-7a7f4aa2be1d
Apr 14 18:36:00 blksqr kernel: [17972.441773] NVRM: Xid (PCI:0000:01:00): 109, pid=1687, name=Xorg, Ch 00000008, errorString CTX SWITCH TIMEOUT, Info 0x3400e
Apr 14 18:36:00 blksqr kernel: [17972.441773]
Apr 14 18:36:00 blksqr kernel: [17972.886892] sched: RT throttling activated
Apr 14 18:36:06 blksqr kernel: [17978.483736] NVRM: Xid (PCI:0000:01:00): 109, pid=374, name=systemd-udevd, Ch 00000001, errorString CTX SWITCH TIMEOUT, Info 0x1400e
Apr 14 18:36:06 blksqr kernel: [17978.483736]
Apr 14 18:36:08 blksqr kernel: [17980.484395] nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.
Apr 14 18:36:14 blksqr kernel: [17986.007905] NVRM: Xid (PCI:0000:01:00): 109, pid=2328, name=plasmashell, Ch 0000001e, errorString CTX SWITCH TIMEOUT, Info 0x1d400e
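
For anyone trying to compare these logs across machines, here is a minimal Python sketch that pulls the fields out of NVRM Xid lines like the ones above. The regex and the helper name `parse_xid_lines` are my own assumptions, inferred only from the line format seen in this thread; this is not an official NVIDIA format description.

```python
import re
from collections import Counter

# Pattern inferred from the log lines posted in this thread (an assumption,
# not an official format). The process name may be empty ("name=,").
XID_RE = re.compile(
    r"NVRM: Xid \(PCI:(?P<pci>[0-9a-fA-F:.]+)\): (?P<xid>\d+), "
    r"pid=(?P<pid>\d+), name=(?P<name>[^,]*), "
    r"Ch (?P<channel>[0-9a-fA-F]+), "
    r"errorString (?P<error>.+?), Info (?P<info>0x[0-9a-fA-F]+)"
)


def parse_xid_lines(lines):
    """Return one dict per line that matches the Xid pattern; skip the rest."""
    events = []
    for line in lines:
        match = XID_RE.search(line)
        if match:
            event = match.groupdict()
            event["xid"] = int(event["xid"])
            event["pid"] = int(event["pid"])
            events.append(event)
    return events


sample = [
    "Apr 14 18:36:00 blksqr kernel: [17972.441773] NVRM: Xid (PCI:0000:01:00): 109, pid=1687, name=Xorg, Ch 00000008, errorString CTX SWITCH TIMEOUT, Info 0x3400e",
    "Apr 14 18:36:00 blksqr kernel: [17972.886892] sched: RT throttling activated",
    "Apr 14 18:36:06 blksqr kernel: [17978.483736] NVRM: Xid (PCI:0000:01:00): 109, pid=374, name=systemd-udevd, Ch 00000001, errorString CTX SWITCH TIMEOUT, Info 0x1400e",
    "Apr 14 18:36:14 blksqr kernel: [17986.007905] NVRM: Xid (PCI:0000:01:00): 109, pid=2328, name=plasmashell, Ch 0000001e, errorString CTX SWITCH TIMEOUT, Info 0x1d400e",
]

events = parse_xid_lines(sample)
# Tally how many Xid events each process name was blamed for.
by_process = Counter(event["name"] for event in events)
for name, count in by_process.items():
    print(f"{name or '<empty>'}: {count}")
```

On the excerpt above, this reports one event each for Xorg, systemd-udevd and plasmashell, which fits the impression that the timeout is not tied to a single application here.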

Kernel version: 5.17.15 (this is the latest version I can even get to boot, although I think that is more to do with it being a new laptop than anything related to nVidia)

nVidia driver version: 530.30.02.

Bug report attached.

nvidia-bug-report.log.gz (439.0 KB)

Looks like I have the same issue with RE4 Remake (game). It constantly crashes, and oftentimes textures do not load; they are either completely black or just invisible.
“kernel: NVRM: Xid (PCI:0000:06:00): 109, pid=218580, name=re4.exe, Ch 00000026, errorString CTX SWITCH TIMEOUT, Info 0x4c01d”
Specs:
Arch Linux 6.2.10-arch1-1, Ryzen 9 5950X, RTX 3060

Can anyone confirm that devs are working on this issue? This thread is half a year old now and this issue makes programs completely unusable.

nvidia-bug-report.log.gz (352.7 KB)

update: 530.41.03 still crashes

The issue still occurs after installing nvidia-open, but less frequently.

update: nope, still crashes, the GPU is completely unusable

Found a solution (or rather a workaround): add nvidia_drm.modeset=1 to the kernel boot options, set MODULES=(nvidia nvidia_modeset nvidia_uvm nvidia_drm) in mkinitcpio.conf, and re-run mkinitcpio.

Still crashes. It is 100% a driver bug, nothing helps

Also hit by this (Xid 109); nothing I’ve done seems to reliably work. Underutilizing my RTX 3070 in terms of compute and memory seems to allow it to run ‘longer’, possibly indefinitely, but what’s the point of that? Some communication as to the status of a potential fix, even if it is just being investigated, would be appreciated, given how frustrating the random crashes are.

I’ve tested on Nobara & NixOS with both 525 and 530.

@aujadeva

Are your Xid 109 crashes always triggered by running a certain game, as is the case with most of the posters on this thread, or, like me, is it totally random?

I can basically ALWAYS trigger it with Forza Horizon 5, as that’s a rather demanding game. I can trigger it less frequently with other games, which made me think that lowering the compute/memory demands of it might help trigger it less frequently. Lowering the settings with FH5 does indeed cause it to trigger less frequently.

I can’t decide whether it’s more memory or more compute, or if it’s possibly related to utilization of the RTX cores. For instance, I moved this computer into the living room to use hooked up to the TV (and to spread the heat-making devices in my apartment around), which meant I was only going to be pushing 1080p, so the demand for compute definitely dropped. I can still reliably trigger 109 despite the lowered resolution with high settings. With lower settings, it doesn’t trigger as much, although I’m still using 7 out of 8 GB of VRAM. Turning off ray tracing seems to reduce the frequency of crashes, but I haven’t gone through a reliable test of enabling/disabling each feature: the test matrix includes a lot of Proton/VKD3D feature flags, swapping Proton versions, and various in-game settings, and what I’m usually trying to do is just sit down and relax.

I’m not sure what the specific compute pathway is for ray tracing or other uses of the RTX cores on the Linux/Proton/VKD3D stack; for the record, I’m using FSR with FH5 because when I stream I want the RTX cores to be handling coding/decoding of the video stream, not upsampling with DLSS.

Cyberpunk with its new insane ray tracing options might be a good one to test as well; it’s faster to get in and out of the game compared to FH5.

Sometimes RE4 Remake can run for like 5 hours without problems, and sometimes you get 3 crashes in a row in the same place in the game:
Apr 19 17:31:45 unit01 kernel: NVRM: Xid (PCI:0000:06:00): 109, pid=206036, name=re4.exe, Ch 0000002e, errorString CTX SWITCH TIMEOUT, Info 0x4c020
Apr 19 17:48:16 unit01 kernel: NVRM: Xid (PCI:0000:06:00): 109, pid=219596, name=re4.exe, Ch 0000002e, errorString CTX SWITCH TIMEOUT, Info 0x4c020
Apr 19 22:04:27 unit01 kernel: NVRM: Xid (PCI:0000:06:00): 109, pid=361603, name=re4.exe, Ch 0000002e, errorString CTX SWITCH TIMEOUT, Info 0x21c01d
[24601.847147] NVRM: Xid (PCI:0000:06:00): 109, pid=206036, name=re4.exe, Ch 0000002e, errorString CTX SWITCH TIMEOUT, Info 0x4c020
[25592.776113] NVRM: Xid (PCI:0000:06:00): 109, pid=219596, name=re4.exe, Ch 0000002e, errorString CTX SWITCH TIMEOUT, Info 0x4c020
[40963.565669] NVRM: Xid (PCI:0000:06:00): 109, pid=361603, name=re4.exe, Ch 0000002e, errorString CTX SWITCH TIMEOUT, Info 0x21c01d
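
To put numbers on "sometimes hours, sometimes in a row", here is a rough Python sketch that computes the time between consecutive Xid events from the dmesg-style lines above. It assumes the bracketed prefix is seconds since boot, as dmesg normally prints it; the function name `crash_intervals` is just a label I made up for illustration.

```python
import re

# Matches the "[seconds.fraction]" prefix printed by dmesg (seconds since boot).
TS_RE = re.compile(r"^\[\s*(?P<ts>\d+\.\d+)\]")


def crash_intervals(dmesg_lines):
    """Return the seconds between consecutive NVRM Xid lines."""
    stamps = []
    for line in dmesg_lines:
        match = TS_RE.match(line)
        if match and "NVRM: Xid" in line:
            stamps.append(float(match.group("ts")))
    # Pair each timestamp with the next one and take the difference.
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


lines = [
    "[24601.847147] NVRM: Xid (PCI:0000:06:00): 109, pid=206036, name=re4.exe, Ch 0000002e, errorString CTX SWITCH TIMEOUT, Info 0x4c020",
    "[25592.776113] NVRM: Xid (PCI:0000:06:00): 109, pid=219596, name=re4.exe, Ch 0000002e, errorString CTX SWITCH TIMEOUT, Info 0x4c020",
    "[40963.565669] NVRM: Xid (PCI:0000:06:00): 109, pid=361603, name=re4.exe, Ch 0000002e, errorString CTX SWITCH TIMEOUT, Info 0x21c01d",
]

for gap in crash_intervals(lines):
    print(f"{gap / 60:.1f} minutes between crashes")
```

For the three events above, the gaps come out to roughly 16.5 minutes and then a bit over 4 hours, which lines up with the wall-clock timestamps in the journal lines (17:31:45, 17:48:16, 22:04:27).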

Update:
after the recent update of the game on Steam, I can’t even launch it.