Multiple CUDA/RTX/Vulkan applications crashing with Xid (13, 109) errors

Driver: 545.29.06
Kernel: 6.6.2
Proton: experimental-8.0-20231129

Still broken.

Nov 30 23:51:56 kleinerpopel kernel: NVRM: GPU at PCI:0000:26:00: GPU-6f98b267-20cc-5347-51dc-8bad07fd2ad0
Nov 30 23:51:56 kleinerpopel kernel: NVRM: Xid (PCI:0000:26:00): 13, pid='<unknown>', name=<unknown>, Graphics SM Warp Exception on (GPC 4, TPC 5, SM 0): Illegal Instruction Parameter
Nov 30 23:51:56 kleinerpopel kernel: NVRM: Xid (PCI:0000:26:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ESR 0x526f30=0xc000b 0x526f34=0x0 0x526f28=0xf812b60 0x526f2c=0x1104
Nov 30 23:52:01 kleinerpopel kernel: NVRM: Xid (PCI:0000:26:00): 109, pid=2592, name=MetroExodus.exe, Ch 00000046, errorString CTX SWITCH TIMEOUT, Info 0x32c02c

Also, happy one-year anniversary @all. I hope you enjoyed non-RTX games over the past year.
I hope Nvidia will celebrate this special day and honour our patience with a fixed driver soon.

nvidia-bug-report.log.gz (1.0 MB)

After I bought an AMD Radeon RX 7800 XT, all my headaches just disappeared.
All the games I want to play just work. And my virtual machines have 3D acceleration, wow.

Maybe one day Nvidia will be better with all of this on Linux. But for now, I’m done with it.

I have filed a separate bug, 4399325, for the crash in Alan Wake 2 so that it can be tracked better.
So far we have still not been able to repro the Xid 109 error with this game.
We will try on a few more setups and update.

I wanted to try it for myself with the game that started the report. The game shows the initial videos and freezes. It took me a minute to reproduce the error. More than a year is a long time to not have this error solved.

7/12/23 15:12 kernel NVRM: Xid (PCI:0000:09:00): 13, pid='<unknown>', name=<unknown>, Graphics SM Warp Exception on (GPC 2, TPC 1, SM 1): Illegal Instruction Parameter
7/12/23 15:12 kernel NVRM: Xid (PCI:0000:09:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ESR 0x514fb0=0x7000b 0x514fb4=0x0 0x514fa8=0xf812b60 0x514fac=0x1104
7/12/23 15:12 kernel NVRM: Xid (PCI:0000:09:00): 109, pid=8051, name=MetroExodus.exe, Ch 000000c6, errorString CTX SWITCH TIMEOUT, Info 0x3c064

nvidia-bug-report.log.gz (723.5 KB)

My system:
Manjaro kernel 6.6
Nvidia driver: 545.29.06
Ryzen 5800x
RTX 4080 (€1300 graphics card)

I know they do everything possible to fix the errors but the company should know that it is not enough.

Care to share your testing matrix? The cases on which you were not able to repro will be useful to me.

Kernel versions, driver versions, distributions, procedures – please.

Running the game at 720p was the only scenario in which Metro Exodus did not freeze, at least for me. However, I have not re-tested whether this still works or is borked as well now.

Is there anything we can do to help with debugging? Some info we can collect? A Proton / Vulkan log? A report?
Is there some debugging facility in the nvidia driver?

proton flags:

WINEDLLOVERRIDES="xinput1_3=n,b" PROTON_HIDE_NVIDIA_GPU=1 PROTON_ENABLE_NVAPI=0  %command%

It happens here too with Forza Horizon 5, on kernel 6.6.4, nvidia 545.29.06, and GE-Proton8-25.

I did all of the game’s primary missions in the Rally Adventure DLC.
Symptoms are:

  • Sometimes the game opens, sometimes it doesn’t. Let’s say it opens at least 1 in 4 times.

  • After that, it either works OK for hours (well, with some micro stutter on a 1660 Ti, but playable) or it crashes in some world scene after a few minutes.

I’m talking only about the Rally Adventure DLC. When you do a mission / race there, the only option that works is the Rally one (there are 2 suboptions, Rally and Race).

When selecting anything other than Rally, meaning a Race in the Rally DLC or any race in the main game, it crashes at loading.

Also, the main game, aside from the DLC, works OK for driving around the world, but all missions produce a crash.

I’m not really sure whether the random crash at the start of the game is related to nvidia or to Proton.

But I think the in-game crashes are related to nvidia.

I get one of two possible dmesg messages for each in-game crash:

The CTX SWITCH TIMEOUT

or

[Fri Dec  8 13:00:23 2023] NVRM: GPU at PCI:0000:01:00: GPU-ba900b20-4691-c7da-881c-6ad46ccceb80
[Fri Dec  8 13:00:23 2023] NVRM: Xid (PCI:0000:01:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception on GPC 0: 3D WIDTH ZT Violation. Coordinates: (0x2ee, 0x168)
[Fri Dec  8 13:00:23 2023] NVRM: Xid (PCI:0000:01:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ESR 0x500420=0x80000004 0x500434=0x16802ee 0x500438=0x5050000 0x50043c=0x0
[Fri Dec  8 13:00:23 2023] NVRM: Xid (PCI:0000:01:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception on GPC 1: 3D WIDTH ZT Violation. Coordinates: (0x2f0, 0x168)
[Fri Dec  8 13:00:23 2023] NVRM: Xid (PCI:0000:01:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ESR 0x508420=0x80000004 0x508434=0x16802f0 0x508438=0x5050000 0x50843c=0x0
[Fri Dec  8 13:00:23 2023] NVRM: Xid (PCI:0000:01:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception on GPC 2: 3D WIDTH ZT Violation. Coordinates: (0x2f0, 0x164)
[Fri Dec  8 13:00:23 2023] NVRM: Xid (PCI:0000:01:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ESR 0x510420=0x80000004 0x510434=0x16402f0 0x510438=0x5050000 0x51043c=0x0
[Fri Dec  8 13:00:23 2023] NVRM: Xid (PCI:0000:01:00): 13, pid=10173, name=ForzaHorizon5.e, Graphics Exception: ChID 0031, Class 0000c597, Offset 00000000, Data 00000000
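An observation on the lines above, not something confirmed in this thread or by NVIDIA: in each pair, the second ESR value looks like the printed coordinates packed into one 32-bit word, with the first coordinate in the low 16 bits and the second in the high 16 bits. A minimal Python sketch that checks this against the three pairs quoted here (the function name `unpack_coords` is made up for illustration):

```python
def unpack_coords(word: int) -> tuple[int, int]:
    """Split a 32-bit word into (low 16 bits, high 16 bits).

    Assumption: this layout is inferred only from the three log pairs
    quoted in this post; it is not documented driver behaviour.
    """
    return word & 0xFFFF, (word >> 16) & 0xFFFF


# (second ESR value, coordinates printed by the driver), copied from the log above
samples = [
    (0x16802EE, (0x2EE, 0x168)),
    (0x16802F0, (0x2F0, 0x168)),
    (0x16402F0, (0x2F0, 0x164)),
]

for word, reported in samples:
    decoded = unpack_coords(word)
    print(f"{word:#x} -> ({decoded[0]:#x}, {decoded[1]:#x}), matches log: {decoded == reported}")
```

All three samples decode to the coordinates the driver printed, which at least suggests the two fields carry the same information.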

With this latest version (in contrast to 535.104.05), it always crashes at in-game loading if the Proton flags are inverted:

PROTON_HIDE_NVIDIA_GPU=0 PROTON_ENABLE_NVAPI=1

We are talking about Metro Exodus Enhanced Edition.
Does that one work OK for you?
For me it is always:

[28851.796259] NVRM: Xid (PCI:0000:01:00): 13, pid='<unknown>', name=<unknown>, Graphics SM Warp Exception on (GPC 6, TPC 2, SM 1): Illegal Instruction Parameter
[28851.796270] NVRM: Xid (PCI:0000:01:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ESR 0x5357b0=0xc000b 0x5357b4=0x0 0x5357a8=0xf812b60 0x5357ac=0x1104
[28856.067232] NVRM: Xid (PCI:0000:01:00): 109, pid=124603, name=MetroExodus.exe, Ch 0000002e, errorString CTX SWITCH TIMEOUT, Info 0x3c01c
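In this log the CTX SWITCH TIMEOUT (Xid 109) follows the first Xid 13 by a few seconds. A small Python sketch that measures the gap, assuming the `[seconds.microseconds]` timestamp prefix shown in the lines above; the helper name `xid_events` is hypothetical:

```python
import re

# Matches e.g. "[28851.796259] NVRM: Xid (PCI:0000:01:00): 13, ..."
XID_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+NVRM: Xid \([^)]*\): (\d+),")


def xid_events(log_text: str) -> list[tuple[float, int]]:
    """Return (timestamp in seconds, Xid code) for every Xid line."""
    events = []
    for line in log_text.splitlines():
        m = XID_RE.match(line)
        if m:
            events.append((float(m.group(1)), int(m.group(2))))
    return events


# The three lines quoted above, unchanged.
log = """\
[28851.796259] NVRM: Xid (PCI:0000:01:00): 13, pid='<unknown>', name=<unknown>, Graphics SM Warp Exception on (GPC 6, TPC 2, SM 1): Illegal Instruction Parameter
[28851.796270] NVRM: Xid (PCI:0000:01:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ESR 0x5357b0=0xc000b 0x5357b4=0x0 0x5357a8=0xf812b60 0x5357ac=0x1104
[28856.067232] NVRM: Xid (PCI:0000:01:00): 109, pid=124603, name=MetroExodus.exe, Ch 0000002e, errorString CTX SWITCH TIMEOUT, Info 0x3c01c
"""

events = xid_events(log)
first_13 = next(t for t, code in events if code == 13)
first_109 = next(t for t, code in events if code == 109)
print(f"Xid 109 arrived {first_109 - first_13:.2f} s after the first Xid 13")
```

For these lines the gap is roughly 4.3 seconds. Comparing that number across reports might help tell whether different titles hit the same sequence.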


Thank you for your report.

The best way to help debugging is to provide detailed reports like you just did with game title, driver version, last known driver version that worked, Proton version if applicable, launch options, symptoms, steps that would ideally lead to a consistent repro, and nvidia-bug-report logs.

We are looking into every title reported.

I asked a very specific question about your testing matrix. You can enumerate every single combination of dimensions you have tested from those you just listed.

I do expect a complete answer.

I can consistently reproduce the Xid error in the Rally Adventure DLC of Forza Horizon 5 like this:
Start the game (it only works with PROTON_HIDE_NVIDIA_GPU=1 PROTON_ENABLE_NVAPI=0), press Continue, then on the garage page, go back (I think it should let me drive the car).

I generated the attached report just after the crash happened.
nvidia-bug-report.log.gz (831.1 KB)

Note that in the same garage menu, if I go to Forza View and then press Drive, it always works. (I’m translating from Spanish, so the names may not be super accurate.)

steam-1551360_crash_at_back_from_garage.log.gz (54.6 MB)

Here is the Proton log from following the reproduction steps.
The complete Proton environment used was:

PROTON_LOG_DIR=/var/tmp/ PROTON_LOG=1 VK_INSTANCE_LAYERS=VK_LAYER_KHRONOS_validation  VKD3D_CONFIG=vk_debug WINEDLLOVERRIDES="xinput1_3=n,b" PROTON_HIDE_NVIDIA_GPU=1 PROTON_ENABLE_NVAPI=0  %command% 

It may be super big when uncompressed (around 1.5 GB).
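Since the log is that large once unpacked, it can be read straight from the `.gz` file without extracting it to disk first. A minimal Python sketch using the standard `gzip` module; the function name `grep_gz` is hypothetical, and whatever string you search for is your own choice, not something the Proton log format guarantees:

```python
import gzip
from typing import Iterator


def grep_gz(path: str, needle: str) -> Iterator[str]:
    """Yield lines containing `needle` from a gzip-compressed text log.

    The file is decompressed as a stream, so the full uncompressed log
    never has to fit in memory or on disk.
    """
    # errors="replace" keeps going if the log contains non-UTF-8 bytes
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            if needle in line:
                yield line.rstrip("\n")
```

For example, `for line in grep_gz("steam-1551360_crash_at_back_from_garage.log.gz", "vkd3d"): print(line)` would print only the lines mentioning that string, assuming the attachment has been downloaded under that name.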

nvidia-bug-report.log.gz (536.2 KB)

Reproduced it again, this time with just 1 monitor and using the iGPU as the main GPU, with the discrete one via NVIDIA PRIME render offload, just in case.

Can you at least reproduce the error now? My game (Metro Exodus) can’t even reach the main menu.

Steps to reproduce this issue were mentioned in this topic several times, see:

And

For example. But steps for other games and applications, with exact settings and setup, were also provided multiple times.

Yes. In my case I was getting these grind-to-a-halt errors with any newer/more modern game. Specifically, the one I am playing (or, I guess, was, as it has been a long time since I’ve been able to play it) is Wasteland 3. I’ve posted my findings and logs earlier in the thread.

Has anybody tried Pioneers of Pagonia? We got the same Xid error, and the game crashes once you try to build a building.
Here is how I reproduced it:
1. Get the crash in game
2. In a terminal, run sudo dmesg
3. Find “Xid”

The result:
[13026.679091] NVRM: Xid (PCI:0000:01:00): 109, pid=838381, name=Pioneers of Pag, Ch 00000046, errorString CTX SWITCH TIMEOUT, Info 0x9c021
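The kernel writes the tag as `Xid`, so the search in step 3 should either be case-insensitive or use that exact spelling. A rough Python sketch for pulling the fields out of such a line, assuming the `NVRM: Xid (...)` layout seen in the logs in this thread; `parse_xid` is a hypothetical helper, and the dmesg text is assumed to have been saved or piped in beforehand:

```python
import re
from typing import Optional

# Layout assumed from the log lines quoted in this thread.
XID_RE = re.compile(
    r"NVRM: Xid \((?P<pci>[^)]+)\): (?P<code>\d+), "
    r"pid=(?P<pid>[^,]+), name=(?P<name>[^,]*), (?P<detail>.*)"
)


def parse_xid(line: str) -> Optional[dict]:
    """Return the fields of one Xid line, or None for unrelated lines."""
    m = XID_RE.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    fields["code"] = int(fields["code"])
    fields["pid"] = fields["pid"].strip("'")  # pid may be quoted: '<unknown>'
    return fields


line = ("[13026.679091] NVRM: Xid (PCI:0000:01:00): 109, pid=838381, "
        "name=Pioneers of Pag, Ch 00000046, errorString CTX SWITCH TIMEOUT, "
        "Info 0x9c021")
print(parse_xid(line))
```

Having the code, pid and process name as separate fields makes it easier to line up reports from different games side by side.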

It also happens in EA Sports WRC 23 1.4.0 on an 8.6 km Monte Carlo track, Les Borels, “to become night”, with the Subaru Impreza 1995.

[Sat Dec 16 09:17:50 2023] nvidia 0000:01:00.0: Using 39-bit DMA addresses
[Sat Dec 16 09:18:54 2023] NVRM: GPU at PCI:0000:01:00: GPU-ba900b20-4691-c7da-881c-6ad46ccceb80
[Sat Dec 16 09:18:54 2023] NVRM: Xid (PCI:0000:01:00): 109, pid=7403, name=WRC.exe, Ch 00000036, errorString CTX SWITCH TIMEOUT, Info 0x3c017

nvidia-bug-report.log.gz (920.3 KB)

I have spent quite some time in EA Sports WRC myself and have not encountered a single Xid crash.
Then again, some of the games mentioned here have also been working correctly on my side …

On a side note, I’ve also noticed a dramatic difference in user experience between different hw combinations.

EDIT: Which distro and kernel are you running, by any chance? Maybe that can help us track down what’s going on.

Kernel, distro, hardware. None of it seems to matter. I’ve gone from Gentoo, to Arch, to Debian, back to Gentoo, on multiple different kernels. Bummed an older 20 series card and got the exact same problem. Hell, recently I even swapped out motherboards just to see if I was pulling what’s left of my hair out for nothing. Nope, same old same old.

At this point it might be worth loading up nvk just to try and bisect where this problem really comes from. Anything to light a fire under nvidia’s ass, as they seem fairly disinterested for whatever reason.

After more than a year, it’s incredibly frustrating, because a proper debug session inside the nvidia driver could shine some light on this issue. I’m sure there are multiple people on this forum who have the development skills, including myself. If only we had the ability.

NVRM: GPU at PCI:0000:08:00: GPU-3d748122-6de5-a2ce-89aa-e1e10bd79d1c
NVRM: Xid (PCI:0000:08:00): 109, pid=6017, name=, Ch 00000066, errorString CTX SWITCH TIMEOUT, Info 0x14c064

nvidia-bug-report.log.gz (887.6 KB)
