Multiple CUDA/RTX/Vulkan application crashing with Xid (13,109) errors

I’ve just had to hard-reboot my laptop for the second time this morning due to Xid 109 errors during normal general use (no games). @amrits: How much longer are you going to keep us waiting for a fix? You say the fix already exists in older versions, so why not merge it into a new beta build? 530.30.02 is two months old now, and defective. Unlike most of the other posters in this thread, my whole machine is crippled by this driver bug. I’m not trying to run cutting-edge DX12 games that aren’t supposed to run on Linux in the first place.

Seems Linus was justified with his middle finger. Prove him wrong.

And another two crashes in rapid succession, both times while watching fullscreen video with an external monitor. Unlike my previous crashes, today has the pattern that all 4 crashes occurred while watching video, 2 on YouTube in Brave browser, 2 in Haruna media player.

@amrits Does this aid in diagnostics at all?

My Xid 109s mostly happen in 3s, with the processes usually being some mix of systemsettings, plasmashell and systemd-udevd.

Apr 22 20:36:51 blksqr kernel: [ 422.306310] NVRM: Xid (PCI:0000:01:00): 109, pid=4446, name=haruna, Ch 0000004b, errorString CTX SWITCH TIMEOUT, Info 0x26400e
Apr 22 20:36:51 blksqr kernel: [ 422.306310]
Apr 22 20:36:52 blksqr kernel: [ 423.219344] sched: RT throttling activated
Apr 22 20:36:52 blksqr dbus-daemon[1581]: [system] Activating service name='org.kde.powerdevil.backlighthelper' requested by ':1.60' (uid=1000 pid=2483 comm="/usr/lib/x86_64-linux-gnu/libexec/org_kde_powerdev" label="unconfined") (using servicehelper)
Apr 22 20:36:52 blksqr dbus-daemon[1581]: [system] Successfully activated service 'org.kde.powerdevil.backlighthelper'

Apr 22 20:37:23 blksqr kernel: [ 454.808375] NVRM: Xid (PCI:0000:01:00): 109, pid=391, name=systemd-udevd, Ch 00000001, errorString CTX SWITCH TIMEOUT, Info 0x1400e
Apr 22 20:37:23 blksqr kernel: [ 454.808375]
Apr 22 20:37:30 blksqr kernel: [ 461.523717] asus_wmi: Unknown key code 0xc0
Apr 22 20:37:58 blksqr kernel: [ 489.308535] NVRM: Xid (PCI:0000:01:00): 109, pid=2484, name=plasmashell, Ch 00000018, errorString CTX SWITCH TIMEOUT, Info 0x8400e
Apr 22 20:37:58 blksqr kernel: [ 489.308535]
Apr 22 20:37:58 blksqr kernel: [ 489.309879] NVRM: Xid (PCI:0000:01:00): 109, pid=2484, name=plasmashell, Ch 00000018, errorString CTX SWITCH TIMEOUT, Info 0x8400e
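
For anyone who wants to tally which processes get hit in their own logs, here is a rough Python sketch. It assumes the NVRM line layout matches the lines quoted in this thread; other driver versions may format the message differently. The names `parse_xid_lines` and `count_by_process` are made up for illustration.

```python
import re
from collections import Counter

# Pattern derived only from the NVRM lines quoted in this thread (assumption:
# other driver versions may print a different layout).
XID_RE = re.compile(
    r"NVRM: Xid \(PCI:(?P<bus>[0-9a-f:.]+)\): (?P<xid>\d+), "
    r"pid=(?P<pid>[^,]*), name=(?P<name>[^,]*),"
)

def parse_xid_lines(lines):
    """Return (xid, pid, name) tuples for every NVRM Xid line found."""
    events = []
    for line in lines:
        m = XID_RE.search(line)
        if m:
            events.append((int(m.group("xid")), m.group("pid"), m.group("name")))
    return events

def count_by_process(events, xid=109):
    """Count how often each process name appears for the given Xid number."""
    return Counter(name for x, _, name in events if x == xid)

# The four Xid 109 lines from the log above, plus one unrelated line.
sample = [
    "Apr 22 20:36:51 blksqr kernel: [ 422.306310] NVRM: Xid (PCI:0000:01:00): 109, pid=4446, name=haruna, Ch 0000004b, errorString CTX SWITCH TIMEOUT, Info 0x26400e",
    "Apr 22 20:36:52 blksqr kernel: [ 423.219344] sched: RT throttling activated",
    "Apr 22 20:37:23 blksqr kernel: [ 454.808375] NVRM: Xid (PCI:0000:01:00): 109, pid=391, name=systemd-udevd, Ch 00000001, errorString CTX SWITCH TIMEOUT, Info 0x1400e",
    "Apr 22 20:37:58 blksqr kernel: [ 489.308535] NVRM: Xid (PCI:0000:01:00): 109, pid=2484, name=plasmashell, Ch 00000018, errorString CTX SWITCH TIMEOUT, Info 0x8400e",
    "Apr 22 20:37:58 blksqr kernel: [ 489.309879] NVRM: Xid (PCI:0000:01:00): 109, pid=2484, name=plasmashell, Ch 00000018, errorString CTX SWITCH TIMEOUT, Info 0x8400e",
]

events = parse_xid_lines(sample)
counts = count_by_process(events)
for name, n in counts.most_common():
    print(name, n)
```

To run it on a real log, the lines could come from something like `journalctl -k` piped into a file and read with `open(...)`.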

@amrits Is the fix in 525.116.03?

[ 726.589809] NVRM: GPU at PCI:0000:07:00: GPU-7008f4ca-0928-d2ff-80ae-9fc652eb3a5a
[ 726.589814] NVRM: Xid (PCI:0000:07:00): 13, pid='<unknown>', name=<unknown>, Graphics SM Warp Exception on (GPC 2, TPC 0, SM 1): Illegal Instruction Parameter
[ 726.589824] NVRM: Xid (PCI:0000:07:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ESR 0x5147b0=0x5000b 0x5147b4=0x0 0x5147a8=0xf812b60 0x5147ac=0x1104
[ 726.590648] NVRM: Xid (PCI:0000:07:00): 13, pid=8420, name=MetroExodus.exe, Graphics Exception: ChID 00c1, Class 0000c797, Offset 00000000, Data 00000000

Nope…

@adolfotregosa is this Xid 13 crash with the new 525.116.03 driver?

Has anyone else tried this new driver? It looks promising in that it’s now the only version that turns up when I search for drivers compatible with my laptop (RTX 4080), suggesting that the devs have deemed 530.30.02 unfit for 4080 laptops (which would explain all the crashes I’ve been experiencing).

Devs, can you confirm?

Still crashing for me with driver 525.116.03:

Apr 29 11:52:05 kleinerpopel kernel: NVRM: GPU at PCI:0000:26:00: GPU-6f98b267-20cc-5347-51dc-8bad07fd2ad0
Apr 29 11:52:05 kleinerpopel kernel: NVRM: Xid (PCI:0000:26:00): 13, pid='<unknown>', name=<unknown>, Graphics SM Warp Exception on (GPC 2, TPC 0, SM 0): Illegal Instruction Parameter
Apr 29 11:52:05 kleinerpopel kernel: NVRM: Xid (PCI:0000:26:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ESR 0x514730=0xc000b 0x514734=0x0 0x514728=0xf812b60 0x51472c=0x1104
Apr 29 11:52:09 kleinerpopel kernel: NVRM: Xid (PCI:0000:26:00): 109, pid=7572, name=MetroExodus.exe, Ch 00000036, errorString CTX SWITCH TIMEOUT, Info 0x8c01b

Horizon Zero Dawn, however, seems to run fine now.
Besides the driver change, I ran both games using Proton 8.0-1 (Experimental branch, not mainline).

Here’s the crash report with Metro:
nvidia-bug-report.log.gz (270.5 KB)

This is still happening 100% of the time whenever I try to launch Metro Exodus Enhanced Edition with the latest Proton Experimental.

Launch options: VKD3D_CONFIG=dxr11 PROTON_ENABLE_NVAPI=1 %command%

kernel: NVRM: Xid (PCI:0000:0c:00): 13, pid='<unknown>', name=<unknown>, Graphics SM Warp Exception on (GPC 1, TPC 5, SM 1): Illegal Instruction Parameter
kernel: NVRM: Xid (PCI:0000:0c:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ESR 0x50efb0=0x13000b 0x50efb4=0x0 0x50efa8=0xf812b60 0x50efac=0x1104
kernel: NVRM: Xid (PCI:0000:0c:00): 109, pid=409446, name=MetroExodus.exe, Ch 000000ce, errorString CTX SWITCH TIMEOUT, Info 0x4c068

Arch Linux Kernel 6.2.13-arch1-1
Nvidia 3090 w/ 530.41.03 drivers

On the plus side, a recent Proton Experimental update seems to work around the Xid crash for Cyberpunk. The RT modes work now, although Overdrive is incredibly slow on Linux compared to Windows.

I take it that, for the 4070 Ti, downgrading to 520.56.06 won’t work because the card itself was released after the driver? Or can I somehow make this happen?

The release notes for a given driver version give an exhaustive list of all the supported cards. If your card isn’t on the list, I don’t think you have a realistic chance of it working. I too am limited in driver options, since I have a 4080 Laptop. 520.x and most 525.x aren’t an option for me.

This bug is also happening after playing back the Enemies demo in the Unity Editor 2023.1 beta:

May 03 05:09:51 kernel: NVRM: Xid (PCI:0000:0c:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception on GPC 0: 3D WIDTH ZT Violation. Coordinates: (0x2c0, 0x0)
May 03 05:09:51 kernel: NVRM: Xid (PCI:0000:0c:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ESR 0x500420=0x80000004 0x500434=0x2c0 0x500438=0x60000 0x50043c=0x0
May 03 05:09:51 kernel: NVRM: Xid (PCI:0000:0c:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception on GPC 1: 3D WIDTH ZT Violation. Coordinates: (0x2c8, 0x0)
May 03 05:09:51 kernel: NVRM: Xid (PCI:0000:0c:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ESR 0x508420=0x80000004 0x508434=0x2c8 0x508438=0x60000 0x50843c=0x0
May 03 05:09:51 kernel: NVRM: Xid (PCI:0000:0c:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception on GPC 2: 3D WIDTH ZT Violation. Coordinates: (0x298, 0x0)
May 03 05:09:51 kernel: NVRM: Xid (PCI:0000:0c:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ESR 0x510420=0x80000004 0x510434=0x298 0x510438=0x60000 0x51043c=0x0
May 03 05:09:51 kernel: NVRM: Xid (PCI:0000:0c:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception on GPC 3: 3D WIDTH ZT Violation. Coordinates: (0x2a0, 0x0)
May 03 05:09:51 kernel: NVRM: Xid (PCI:0000:0c:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ESR 0x518420=0x80000004 0x518434=0x2a0 0x518438=0x60000 0x51843c=0x0
May 03 05:09:51 kernel: NVRM: Xid (PCI:0000:0c:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception on GPC 4: 3D WIDTH ZT Violation. Coordinates: (0x2a8, 0x0)
May 03 05:09:51 kernel: NVRM: Xid (PCI:0000:0c:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ESR 0x520420=0x80000004 0x520434=0x2a8 0x520438=0x60000 0x52043c=0x0
May 03 05:09:51 kernel: NVRM: Xid (PCI:0000:0c:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception on GPC 5: 3D WIDTH ZT Violation. Coordinates: (0x2b0, 0x0)
May 03 05:09:51 kernel: NVRM: Xid (PCI:0000:0c:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ESR 0x528420=0x80000004 0x528434=0x2b0 0x528438=0x60000 0x52843c=0x0
May 03 05:09:51 kernel: NVRM: Xid (PCI:0000:0c:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception on GPC 6: 3D WIDTH ZT Violation. Coordinates: (0x2b8, 0x0)
May 03 05:09:51 kernel: NVRM: Xid (PCI:0000:0c:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ESR 0x530420=0x80000004 0x530434=0x2b8 0x530438=0x60000 0x53043c=0x0
May 03 05:09:51 kernel: NVRM: Xid (PCI:0000:0c:00): 13, pid=310795, name=Unity, Graphics Exception: ChID 00ce, Class 0000c797, Offset 00000000, Data 00000000

This might even be a new issue that also causes an Xid 13.

All the other Xid 13s in this thread had in common that they were caused by CTX SWITCH TIMEOUT and an Illegal Instruction Parameter.

The Unity issue shown above seems to me to be a more general Graphics Exception.
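
To make that distinction easier to check across logs, here is a minimal Python sketch that sorts Xid 13 lines by the message text after the `name=` field. The category labels and the function names `xid13_message` and `classify` are my own invention, not NVIDIA terminology, and the pattern only assumes the line layout seen in the logs pasted in this thread.

```python
import re

# Matches "NVRM: Xid (...): 13, pid=..., name=..., <message>" as it appears in
# the logs quoted in this thread (assumption: layout may differ elsewhere).
XID13_RE = re.compile(r"NVRM: Xid \([^)]*\): 13, [^,]*, [^,]*, (?P<msg>.*)")

def xid13_message(line):
    """Return the free-text part of an Xid 13 line, or None if it is not one."""
    m = XID13_RE.search(line)
    return m.group("msg") if m else None

def classify(msg):
    """Map an Xid 13 message to a hypothetical category label."""
    if "Illegal Instruction Parameter" in msg:
        return "illegal_instruction_parameter"
    if "3D WIDTH ZT Violation" in msg:
        return "3d_width_zt_violation"
    if msg.startswith("Graphics Exception: ESR"):
        return "esr_dump"
    if msg.startswith("Graphics Exception: ChID"):
        return "channel_summary"
    return "other"

# One Metro Exodus line and three Unity lines copied from the posts above.
lines = [
    "kernel: NVRM: Xid (PCI:0000:0c:00): 13, pid='<unknown>', name=<unknown>, Graphics SM Warp Exception on (GPC 1, TPC 5, SM 1): Illegal Instruction Parameter",
    "May 03 05:09:51 kernel: NVRM: Xid (PCI:0000:0c:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception on GPC 0: 3D WIDTH ZT Violation. Coordinates: (0x2c0, 0x0)",
    "May 03 05:09:51 kernel: NVRM: Xid (PCI:0000:0c:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ESR 0x500420=0x80000004 0x500434=0x2c0 0x500438=0x60000 0x50043c=0x0",
    "May 03 05:09:51 kernel: NVRM: Xid (PCI:0000:0c:00): 13, pid=310795, name=Unity, Graphics Exception: ChID 00ce, Class 0000c797, Offset 00000000, Data 00000000",
]

labels = [classify(xid13_message(line)) for line in lines]
print(labels)
```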

@Vortex_Acherontic

In my case, the error is extremely general, in that it happens apparently randomly during the use of any app, or even idle, with a slight bias toward it occurring while video is playing (either in browser or a media player). My laptop is very new, with a very new GPU, which might explain why Xid 109 is occurring so much for me despite not using Proton.

I had major general problems with Linux on this machine (which apparently is common for new hardware), most of which have been cleared up by BIOS updates. All that remains is the lack of S3 suspend and this broken NVIDIA driver.

I was hoping that my anomalous sample point might aid the devs in sniffing out a cause, but the radio silence continues…

Looks like a new version is out …

I think this wording is incorrect: “Fix is only available in driver 520.56.06 so far.” There’s no fix, because this driver was released (October 12, 2022) before the bug was reported (November 26, 2022). It’s a regression that has not been fixed, so there’s nothing to incorporate into the current releases.
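
As a quick sanity check on the timeline, a small Python sketch using only the two dates given in the post above (the variable names are just for illustration):

```python
from datetime import date

# Dates as stated in the post above.
driver_release = date(2022, 10, 12)   # 520.56.06 released
bug_report = date(2022, 11, 26)       # bug first reported

# Positive gap means the driver shipped before the bug was reported,
# so it cannot contain a fix written in response to that report.
gap_days = (bug_report - driver_release).days
released_before_report = driver_release < bug_report
print(gap_days, released_before_report)
```

So the working driver predates the report by about a month and a half.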

see Multiple CUDA/RTX/Vulkan application crashing with Xid (13,109) errors - #46 by amrits

The problem is that no fix exists (unless time travel has been invented), neither in the old version nor in any of the new ones. The version that works does so not because it has some sort of a fix, but because it doesn’t have the regression that causes this issue. It might be a nitpick but I think it’s important to keep expectations in check. People here might think the fix simply needs to be integrated in a newer driver version but in fact the issue needs to be identified first which can take significantly more time.

So you do not think his statement here

is correct?

As I said, it’s poorly worded. The older version working != fix available; you can’t fix an issue before it has even been reported and known. I might not know something, of course, but if an issue appears after a certain version, it’s usually called an unfixed regression and not “fix available in the older version”. If you break things, you don’t usually say that things were fixed before the moment you broke them.

I agree with you. Poorly worded. Maybe it was introduced with 525 and later. Who knows. In that case it is not fixed in older versions either.

@amrits May I ask for an update on the situation? I think it’s fair to say that this issue is rather disruptive.
I would greatly appreciate it.