Multiple CUDA/RTX/Vulkan applications crashing with Xid (13, 109) errors

Hello there, this is a follow-up post to my previous Control (the game) issue, as I am encountering similar issues with Metro Exodus (Linux native game) and Metro Exodus PC Enhanced Edition (Proton + VKD3D).

Nov 26 11:34:25 z004 kernel: NVRM: Xid (PCI:0000:26:00): 109, pid=9664, name=MetroExodus.exe, Ch 00000056, errorString CTX SWITCH TIMEOUT, Info 0x34c027
Nov 26 11:34:21 z004 kernel: NVRM: Xid (PCI:0000:26:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ESR 0x5147b0=0x17000b 0x5147b4=0x0 0x5147a8=0xf812b60 0x5147ac=0x1104
Nov 26 11:34:21 z004 kernel: NVRM: Xid (PCI:0000:26:00): 13, pid='<unknown>', name=<unknown>, Graphics SM Warp Exception on (GPC 2, TPC 0, SM 1): Illegal Instruction Parameter
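
For anyone who wants to compare their own logs against these, here is a minimal Python sketch that pulls the PCI address, the Xid number, and the GPC/TPC/SM location out of lines like the ones above. The regular expressions and the function name `parse_xid_line` are my own assumptions based on the lines quoted here, not an official description of the NVRM log format.

```python
import re

# Pattern inferred from the log lines quoted in this thread (an assumption,
# not an official format definition).
XID_RE = re.compile(
    r"NVRM: Xid \(PCI:(?P<pci>[0-9a-fA-F:.]+)\): (?P<xid>\d+), (?P<detail>.*)"
)
# Location reported by "Graphics SM Warp Exception on (GPC x, TPC y, SM z)".
SM_RE = re.compile(r"GPC (\d+), TPC (\d+), SM (\d+)")


def parse_xid_line(line):
    """Return a dict describing one Xid log line, or None if it is not one."""
    m = XID_RE.search(line)
    if m is None:
        return None
    detail = m.group("detail")
    sm = SM_RE.search(detail)
    return {
        "pci": m.group("pci"),
        "xid": int(m.group("xid")),
        "detail": detail,
        # (GPC, TPC, SM) tuple when the line reports a warp exception.
        "sm_location": tuple(int(x) for x in sm.groups()) if sm else None,
    }


sample = [
    "Nov 26 11:34:25 z004 kernel: NVRM: Xid (PCI:0000:26:00): 109, pid=9664, "
    "name=MetroExodus.exe, Ch 00000056, errorString CTX SWITCH TIMEOUT, Info 0x34c027",
    "Nov 26 11:34:21 z004 kernel: NVRM: Xid (PCI:0000:26:00): 13, pid='<unknown>', "
    "name=<unknown>, Graphics SM Warp Exception on (GPC 2, TPC 0, SM 1): "
    "Illegal Instruction Parameter",
]

for line in sample:
    info = parse_xid_line(line)
    print(info["xid"], info["pci"], info["sm_location"])
```

Feeding it the output of `journalctl -k` or `dmesg` line by line and skipping `None` results should give a quick list of which Xids were logged and on which device.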

What is the root cause here?

Driver: 525.53
CUDA: 12.0
GPU: RTX 3080

Issue is still present with 525.60.11

Nov 30 13:13:25 z004 kernel: NVRM: GPU at PCI:0000:26:00: GPU-6f98b267-20cc-5347-51dc-8bad07fd2ad0
Nov 30 13:13:25 z004 kernel: NVRM: Xid (PCI:0000:26:00): 13, pid='<unknown>', name=<unknown>, Graphics SM Warp Exception on (GPC 2, TPC 0, SM 0): Illegal Instruction Parameter
Nov 30 13:13:25 z004 kernel: NVRM: Xid (PCI:0000:26:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ESR 0x514730=0x1b000b 0x514734=0x0 0x514728=0xf812b60 0x51472c=0x1104
Nov 30 13:13:29 z004 kernel: NVRM: Xid (PCI:0000:26:00): 109, pid=22407, name=MetroExodus.exe, Ch 0000004e, errorString CTX SWITCH TIMEOUT, Info 0x17c027

DE: Gnome 43.1 running on Wayland
Distro: openSUSE MicroOS
Driver: 525.60.11
CUDA: 12.0
Proton: 7.0-4
VKD3D: 2.6.0
Vulkan: 1.3.224
Flatpak Runtime: 22.08
(Running Steam via Flatpak)

I am getting this issue as well with Metro Exodus and the 525.60.11 drivers. It used to work fine on the 520 series.

Similar logs to what @Vortex_Acherontic posted:

Nov 30 11:10:59 kernel: NVRM: Xid (PCI:0000:0c:00): 13, pid='<unknown>', name=<unknown>, Graphics SM Warp Exception on (GPC 3, TPC 3, SM 1): Illegal Instruction Parameter
Nov 30 11:10:59 kernel: NVRM: Xid (PCI:0000:0c:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ESR 0x51dfb0=0x7000b 0x51dfb4=0x0 0x51dfa8=0xf812b60 0x51dfac=0x1104
Nov 30 11:10:59 kernel: NVRM: Xid (PCI:0000:0c:00): 13, pid='<unknown>', name=<unknown>, Graphics SM Warp Exception on (GPC 4, TPC 0, SM 1): Illegal Instruction Parameter
Nov 30 11:10:59 kernel: NVRM: Xid (PCI:0000:0c:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ESR 0x5247b0=0xf000b 0x5247b4=0x0 0x5247a8=0xf812b60 0x5247ac=0x1104
Nov 30 11:11:03 kernel: NVRM: Xid (PCI:0000:0c:00): 109, pid=87503, name=MetroExodus.exe, Ch 0000003e, errorString CTX SWITCH TIMEOUT, Info 0x1cc026
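
In each log excerpt quoted so far, the Xid 109 line is stamped four seconds after the first Xid 13 line. A rough Python sketch for measuring that gap in your own syslog-style output is below. The function names are hypothetical, and it assumes the `Mon DD HH:MM:SS` timestamp prefix seen in these excerpts (which carries no year, so a log spanning New Year would need extra handling).

```python
import re
from datetime import datetime

# Syslog-style timestamp followed by an NVRM Xid entry, as in the excerpts above.
LINE_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2} +\d+ \d\d:\d\d:\d\d) .*NVRM: Xid \([^)]*\): (?P<xid>\d+),"
)


def xid_events(lines):
    """Return (timestamp, xid) pairs sorted by time."""
    events = []
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            # No year in the log, so strptime defaults to 1900; fine for deltas.
            ts = datetime.strptime(m.group("ts"), "%b %d %H:%M:%S")
            events.append((ts, int(m.group("xid"))))
    return sorted(events)


def seconds_from_13_to_109(lines):
    """Seconds between the first Xid 13 and the next Xid 109, or None."""
    events = xid_events(lines)
    first_13 = next((ts for ts, xid in events if xid == 13), None)
    if first_13 is None:
        return None
    first_109 = next(
        (ts for ts, xid in events if xid == 109 and ts >= first_13), None
    )
    if first_109 is None:
        return None
    return (first_109 - first_13).total_seconds()


sample = [
    "Nov 30 11:10:59 kernel: NVRM: Xid (PCI:0000:0c:00): 13, pid='<unknown>', "
    "name=<unknown>, Graphics SM Warp Exception on (GPC 3, TPC 3, SM 1): "
    "Illegal Instruction Parameter",
    "Nov 30 11:11:03 kernel: NVRM: Xid (PCI:0000:0c:00): 109, pid=87503, "
    "name=MetroExodus.exe, Ch 0000003e, errorString CTX SWITCH TIMEOUT, Info 0x1cc026",
]

print(seconds_from_13_to_109(sample))  # 4.0 for the excerpt above
```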

Distro: Arch Linux
Kernel: 6.0.10.arch2-1
Xorg: 21.1.4-1
DE: Gnome 43.1 in X11
Driver: 525.60.11
Proton Experimental

Same issue for a lot of people…

[Mon Dec 19 12:53:43 2022] NVRM: GPU at PCI:0000:01:00: GPU-788f1619-c663-fb6f-56d4-f0e39b292db1
[Mon Dec 19 12:53:43 2022] NVRM: Xid (PCI:0000:01:00): 13, pid='<unknown>', name=<unknown>, Graphics SM Warp Exception on (GPC 2, TPC 0, SM 1): Illegal Instruction Parameter
[Mon Dec 19 12:53:43 2022] NVRM: Xid (PCI:0000:01:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ESR 0x5147b0=0xc000b 0x5147b4=0x0 0x5147a8=0xf812b60 0x5147ac=0x1104
[Mon Dec 19 12:53:47 2022] NVRM: Xid (PCI:0000:01:00): 109, pid=43868, name=MetroExodus.exe, Ch 000000be, errorString CTX SWITCH TIMEOUT, Info 0x50c05a

Distro: Arch Linux
Kernel: 6.0.12-arch1-1
Xorg: 21.1.6-1
DE: KDE Plasma X11
Driver: 525.60.11
nvidia bug report attached

nvidia-bug-report.log.gz (1.6 MB)

525.78, not fixed

I’m getting something similar on 525.78 in Arch:

NVRM: Xid (PCI:0000:42:00): 109, pid=1185, name=Renderer, Ch 00000010, errorString CTX SWITCH TIMEOUT, Info 0x2c004

It consistently freezes my system for about 10-20 seconds whenever I launch a compute process with PyTorch. The last driver version that didn’t have this issue for me was 515.76.

The issue is present with 525.85.05 as well:

Jan 27 10:16:19 z004 kernel: NVRM: GPU at PCI:0000:26:00: GPU-6f98b267-20cc-5347-51dc-8bad07fd2ad0
Jan 27 10:16:19 z004 kernel: NVRM: Xid (PCI:0000:26:00): 13, pid='<unknown>', name=<unknown>, Graphics SM Warp Exception on (GPC 0, TPC 0, SM 0): Illegal Instruction Parameter
Jan 27 10:16:19 z004 kernel: NVRM: Xid (PCI:0000:26:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ESR 0x504730=0x1b000b 0x504734=0x0 0x504728=0xf812b60 0x50472c=0x1104
Jan 27 10:16:23 z004 kernel: NVRM: Xid (PCI:0000:26:00): 109, pid=14282, name=MetroExodus.exe, Ch 00000066, errorString CTX SWITCH TIMEOUT, Info 0x43c040

nvidia-bug-report.log.gz (334.6 KB)

I have filed bug 3959156 internally for tracking purposes.
I will try to reproduce the issue locally and will get back to you if any additional information is needed.

Hi All,
I tried playing Metro Exodus (Linux native game) for around 30 minutes on a couple of notebooks with an RTX 3070 Ti and an RTX 2060, but could not observe any Xid errors.
I would like to know how often the issue reproduces on your end, and whether there is any other way to reproduce it consistently.

The issue happens with the Windows version of Metro Exodus when it’s run through Proton (the log says “name=MetroExodus.exe”). The Windows version runs much smoother than the native one. Previously it worked almost fine, except that you had to disable HairWorks (otherwise it froze in the intro). Now it freezes on the title screen right before showing the main menu, and the log reports the Xid errors shown in the posts above.

My game settings are everything at max except HairWorks, which I disabled.

What I observed, though, is that this issue does not happen if you start Metro Exodus in safe mode (after a crash) or for the first time after installing, then set everything to max (except HairWorks) and start playing without restarting the game.

It happens on the second start, after all settings have been turned up and the game has been shut down entirely.

This happens with both the native Metro Exodus and Metro Exodus PC Enhanced Edition via Proton and VKD3D.

PC Enhanced Edition Settings I get the crash with:

  • Resolution: 1920x1080
  • Quality: Extreme
  • VSync: Full
  • Motion blur: High
  • Raytracing: Ultra
  • NVidia DLSS: Quality
  • Reflections: Raytraced
  • VRS: 4X
  • HairWorks: Off
  • Advanced PhysX: On
  • Tessellation: On
  • Field of View:

Alright … I think I found the issue. For some unknown reason it’s the resolution.
With the above settings at 720p all is fine; setting the resolution to 1080p makes the game crash before the main menu on the next game start.

My desktop config is two 1920x1080 (60 Hz) displays, so my primary resolution is 1080p and I can’t go higher.
So the issue may be that setting the game resolution to the primary desktop resolution crashes it?

Thanks for sharing the information. I am able to reproduce the issue locally now and will keep you posted.

Hi All,
Can you please try with driver 520.56.06 and share the test results?

Redoing the same steps with 520.56.06 worked fine.

  • Started game in safe mode
  • Made settings as outlined above
  • Restarted the game
  • Loaded a save game and walked a few meters

In case it holds any valuable information I also attached the “bug-report” archive for 520.56.06 even though no bug seems to have happened:

nvidia-bug-report_520.56.06.log.gz (294.4 KB)
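
To quickly check which side of this a given machine is on, here is a minimal Python sketch that reads the loaded driver version and compares it with the versions reported in this thread up to this point. The two version sets are just what posters here reported, not an official NVIDIA list. The `/proc/driver/nvidia/version` path and its "NVRM version: ... Kernel Module  <version>" layout are assumptions on my part, and the function names are hypothetical.

```python
import os
import re

# Versions as reported by posters in this thread (not an official list).
REPORTED_WORKING = {"520.56.06"}
REPORTED_AFFECTED = {"525.53", "525.60.11", "525.78", "525.85.05"}

# Assumed location of the loaded kernel module's version string.
VERSION_FILE = "/proc/driver/nvidia/version"


def extract_driver_version(text):
    """Pull a version like 525.60.11 out of the 'NVRM version:' line, if any."""
    for line in text.splitlines():
        if line.startswith("NVRM version:"):
            m = re.search(r"\b(\d{3}\.\d+(?:\.\d+)?)\b", line)
            if m:
                return m.group(1)
    return None


def classify(version):
    """Map a driver version onto what this thread has reported about it."""
    if version in REPORTED_WORKING:
        return "reported working"
    if version in REPORTED_AFFECTED:
        return "reported affected"
    return "not reported in this thread"


if __name__ == "__main__" and os.path.exists(VERSION_FILE):
    with open(VERSION_FILE) as fh:
        version = extract_driver_version(fh.read())
    print(version, "->", classify(version))
```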

Not sure if I’m hitting the same issue, but on a PRIME setup I get:

tom-acer kernel: NVRM: GPU at PCI:0000:01:00: GPU-58e586ab-a95c-b7fb-4f87-143605fb6aa2
tom-acer kernel: NVRM: GPU Board Serial Number: 0
tom-acer kernel: NVRM: Xid (PCI:0000:01:00): 56, pid='<unknown>', name=<unknown>, CMDre 00000001 00000200 00000001 00000005 0000001d

when I try to run Diablo II with the Median XL patches and d2dx (GitHub: bolrog/d2dx, “D2DX is a complete solution to make Diablo II run well on modern PCs, with high fps and better resolutions”). So it is in turn a DX11 title, and I’m running it fullscreen on an external monitor. Windowed, or even just running on the internal display, it works. But as soon as I try to run it fullscreen on the external monitor, this Xid happens and a reboot is required. This is on KWin 5.27 Wayland and NVIDIA 525.89.02. I tried downgrading various things since I think this was working before, but didn’t go as far back as 520.56.06. It can also occur with various other titles when trying to run them fullscreen on the external monitor in Wine.

Yep, I managed to find an old archive of 520.56.06, and it runs the games just fine as well. No Xid 56, but at the point where it usually froze it prints this to dmesg: [drm:nv_drm_fence_context_create_ioctl [nvidia_drm]] ERROR [nvidia-drm] [GPU ID 0x00000100] Failed to allocate fence signaling event. I don’t know if that’s related or just another issue that was simply fixed later.

I’m experiencing a similar issue using an RTX 4090. Training runs with PyTorch start fine but randomly fail anywhere from 1 to 10 hours into training with the Xid 109 CTX SWITCH TIMEOUT error.
The difficult part is that I haven’t found a way to reproduce the issue quickly; it only occurs randomly, usually after an hour or so.

Various configurations I’ve tested:
WSL
Native Ubuntu
Power Limiting GPU to 50%
Limiting memory usage to 50%

Were you able to find a fix for this?

Thanks for sharing the test results. It looks like you are no longer facing the issue with driver 520.56.06.

Thanks, @gulafaran, for sharing the test results. You are no longer experiencing the original issue with driver 520.56.06.
However, you are seeing different error messages. Can you please confirm whether they are consistent, and whether you are seeing any performance drop, application crashes, or any other functional issue?