Multiple CUDA/RTX/Vulkan applications crashing with Xid (13,109) errors

Just had a hard lock for 30 seconds that booted the other players out of my multiplayer game. It feels like the same thing that happens during the CTX Switch Timeouts, but this time it was able to recover. I didn’t really find anything in the journalctl/dmesg logs.

545.29.06 on kernel 6.7.4, RTX 2060 Super, Arch Linux.

nvidia-bug-report.log.gz (425.3 KB)

I see similar behaviour in Warframe: long, hard freezes followed by stutters, but it is able to recover. It always kills networking (or something network-adjacent) and boots the other players if my desktop PC is hosting. My player is usually dead when it recovers, so the freeze is visual only: the game keeps running and playing audio while the picture is frozen.

I tested 550.40.07 with Metro Exodus PC Enhanced Edition and Horizon Zero Dawn, both seem to work fine now.

With Metro the Xid occurred while running the game at the desktop’s native resolution, right before entering the main menu.
With Horizon Zero Dawn the issue could be triggered by running the benchmark at least 2 times, and it sometimes occurred spontaneously in-game.

For other apps/games mentioned throughout this thread I can not give any feedback on the recent driver situation.

OS: openSUSE Aeon
Kernel: 6.7.2-1-default
DE: Gnome 45.3
Wayland Compositor / X11 WM: Mutter 45.3
libwayland-client0: 1.22
libwayland-server0: 1.22
xwayland: 23.2.4

Seeing this issue with World of Warcraft, even on the latest drivers:

OS: Garuda Linux
Kernel: 6.7.4-zen1-1-zen
Resolution: 2560x1440
DE: Plasma 5.27.10
WM: kwin
Wayland
CPU: AMD Ryzen 9 7950X3D (Gaming cores pinned to process)
GPU: NVIDIA GeForce RTX 3080
Driver version: 550.40.07

NVRM: Xid (PCI:0000:01:00): 109, pid=130187, name=WoW.exe, Ch 0000005c, errorString CTX SWITCH TIMEOUT, Info 0x4c059

Was seeing it before the driver patch, still am. Seems unrelated to whether Ray Tracing is enabled in settings. Unpredictable timing, but I think it’s on asset loads of some kind.

Also seeing this on World of Warcraft - after digging through logs a simple dmesg dumped it out:

[104007.416632] NVRM: Xid (PCI:0000:01:00): 109, pid=381348, name=WoW.exe, Ch 00000066, errorString CTX SWITCH TIMEOUT, Info 0x3c02e

OS: Linux Mint 21.3 x86_64
Kernel: 6.6.5-060605-generic
Resolution: 5120x2880, 5120x2880
DE: Cinnamon 6.0.4
WM: Mutter (Muffin)
WM Theme: Mint-Y-Dark-Aqua (Mint-Y)
Theme: Adwaita-dark [GTK2/3]
Terminal: gnome-terminal
CPU: AMD Ryzen 9 7900X (24) @ 5.733G
GPU: NVIDIA GeForce RTX 3080 Lite Ha
Driver version: 550.40.07
GPU: AMD ATI 15:00.0 Device 164e
Memory: 16893MiB / 31202MiB

Feels like it freezes on loading something, as the previous poster mentioned: you get that momentary stutter where you think it’s loading something in, but it just freezes. The more windows I have open on the desktop, the sooner the freeze happens. A reboot may allow it to last a few hours before it happens again. It’s consistent enough that I don’t trust it at all, and I have to reboot to Windows for raid night.

Probably unrelated, but Palworld crashes within 20 seconds of startup, every time after the first run, unless this file is deleted:

<installation_directory>/steamapps/compatdata/1623730/pfx/drive_c/ProgramData/NVIDIA/NGX/models/config/versions/1/files/nvngx_mapping.json

That file has something to do with AI programming, I guess. I only post this because Nvidia NGX may have been changed in or added to the drivers after version 520.56.06; earlier drivers did not produce “CTX Switch Timeout” errors.
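For anyone who wants to script the Palworld workaround instead of deleting the file by hand, here is a minimal Python sketch. The relative path is copied from the post above; the function name `remove_ngx_mapping` and the `dry_run` parameter are made up for illustration, and you have to pass your own Steam library folder as the installation directory.

```python
from pathlib import Path

# Relative path as quoted in the post above (1623730 is the compatdata
# directory named there).
NGX_MAPPING_RELATIVE = Path(
    "steamapps/compatdata/1623730/pfx/drive_c/ProgramData/NVIDIA/NGX/"
    "models/config/versions/1/files/nvngx_mapping.json"
)


def remove_ngx_mapping(installation_directory, dry_run=True):
    """Delete nvngx_mapping.json under the given Steam library folder.

    Hypothetical helper: returns the file's path if it exists (deleting it
    only when dry_run is False), or None if there is nothing to delete.
    """
    target = Path(installation_directory) / NGX_MAPPING_RELATIVE
    if not target.is_file():
        return None
    if not dry_run:
        target.unlink()
    return target
```

Calling `remove_ngx_mapping("/path/to/your/steam/library")` only reports whether the file is there; pass `dry_run=False` to actually delete it before launching the game.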

Cannot use Linux for anything GPU heavy reliably in the last month or so…

Example of errors, always Xid 109:
[31013.396308] NVRM: Xid (PCI:0000:01:00): 109, pid=168149, name=r5apex_dx12.exe, Ch 00000076, errorString CTX SWITCH TIMEOUT, Info 0x3c046
[ 2823.363202] NVRM: Xid (PCI:0000:01:00): 109, pid=23382, name=cs2, Ch 000000b6, errorString CTX SWITCH TIMEOUT, Info 0x25c05d
NVRM: Xid (PCI:0000:01:00): 109, pid=‘’, name=, Ch 000000a6, errorString CTX SWITCH TIMEOUT, Info 0x26c058
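Since several of us are pasting the same kind of line, here is a rough Python sketch for pulling the fields out of dmesg output and counting events per process. The regular expression is modelled only on the Xid 109 lines quoted in this thread, so it is an assumption that other Xid codes follow the same layout; `parse_xid_events` is a name I made up.

```python
import re
from collections import Counter

# Pattern modelled on the Xid 109 lines quoted in this thread. Other Xid
# codes may be formatted differently, so treat this as an assumption.
XID_RE = re.compile(
    r"NVRM: Xid \((?P<pci>PCI:[0-9a-fA-F:.]+)\): (?P<xid>\d+), "
    r"pid=(?P<pid>[^,]*), name=(?P<name>[^,]*), "
    r"Ch (?P<channel>[0-9a-fA-F]+), "
    r"errorString (?P<error>.+?), Info (?P<info>0x[0-9a-fA-F]+)"
)


def parse_xid_events(log_text):
    """Return one dict per matching Xid line in the given log text."""
    events = []
    for line in log_text.splitlines():
        match = XID_RE.search(line)
        if match is None:
            continue
        event = match.groupdict()
        event["xid"] = int(event["xid"])
        # Some lines carry an empty or quoted-empty pid and no process name.
        event["pid"] = event["pid"].strip("'\u2018\u2019") or None
        event["name"] = event["name"] or None
        events.append(event)
    return events


SAMPLE = """\
[31013.396308] NVRM: Xid (PCI:0000:01:00): 109, pid=168149, name=r5apex_dx12.exe, Ch 00000076, errorString CTX SWITCH TIMEOUT, Info 0x3c046
[ 2823.363202] NVRM: Xid (PCI:0000:01:00): 109, pid=23382, name=cs2, Ch 000000b6, errorString CTX SWITCH TIMEOUT, Info 0x25c05d
NVRM: Xid (PCI:0000:01:00): 109, pid='', name=, Ch 000000a6, errorString CTX SWITCH TIMEOUT, Info 0x26c058
"""

events = parse_xid_events(SAMPLE)
by_process = Counter(e["name"] or "<unknown>" for e in events)
for name, count in by_process.items():
    print(f"{name}: {count} Xid event(s)")
```

To run it on a real system, replace `SAMPLE` with the saved output of `dmesg` or `journalctl -k`; saying which process names show up makes it easier to tell the different reports in this thread apart.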

Can consistently reproduce by playing ~1-2 games of CS2 Arms Race; the map Baggage will crash 90% of the time mid-game after a few minutes. It has also occurred in compute-heavy AI workloads and in games like Apex Legends running through Proton (once Apex crashes after 10-45 mins, the game will not run for longer than 5 without another Xid 109 happening). Occasionally X11/KDE Plasma won’t recover from the crash and a hard reboot is required.

Attempts to Debug:

I have tried 545.29.06, the beta 550.40.07, and the latest Vulkan Dev driver (535.43.09), and the issue happens either way. I have turned ReBar on and off, ensured power management stuff is all in order, and tried some other random tweaks. I went back to the stock latest Nvidia install from the main Arch repos with the default config; the issue is still there and easily reproduced. When the crash happens the screen freezes but audio, etc. continues to play in the background, and most of the time it takes ~15 seconds for the system to recover enough to alt-tab or switch terminals, with a hard (reset button) restart required occasionally. Sometimes in Proton apps the screen will freeze, then render a few frames after a few seconds, then freeze again, always with Xid 109 in dmesg after the crash. This happens independent of whether an app is run with DX11 or DX12 in Proton (all translated to Vulkan in the end), and with native Vulkan games like CS2. I have only had it happen during CUDA loads a few times, but I have not done any compute work lately.

Bug report attached! I ran the bug tool immediately after reproducing the crash issue.

I would really like to use my GPU again, so anything else I can do to help solve this would be greatly appreciated.

System info:
Arch Linux kernel 6.7.5,
Nvidia Driver v.545.29.06
Plasma 5.27.10 through KWin
i7-12700k,
RTX 3090
MSI Z690A, 32gb DDR5,
nvidia-bug-report.log.gz (937.6 KB)

For me installing the 550 Beta driver solved so many issues. Maybe worth a shot trying this driver?

I have had many freezes using the 550 drivers. There were no Xid error logs in the registry either, in fact there was no nvidia log. The only thing I could do is turn off the computer by pressing the power off button. I think the problem appears when there is some suspend/resume in between.

Rolling back to version 545 resolves the issue.

I tried it (550 beta) and could still reproduce easily.

Thanks! I just successfully traveled twice in a row. I think the issue with Forza Horizon 4 is resolved on 550.54.14.

Still can reproduce issue consistently on latest (550.54.14) driver.

There have been so many applications mentioned throughout this thread that you need to specify which applications you can still reproduce the issue with.

Edit: For example, I can confirm that the issue no longer occurs with InvokeAI, Upscayl, Metro Exodus, Resident Evil 2, Resident Evil Village, Control, or Horizon Zero Dawn, with RTX enabled in every game that supports it.
I should also mention that InvokeAI worked fine even with 545, while other PyTorch/AI tools mentioned in this thread triggered the issue.

Alan Wake II still crashes almost instantly with an Xid 109 error on the latest Fedora Xfce with the official 550 driver. Logs were already provided somewhere above, but from a different distribution (Arch).

So I ran some more intensive tests with the latest stable driver on multiple applications that use RTX, DX12 (and therefore VKD3D on Linux), or CUDA.
At least in my tests, no Xid was triggered.

Specs:

  • Driver Version: 550.54.14
  • CUDA Version: 12.4
  • Vulkan: 1.3.277
  • Kernel: 6.7.5-1-default
  • OS: openSUSE Aeon
  • Gnome 45.3
  • Display Server: Wayland
  • GPU: NVIDIA GeForce RTX 3080 Lite Hash Rate
  • RAM: 16 GB + ZRam
  • Primary Resolution: 1920x1080

Tests I ran and in which everything was okay:

  • The Ascent
    • RTX: Enabled
      • Reflections
      • Shadows
      • Ambient Occlusion
      • DLSS enabled
    • GPU load: 100%
    • VRAM: ~ 6GiB
    • VSync: Off
  • Bright Memory: Infinite
    • RTX: Enabled
      • Preset: High
    • VSync: Disabled
    • VRAM: ~ 7GiB
    • GPU usage: 100%
  • The Callisto Protocol
    • RTX: Enabled
      • Shadows: Enabled
      • Reflections: High
      • Transmissions: Enabled
    • VSync: Off
    • GPU Load: 100%
    • VRAM: ~ 8 GiB
    • Upscaling: Disabled
  • Horizon Zero Dawn
    • Graphic Preset: Custom (All maxed out)
    • VSync: Disabled
    • Upscaling: None
    • GPU Load: 100%
    • VRAM: ~ 9 GiB
  • Metro Exodus: PC Enhanced Edition
  • Control
    • RTX: Enabled
    • Preset: Highest
  • Resident Evil 2 Remake
    • RTX: Enabled
    • VSync: Off
    • GPU usage: 100%
    • VRAM: ~ 9.9 GiB
  • Resident Evil Village
    • RTX: Enabled
    • Preset: Raytracing
    • VSync: Off
    • GPU usage: 100%
    • VRAM: ~ 8 GiB
  • InvokeAI
    • 200+ images generated
  • Upscayl
    • 2 images upscaled from 1024 to 4096

That at least covers the apps and games that originally led me to open this thread.

Because of the crowded responses and the increasing difficulty of keeping track of every app, game and use case mentioned throughout this thread, I advise users to open separate bug reports.
The remaining cases might be application-, distribution- or driver-packaging-specific issues, so there is no need to keep this thread alive forever.

From now until 10th March (two weeks) I will keep this thread open for further responses. Otherwise I’ll mark the issue as fixed as of the 550.40.07 beta and 550.54.14 stable driver releases.

Still seeing this issue with World of Warcraft and the latest stable driver, with and without Ray Tracing active, it seems more related to asset loading - perhaps it loads RT assets regardless of the setting?

Notably, this thread is by far the highest Google result for me regarding Xid 109 issues, and getting SEO for other threads to the top would be difficult. Closing the thread may bury bug reports and discussion regarding this issue for some time.

I am not very familiar with this forum. Perhaps a linked topic would be sufficient upon closing of the thread?

The issue I see is like crashing a car. There are multiple ways of doing so, hence the root cause is different, but the result, a crashed car, is the same.

For me it looks like this: all the reports of users getting Xid 13 and/or 109 are the crashed car, but the actual cause differs.

As a result, bug reports are already being buried in this thread.

Alan Wake 2 will be fixed in our next driver release.

We found a synchronization issue related to ray tracing workloads that could apply to other ray tracing titles.

Yeeeehaw!!! So much looking forward to that.
Good work!
