Tried version 535.43.13; it acts identically to the 535.113.xx driver I switched from (besides the nouveau driver trying to smother it to death at boot time). Games show their various splash screens and I hear menu sound, but the screen goes black (from whatever splash screen it happens to have displayed). I can get “ingame” in TF2, but it throws a few Xid 13 errors, and I get below 1 fps.
nvidia-bug-report.log.gz (1.1 MB)
Hi All,
Does anyone know the last passing driver where Horizon Zero Dawn did not freeze?
@amrits FWIW
jorgicio commented on Aug 18
Drivers from 525 onwards broke compatibility with the game. I tried the 520 version, but it gave me some glitches, like a blue tone and black rendering. I had to downgrade to the 515 version and use an LTS kernel to make it work, and now it works flawlessly.
ChaosBlades commented on Aug 18: I think it was working great with drivers 525 or 520 (…)
520, because the freezing issue starts with driver 525. I have tested this myself several times from that version onwards.
@amrits More people commented in the Proton’s issue: this comment and below. They seem to confirm that the freezing issues start with 525. Also, DLSS was probably again implicated in the problems, with one person reporting no problems on 535.113.01 with DLSS off, but almost immediate freeze once DLSS is turned on. There’s also a possible report of a reliably reproducible freeze in this report if that helps.
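Since the reports above hinge on which driver series is "before" or "after" 525, here is a minimal sketch of comparing the version strings quoted in this thread numerically rather than as text. The helper names and the 525 threshold constant are my own assumptions for illustration (the threshold only reflects what commenters reported here, not anything confirmed by NVIDIA).

```python
def parse_driver_version(text):
    """Split a version such as '535.113.01' into a tuple of ints: (535, 113, 1)."""
    return tuple(int(part) for part in text.split("."))


# Assumption: first series that commenters in this thread reported as freezing.
FIRST_REPORTED_BAD = (525,)


def reported_affected(version):
    """True if the version is at or after the series reported as affected."""
    return parse_driver_version(version) >= FIRST_REPORTED_BAD


# Versions that appear in this thread, deliberately out of order.
versions = ["535.113.01", "545.29.02", "520.56.06", "535.43.13"]

# Numeric ordering puts 535.43.13 before 535.113.01; plain string sorting would not.
ordered = sorted(versions, key=parse_driver_version)

for version in ordered:
    print(version, "reported affected" if reported_affected(version) else "reported OK")
```

Comparing tuples of integers avoids the trap that "535.113.01" sorts before "535.43.13" as a string even though it is the newer build number.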
I have this game. I think it is possible that DLSS is a contributing factor, but I still get freezes without DLSS enabled. Based on my limited experimentation, it seems that turning on motion blur in settings significantly increases the chance of (and reduces the time taken until) a freeze.
Unfortunately, I cannot say which driver didn’t have this issue, as I only started playing the game with driver 535.
After tweaking the settings, I am sometimes able to get several hours of play without a crash (but a crash can still happen at any time)
Hey @dletone,
Is there a GRID 16.1 variant of this driver? I tried to install it on the guest VM, but it does not work and reports that it is incompatible.
Thanks.
@amrits With 545.23.06 on 3080 12GB I can 100% reproducibly trigger a Xid 109 with Witcher 3 on new game start if and only if ray-traced reflections are enabled. The error occurs right after the cutscenes are skipped with Space, just before the player is given control over the character.
Note that if RT reflections are turned off, they can be turned back on without immediate consequences right after the player is given control of the character, so whatever triggers the crash happens during that “fade in” after the cutscenes.
I can reproduce this in Witcher 3 as well. However, unlike for @kerberizer, this doesn’t usually happen after cutscenes. (Though I did not try to disable ray-traced reflections.)
For me it happens almost every single time after I reload a save (when I die, for example). It also sometimes happens when I open the map/inventory menu, and more rarely randomly during the game.
One thing that might or might not be relevant is that the game only works for me when using GE-Proton7-55. When using any more recent version of GE-Proton, or any version of Proton at all, the game will freeze as soon as I start a game (either when loading a save or when starting a new game, in which case it will freeze after the initial cutscenes). I didn’t check the logs back then, before I figured out the (almost) working Proton version, but I could try to reproduce it again and take a look to make sure that it is indeed the same problem.
However, even when using the mentioned Proton version I get the freezes really often: almost every time I reload a save/die, and often when I (try to) open the map. On these I can confirm that the error shown in the logs is the same switch timeout tracked here.
RTX 4090, using proprietary NVidia drivers, openSUSE Tumbleweed, happens both on X11 (XFCE+Compiz) and on Wayland (Wayfire).
Thanks, @kerberizer, for the exact, reliable repro steps. We were able to see the same Xid error with Witcher 3 when ray-traced reflections are enabled.
The team will debug it now and will keep you updated.
I get the same Xid 31 error as others in Horizon Zero Dawn with the same dmesg output. The interesting thing is that the game seems to run pretty fine with a 30FPS cap, but more than that causes a freeze and that Xid 31 error.
EDIT: The game only runs fine at 30 FPS for some time, until it decides to freeze again with the same exact Xid 31 error.
Found that Ryujinx crashes with a similar error, though I don’t believe this to be as easily reproducible as Horizon Zero Dawn.
[ 1089.431790] NVRM: GPU at PCI:0000:01:00: GPU-5ac189b0-f681-9e18-5628-98e3f14b3fcd
[ 1089.431793] NVRM: Xid (PCI:0000:01:00): 31, pid=6549, name=GUI.RenderLoop, Ch 00000036, intr 00000000. MMU Fault: ENGINE CE0 HUBCLIENT_CE1 faulted @ 0xde_8d690000. Fault is of type FAULT_PDE ACCESS_TYPE_VIRT_WRIT
[ 1290.093480] NVRM: Xid (PCI:0000:01:00): 31, pid=9789, name=GUI.RenderLoop, Ch 00000046, intr 00000000. MMU Fault: ENGINE GRAPHICS GPCCLIENT_PROP_0 faulted @ 0xde_998c0000. Fault is of type FAULT_PTE ACCESS_TYPE_VIRT_READ
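In case it helps anyone compare their own Xid 31 lines with the two above, here is a rough sketch that pulls the engine, client, fault address and fault type out of such a line. The regex is inferred only from these two samples, so it is an assumption that other driver versions print the same layout, and I am also assuming the underscore in the address is just a digit separator. The function name is made up for this example.

```python
import re

# Layout inferred from the two Xid 31 lines quoted above (assumption:
# other driver versions may format the message differently).
MMU_FAULT_RE = re.compile(
    r"Xid \((?P<pci>PCI:[0-9a-fA-F:]+)\): 31, "
    r"pid=(?P<pid>\d+), name=(?P<name>[^,]+), "
    r"Ch (?P<channel>[0-9a-fA-F]+).*?"
    r"MMU Fault: ENGINE (?P<engine>\S+) (?P<client>\S+) "
    r"faulted @ (?P<address>0x[0-9a-fA-F_]+)\. "
    r"Fault is of type (?P<fault_type>\S+) (?P<access_type>\S+)"
)


def parse_mmu_fault(line):
    """Return the fields of an Xid 31 MMU fault line as a dict, or None."""
    match = MMU_FAULT_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Assumption: the underscore is only a separator inside the hex address.
    fields["address"] = int(fields["address"].replace("_", ""), 16)
    fields["pid"] = int(fields["pid"])
    return fields


# The two lines from the post above, copied as-is (the first one is cut off
# at "ACCESS_TYPE_VIRT_WRIT" in the original paste).
SAMPLE_LINES = [
    "[ 1089.431793] NVRM: Xid (PCI:0000:01:00): 31, pid=6549, name=GUI.RenderLoop, "
    "Ch 00000036, intr 00000000. MMU Fault: ENGINE CE0 HUBCLIENT_CE1 faulted @ "
    "0xde_8d690000. Fault is of type FAULT_PDE ACCESS_TYPE_VIRT_WRIT",
    "[ 1290.093480] NVRM: Xid (PCI:0000:01:00): 31, pid=9789, name=GUI.RenderLoop, "
    "Ch 00000046, intr 00000000. MMU Fault: ENGINE GRAPHICS GPCCLIENT_PROP_0 faulted @ "
    "0xde_998c0000. Fault is of type FAULT_PTE ACCESS_TYPE_VIRT_READ",
]

faults = [parse_mmu_fault(line) for line in SAMPLE_LINES]
for fault in faults:
    print(fault["engine"], fault["client"], hex(fault["address"]),
          fault["fault_type"], fault["access_type"])
```

For these two samples it makes it easy to see that the faults came from different engines (CE0 and GRAPHICS) with different fault types (FAULT_PDE and FAULT_PTE).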
After no problems with the 535.113 series, a recent (K)Ubuntu 23.10 update brought 535.129. Crashes at launch started appearing with Metro Exodus Enhanced (Proton). Now I get:
2023-11-03T14:28:44.789717+02:00 odysei-desktop kernel: [ 501.375229] NVRM: GPU at PCI:0000:0b:00: GPU-ee295651-91b4-21c4-0f15-90667263b594
2023-11-03T14:28:44.789726+02:00 odysei-desktop kernel: [ 501.375234] NVRM: Xid (PCI:0000:0b:00): 13, pid='<unknown>', name=<unknown>, Graphics SM Warp Exception on (GPC 2, TPC 2, SM 0): Illegal Instruction Parameter
2023-11-03T14:28:44.789728+02:00 odysei-desktop kernel: [ 501.375245] NVRM: Xid (PCI:0000:0b:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ESR 0x515730=0x13000b 0x515734=0x0 0x515728=0xf812b60 0x51572c=0x1104
2023-11-03T14:28:44.825712+02:00 odysei-desktop kernel: [ 501.409701] NVRM: Xid (PCI:0000:0b:00): 13, pid='<unknown>', name=<unknown>, Graphics SM Warp Exception on (GPC 3, TPC 1, SM 1): Illegal Instruction Parameter
2023-11-03T14:28:44.825717+02:00 odysei-desktop kernel: [ 501.409713] NVRM: Xid (PCI:0000:0b:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ESR 0x51cfb0=0x13000b 0x51cfb4=0x0 0x51cfa8=0xf812b60 0x51cfac=0x1104
2023-11-03T14:28:54.229713+02:00 odysei-desktop kernel: [ 510.814065] NVRM: Xid (PCI:0000:0b:00): 109, pid=10363, name=MetroExodus.exe, Ch 0000009e, errorString CTX SWITCH TIMEOUT, Info 0x1c0b8
2023-11-03T14:28:54.229722+02:00 odysei-desktop kernel: [ 510.814065]
2023-11-03T14:53:43.893716+02:00 odysei-desktop kernel: [ 2000.481604] NVRM: Xid (PCI:0000:0b:00): 13, pid='<unknown>', name=<unknown>, Graphics SM Warp Exception on (GPC 0, TPC 3, SM 1): Illegal Instruction Parameter
2023-11-03T14:53:43.893726+02:00 odysei-desktop kernel: [ 2000.481630] NVRM: Xid (PCI:0000:0b:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ESR 0x505fb0=0x11000b 0x505fb4=0x0 0x505fa8=0xf812b60 0x505fac=0x1104
2023-11-03T14:53:53.361723+02:00 odysei-desktop kernel: [ 2009.950004] NVRM: Xid (PCI:0000:0b:00): 109, pid=12778, name=MetroExodus.exe, Ch 0000009e, errorString CTX SWITCH TIMEOUT, Info 0x1c0bb
Running on RTX 4070.
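The log above shows a burst of Xid 13 lines followed a few seconds later by an Xid 109. A small sketch like the following could measure that gap from the kernel uptime stamps in square brackets. The regex and function name are my own, written against the lines in this post only, so treat it as an assumption that it fits other logs.

```python
import re

# Matches the kernel uptime stamp and the Xid number, e.g.
# "[ 501.375234] NVRM: Xid (PCI:0000:0b:00): 13, ..."
XID_RE = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+NVRM: Xid \([^)]*\): (?P<xid>\d+),"
)


def seconds_from_xid13_to_xid109(lines):
    """For each Xid 109, return the seconds since the most recent Xid 13."""
    delays = []
    last_xid13 = None
    for line in lines:
        match = XID_RE.search(line)
        if match is None:
            continue  # not an Xid line (e.g. the "GPU at PCI" line)
        uptime = float(match.group("uptime"))
        xid = int(match.group("xid"))
        if xid == 13:
            last_xid13 = uptime
        elif xid == 109 and last_xid13 is not None:
            delays.append(uptime - last_xid13)
            last_xid13 = None
    return delays


# Lines from the post above with the syslog prefix removed.
LOG = [
    "[ 501.375234] NVRM: Xid (PCI:0000:0b:00): 13, pid='<unknown>', name=<unknown>, Graphics SM Warp Exception on (GPC 2, TPC 2, SM 0): Illegal Instruction Parameter",
    "[ 501.375245] NVRM: Xid (PCI:0000:0b:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ESR 0x515730=0x13000b 0x515734=0x0 0x515728=0xf812b60 0x51572c=0x1104",
    "[ 501.409701] NVRM: Xid (PCI:0000:0b:00): 13, pid='<unknown>', name=<unknown>, Graphics SM Warp Exception on (GPC 3, TPC 1, SM 1): Illegal Instruction Parameter",
    "[ 501.409713] NVRM: Xid (PCI:0000:0b:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ESR 0x51cfb0=0x13000b 0x51cfb4=0x0 0x51cfa8=0xf812b60 0x51cfac=0x1104",
    "[ 510.814065] NVRM: Xid (PCI:0000:0b:00): 109, pid=10363, name=MetroExodus.exe, Ch 0000009e, errorString CTX SWITCH TIMEOUT, Info 0x1c0b8",
    "[ 2000.481604] NVRM: Xid (PCI:0000:0b:00): 13, pid='<unknown>', name=<unknown>, Graphics SM Warp Exception on (GPC 0, TPC 3, SM 1): Illegal Instruction Parameter",
    "[ 2000.481630] NVRM: Xid (PCI:0000:0b:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ESR 0x505fb0=0x11000b 0x505fb4=0x0 0x505fa8=0xf812b60 0x505fac=0x1104",
    "[ 2009.950004] NVRM: Xid (PCI:0000:0b:00): 109, pid=12778, name=MetroExodus.exe, Ch 0000009e, errorString CTX SWITCH TIMEOUT, Info 0x1c0bb",
]

delays = seconds_from_xid13_to_xid109(LOG)
for delay in delays:
    print(f"Xid 109 followed the last Xid 13 after {delay:.3f} s")
```

In this sample both hangs were logged a little over nine seconds after the last Xid 13, which at least suggests the two crashes followed the same sequence.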
Can confirm: since 535.129.03, Metro Exodus PC Enhanced Edition, which worked fine before, is now broken again.
VKD3D-Proton 2.10
Proton Experimental: 8.0-20231019
RTX 3080
Kernel: 6.5.9-1-default
Nov 03 17:54:05 kleinerpopel kernel: NVRM: GPU at PCI:0000:26:00: GPU-6f98b267-20cc-5347-51dc-8bad07fd2ad0
Nov 03 17:54:05 kleinerpopel kernel: NVRM: Xid (PCI:0000:26:00): 13, pid='<unknown>', name=<unknown>, Graphics SM Warp Exception on (GPC 0, TPC 2, SM 0): Illegal Instruction Parameter
Nov 03 17:54:05 kleinerpopel kernel: NVRM: Xid (PCI:0000:26:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ESR 0x505730=0x13000b 0x505734=0x0 0x505728=0xf812b60 0x50572c=0x1104
Nov 03 17:54:09 kleinerpopel kernel: NVRM: Xid (PCI:0000:26:00): 109, pid=8273, name=MetroExodus.exe, Ch 00000036, errorString CTX SWITCH TIMEOUT, Info 0x1fc01c
nvidia-bug-report.log.gz (976.2 KB)
I don’t know how useful an additional data point is, since we are fast approaching the one-year anniversary of this bug and three major driver releases, but on an RTX 3070 The Last of Us also hangs with Xid 109 (CTX SWITCH TIMEOUT, Ch 000000a6, Info 0x3c050) extremely frequently with either the 535 (535.113.01) or 545 (545.29.02) drivers. It happens almost always (but not exclusively) after transitioning out of a cutscene. The version of Proton or vkd3d makes no difference. I remember a similar issue happening with Uncharted 4 (which presumably uses the same-ish engine) many months ago with 530.41, but it was much less prevalent. The Last of Us is practically unplayable, except maybe by setting all graphics options to low (low textures, no ambient occlusion, no shadows, no reflections). Between Vulkan and GBM issues, it’s so tiring trying to figure out what’s wrong. Granted, the graphics landscape on Linux is moving fast, so I’m certain it’s not an easy issue to tackle, but it’s still extremely disappointing having to fight the driver at every new release.
nvidia-bug-report.log.gz (585.6 KB)
Color me annoyed.
I’ve had no problems of any kind with this card in any application. I run BOINC applications with CUDA. I run AAA games with Proton and Lutris.
Now I try playing Alan Wake 2 with Lutris and wine-ge-8-22-x86_64. That goes well until about ten minutes in and I learn all about Xid 109.
Color me supremely annoyed. Nvidia is worth it for the shiny RTX stuff this time, I thought. Nvidia has come a long way on Linux, I thought. Surely I won’t be burned again by an issue only Nvidia can fix, I thought.
Evidently this is the last time I’ll suffer such a casualty of thought.
So here are a few hints.
- I first saw this bug on driver 545.29.02, and collected a bug report (attached).
- I downgraded to 535.113.01, still saw the issue, and eventually collected a bug report (attached).
- I downgraded to 520.56.06, the last version of the driver known not to be affected, but it turns out this is old enough not to support mesh shaders, which absolutely destroys any usefulness it might have for the purpose of running the game I’ve been waiting 13 years to play (for the uninitiated: AW2 practically requires them for frame rates above 2 per second). Edit for emphasis: this means there is no playable version of the driver for this game.
- The only thing I noticed that affected the outcome was – possibly – enabling “NVIDIA Prime Render Offload” in my Lutris runner system options. This shouldn’t have had any effect, since I’m running on a desktop and I only have one discrete graphics adapter, but somehow with it switched on I was able to get through an hour of the first chapter without the Xid 109 timeout occurring. Until it did again, and then every ten minutes of game time thereafter, so what do I know.
I would say “don’t do this to me again or I’ll stop spending money at you,” but we’re already here. My graphics adapter may as well be a paperweight in this situation, and you might be able to imagine how much it cost.
Here are the crunchy bits. Have fun.
$ dmesg -T | grep Xid
[Sun Nov 12 13:57:06 2023] NVRM: Xid (PCI:0000:0a:00): 109, pid=47940, name=AlanWake2.exe, Ch 000000a6, errorString CTX SWITCH TIMEOUT, Info 0x2c037
[Sun Nov 12 14:09:42 2023] NVRM: Xid (PCI:0000:0a:00): 109, pid=54824, name=AlanWake2.exe, Ch 000000a6, errorString CTX SWITCH TIMEOUT, Info 0x2c037
[Sun Nov 12 14:19:55 2023] NVRM: Xid (PCI:0000:0a:00): 109, pid=58276, name=AlanWake2.exe, Ch 000000b6, errorString CTX SWITCH TIMEOUT, Info 0x2c057
[Sun Nov 12 17:51:39 2023] NVRM: Xid (PCI:0000:0a:00): 109, pid=150125, name=AlanWake2.exe, Ch 000000b6, errorString CTX SWITCH TIMEOUT, Info 0x2c037
[Sun Nov 12 17:59:26 2023] NVRM: Xid (PCI:0000:0a:00): 109, pid=157960, name=AlanWake2.exe, Ch 000000a6, errorString CTX SWITCH TIMEOUT, Info 0x2c037
[Sun Nov 12 18:14:09 2023] NVRM: Xid (PCI:0000:0a:00): 109, pid=162214, name=AlanWake2.exe, Ch 000000a6, errorString CTX SWITCH TIMEOUT, Info 0x2c037
$ uname -a
Linux pygoscelis 6.1.60-gentoo-dist #1 SMP PREEMPT_DYNAMIC Sat Nov 11 04:16:40 EST 2023 x86_64 AMD Ryzen 7 2700X Eight-Core Processor AuthenticAMD GNU/Linux
$ lspci | grep -i vga
0a:00.0 VGA compatible controller: NVIDIA Corporation GA102 [GeForce RTX 3090 Ti] (rev a1)
$ nvidia-smi
Sun Nov 12 18:23:32 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.113.01             Driver Version: 535.113.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3090 Ti     Off | 00000000:0A:00.0  On |                  Off |
| 32%   55C    P0             112W / 450W |    851MiB / 24564MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
nvidia-bug-report.545.29.02.log.gz (703.0 KB)
nvidia-bug-report.535.113.01.log.gz (626.0 KB)
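For anyone who wants to check the "every ten minutes" pattern against their own `dmesg -T | grep Xid` output, here is a minimal sketch that works out the gaps between consecutive Xid 109 events and tallies the channel and Info values. The regex and names are my own and are based only on the six lines in the post above. Keep in mind that `dmesg -T` timestamps can drift after suspend/resume, so the gaps are approximate.

```python
import re
from collections import Counter
from datetime import datetime

# Layout taken from the `dmesg -T` lines in the post above (assumption:
# other setups may print additional or different fields).
XID109_RE = re.compile(
    r"\[(?P<when>[^\]]+)\] NVRM: Xid \([^)]*\): 109, "
    r"pid=(?P<pid>\d+), name=(?P<name>[^,]+), "
    r"Ch (?P<channel>[0-9a-fA-F]+), errorString (?P<error>[^,]+), "
    r"Info (?P<info>0x[0-9a-fA-F]+)"
)


def parse_xid109(lines):
    """Return one dict per Xid 109 line, with 'when' parsed as a datetime."""
    events = []
    for line in lines:
        match = XID109_RE.search(line)
        if match is None:
            continue
        event = match.groupdict()
        # dmesg -T prints English day/month names; Python's default "C"
        # locale parses them with %a and %b.
        event["when"] = datetime.strptime(event["when"], "%a %b %d %H:%M:%S %Y")
        events.append(event)
    return events


LOG = [
    "[Sun Nov 12 13:57:06 2023] NVRM: Xid (PCI:0000:0a:00): 109, pid=47940, name=AlanWake2.exe, Ch 000000a6, errorString CTX SWITCH TIMEOUT, Info 0x2c037",
    "[Sun Nov 12 14:09:42 2023] NVRM: Xid (PCI:0000:0a:00): 109, pid=54824, name=AlanWake2.exe, Ch 000000a6, errorString CTX SWITCH TIMEOUT, Info 0x2c037",
    "[Sun Nov 12 14:19:55 2023] NVRM: Xid (PCI:0000:0a:00): 109, pid=58276, name=AlanWake2.exe, Ch 000000b6, errorString CTX SWITCH TIMEOUT, Info 0x2c057",
    "[Sun Nov 12 17:51:39 2023] NVRM: Xid (PCI:0000:0a:00): 109, pid=150125, name=AlanWake2.exe, Ch 000000b6, errorString CTX SWITCH TIMEOUT, Info 0x2c037",
    "[Sun Nov 12 17:59:26 2023] NVRM: Xid (PCI:0000:0a:00): 109, pid=157960, name=AlanWake2.exe, Ch 000000a6, errorString CTX SWITCH TIMEOUT, Info 0x2c037",
    "[Sun Nov 12 18:14:09 2023] NVRM: Xid (PCI:0000:0a:00): 109, pid=162214, name=AlanWake2.exe, Ch 000000a6, errorString CTX SWITCH TIMEOUT, Info 0x2c037",
]

events = parse_xid109(LOG)

# Seconds between each hang and the next one.
gaps_seconds = [
    int((later["when"] - earlier["when"]).total_seconds())
    for earlier, later in zip(events, events[1:])
]
info_counts = Counter(event["info"] for event in events)
channel_counts = Counter(event["channel"] for event in events)

print("gaps (s):", gaps_seconds)
print("Info values:", dict(info_counts))
print("channels:", dict(channel_counts))
```

On the six lines above, most gaps fall between roughly 8 and 15 minutes (the one long gap is presumably a break between play sessions), and five of the six events carry the same Info value.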
I’ve been hitting this on two different Manjaro boxes for quite some time now.
Just in case someone at Nvidia is wondering: Oh my god this sucks.
I’m pretty done with Nvidia as well at this point; I can’t continue to support them with how unbearably slow (and in many cases non-existent) driver support is for every issue I encounter. I’ve already just bought a new AMD card.
I wish it didn’t have to be this way but good riddance; hopefully this company will stop dragging its feet one day but I don’t see the light at the end of that tunnel. Refusing to open-source the driver when support is this bad is insane.
The CTX SWITCH TIMEOUT error is also happening to me with The Last of Us with Proton. Linux driver 535.129.03, kernel debian 6.5.0-4-amd64, GPU RTX 3070.
I’ve tried a whole bunch of things, like various Proton versions (8.0-4, experimental, GE 8-23) and using VKD3D_CONFIG=single_queue, but nothing really changes: it always happens every 20 minutes or so.
I also tried the 525 series: same issue.
Is there anything we can do to help debug this?
edit: I tried the driver mentioned here: Multiple CUDA/RTX/Vulkan application crashing with Xid (13,109) errors - #178 by dleone. It’s worse: the CTX SWITCH TIMEOUT error always happens right after the launch logo. At least it’s an easy reproducer! cc @dleone
An update in my case, specifically involving Alan Wake 2. Spoiler-free details follow.
Reproducing the Xid 109 timeout was achievable with the game running on any combination of settings that included active ray-tracing options. On a new game, it is likely that the timeout will occur well before the player reaches the end of the introductory sequence. If the player makes it to the first cutscene and begins the first chapter, for me the timeout would always occur shortly after being introduced to the player character’s Tab-menu functions and always before making it down the first flight of stairs. No more than 15 minutes of play time should be necessary, so long as the game is not paused by its Esc-menu.
It also appears that standing still and not interacting with a scene may delay the timeout so long as the player’s viewport remains stationary and not much happens. A context switch is required in order for the timeout to occur, which is more likely during memory access operations.
In all 525, 535, and 545-series drivers available, the only game settings that avoid the timeout are those with all RTX options (direct lighting, indirect lighting, and transparency/reflections) set to “OFF.” This was not immediately obvious, but other users have documented similar observations in other titles. There are no out-of-game settings whatsoever, either via Proton, Wine, Lutris environment variables, etc., that are ameliorative. I was able to reach the end of the game with such a configuration, but this has … severely underwhelming graphical consequences for a title that leans so heavily on realistic lighting for both artistic and diegetic effect.
A pity. I know how closely Remedy and Nvidia worked on this title. But only the Windows customers got a “game-ready” driver on release day, while we – who paid just as much for our hardware and for the title – got this.
Just got around to testing driver 545.29.02 with Metro Exodus PC Enhanced Edition; it has the same regression that was reintroduced in 535.129.03.
The game freezes right after the intro video when it should advance to the main menu.
As described before and at the very beginning of this thread.
NVRM: GPU at PCI:0000:26:00: GPU-6f98b267-20cc-5347-51dc-8bad07fd2ad0
NVRM: Xid (PCI:0000:26:00): 13, pid='<unknown>', name=<unknown>, Graphics SM Warp Exception on (GPC 0, TPC 0, SM 1): Illegal Instruction Parameter
NVRM: Xid (PCI:0000:26:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ESR 0x5047b0=0x15000b 0x5047b4=0x0 0x5047a8=0xf812b60 0x5047ac=0x1104
NVRM: Xid (PCI:0000:26:00): 109, pid=20728, name=MetroExodus.exe, Ch 00000036, errorString CTX SWITCH TIMEOUT, Info 0x1dc01a
Kernel: 6.6.1
GPU: RTX 3080
Proton: experimental-8.0-20231114c
VKD3D-Proton: 2.10
nvidia-bug-report.log.gz (1.0 MB)
Also, I agree with others in this thread. It is beyond ridiculous that we receive such bad customer support, as this issue affects not only games running via Proton and other compatibility tools but also native applications.