Unfortunately, the only open-source part is the driver's kernel module, which NVK already makes some use of (not fully ready afaik, but it shows high potential).
According to your bug report we have the same GPU, so it is very bizarre that the Xid triggers without a logical explanation, hmm.
It might not help, and it will also prevent you from using DLSS, but have you tried hiding the NVIDIA GPU with this variable for Proton: PROTON_HIDE_NVIDIA_GPU=1
I agree that it's not logical. For me this problem occurs with any game that slightly taxes my system. It feels like it should be more widespread than it is. It honestly surprises me that this thread isn't packed with "me too" responses.
At any rate, I've found no combination of Proton environment variables, or lack thereof, that helps with this issue.
nvidia-bug-report.log.gz (897.2 KB)
It's happening on Xorg directly for me; it doesn't seem linked to Proton at all.
I have about 50 hours played on WRC 23. It always worked fine with no problems (well, restricting the scope to this issue; it also worked better after the 1.4.0 patch regarding precompiled shaders). But that specific track triggers the Xid error (I got it about three times retrying that track). I can try to reproduce it and even record it externally, just in case (I'm not sure what triggers it; maybe it happens when rendering a specific frame, so a recording would show which frame it is and why it is different, and increase the chances of reproducing it consistently).
I think the hardware details should be in the NVIDIA log.
I'm on Exherbo (a Gentoo-like distro). It's a 1660 Ti (mobile) in an Acer Predator Helios 300 PH315-52-78VL, kernel 6.6.4, driver 545.29.06, 16 GB RAM, i7-9750H.
Not sure which details you want.
I'm using external devices like a Thrustmaster T300RS, the TH8A shifter and a handbrake from a local supplier; I can test without them connected too.
Edit: I've just reproduced it again. I did many tracks and can keep playing with no problems, except for this track, which triggers the Xid.
Steps to reproduce:
- Create a custom rally
- Select RALLYE MONTE-CARLO
- Season: Spring
- Add Stage
- Select Les Borels 8,6km, and all stock options
- Confirm, Confirm
- Start
- Select Subaru Impreza 1995
- Play until XID happens
I'm not sure if it's a specific frame. I also think performance on this specific track is relatively poor.
Can you try Forza Horizon 5 instead? I think it will crash much more often with this issue.
I don't have Forza Horizon 5. As for WRC 23, I'll try the stage and update later with the results.
I've just added clearer repro steps. I have two videos:
Game config
Gameplay until XID error:
This last video ends when OBS reports that the NVENC encoder is taking too long.
Tried the stage with the same car etc... No issues. Can't replicate it.
I don't see any major performance fluctuations either. All stages have pretty similar framerates for me: crowded areas with a bit less fps, forests/fields with more.
Been playing the game for nearly 60 hours now with zero crashes/freezes.
Ryzen 5800X3D, RTX 3080 ... currently on the 535.43.20 Vulkan dev drivers, but I played with 545.29 also.
edit: what happens if you cap the power limit of the GPU a bit lower?
maybe that stage also uses more CPU, and you hit the laptop power brick's limits and the driver then just gives up?
random thought.
Managed to trigger it on Arch with the latest Xorg and stock dwm, with nothing but the latest Firefox running. It cannot get more basic than this.
Dec 17 17:46:11 arch kernel: NVRM: Xid (PCI:0000:01:00): 109, pid=544, name=Xorg, Ch 0000000a, errorString CTX SWITCH TIMEOUT, Info 0x34007
Dec 17 17:45:57 arch kernel: NVRM: Xid (PCI:0000:01:00): 109, pid=544, name=Xorg, Ch 0000000a, errorString CTX SWITCH TIMEOUT, Info 0x34007
Dec 17 17:45:44 arch kernel: NVRM: Xid (PCI:0000:01:00): 109, pid=544, name=Xorg, Ch 0000000a, errorString CTX SWITCH TIMEOUT, Info 0x34007
Dec 17 17:45:31 arch kernel: NVRM: Xid (PCI:0000:01:00): 109, pid=544, name=Xorg, Ch 0000000a, errorString CTX SWITCH TIMEOUT, Info 0x34007
Dec 17 17:45:17 arch kernel: NVRM: Xid (PCI:0000:01:00): 109, pid=544, name=Xorg, Ch 0000000a, errorString CTX SWITCH TIMEOUT, Info 0x34007
Dec 17 17:45:04 arch kernel: NVRM: Xid (PCI:0000:01:00): 109, pid=544, name=Xorg, Ch 0000000a, errorString CTX SWITCH TIMEOUT, Info 0x34007
Dec 17 17:44:51 arch kernel: NVRM: Xid (PCI:0000:01:00): 109, pid=544, name=Xorg, Ch 0000000a, errorString CTX SWITCH TIMEOUT, Info 0x34007
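For anyone comparing logs in this thread, here is a minimal Python sketch that pulls the fields out of Xid lines like the ones above and measures the gap between consecutive events. It assumes the syslog-style journal format shown in this post; because those lines carry no year, the `year` parameter is my own assumption, and the function names are just illustrative.

```python
import re
from datetime import datetime

# Matches NVRM Xid lines in the syslog-style format pasted above.
XID_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"NVRM: Xid \((?P<pci>PCI:[0-9a-f:.]+)\): (?P<xid>\d+), "
    r"pid=(?P<pid>\d+), name=(?P<name>[^,]+), Ch (?P<channel>[0-9a-f]+), "
    r"errorString (?P<error>.+?), Info (?P<info>0x[0-9a-f]+)"
)


def parse_xid_lines(lines, year=2023):
    """Return one dict per Xid line; lines that do not match are skipped.

    The year is an assumption: the log lines themselves do not include it.
    """
    events = []
    for line in lines:
        m = XID_RE.match(line.strip())
        if m is None:
            continue
        event = m.groupdict()
        event["xid"] = int(event["xid"])
        event["pid"] = int(event["pid"])
        event["time"] = datetime.strptime(
            f"{year} {event['stamp']}", "%Y %b %d %H:%M:%S"
        )
        events.append(event)
    return events


def gaps_in_seconds(events):
    """Seconds between consecutive events, oldest first."""
    times = sorted(e["time"] for e in events)
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


# Three of the lines from the log above.
SAMPLE = [
    "Dec 17 17:45:17 arch kernel: NVRM: Xid (PCI:0000:01:00): 109, pid=544, name=Xorg, Ch 0000000a, errorString CTX SWITCH TIMEOUT, Info 0x34007",
    "Dec 17 17:45:04 arch kernel: NVRM: Xid (PCI:0000:01:00): 109, pid=544, name=Xorg, Ch 0000000a, errorString CTX SWITCH TIMEOUT, Info 0x34007",
    "Dec 17 17:44:51 arch kernel: NVRM: Xid (PCI:0000:01:00): 109, pid=544, name=Xorg, Ch 0000000a, errorString CTX SWITCH TIMEOUT, Info 0x34007",
]

if __name__ == "__main__":
    print(gaps_in_seconds(parse_xid_lines(SAMPLE)))  # [13, 13]
```

Run over all seven lines above, the gaps come out at 13 to 14 seconds each, so the timeout repeats at a steady interval rather than at random.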
Thanks for your testing
Which Proton version?
Is it the latest WRC?
It should say 1.4.0 somewhere at game start.
It seems that setting power.limit is locked on some driver versions:
sudo nvidia-smi --power-limit 75
Changing power management limit is not supported for GPU: 00000000:01:00.0.
Treating as warning and moving on.
All done.
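Since the write was rejected, a read-only query can at least show what the driver reports for the limits. This is a rough Python sketch around `nvidia-smi --query-gpu`; the helper names are made up for illustration, and I am assuming that unsupported fields come back as a non-numeric placeholder (something like `[N/A]`), which may differ by driver version.

```python
import subprocess

# Power-related fields accepted by `nvidia-smi --query-gpu`.
QUERY_FIELDS = [
    "power.limit",
    "power.default_limit",
    "power.min_limit",
    "power.max_limit",
]


def parse_power_row(row):
    """Turn one CSV row into a dict; non-numeric cells become None."""
    values = [cell.strip() for cell in row.split(",")]
    result = {}
    for field, value in zip(QUERY_FIELDS, values):
        try:
            result[field] = float(value)
        except ValueError:
            # Assumption: unsupported fields are printed as a placeholder
            # string rather than a number.
            result[field] = None
    return result


def query_power_limits():
    """Ask nvidia-smi for the power limits of every GPU (read-only)."""
    out = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=" + ",".join(QUERY_FIELDS),
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return [parse_power_row(line) for line in out.splitlines() if line.strip()]


if __name__ == "__main__":
    for index, limits in enumerate(query_power_limits()):
        print(index, limits)
```

If the min and max limits come back as None, that would be consistent with the "not supported" message above, i.e. the limit is simply not adjustable on that GPU.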
Related issues:
I'll see what I can do.
Would you mind sharing the result of running nvidia-bug-report.sh?
I can repro the Xid 109 error in the game Pioneers of Pagonia with an NVIDIA GeForce RTX 3070 + driver 535.146.02.
Bug 4425951 has been filed locally for tracking purposes.
I assume it has to be run in the same session that caused the crash? Since it causes my computer to shut down, I'm not sure it's possible. If I reboot and run it, is it still useful? I'll see what I can do next time it happens.
I'm experiencing this issue on almost any access to the GPU. It started only this month, after some updates to my Ubuntu 22.04.1 (kernel 6.2.0-39). I don't play games but do AI work. My desktop has 2x 3090. With any Python access through conda, or even while starting PyCharm, Ubuntu (GNOME) freezes. Here is the snippet from dmesg:
[ 884.191814] [drm:nv_drm_master_set [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00005c00] Failed to grab modeset ownership
[ 884.191876] [drm:nv_drm_master_set [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00002100] Failed to grab modeset ownership
[ 884.766884] retire_capture_urb: 43 callbacks suppressed
[ 2236.090323] NVRM: GPU at PCI:0000:5c:00: GPU-bf881eec-e206-0714-7afe-17c8cb11520c
[ 2236.090331] NVRM: Xid (PCI:0000:5c:00): 109, pid=11593, name=gnome-shell, Ch 00000010, errorString CTX SWITCH TIMEOUT, Info 0x3c007
[ 5257.450801] NVRM: Xid (PCI:0000:5c:00): 109, pid=11438, name=Xorg, Ch 00000018, errorString CTX SWITCH TIMEOUT, Info 0x11c003
[ 5499.487154] NVRM: Xid (PCI:0000:5c:00): 109, pid=11438, name=Xorg, Ch 00000008, errorString CTX SWITCH TIMEOUT, Info 0x11c002
[ 5768.720340] [drm:nv_drm_master_set [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00005c00] Failed to grab modeset ownership
[ 5768.720457] [drm:nv_drm_master_set [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00002100] Failed to grab modeset ownership
[ 5768.720546] [drm:nv_drm_master_set [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00005c00] Failed to grab modeset ownership
My system is unusable now. I set CONDA_OVERRIDE_CUDA=12.2 to make conda work, but I am unable to make PyCharm work.
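The dmesg excerpt in this post mixes Xid events from different processes with unrelated nvidia-drm errors. As a rough sketch, the Python below tallies the Xid entries by process name and Info value so it is easier to see who was on the GPU when the timeout hit. It assumes the default dmesg format with a bracketed uptime prefix, and the function name is only illustrative.

```python
import re
from collections import Counter

# Matches NVRM Xid lines in dmesg format: "[ 2236.090331] NVRM: Xid (...): ..."
DMESG_XID_RE = re.compile(
    r"^\[\s*(?P<uptime>\d+\.\d+)\] NVRM: Xid \((?P<pci>[^)]+)\): (?P<xid>\d+), "
    r"pid=(?P<pid>\d+), name=(?P<name>[^,]+), Ch (?P<channel>[0-9a-f]+), "
    r"errorString (?P<error>.+?), Info (?P<info>0x[0-9a-f]+)"
)


def tally_xid_events(lines):
    """Count Xid events per (process name, Info value); other lines are ignored."""
    counts = Counter()
    for line in lines:
        m = DMESG_XID_RE.match(line.strip())
        if m:
            counts[(m["name"], m["info"])] += 1
    return counts


# Lines copied from the dmesg excerpt in this post.
SAMPLE_DMESG = [
    "[ 884.191814] [drm:nv_drm_master_set [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00005c00] Failed to grab modeset ownership",
    "[ 2236.090331] NVRM: Xid (PCI:0000:5c:00): 109, pid=11593, name=gnome-shell, Ch 00000010, errorString CTX SWITCH TIMEOUT, Info 0x3c007",
    "[ 5257.450801] NVRM: Xid (PCI:0000:5c:00): 109, pid=11438, name=Xorg, Ch 00000018, errorString CTX SWITCH TIMEOUT, Info 0x11c003",
    "[ 5499.487154] NVRM: Xid (PCI:0000:5c:00): 109, pid=11438, name=Xorg, Ch 00000008, errorString CTX SWITCH TIMEOUT, Info 0x11c002",
]

if __name__ == "__main__":
    for (name, info), count in tally_xid_events(SAMPLE_DMESG).items():
        print(name, info, count)
```

On these lines it reports one event for gnome-shell and two for Xorg, each with a different Info value, all on the GPU at PCI:0000:5c:00.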
[40226.813612] NVRM: GPU at PCI:0000:01:00: GPU-3440fcd9-ad72-5684-052e-87619260bcbf
[40226.813615] NVRM: Xid (PCI:0000:01:00): 109, pid=55550, name=Warframe.x64.ex, Ch 0000003e, errorString CTX SWITCH TIMEOUT, Info 0x3c01b
Xid 109 is back for me: kernel 6.6.7, NVIDIA driver 545.29.06, using an RTX 2060 Super. This happens reliably after 10-20 minutes in games.
nvidia-bug-report.log.gz (929.8 KB)
Solved my problem, and it was not the NVIDIA card/driver. As some people suggested, I did a clean install of the OS multiple times, but nothing worked; running conda info still made the system freeze. The only change I made to my system this month was adding a 10G dual-port NIC. I removed it from the system and everything is working fine with no issues. The NIC installed in a PCIe x8 slot caused the system to freeze, and for some reason NVIDIA reported the Xid 109 error.
I have only an Nvidia card installed as PCIe, unless the NVMe SSD counts. I am, however, having better luck with the nvidia-open-dkms open-source kernel modules over the proprietary driver. I haven't experienced a crash in over a day.
+1 on this issue.
It affects all graphics/CUDA workloads; I only recently realized this is what caused everything to crash. I can reliably trigger it with RE Village:
NVRM: Xid (PCI:0000:01:00): 109, pid=307016, name=re8.exe, Ch 00000046, errorString CTX SWITCH TIMEOUT, Info 0x2c022
Debian 12 bookworm
Linux 6.1.0-13-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.55-1 (2023-09-29) x86_64 GNU/Linux
Tested drivers: bookworm 525.147.05-4~deb12u1, experimental 535.43.02-1, and NVIDIA-installed 545.23.08-1. All hang in exactly the same way; 545 seems to be the worst.
RTX 2070 Super
Solved by downgrading to the 470-tesla driver with kernel 6.1. It seems stable for now, but it is missing a lot of needed driver functionality...
I snapped this bug report during a big freeze moment, when the GPU locked up for a good ~20 seconds then recovered. Kernel 6.6.8, Nvidia driver 545.29.06, RTX 2060 Super
nvidia-bug-report.log.gz (423.0 KB)
Found that this guy (not me) has the same issue with the 520.56.06 driver, but on an RTX 4090. I can't confirm, as I don't have the hardware. It crashes the same way as the CUDA workloads here, though: Random CUBLAS_STATUS_INTERNAL_ERROR crashes during training with RTX 4090 - PyTorch Forums