I have been trying to troubleshoot this on my own for far longer than I should have. When playing basically any Vulkan game, I will at some point crash, and the crash locks up my entire X session. The only errors I have been able to find are XID 8, which isn’t much help as far as debugging goes.
Games I’ve crashed on:
Deep Rock Galactic
Elden Ring
Back4Blood
Any ideas would be appreciated - I’m honestly getting desperate haha.
Added the file, rebooted and attempted to play again. It crashed again within ~10 minutes, but this time it was a hard crash requiring a full system reboot, rather than the earlier behaviour, which appeared to be waiting for the driver to restart itself.
“Interactive false” disables the driver’s watchdog, which kills hanging (or just overly long-running) GPU processes, resulting in XID 8.
Unfortunately, in your case it wasn’t just a long-running GPU task that made the watchdog impatient, but a genuine hard hang. Also, no additional errors were logged.
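For anyone following along, the file itself isn't shown in this thread, so the following is only a sketch of what such an X config snippet usually looks like. The `Interactive` option is the documented NVIDIA X driver option being discussed; the file path and the `Identifier` value are assumptions and will differ between setups.

```
# Assumed location: /etc/X11/xorg.conf.d/20-nvidia.conf (path is a guess, adjust to your distro)
Section "Device"
    Identifier "nvidia"              # arbitrary label, assumed here
    Driver     "nvidia"
    Option     "Interactive" "false" # disables the driver watchdog described above
EndSection
```

Since disabling the watchdog turned the recoverable freeze into a hard hang here, removing the file again after testing is probably the safer default.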
There are some VK_ERROR_DEVICE_LOST errors in the dxvk logs, and some “NVRM wait for channel idle timeout” messages in the nvidia logs, but I’m guessing the latter are related to the XID 8?
Is there a verbose mode I can run the driver under to hopefully get some more context on what is causing the issue? From my reading, the Vulkan error above is pretty generic, and so is my XID error.
So where should I go next with my testing to narrow down the issue further?
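To line up the Xid events with the dxvk errors by time, it can help to pull just the Xid lines out of the kernel log. Below is a minimal Python sketch, assuming the usual `NVRM: Xid (<device>): <number>, ...` line layout; the function name `find_xids` and the sample line in the comment are made up for illustration, and the exact trailing fields vary between driver versions.

```python
import re

# Assumed layout of an NVIDIA Xid kernel message (illustrative sample, not a real log):
#   NVRM: Xid (PCI:0000:01:00): 8, pid=1234, Channel 00000018
XID_RE = re.compile(r"NVRM: Xid \((?P<dev>[^)]*)\): (?P<xid>\d+)")


def find_xids(lines):
    """Return a list of (device, xid_number, raw_line) for every Xid line found."""
    hits = []
    for line in lines:
        match = XID_RE.search(line)
        if match:
            hits.append((match.group("dev"), int(match.group("xid")), line.rstrip("\n")))
    return hits
```

One way to use it is to pipe `journalctl -k -b -1` (kernel messages from the previous boot, which needs a persistent journal) into a small script that calls `find_xids(sys.stdin)` and prints the results. It only filters what the driver already logged, so it won't add any detail beyond the Xid number itself.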
Yes, all of these messages are symptoms of the same thing: some Vulkan task is hanging the GPU.
I don’t think there’s any way to get more info on the driver side. I guess you’ll need to do a full apitrace with dxvk and take that to the dxvk devs so they can maybe check which calls are hanging.
In that respect, since you’ve said “basically any Vulkan game”, did you already try a native game?
Also, did you already try downgrading to the 470 driver to check for a recent regression?
In general, going by protondb, all your games should run flawlessly, so it might be something specific to your hardware.
Well, I tried using VKD3D-Proton with Deep Rock Galactic instead of DXVK (DX12->Vulkan instead of DX11->Vulkan). DX12 was stable for a lot longer, but eventually the exact same crash occurred. To me, this rules out a DXVK-specific issue.
What tools are available to perform hardware inspection/tests on my GPU?
I had a lot of trouble getting GravityMark to run. About 3/4 of the time I get no messages back when trying to run a test; the rest of the time I get the following error:
Oh, that’s bad. I didn’t expect that either. So it’s nothing Vulkan-specific but a general GPU fault.
Please try reseating the NVIDIA board in its PCIe slot, and check whether it works in another system to rule out mainboard issues.