XID 8 - Vulkan crashes in multiple games

I have been attempting to troubleshoot this solo for far longer than I should have, but when playing basically any Vulkan game I will at some point crash, locking my entire X session. But the only errors I have been able to find are XID 8, which as far as debugging goes isn’t super helpful.

Games I’ve crashed on:

  • Deep Rock Galactic
  • Elden Ring
  • Back4Blood

Any ideas would be appreciated - I’m honestly getting desperate haha.

nvidia-bug-report.log.gz (357.5 KB)

Please check if this helps:
Create /etc/X11/xorg.conf.d/nvidia-interactive.conf

Section "OutputClass"
    Identifier     "nvidia"
    MatchDriver    "nvidia-drm"
    Driver         "nvidia"
    Option         "Interactive" "false"
EndSection

Added the file, rebooted and attempted to play again. Crashed again within ~10 minutes, but now the crash was a hard crash requiring a full system reboot, rather than what appears to be waiting for the driver to restart itself.

New log attached.
nvidia-bug-report.log.gz (337.3 KB)

“Interactive false” disables the driver’s watchdog, which kills hanging (or just too long running) GPU processes (resulting in XID 8).
Unfortunately, in your case it wasn’t just a long-running GPU task that made the watchdog impatient, but a genuine hard hang. Also, no additional errors were logged.
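
For anyone wanting to pull the Xid events out of a kernel log themselves, here is a minimal sketch. It assumes the usual "NVRM: Xid (PCI:...): <code>, ..." line shape; the function name `find_xid_events` and the sample line are made up for illustration and are not taken from the attached logs.

```python
import re

# Assumed line shape: "NVRM: Xid (PCI:0000:0a:00): 8, pid=1234, ..."
XID_RE = re.compile(
    r"NVRM: Xid \((?P<bus>PCI:[0-9a-fA-F:.]+)\): (?P<code>\d+),?\s*(?P<detail>.*)"
)


def find_xid_events(log_text):
    """Return (bus, code, detail) tuples for every Xid line in log_text."""
    events = []
    for line in log_text.splitlines():
        match = XID_RE.search(line)
        if match:
            events.append(
                (match.group("bus"), int(match.group("code")), match.group("detail").strip())
            )
    return events


if __name__ == "__main__":
    # Hypothetical sample, not copied from a real log.
    sample = (
        "[ 1234.567890] NVRM: Xid (PCI:0000:0a:00): 8, pid=4321, Channel 00000018\n"
        "[ 1240.000001] usb 1-1: new high-speed USB device\n"
    )
    for bus, code, detail in find_xid_events(sample):
        print(f"Xid {code} on {bus}: {detail}")
```

The text passed in could come from `dmesg` or `journalctl -k`; if only code 8 shows up, that matches the watchdog behaviour described above.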

Gotcha, thanks for the explanation!

There’s some VK_ERROR_DEVICE_LOST in the dxvk logs, and some “NVRM wait for channel idle timeout” in the nvidia logs, but I’m guessing the latter is related to the XID 8?

Is there a verbose mode I can run the driver under to hopefully get some more context into what is causing the issue? From my reading the Vulkan error above is pretty generic, and so is my XID error.

So where should I be moving forward with my testing to further narrow down the issue?

Yes, all messages are just a symptom of the same, some vulkan task is hanging the gpu.
I don’t think there’s anything to get more info on the driver side, I guess you’ll need to do a full apitrace with dxvk and take that to the dxvk devs to maybe check which calls are hanging.
In that respect, since you’ve said “basically any Vulkan game”, did you already try a native game?
Also, did you already try downgrading to the 470 driver to check for a recent regression?
In general, checking protondb, all your games should run flawlessly, so it might be something specific to your hardware.

I haven’t tried 470, but I have tried 475, 495, 510, and 515. Any specific reason for 470?

I’ll have to sus through the api trace steps later this evening and submit an issue on the dxvk GitHub.

I don’t think I actually own a native Linux game that uses Vulkan over OpenGL, I’ll have to do some research on that front.

470 is a legacy driver, so it’s compatible with recent kernels and available in any distro’s repo.

Tried downgrading to 470, but the driver is too old. Both DRG and Back4Blood gave errors stating I needed more modern d3d capabilities.

I’ll get an API trace on 515 and go from there. I truly appreciate all your help.

Well, I tried VKD3D-Proton with Deep Rock Galactic instead of DXVK (DX12->Vulkan instead of DX11->Vulkan). DX12 was stable for a lot longer, but eventually the exact same crash occurred. To me, this rules out a DXVK-specific issue.

What tools are available to perform hardware inspection/tests on my GPU?

Maybe use gpu-burn or try gravitymark in vulkan mode.

I had a lot of trouble trying to get gravitymark to run. 3/4 of the time I get no messages back when trying to run a test, the rest of the time I get the following error:

WARNING: lavapipe is not a conformant vulkan implementation, testing use only.
E: 143.98 ms: VKWindow::create_context(): surface is not supported by adapter
E: 144.05 ms: VKWindow::create_context(): can't create context
E: 144.20 ms: Window::create(): can't create context
E: 144.23 ms: GravityMark::create(): can't create Window for Vulkan platform
M:      0 us: ../data.zip: 313 files
M:    175 us: Temporal antialiasing
M:   7.85 ms: Build Date: Apr  8 2022
M:   7.94 ms: Build Info: release; fusion; vk=1; gl=45; gles=32; cu=1
M:   7.97 ms: Build Version: 1.53
M:  51.86 ms: Name: ASUSTeK COMPUTER INC. ROG STRIX X399-E GAMING System Product Name
M:  51.95 ms: System: Linux 5.15.0-46-generic x86_64 GNU/Linux
M:  51.98 ms: Kernel: #49-Ubuntu SMP Thu Aug 4 18:03:25 UTC 2022
M:  52.01 ms: Memory: 31.20 GB
M:  52.05 ms: Uptime: 1 day 15:53
M:  52.07 ms: CPU: AMD Ryzen Threadripper 2950X 16-Core Processor
M:  52.10 ms: GPU: NVIDIA GeForce RTX 2080 Ti
M:  52.15 ms: Device: VEN_10DE&DEV_1E07&SUBSYS_86671043
M:  52.17 ms: Version: 515.65.01
M:  52.20 ms: Memory: 11.00 GB
M:  59.89 ms: Desktop: 3840x1080 1.0
M:  59.94 ms: Screen 0: 1920x1080 0 0 DP-0
M:  59.97 ms: Screen 1: 1920x1080 1920 0 DP-4
M:  60.02 ms: Creating 1920x1080 Vulkan Window

I’m not even sure what lavapipe is, so I’m not quite sure if that is a gravitymark error, or Nvidia related. I’ll give gpu-burn a shot
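
For context, lavapipe is Mesa’s CPU-based Vulkan implementation, so that warning suggests the Vulkan loader is also seeing a software device next to the NVIDIA one. Whether that is what breaks gravitymark here is only a guess. A minimal sketch for launching a program with the loader restricted to the NVIDIA ICD follows. It assumes the manifest lives at /usr/share/vulkan/icd.d/nvidia_icd.json, which varies by distro, and `nvidia_only_env` is a hypothetical helper name.

```python
import os
import subprocess
import sys

# Assumed manifest location; it differs between distros and driver packages.
NVIDIA_ICD = "/usr/share/vulkan/icd.d/nvidia_icd.json"


def nvidia_only_env(base_env=None, icd_path=NVIDIA_ICD):
    """Copy an environment and point the Vulkan loader at a single ICD manifest."""
    env = dict(os.environ if base_env is None else base_env)
    # VK_ICD_FILENAMES tells the Vulkan loader which driver manifests to load.
    env["VK_ICD_FILENAMES"] = icd_path
    return env


if __name__ == "__main__":
    # Usage: python3 nvidia_only.py <vulkan program> [args...]
    if len(sys.argv) > 1:
        result = subprocess.run(sys.argv[1:], env=nvidia_only_env())
        sys.exit(result.returncode)
```

If the lavapipe warning disappears but the "surface is not supported by adapter" error stays, the software device was probably not the cause.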

(sorry for the double post)

Ran gpu-burn and got crashes. The first run ended in a full system crash; subsequent runs ended with an XID 8 as well.

Screen grab from gpu-burn output:

Nvidia-bug-report from two failed runs
nvidia-bug-report.log.gz (402.3 KB)

With crashes occurring just under load, I’m now worried it might be a hardware related issue rather than a driver one.

Oh, that’s bad. I didn’t expect that either. So nothing vulkan specific but a general gpu fault.
Please try reseating the nvidia board in its PCIe slot, and check whether it works in another system to rule out mainboard issues.

I had a 980ti in my old build. Swapped them out and completed gpu-burn with 0 errors. I’m guessing this means my 2080ti is the issue?

Yes, it’s broken.

Damn… Worst case scenario :/
I’ll try talking to insurance and see where it goes from there.

Thanks for all your help

IIRC, since the first batches of 2080ti died like flies, some vendors extended warranty for them. You should check that, too.
