Games freeze and then crash when using Steam Proton on GeForce GTX 1060 Mobile

Hi,

I’m experiencing freezes in games when using Steam Proton: the game freezes for a few minutes and then crashes. This has happened in every game I’ve played:

  • Yakuza Kiwami
  • Yakuza 0
  • Mass Effect Legendary Edition
  • Days Gone
  • Foxhole
  • Rocket League
  • Overcooked! All You Can Eat
  • Deep Rock Galactic

These crashes happen at random times. In some games a crash may come after 2-3 hours of play (for example, Mass Effect); in others it happens much more often, perhaps once every 15 minutes (Deep Rock Galactic). I’ve checked RAM, VRAM, and the video card temperature, and everything seems fine. The Steam Proton logs show errors such as VK_ERROR_DEVICE_LOST, but I still have no idea what is causing the crashes. Right now I’m using Ubuntu 20.04. Last year I played Deep Rock Galactic on ultra settings on Windows 10 and never had a crash. I’ve been able to play a native Linux game, Europa Universalis IV, for 7 hours without any freezes or crashes. I have never been able to play a Proton game for more than 3-4 hours without it freezing.

I’ve tried installing both older and newer drivers (nvidia-driver-460; 495; 390) and the issue persists. Right now I’m using nvidia-driver-470. I’ve tried different Proton versions as well (Proton Experimental; 6.3-7; 5.13-6; Glorious Eggroll 6.16-GE1).

When I run uname -ar, I get: Linux rus-N95TP6 5.11.0-38-generic #42~20.04.1-Ubuntu SMP Tue Sep 28 20:41:07 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

My dGPU is Nvidia GeForce GTX 1060 Mobile; My iGPU is Intel(R) UHD Graphics 630 (CFL GT2)

My CPU is Intel(R) Core™ i7-8700 CPU @ 3.20GHz

Below I’ve attached the bug report generated by nvidia-bug-report.sh around 2 minutes after Deep Rock Galactic froze; I used that game because the crashes happen there more often than in the others. I’ve also attached the Proton log in case it helps. It’s probably relevant that I’m running a laptop with an integrated Intel iGPU, so PRIME is involved, though the Steam Proton log seems to show that the video device used is in fact the Nvidia dGPU.

steam-548430.log (1.2 MB)

nvidia-bug-report.log.gz (15.3 MB)

So, my question is, why do these crashes happen and how can I fix this problem?

[263577.247] (EE) modeset(0): present flip failed
[263577.256] (WW) modeset(0): flip queue failed: Invalid argument
[263577.256] (WW) modeset(0): Page flip failed: Invalid argument

Please check if this also happens with a kernel 5.4.

Hi generix,

Thank you for the fast response. I’ve tried running Deep Rock Galactic with kernel 5.4 (Linux rus-N95TP6 5.4.0-89-generic #100-Ubuntu SMP Fri Sep 24 14:50:10 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux) and the same freeze happened. Attaching the log files below:

nvidia-bug-report.log.gz (15.4 MB)

steam-548430.log (1.3 MB)

At least the log flood from the modesetting driver is gone so the real issue surfaced:

[ 7906.691262] NVRM: GPU at PCI:0000:01:00: GPU-c3a7e654-0ecc-bacf-9944-d53b055ef0ed
[ 7906.691264] NVRM: Xid (PCI:0000:01:00): 13, pid=322, Graphics Exception:  EXTRA_MACRO_DATA
[ 7906.691268] NVRM: Xid (PCI:0000:01:00): 13, pid=322, Graphics Exception: ESR 0x404490=0x80000002
[ 7906.691341] NVRM: Xid (PCI:0000:01:00): 13, pid=14135, Graphics Exception: ChID 0054, Class 0000c197, Offset 00000000, Data 00000000

which isn’t very specific though I could imagine a thermal issue. Please try logging temperatures using nvidia-smi.
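
A minimal sketch of how such a temperature log could be collected, assuming a driver whose nvidia-smi supports the --query-gpu option. The function names, the 5-second interval, and the log file name are illustrative choices and were not specified in this thread.

```shell
# Append one CSV row (timestamp, temperature in C, GPU load, SM clock) every
# 5 seconds until interrupted with Ctrl+C.
log_gpu_temps() {
    nvidia-smi \
        --query-gpu=timestamp,temperature.gpu,utilization.gpu,clocks.sm \
        --format=csv,noheader -l 5 >> "$1"
}

# Print the highest temperature found in the second CSV column of a log.
peak_temp() {
    awk -F', ' '$2 + 0 > max { max = $2 + 0 } END { print max + 0 }' "$1"
}

# Usage (hypothetical file name):
#   log_gpu_temps temp-logs.csv    # start this before launching the game
#   peak_temp temp-logs.csv        # run this after the freeze
```

Keeping the timestamp in each row makes it possible to line up the moment of the freeze with the temperature at that time.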

Hi generix,

Here’s the log of the temperature:
temp-logs.txt (1.5 MB)

I tested with Deep Rock Galactic. The temperature peaked at 91 C and stayed there for about 3 hours. The game froze and crashed around 23:14:13 according to the log.

Peaking at 93°C is much too hot; please clean your fan and heatspreader. Constantly running at temperatures that high will damage your system.

Hi @generix,

In order to test the overheating theory I ran Deep Rock Galactic at a capped framerate of 40fps in order to keep my GPU at low temps. If this were a thermal issue then the game wouldn’t freeze.

Unfortunately, it froze. I checked the temperature logs: it peaked at 78C, i.e. well below dangerous temps.

This was while running kernel 5.4.

I’ve attached all the logs.

nvidia-bug-report.log.gz (15.9 MB)

steam-548430.zip (80.7 KB)

2021-11-09-temp-logs-2.zip (10.3 KB)
The freeze happened around 19:24:01 according to the temperature log.

Hi @generix,

In addition, I’d like to report that I was able to run a non-native Linux game, Hades, through Steam Proton in Vulkan mode without any freezes for over 16 hours. I left it running overnight and simply closed the game in the end; it was obvious it wasn’t going to freeze. (The game supports running in either DirectX 12 mode or Vulkan mode. I assume that DirectX 12 mode makes Steam Proton employ DXVK while Vulkan mode doesn’t require DXVK, but that’s just an assumption on my part; check the attached Proton logs to confirm.)

Later on, for testing and differential diagnosis, I ran Hades in DirectX 12 mode. As I expected, it froze and crashed after 1 hour.

This leads me to think this issue is somehow related to DXVK.

Proton logs for both runs attached:

Hades (Vulkan).log.gz (19.3 MB)

Hades(DirectX).log.gz (11.8 MB)

nvidia-bug-report(Hades).log.gz (16.0 MB)

2021-11-09-temp-logs-Hades.txt.gz (14.5 KB)

The freeze happened around 00:38:33 according to the temperature log time. The peak temperature was 59 C.

So temperatures don’t matter and it only happens when using DXVK/VKD3D. That doesn’t make this any clearer, though. Also, the Xid 13 errors seem to be a red herring. Those happened at 16:30 and 18:04, and in both cases it seemed to be an application error, with the nvidia driver unblocking it: “(WW) NVIDIA: Wait for channel idle timed out.”. The Xserver went on running.
At 00:39, nothing actually happened: the Xserver was running fine, with only messages that an Xbox controller was added and then a switch to VT, likely to create the bug-report.log.
So it really looks like only the game was freezing.
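
For anyone wanting to compare Xid timestamps against the time of a freeze in the same way, a minimal sketch follows. The function name is made up for illustration, and it assumes the kernel log uses the "NVRM: Xid" prefix seen in the excerpt earlier in this thread.

```shell
# Keep only the NVRM Xid lines from kernel log text read on stdin.
xid_events() {
    grep 'NVRM: Xid'
}

# Usage (dmesg may need sudo on some systems):
#   dmesg -T | xid_events
#   journalctl -k | xid_events
```

If no Xid line falls near the moment of the freeze, that supports the reading that the driver itself kept running.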

Hi @generix,

Any idea what I should do to fix these freezes? What else can I provide to help solve this problem?

At the system level (nvidia-bug-report.log), no info can be found. So you should move to the DXVK/VKD3D level and maybe increase debug verbosity there. You mentioned “VK_ERROR_DEVICE_LOST”; I couldn’t find that in the steam log. Where did you get that from?
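
A hedged sketch of what raising the verbosity and searching the log might look like. PROTON_LOG, DXVK_LOG_LEVEL and VKD3D_DEBUG are environment variables documented by Proton, DXVK and vkd3d respectively; the particular levels chosen here and the helper name are assumptions, not something prescribed in this thread.

```shell
# Steam launch options for the affected game (Properties > Launch Options):
#   PROTON_LOG=1 DXVK_LOG_LEVEL=debug VKD3D_DEBUG=warn %command%
# With PROTON_LOG=1, Proton writes steam-<appid>.log to the home directory.

# List every log line that mentions a lost Vulkan device, with line numbers.
find_device_lost() {
    grep -n 'VK_ERROR_DEVICE_LOST' "$@"
}

# Usage:
#   find_device_lost ~/steam-548430.log
```

The line numbers make it easier to read the messages just before the first occurrence, which is usually where the useful context is.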

Hi @generix,

Thank you for the fast response. This error was showing in the previous Proton logs before I started testing the game on kernel 5.4. For example, here:

steam-548430 (copy).log.gz (137.9 KB)

In every log I’ve seen when I tested the game on kernel 5.11 (5.11.0-38-generic #42~20.04.1-Ubuntu SMP Tue Sep 28 20:41:07 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux), that error occurred.

Since that message seems to be gone with 5.4, I suspect this came from the intel gpu. Maybe take any other vulkan implementation out of the equation by prepending
VULKAN_DEVICE_INDEX=1
to the game executable.
(check ‘VULKAN_DEVICE_INDEX=1 vulkaninfo’ whether this is really the nvidia gpu)
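
One way to make that check quicker to read is to cut the vulkaninfo output down to the device name and type lines. This is only a sketch: the helper name is invented, and it assumes vulkaninfo's plain-text output labels those fields deviceName and deviceType.

```shell
# Reduce vulkaninfo's text output (read from stdin) to the name and type of
# each physical device it lists.
vulkan_devices() {
    grep -E 'deviceName|deviceType'
}

# Usage:
#   VULKAN_DEVICE_INDEX=1 vulkaninfo 2>/dev/null | vulkan_devices
```

The NVIDIA dGPU would be expected to appear with a type of PHYSICAL_DEVICE_TYPE_DISCRETE_GPU, and the Intel iGPU as PHYSICAL_DEVICE_TYPE_INTEGRATED_GPU.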

Hi @generix,

I’ve tried prepending VULKAN_DEVICE_INDEX=1 to the game executable and ran Deep Rock Galactic. The freeze happened after about an hour of play. I’d just like to note that the error “VK_ERROR_DEVICE_LOST” appeared again in this Proton log. I’m attaching the logs below:

nvidia-bug-report.log.gz (15.9 MB)

steam-548430.log.gz (133.9 KB)

I’ll also attach the output of the command ‘VULKAN_DEVICE_INDEX=1 vulkaninfo’, please help me check if everything is correct:

VulkanDeviceIndex.txt.gz (8.5 KB)

Please try setting dxvk.halveNvidiaHVVHeap = True in dxvk.conf for the affected games.
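
A minimal sketch of applying that setting, assuming DXVK picks up a dxvk.conf placed next to the game executable or pointed to by the DXVK_CONFIG_FILE environment variable. The helper name and the paths are placeholders.

```shell
# Append the suggested option to dxvk.conf in the given directory, creating
# the file if it does not exist yet.
write_dxvk_conf() {
    printf '%s\n' 'dxvk.halveNvidiaHVVHeap = True' >> "$1/dxvk.conf"
}

# Usage (placeholder path):
#   write_dxvk_conf "/path/to/the/game/executable/directory"
# Or keep the file elsewhere and set this in the Steam launch options:
#   DXVK_CONFIG_FILE=/path/to/dxvk.conf %command%
```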

Hi @generix,

I would like to formally apologize for leading you astray.

The truth, to my great shame, is that this whole time verbose logging was NOT, in fact, enabled in the X sessions that were used to generate the nvidia-bug-reports.

In actuality, the nvidia-bug-report.sh script was run in the X session that didn’t even have the game running in it. It was logged in as a different user and was just running a plain desktop.

I have since rectified my mistake and run the X session with verbose logging and launched the game there and I applied the dxvk setting as you requested and it still froze/crashed. Attaching the PROPER nvidia-bug-report, as well as the proton log.

nvidia-bug-report.log.gz (16.0 MB)
steam-548430.log.gz (93.1 KB)

Doesn’t matter where you created the nvidia-bug-report.log, it contains low-level logs independent of Xservers.
I’m really out of ideas, better reach out to the dxvk issue tracker.

Hi, @generix

Thank you for your help. If you accept cryptocurrency, please let me know the address where I can send a donation.

Hi, @generix

I want to report that at the beginning of January (January 9) I found that my game no longer froze. I left the game (Deep Rock Galactic) running overnight and it didn’t freeze even after 10.5 hours. The issue probably disappeared sometime between last December and the middle of January.

Then the issue appeared again in the middle of January. Meanwhile my driver has been updated to version 510.47.03 and my kernel to version 5.4.0-99. I’ve managed to roll back the kernel, and I’d like to roll back the Nvidia driver to the previous version, 495.46, to check whether that fixes the issue, but I couldn’t find out how to do that with Nvidia Prime. Please give me instructions on how to install driver version 495.46 on Ubuntu 20.04 without breaking the system; it seems that I can’t simply install driver 495.46 because of Nvidia Prime.

My Proton, Kernel and GPU specifications at the moment when Deep Rock Galactic didn’t crash:
Proton: 1638789187 proton-6.3-8c
Kernel: Linux 5.4.0-92-generic
GPU Driver: v495.46

My current Kernel and GPU specifications:
Kernel: Linux 5.4.0-92-generic
GPU Driver: 510.47.03
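
A small sketch for recording these two values before and after each change, so that a working and a failing setup can be compared directly. The function name is illustrative, and the driver query assumes nvidia-smi is installed and supports --query-gpu.

```shell
# Print the running kernel and NVIDIA driver versions in the same form as
# the listings above.
show_versions() {
    echo "Kernel: $(uname -s) $(uname -r)"
    if command -v nvidia-smi >/dev/null 2>&1; then
        echo "GPU Driver: $(nvidia-smi --query-gpu=driver_version --format=csv,noheader)"
    else
        echo "GPU Driver: unknown (nvidia-smi not found)"
    fi
}

# Usage:
#   show_versions >> versions.txt
```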

The procedure should be

  • remove nvidia driver packages by switching to nouveau in Software&Updates
  • blacklist nouveau by creating /etc/modprobe.d/nouveau.blacklist.conf
blacklist nouveau
nvidia
nvidia-drm
nvidia-modeset
  • run sudo update-initramfs -u
  • add kernel parameter nvidia-drm.modeset=1
  • create /etc/X11/xorg.conf.d/10-nvidia-driver.conf
Section "OutputClass"
    Identifier "nvidia-driver"
    MatchDriver "nvidia-drm"
    Driver "nvidia"
    Option "AllowEmptyInitialConfiguration"
EndSection
  • install package nvidia-prime: sudo apt install nvidia-prime
  • reboot

NB: before switching back to the packages, the runfile driver has to be uninstalled by running it again with --uninstall option. The other modifications don’t need to be reversed.
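
The file-creation steps above could be sketched as a script that first writes into a scratch directory, so the result can be inspected before anything under /etc is touched. Several things here are assumptions: the function name and the staging directory are invented, the nvidia, nvidia-drm and nvidia-modeset lines from the list are left out because the post does not say which file they belong in, and setting the kernel parameter through /etc/default/grub is the usual Ubuntu route rather than something stated in this thread.

```shell
# Stage the two configuration files from the procedure under a root
# directory. The default is a scratch directory; passing / (as root) would
# write the real files.
stage_nvidia_files() {
    root=${1:-./staging}
    mkdir -p "$root/etc/modprobe.d" "$root/etc/X11/xorg.conf.d"

    # Keep nouveau from loading alongside the proprietary driver.
    echo 'blacklist nouveau' > "$root/etc/modprobe.d/nouveau.blacklist.conf"

    # OutputClass section exactly as given in the procedure.
    cat > "$root/etc/X11/xorg.conf.d/10-nvidia-driver.conf" <<'EOF'
Section "OutputClass"
    Identifier "nvidia-driver"
    MatchDriver "nvidia-drm"
    Driver "nvidia"
    Option "AllowEmptyInitialConfiguration"
EndSection
EOF
}

# Remaining steps, run by hand once the files are in place:
#   sudo update-initramfs -u
#   add nvidia-drm.modeset=1 to GRUB_CMDLINE_LINUX_DEFAULT in
#   /etc/default/grub (assumed location), then: sudo update-grub
#   sudo apt install nvidia-prime
#   sudo reboot
```

Staging first is a deliberate choice: a typo in either file can leave the system without a working display, and a scratch copy is easy to review.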