I am using a GeForce GTX 1060 6GB in a 3 monitor configuration as follows:
Each monitor hosts a separate screen on the same X server (they are addressable as DISPLAY=:0.0 DISPLAY=:0.1 DISPLAY=:0.2)
:0.0 is the Left display with a resolution of 1920x1200 - HDMI-0 connection
:0.1 is the Center display with a resolution of 1920x1080 - HDMI-1 connection
:0.2 is the Right display with a resolution of 1080x1920 (rotated to portrait mode) - DV-0 connection
I am not using TwinView or Xinerama. Each display has its own separate desktop instance with multiple virtual desktops per monitor (Fluxbox as WM).
This means that I can individually resize their resolutions, but can't drag windows across the screens.
Fullscreen gaming will only occupy 1 monitor instead of all 3.
After an nvidia driver upgrade earlier this year, I started noticing sporadic crashes after running the Linux Steam client.
I've narrowed it down to the following reproduction steps:
- Start Steam on the :0.0 Display (1920x1200). I'm currently letting it auto-login, so I let it get to the main store window.
- Exit Steam
- Start the Steam client a second time on the same display. Before it successfully renders a loading or login screen, the X.org crash below occurs in nvidia_drv.so:
[ 320.061] (EE)
[ 320.062] (EE) Backtrace:
[ 320.062] (EE) 0: /usr/bin/X (xorg_backtrace+0x4d) [0x55ed7c346a9d]
[ 320.062] (EE) 1: /usr/bin/X (0x55ed7c1a2000+0x1a8755) [0x55ed7c34a755]
[ 320.062] (EE) 2: /lib64/libpthread.so.0 (0x7f1b19d26000+0x14500) [0x7f1b19d3a500]
[ 320.062] (EE) 3: /usr/lib64/xorg/modules/drivers/nvidia_drv.so (0x7f1b1830a000+0x4c0c2c) [0x7f1b187cac2c]
[ 320.062] (EE)
[ 320.062] (EE) Segmentation fault at address 0x5df9c7dd
[ 320.062] (EE)
Fatal server error:
[ 320.062] (EE) Caught signal 11 (Segmentation fault). Server aborting
[ 320.062] (EE)
[ 320.062] (EE)
Please consult the The X.Org Foundation support
at http://wiki.x.org
for help.
[ 320.062] (EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
[ 320.062] (EE)
[ 320.062] (EE)
[ 320.062] (EE) Backtrace:
[ 320.062] (EE) 0: /usr/bin/X (xorg_backtrace+0x4d) [0x55ed7c346a9d]
[ 320.062] (EE) 1: /usr/bin/X (0x55ed7c1a2000+0x1a8755) [0x55ed7c34a755]
[ 320.062] (EE) 2: /lib64/libpthread.so.0 (0x7f1b19d26000+0x14500) [0x7f1b19d3a500]
[ 320.062] (EE) 3: /usr/lib64/xorg/modules/drivers/nvidia_drv.so (0x7f1b1830a000+0x4c0c2c) [0x7f1b187cac2c]
[ 320.062] (EE)
[ 320.062] (EE) Bus error at address 0x0
[ 320.062] (EE)
FatalError re-entered, aborting
[ 320.062] (EE) Caught signal 7 (Bus error). Server aborting
[ 320.062] (EE)
The 1st error (segfault) always occurs in the logs and always at the same address (0x4c0c2c for the 440.44 driver).
The 2nd error (bus error) does not always occur in the logs, but I suspect it depends on what was left in memory when I restarted Steam.
When the driver segfaults, I am able to ssh into the box and restart X, so it's not causing a kernel panic.
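To check that the faulting frame really is at the same driver offset across crashes, the backtrace lines can be parsed mechanically. The following is a minimal sketch, not part of the original report: the function name `module_offsets` and the regular expression are my own, and it assumes the frame format shown in the log above (`module (base+offset) [address]`).

```python
import re

# Matches backtrace frames of the form:
#   3: /path/to/nvidia_drv.so (0x7f1b1830a000+0x4c0c2c) [0x7f1b187cac2c]
# Frames that name a symbol instead of a base address are skipped.
FRAME_RE = re.compile(
    r"(?P<index>\d+):\s+(?P<module>\S+)\s+"
    r"\((?P<base>0x[0-9a-fA-F]+)\+(?P<offset>0x[0-9a-fA-F]+)\)\s+"
    r"\[(?P<addr>0x[0-9a-fA-F]+)\]"
)


def module_offsets(log_text, module_suffix="nvidia_drv.so"):
    """Return the in-module offset of every backtrace frame in the given module."""
    offsets = []
    for match in FRAME_RE.finditer(log_text):
        if not match.group("module").endswith(module_suffix):
            continue
        base = int(match.group("base"), 16)
        offset = int(match.group("offset"), 16)
        addr = int(match.group("addr"), 16)
        # Sanity check: load base plus offset should equal the absolute address.
        if base + offset == addr:
            offsets.append(offset)
    return offsets


# Two frames copied from the log above.
SAMPLE = """\
[ 320.062] (EE) 2: /lib64/libpthread.so.0 (0x7f1b19d26000+0x14500) [0x7f1b19d3a500]
[ 320.062] (EE) 3: /usr/lib64/xorg/modules/drivers/nvidia_drv.so (0x7f1b1830a000+0x4c0c2c) [0x7f1b187cac2c]
"""

if __name__ == "__main__":
    print([hex(o) for o in module_offsets(SAMPLE)])
```

Running the same function over several saved copies of Xorg.0.log and comparing the results would show whether the offset changes between driver versions or between the segfault and the bus error.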
Additional system and software details:
X.Org X Server 1.20.5
X Protocol Version 11, Revision 0
Current Operating System: Linux 5.4.3-gentoo #1 SMP PREEMPT Sat Dec 14 16:44:02 CST 2019 x86_64
I've narrowed down the conditions to trigger this crash as follows:
*Condition 1:
The driver version must be newer than the 430.xx series. I've reproduced the crash on 435.21 and am currently running 440.44. I just downgraded to 430.64, tested it, and cannot trigger this crash. I've also been running this box for over 2 years, continually keeping the nvidia driver up to date. I never experienced the segfault until I upgraded to the 435 series a few months ago.
*Condition 2:
This segfault only occurs on the 1920x1200 display. I tested restarting Steam on the other 2 displays and could not reproduce the crash.
*Condition 3:
This segfault only occurs when there are multiple monitors active. I tested restarting Steam on the 1920x1200 display as a single monitor and was not able to reproduce the crash.
*Condition 4:
This segfault only occurs on the 1920x1200 monitor if it's in this native resolution. I can "downgrade" it to 1920x1080 and the crash cannot be reproduced on the same display.
Things that don't appear to matter:
* Rearranging the monitor layout has no effect. Steam restarts will always trigger an X crash on the 1920x1200 display no matter what monitor is considered at 0 0 in virtual space.
* Switching HDMI connectors has no effect. I can trigger the crash on the 1920x1200 display even if it's on HDMI-1
* Going from 3 to 2 monitors doesn't make a difference. A Steam restart on the 1920x1200 display will still trigger the crash, and it can't be reproduced on either of the other displays.
* Kernel or GCC version. I know Gentoo's reputation and I don't overconfigure optimizations. I've experienced the crash on multiple different kernels in the 5.x.x series. I've also moved from GCC8 to GCC9 and rebuilt X.org and related libraries. The crashes only appear to show up in the nvidia drivers after the 430 series, regardless of kernel or GCC version. I haven't had any other software crashes occurring in drivers or applications on this box.
Further notes:
If I start Steam on the "problem" monitor and exit it without restarting, the nvidia driver will eventually crash later.
This usually occurs if I'm opening or closing a YouTube video in Chrome on the middle monitor. This may take minutes or hours, but eventually does occur.
I haven't thoroughly tested this behavior to reproduce it as above, but it strongly suggests to me that something Steam is doing is not getting properly reset in the driver when it's closed, and some memory is getting leaked or clobbered.
Hopefully this is enough information to reproduce or narrow down a regression. If I had access to the driver code, I would bisect between the 430 and 435 branches to see if there's anything that might have broken due to bad assumptions about monitor resolutions in multi-monitor configurations.
My gut instinct is that there is likely some kind of memory corruption in the driver occurring when releasing Steam's allocated resources, but only if the monitor it's on has a "weird" resolution.
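For anyone trying to reproduce this, the start / exit / restart sequence from the steps above could be scripted roughly as follows. This is only a sketch under several assumptions of mine, none of which come from the report itself: that `steam` is on PATH, that `steam -shutdown` asks the running client to exit, and that a fixed 60 second wait is long enough for the client to reach the store window. The `spawn` and `sleep` parameters exist only so the sequence can be dry-run without launching anything.

```python
import os
import subprocess
import time

STEAM_CMD = ["steam"]                   # assumption: the Steam launcher is on PATH
SHUTDOWN_CMD = ["steam", "-shutdown"]   # assumption: tells the running client to exit
SETTLE_SECONDS = 60                     # assumption: time needed to reach the store window


def screen_env(screen, display=0, base_env=None):
    """Copy the environment and point DISPLAY at one X screen, e.g. ':0.0'."""
    env = dict(os.environ if base_env is None else base_env)
    env["DISPLAY"] = f":{display}.{screen}"
    return env


def reproduce(screen=0, spawn=subprocess.Popen, sleep=time.sleep):
    """Start Steam, let it settle, shut it down, then start it a second time."""
    env = screen_env(screen)
    first = spawn(STEAM_CMD, env=env)      # step 1: first start on the chosen screen
    sleep(SETTLE_SECONDS)                  # crude stand-in for "wait for the store window"
    spawn(SHUTDOWN_CMD, env=env).wait()    # step 2: ask the client to exit
    first.wait()                           # wait until the first instance has really gone
    return spawn(STEAM_CMD, env=env)       # step 3: the restart that triggers the crash


if __name__ == "__main__":
    reproduce(screen=0)   # :0.0 is the 1920x1200 display in this setup
```

Since the X server itself goes down on the second start, it would make sense to run this from an ssh session rather than from a terminal inside the X session.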
I've preemptively attached a bug report as well. It was produced with the 440.44 driver from an ssh session.
nvidia-bug-report.log.gz (47.7 KB)