FYI: nvidia 410.78 driver fails with segmentation fault on Fedora fc29 Workstation with NVS 510 card

I have a system with an NVS 510 card and multiple 4K monitors attached. After upgrading from fc27 with the 390.87 driver to fc29 with the 410.78 driver, I can no longer start the X11 environment with 410.78 (I tried both GNOME with gdm and Xfce with lightdm) and had to fall back on the 390.87 installer with the 4.19 kernel patch.

The 410.78 install log indicates that the driver installs successfully, but Xorg.0.log shows that the 410.78 driver now has a problem with DRI2 (“AIGLX: Screen 0 is not DRI2 capable”) and/or with connecting to the ACPI event daemon (“NVIDIA(0): ACPI: failed to connect to the ACPI event daemon”). This is followed by a “Segmentation fault” and then “server aborting”. (I sent the log files to linux-bugs@nvidia.com.)

Neither of these errors (DRI2 or ACPI) occurred with the 390.87 driver on fc27, and neither occurs on fc29 when using the 390.87 installer with the 4.19 kernel patch described on https://www.if-not-true-then-false.com/2015/fedora-nvidia-guide/4/#libglvnd-error. This interim solution (I’d rather use the 410.78 driver than the 390.87 driver) got the system working again and ruled out any dependencies between X.Org X Server 1.19.6 used in fc27 and X.Org X Server 1.20.3 used in fc29.

Note: On fc29, the official nvidia 390.87 installer fails with “ERROR: Failed to run /usr/sbin/dkms build -m nvidia -v 390.87 -k 4.19.5-300.fc29.x86_64: Kernel preparation unnecessary for this kernel. Skipping…” followed by “‘make’ -j8 NV_EXCLUDE_BUILD_MODULES=’’ KERNEL_UNAME=4.19.5-300.fc29.x86_64 modules.(bad exit status: 2) Error! Bad return status for module build on kernel: 4.19.5-300.fc29.x86_64 (x86_64)”. The patched version mentioned above builds on fc29 without this error.

I tried several other installer versions as well (including 415.13 available via https://www.nvidia.com/drivers/beta). In my opinion, there clearly was a change between 390.xx and 410.xx drivers that produces the “Segmentation fault/server aborting” problem described above.

So, for the time being, anybody running into problems on fc29 probably should try the 390.87 installer with 4.19 kernel patch.

410.78-on-fc29-nvidia-installer.log (9.7 KB)
410.78-on-fc29-nvidia-bug-report.log.gz (1.05 MB)

Update: nvidia support kindly (and very quickly, thank you very much!) pointed out that the following lines in the Xorg.0.log file suggest the root of the problem: the “glx” module in the fc29 libglx.so library was compiled against X Server 1.19.3 instead of 1.20.3, and this mismatch is incompatible with their driver:

X.Org X Server 1.20.3
[...]
[ 9.736] Build ID: xorg-x11-server 1.20.3-1.fc29
[...]
[ 9.767] (II) LoadModule: "glx"
[ 9.770] (II) Loading /usr/lib64/xorg/modules/extensions/libglx.so
[ 9.780] (II) Module glx: vendor="X.Org Foundation"
[ 9.780] compiled for 1.19.3, module version = 1.0.0
[ 9.780] ABI class: X.Org Server Extension, version 10.0
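
For anyone wanting to check their own log for this kind of mismatch, here is a minimal bash sketch (not something nvidia support provided). It compares the X server version at the top of the log with the version the "glx" module reports as "compiled for". The function names are made up for illustration, and the default path /var/log/Xorg.0.log is an assumption; on some setups the log lives elsewhere.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names and default log path are assumptions.

# Print the server version from the first "X.Org X Server" line of the log.
xorg_server_version() {
    sed -n 's/^.*X\.Org X Server \([0-9][0-9.]*\).*$/\1/p' "$1" | head -n 1
}

# Print the "compiled for" version that follows the "Module glx:" line.
glx_module_version() {
    awk '
        /Module glx:/ { in_glx = 1; next }
        in_glx && /compiled for/ {
            for (i = 1; i < NF; i++) {
                if ($i == "for") {
                    v = $(i + 1)
                    sub(/,$/, "", v)   # drop the trailing comma
                    print v
                    exit
                }
            }
        }
    ' "$1"
}

# Return 0 if both versions match, 1 on mismatch, 2 if either is missing.
check_glx_build() {
    local log=${1:-/var/log/Xorg.0.log}
    local server module
    server=$(xorg_server_version "$log")
    module=$(glx_module_version "$log")
    if [ -z "$server" ] || [ -z "$module" ]; then
        echo "could not determine versions from $log"
        return 2
    fi
    if [ "$server" = "$module" ]; then
        echo "ok: server and glx module are both $server"
        return 0
    fi
    echo "mismatch: server $server, glx module compiled for $module"
    return 1
}
```

Calling check_glx_build with no argument reads the assumed default path; pass a different log file as the first argument if needed. Against the excerpt above it should report a mismatch between 1.20.3 and 1.19.3.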

I filed a bug report for fc29 at https://bugzilla.redhat.com/show_bug.cgi?id=1655801.

I have the same xorg-x11-server 1.20.3-1.fc29 but don’t get this message. I get:

[ 19213.772] (II) LoadModule: "glx"
[ 19213.772] (II) Loading /usr/lib64/xorg/modules/extensions/libglx.so
[ 19213.774] (II) Module glx: vendor="X.Org Foundation"
[ 19213.774] compiled for 1.20.3, module version = 1.0.0
[ 19213.774] ABI class: X.Org Server Extension, version 10.0

/usr/lib64/xorg/modules/extensions/libglx.so is provided by Xorg, not by nvidia.

$ rpm -qf /usr/lib64/xorg/modules/extensions/libglx.so
xorg-x11-server-Xorg-1.20.3-1.fc29.x86_64

Nvidia uses /usr/lib64/xorg/modules/extensions/libglxserver_nvidia.so for 410.xx and up.

$ rpm -qf /usr/lib64/xorg/modules/extensions/libglxserver_nvidia.so
xorg-x11-drv-nvidia-415.18-1.fc30.x86_64

[     8.351] (II) Loading sub module "glxserver_nvidia"
[     8.351] (II) LoadModule: "glxserver_nvidia"
[     8.351] (II) Loading /usr/lib64/xorg/modules/extensions/libglxserver_nvidia.so
[     8.376] (II) Module glxserver_nvidia: vendor="NVIDIA Corporation"
[     8.376] 	compiled for 4.0.2, module version = 1.0.0
[     8.376] 	Module class: X.Org Server Extension
[     8.377] (II) NVIDIA GLX Module  415.18  Thu Nov 15 21:39:03 CST 2018

Thank you for your replies, @sambo57u and @leigh123linux!

@leigh123linux: Your notion that “Nvidia uses /usr/lib64/xorg/modules/extensions/libglxserver_nvidia.so for 410.xx and up.” seems to be correct.

Nvidia tech support advised yesterday “the NVIDIA GLX vendor module (libglxserver_nvidia.so) can coexist with the xserver 1.20 build of libglx.so, but running it alongside the one from xserver 1.19 is not a supported combination”.

What puzzles me right now is why @sambo57u, using the same xorg-x11-server 1.20.3-1.fc29 version, gets:

[ 19213.772] (II) Loading /usr/lib64/xorg/modules/extensions/libglx.so
[ 19213.774] (II) Module glx: vendor="X.Org Foundation"
[ 19213.774] compiled for 1.20.3, module version = 1.0.0

while I get:

[ 9.770] (II) Loading /usr/lib64/xorg/modules/extensions/libglx.so
[ 9.780] (II) Module glx: vendor="X.Org Foundation"
[ 9.780] compiled for 1.19.3, module version = 1.0.0

I’ll try to find out more tonight and/or tomorrow (as time permits).

I solved the problem by completely removing and reinstalling xorg-x11-server 1.20.3-1.fc29 (using “dnf remove” and “dnf install”, as “dnf reinstall” apparently was not forceful enough). I then re-installed the nvidia 410.78 driver and @gnome-desktop and rebooted.
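
Roughly, the sequence above could be sketched like this in bash. It is a sketch under assumptions: the package name xorg-x11-server-Xorg is taken from the rpm -qf output earlier in the thread, the helper names run and reinstall_xorg_server are made up, and the nvidia installer step is left as a comment because the exact invocation is not given here. By default it only prints the commands; set DRY_RUN=0 and run it as root to actually execute them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the command in dry-run mode (default), else run it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Remove and install the X server package instead of using "dnf reinstall".
reinstall_xorg_server() {
    local pkg=${1:-xorg-x11-server-Xorg}   # package name is an assumption
    local lib=/usr/lib64/xorg/modules/extensions/libglx.so

    run ls -l "$lib"                 # note size and date before
    run dnf remove "$pkg"            # may also remove dependent packages
    run dnf install "$pkg"
    # Re-run the nvidia 410.78 installer here (invocation not shown).
    run dnf install @gnome-desktop   # bring back the desktop group
    run ls -l "$lib"                 # size and date should now differ
}
```

Calling reinstall_xorg_server with DRY_RUN unset lists the five commands without touching the system, which makes it easy to review what dnf would be asked to do.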

One of these steps must have installed a newer version of /usr/lib64/xorg/modules/extensions/libglx.so:

Before remove/install:

[root@hostsystem /]# ll /usr/lib64/xorg/modules/extensions/libglx.so
-rwxr-xr-x. 1 root root 298136 Aug  4  2017 /usr/lib64/xorg/modules/extensions/libglx.so

After remove/install:

[root@hostsystem /]# ll /usr/lib64/xorg/modules/extensions/libglx.so
-rwxr-xr-x. 1 root root 308888 Nov  1 09:06 /usr/lib64/xorg/modules/extensions/libglx.so

Xorg.0.log now shows the correct X Server version when loading “glx”:

X.Org X Server 1.20.3
[...]
[     9.891] Build ID: xorg-x11-server 1.20.3-1.fc29
[...]
[     9.913] (II) LoadModule: "glx"
[     9.914] (II) Loading /usr/lib64/xorg/modules/extensions/libglx.so
[     9.925] (II) Module glx: vendor="X.Org Foundation"
[     9.925]    compiled for 1.20.3, module version = 1.0.0

So, something must have failed and/or left a stale library behind during the system upgrades from fc27 to fc29.
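
One way a stale file like this might be spotted is rpm's verify mode (rpm -V), which compares installed files against the package database. Whether it would have flagged libglx.so in this particular case is not confirmed in this thread. The sketch below is a small bash filter for that output; the function name changed_files is made up, and it only looks at the digest flag (the "5" in the third column) and at "missing" entries.

```shell
#!/usr/bin/env bash
# Hypothetical filter: read `rpm -V` output on stdin and print the paths
# that are missing or whose content digest differs from the rpm database.
changed_files() {
    awk '$1 == "missing" || substr($1, 3, 1) == "5" { print $NF }'
}

# Possible use on an rpm-based system (not executed here):
#   rpm -V xorg-x11-server-Xorg | changed_files
```

Lines that differ only in modification time are ignored on purpose, since the question here is whether the file content matches the installed package. Paths containing spaces are not handled by this simple filter.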

The 410.78 driver now seems to work without errors (but I don’t have all my displays back yet).

Thanks again to everybody who provided information – I wouldn’t have been able to figure this out without their help.

Happens on Debian buster (testing) too. Same problem, ultimately fixed by uninstalling xserver-xorg-core (and dependencies) and reinstalling them.
It appears that the upgrade to 2:1.20.3 didn’t update /usr/lib/xorg/modules/extensions/libglx.so.

I’ll file a Debian bug report.

After upgrading to fc29, I ran into the same segfault with 410.66, 410.78 and 415.25. It was solved by reinstalling xorg-x11-server-Xorg-1.20.3-2.fc29.x86_64. So thanks for that.

However, although the login screen in gdm and lightdm shows full screen, after logging in only the upper-left quadrant is visible. nvidia-settings says the display is running at 3840x2160. I have a Quadro M1200 Mobile with a 3840x2160 display. On fc29 this artifact occurs with every driver build from 390.87-patched through 415.25 (although some builds, like 396.54, fail to install).

I can try downgrading to fc28, but I would like to solve this for fc29. Any insight would be greatly appreciated.

although the login screen in gdm and lightdm shows full screen,
after logging in only the upper left-quadrant is visible.

Apologies. Turns out this is the combination of the upgrade to fc29 and xfwm4 and the following two environment variables from my .bash_login/.bashrc:

export GDK_DPI_SCALE=0.7
export GDK_SCALE=2

I had added these to solve UHD screen size issues with Java 8 and GDK-based apps. With these in place, restarting xfwm4 with

xfwm4 --replace

would produce the message:

(xfwm4:18878): xfwm4-WARNING **: 12:17:12.229: output size (1920x1080) and logical screen size (3840x2160) do not match

Removing these two lines from the .bashrc prevented xfwm4 from trying to limit the output size.
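
A possible reading of the warning, which is an assumption and not confirmed in this thread: with GDK_SCALE=2 the GDK side works in scaled units, so a 3840x2160 screen is seen as 1920x1080, matching the two sizes in the message. The bash sketch below shows that arithmetic and one way to keep the scaling for individual apps without exporting the variables for the whole session. The function names logical_size and run_scaled are made up; the values 2 and 0.7 are the ones from the .bashrc above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: screen size as seen under an integer scale factor.
logical_size() {
    local width=$1 height=$2 scale=$3
    echo "$(( width / scale ))x$(( height / scale ))"
}

# Hypothetical helper: run one command with the GDK scaling variables set
# for that process only, so the window manager never sees them.
run_scaled() {
    env GDK_SCALE=2 GDK_DPI_SCALE=0.7 "$@"
}

# Example: size of a 3840x2160 display under GDK_SCALE=2
logical_size 3840 2160 2
```

Starting an app as `run_scaled some-gtk-app` (placeholder name) would apply the scaling to that app alone, while xfwm4 started from the session keeps running without it.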