X-Server crashes on Debian buster when trying to enable 2-way SLI for 2xGTX-660Ti

Hi,

I am trying to get 2-way SLI running with two GTX 660 Ti cards, one factory-overclocked and one standard.
I have one monitor connected via HDMI to the first GPU.
The system is running Debian buster with driver 410.104.

When I start X without a display manager, using the command

$ startx

the X server crashes with the following error:

[   108.261] (II) NVIDIA(GPU-0): NVIDIA SLI enabled.
[   108.265] (EE) NVIDIA(GPU-0): Failed to select a display subsystem.
[   108.713] (EE) NVIDIA(GPU-0): Only one GPU will be used for this X screen.
[   108.714] (EE) NVIDIA(GPU-0): Failed to select a display subsystem.
[   108.714] (EE) NVIDIA(0): Failing initialization of X screen 0
...
[   108.714] (EE) Screen(s) found, but none have a usable configuration.
[   108.714] (EE) 
Fatal server error:
[   108.714] (EE) no screens found(EE)

xorg.conf.d/…conf:

Section "Monitor"
    Identifier     "Monitor0"
    VendorName     "Unknown"
    ModelName      "Unknown"
    HorizSync       28.0 - 33.0
    VertRefresh     43.0 - 72.0
    Option         "DPMS"
EndSection
Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "GeForce GTX 660 Ti"
    BusID          "PCI:1:0:0"
EndSection
Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Monitor0"
    DefaultDepth    24
    Option         "AllowEmptyInitialConfiguration" "True"
    Option         "SLI" "auto"
#    Option         "Coolbits" "4"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection

Where can I attach the ‘nvidia-bug-report.log.gz’?
Edit: found :)

Thanks!
nvidia-bug-report.log.gz (1.54 MB)

Anyone?
I still don’t have any clue, because I cannot find any info on this error message on the internet. Does anyone know what it could mean?

I can provide more info if you tell me what you need!

Apr 28 16:36:56 NOMAC kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000917c:0:0:0x00000033
Apr 28 16:36:56 NOMAC kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000917c:1:0:0x00000033
Apr 28 16:36:56 NOMAC kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000917c:2:0:0x00000033
Apr 28 16:36:56 NOMAC kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000917c:3:0:0x00000033
Apr 28 16:36:56 NOMAC kernel: nvidia-modeset: ERROR: GPU:0: Failed detecting connected display devices
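
For anyone who wants to pull these kernel messages out of a longer log, a small shell helper could look like the sketch below. The function name modeset_errors is made up for illustration, and it only assumes that the messages contain the literal text "nvidia-modeset: ERROR", as in the lines above.

```shell
# Hypothetical helper: print only the nvidia-modeset error lines.
# Reads the files given as arguments, or stdin when none are given.
modeset_errors() {
    grep -F 'nvidia-modeset: ERROR' "$@"
}
```

On a systemd-based install, something like `journalctl -k -b | modeset_errors` should feed it the kernel messages of the current boot; `dmesg | modeset_errors` is an alternative if the journal is not in use.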

SLI setups on Linux are quite buggy, even more so with recent drivers, which is where this specific bug appeared. You can try either upgrading to the latest v430 driver or downgrading to the legacy v390 driver.
Anyway, you don’t get much benefit from an SLI setup, and it’s broken with 5.0 kernels.

Ok, so I tried different stuff.

My first try was upgrading to 418. That worked, and X no longer crashed. However, the system now froze completely at shutdown, with lots of stack traces on the terminal and no reaction to any key presses. I guess that’s what you’d call a blue screen on Windows.

So I tried the legacy 390 driver. Same freeze: SLI works until you try to shut down the system.

Then I tried 430 and got a problem similar to the one I had at the beginning with 410:
the X server crashes at startup.
I tried it with “SLI” “On” and also with “BaseMosaic” “True”, same result.
Error is the same as before:

(EE) NVIDIA(GPU-0): Failed to select a display subsystem.

However, I do NOT get the nvidia-modeset “ERROR: GPU:0: Failed to query display engine channel state” message.

The only thing I noticed is that the GLX module that gets loaded is the Xorg one, not the NVIDIA one.
Is that normal?
The logs look something like this (note the last line):

(II) LoadModule: "glx"
(II) Loading /usr/lib/xorg/modules/extensions/libglx.so
(II) Module glx: vendor="X.Org Foundation"
	compiled for 1.20.3, module version = 1.0.0
	ABI class: X.Org Server Extension, version 10.0

...

Loading sub module "glxserver_nvidia"
(II) LoadModule: "glxserver_nvidia"
(II) Loading /usr/lib/xorg/modules/extensions/libglxserver_nvidia.so
(II) Module glxserver_nvidia: vendor="NVIDIA Corporation"
 	compiled for 1.6.99.901, module version = 1.0.0
 	Module class: X.Org Server Extension
(II) NVIDIA GLX Module  430.09  Thu Apr 18 02:29:10 CDT 2019

...

(II) Initializing extension GLX
(II) Initializing extension GLX
(II) Indirect GLX disabled.
(II) GLX: Another vendor is already registered for screen 0
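
To check quickly which GLX modules a given X log reports, the vendor lines can be filtered out. This is a minimal sketch: the function name glx_vendors is hypothetical, and the log path in the usage note is an assumption (depending on how X is started, the log may be /var/log/Xorg.0.log or ~/.local/share/xorg/Xorg.0.log).

```shell
# Hypothetical helper: list the "Module glx...: vendor=..." lines of an Xorg log.
# Reads the files given as arguments, or stdin when none are given.
glx_vendors() {
    grep -E 'Module glx[^:]*: vendor=' "$@"
}
```

Called as `glx_vendors /var/log/Xorg.0.log` on a log like the excerpt above, it would print one line for the X.Org glx module and one for glxserver_nvidia.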

So I have SLI working on 390 and 418, but I cannot shut down the system properly.
And 410 and 430 don’t like SLI at all. That sucks.

Does that mean that SLI is just completely broken on Linux, or is there something I can do to get rid of, let’s say, the shutdown freezes?
nvidia-bug-report.log.gz (1.63 MB)

Loading two GLX modules is normal: the driver layout changed starting with v410 in preparation for render offloading.
You should consider SLI broken. Even if it worked, it’s completely, utterly, ultimately useless on Linux.
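
If you go that route, the simplest change to the configuration posted above is to switch the SLI option off (or delete the line). The fragment below is a sketch based on the Screen section from the first post. Only the "SLI" value differs, and "Off" is one of the values the NVIDIA driver README lists for this option. Whether this also avoids the shutdown freeze was not tested in this thread.

```
Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Monitor0"
    DefaultDepth    24
    Option         "AllowEmptyInitialConfiguration" "True"
    # Was "auto" in the original post; "Off" renders on a single GPU.
    Option         "SLI" "Off"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection
```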