2x GTX 970, unable to get SLI working, "video link was not detected"

I just got a duplicate video card and a bigger PSU to help power it. My motherboard (Asus Z170-A) claims to support SLI (though at 8x/8x – I hear this is not practically any different from 16x/16x), and came with a ribbon cable to connect two cards.

After installing the second card I fitted the ribbon cable. It wasn’t clear which way around to put it, or on which of the two possible connectors on each card, though from a few Google searches it appears that it shouldn’t matter.

I ran nvidia-xconfig --sli=on to enable SLI (I tried auto first). I don’t want mosaic mode or anything – I just have two displays, and want better performance in games when I play them. I don’t know if it makes a difference, but because of the two displays I generally run games in windowed mode, and my window manager draws no decorations on such “pseudo-fullscreen” windows. I usually run the compton compositor, unless it causes problems.

I rebooted after changing this configuration.
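
To confirm the setting actually landed in the X configuration, something like the following could be used. This is only a minimal sketch: it assumes nvidia-xconfig wrote to the default /etc/X11/xorg.conf, and sli_option is a helper name made up for illustration.

```shell
# Hypothetical helper: print any SLI option lines from an X config file.
# $1: path to the config file (assumed default: /etc/X11/xorg.conf)
sli_option() {
    grep -iE '^[[:space:]]*Option[[:space:]]+"SLI"' "${1:-/etc/X11/xorg.conf}"
}
```

Running sli_option with no arguments would be expected to print the Option "SLI" line if it was written; no output suggests the option never made it into the file.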

When I look at the /var/log/Xorg.0.log file I see the following:

[    11.445] (EE) NVIDIA(GPU-0): Failed to find a valid SLI configuration.
[    11.445] (EE) NVIDIA(GPU-0): Invalid SLI configuration 1 of 1:
[    11.445] (EE) NVIDIA(GPU-0): GPUs:
[    11.445] (EE) NVIDIA(GPU-0):     1) NVIDIA GPU at PCI:1:0:0
[    11.445] (EE) NVIDIA(GPU-0):     2) NVIDIA GPU at PCI:2:0:0
[    11.445] (EE) NVIDIA(GPU-0): Errors:
[    11.445] (EE) NVIDIA(GPU-0):     - The video link was not detected
[    11.445] (WW) NVIDIA(GPU-0): Failed to find a valid SLI configuration for the NVIDIA
[    11.445] (WW) NVIDIA(GPU-0):     graphics device PCI:1:0:0. Please see Chapter 28:
[    11.445] (WW) NVIDIA(GPU-0):     Configuring SLI and Multi-GPU FrameRendering in the README
[    11.445] (WW) NVIDIA(GPU-0):     for troubleshooting suggestions.
[    11.649] (EE) NVIDIA(GPU-0): Only one GPU will be used for this X screen.
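
For anyone wanting to pull the same lines out of their own log, a rough sketch follows. The function name sli_log_lines is made up, and the patterns are simply taken from the messages above, so other driver versions may word things differently.

```shell
# Hypothetical helper: show NVIDIA log lines that mention SLI, the video
# link, or the single-GPU fallback.
# $1: log file (assumed default: /var/log/Xorg.0.log)
sli_log_lines() {
    grep -E 'NVIDIA.*(SLI|video link|Only one GPU)' "${1:-/var/log/Xorg.0.log}"
}
```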

This says to look at a particular part of the README. On this page, https://www.nvidia.com/Download/driverResults.aspx/137211/, the README link is dead – it points to http://us.download.nvidia.com/XFree86/Linux-x86/396.54/README/index.html, which is a 404. A Google search takes me here: http://download.nvidia.com/XFree86/Linux-x86/340.104/README/sli.html – please point me to a later one if it exists.

Anyway, this readme gives troubleshooting tips. I’m going through them.

  • Make sure that ACPI is enabled in your kernel.

I don’t know how to do this. I haven’t knowingly disabled it. The computer can self-power-off and self-reboot when I tell it to, if that says anything. I’ve googled and there’s no obvious way to “run this command to check whether ACPI is enabled” that I can find.
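
One rough way to check might look like the sketch below. It rests on two assumptions that I have not verified against the NVIDIA README: that a kernel with ACPI active exposes the /sys/firmware/acpi directory, and that ACPI would only be off if acpi=off appears on the kernel command line. The name acpi_status is made up for illustration.

```shell
# Hypothetical helper: guess whether ACPI is enabled.
# $1: sysfs ACPI directory (assumed default: /sys/firmware/acpi)
# $2: kernel command line file (assumed default: /proc/cmdline)
acpi_status() {
    acpi_dir="${1:-/sys/firmware/acpi}"
    cmdline_file="${2:-/proc/cmdline}"
    if grep -qw 'acpi=off' "$cmdline_file" 2>/dev/null; then
        echo "disabled on kernel command line"
    elif [ -d "$acpi_dir" ]; then
        echo "enabled"
    else
        echo "not detected"
    fi
}
```

Running acpi_status with no arguments uses the default paths; the kernel log (dmesg | grep -i acpi) is another place to look.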

  • Run lspci to check that multiple NVIDIA GPUs can be identified by the operating system

No problem – yes, both are found.

01:00.0 VGA compatible controller: NVIDIA Corporation GM204 [GeForce GTX 970] (rev a1)
02:00.0 VGA compatible controller: NVIDIA Corporation GM204 [GeForce GTX 970] (rev a1)

  • Make sure you have the most recent SBIOS available for your motherboard

OK, I updated the BIOS to the latest and there was no change.

  • The PCI Express slots on the motherboard must provide a minimum link width

Yes, it says it supports 16x for a single card, or 8x/8x for two cards. So that should be fine.
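
The motherboard spec only says what the slots can do; the width each card actually negotiated can be read from the LnkSta line of lspci -vv (usually needs root to show it). A minimal sketch, where link_width is a made-up helper name and the bus addresses are the ones from my lspci output above:

```shell
# Hypothetical helper: extract the negotiated PCIe link width (e.g. "x8")
# from `lspci -vv` output supplied on stdin.
link_width() {
    sed -n 's/.*LnkSta:.*Width \(x[0-9]*\).*/\1/p'
}

# Example usage (not run here):
#   sudo lspci -vv -s 01:00.0 | link_width
#   sudo lspci -vv -s 02:00.0 | link_width
```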

  • How can I determine if my kernel correctly detects my PCI Bridge?

This section describes a command to run to check whether the PCI bridge was detected. I can’t map its description of the “good” output and the “bad” output onto my own output well enough to tell which one I have. So here’s my output, and maybe someone here can tell me if it’s good or bad, and what to do about it if it’s bad:

-[0000:00]-+-00.0
           +-01.0-[01]--+-00.0
           |            \-00.1
           +-01.1-[02]--+-00.0
           |            \-00.1
           +-14.0
           +-16.0
           +-17.0
           +-1b.0-[03]--
           +-1c.0-[04]----00.0
           +-1c.2-[05-06]----00.0-[06]--
           +-1d.0-[07]--
           +-1f.0
           +-1f.2
           +-1f.3
           +-1f.4
           \-1f.6

I’ve also tried switching the direction of the ribbon cable and attaching the ribbon to the other pair of connectors on the cards. I have the same results.

Any idea what could be wrong?

nvidia-bug-report output is here: https://drive.google.com/file/d/1a2Vimrfhn8N5L4_pWqXx2q2BVtX_aGA7/view

I have driver 396.54, and am running Ubuntu 18.04 on x86_64 architecture.

It turns out that my two cards, though both EVGA GTX 970 boards, were not compatible for SLI.

And then it turned out, once I’d bought a compatible one, that SLI on Linux doesn’t support multi-monitor setups in any way that actually improves performance.

Well, I should have done more research before spending money.