X has unknown error with SLI MOSAIC on Ubuntu 18.04, 2x GTX 960 4GB SLI

Hi,

I’ve moved from an older pre-UEFI motherboard with my SLI GTX 960s, on which I was running Ubuntu 16.04 and progressively upgraded to 18.04 without too many issues, even using Ubuntu’s own alternate drivers. I’ve now retired my Intel i7 and ASUS P7P55D-E Pro for an AMD 2700X on an ASRock X470 Taichi. Same video cards though.

I have two 1920x1080 Samsung monitors, both HDMI only. One is plugged into the HDMI port of one card, the other into the HDMI port of the other card. The two cards are joined by an SLI bridge. Even without an xorg.conf file, Nouveau detects both cards and both monitors, spans both screens and gives me a full desktop … but I can’t play games reliably. Back when I was heavily into gaming, NVIDIA was far better supported on Linux than the alternatives, so I have stuck with NVIDIA cards in my Linux boxes ever since, installing the NVIDIA drivers to get GLX/OpenGL support.

Over the last two weeks, I’ve spent d_a_y_s and a fair chunk of nights uninstalling and reinstalling the Ubuntu 390 and 396 drivers. I’ve done all the tricks listed on various websites. I tried keeping Secure Boot on, signing the modules and using mokutil, which didn’t work; I tried to work out how to install the key into my BIOS, which didn’t work either; so keeping UEFI but turning Secure Boot off got me this far. I’ve installed lightdm, turned Wayland off in gdm3’s config (uncommented that ‘False’ line), blacklisted nouveau (then update-initramfs -u), set nomodeset in my /etc/default/grub file (then update-grub), rmmod nouveau, rebooted, etc. The distribution version of the drivers is not working. In the end, I did all of the above and downloaded NVIDIA-Linux-x86_64-396.51. Before installing, I’ve made sure the system has rebooted clean to either recovery mode or to the login screen, where I’ve dropped to a console and turned off lightdm, and I’ve run the installer with and without -Z -X --dkms.
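
For anyone retracing these steps, the nouveau blacklisting part could be sketched roughly as below. This is a minimal sketch, not the exact commands used above: the helper name nouveau_blacklist_conf and the file name /etc/modprobe.d/blacklist-nouveau.conf are assumptions chosen for illustration, and the commands that change the system are left commented out.

```shell
#!/bin/sh
# Print the modprobe configuration that keeps nouveau from loading.
# (Hypothetical helper name; the two directives are ordinary modprobe.d syntax.)
nouveau_blacklist_conf() {
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0'
}

# Preview what would be written.
nouveau_blacklist_conf

# To apply it (assumed file name), then rebuild the initramfs as described above:
#   nouveau_blacklist_conf | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
#   sudo update-initramfs -u
# After a reboot, this should print nothing if nouveau stayed out:
#   lsmod | grep nouveau
```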

I have X working on my primary monitor via the one HDMI port right now, hardware accelerated etc. nvidia-settings sees both GPU-0 and GPU-1, but only has one monitor set up under X screen 0. Under X Server Display Configuration, it sees the second monitor connected to the second GPU, however I can only configure it as a new X screen. If I do that and reboot, I get the login screen on one monitor but not the other, although I can pan my mouse over to the second monitor, where the cursor changes to an X. If I log in, it will only display the desktop on the main monitor; I can move the mouse from one monitor to the other, but can’t drag windows or output anything to the second monitor. I could unblacklist nouveau, uninstall nvidia and be back to dual monitors fairly quickly, but that’s not what I’m here for. I’m looking for a solution to having the nvidia cards in all their glory.

I know ultimately, I want SLI and MOSAIC to work, with hardware accelerated 2D/3D. However in the absence of that I’ve tried baseMOSAIC (crashes) and Xinerama (limited operation, gets to login screen spanned across two screens but won’t let me past that).

I have tried and tried again to set up:

  • nvidia-xconfig --sli=On or --sli=auto: it crashes the system, I get a black screen and it dies there with a hard reset required
  • Xinerama: with coaxing, the system will boot to the login screen spanned across both my screens, but after I log in it goes briefly to black then back to the login screen
  • sli=MOSAIC: Xorg.0.log tells me there are no valid configurations. I've tried --no-composition and a range of other options without a great deal of luck
nvidia-xconfig --sli=MOSAIC --metamodes="GPU-0.DFP-0: 1920x1080 +0+0, GPU-1.DFP-1: 1920x1080 +1920+0"
nvidia-xconfig --sli=MOSAIC --metamodes="GPU-0.DFP-0: nvidia-auto-select +0+0, GPU-1.DFP-1: nvidia-auto-select +1920+0"
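
The MetaModes strings above follow a simple pattern: one entry per display, each offset to the right by the width of the displays before it. A small sketch of how such a string could be assembled is below; build_metamodes is a hypothetical helper, not part of nvidia-xconfig. The display names use DFP-1 because the Xorg.0.log output later in this post reports DFP-1 as the connected output on both GPUs; that is an observation from the log, not a confirmed fix.

```shell
#!/bin/sh
# build_metamodes WIDTH HEIGHT DISPLAY...
# Prints a MetaModes string that places each display left to right,
# all at the same resolution. (Hypothetical helper for illustration.)
build_metamodes() (
    width=$1
    height=$2
    shift 2
    x=0
    out=""
    for dpy in "$@"; do
        entry="${dpy}: ${width}x${height} +${x}+0"
        if [ -z "$out" ]; then
            out=$entry
        else
            out="${out}, ${entry}"
        fi
        # The next display starts where this one ends.
        x=$((x + width))
    done
    printf '%s\n' "$out"
)

build_metamodes 1920 1080 GPU-0.DFP-1 GPU-1.DFP-1
# The result could then be passed on, for example:
#   nvidia-xconfig --metamodes="$(build_metamodes 1920 1080 GPU-0.DFP-1 GPU-1.DFP-1)"
```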

The xorg.conf gets a bit confused when I start messing with Server Configs, putting Screen 1 RightOf Screen 0 and mixing it up with MOSAIC etc., so I’ve gone back to scratch numerous times, but I have effectively gotten mixed up between the different technologies. I’ve tried with and without Force Composition Pipeline and Force Full Composition Pipeline.

My xorg.conf

Section "ServerLayout"
    Identifier     "Layout0"
    Screen      0  "Screen0" 0 0
    InputDevice    "Keyboard0" "CoreKeyboard"
    InputDevice    "Mouse0" "CorePointer"
    Option         "Xinerama" "0"
EndSection

Section "Files"
EndSection

Section "InputDevice"

    # generated from default
    Identifier     "Mouse0"
    Driver         "mouse"
    Option         "Protocol" "auto"
    Option         "Device" "/dev/psaux"
    Option         "Emulate3Buttons" "no"
    Option         "ZAxisMapping" "4 5"
EndSection

Section "InputDevice"

    # generated from default
    Identifier     "Keyboard0"
    Driver         "kbd"
EndSection

Section "Monitor"
    Identifier     "Monitor0"
    VendorName     "Unknown"
    ModelName      "Samsung S32F351"
    HorizSync       30.0 - 81.0
    VertRefresh     50.0 - 60.0
    Option         "DPMS"
EndSection

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "GeForce GTX 960"
    BusID          "PCI:46:0:0"
EndSection

Section "Screen"

# Removed Option "MetaModes" "GPU-0.DFP-1:1920x1080 +0+0, GPU-1.DFP-1:1920x1080 +1920+0"
# Removed Option "SLI" "MOSAIC"
    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Monitor0"
    DefaultDepth    24
    Option         "nvidiaXineramaInfo" "False"
    Option         "Stereo" "0"
    Option         "nvidiaXineramaInfoOrder" "DFP-1"
    Option         "metamodes" "GPU-4f3c652e-3afb-ca78-a50b-c9f247d4bab7.HDMI-0: 1920x1080 +0+0 {ForceCompositionPipeline=On, ForceFullCompositionPipeline=On}, GPU-0a441ad9-05fd-9576-12c7-8cf483159aba.HDMI-0: 1920x1080 +1920+0 {ForceCompositionPipeline=On, ForceFullCompositionPipeline=On}"
    Option         "MultiGPU" "Off"
    Option         "SLI" "off"
    Option         "BaseMosaic" "on"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection

An alternate xorg.conf I tried to put together
This one I thought had promise. I tried to edit it by hand, and with and without the second Section “Screen” which from the logs looks like it was ignored anyway.
Running it boots to black screen and hangs the computer.

Section "ServerLayout"
    Identifier     "Layout0"
    Screen      0  "Screen0" 0 0
    Screen      1  "Screen1" RightOf "Screen0"
    InputDevice    "Keyboard0" "CoreKeyboard"
    InputDevice    "Mouse0" "CorePointer"
    Option         "Xinerama" "0"
EndSection

Section "Files"
EndSection

Section "InputDevice"

    # generated from default
    Identifier     "Mouse0"
    Driver         "mouse"
    Option         "Protocol" "auto"
    Option         "Device" "/dev/psaux"
    Option         "Emulate3Buttons" "no"
    Option         "ZAxisMapping" "4 5"
EndSection

Section "InputDevice"

    # generated from default
    Identifier     "Keyboard0"
    Driver         "kbd"
EndSection

Section "Monitor"
    Identifier     "Monitor0"
    VendorName     "Samsung"
    ModelName      "Samsung S32F351"
    HorizSync       30.0 - 81.0
    VertRefresh     50.0 - 60.0
    Option         "DPMS"
EndSection

Section "Monitor"
    Identifier     "Monitor1"
    VendorName     "Samsung"
    ModelName      "Samsung S32F351"
    HorizSync       30.0 - 81.0
    VertRefresh     50.0 - 60.0
    Option         "DPMS"
EndSection

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "GeForce GTX 960"
    BusID          "PCI:46:0:0"
EndSection

Section "Device"
    Identifier     "Device1"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "GeForce GTX 960"
    BusID          "PCI:47:0:0"
EndSection

Section "Screen"

    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Monitor0"
    DefaultDepth    24
    Option         "nvidiaXineramaInfo" "FALSE"
    Option         "Stereo" "0"
    Option         "nvidiaXineramaInfoOrder" "DFP-1"
    Option         "metamodes" "GPU-0.DFP-1: nvidia-auto-select +0+0, 1920x1080 +0+0, GPU-1.DFP-1: nvidia-auto-select +1920+0, 1920x1080 +1920+0"
    Option         "SLI" "MOSAIC"
    Option         "MultiGPU" "Off"
    Option         "BaseMosaic" "off"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection

Section "Screen"
    Identifier     "Screen1"
    Device         "Device1"
    Monitor        "Monitor1"
    DefaultDepth    24
    Option         "Stereo" "0"
    Option         "metamodes" "nvidia-auto-select +0+0 {AllowGSYNC=Off}"
    Option         "SLI" "Off"
    Option         "MultiGPU" "Off"
    Option         "BaseMosaic" "off"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection

Here is the output from Xorg.0.log

[   372.093] (**) NVIDIA(0): Depth 24, (--) framebuffer bpp 32
[   372.093] (==) NVIDIA(0): RGB weight 888
[   372.093] (==) NVIDIA(0): Default visual is TrueColor
[   372.093] (==) NVIDIA(0): Using gamma correction (1.0, 1.0, 1.0)
[   372.094] (**) NVIDIA(0): Option "nvidiaXineramaInfo" "False"
[   372.094] (**) NVIDIA(0): Option "SLI" "MOSAIC"
[   372.094] (**) NVIDIA(0): NVIDIA SLI Mosaic mode selected.
[   372.094] (**) NVIDIA(0): Option "MetaModes" "GPU-0.DFP-1:1920x1080 +0+0, GPU-1.DFP-1:1920x1080 +1920+0"
[   372.094] (**) NVIDIA(0): Enabling 2D acceleration
[   372.699] (EE) NVIDIA(GPU-0): Failed to initialize SLI Mosaic Mode. This mode is only
[   372.699] (EE) NVIDIA(GPU-0):     available in certain configurations.  Please see Chapter
[   372.699] (EE) NVIDIA(GPU-0):     28: Configuring SLI and Multi-GPU FrameRendering for more
[   372.699] (EE) NVIDIA(GPU-0):     information.
[   372.699] (EE) NVIDIA(GPU-0): Failed to find a valid SLI configuration.
[   372.700] (EE) NVIDIA(GPU-0): Invalid SLI configuration 1 of 1:
[   372.700] (EE) NVIDIA(GPU-0): GPUs:
[   372.700] (EE) NVIDIA(GPU-0):     1) NVIDIA GPU at PCI:46:0:0
[   372.700] (EE) NVIDIA(GPU-0):     2) NVIDIA GPU at PCI:47:0:0
[   372.700] (EE) NVIDIA(GPU-0): Errors:
[   372.700] (EE) NVIDIA(GPU-0):     - Unknown error
[   372.700] (WW) NVIDIA(GPU-0): Failed to find a valid SLI configuration for the NVIDIA
[   372.700] (WW) NVIDIA(GPU-0):     graphics device PCI:46:0:0. Please see Chapter 28:
[   372.700] (WW) NVIDIA(GPU-0):     Configuring SLI and Multi-GPU FrameRendering in the README
[   372.700] (WW) NVIDIA(GPU-0):     for troubleshooting suggestions.
[   372.741] (EE) NVIDIA(GPU-0): Only one GPU will be used for this X screen.
[   373.081] (--) NVIDIA(0): Valid display device(s) on GPU-0 at PCI:46:0:0
[   373.081] (--) NVIDIA(0):     CRT-0
[   373.081] (--) NVIDIA(0):     DFP-0
[   373.081] (--) NVIDIA(0):     DFP-1 (boot)
[   373.081] (--) NVIDIA(0):     DFP-2
[   373.081] (--) NVIDIA(0):     DFP-3
[   373.081] (--) NVIDIA(0):     DFP-4
[   373.082] (II) NVIDIA(0): NVIDIA GPU GeForce GTX 960 (GM206-A) at PCI:46:0:0 (GPU-0)
[   373.082] (--) NVIDIA(0): Memory: 4194304 kBytes
[   373.082] (--) NVIDIA(0): VideoBIOS: 84.06.44.00.4e
[   373.082] (II) NVIDIA(0): Detected PCI Express Link width: 16X
[   373.087] (--) NVIDIA(GPU-0): CRT-0: disconnected
[   373.087] (--) NVIDIA(GPU-0): CRT-0: 400.0 MHz maximum pixel clock
[   373.087] (--) NVIDIA(GPU-0): 
[   373.091] (--) NVIDIA(GPU-0): DFP-0: disconnected
[   373.091] (--) NVIDIA(GPU-0): DFP-0: Internal TMDS
[   373.091] (--) NVIDIA(GPU-0): DFP-0: 330.0 MHz maximum pixel clock
[   373.091] (--) NVIDIA(GPU-0): 
[   373.122] (--) NVIDIA(GPU-0): Samsung S32F351 (DFP-1): connected
[   373.122] (--) NVIDIA(GPU-0): Samsung S32F351 (DFP-1): Internal TMDS
[   373.122] (--) NVIDIA(GPU-0): Samsung S32F351 (DFP-1): 600.0 MHz maximum pixel clock
[   373.122] (--) NVIDIA(GPU-0): 
[   373.122] (--) NVIDIA(GPU-0): DFP-2: disconnected
[   373.122] (--) NVIDIA(GPU-0): DFP-2: Internal DisplayPort
[   373.122] (--) NVIDIA(GPU-0): DFP-2: 960.0 MHz maximum pixel clock
[   373.122] (--) NVIDIA(GPU-0): 
[   373.122] (--) NVIDIA(GPU-0): DFP-3: disconnected
[   373.122] (--) NVIDIA(GPU-0): DFP-3: Internal TMDS
[   373.122] (--) NVIDIA(GPU-0): DFP-3: 165.0 MHz maximum pixel clock
[   373.122] (--) NVIDIA(GPU-0): 
[   373.122] (--) NVIDIA(GPU-0): DFP-4: disconnected
[   373.122] (--) NVIDIA(GPU-0): DFP-4: Internal TMDS
[   373.122] (--) NVIDIA(GPU-0): DFP-4: 330.0 MHz maximum pixel clock
[   373.122] (--) NVIDIA(GPU-0): 
[   373.125] (II) NVIDIA(0): Validated MetaModes:
[   373.125] (II) NVIDIA(0):     "GPU-0.DFP-1:1920x1080+0+0,GPU-1.DFP-1:1920x1080+1920+0"
[   373.125] (II) NVIDIA(0): Virtual screen size determined to be 1920 x 1080
[   373.128] (--) NVIDIA(0): DPI set to (69, 70); computed from "UseEdidDpi" X config
[   373.128] (--) NVIDIA(0):     option
[   373.128] (--) Depth 24 pixmap format is 32 bpp
[   373.150] (--) NVIDIA(0): Valid display device(s) on GPU-1 at PCI:47:0:0
[   373.150] (--) NVIDIA(0):     CRT-0
[   373.150] (--) NVIDIA(0):     DFP-0
[   373.150] (--) NVIDIA(0):     DFP-1 (boot)
[   373.150] (--) NVIDIA(0):     DFP-2
[   373.150] (--) NVIDIA(0):     DFP-3
[   373.150] (--) NVIDIA(0):     DFP-4
[   373.155] (--) NVIDIA(GPU-1): CRT-0: disconnected
[   373.155] (--) NVIDIA(GPU-1): CRT-0: 400.0 MHz maximum pixel clock
[   373.155] (--) NVIDIA(GPU-1): 
[   373.160] (--) NVIDIA(GPU-1): DFP-0: disconnected
[   373.160] (--) NVIDIA(GPU-1): DFP-0: Internal TMDS
[   373.160] (--) NVIDIA(GPU-1): DFP-0: 330.0 MHz maximum pixel clock
[   373.160] (--) NVIDIA(GPU-1): 
[   373.192] (--) NVIDIA(GPU-1): Samsung S32F351 (DFP-1): connected
[   373.192] (--) NVIDIA(GPU-1): Samsung S32F351 (DFP-1): Internal TMDS
[   373.192] (--) NVIDIA(GPU-1): Samsung S32F351 (DFP-1): 600.0 MHz maximum pixel clock
[   373.192] (--) NVIDIA(GPU-1): 
[   373.192] (--) NVIDIA(GPU-1): DFP-2: disconnected
[   373.192] (--) NVIDIA(GPU-1): DFP-2: Internal DisplayPort
[   373.192] (--) NVIDIA(GPU-1): DFP-2: 960.0 MHz maximum pixel clock
[   373.192] (--) NVIDIA(GPU-1): 
[   373.192] (--) NVIDIA(GPU-1): DFP-3: disconnected
[   373.192] (--) NVIDIA(GPU-1): DFP-3: Internal TMDS
[   373.192] (--) NVIDIA(GPU-1): DFP-3: 165.0 MHz maximum pixel clock
[   373.192] (--) NVIDIA(GPU-1): 
[   373.192] (--) NVIDIA(GPU-1): DFP-4: disconnected
[   373.192] (--) NVIDIA(GPU-1): DFP-4: Internal TMDS
[   373.192] (--) NVIDIA(GPU-1): DFP-4: 330.0 MHz maximum pixel clock
[   373.192] (--) NVIDIA(GPU-1): 
[   373.245] (II) NVIDIA(GPU-1): NVIDIA GPU GeForce GTX 960 (GM206-A) at PCI:47:0:0 (GPU-1)
[   373.245] (--) NVIDIA(GPU-1): Memory: 4194304 kBytes
[   373.245] (--) NVIDIA(GPU-1): VideoBIOS: 84.06.44.00.4e
[   373.245] (II) NVIDIA(GPU-1): Detected PCI Express Link width: 16X
[   373.247] (II) NVIDIA: Using 6144.00 MB of virtual memory for indirect memory
[   373.247] (II) NVIDIA:     access.
[   373.266] (II) NVIDIA(0): Setting mode "GPU-0.DFP-1:1920x1080+0+0,GPU-1.DFP-1:1920x1080+1920+0"
[   373.303] (==) NVIDIA(0): Disabling shared memory pixmaps
[   373.303] (==) NVIDIA(0): Backing store enabled
[   373.303] (==) NVIDIA(0): Silken mouse enabled
[   373.303] (**) NVIDIA(0): DPMS enabled
[   373.304] (II) Loading sub module "dri2"
[   373.304] (II) LoadModule: "dri2"
[   373.304] (II) Module "dri2" already built-in
[   373.304] (II) NVIDIA(0): [DRI2] Setup complete
[   373.304] (II) NVIDIA(0): [DRI2]   VDPAU driver: nvidia
[   373.304] (--) RandR disabled
[   373.306] (II) SELinux: Disabled on system
[   373.306] (II) Initializing extension GLX
[   373.306] (II) Indirect GLX disabled.
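
Most of that log is routine; the lines that matter are the (EE) and (WW) ones from the NVIDIA driver. A small sketch for pulling just those out is below. It assumes the log lives at /var/log/Xorg.0.log (with gdm3 it may be under ~/.local/share/xorg instead), and the function name is made up for illustration.

```shell
#!/bin/sh
# Print only the error (EE) and warning (WW) lines logged by the NVIDIA driver.
# $1: path to an Xorg log file.
xorg_nvidia_problems() {
    grep -E '\((EE|WW)\) NVIDIA' "$1"
}

log=/var/log/Xorg.0.log   # assumed location; adjust for your display manager
if [ -r "$log" ]; then
    xorg_nvidia_problems "$log"
fi
```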

Select dmesg output

vicomte@Hercules:~$ dmesg | grep -A 50 nvidia
[   17.090092] nvidia: loading out-of-tree module taints kernel.
[   17.090099] nvidia: module license 'NVIDIA' taints kernel.
[   17.090100] Disabling lock debugging due to kernel taint
[   17.094801] PKCS#7 signature not signed with a trusted key
[   17.116232] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[   17.157737] kvm: disabled by bios
[   17.166152] nvidia-nvlink: Nvlink Core is being initialized, major device number 235
[   17.166548] nvidia 0000:2e:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
[   17.166735] nvidia 0000:2f:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[   17.166845] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  396.51  Tue Jul 31 10:43:06 PDT 2018 (using threaded interrupts)
[   17.298002] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  396.51  Tue Jul 31 14:52:09 PDT 2018
[   17.324831] PKCS#7 signature not signed with a trusted key
[   17.324837] PKCS#7 signature not signed with a trusted key
[   17.345011] [drm] [nvidia-drm] [GPU ID 0x00002e00] Loading driver
[   17.345012] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:2e:00.0 on minor 0
[   17.345065] [drm] [nvidia-drm] [GPU ID 0x00002f00] Loading driver
[   17.345066] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:2f:00.0 on minor 1
[   17.346242] kvm: disabled by bios

lsmod output

vicomte@Hercules:~$ lsmod | grep nvid
nvidia_drm             40960  4
nvidia_modeset       1089536  7 nvidia_drm
nvidia              14028800  370 nvidia_modeset
drm_kms_helper        172032  1 nvidia_drm
drm                   401408  7 nvidia_drm,drm_kms_helper
ipmi_msghandler        53248  2 nvidia,ipmi_devintf

Select lspci -vvk output

2e:00.0 VGA compatible controller: NVIDIA Corporation GM206 [GeForce GTX 960] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: Gigabyte Technology Co., Ltd GM206 [GeForce GTX 960]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 109
	Region 0: Memory at f6000000 (32-bit, non-prefetchable) 
	Region 1: Memory at e0000000 (64-bit, prefetchable) 
	Region 3: Memory at f0000000 (64-bit, prefetchable) 
	Region 5: I/O ports at f000 
	[virtual] Expansion ROM at f7000000 [disabled] 
	Capabilities: <access denied>
	Kernel driver in use: nvidia
	Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia

2f:00.0 VGA compatible controller: NVIDIA Corporation GM206 [GeForce GTX 960] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: Gigabyte Technology Co., Ltd GM206 [GeForce GTX 960]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 110
	Region 0: Memory at f4000000 (32-bit, non-prefetchable) 
	Region 1: Memory at c0000000 (64-bit, prefetchable) 
	Region 3: Memory at d0000000 (64-bit, prefetchable) 
	Region 5: I/O ports at e000 
	[virtual] Expansion ROM at f5000000 [disabled] 
	Capabilities: <access denied>
	Kernel driver in use: nvidia
	Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
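
One detail worth noting when comparing this with the xorg.conf files above: lspci prints the bus number in hexadecimal (2e, 2f), while the BusID lines in xorg.conf use decimal (PCI:46:0:0, PCI:47:0:0). A small sketch of the conversion, with a hypothetical helper name:

```shell
#!/bin/sh
# lspci_to_busid ADDR
# Convert an lspci address such as 2e:00.0 (or 0000:2e:00.0) into the
# decimal "PCI:bus:device:function" form used by BusID in xorg.conf.
lspci_to_busid() (
    addr=$1
    # Drop an optional PCI domain prefix (0000:2e:00.0 -> 2e:00.0).
    case $addr in
        *:*:*) addr=${addr#*:} ;;
    esac
    bus=${addr%%:*}
    rest=${addr#*:}
    dev=${rest%%.*}
    fn=${rest#*.}
    # lspci fields are hexadecimal; shell arithmetic converts them to decimal.
    printf 'PCI:%d:%d:%d\n' "$((0x$bus))" "$((0x$dev))" "$((0x$fn))"
)

lspci_to_busid 2e:00.0
lspci_to_busid 2f:00.0
```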

nvidia-bug-report.log.gz (147 KB)

Other forum posts suggest that SLI is broken. I note that in nvidia-settings, without my bridge, I only get Xinerama as an advanced option.

If I have the bridge on, I get baseMosaic or Xinerama. Nothing in nvidia-settings gives me the option to use or modify SLI types. Passing SLI=On, Auto or Mosaic as arguments to nvidia-xconfig results in a locked-up computer with the white underline cursor at the top left of a black screen, not flickering but solid. Ctrl-Alt-F* doesn’t work, and Num Lock and Caps Lock are unresponsive. The Xorg.0.log file starts but just stops.

From memory, I can create an X screen for monitor1, span using Xinerama and get to the login prompt, but the login prompt just cycles. If I don’t use Xinerama, with an X screen0 and screen1, I get past the login but my desktop only appears on screen0, monitor0, device0, while screen1 on monitor1 and device1 only shows a black screen with an ‘x’ for the mouse cursor.

To be honest, all the different things I’ve tried and the results are starting to merge in my bleary eyed head. I am happy to try anything in some coordinated fashion to test something, but may need advice on what logs to turn to when I try something that locks up the computer and I am forced to reboot to recovery mode to edit the xorg.conf.
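
On the question of which logs to look at after a lockup, a rough sketch follows. It assumes the X logs are in /var/log (true for lightdm; gdm3 may use ~/.local/share/xorg), and the function name is hypothetical. If X has not been started since the hang, for example after booting straight into recovery mode, the log of the hung session is still Xorg.0.log; once X starts again, that log is renamed to Xorg.0.log.old.

```shell
#!/bin/sh
# List the X server logs that exist in a directory: the current one and
# the one kept from the previous X start.
# $1: log directory (defaults to /var/log).
xorg_logs() {
    dir=${1:-/var/log}
    for f in "$dir/Xorg.0.log" "$dir/Xorg.0.log.old"; do
        if [ -f "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}

xorg_logs

# Kernel messages from the boot before the current one, if the systemd
# journal is persistent on this system:
#   journalctl -k -b -1
```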

If I had the cash, I’d consider upgrading to a non SLI card but like most, we are waiting on the new range and I’ll still be trying to sort out the issues with the driver and the xorg.conf after changing cards, so I want some confidence in going with nvidia drivers in the future. Happy to write up whatever works, I can’t be the only person who has this issue.

Try disabling IOMMU, and start with just one monitor and a simple xorg.conf that only turns on SLI, and maybe set
Option "CoolBits" "2"
Then see if ‘simple’ SLI is working before extending it to a Mosaic setup.

SLI wouldn’t work at all, whether I tried SLI=On or CoolBits “2” or both. SLI Mosaic was being picked up but telling me there were no valid configurations. Something I googled on my phone suggests it’s because earlier NVIDIA drivers would allow (though not support) SLI Mosaic on most GPUs, but in reality Mosaic was only ever intended or allowed for configurations with Quadro cards. I gave up on the Mosaic dream and temporarily walked away from SLI.

I played for a while without the bridge, just trying to get Xinerama going, and had all but given up, but with the amd_iommu=off option in my kernel parameters in grub, I struck gold as I cycled through different config options. That IOMMU issue appears to have really nailed it, thanks for pointing it out to me. I’m running kernel 4.15.0-30 generic from the Ubuntu distro. I hear there is a heap of IOMMU improvements in 4.16, though I don’t know if they’ll solve this issue. Probably worth noting for the Ubuntu devs that switching IOMMU off in the distro installs of the nvidia drivers on the generic kernel would be a good idea, at least until it’s sorted.
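
For reference, a quick way to confirm that the amd_iommu=off parameter actually reached the running kernel is sketched below. The helper name is hypothetical, and the GRUB line in the comments is only an example of where the parameter might go; any existing options on that line should be kept.

```shell
#!/bin/sh
# has_kernel_param PARAM [CMDLINE]
# Succeeds if PARAM appears as a whole word on the kernel command line.
# CMDLINE defaults to the running kernel's /proc/cmdline.
has_kernel_param() {
    cmdline=${2-$(cat /proc/cmdline 2>/dev/null)}
    case " $cmdline " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

if has_kernel_param amd_iommu=off; then
    echo "amd_iommu=off is active"
else
    echo "amd_iommu=off is not on the kernel command line"
fi

# To add it, edit /etc/default/grub so the default command line includes it,
# for example:
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amd_iommu=off"
# then regenerate the GRUB config and reboot:
#   sudo update-grub
```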

SLI=off, baseMosaic works! but only if I have the bridge on the cards!

Full span across screens! glmark2 indicates GLX/2D/3D acceleration across the whole lot, and I can drag 3D rendered content from one screen to the other without issue (I know some posters have indicated 3D acceleration only worked on one X screen for them).

Oddly, the system will only recognise the baseMosaic option if I have the bridge on the cards. If I take the bridge off, baseMosaic won’t work; Xorg.0.log says ‘video link error’ or some such, which I inferred was bridge related.

Here’s my xorg.conf below (yeah, I know I can delete Screen “Screen1”, but I don’t want to mess with something that’s working right now; xorg will ignore it anyway).

# nvidia-xconfig: X configuration file generated by nvidia-xconfig
# nvidia-xconfig:  version 396.51  (buildmeister@swio-display-x64-rhel04-14)  Tue Jul 31 16:04:54 PDT 2018


Section "ServerLayout"
    Identifier     "Layout0"
    Screen      0  "Screen0"
    Screen      1  "Screen1" RightOf "Screen0"
    InputDevice    "Keyboard0" "CoreKeyboard"
    InputDevice    "Mouse0" "CorePointer"
EndSection

Section "Files"
EndSection

Section "InputDevice"

    # generated from default
    Identifier     "Mouse0"
    Driver         "mouse"
    Option         "Protocol" "auto"
    Option         "Device" "/dev/psaux"
    Option         "Emulate3Buttons" "no"
    Option         "ZAxisMapping" "4 5"
EndSection

Section "InputDevice"

    # generated from default
    Identifier     "Keyboard0"
    Driver         "kbd"
EndSection

Section "Monitor"
    Identifier     "Monitor0"
    VendorName     "Samsung"
    ModelName      "Samsung S32F351"
    HorizSync       30.0 - 81.0
    VertRefresh     50.0 - 60.0
    Option         "DPMS"
EndSection

Section "Monitor"
    Identifier     "Monitor1"
    VendorName     "Samsung"
    ModelName      "Samsung S32F351"
    HorizSync       30.0 - 81.0
    VertRefresh     50.0 - 60.0
    Option         "DPMS"
EndSection

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "GeForce GTX 960"
    BusID          "PCI:46:0:0"
EndSection

Section "Device"
    Identifier     "Device1"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "GeForce GTX 960"
    BusID          "PCI:47:0:0"
EndSection

Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Monitor0"
    DefaultDepth    24
    Option         "BaseMosaic" "True"
    Option         "SLI" "off"
    Option         "MetaModes" "GPU-0.DFP-1: 1920x1080 +0+0, GPU-1.DFP-1: 1920x1080 +1920+0"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection

Section "Screen"
    Identifier     "Screen1"
    Device         "Device1"
    Monitor        "Monitor1"
    DefaultDepth    24
    Option         "BaseMosaic" "True"
    Option         "SLI" "off"
    Option         "MetaModes" "GPU-0.DFP-1: 1920x1080 +0+0, GPU-1.DFP-1: 1920x1080 +1920+0"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection

BaseMosaic on GeForce-type cards has been limited to three displays since the ~v300 driver, as opposed to 32(?) displays on Quadros.
IOMMU/ACS can stop SLI/Mosaic from working due to device isolation, depending on CPU/chipset.
The rest of the config, as you already found out, is highly erratic and follows no transparent logic, nor even the sparse documentation.