Multi GPU primary selection

I have read many posts about this, but I only just realized that the GPU order after installation is not what I expected.
The NVIDIA driver seems to order GPUs by their UUID and not by the slot they are installed in.

Here is what I have:

lspci -t

+-02.0-[03]--+-00.0
|            \-00.1
+-03.0-[01]--+-00.0
|            \-00.1

nvidia-xconfig --query-gpu-info

GPU #0:
  Name      : GeForce GTX 960
  UUID      : GPU-c32807bd-39a2-1d84-8ddb-f14ba57a691a
  PCI BusID : PCI:3:0:0
  Number of Display Devices: 0
GPU #1:
  Name      : GeForce GTX 960
  UUID      : GPU-8c621f08-00f4-cb9e-8c8d-2abb698d3191
  PCI BusID : PCI:1:0:0
  Number of Display Devices: 3

As you can see, GPU #1 (PCI BusID PCI:1:0:0) is in my first PCIe slot and GPU #0 is in the 4th slot (as recommended for the Asus X99-S motherboard). My graphics cards are MSI GTX 960 Armor OC 2GB GD5.
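
One detail when matching these listings: lspci prints bus, device and function numbers in hexadecimal, while the BusID in xorg.conf is written in decimal. For the buses shown above (01 and 03) both forms look the same, but they differ from bus 0a upwards. A minimal Python sketch of the conversion follows; the function name lspci_to_xorg_busid is made up for illustration and is not part of any NVIDIA or Xorg tool.

```python
def lspci_to_xorg_busid(address):
    """Convert an lspci address such as '03:00.0' to an Xorg BusID string."""
    # An optional PCI domain prefix such as '0000:' is ignored here.
    parts = address.split(":")
    bus_hex, dev_func = parts[-2], parts[-1]
    dev_hex, func_hex = dev_func.split(".")
    # lspci fields are hexadecimal; xorg.conf expects decimal.
    return "PCI:{}:{}:{}".format(
        int(bus_hex, 16), int(dev_hex, 16), int(func_hex, 16)
    )


print(lspci_to_xorg_busid("03:00.0"))       # PCI:3:0:0
print(lspci_to_xorg_busid("0000:0a:00.0"))  # PCI:10:0:0
```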

I added the following rule to /etc/udev/rules.d/99-nvidia.rules:

SUBSYSTEM=="pci",ATTRS{vendor}=="0x10de",DRIVER=="nvidia",TAG+="seat",TAG+="master-of-seat"

That makes GPU #1 the primary instead of GPU #0, as shown here:

nvidia-smi -L
GPU 0: GeForce GTX 960 (UUID: GPU-8c621f08-00f4-cb9e-8c8d-2abb698d3191)
GPU 1: GeForce GTX 960 (UUID: GPU-c32807bd-39a2-1d84-8ddb-f14ba57a691a)
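
To compare orderings without reading UUIDs by eye, the nvidia-smi -L listing can be turned into an index-to-UUID map. This is only a sketch that assumes the line format shown above; parse_nvidia_smi_list is a hypothetical helper name, not an NVIDIA API.

```python
import re

# One line of `nvidia-smi -L`, e.g.
# 'GPU 0: GeForce GTX 960 (UUID: GPU-8c621f08-...)'
LINE = re.compile(r"GPU (\d+): (.+?) \(UUID: (GPU-[0-9a-f-]+)\)")


def parse_nvidia_smi_list(output):
    """Map GPU index -> (name, uuid) from `nvidia-smi -L` text."""
    gpus = {}
    for line in output.splitlines():
        match = LINE.match(line.strip())
        if match:
            gpus[int(match.group(1))] = (match.group(2), match.group(3))
    return gpus


SAMPLE_SMI = """\
GPU 0: GeForce GTX 960 (UUID: GPU-8c621f08-00f4-cb9e-8c8d-2abb698d3191)
GPU 1: GeForce GTX 960 (UUID: GPU-c32807bd-39a2-1d84-8ddb-f14ba57a691a)
"""

for index, (name, uuid) in sorted(parse_nvidia_smi_list(SAMPLE_SMI).items()):
    print(index, name, uuid)
```

With the listing above, index 0 maps to the card whose UUID nvidia-xconfig reported at PCI:1:0:0.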

But this was not sufficient to get SLI working with the GPU in the first slot.
Xorg always treats PCI:3:0:0 as the parent device:

[  7067.040] (**) NVIDIA(0): Depth 24, (--) framebuffer bpp 32
[  7067.040] (==) NVIDIA(0): RGB weight 888
[  7067.040] (==) NVIDIA(0): Default visual is TrueColor
[  7067.040] (==) NVIDIA(0): Using gamma correction (1.0, 1.0, 1.0)
[  7067.040] (**) NVIDIA(0): Option "Stereo" "0"
[  7067.040] (**) NVIDIA(0): Option "nvidiaXineramaInfoOrder" "DFP-7"
[  7067.040] (**) NVIDIA(0): Option "SLI" "On"
[  7067.040] (**) NVIDIA(0): Option "MultiGPU" "Off"
[  7067.040] (**) NVIDIA(0): Option "BaseMosaic" "off"
[  7067.040] (**) NVIDIA(0): Stereo disabled by request
[  7067.040] (**) NVIDIA(0): NVIDIA SLI auto-select rendering option.
[  7067.040] (**) NVIDIA(0): NVIDIA Multi-GPU disabled.
[  7067.040] (**) NVIDIA(0): Option "MetaModes" "DP-3: nvidia-auto-select +0+1024, DVI-I-0: nvidia-auto-select +336+0, DP-5: nvidia-auto-select +1920+1024"
[  7067.040] (**) NVIDIA(0): Enabling 2D acceleration
[  7068.438] (EE) NVIDIA(GPU-0): The NVIDIA graphics device PCI:1:0:0 bound to this SLI X
[  7068.438] (EE) NVIDIA(GPU-0):     screen is not the SLI parent device.  This configuration
[  7068.438] (EE) NVIDIA(GPU-0):     is not currently supported.  Please add 'BusID
[  7068.438] (EE) NVIDIA(GPU-0):     "PCI:3:0:0"' to the SLI "Device" section in the X
[  7068.438] (EE) NVIDIA(GPU-0):     configuration file.
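
The BusID the driver asks for is buried in a message that wraps across several log lines. A small Python sketch that pulls it out, assuming the log format shown above (suggested_sli_busid is a hypothetical name, and the pattern is tied to this exact wording of the error):

```python
import re

# Timestamp and message-type prefix of NVIDIA lines in an Xorg log,
# e.g. '[  7068.438] (EE) NVIDIA(GPU-0): '.
PREFIX = re.compile(
    r"^\[\s*\d+\.\d+\]\s+\((?:EE|WW|II|\*\*|==|--)\)\s+NVIDIA\([^)]*\):\s*"
)


def suggested_sli_busid(log_text):
    """Return the BusID named in the SLI parent error, or None."""
    # The message wraps, so strip the prefixes and rejoin the pieces.
    pieces = []
    for line in log_text.splitlines():
        if "(EE)" in line:
            pieces.append(PREFIX.sub("", line).strip())
    message = " ".join(pieces)
    match = re.search(r"""Please add 'BusID\s+"(PCI:\d+:\d+:\d+)"'""", message)
    return match.group(1) if match else None


SAMPLE_LOG = """\
[  7068.438] (EE) NVIDIA(GPU-0): The NVIDIA graphics device PCI:1:0:0 bound to this SLI X
[  7068.438] (EE) NVIDIA(GPU-0):     screen is not the SLI parent device.  This configuration
[  7068.438] (EE) NVIDIA(GPU-0):     is not currently supported.  Please add 'BusID
[  7068.438] (EE) NVIDIA(GPU-0):     "PCI:3:0:0"' to the SLI "Device" section in the X
[  7068.438] (EE) NVIDIA(GPU-0):     configuration file.
"""

print(suggested_sli_busid(SAMPLE_LOG))  # PCI:3:0:0
```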

This error does not occur in Windows 10 Enterprise. SLI works there, and the parent graphics card is the one that Linux reports as PCI:1:0:0.

I found another possible approach that uses CUDA_VISIBLE_DEVICES to set the order of the GPUs, but I can’t find a way to use it with Xorg.
Does anyone know how to do it?
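
For reference, this is how those variables are normally applied: they go into the environment of a CUDA process. As far as I understand, they change how CUDA applications enumerate devices and are not read by the X server when it picks the SLI parent. A hedged Python sketch, where cuda_env and ./my_cuda_app are hypothetical names used only for illustration:

```python
import os


def cuda_env(visible_devices, base_env=None):
    """Build an environment dict for launching a CUDA application."""
    env = dict(os.environ if base_env is None else base_env)
    # Enumerate CUDA devices by PCI bus ID rather than the default
    # fastest-first order.
    env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    # Accepts device indices or UUIDs; the listed order becomes the
    # CUDA device order seen by the application.
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(d) for d in visible_devices)
    return env


env = cuda_env(
    [
        "GPU-8c621f08-00f4-cb9e-8c8d-2abb698d3191",
        "GPU-c32807bd-39a2-1d84-8ddb-f14ba57a691a",
    ],
    base_env={},
)
print(env["CUDA_DEVICE_ORDER"])
print(env["CUDA_VISIBLE_DEVICES"])

# Hypothetical launch of a CUDA program with this environment:
# import subprocess
# subprocess.run(["./my_cuda_app"], env=cuda_env([1, 0]))
```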

My /etc/X11/xorg.conf.d/01-nvidia.conf configuration:

Section "InputDevice"
    # generated from data in "/etc/conf.d/gpm"
    Identifier     "Mouse0"
    Driver         "mouse"
    Option         "Protocol"
    Option         "Device" "/dev/input/mice"
    Option         "Emulate3Buttons" "no"
    Option         "ZAxisMapping" "4 5"
EndSection

Section "InputDevice"
    # generated from default
    Identifier     "Keyboard0"
    Driver         "kbd"
EndSection

Section "Monitor"
    # HorizSync source: edid, VertRefresh source: edid
    Identifier     "Monitor0"
    VendorName     "Unknown"
    ModelName      "Ancor Communications Inc ASUS VX239"
    HorizSync       24.0 - 83.0
    VertRefresh     50.0 - 75.0
    Option         "DPMS"
EndSection

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "GeForce GTX 960"
    BusID          "PCI:1:0:0"
EndSection

Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Monitor0"
    DefaultDepth    24
    Option         "Stereo" "0"
    Option         "nvidiaXineramaInfoOrder" "DFP-7"
    Option         "metamodes" "DP-3: nvidia-auto-select +0+1024, DVI-I-0: nvidia-auto-select +336+0, DP-5: nvidia-auto-select +1920+1024"
    Option         "SLI" "On"
    Option         "MultiGPU" "Off"
    Option         "BaseMosaic" "off"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection

I was trying to use SLI in Linux with the latest NVIDIA 358 driver but, as already mentioned on the forum, it is not possible. On Windows, the NVIDIA driver has many more features and capabilities than are available to Linux users. I thought the situation was better now that PhysX support is available on Linux.
I didn’t know about the SLI limitation in Linux when several monitors share the same X screen.

It is possible, though, to use one X screen with SLI driving a single monitor, and a second X screen without SLI for the remaining monitors.

As a developer I was interested in exploring all the NVIDIA features and testing them on my Linux desktop, but that does not seem to be possible for now. I hope that in the near future Linux will get all the features that nvidia-settings offers on Windows.

For now I will use the 2nd graphics card only in VMs through KVM passthrough, or use both cards with SLI when driving a single monitor.

I have already tried swapping the GPUs between slots, but the result is the same.

In these latest tests I used Linux kernel 4.4.0-r1 and NVIDIA GLX module 361.18.

My Xorg configuration for NVIDIA SLI:

Section "ServerLayout"
    Identifier     "Layout0"
    Screen      0  "Screen0" 0 0
    InputDevice    "Keyboard0" "CoreKeyboard"
    InputDevice    "Mouse0" "CorePointer"
    Option         "Xinerama" "0"
EndSection

Section "InputDevice"
        Identifier  "Keyboard0"
        Driver      "kbd"
EndSection

Section "InputDevice"
        Identifier  "Mouse0"
        Driver      "mouse"
        Option      "Protocol" "auto"
        Option      "Device" "/dev/input/mice"
        Option      "ZAxisMapping" "4 5 6 7"
EndSection

Section "Monitor"
    # HorizSync source: edid, VertRefresh source: edid
    Identifier     "Monitor0"
    VendorName     "Unknown"
    ModelName      "Ancor Communications Inc ASUS VX239"
    HorizSync       24.0 - 83.0
    VertRefresh     50.0 - 75.0
    Option         "DPMS"
EndSection

Section "Monitor"
    # HorizSync source: edid, VertRefresh source: edid
    Identifier     "Monitor1"
    VendorName     "Unknown"
    ModelName      "LG Electronics LG TV"
    HorizSync       30.0 - 83.0
    VertRefresh     58.0 - 62.0
    Option         "DPMS"
EndSection

Section "Device"
        Identifier  "Device0"
        Driver      "nvidia"
        VendorName  "NVIDIA Corporation"
        BoardName   "GeForce GTX 960"
        BusID       "PCI:3:0:0"
        Option      "SLI" "On"
EndSection

Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Monitor0"
    DefaultDepth    24
    Option         "Stereo" "0"
    Option         "nvidiaXineramaInfoOrder" "DFP-7"
    Option         "metamodes" "DP-3: nvidia-auto-select +0+0, DP-5: nvidia-auto-select +1920+0"
    Option         "SLI" "On"
    Option         "MultiGPU" "Off"
    Option         "BaseMosaic" "Off"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection

My motherboard is an Asus X99-S.
The GPU that the driver reports as the parent is the one in the 4th slot, not the one in the 1st slot. With my current CPU the board runs slot 1 at PCIe x16 and slot 2 at PCIe x8.

The choice of parent is the strangest thing here. Does anyone know how the parent ID is determined?

With the Windows NVIDIA driver there is no problem with the parent.