Regression with multiple GPUs and X screens in 415.25

I have a setup with two NVIDIA cards (GeForce GT 640 and GTX 980) configured as two separate X screens. With the 410.78 driver it works fine, but with 415.25 I end up with a single X screen and the GTX 980 turns off. I don’t see its outputs in xrandr either, although nvidia-settings shows both GPUs. I use the following minimal xorg config:

Section "ServerFlags"
        Option          "AutoAddGPU" "false"
EndSection

Section "ServerLayout"
        Identifier      "Layout"
        Screen          0 "Screen0"
        Screen          1 "Screen1" RightOf "Screen0"
EndSection

Section "Device"
        Identifier      "Card0"
        BusID           "PCI:2:0:0"
        Driver          "nvidia"
EndSection

Section "Screen"
        Identifier      "Screen0"
        Device          "Card0"
        Option          "Metamodes" "HDMI-0: nvidia-auto-select +0+0"
EndSection

Section "Device"
        Identifier      "Card1"
        BusID           "PCI:1:0:0"
        Driver          "nvidia"
EndSection

Section "Screen"
        Identifier      "Screen1"
        Device          "Card1"
        Option          "Metamodes" "DVI-D-1: nvidia-auto-select +0+0"
EndSection

I also tried generating a config with nvidia-settings for two screens, but the result is the same: only the GT 640 works (PCI:1:0:0). However, if I remove Screen0 from the config and leave only Screen1, then the GTX 980 works (so I can use either card, but not both at the same time).

I’ve run nvidia-bug-report.sh with both the old and new driver.
410.78: http://0x0.st/s58_.gz
415.25: http://0x0.st/s58L.gz

You’re getting

[    3.708885] NVRM: The NVIDIA probe routine was not called for 1 device(s).
[    3.708886] NVRM: This can occur when a driver such as: 
               NVRM: nouveau, rivafb, nvidiafb or rivatv 
               NVRM: was loaded and obtained ownership of the NVIDIA device(s).
[    3.708887] NVRM: Try unloading the conflicting kernel module (and/or
               NVRM: reconfigure your kernel without the conflicting
               NVRM: driver(s)), then try loading the NVIDIA kernel module
               NVRM: again.
[    3.708889] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  415.25  Wed Dec 12 10:22:08 CST 2018 (using threaded interrupts)

but there’s actually no other driver loaded. You’re also getting a lot of IOMMU errors; have you tried disabling it?

Um, yes, but 410.78 prints that message too, and it’s working fine afterwards. IOMMU is crucial for my qemu/kvm setup so I’d like to avoid disabling it.

Ok, I didn’t check for that, but then it’s rather odd that it works with 410. Disabling the IOMMU would be just for troubleshooting.

Okay, I’ve booted with intel_iommu=off, but no difference. Log: http://0x0.st/s55n.gz
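For anyone repeating this test, a rough way to confirm that the IOMMU really is off after booting with intel_iommu=off is to look at the kernel command line and count the entries under /sys/kernel/iommu_groups. This is only a sketch: the function name iommu_group_count and the SYSFS_ROOT override are made up for illustration, and a count of 0 is a hint that the IOMMU is inactive, not proof.

```shell
#!/bin/sh
# Count the IOMMU groups the kernel exposes in sysfs.
# SYSFS_ROOT is a hypothetical override (handy for testing); it defaults to /sys.
iommu_group_count() {
    groups_dir="${SYSFS_ROOT:-/sys}/kernel/iommu_groups"
    count=0
    for g in "$groups_dir"/*; do
        # An unmatched glob stays literal and is not a directory, so it is skipped.
        [ -d "$g" ] && count=$((count + 1))
    done
    echo "$count"
}

# Show what the kernel was actually booted with, then the group count.
echo "kernel command line: $(cat /proc/cmdline 2>/dev/null)"
echo "IOMMU groups: $(iommu_group_count)"
```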

Oh, and I forgot that I have an init script that manually binds the second card to the nvidia driver, so that’s why it works despite the warning message from nvidia.

echo 0000:02:00.1 > /sys/bus/pci/devices/0000\:02\:00.1/driver/unbind
echo 0000:02:00.0 > /sys/bus/pci/devices/0000\:02\:00.0/driver/unbind
echo 0000:02:00.0 > /sys/bus/pci/drivers/nvidia/bind
echo 0000:02:00.1 > /sys/bus/pci/drivers/snd_hda_intel/bind
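To check whether a rebind like the one above actually took effect, the driver symlink of each device in sysfs can be read back. The following is a minimal sketch, not part of the original init script: the helper name pci_driver and the SYSFS_ROOT override are hypothetical, and the address 0000:01:00.0 for the other card is an assumption based on the BusID in the xorg config.

```shell
#!/bin/sh
# Print the kernel driver currently bound to a PCI device, or "none".
# SYSFS_ROOT is a hypothetical override (handy for testing); it defaults to /sys.
pci_driver() {
    dev_dir="${SYSFS_ROOT:-/sys}/bus/pci/devices/$1"
    if [ -L "$dev_dir/driver" ]; then
        # The symlink target ends in the driver name, e.g. .../drivers/nvidia
        basename "$(readlink "$dev_dir/driver")"
    else
        echo "none"
    fi
}

# Addresses taken from the thread; 0000:01:00.0 is assumed from BusID PCI:1:0:0.
for dev in 0000:01:00.0 0000:02:00.0 0000:02:00.1; do
    printf '%s -> %s\n' "$dev" "$(pci_driver "$dev")"
done
```

If the rebind worked, 0000:02:00.0 should report nvidia and 0000:02:00.1 should report snd_hda_intel; a device with no driver bound reports none.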

You are on Gentoo, as I am, so you should be the master of your kernel. Judging by what you’re doing, you aren’t. Everything is spewing errors and all you’re doing is applying workarounds. Rebuild it from the ground up.

https://getyarn.io/yarn-clip/1a979996-de03-40e3-9d49-a4a93be1ed04