nvidia-xconfig --sli=on causes infinite loop on Ubuntu 18.04.1

Hello,

I have the following setup:

Motherboard: Gigabyte AORUS X399 Xtreme
CPU: AMD Threadripper 1950X
GPU: 2 NVIDIA GTX 1080 Ti graphics cards connected via SLI bridge
Operating System: Ubuntu 18.04.1 LTS
Kernel Version: 4.15.0-43-generic (x86_64)
Driver version: 390.77

When I enable SLI with nvidia-xconfig --sli=on and reboot, Ubuntu does not boot normally. It keeps returning to the login screen, so I end up in an endless login loop.

Booting into recovery mode and opening a root shell shows that the SLI=on change was written to /etc/X11/xorg.conf.
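
For anyone who wants to check the same thing from the recovery shell, here is a minimal sketch in Python. It assumes nvidia-xconfig wrote the setting as an Option "SLI" "..." line in xorg.conf; the function name find_sli_option is only illustrative and not part of any NVIDIA tool.

```python
import re

# Assumption: the setting appears as a line like
#     Option "SLI" "on"
# inside /etc/X11/xorg.conf. Commented-out lines are ignored.
SLI_OPTION = re.compile(
    r'^[ \t]*Option[ \t]+"SLI"[ \t]+"([^"]*)"',
    re.MULTILINE | re.IGNORECASE,
)


def find_sli_option(xorg_conf_text):
    """Return the value of the first active SLI option, or None if absent."""
    match = SLI_OPTION.search(xorg_conf_text)
    return match.group(1) if match else None
```

To use it, read /etc/X11/xorg.conf into a string and pass it in; a return value of None means no active SLI option was found.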

It seems Ubuntu 16.04 LTS also suffered from a similar issue. NVIDIA opened ticket #1929896 for it, but the issue was never resolved.

Please let me know if anyone has run into a similar problem.

nvidia-smi output:
Thu Dec 20 18:28:53 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.77                 Driver Version: 390.77                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:0A:00.0 Off |                  N/A |
|  0%   27C    P8    13W / 250W |      2MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:42:00.0  On |                  N/A |
|  0%   29C    P8    14W / 250W |    735MiB / 11175MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    1      1735      G   /usr/lib/xorg/Xorg                            30MiB |
|    1      1773      G   /usr/bin/gnome-shell                          50MiB |
|    1      2069      G   /usr/lib/xorg/Xorg                           257MiB |
|    1      2204      G   /usr/bin/gnome-shell                         195MiB |
|    1      2774      G   ...uest-channel-token=17643524323342171964   198MiB |
+-----------------------------------------------------------------------------+

nvidia-bug-report.log.old.gz (154 KB)

Please run nvidia-bug-report.sh as root with SLI enabled and attach the resulting .gz file to your post. Hovering the mouse over an existing post of yours will reveal a paperclip icon.
https://devtalk.nvidia.com/default/topic/1043347/announcements/attaching-files-to-forum-topics-posts/

Even though the file is named nvidia-bug-report.log.old.gz, it is the one that corresponds to the failing behavior with SLI=on.

[    30.652] (EE) NVIDIA(GPU-0): Failed to initialize DMA.
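
A line like this can be pulled out of the X server log without reading the whole file. The following is a minimal sketch, assuming the usual Xorg log layout where each entry starts with an optional bracketed timestamp followed by a marker such as (EE); the function name xorg_errors is illustrative only.

```python
import re

# Matches entries whose marker is (EE), with or without a leading
# "[ seconds]" timestamp. The marker legend near the top of the log also
# mentions "(EE)", but not at the start of an entry, so it is skipped.
ERROR_ENTRY = re.compile(r'^(?:\[\s*[\d.]+\]\s+)?\(EE\)')


def xorg_errors(log_text):
    """Return every error entry found in the given Xorg log text."""
    return [line for line in log_text.splitlines() if ERROR_ENTRY.match(line)]
```

Passing in the text of the Xorg log contained in the bug report would return the (EE) entries, including the DMA failure quoted above.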

Please disable IOMMU in the BIOS, or try the kernel parameter
iommu=off
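
On Ubuntu the kernel parameter normally goes into the GRUB_CMDLINE_LINUX_DEFAULT line of /etc/default/grub, followed by running update-grub as root and rebooting. As a rough sketch of that edit, assuming the line is present and double-quoted, the helper below (add_kernel_param is a made-up name) rewrites the text of that file; it only transforms a string and does not touch the disk.

```python
import re

CMDLINE = re.compile(r'^(GRUB_CMDLINE_LINUX_DEFAULT=)"([^"]*)"', re.MULTILINE)


def add_kernel_param(grub_text, param):
    """Return grub_text with param set in GRUB_CMDLINE_LINUX_DEFAULT.

    Any existing setting with the same name (e.g. an older iommu=...)
    is replaced rather than duplicated.
    """
    name = param.split("=", 1)[0]

    def rewrite(match):
        kept = [p for p in match.group(2).split() if p.split("=", 1)[0] != name]
        return '{}"{}"'.format(match.group(1), " ".join(kept + [param]))

    new_text, count = CMDLINE.subn(rewrite, grub_text)
    if count == 0:
        raise ValueError("GRUB_CMDLINE_LINUX_DEFAULT line not found")
    return new_text
```

Keep a backup of the original file before writing the result back; the change only takes effect after update-grub and a reboot.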

I disabled IOMMU in the BIOS and left SLI set to Auto. Both GPUs now appear to be in use. Hopefully this can be added to the instructions for enabling SLI on Linux.

Unfortunately, disabling IOMMU results in the loss of all USB ports, so it is not an acceptable workaround.

Please upgrade the BIOS. If that doesn’t help, try the kernel parameter instead of the BIOS switch.

I upgraded the BIOS and set the GRUB kernel parameter to iommu=soft. SLI works, but the USB 2.0 and 3.0 ports are still non-functional.
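
To confirm that the running kernel actually picked up the parameter, the contents of /proc/cmdline can be inspected after the reboot. A small sketch, with parse_cmdline as a hypothetical helper name:

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a {name: value} dict.

    Flags without '=' (such as 'quiet') map to None. Values that contain
    '=' themselves (such as root=UUID=...) are kept whole.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params
```

Reading /proc/cmdline into a string and looking up "iommu" in the result should give "soft" if the GRUB change was applied.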

Do you have BIOS settings for EHCI and XHCI Handoff? If so, set them to enabled.

EHCI and XHCI are set to enabled, as is USB mass drive support.

The odd part is that a USB stick works fine on the ports, but when I plug in my external hard drive, it starts to beep and is not recognized.

I ruled out the drive itself: it works fine on a laptop running Ubuntu 18.04, so neither the connector cable nor the drive is at fault.

This external drive is my backup location for Ubuntu’s backup utility, but it does not work on these ports. The only explanation I can think of is that the voltage to these ports is somehow altered. I read online that enabling IOMMU brings the USB ports back, but of course doing so interferes with NVIDIA SLI.

I should also mention that I have updated /etc/gdm3/custom.conf with the following:

# GDM configuration storage
# See /usr/share/gdm/gdm.schemas for a list of available options.

[daemon]
# Uncoment the line below to force the login screen to use Xorg
WaylandEnable=false

# Enabling automatic login
AutomaticLoginEnable = true
AutomaticLogin = user1

# Enabling timed login
TimedLoginEnable = true
TimedLogin = user1
TimedLoginDelay = 10

[security]

[xdmcp]

[chooser]

[debug]
# Uncomment the line below to turn on debugging
# More verbose logs
# Additionally lets the X server dump core if it crashes
#Enable=true

This seems to be a “special feature” of some Gigabyte mainboards: the IOMMU setting interferes with the USB controllers, and depending on the setting either USB 2 or USB 3 stops working. I guess the thumb drive is USB 2 and works, while the hard drive is USB 3 and doesn’t.
I didn’t mean the settings that enable the EHCI/XHCI controllers, but the EHCI/XHCI handoff feature, which hands USB control over from the BIOS to the OS.

The EHCI, XHCI and USB mass support flags are all set to enabled in the BIOS setup.
The USB 3.0 ports are still not working.

My motherboard is a Gigabyte AORUS X399 Xtreme.