Dual GeForce GTX or Titan V on mobo, unable to display upon launching Ubuntu 18.04

I have two GeForce GTX 1080 and two Titan V,
all have 12 GB each.
OS is Ubuntu 18.04.
One GeForce GTX 1080 is an ASUS card, and
the other GeForce GTX 1080 is a Zotac card.
The open source NVIDIA GPU driver is 410.48.
I also tried the proprietary NVIDIA driver 390.xx.

Mobo is X399 GAMING
Adequate power supply for 4 GPUs.

Use case:
I plan to have all of the above GPUs on said mobo,
and only need ONE display output from only ONE GPU.
I am using it for machine learning, not gaming.

Here are my observations on the problem I have:

  1. Single GPU installation on mobo:
    1.1) When there is ONLY either the above Zotac or ASUS GPU, on cold boot,
    the boot loader menu displays a list of OSes to run.
    (I do not have Windows as an OS choice, only Linux.)
    Then I selected Ubuntu 18.04, the OS was able to boot up,
    and the display over the HDMI cable works great.
    There were no crashes or blank screens.

The same observation applies to a single Titan V installation.

  2. Two GPUs of the same chip on mobo:
    2.1) When I have both the Zotac and ASUS GPUs on the mobo,
    and the HDMI cable is connected to either GPU, the boot loader runs on cold boot.
    Then after I selected Ubuntu 18.04, a command line prompt appears, and
    the screen background becomes black with a blinking cursor. Then nothing happens.

I also tried the other available HDMI outputs on the GPUs, with the same black screen.

2.2) Likewise, when I only have one Titan V GPU on the mobo, everything works.
When I have TWO Titan V GPUs, Ubuntu 18.04 cannot boot up upon its selection
from the boot menu.

From 1) and 2) above, it seems the display driver is working, at least for a single GPU installation.

Note that I tried both drivers:
the open source NVIDIA GPU driver 410.48, and
the proprietary NVIDIA driver 390.xx.
Both have the same problem as described above!

Why do I have this black screen problem when
I have two (or more) of these GPUs on the mobo?

Is there a way to select ONLY one GPU card for display via BIOS/UEFI?

What can I do from Ubuntu 18.04 side?

What else did I miss?

Please advise. Thank you.

Please run nvidia-bug-report.sh as root and attach the resulting .gz file to your post. Hovering the mouse over an existing post of yours will reveal a paperclip icon.
https://devtalk.nvidia.com/default/topic/1043347/announcements/attaching-files-to-forum-topics-posts/

Hi generix:

Thank you for helping.
I will attach the nvidia-bug-report.log.gz after this post.
Keep in mind the attached file is a bug report from when
only ONE Nvidia Titan V GPU was installed.

I am not able to get a report for when two or more GPUs
are in the PCIe slots, because the screen goes blank under those
conditions.

nvidia-bug-report.log.gz (592 KB)

Please remove your current xorg.conf and replace it with one that only contains this:

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "TITAN V"
    BusID          "PCI:65:0:0"
    Option         "AllowEmptyInitialConfiguration" "true"
EndSection

Reboot to make sure it still works, then add another card with monitor still connected to the current Titan V.

Hi generix:

I tried it with Nvidia 390 proprietary driver, one GPU works as usual,
but it did not work when using two GPUs.

Then I downloaded version 418.xx from Nvidia site and installed it.
Single GPU works, but not 2 GPUs.
The installed 418 driver shows up as an open source Nvidia driver
in Software & Updates under the “additional drivers” tab!

The xorg.conf file was changed when I ran NVIDIA-Linux-x86_64-418.56.run.
The xorg.conf now has this for Nvidia device:

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
EndSection

I will attach the nvidia-bug-report file for driver 418.xx also.
The bug report has significantly fewer errors and warnings now!
Thanks again.
nvidia-bug-report.log.gz (1.08 MB)

Did you use the provided xorg.conf from post #4 at all?

Oops, I forgot to mention I did do what you suggested…

Yes, I replaced all four definitions, Device0, …, Device3
with the one from your suggestion.

Again the single GPU works, but with
two GPUs the boot menu did not even come up!
The display was always blank!

You shouldn’t replace the device sections; you should use the provided snippet as the complete xorg.conf.

Issue #1) Do you mean the xorg.conf should only have the
following ONE section definition, and nothing else?
Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "TITAN V"
    BusID          "PCI:65:0:0"
    Option         "AllowEmptyInitialConfiguration" "true"
EndSection

Issue #2) Now that I have the 418.xx driver (NVIDIA open source?) installed,
do I replace the existing xorg.conf with just the one section described in
Issue #1 above?

As another observation, why does the new xorg.conf created during installation
of the 418.xx driver not include a BusID in its ‘Section “Device”’?

Issue #3) Do you want me to revert to the Nvidia proprietary driver version 390.xx
and then try the single ‘Section “Device”’ suggestion?

Thank you for helping.

  1. Yes.
  2)+3) You shouldn’t have used the .run installer in the first place. Please run it again with the --uninstall option to uninstall, then use the driver from the Ubuntu graphics ppa: https://launchpad.net/~graphics-drivers/+archive/ubuntu/ppa
    To install the driver from that use
sudo apt install nvidia-driver-418

Hi:

Here are the sequences of actions and messages:

$ sudo ./NVIDIA-Linux-x86_64-418.56.run --uninstall

Msg to revert to previous xorg.conf file.


WARNING: Your driver installation has been altered since it was initially installed;
this may happen, for example, if you have since installed the NVIDIA driver through a mechanism other than nvidia-installer (such as your distribution’s native package management system). nvidia-installer will attempt to uninstall as best it can. Please see the file
‘/var/log/nvidia-uninstall.log’ for details.

ERROR: Unable to create ‘/lib/modules/4.15.0-43-generic/updates/dkms/nvidia-uvm.ko’ for copying (No such file or directory)

ERROR: Unable to create ‘/lib/modules/4.15.0-43-generic/updates/dkms/nvidia-modeset.ko’ for copying (No such file or directory)
ERROR: Unable to create ‘/lib/modules/4.15.0-43-generic/updates/dkms/nvidia.ko’ for copying (No such file or directory)
ERROR: Unable to create ‘/usr/lib/mpich/lib/nvidia/xorg/nvidia_drv.so’ for copying (No such file or directory)
ERROR: Unable to create ‘/usr/lib/mpich/lib/tls/libnvidia-tls.so.390.116’ for copying (No such file or directory)
ERROR: Unable to create ‘/usr/lib/i386-linux-gnu/tls/libnvidia-tls.so.390.116’ for copying (No such file or directory)
ERROR: Unable to create ‘/usr/share/egl/egl_external_platform.d/10_nvidia_wayland.json’ for copying (No such file or directory)
WARNING: Failed to restore some backed up files/symlinks, and/or their attributes. See /var/log/nvidia-uninstall.log for details
WARNING: Failed to delete some directories. See /var/log/nvidia-uninstall.log for details.
Uninstallation of existing driver: NVIDIA Accelerated Graphics Driver for Linux-x86_64 (418.56) is complete.

I could not find the log file:
$ cat var/log/nvidia-uninstall.log
cat: var/log/nvidia-uninstall.log: No such file or directory

Then did this:
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update

$ sudo apt install nvidia-driver-418
Reading package lists… Done
Building dependency tree
Reading state information… Done
nvidia-driver-418 is already the newest version (418.56-0ubuntu0~gpu18.04.1).
… …

Now I create the /etc/X11/xorg.conf file with only this:

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "TITAN V"
    BusID          "PCI:65:0:0"
    Option         "AllowEmptyInitialConfiguration" "true"
EndSection

After the change to xorg.conf, we get this:
$ cat /etc/X11/xorg.conf
Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "TITAN V"
    BusID          "PCI:65:0:0"
    Option         "AllowEmptyInitialConfiguration" "true"
EndSection

Reboot with single GPU, Titan V.
Display works.

Now shutdown, install 2nd Titan V GPU.
Keep display connected to same GPU as before installing 2nd GPU.
Boot.
Result: Display is blank!

What did I miss?

Odd, shouldn’t happen. Can you switch to VT using ctrl-alt-f1 while the second gpu is installed to run nvidia-bug-report.sh? If not, can you log in using ssh?

Hi generix:

When I have one GPU and I hit Ctrl-Alt-F1, the screen went blank.
I rebooted and it still went blank!
The boot loader now does not even show up!
And now I cannot find out what its local IP address is,
so SSH is useless at the moment.

So… I am thinking of starting over with a new OS install.
Actually Ubuntu is not my preferred OS.
It was there originally since this is not a new system.
So it is a good time to switch while I am having unproductive problems like these.

I am going to install Linux Mint 19.1 (a fork of Ubuntu 18.xx, Bionic Beaver)
with all 4 GPUs already installed in the machine.

Since there is no nvidia driver in the Linux Mint ISO image,
it will be running in software graphics rendering mode (i.e. without GPU acceleration).

After installation, reboot will load Linux Mint in
software graphics rendering mode.

I have looked at these without much help since they are associated with Ubuntu 16.xx:

  1. https://devtalk.nvidia.com/default/topic/1030445/cuda-setup-and-installation/dual-gpu-system-in-ubuntu-16-04/

  2. https://devtalk.nvidia.com/default/topic/1003017/how-do-i-set-one-gpu-for-display-and-the-other-two-gpus-for-cuda-computing-/

Recall I only want to use one GPU for display, and all 4 for deep learning.

My hunch is that you want me to install the nvidia drivers via PPA, and then configure or create the /etc/X11/xorg.conf file.
Is there a script that would generate the xorg.conf file automatically based on the nvidia GPUs in
the PCIe slots?

As I am responding to your request, Linux Mint 19.1 is being installed.

Please advise, step by step, what to do next
to install the nvidia drivers.

Thanks.

You should use the driver package from the ppa, it also contains an xorg config snippet which autoloads the nvidia driver. You could also use the xorg.conf from your earlier experiments which contains all nvidia gpus and add

Option "AllowEmptyInitialConfiguration" "true"

to each device section, which allows the nvidia driver to start without having a monitor attached. The problem with that setup is that every gpu will have its own screen while the DE will probably only use screen 0; all eventual displays on other gpus will show a black screen.
The black screen which persists on reboot just from switching to VT is something really weird which rather points in a different direction, maybe some problem with the monitor or even power supply. So you should really use ssh to check what’s going on when the monitor goes blank.
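For reference, a sketch of what that multi-device xorg.conf could look like, with one Device section per card. The bus IDs other than 65 are placeholders here; take the real ones from lspci (converted to decimal):

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    BusID          "PCI:65:0:0"
    Option         "AllowEmptyInitialConfiguration" "true"
EndSection

Section "Device"
    Identifier     "Device1"
    Driver         "nvidia"
    BusID          "PCI:66:0:0"
    Option         "AllowEmptyInitialConfiguration" "true"
EndSection

…and likewise Device2/Device3 for the remaining two cards.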

Hi generix:

I finally installed Linux Mint 19.1, and it at first ran in software rendering mode
since the OS was not able to recognize the GPU cards.
When I tried to add the Nvidia driver PPA, I got errors similar to this link:

https://forums.linuxmint.com/viewtopic.php?f=90&t=293443&p=1629530#p1629530

Do you have any idea why?

Then I added the open source Nvidia driver 430 via the Driver Manager GUI.
I was then able to reboot without the software rendering mode.

I tried moving the monitor’s connector to another Nvidia card, but it does not display
anything at all. So it seems the only GPU card that can display now is
the Nvidia card that had the monitor’s cable plugged in when I first installed the
430 Nvidia driver.

Furthermore, when I ran a performance test using:

phoronix-test-suite default-benchmark openarena xonotic tesseract gputest unigine-valley

as suggested via the Nvidia PPA, the GPU with the ONLY monitor connected gets HOT!
The other three GPUs do not!
This means only the one GPU card attached to the display is working!
In addition, the phoronix test also noticed only ONE Nvidia GPU during test setup.

Q1:
a) How do I go about enabling ALL GPU cards to display when only one is connected to a monitor?
Meaning, how can I easily move the ONLY display cable I have from one GPU card to another,
and each time, the display still works?

b) If I only want another single GPU to drive a monitor, how do I switch to this GPU and disable display of the other GPU that was connected to the monitor?

Q2:
I want to use this system for machine learning:
running the Nvidia deep learning tools and other such frameworks (Tensorflow, CNTK, etc.).
What do I have to do to enable ALL the GPUs to function, as opposed to
the one GPU that is working now?

Note ALL 4 GPUS are now plugged into PCIe slots,
and only one is able to display on monitor.

Thanks.

Graphics applications will always use just one GPU (unless the application is developed to use Vulkan/DX12 explicit Multi-GPU), there’s no magic way to stack graphics performance by adding another GPU.

Q1) a) This is not easily achievable and also not a very sensible setup. A theoretical way: as a prerequisite, enabling all GPUs in xorg.conf requires setting the AllowEmptyInitialConfiguration option for all gpus. This will then give you 4 separate screens 0…3. The DE will only display on screen 0, so you would have to enable Xinerama to create one virtual screen. If this setup doesn’t crash, you might be able to move the display cable from one card to another.
Downsides of a Xinerama setup are:

  1. having to turn off Compositing, thus limiting the choice of working window managers.
  2. possibly falling back to software rendering, depending on which gpu the monitor is currently connected to.
  3. not very stable.

b) for a single GPU setup, use the minimal xorg.conf from post #4 and adjust the BusID to the GPU to be used (in decimal, lspci displays hex values!), then restart X.
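The "in decimal" point is worth spelling out: lspci prints the bus number in hex, while xorg.conf wants decimal. A small conversion sketch (the hex bus number 41 corresponds to the "PCI:65:0:0" used earlier; substitute your own value from lspci):

```shell
# lspci shows something like "41:00.0 VGA compatible controller: NVIDIA ..."
bus_hex=41                                  # hex bus number from lspci
printf 'BusID "PCI:%d:0:0"\n' "0x$bus_hex"  # prints: BusID "PCI:65:0:0"
```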

Q2) CUDA will by default use all available Nvidia GPUs; it is completely independent of running an Xserver on them (unless you use CUDA/GL interop). A running Xserver can even have adverse effects:
https://devtalk.nvidia.com/default/topic/1043126/linux/xid-8-in-various-cuda-deep-learning-applications-for-nvidia-gtx-1080-ti/post/5291377/#5291377
Just run the deviceQuery cuda sample and you should see all GPUs. Running multiple GPUs requires nvidia-persistenced to be running for stability.
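Since only one GPU needs to drive the display while all four do compute, it may also help to know that CUDA jobs can be pinned to specific cards with an environment variable. A minimal sketch (the indices 0,1 are an assumption about enumeration order on this particular board):

```shell
# CUDA enumerates every NVIDIA GPU regardless of which one has a monitor.
# To restrict a job to particular cards, export CUDA_VISIBLE_DEVICES
# before launching it:
export CUDA_VISIBLE_DEVICES=0,1   # e.g. only the first two GPUs
echo "$CUDA_VISIBLE_DEVICES"      # prints: 0,1
```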

Hi generix:

It has been a while.

As a summary, here is my status:
I have two GeForce GTX 1080 and two Titan V,
all have 12 GB each. OS is Ubuntu 18.04.

I am able to display graphics with only one
of the 4 Nvidia cards.
Not a big deal.

Now I want to install the proper Nvidia tools
for Tensorflow.

Tensorflow will be installed directly onto the OS.

I looked here:
https://www.tensorflow.org/install/gpu
and here : https://www.nvidia.com/Download/index.aspx?lang=en-us

It is quite confusing.
Since I have two different kinds of Nvidia GPUs,
it looks like I need to install two kinds of drivers, correct?

Is there a simple script I could run to do a one shot installation for
both GPU card types?

Please advise how and where to do a one shot install of the correct Nvidia tools
compatible with Tensorflow.

Thank you.

Please don’t follow that Tensorflow howto, it will probably break your system.
All your gpus are supported by the latest driver (430), please install that from ppa.
Afterwards, download the needed cuda .deb

  • add the repo to your system (first three steps from install instructions on cuda download page)
  • don’t install cuda
  • instead, run sudo apt install cuda-toolkit-10-0

" Please don’t follow that Tensorflow howto, it will probably break your system."

It is breaking because I am using two different GPU models, correct?

If I use only one model, e.g. the Titan V,
will the Tensorflow HowTos still break my system?

Thanks.

The different gpu models only matter in terms of the maximum compute capability usable by your cuda kernels. Apart from that there’s no problem; both are supported by the latest driver.
That howto says to first install the driver and then install ‘cuda’. ‘cuda’ is a metapackage, though, which installs another driver over the first one and then installs the cuda-toolkit.