Dual monitors not working with CentOS 7.5 (kernel 4.18) running two RTX 2080 Ti's with NVIDIA 410.57 drivers

PLEASE REFER TO THE FOLLOWING GITLAB REPOSITORY I HAVE CREATED FOR STORING DEBUG FILES, LOG FILES, SCREENSHOTS AND PICTURES THAT YOU GUYS HAVE REQUESTED.

GITLAB REPOSITORY LINK
https://gitlab.com/shanedora/dual_monitors_do_not_work_centos7_running_two_rtx_2080_tis_with_nvidia_driver_410

Software Environment

Distribution: CentOS 7.5.1804 (Core) “GNOME Shell 3.25.4”
Linux Kernel: 4.18.14-1.el7.elrepo.x86_64
Nvidia Drivers: 410.57

Hardware Environment:

Monitors: (Quantity: 2) Asus MG279 Monitors
Graphic Cards: (Quantity: 2) Asus Dual GeForce RTX 2080TI’s
Motherboard: X399 AORUS XTREME SocketTR4
CPU: Threadripper AMD 16 Core

INITIAL SETUP AND INSTALLATION

NOTE:
FOR THIS SECTION PLEASE REFER TO ALL FILES IN THE “initial_setup_and_installation” FOLDER IN MY GITLAB REPOSITORY

I installed CentOS 7.5.1804 from the (CentOS-7-x86_64-Everything-1804.iso) without any noticeable errors/issues during the install process.

I then updated the kernel from 3.10 (which came with the ISO) to 4.18, again without any noticeable errors/issues.

I went to the NVIDIA site https://www.nvidia.com/Download/index.aspx?lang=en-us, entered the following information, and clicked "Search". It gave me driver 410.57.

Product Type: GeForce
Product Series: GeForce RTX 20 Series
Product: GeForce RTX 2080 Ti
Operating System: Linux 64bit
Language: English

Once I had this driver downloaded I executed these exact instructions in the following order…

sudo yum install epel-release
sudo yum install dkms
Blacklist the nouveau driver
sudo vim /etc/default/grub
Append "rdblacklist=nouveau" to GRUB_CMDLINE_LINUX
sudo vim /etc/modprobe.d/blacklist.conf
Append "blacklist nouveau"
sudo dracut /boot/initramfs-$(uname -r).img $(uname -r) --force
sudo systemctl disable gdm
reboot
Execute the NVIDIA driver ".run" installer as root
Say yes when asked about using DKMS during the installation process
Say yes when asked if you want it to automatically run the NVIDIA config utility
sudo systemctl enable gdm
reboot
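
For anyone following along, the two file edits in the list above can also be done non-interactively. Here is a sketch against scratch copies under /tmp (the file names and starting contents are stand-ins, not my real files), so the result can be inspected before touching /etc/default/grub and /etc/modprobe.d/blacklist.conf:

```shell
# Scratch copies standing in for /etc/default/grub and
# /etc/modprobe.d/blacklist.conf -- inspect before editing the real files.
printf 'GRUB_TIMEOUT=5\nGRUB_CMDLINE_LINUX="rhgb quiet"\n' > /tmp/grub.demo
: > /tmp/blacklist.demo

# Append rdblacklist=nouveau inside the existing GRUB_CMDLINE_LINUX quotes.
sed -i 's/^GRUB_CMDLINE_LINUX="\(.*\)"/GRUB_CMDLINE_LINUX="\1 rdblacklist=nouveau"/' /tmp/grub.demo

# Tell modprobe never to load nouveau.
echo 'blacklist nouveau' >> /tmp/blacklist.demo

grep '^GRUB_CMDLINE_LINUX' /tmp/grub.demo
cat /tmp/blacklist.demo
```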

I was able to install the NVIDIA drivers without any noticeable errors/issues during the install process.

After I rebooted I was presented with a login screen, as expected. At this point the second monitor was black, but I expected that since the login screen usually appears on only one monitor. However, even after logging in, only one monitor was being utilized. See my image titled "after_log_in" in my GitLab repository. The other screen was not being used and remained black. I could not even move my cursor over to the second monitor.

Please see the file "xorg.conf_backup" in my repository for what my /etc/X11/xorg.conf file looks like. At this point in time I have not touched any settings in the nvidia-settings control panel. I should also note that the two graphics cards are not bridged in any way (such as with an NVLink SLI adapter). I currently have an NVLink SLI adapter on order but have not received it yet.

My two monitors each have both DisplayPort and HDMI ports, and I have tried both. At this moment, when running your nvidia-bug-report.sh script, I am using the DisplayPort connections, like so…

Monitor_1 Display Port ---------> GraphicsCard_1 Display Port
Monitor_2 Display Port ---------> GraphicsCard_2 Display Port

I have tried HDMI cables but the same result occurs (the black screen). I have tried plugging both monitors into GraphicsCard_1 but the same result occurs (the black screen). I have tried plugging both monitors into GraphicsCard_2 and BOTH MONITORS GO BLACK. CAN’T EVEN SEE MY CURSOR.

When I go into Applications->Settings->Devices->Displays there is only one monitor being detected which is titled as “Ancor Communications Inc 27”.

I’m going to throw in pictures of what my nvidia-settings control panel options look like at this moment. These files are named as follows…

nvidiasettings_x_server_inforrmation
nvidiasettings_X_Server_Display_Configuration
nvidiasettings_X_screen_0
nvidiasettings_nvidia-settings_configuration
nvidiasettings_GPU_1
nvidiasettings_GPU_0

Here is the output when I run xrandr…

[user@host Downloads]$ xrandr
Screen 0: minimum 8 x 8, current 2560 x 1440, maximum 32767 x 32767
DP-0 connected primary 2560x1440+0+0 (normal left inverted right x axis y axis) 597mm x 336mm
   2560x1440    143.86*+ 119.88    59.95  
   1920x1080     60.00    59.94    50.00  
   1440x900      59.89  
   1440x576      50.00  
   1440x480      59.94  
   1280x1024     75.02    60.02  
   1280x720      60.00    59.94    50.00  
   1152x864      75.00  
   1024x768      75.03    70.07    60.00  
   800x600       75.00    72.19    60.32    56.25  
   720x576       50.00  
   640x480       75.00    59.94  
DP-1 disconnected (normal left inverted right x axis y axis)
HDMI-0 disconnected (normal left inverted right x axis y axis)
DP-2 disconnected (normal left inverted right x axis y axis)
DP-3 disconnected (normal left inverted right x axis y axis)
DP-4 disconnected (normal left inverted right x axis y axis)
DP-5 disconnected (normal left inverted right x axis y axis)
DP-6 disconnected (normal left inverted right x axis y axis)
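
One thing worth noting about the output above: as far as I can tell, xrandr only reports the outputs of the GPU backing the X screen it queries, which would explain why the second card's ports never show up here. To skim just the connection states from output like this (the sample data below is hard-coded in place of a live `xrandr --query`):

```shell
# Hard-coded sample mirroring the xrandr output above; a live check would
# pipe `xrandr --query` instead. awk prints the output name and its state
# (the pattern /connected/ also matches "disconnected", which is intended).
xrandr_out='DP-0 connected primary 2560x1440+0+0 (normal left inverted right x axis y axis) 597mm x 336mm
DP-1 disconnected (normal left inverted right x axis y axis)
HDMI-0 disconnected (normal left inverted right x axis y axis)'
printf '%s\n' "$xrandr_out" | awk '/connected/ {print $1, $2}'
```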

TWEAKED NVIDIA SETTINGS BY ENABLING THE “DISABLED” DISPLAY

NOTE:
FOR THIS SECTION PLEASE REFER TO THE FOLDER “tweaked_nvidia_settings” IN MY GITLAB REPOSITORY

In this section I want to point out some troubleshooting I did that I feel is important to mention. I actually did quite a bit of troubleshooting, but rather than write a novel here I'll keep it short while staying as descriptive as I can about what I believe is relevant.

You’ll notice that in the image "/initial_setup_and_installation/nvidiasettings_X_Server_Display_Configuration" one of the displays (shown in orange with big bold letters titled "Disabled") is disabled. I took the liberty of enabling it via the "Configuration" drop-down menu. The only option in the drop-down was "New X screen (requires X restart)". So I did that and then restarted my computer. I didn’t just restart X because, for one, I don’t know how, so I figured a full restart would do the job.
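
(Side note for anyone reading: on CentOS 7 with GNOME, restarting the display manager should restart X without a full reboot, though it does end your session and log you out. Something like:)

```shell
# Restarting gdm tears down and restarts the X server (this logs you out!).
sudo systemctl restart gdm
```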

Just to be clear, after selecting that option (before I closed the nvidia-settings console) I made sure I clicked "Apply" and then "Save to X Configuration File". It told me it was going to merge with the existing file, which I agreed to (checked the box), and then I clicked Save. At this point I did a reboot.

This time around I was presented with the same login screen as before just as I expected. The second monitor was black but I figured that to be normal. After I logged in only one monitor was displaying the desktop background, icons, etc, etc (which looked normal just like before). However, this time I was able to move my cursor over to the second display. The second display was still black but when I moved my cursor over to it the cursor went from a normal looking mouse pointer to a black “x” looking cursor. See file “/tweaked_nvidia_settings/black_x_cursor”. I couldn’t drag a terminal window (for example) over to it either.

I ran nvidia-bug-report.sh again in case it reported anything new. I put this new log.gz in the "tweaked_nvidia_settings" folder in my repo to keep it separate from the other one. I also added the new "xorg.conf" file, which I renamed to "xorg.conf_backup_tweaked_nvidia_settings"; you’ll find it in the "tweaked_nvidia_settings" folder as well. I took some pictures of the nvidia-settings control panel this go-around in case you're curious to see that too.

ONE FINAL TROUBLESHOOTING STEP

I have not saved off any debug info for this final "small" test, nor have I taken any pictures. I figure an explanation will do, but if you'd prefer I recreate the issue and provide more info, just let me know and I will.

I enabled the feature called Xinerama. Once I did that, I clicked Apply and Save to X Configuration File just like before. After I rebooted, both screens were black. However, this go-around I could see my cursor and move it from one monitor over to the other. The black "x" cursor issue wasn't there anymore. The only way I was able to recover from this weird state was to press Ctrl + Alt + F2, which presented me with a bash prompt. I reverted my xorg.conf file back to where it was in the beginning, with the "Xinerama" feature disabled.

Please let me know if there is anything else I can provide you with.

You’ll have to configure just one gpu in your xorg.conf and enable BaseMosaic. You can also run
nvidia-xconfig --base-mosaic
to generate a new xorg.conf.

@generix ~

I will give this a try when I return home tonight. However, would you mind answering the questions below, because I’m not sure how I’m supposed to configure just one GPU when I have two plugged in. A bit more detail would be helpful.

  1. If I’m to configure for one GPU then what happens to the second one? Am I using it?

  2. How do you want me to plug my monitors in? For example, do I plug them both into GPU_1 and will this still provide me with extended displays like one usually gets with dual monitors?

  3. Why can’t I enable both GPU’s? Does this mean the other GPU is not going to be used at all?

  4. If I do CUDA development will I be able to utilize both GPU’s or is there always going to be one GPU never used?

  5. I assume by running

nvidia-xconfig --base-mosaic

this will do everything for me? Or will I need to do that as well as go in and remove certain sections of the xorg.conf file referencing the second "unused" GPU? If so, I’m going to need some more details on what exactly to remove.

  1. BaseMosaic is a master/slave config; you configure the master and it will then auto-add all usable slave gpus. Unless there’s a driver bug.
  2. If you plug both monitors into one gpu, you don’t need BaseMosaic or any xorg.conf; it should then work ootb. Rendering will only happen on that gpu.
  3. See 1). If you configure both gpus, you get two independent screens without the ability to move windows between them.
  4. cuda by default uses all available gpus in the system unless you explicitly disable any.
  5. Yes, it should create a ready-to-go config.
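
On point 4, a quick sketch of how that per-process GPU selection looks in practice: CUDA enumerates every usable GPU by default, and the standard way to restrict one process is the CUDA_VISIBLE_DEVICES environment variable (the `./my_cuda_app` below is just a placeholder for any CUDA binary):

```shell
# Restrict a single process to the first GPU (indices as in `nvidia-smi -L`):
#   CUDA_VISIBLE_DEVICES=0 ./my_cuda_app
# With the variable unset, CUDA sees both cards. It is an ordinary
# environment variable, so it can be set per-invocation:
env CUDA_VISIBLE_DEVICES=0 sh -c 'echo "visible to CUDA: $CUDA_VISIBLE_DEVICES"'
```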

@generix

Typing the following command did NOT work even after a reboot…
nvidia-xconfig --base-mosaic

The second monitor is still black and I can’t even move my mouse over to it. The first monitor is still working fine. Here is how I’ve connected my monitors just to be extra clear…

Monitor_1 Display Port ----Cable----> GPU_1 Display Port
Monitor_2 Display Port ----Cable----> GPU_2 Display Port

I’m going to paste in the contents of my new xorg.conf file for you.

# nvidia-xconfig: X configuration file generated by nvidia-xconfig
# nvidia-xconfig:  version 410.57

Section "ServerLayout"
    Identifier     "Layout0"
    Screen      0  "Screen0"
    InputDevice    "Keyboard0" "CoreKeyboard"
    InputDevice    "Mouse0" "CorePointer"
EndSection

Section "Files"
    FontPath        "/usr/share/fonts/default/Type1"
EndSection

Section "InputDevice"
    # generated from default
    Identifier     "Mouse0"
    Driver         "mouse"
    Option         "Protocol" "auto"
    Option         "Device" "/dev/input/mice"
    Option         "Emulate3Buttons" "no"
    Option         "ZAxisMapping" "4 5"
EndSection

Section "InputDevice"
    # generated from default
    Identifier     "Keyboard0"
    Driver         "kbd"
EndSection

Section "Monitor"
    Identifier     "Monitor0"
    VendorName     "Unknown"
    ModelName      "Unknown"
    HorizSync       28.0 - 33.0
    VertRefresh     43.0 - 72.0
    Option         "DPMS"
EndSection

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
EndSection

Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Monitor0"
    DefaultDepth    24
    Option         "BaseMosaic" "True"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection

I’m not sure what good this will do, but here is the output of xrandr. I’m not sure why only one monitor (DP-0) is shown as connected. I obviously have two monitors hooked up to the display ports of GPU_1 and GPU_2…

Screen 0: minimum 8 x 8, current 2560 x 1440, maximum 32767 x 32767
DP-0 connected primary 2560x1440+0+0 (normal left inverted right x axis y axis) 597mm x 336mm
   2560x1440    143.86*+ 119.88    59.95  
   1920x1080     60.00    59.94    50.00  
   1440x900      59.89  
   1440x576      50.00  
   1440x480      59.94  
   1280x1024     75.02    60.02  
   1280x720      60.00    59.94    50.00  
   1152x864      75.00  
   1024x768      75.03    70.07    60.00  
   800x600       75.00    72.19    60.32    56.25  
   720x576       50.00  
   640x480       75.00    59.94  
DP-1 disconnected (normal left inverted right x axis y axis)
HDMI-0 disconnected (normal left inverted right x axis y axis)
DP-2 disconnected (normal left inverted right x axis y axis)
DP-3 disconnected (normal left inverted right x axis y axis)
DP-4 disconnected (normal left inverted right x axis y axis)
DP-5 disconnected (normal left inverted right x axis y axis)
DP-6 disconnected (normal left inverted right x axis y axis)

nvidia-bug-report.log.gz (1.51 MB)

Please add

BusID          "PCI:10:0:0"

to the device section of your xorg.conf, reboot, then run nvidia-bug-report.sh as root and attach the resulting .gz file to your post. Hovering the mouse over an existing post of yours will reveal a paperclip icon.
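
A detail worth flagging for anyone copying this: the BusID in xorg.conf is written in decimal, while lspci prints hex. That is why lspci address 0a:00.0 becomes PCI:10:0:0 here (and 42:00.0 becomes the PCI:66:0:0 seen in the log later). A small bash helper to convert (the function name is my own):

```shell
# Convert an lspci-style hex address (bus:dev.func) to the decimal
# PCI:bus:dev:func form that xorg.conf's BusID option expects.
lspci_to_busid() {
  local addr=$1
  local bus=$(( 16#${addr%%:*} ))   # "0a" -> 10
  local rest=${addr#*:}
  local dev=$(( 16#${rest%%.*} ))   # "00" -> 0
  local fn=$((  16#${rest#*.} ))    # "0"  -> 0
  printf 'PCI:%d:%d:%d\n' "$bus" "$dev" "$fn"
}
lspci_to_busid 0a:00.0   # -> PCI:10:0:0
lspci_to_busid 42:00.0   # -> PCI:66:0:0
```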

@generix

I’ve attached the file to the previous post per your request. However, it says "Scanning…Please Wait nvidia-bug-report.log.gz" and I’ve been waiting for 5 minutes. I went ahead and uploaded it to my repository as well, under the folder "trial2_per_generix_request". The repository link is at the start of my first post above.

The config is correct now, but BaseMosaic is failing due to the gpus not being able to communicate:

[    24.071] (EE) NVIDIA(GPU-0): Failed to find a valid Base Mosaic configuration.
[    24.071] (EE) NVIDIA(GPU-0): Invalid Base Mosaic configuration 1 of 1:
[    24.071] (EE) NVIDIA(GPU-0): GPUs:
[    24.071] (EE) NVIDIA(GPU-0):     1) NVIDIA GPU at PCI:10:0:0
[    24.071] (EE) NVIDIA(GPU-0):     2) NVIDIA GPU at PCI:66:0:0
[    24.071] (EE) NVIDIA(GPU-0): Errors:
[    24.071] (EE) NVIDIA(GPU-0):     - The video link was not detected
[    24.071] (WW) NVIDIA(GPU-0): Failed to find a valid Base Mosaic configuration for the
[    24.071] (WW) NVIDIA(GPU-0):     NVIDIA graphics device PCI:10:0:0. Please see Chapter 28:
[    24.071] (WW) NVIDIA(GPU-0):     Configuring SLI and Multi-GPU FrameRendering in the README
[    24.071] (WW) NVIDIA(GPU-0):     for troubleshooting suggestions.
[    24.139] (EE) NVIDIA(GPU-0): Only one GPU will be used for this X screen.

Among other reasons, this might be due to iommu device isolation. Please check with kernel parameters
amd_iommu=off iommu=off
or disable it in bios.

@generix

Would you be so kind as to provide details on how to verify kernel parameters, please? I’m not sure how to accomplish this.

Howto for Centos:
https://www.thegeekdiary.com/centos-rhel-7-how-to-modify-the-kernel-command-line/
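
On CentOS/RHEL 7 specifically, grubby is a convenient alternative to hand-editing grub files (run as root; a sketch only, so double-check against the howto above before relying on it):

```shell
# Append the parameters to every installed kernel's boot entry:
grubby --update-kernel=ALL --args="amd_iommu=off iommu=off"
# To undo later:
#   grubby --update-kernel=ALL --remove-args="amd_iommu=off iommu=off"
# Confirm the active parameters after a reboot with:
#   cat /proc/cmdline
```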

@generix

IOMMU & SVM

I went into my bios and found two parameters that were enabled and changed them to disabled. I attached pictures of these settings in my repository under the folder titled “October_17_1241_MST/”. These two parameters are…

Chipset -> IOMMU
I changed IOMMU from enabled to disabled

M.I.T -> Advanced Frequency Settings -> Advanced CPU Core Settings --> SVM Mode
I changed SVM Mode (AMD Secure Virtual Machine Technology) from enabled to disabled

I saved this bios configuration and rebooted. After rebooting I went back into the bios settings and verified that the new settings took effect, and they did.

Kernel Parameters

I was not able to find a kernel parameter called iommu or amd_iommu when typing the command sysctl --all in the command line. I piped this output to a text file called “kernel_params.txt” for you and I’m attaching it to my repository in the folder “October_17_1241_MST/”. This file is also attached to this post for convenience. Therefore, I didn’t change any kernel parameters or generate a new grub.cfg file as a result. If you would like me to do this anyways then let me know otherwise I won’t.
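
A note on the check itself, in case it helps anyone: sysctl --all only lists runtime tunables, while boot-time parameters like amd_iommu=off live on the kernel command line, which is readable at /proc/cmdline. A sketch of that check (the sample string below is illustrative, not an actual cmdline from this machine):

```shell
# In practice you would use: cmdline=$(cat /proc/cmdline)
cmdline='BOOT_IMAGE=/vmlinuz-4.18.14-1.el7.elrepo.x86_64 ro rhgb quiet rdblacklist=nouveau'
for param in amd_iommu=off iommu=off; do
  # -w prevents 'iommu=off' from matching inside 'amd_iommu=off'.
  if printf '%s\n' "$cmdline" | grep -qw -- "$param"; then
    echo "$param: present"
  else
    echo "$param: absent"
  fi
done
```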

Second display still not working but it’s not black anymore!

I’m not sure if it’s important to point out, but the display that used to be black isn’t anymore. Instead it has a gray background with the CentOS "7" logo displayed at the login screen and after logging in. I can’t move my mouse over to it, though; my mouse is completely restricted to one monitor. I still have the same setup as far as how my monitors connect to my two graphics cards. For completeness, it is…

Monitor_1 Display Port ----Cable----> GPU_1 Display Port
Monitor_2 Display Port ----Cable----> GPU_2 Display Port

I’ve attached a picture of how this looks (after I log in) in my repository under the folder "October_17_1241_MST/". This picture is saved as "my_display_after_login.jpg".

My xorg.conf file

I attached my xorg.conf file for you to look at in my repository under the folder "October_17_1241_MST/", just to be extra thorough, should you recommend I change something else or want to double-check it. It’s important to note that I have NOT changed any settings in the nvidia-settings control panel; the only things I have changed are what I’ve mentioned above, specifically the bios settings. I figured it might be helpful to take a screenshot specifically of the nvidia-settings -> X Server Display Configuration page. I’ve attached this picture to my repository under the same folder "October_17_1241_MST/", named "October_17_XServerDisplayConfiguration.png". Should I enable that one screen that says "disabled" and enable the xinerama option as well? I have NOT done that but thought I would ask.

New nvidia bug report for you

I also generated a new nvidia bug report for you. I have saved this in my repository under the folder "October_17_1241_MST/". I’ve also attached it to this post for convenience.

xrandr output

Just to be extra thorough, here is the output of xrandr.

Screen 0: minimum 8 x 8, current 2560 x 1440, maximum 32767 x 32767
DP-0 connected primary 2560x1440+0+0 (normal left inverted right x axis y axis) 597mm x 336mm
   2560x1440    143.86*+ 119.88    59.95  
   1920x1080     60.00    59.94    50.00  
   1440x900      59.89  
   1440x576      50.00  
   1440x480      59.94  
   1280x1024     75.02    60.02  
   1280x720      60.00    59.94    50.00  
   1152x864      75.00  
   1024x768      75.03    70.07    60.00  
   800x600       75.00    72.19    60.32    56.25  
   720x576       50.00  
   640x480       75.00    59.94  
DP-1 disconnected (normal left inverted right x axis y axis)
HDMI-0 disconnected (normal left inverted right x axis y axis)
DP-2 disconnected (normal left inverted right x axis y axis)
DP-3 disconnected (normal left inverted right x axis y axis)
DP-4 disconnected (normal left inverted right x axis y axis)
DP-5 disconnected (normal left inverted right x axis y axis)
DP-6 disconnected (normal left inverted right x axis y axis)

Some more Device Information if helpful

In case it’s helpful, here is some lspci output…

lspci | grep -i --color -E 'vga|3d|2d'

0a:00.0 VGA compatible controller: NVIDIA Corporation Device 1e07 (rev a1)
42:00.0 VGA compatible controller: NVIDIA Corporation Device 1e07 (rev a1)

sudo lspci -v -s 0a:00.0

0a:00.0 VGA compatible controller: NVIDIA Corporation Device 1e07 (rev a1) (prog-if 00 [VGA controller])
	Subsystem: ASUSTeK Computer Inc. Device 8667
	Flags: bus master, fast devsel, latency 0, IRQ 187, NUMA node 0
	Memory at d7000000 (32-bit, non-prefetchable) 
	Memory at c0000000 (64-bit, prefetchable) 
	Memory at d0000000 (64-bit, prefetchable) 
	I/O ports at 3000 
	[virtual] Expansion ROM at d8000000 [disabled] 
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Legacy Endpoint, MSI 00
	Capabilities: [100] Virtual Channel
	Capabilities: [250] Latency Tolerance Reporting
	Capabilities: [258] L1 PM Substates
	Capabilities: [128] Power Budgeting <?>
	Capabilities: [420] Advanced Error Reporting
	Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
	Capabilities: [900] #19
	Capabilities: [bb0] #15
	Kernel driver in use: nvidia
	Kernel modules: nouveau, nvidia_drm, nvidia

sudo lspci -v -s 42:00.0

42:00.0 VGA compatible controller: NVIDIA Corporation Device 1e07 (rev a1) (prog-if 00 [VGA controller])
	Subsystem: ASUSTeK Computer Inc. Device 8667
	Flags: bus master, fast devsel, latency 0, IRQ 188
	Memory at 9e000000 (32-bit, non-prefetchable) 
	Memory at 80000000 (64-bit, prefetchable) 
	Memory at 90000000 (64-bit, prefetchable) 
	I/O ports at 4000 
	[virtual] Expansion ROM at 000c0000 [disabled] 
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Legacy Endpoint, MSI 00
	Capabilities: [100] Virtual Channel
	Capabilities: [250] Latency Tolerance Reporting
	Capabilities: [258] L1 PM Substates
	Capabilities: [128] Power Budgeting <?>
	Capabilities: [420] Advanced Error Reporting
	Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
	Capabilities: [900] #19
	Capabilities: [bb0] #15
	Kernel driver in use: nvidia
	Kernel modules: nouveau, nvidia_drm, nvidia

In conclusion:
Looking at the report, it still seems we have the same error.

[    19.572] (II) NVIDIA GLX Module  410.57  Tue Sep 18 23:27:13 CDT 2018
[    22.088] (EE) NVIDIA(GPU-0): Failed to find a valid Base Mosaic configuration.
[    22.088] (EE) NVIDIA(GPU-0): Invalid Base Mosaic configuration 1 of 1:
[    22.088] (EE) NVIDIA(GPU-0): GPUs:
[    22.088] (EE) NVIDIA(GPU-0):     1) NVIDIA GPU at PCI:10:0:0
[    22.088] (EE) NVIDIA(GPU-0):     2) NVIDIA GPU at PCI:66:0:0
[    22.088] (EE) NVIDIA(GPU-0): Errors:
[    22.088] (EE) NVIDIA(GPU-0):     - The video link was not detected
[    22.088] (WW) NVIDIA(GPU-0): Failed to find a valid Base Mosaic configuration for the
[    22.088] (WW) NVIDIA(GPU-0):     NVIDIA graphics device PCI:10:0:0. Please see Chapter 28:
[    22.088] (WW) NVIDIA(GPU-0):     Configuring SLI and Multi-GPU FrameRendering in the README
[    22.088] (WW) NVIDIA(GPU-0):     for troubleshooting suggestions.
[    22.149] (EE) NVIDIA(GPU-0): Only one GPU will be used for this X screen.

Again, for anyone else reading this my repository site for all this documentation, pictures, debug reports and more is as follows…
https://gitlab.com/shanedora/dual_monitors_do_not_work_centos7_running_two_rtx_2080_tis_with_nvidia_driver_410
kernel_params.txt (117 KB)
nvidia-bug-report.log.gz (1.51 MB)

Turning iommu off in bios was sufficient; the logs confirm that iommu is off.
Still, the same error is logged in /var/log/Xorg.0.log, which means either a driver bug or unsupported hardware.
So I’m sorry, but you should revert to connecting both monitors to one gpu and use the second one for cuda only.

@generix

Can I still utilize dual monitors by plugging them both into GPU_1? Do I need to revert any changes to my xorg.conf file? If so, would you be so kind as to specify what exactly?

Just set the BaseMosaic option to false and plug the second monitor into the same card as the working monitor. Everything else should be fine.

@generix

My NVLink SLI adapter will be arriving in the mail soon. Do you think it’s worth trying this over with the adapter involved? Could it resolve the missing communication link between the two GPUs?

From my experience, I’d say a 50% chance; the other 50% it will still display the "video link" error or change to an "unknown error". Still worth a try.