I can’t get two monitors connected to two different Quadro RTX 6000 cards to work. Only the one that is set up as primary in the BIOS works in Linux. If I try to enable the second one in whatever UI the current distro uses, the primary monitor blinks for a second and the second monitor remains in the “disabled” state.
If I set up SLI Mosaic in nvidia-settings, the OS hangs on reboot. I tried many different settings; nothing works after reboot.
There are no error messages in dmesg or in Xorg.0.log.
I tried 6 different distros based on Ubuntu and Arch Linux; I tried GNOME, Xfce, and KDE; I tried different display managers; I tried kernels 5.4, 5.11, and 5.12; and I tried NVIDIA drivers 450, 460, and 465. The result is always the same.
I managed to get error messages from X when using Mosaic SLI. In brief, the driver tries to set up both GPUs to create one big virtual screen spanning two displays. It successfully initializes the first GPU, but when it comes to the second, the driver thinks it’s already occupied by X and skips it. Everything goes to hell after that.
[ 110.105] (II) NVIDIA(0): Display device(s) assigned to X screen 0:
[ 110.105] (II) NVIDIA(0): Samsung S32D850 (DFP-0)
...
[ 110.105] (II) NVIDIA(0): Ancor Communications Inc PB328 (DFP-0)
...
[ 110.105] (II) NVIDIA(0): Using MetaMode string:
[ 110.105] (II) NVIDIA(0): "GPU-b4211f92-aa40-d5f4-8d6e-d40ff79e65cc.DP-0: 2560x1440
[ 110.105] (II) NVIDIA(0): +2560+0 {ForceCompositionPipeline=On,
[ 110.105] (II) NVIDIA(0): ForceFullCompositionPipeline=On},
[ 110.105] (II) NVIDIA(0): GPU-2b6da70b-3ee1-9b07-b2eb-775de3327a0f.DP-1-0: 2560x1440
[ 110.105] (II) NVIDIA(0): +0+0 {ForceCompositionPipeline=On,
[ 110.105] (II) NVIDIA(0): ForceFullCompositionPipeline=On}"
[ 110.105] (II) NVIDIA(0): Requested modes:
[ 110.105] (II) NVIDIA(0):
[ 110.105] (II) NVIDIA(0): "GPU-b4211f92-aa40-d5f4-8d6e-d40ff79e65cc.DP-0:2560x1440+2560+0{ForceCompositionPipeline=On,ForceFullCompositionPipeline=On},GPU-2b6da70b-3ee1-9b07-b2eb-775de3327a0f.DP-1-0:2560x1440+0+0{ForceCompositionPipeline=On,ForceFullCompositionPipeline=On}"
[ 110.106] (II) NVIDIA(0): Validated MetaModes:
[ 110.106] (II) NVIDIA(0): MetaMode "GPU-b4211f92-aa40-d5f4-8d6e-d40ff79e65cc.DP-0:2560x1440+2560+0{ForceCompositionPipeline=On,ForceFullCompositionPipeline=On},GPU-2b6da70b-3ee1-9b07-b2eb-775de3327a0f.DP-1-0:2560x1440+0+0{ForceCompositionPipeline=On,ForceFullCompositionPipeline=On}":
[ 110.106] (II) NVIDIA(0): Size: 5120 x 1440
[ 110.106] (II) NVIDIA(0): Samsung S32D850 (DFP-0): "2560x1440"
[ 110.106] (II) NVIDIA(0): Size : 2560 x 1440
[ 110.106] (II) NVIDIA(0): Offset : +2560 +0
[ 110.106] (II) NVIDIA(0): Panning
[ 110.106] (II) NVIDIA(0): Domain : 2560 x 1440
[ 110.106] (II) NVIDIA(0): Tracking Area: 5120 x 1440 +0 +0
[ 110.106] (II) NVIDIA(0): Border : 0,0,0,0
[ 110.106] (II) NVIDIA(0): Virtual screen size determined to be 5120 x 1440
[ 110.107] (II) NVIDIA(0): Adding implicit MetaMode: "GPU-0.DP-0: nvidia-auto-select"
...
[ 110.148] (II) NVIDIA(0): Computing DPI using physical size from Samsung S32D850
[ 110.148] (II) NVIDIA(0): (DFP-0)'s EDID and first mode to be programmed on Samsung
[ 110.148] (II) NVIDIA(0): S32D850 (DFP-0):
[ 110.148] (II) NVIDIA(0): width : 2560 pixels 710 mm (DPI: 91)
[ 110.148] (II) NVIDIA(0): height : 1440 pixels 400 mm (DPI: 91)
[ 110.148] (--) NVIDIA(0): DPI set to (91, 91); computed from "UseEdidDpi" X config
[ 110.148] (--) NVIDIA(0): option
[ 110.148] (II) NVIDIA(G0): NVIDIA Quadro RTX 6000 (GPU-1) already has an X screen
[ 110.148] (II) NVIDIA(G0): assigned; skipping this GPU screen
[ 110.148] (EE) NVIDIA(G0): Failing initialization of X screen
[ 110.148] (II) UnloadModule: "nvidia"
[ 110.148] (II) UnloadSubModule: "wfb"
[ 110.148] (II) UnloadSubModule: "fb"
[ 110.149] (II) NVIDIA: Reserving 24576.00 MB of virtual memory for indirect memory
[ 110.149] (II) NVIDIA: access.
[ 113.153] (EE) NVIDIA(GPU-0): Failed to initialize DMA.
[ 113.154] (EE) NVIDIA(0): Failed to allocate push buffer
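As a sanity check on the log above, the reported virtual screen size can be reproduced from the MetaMode string. The sketch below is only an illustration: it assumes the virtual size is the bounding box of the `WxH+X+Y` viewports and ignores panning and other MetaMode attributes. The names `MODE_RE` and `virtual_size` are made up for this example and are not part of any NVIDIA tool.

```python
import re

# One "<W>x<H>+<X>+<Y>" viewport, as it appears in the "Requested modes" line.
MODE_RE = re.compile(r"(\d+)x(\d+)\+(\d+)\+(\d+)")

def virtual_size(metamode):
    """Return (width, height) of the bounding box of all viewports."""
    rects = [tuple(map(int, m.groups())) for m in MODE_RE.finditer(metamode)]
    if not rects:
        raise ValueError("no WxH+X+Y entries found")
    width = max(x + w for w, h, x, y in rects)
    height = max(y + h for w, h, x, y in rects)
    return width, height

# MetaMode string copied from the "Requested modes" line in the log.
METAMODE = (
    "GPU-b4211f92-aa40-d5f4-8d6e-d40ff79e65cc.DP-0:2560x1440+2560+0"
    "{ForceCompositionPipeline=On,ForceFullCompositionPipeline=On},"
    "GPU-2b6da70b-3ee1-9b07-b2eb-775de3327a0f.DP-1-0:2560x1440+0+0"
    "{ForceCompositionPipeline=On,ForceFullCompositionPipeline=On}"
)

if __name__ == "__main__":
    print(virtual_size(METAMODE))  # (5120, 1440)
```

The result agrees with the "Virtual screen size determined to be 5120 x 1440" line, which suggests the MetaMode itself is validated as intended and the failure happens later.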
Can you please describe the configuration you actually want here? I.e. do you want one big desktop spanning both monitors, two completely separate desktop seats, etc.?
If you just want one desktop spanning both monitors, your best bet is to simply plug both displays into one GPU. SLI Mosaic is really intended for situations where you need more than four displays in a desktop configuration.
Hi Aaron,
There will be 8 monitors eventually. So, yes, I do need SLI Mosaic with one desktop spanning several monitors. I know that two monitors connected to one card do work. But this is not what I need.
I have ruled out the X server as the culprit. There is a line in the log saying that (GPU-1) already has an X screen, so my thought was that the X server automatically binds the second GPU to a screen, the NVIDIA driver skips it, and then fails. Not true. The driver fails even without X interfering.
The line “(GPU-1) already has an X screen” has disappeared, but the driver still fails:
[ 348.522] (II) NVIDIA: The X server supports PRIME Render Offload.
[ 351.716] (II) NVIDIA(GPU-0): NVIDIA SLI enabled.
[ 351.872] (--) NVIDIA(0): Valid display device(s) on GPU-0 at PCI:33:0:0
[ 351.872] (--) NVIDIA(0): DFP-0 (boot)
...
[ 351.872] (--) NVIDIA(0): Valid display device(s) on GPU-1 at PCI:1:0:0
[ 351.872] (--) NVIDIA(0): DFP-0 (boot)
...
[ 351.875] (II) NVIDIA(0): NVIDIA GPU NVIDIA Quadro RTX 6000 (TU102GL-A) at PCI:33:0:0
[ 351.875] (II) NVIDIA(0): (GPU-0)
[ 351.875] (II) NVIDIA(0): GPU UUID: GPU-b4211f92-aa40-d5f4-8d6e-d40ff79e65cc
...
[ 351.875] (II) NVIDIA(0): NVIDIA GPU NVIDIA Quadro RTX 6000 (TU102GL-A) at PCI:1:0:0
[ 351.875] (II) NVIDIA(0): (GPU-1)
[ 351.875] (II) NVIDIA(0): GPU UUID: GPU-2b6da70b-3ee1-9b07-b2eb-775de3327a0f
...
[ 351.888] (II) NVIDIA(0): Using HorizSync/VertRefresh ranges from the EDID for display
[ 351.888] (II) NVIDIA(0): device Samsung S32D850 (DFP-0).
[ 351.888] (II) NVIDIA(GPU-0):
[ 351.888] (II) NVIDIA(GPU-0): --- Building ModePool for Samsung S32D850 (DFP-0) ---
...
[ 351.946] (II) NVIDIA(0): Using HorizSync/VertRefresh ranges from the EDID for display
[ 351.946] (II) NVIDIA(0): device Ancor Communications Inc PB328 (DFP-0).
[ 351.946] (II) NVIDIA(GPU-1):
[ 351.946] (II) NVIDIA(GPU-1): --- Building ModePool for Ancor Communications Inc PB328
...
[ 352.014] (II) NVIDIA(0): Display device(s) assigned to X screen 0:
[ 352.014] (II) NVIDIA(0): Samsung S32D850 (DFP-0)
...
[ 352.014] (II) NVIDIA(0): Ancor Communications Inc PB328 (DFP-0)
...
[ 352.014] (II) NVIDIA(0): Using MetaMode string:
[ 352.014] (II) NVIDIA(0): "GPU-b4211f92-aa40-d5f4-8d6e-d40ff79e65cc.DP-0: 2560x1440
[ 352.014] (II) NVIDIA(0): +2560+0 {ForceCompositionPipeline=On,
[ 352.014] (II) NVIDIA(0): ForceFullCompositionPipeline=On},
[ 352.014] (II) NVIDIA(0): GPU-2b6da70b-3ee1-9b07-b2eb-775de3327a0f.DP-1-0: 2560x1440
[ 352.014] (II) NVIDIA(0): +0+0 {ForceCompositionPipeline=On,
[ 352.014] (II) NVIDIA(0): ForceFullCompositionPipeline=On}"
[ 352.014] (II) NVIDIA(0): Requested modes:
[ 352.014] (II) NVIDIA(0):
[ 352.014] (II) NVIDIA(0): "GPU-b4211f92-aa40-d5f4-8d6e-d40ff79e65cc.DP-0:2560x1440+2560+0{ForceCompositionPipeline=On,ForceFullCompositionPipeline=On},GPU-2b6da70b-3ee1-9b07-b2eb-775de3327a0f.DP-1-0:2560x1440+0+0{ForceCompositionPipeline=On,ForceFullCompositionPipeline=On}"
[ 352.015] (II) NVIDIA(0): Validated MetaModes:
[ 352.015] (II) NVIDIA(0): MetaMode "GPU-b4211f92-aa40-d5f4-8d6e-d40ff79e65cc.DP-0:2560x1440+2560+0{ForceCompositionPipeline=On,ForceFullCompositionPipeline=On},GPU-2b6da70b-3ee1-9b07-b2eb-775de3327a0f.DP-1-0:2560x1440+0+0{ForceCompositionPipeline=On,ForceFullCompositionPipeline=On}":
[ 352.015] (II) NVIDIA(0): Size: 5120 x 1440
[ 352.015] (II) NVIDIA(0): Samsung S32D850 (DFP-0): "2560x1440"
[ 352.015] (II) NVIDIA(0): Size : 2560 x 1440
[ 352.015] (II) NVIDIA(0): Offset : +2560 +0
[ 352.015] (II) NVIDIA(0): Panning
[ 352.015] (II) NVIDIA(0): Domain : 2560 x 1440
[ 352.015] (II) NVIDIA(0): Tracking Area: 5120 x 1440 +0 +0
[ 352.015] (II) NVIDIA(0): Border : 0,0,0,0
[ 352.015] (II) NVIDIA(0): Virtual screen size determined to be 5120 x 1440
...
[ 352.057] (II) NVIDIA(0): Computing DPI using physical size from Samsung S32D850
[ 352.057] (II) NVIDIA(0): (DFP-0)'s EDID and first mode to be programmed on Samsung
[ 352.057] (II) NVIDIA(0): S32D850 (DFP-0):
[ 352.057] (II) NVIDIA(0): width : 2560 pixels 710 mm (DPI: 91)
[ 352.057] (II) NVIDIA(0): height : 1440 pixels 400 mm (DPI: 91)
[ 352.057] (--) NVIDIA(0): DPI set to (91, 91); computed from "UseEdidDpi" X config
[ 352.057] (--) NVIDIA(0): option
[ 352.058] (II) NVIDIA: Reserving 24576.00 MB of virtual memory for indirect memory
[ 352.058] (II) NVIDIA: access.
[ 355.062] (EE) NVIDIA(GPU-0): Failed to initialize DMA.
[ 355.063] (EE) NVIDIA(0): Failed to allocate push buffer
NVIDIA, can somebody from the RM team take a look?
Ah yes, if you need 8 displays total then you will indeed need SLI Mosaic.
The “already has an X screen assigned” message comes from the new code to support so-called “GPU screens”, which are X screens that don’t have their own root windows. Due to some technical limitations, we don’t support having a real X screen and a GPU screen on the same GPU, but that doesn’t matter for your use case. The error isn’t fatal to the X server, it just means you won’t have a GPU screen (the “NVIDIA(G0)” screen mentioned in the log) on that particular GPU.
The real error here is the “Failed to initialize DMA” message. That indicates that there is a basic failure in communication between the GPU and the driver software. Can you please run sudo nvidia-bug-report.sh in the failing configuration and attach the bug report log here?
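For readers triaging a similar log, a small filter can pull out just the `(EE)` lines so the fatal ones stand out from the informational noise. This is a minimal sketch, assuming the `[ time] (TAG) message` layout seen in the excerpts in this thread. `LINE_RE` and `error_lines` are hypothetical names, and the sample text is copied from the log posted earlier.

```python
import re

# Matches the X.Org log layout seen in this thread: "[ time] (TAG) message".
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+\((..)\)\s+(.*)$")

def error_lines(log_text):
    """Return (timestamp, message) pairs for every (EE) line."""
    errors = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m and m.group(2) == "EE":
            errors.append((float(m.group(1)), m.group(3)))
    return errors

# Lines copied from the first log excerpt in this thread.
SAMPLE = """\
[ 110.148] (II) NVIDIA(G0): NVIDIA Quadro RTX 6000 (GPU-1) already has an X screen
[ 110.148] (EE) NVIDIA(G0): Failing initialization of X screen
[ 113.153] (EE) NVIDIA(GPU-0): Failed to initialize DMA.
[ 113.154] (EE) NVIDIA(0): Failed to allocate push buffer
"""

if __name__ == "__main__":
    for ts, msg in error_lines(SAMPLE):
        print(f"{ts:9.3f}  {msg}")
```

In the sample, the `NVIDIA(G0)` entry is the non-fatal GPU screen message described above; the "Failed to initialize DMA" entry three seconds later is the one that matters.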
Yeah, it does seem like the driver is unable to communicate successfully with the GPU. Is there any chance you have AMD’s “secure memory encryption” feature enabled? If so can you please try disabling it by passing mem_encrypt=off on the kernel command line?
If that doesn’t work, please try disabling the IOMMU by passing amd_iommu=off on the kernel command line.
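After rebooting, it is worth confirming that the options actually reached the kernel. The sketch below reads `/proc/cmdline` (the standard Linux file holding the boot command line) and reports whether the two options suggested above are present. The helper names `parse_cmdline` and `check_suggested` are made up for this example, and the parser is deliberately simple: it splits on whitespace and does not handle quoted values containing spaces.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params

def check_suggested(cmdline):
    """Report whether the two options suggested in this thread are set to off."""
    params = parse_cmdline(cmdline)
    return {
        "mem_encrypt=off": params.get("mem_encrypt") == "off",
        "amd_iommu=off": params.get("amd_iommu") == "off",
    }

if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            current = f.read()
    except OSError:
        current = ""  # not on Linux, or /proc is unavailable
    for option, present in check_suggested(current).items():
        print(f"{option}: {'present' if present else 'not set'}")
```

If an option shows as "not set" after a reboot, the bootloader configuration probably was not regenerated or the edit went to the wrong boot entry.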
This new crash looks different: it’s a CPU page fault while the other errors were faults triggered by memory access by the GPU.
Is the CPU crash still from the 460.84 driver? Could you please try 465.31 and generate a new bug report log if this is still a problem with the newer driver?
The installer log is for 460.84, I didn’t see that it had been replaced by 465.31. People change things all the time between generating a bug report log and getting stack traces like these so it’s always best to confirm.