Two monitors connected to two Quadro RTX don't work in any capacity

vaihoheso · June 4, 2021, 8:28am

I can’t get two monitors connected to two different Quadro RTX 6000 cards to work. Only the one set up as primary in the BIOS works in Linux. If I try to enable the second one in whatever UI the current distro uses, the primary monitor blinks for a second and the second monitor remains in the “disabled” state.

If I set up SLI Mosaic in nvidia-settings, the OS hangs on reboot. I tried many different settings; nothing works after reboot.

There are no error messages in dmesg or in Xorg.0.log.

I tried 6 different distros based on Ubuntu and Arch Linux; I tried GNOME, Xfce, and KDE; I tried different display managers; I tried kernels 5.4, 5.11, and 5.12; I tried NVIDIA drivers 450, 460, and 465. The result is always the same.

What am I doing wrong?

vaihoheso · June 4, 2021, 9:23pm

NVIDIAAAA… NVIDIAAAA… NVIDIAAAA…

Is anybody home?

vaihoheso · June 7, 2021, 8:09pm

I managed to get error messages from X when using SLI Mosaic. In brief, the driver tries to set up both GPUs to create a big virtual screen spanning two displays. It successfully initializes the first GPU, but when it comes to the second, the driver thinks it’s already occupied by X and skips it. Everything goes to hell after that.

[   110.105] (II) NVIDIA(0): Display device(s) assigned to X screen 0:
[   110.105] (II) NVIDIA(0):   Samsung S32D850 (DFP-0)
...
[   110.105] (II) NVIDIA(0):   Ancor Communications Inc PB328 (DFP-0)
...
[   110.105] (II) NVIDIA(0): Using MetaMode string:
[   110.105] (II) NVIDIA(0):     "GPU-b4211f92-aa40-d5f4-8d6e-d40ff79e65cc.DP-0: 2560x1440
[   110.105] (II) NVIDIA(0):     +2560+0 {ForceCompositionPipeline=On,
[   110.105] (II) NVIDIA(0):     ForceFullCompositionPipeline=On},
[   110.105] (II) NVIDIA(0):     GPU-2b6da70b-3ee1-9b07-b2eb-775de3327a0f.DP-1-0: 2560x1440
[   110.105] (II) NVIDIA(0):     +0+0 {ForceCompositionPipeline=On,
[   110.105] (II) NVIDIA(0):     ForceFullCompositionPipeline=On}"
[   110.105] (II) NVIDIA(0): Requested modes:
[   110.105] (II) NVIDIA(0):    
[   110.105] (II) NVIDIA(0):     "GPU-b4211f92-aa40-d5f4-8d6e-d40ff79e65cc.DP-0:2560x1440+2560+0{ForceCompositionPipeline=On,ForceFullCompositionPipeline=On},GPU-2b6da70b-3ee1-9b07-b2eb-775de3327a0f.DP-1-0:2560x1440+0+0{ForceCompositionPipeline=On,ForceFullCompositionPipeline=On}"
[   110.106] (II) NVIDIA(0): Validated MetaModes:
[   110.106] (II) NVIDIA(0): MetaMode "GPU-b4211f92-aa40-d5f4-8d6e-d40ff79e65cc.DP-0:2560x1440+2560+0{ForceCompositionPipeline=On,ForceFullCompositionPipeline=On},GPU-2b6da70b-3ee1-9b07-b2eb-775de3327a0f.DP-1-0:2560x1440+0+0{ForceCompositionPipeline=On,ForceFullCompositionPipeline=On}":
[   110.106] (II) NVIDIA(0):     Size: 5120 x 1440
[   110.106] (II) NVIDIA(0):     Samsung S32D850 (DFP-0): "2560x1440"
[   110.106] (II) NVIDIA(0):         Size          : 2560 x 1440
[   110.106] (II) NVIDIA(0):         Offset        : +2560 +0
[   110.106] (II) NVIDIA(0):         Panning
[   110.106] (II) NVIDIA(0):          Domain       : 2560 x 1440
[   110.106] (II) NVIDIA(0):          Tracking Area: 5120 x 1440 +0 +0
[   110.106] (II) NVIDIA(0):          Border       : 0,0,0,0
[   110.106] (II) NVIDIA(0): Virtual screen size determined to be 5120 x 1440
[   110.107] (II) NVIDIA(0): Adding implicit MetaMode: "GPU-0.DP-0: nvidia-auto-select"
...
[   110.148] (II) NVIDIA(0): Computing DPI using physical size from Samsung S32D850
[   110.148] (II) NVIDIA(0):     (DFP-0)'s EDID and first mode to be programmed on Samsung
[   110.148] (II) NVIDIA(0):     S32D850 (DFP-0):
[   110.148] (II) NVIDIA(0):   width  : 2560 pixels  710  mm (DPI: 91)
[   110.148] (II) NVIDIA(0):   height : 1440 pixels  400  mm (DPI: 91)
[   110.148] (--) NVIDIA(0): DPI set to (91, 91); computed from "UseEdidDpi" X config
[   110.148] (--) NVIDIA(0):     option
[   110.148] (II) NVIDIA(G0): NVIDIA Quadro RTX 6000 (GPU-1) already has an X screen
[   110.148] (II) NVIDIA(G0):     assigned; skipping this GPU screen
[   110.148] (EE) NVIDIA(G0): Failing initialization of X screen
[   110.148] (II) UnloadModule: "nvidia"
[   110.148] (II) UnloadSubModule: "wfb"
[   110.148] (II) UnloadSubModule: "fb"
[   110.149] (II) NVIDIA: Reserving 24576.00 MB of virtual memory for indirect memory
[   110.149] (II) NVIDIA:     access.
[   113.153] (EE) NVIDIA(GPU-0): Failed to initialize DMA.
[   113.154] (EE) NVIDIA(0): Failed to allocate push buffer
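
For anyone triaging a similar log, here is a minimal Python sketch that pulls only the `(EE)` lines out of an Xorg log so the fatal messages are not buried in the `(II)` noise. The function name `error_lines` and the sample text are illustrative assumptions; the sample lines are copied from the log above.

```python
import re

# Matches X server log lines such as:
# [   110.148] (EE) NVIDIA(G0): Failing initialization of X screen
LINE_RE = re.compile(
    r"^\[\s*(?P<time>\d+\.\d+)\]\s+\((?P<level>[A-Z!?=+*-]{2})\)\s+(?P<msg>.*)$"
)

def error_lines(log_text):
    """Return (timestamp, message) pairs for every (EE) line in the log text."""
    errors = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if m and m.group("level") == "EE":
            errors.append((float(m.group("time")), m.group("msg")))
    return errors

# Sample lines taken from the log in this post.
SAMPLE = """\
[   110.148] (II) NVIDIA(G0):     assigned; skipping this GPU screen
[   110.148] (EE) NVIDIA(G0): Failing initialization of X screen
[   110.149] (II) NVIDIA: Reserving 24576.00 MB of virtual memory for indirect memory
[   113.153] (EE) NVIDIA(GPU-0): Failed to initialize DMA.
[   113.154] (EE) NVIDIA(0): Failed to allocate push buffer
"""

for t, msg in error_lines(SAMPLE):
    print(f"{t:9.3f}  {msg}")
```

The roughly three-second gap between the memory reservation and the first DMA error is easy to spot once only the timestamps of the error lines remain.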

vaihoheso · June 7, 2021, 11:12pm

Here is a similar case reported one year ago:

NVIDIAAAA… Do you care?

aplattner · June 7, 2021, 11:44pm

Can you please describe the configuration you actually want here? I.e. do you want one big desktop spanning both monitors, two completely separate desktop seats, etc.?

If you just want one desktop spanning both monitors, your best bet is to simply plug both displays into one GPU. SLI Mosaic is really intended for situations where you need more than four displays in a desktop configuration.

vaihoheso · June 8, 2021, 12:36am

aplattner wrote: “SLI Mosaic is really intended for situations where you need more than four displays in a desktop configuration.”

Hi Aaron,

There will be 8 monitors eventually. So, yes, I do need SLI Mosaic with one desktop spanning several monitors. I know that two monitors connected to one card do work. But this is not what I need.

vaihoheso · June 8, 2021, 3:16am

I ruled out the X server as the culprit. There is a line in the log saying that (GPU-1) already has an X screen, so my thought was that the X server automatically binds the second GPU to a screen, the NVIDIA driver ignores it and then fails. Not true. The driver fails even without X interfering.

I disabled automatic binding:

Section "ServerLayout"
    Identifier     "Layout0"
    Screen      0  "Screen0" 0 0
    InputDevice    "Keyboard0" "CoreKeyboard"
    InputDevice    "Mouse0" "CorePointer"
    Option         "Xinerama" "0"
    Option         "AutoAddGPU" "off"
    Option         "AutoBindGPU" "off"
    Option         "Debug" "on"
EndSection

The line (GPU-1) already has an X screen disappeared, but the driver still fails:

[   348.522] (II) NVIDIA: The X server supports PRIME Render Offload.
[   351.716] (II) NVIDIA(GPU-0): NVIDIA SLI enabled.
[   351.872] (--) NVIDIA(0): Valid display device(s) on GPU-0 at PCI:33:0:0
[   351.872] (--) NVIDIA(0):     DFP-0 (boot)
...
[   351.872] (--) NVIDIA(0): Valid display device(s) on GPU-1 at PCI:1:0:0
[   351.872] (--) NVIDIA(0):     DFP-0 (boot)
...
[   351.875] (II) NVIDIA(0): NVIDIA GPU NVIDIA Quadro RTX 6000 (TU102GL-A) at PCI:33:0:0
[   351.875] (II) NVIDIA(0):     (GPU-0)
[   351.875] (II) NVIDIA(0): GPU UUID: GPU-b4211f92-aa40-d5f4-8d6e-d40ff79e65cc
...
[   351.875] (II) NVIDIA(0): NVIDIA GPU NVIDIA Quadro RTX 6000 (TU102GL-A) at PCI:1:0:0
[   351.875] (II) NVIDIA(0):     (GPU-1)
[   351.875] (II) NVIDIA(0): GPU UUID: GPU-2b6da70b-3ee1-9b07-b2eb-775de3327a0f
...
[   351.888] (II) NVIDIA(0): Using HorizSync/VertRefresh ranges from the EDID for display
[   351.888] (II) NVIDIA(0):     device Samsung S32D850 (DFP-0).
[   351.888] (II) NVIDIA(GPU-0): 
[   351.888] (II) NVIDIA(GPU-0): --- Building ModePool for Samsung S32D850 (DFP-0) ---
...
[   351.946] (II) NVIDIA(0): Using HorizSync/VertRefresh ranges from the EDID for display
[   351.946] (II) NVIDIA(0):     device Ancor Communications Inc PB328 (DFP-0).
[   351.946] (II) NVIDIA(GPU-1): 
[   351.946] (II) NVIDIA(GPU-1): --- Building ModePool for Ancor Communications Inc PB328
...
[   352.014] (II) NVIDIA(0): Display device(s) assigned to X screen 0:
[   352.014] (II) NVIDIA(0):   Samsung S32D850 (DFP-0)
...
[   352.014] (II) NVIDIA(0):   Ancor Communications Inc PB328 (DFP-0)
...
[   352.014] (II) NVIDIA(0): Using MetaMode string:
[   352.014] (II) NVIDIA(0):     "GPU-b4211f92-aa40-d5f4-8d6e-d40ff79e65cc.DP-0: 2560x1440
[   352.014] (II) NVIDIA(0):     +2560+0 {ForceCompositionPipeline=On,
[   352.014] (II) NVIDIA(0):     ForceFullCompositionPipeline=On},
[   352.014] (II) NVIDIA(0):     GPU-2b6da70b-3ee1-9b07-b2eb-775de3327a0f.DP-1-0: 2560x1440
[   352.014] (II) NVIDIA(0):     +0+0 {ForceCompositionPipeline=On,
[   352.014] (II) NVIDIA(0):     ForceFullCompositionPipeline=On}"
[   352.014] (II) NVIDIA(0): Requested modes:
[   352.014] (II) NVIDIA(0):    
[   352.014] (II) NVIDIA(0):     "GPU-b4211f92-aa40-d5f4-8d6e-d40ff79e65cc.DP-0:2560x1440+2560+0{ForceCompositionPipeline=On,ForceFullCompositionPipeline=On},GPU-2b6da70b-3ee1-9b07-b2eb-775de3327a0f.DP-1-0:2560x1440+0+0{ForceCompositionPipeline=On,ForceFullCompositionPipeline=On}"
[   352.015] (II) NVIDIA(0): Validated MetaModes:
[   352.015] (II) NVIDIA(0): MetaMode "GPU-b4211f92-aa40-d5f4-8d6e-d40ff79e65cc.DP-0:2560x1440+2560+0{ForceCompositionPipeline=On,ForceFullCompositionPipeline=On},GPU-2b6da70b-3ee1-9b07-b2eb-775de3327a0f.DP-1-0:2560x1440+0+0{ForceCompositionPipeline=On,ForceFullCompositionPipeline=On}":
[   352.015] (II) NVIDIA(0):     Size: 5120 x 1440
[   352.015] (II) NVIDIA(0):     Samsung S32D850 (DFP-0): "2560x1440"
[   352.015] (II) NVIDIA(0):         Size          : 2560 x 1440
[   352.015] (II) NVIDIA(0):         Offset        : +2560 +0
[   352.015] (II) NVIDIA(0):         Panning
[   352.015] (II) NVIDIA(0):          Domain       : 2560 x 1440
[   352.015] (II) NVIDIA(0):          Tracking Area: 5120 x 1440 +0 +0
[   352.015] (II) NVIDIA(0):          Border       : 0,0,0,0
[   352.015] (II) NVIDIA(0): Virtual screen size determined to be 5120 x 1440
...
[   352.057] (II) NVIDIA(0): Computing DPI using physical size from Samsung S32D850
[   352.057] (II) NVIDIA(0):     (DFP-0)'s EDID and first mode to be programmed on Samsung
[   352.057] (II) NVIDIA(0):     S32D850 (DFP-0):
[   352.057] (II) NVIDIA(0):   width  : 2560 pixels  710  mm (DPI: 91)
[   352.057] (II) NVIDIA(0):   height : 1440 pixels  400  mm (DPI: 91)
[   352.057] (--) NVIDIA(0): DPI set to (91, 91); computed from "UseEdidDpi" X config
[   352.057] (--) NVIDIA(0):     option
[   352.058] (II) NVIDIA: Reserving 24576.00 MB of virtual memory for indirect memory
[   352.058] (II) NVIDIA:     access.
[   355.062] (EE) NVIDIA(GPU-0): Failed to initialize DMA.
[   355.063] (EE) NVIDIA(0): Failed to allocate push buffer
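
As a sanity check on the layout itself, the following Python sketch parses the MetaMode string from the log and recomputes the virtual screen size from the per-display sizes and offsets. It is a rough illustration, not how the driver does it: the helper names `parse_metamode` and `virtual_size` are made up here, and the regular expression only handles the `WxH+X+Y` form seen in this log.

```python
import re

# One MetaMode entry looks like: <display>:<W>x<H>+<X>+<Y>{options}
ENTRY_RE = re.compile(
    r"(?P<display>[^:,{}\s]+):\s*(?P<w>\d+)x(?P<h>\d+)\s*\+(?P<x>\d+)\+(?P<y>\d+)"
)

def parse_metamode(metamode):
    """Return a list of (display, width, height, x, y) tuples."""
    return [
        (m["display"], int(m["w"]), int(m["h"]), int(m["x"]), int(m["y"]))
        for m in ENTRY_RE.finditer(metamode)
    ]

def virtual_size(entries):
    """Bounding box of all displays, i.e. the size of the spanning desktop."""
    width = max(x + w for _, w, h, x, y in entries)
    height = max(y + h for _, w, h, x, y in entries)
    return width, height

# MetaMode string copied from the "Requested modes" line in the log.
METAMODE = (
    "GPU-b4211f92-aa40-d5f4-8d6e-d40ff79e65cc.DP-0:2560x1440+2560+0"
    "{ForceCompositionPipeline=On,ForceFullCompositionPipeline=On},"
    "GPU-2b6da70b-3ee1-9b07-b2eb-775de3327a0f.DP-1-0:2560x1440+0+0"
    "{ForceCompositionPipeline=On,ForceFullCompositionPipeline=On}"
)

entries = parse_metamode(METAMODE)
print(virtual_size(entries))  # (5120, 1440)
```

The result agrees with the "Virtual screen size determined to be 5120 x 1440" line, which suggests the requested layout is valid and the failure happens later, during DMA setup.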

NVIDIA, can somebody from the RM team take a look?

aplattner · June 8, 2021, 6:40am

Ah yes, if you need 8 displays total then you will indeed need SLI Mosaic.

The “already has an X screen assigned” message comes from the new code to support so-called “GPU screens”, which are X screens that don’t have their own root windows. Due to some technical limitations, we don’t support having a real X screen and a GPU screen on the same GPU, but that doesn’t matter for your use case. The error isn’t fatal to the X server, it just means you won’t have a GPU screen (the “NVIDIA(G0)” screen mentioned in the log) on that particular GPU.

The real error here is the “Failed to initialize DMA” message. That indicates that there is a basic failure in communication between the GPU and the driver software. Can you please run sudo nvidia-bug-report.sh in the failing configuration and attach the bug report log here?

vaihoheso · June 8, 2021, 7:13am

After disabling auto-binding in the X config, I now actually see error messages from RM in the kernel log:

[  186.043496] audit: type=1130 audit(1623135575.831:150): pid=1 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='unit=lightdm comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[  186.835492] resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
[  186.835499] caller _nv000712rm+0x1af/0x200 [nvidia] mapping multiple BARs
[  189.217189] nvidia 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000d address=0xfff34400 flags=0x0000]
[  189.217549] nvidia 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000d address=0xfff34400 flags=0x0000]
[  189.242559] NVRM: GPU at PCI:0000:01:00: GPU-2b6da70b-3ee1-9b07-b2eb-775de3327a0f
[  189.242566] NVRM: Xid (PCI:0000:01:00): 32, pid=2761, Channel ID 00000000 intr 00008000
[  189.265673] NVRM: Xid (PCI:0000:01:00): 32, pid=2761, Channel ID 00000000 intr 00008000
[  189.322616] nvidia 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000d address=0xffd10000 flags=0x0000]
[  189.322624] nvidia 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000d address=0xffd10000 flags=0x0000]
[  189.322684] NVRM: Xid (PCI:0000:01:00): 56, pid=2761, CMDre 00000000 00000000 00000000 00000001 00000000
[  189.625186] nvidia 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000d address=0xffc50000 flags=0x0000]
[  189.637329] NVRM: Xid (PCI:0000:01:00): 32, pid=2761, Channel ID 00000008 intr 00008000
[  189.645964] NVRM: Xid (PCI:0000:01:00): 32, pid=2761, Channel ID 00000008 intr 00008000
[  192.653137] nvidia 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000d address=0xffd02000 flags=0x0000]
[  192.653144] nvidia 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000d address=0xffd02000 flags=0x0000]
[  192.653207] NVRM: Xid (PCI:0000:01:00): 56, pid=2761, CMDre 00000001 00000000 00000000 00000001 00000000
[  192.716443] nvidia 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000d address=0xffcda000 flags=0x0000]
[  192.716450] nvidia 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000d address=0xffcda000 flags=0x0000]
[  192.716518] NVRM: Xid (PCI:0000:01:00): 56, pid=2761, CMDre 00000002 00000000 00000000 00000001 00000000

This happens right before the push buffer allocation bails out on timeout:

[   189.931] (II) NVIDIA: Reserving 24576.00 MB of virtual memory for indirect memory
[   189.931] (II) NVIDIA:     access.
[   192.936] (EE) NVIDIA(GPU-0): Failed to initialize DMA.
[   192.937] (EE) NVIDIA(0): Failed to allocate push buffer

So I would say your very first push buffer operation causes a page fault.
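
To make the pattern in the kernel log easier to see, here is a small Python sketch that counts the NVRM Xid events by code and collects the distinct IOMMU fault addresses. The function name `summarize` is an assumption for illustration, and the regular expressions are written only against the line formats shown above; other kernels or drivers may print them differently.

```python
import re
from collections import Counter

# NVRM: Xid (PCI:0000:01:00): 32, pid=2761, ...
XID_RE = re.compile(r"NVRM: Xid \(PCI:(?P<dev>[0-9a-f:]+)\): (?P<code>\d+),")
# nvidia 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=... address=... ]
FAULT_RE = re.compile(
    r"nvidia (?P<dev>[0-9a-f:.]+): AMD-Vi: Event logged "
    r"\[IO_PAGE_FAULT domain=(?P<domain>0x[0-9a-f]+) address=(?P<addr>0x[0-9a-f]+)"
)

def summarize(dmesg_text):
    """Return (Counter of Xid codes, list of distinct fault addresses in order)."""
    xids = Counter()
    fault_addrs = []
    for line in dmesg_text.splitlines():
        m = XID_RE.search(line)
        if m:
            xids[int(m["code"])] += 1
            continue
        m = FAULT_RE.search(line)
        if m:
            addr = int(m["addr"], 16)
            if addr not in fault_addrs:
                fault_addrs.append(addr)
    return xids, fault_addrs

# A few lines copied from the kernel log in this post.
SAMPLE = """\
[  189.217189] nvidia 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000d address=0xfff34400 flags=0x0000]
[  189.217549] nvidia 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000d address=0xfff34400 flags=0x0000]
[  189.242566] NVRM: Xid (PCI:0000:01:00): 32, pid=2761, Channel ID 00000000 intr 00008000
[  189.322616] nvidia 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000d address=0xffd10000 flags=0x0000]
[  189.322684] NVRM: Xid (PCI:0000:01:00): 56, pid=2761, CMDre 00000000 00000000 00000000 00000001 00000000
"""

xids, addrs = summarize(SAMPLE)
print(dict(xids), [hex(a) for a in addrs])
```

In the log above, every IOMMU fault is reported for device 0000:01:00.0, the same GPU that raises the Xid events.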

vaihoheso · June 8, 2021, 7:29am

nvidia-bug-report.log.gz (1.2 MB)

aplattner · June 8, 2021, 7:10pm

Yeah, it does seem like the driver is unable to communicate successfully with the GPU. Is there any chance you have AMD’s “secure memory encryption” feature enabled? If so can you please try disabling it by passing mem_encrypt=off on the kernel command line?

If that doesn’t work, please try disabling the IOMMU by passing amd_iommu=off on the kernel command line.
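
After rebooting, it is worth confirming that the options actually reached the kernel. The Python sketch below splits a kernel command line into parameters and reports the two settings mentioned above. The helper names and the example command line are hypothetical, and quoted values containing spaces are not handled.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params

def check(cmdline):
    """Report the values of the two parameters discussed in this thread."""
    params = parse_cmdline(cmdline)
    return {
        "mem_encrypt": params.get("mem_encrypt"),
        "amd_iommu": params.get("amd_iommu"),
    }

# Hypothetical command line, for illustration only.
EXAMPLE = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 rw quiet mem_encrypt=off amd_iommu=off"
print(check(EXAMPLE))  # {'mem_encrypt': 'off', 'amd_iommu': 'off'}
```

On a running Linux system, the string to pass to `check` can be read from `/proc/cmdline`.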

vaihoheso · June 8, 2021, 9:15pm

I can disable AMD virtualization in the system BIOS. I can still see the page fault; it’s just being caught by the kernel now instead of the supervisor:

[   15.686723] RIP: 0010:_nv009227rm+0x24/0x50 [nvidia]
[   15.687333] Code: 1f 80 00 00 00 00 53 e8 ba a1 af ff 48 85 c0 74 25 48 8b 98 98 23 00 00 48 89 c7 48 89 de e8 83 fd ff ff 48 8b 83 e0 17 00 00 <c6> 40 28 00 5b c3 66 0f 1f 44 00 00 5b be 00 00 6d 03 bf 4c 57 8a
[   15.687335] RSP: 0018:ffffb407c9793e08 EFLAGS: 00010286
[   15.687337] RAX: 0000000000000000 RBX: ffff97848fb2c008 RCX: ffff97847df99380
[   15.687338] RDX: ffff97847df99360 RSI: ffff97848fb2c008 RDI: ffff97848fa8c008
[   15.687339] RBP: ffff97848f9de000 R08: 0000000000000001 R09: 0000000000000100
[   15.687340] R10: ffff97848fa8c000 R11: 0000000000000000 R12: ffff97848f9db000
[   15.687342] R13: ffff97848f9db000 R14: ffff97847db7e520 R15: ffff97847db7ddc0
[   15.687345]  ? _nv009227rm+0x1d/0x50 [nvidia]
[   15.687942]  ? rm_execute_work_item+0x108/0x120 [nvidia]
[   15.688365]  ? os_execute_work_item+0x46/0x60 [nvidia]
[   15.688723]  ? _main_loop+0x83/0x130 [nvidia]
[   15.689080]  ? nvidia_modeset_resume+0x20/0x20 [nvidia]
[   15.689435]  ? kthread+0x133/0x150
[   15.689437]  ? kthread_associate_blkcg+0xc0/0xc0
[   15.689440]  ? ret_from_fork+0x22/0x30
[   15.689444] ---[ end trace b37821d779bc70c7 ]---
[   15.689446] BUG: kernel NULL pointer dereference, address: 0000000000000028
[   15.689449] #PF: supervisor write access in kernel mode
[   15.689451] #PF: error_code(0x0002) - not-present page
[   15.689453] PGD 0 P4D 0 
[   15.689456] Oops: 0002 [#1] PREEMPT SMP NOPTI
[   15.689458] CPU: 47 PID: 2164 Comm: nv_queue Tainted: P        W  OE     5.12.8-1-MANJARO #1
[   15.689461] Hardware name: Gigabyte Technology Co., Ltd. TRX40 DESIGNARE/TRX40 DESIGNARE, BIOS F4q 04/12/2021
[   15.689462] RIP: 0010:_nv009227rm+0x24/0x50 [nvidia]
[   15.690065] Code: 1f 80 00 00 00 00 53 e8 ba a1 af ff 48 85 c0 74 25 48 8b 98 98 23 00 00 48 89 c7 48 89 de e8 83 fd ff ff 48 8b 83 e0 17 00 00 <c6> 40 28 00 5b c3 66 0f 1f 44 00 00 5b be 00 00 6d 03 bf 4c 57 8a
[   15.690068] RSP: 0018:ffffb407c9793e08 EFLAGS: 00010286
[   15.690070] RAX: 0000000000000000 RBX: ffff97848fb2c008 RCX: ffff97847df99380
[   15.690072] RDX: ffff97847df99360 RSI: ffff97848fb2c008 RDI: ffff97848fa8c008
[   15.690073] RBP: ffff97848f9de000 R08: 0000000000000001 R09: 0000000000000100
[   15.690075] R10: ffff97848fa8c000 R11: 0000000000000000 R12: ffff97848f9db000
[   15.690076] R13: ffff97848f9db000 R14: ffff97847db7e520 R15: ffff97847db7ddc0
[   15.690078] FS:  0000000000000000(0000) GS:ffff97a33dbc0000(0000) knlGS:0000000000000000
[   15.690080] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   15.690082] CR2: 0000000000000028 CR3: 00000005b6410000 CR4: 0000000000350ee0
[   15.690084] Call Trace:
[   15.690086]  ? rm_execute_work_item+0x108/0x120 [nvidia]
[   15.690509]  ? os_execute_work_item+0x46/0x60 [nvidia]
[   15.690867]  ? _main_loop+0x83/0x130 [nvidia]
[   15.691224]  ? nvidia_modeset_resume+0x20/0x20 [nvidia]
[   15.691579]  ? kthread+0x133/0x150
[   15.691583]  ? kthread_associate_blkcg+0xc0/0xc0
[   15.691586]  ? ret_from_fork+0x22/0x30
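
For readers unfamiliar with the `#PF` lines, here is a short Python sketch that decodes the x86 page-fault error code printed in the oops. The function name `decode_pf_error_code` is made up for illustration, and only the five low architectural bits are decoded.

```python
def decode_pf_error_code(code):
    """Decode the low bits of an x86 page-fault error code."""
    return {
        # bit 0: 0 = page not present, 1 = protection violation on a present page
        "page_present": bool(code & 0x01),
        # bit 1: 0 = read access, 1 = write access
        "write": bool(code & 0x02),
        # bit 2: 0 = supervisor (kernel) mode, 1 = user mode
        "user_mode": bool(code & 0x04),
        # bit 3: a reserved bit was set in a page-table entry
        "reserved_bit": bool(code & 0x08),
        # bit 4: the fault was caused by an instruction fetch
        "instruction_fetch": bool(code & 0x10),
    }

info = decode_pf_error_code(0x0002)  # value from the oops above
print(info)
```

For `0x0002` this gives a kernel-mode write to a not-present page, matching the "supervisor write access in kernel mode" and "not-present page" lines. With RAX at zero and CR2 at 0x28 in the register dump, the fault appears to be a store at offset 0x28 from a NULL pointer inside the driver.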

vaihoheso · June 8, 2021, 9:31pm

This doesn’t change anything. Still doesn’t work, the logs are the same.

vaihoheso · June 8, 2021, 9:32pm

Please ask the RM team to take a look. It’s a showstopper for my client.

aplattner · June 8, 2021, 10:37pm

This new crash looks different: it’s a CPU page fault while the other errors were faults triggered by memory access by the GPU.

Is the CPU crash still from the 460.84 driver? Could you please try 465.31 and generate a new bug report log if this is still a problem with the newer driver?

vaihoheso · June 8, 2021, 11:11pm

The first stack trace is about a page_fault exception being caught inside the NVIDIA driver:

[   15.682620] CPU: 47 PID: 2164 Comm: nv_queue Tainted: P           OE     5.12.8-1-MANJARO #1
[   15.682623] Hardware name: Gigabyte Technology Co., Ltd. TRX40 DESIGNARE/TRX40 DESIGNARE, BIOS F4q 04/12/2021
[   15.682624] RIP: 0010:kfence_protect_page+0x39/0xc0
[   15.682628] Code: 25 28 00 00 00 48 89 44 24 08 31 c0 48 8d 74 24 04 c7 44 24 04 00 00 00 00 e8 93 34 dc ff 48 85 c0 74 07 83 7c 24 04 01 74 06 <0f> 0b 31 c0 eb 4c 48 8b 38 48 89 c2 84 db 75 59 48 89 f8 0f 1f 40
[   15.682630] RSP: 0018:ffffb407c9793c98 EFLAGS: 00010046
[   15.682633] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffb407c9793c9c
[   15.682634] RDX: ffffb407c9793c9c RSI: 0000000000000000 RDI: 0000000000000000
[   15.682636] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[   15.682637] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
[   15.682638] R13: ffffb407c9793d58 R14: 0000000000000028 R15: 0000000000000000
[   15.682640] FS:  0000000000000000(0000) GS:ffff97a33dbc0000(0000) knlGS:0000000000000000
[   15.682641] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   15.682643] CR2: 0000000000000028 CR3: 00000005b6410000 CR4: 0000000000350ee0
[   15.682645] Call Trace:
[   15.682650]  kfence_unprotect+0x13/0x30
[   15.682653]  page_fault_oops+0x9d/0x2d0
[   15.682658]  ? _nv009058rm+0x39/0x1a0 [nvidia]
[   15.683275]  exc_page_fault+0x67/0x170
[   15.683279]  asm_exc_page_fault+0x1e/0x30

I can’t tell whether it originated from the CPU, but considering the overall bug, it’s safe to assume it’s the same bug.

The report is from 465.31.

aplattner · June 8, 2021, 11:33pm

The installer log is for 460.84; I didn’t see that it had been replaced by 465.31. People change things all the time between generating a bug report log and getting stack traces like these, so it’s always best to confirm.

I’ll do some quick analysis and file a bug.

Edit: Filed internal bug number 3323148.

vaihoheso · June 8, 2021, 11:49pm

Thank you! You have my email. Please add me to the bug.

aplattner · June 8, 2021, 11:53pm

Filed internal bug number 3323148.

vaihoheso · June 9, 2021, 12:13am

Thanks!
