The bug report log indicates that the NVIDIA driver is unable to start the GPU at PCI address 3b:00.0.
lspci indicates the device was configured by the BIOS/OS:
/usr/bin/lspci -d "10de:*" -v -xxx
3b:00.0 VGA compatible controller: NVIDIA Corporation TU102GL [Quadro RTX 6000/8000] (rev a1) (prog-if 00 [VGA controller])
Subsystem: Dell TU102GL [Quadro RTX 6000/8000]
Flags: bus master, fast devsel, latency 0, IRQ 123, NUMA node 0
Memory at ab000000 (32-bit, non-prefetchable) [size=16M]
Memory at 38bfe0000000 (64-bit, prefetchable) [size=256M]
Memory at 38bff0000000 (64-bit, prefetchable) [size=32M]
I/O ports at 6000 [size=128]
Expansion ROM at ac080000 [virtual] [disabled] [size=512K]
Capabilities: [60] Power Management version 3
Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Legacy Endpoint, MSI 00
Capabilities: [100] Virtual Channel
Capabilities: [258] L1 PM Substates
Capabilities: [128] Power Budgeting <?>
Capabilities: [420] Advanced Error Reporting
Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
Capabilities: [900] Secondary PCI Express
Capabilities: [bb0] Physical Resizable BAR
Kernel driver in use: nvidia
Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
00: de 10 30 1e 07 00 10 00 a1 00 00 03 00 00 80 00
10: 00 00 00 ab 0c 00 00 e0 bf 38 00 00 0c 00 00 f0
20: bf 38 00 00 01 60 00 00 00 00 00 00 28 10 9e 12
30: 00 00 00 00 60 00 00 00 00 00 00 00 ff 01 00 00
40: 28 10 9e 12 00 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 01 00 00 00 ce d6 23 00 00 00 00 00
60: 01 68 c3 c9 08 00 00 00 05 78 80 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 10 00 12 00 e1 8d 2c 11
80: 1e 21 10 00 03 3d 45 00 40 01 01 11 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 13 00 04 00
a0: 06 00 00 00 0e 00 00 00 03 00 1f 00 00 00 00 00
b0: 00 00 00 00 09 00 14 01 00 00 13 0b 80 00 00 00
c0: e6 7b 7e c8 00 00 00 00 11 00 05 00 00 00 b9 00
d0: 00 00 ba 00 00 00 00 00 00 00 00 00 28 10 9e 12
e0: 28 10 9e 12 03 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
3b:00.1 Audio device: NVIDIA Corporation TU102 High Definition Audio Controller (rev a1)
Subsystem: Dell TU102 High Definition Audio Controller
Flags: bus master, fast devsel, latency 0, IRQ 121, NUMA node 0
Memory at ac050000 (32-bit, non-prefetchable) [size=16K]
Capabilities: [60] Power Management version 3
Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Endpoint, MSI 00
Capabilities: [100] Advanced Error Reporting
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel
00: de 10 f7 10 06 00 10 00 a1 00 03 04 00 00 80 00
10: 00 00 05 ac 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 28 10 9e 12
30: 00 00 00 00 60 00 00 00 00 00 00 00 ff 02 00 00
40: 28 10 9e 12 00 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 ce d6 23 00 00 00 00 00
60: 01 68 03 00 08 00 00 00 05 78 80 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 10 00 02 00 e1 8d 2c 01
80: 1e 29 09 00 03 3d 45 00 43 01 01 11 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 13 00 04 00
a0: 06 00 00 00 0e 00 00 00 00 00 01 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
3b:00.2 USB controller: NVIDIA Corporation TU102 USB 3.1 Host Controller (rev a1) (prog-if 30 [XHCI])
Subsystem: Dell TU102 USB 3.1 Host Controller
Flags: fast devsel, IRQ 55, NUMA node 0
Memory at ac000000 (64-bit, prefetchable) [size=256K]
Memory at ac040000 (64-bit, prefetchable) [size=64K]
Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Endpoint, MSI 00
Capabilities: [b4] Power Management version 3
Capabilities: [100] Advanced Error Reporting
Kernel driver in use: xhci_hcd
Kernel modules: xhci_pci
00: de 10 d6 1a 02 04 10 00 a1 30 03 0c 20 00 80 00
10: 0c 00 00 ac 00 00 00 00 00 00 00 00 0c 00 04 ac
20: 00 00 00 00 00 00 00 00 00 00 00 00 28 10 9e 12
30: 00 00 00 00 68 00 00 00 00 00 00 00 ff 03 00 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 31 60 00 00 00 00 00 00 05 78 81 00 b8 00 e0 fe
70: 00 00 00 00 00 00 00 00 10 b4 02 00 e0 8d 2c 01
80: 1e 29 19 00 03 3d 45 00 40 00 01 11 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 13 00 04 00
a0: 06 00 00 00 0e 00 00 00 00 00 01 00 00 00 00 00
b0: 00 00 00 00 01 00 43 c8 0b 01 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
3b:00.3 Serial bus controller: NVIDIA Corporation TU102 USB Type-C UCSI Controller (rev a1)
Subsystem: Dell TU102 USB Type-C UCSI Controller
Flags: bus master, fast devsel, latency 0, IRQ 47, NUMA node 0
Memory at ac054000 (32-bit, non-prefetchable) [size=4K]
Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Endpoint, MSI 00
Capabilities: [b4] Power Management version 3
Capabilities: [100] Advanced Error Reporting
Kernel driver in use: nvidia-gpu
Kernel modules: i2c_nvidia_gpu
00: de 10 d7 1a 06 04 10 00 a1 00 80 0c 00 00 80 00
10: 00 40 05 ac 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 28 10 9e 12
30: 00 00 00 00 68 00 00 00 00 00 00 00 ff 04 00 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 05 78 81 00 78 00 e0 fe
70: 00 00 00 00 00 00 00 00 10 b4 02 00 e0 8d 2c 01
80: 1e 29 19 00 03 3d 45 00 40 00 01 11 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 13 00 04 00
a0: 06 00 00 00 0e 00 00 00 00 00 01 00 00 00 00 00
b0: 00 00 00 00 01 00 43 c8 0b 01 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d8:00.0 VGA compatible controller: NVIDIA Corporation TU102GL [Quadro RTX 6000/8000] (rev a1) (prog-if 00 [VGA controller])
Subsystem: NVIDIA Corporation Quadro RTX 8000
Flags: bus master, fast devsel, latency 0, IRQ 126, NUMA node 1
Memory at ef000000 (32-bit, non-prefetchable) [size=16M]
Memory at 39ffe0000000 (64-bit, prefetchable) [size=256M]
Memory at 39fff0000000 (64-bit, prefetchable) [size=32M]
I/O ports at e000 [size=128]
Expansion ROM at f0080000 [virtual] [disabled] [size=512K]
Capabilities: [60] Power Management version 3
Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Legacy Endpoint, MSI 00
Capabilities: [100] Virtual Channel
Capabilities: [258] L1 PM Substates
Capabilities: [128] Power Budgeting <?>
Capabilities: [420] Advanced Error Reporting
Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
Capabilities: [900] Secondary PCI Express
Capabilities: [bb0] Physical Resizable BAR
Kernel driver in use: nvidia
Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
00: de 10 30 1e 07 04 10 00 a1 00 00 03 00 00 80 00
10: 00 00 00 ef 0c 00 00 e0 ff 39 00 00 0c 00 00 f0
20: ff 39 00 00 01 e0 00 00 00 00 00 00 de 10 9e 12
30: 00 00 00 00 60 00 00 00 00 00 00 00 ff 01 00 00
40: de 10 9e 12 00 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 01 00 00 00 ce d6 23 00 00 00 00 00
60: 01 68 c3 c9 08 00 00 00 05 78 81 00 18 01 e0 fe
70: 00 00 00 00 00 00 00 00 10 00 12 00 e1 8d 2c 11
80: 1e 21 10 00 03 3d 46 00 40 01 01 11 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 13 00 04 00
a0: 06 00 00 00 0e 00 00 00 03 00 1f 00 00 00 00 00
b0: 00 00 00 00 09 00 14 01 01 00 13 0b 80 00 00 00
c0: 11 8e 3e d9 00 00 00 00 11 00 05 00 00 00 b9 00
d0: 00 00 ba 00 00 00 00 00 00 00 00 00 de 10 9e 12
e0: de 10 9e 12 03 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d8:00.1 Audio device: NVIDIA Corporation TU102 High Definition Audio Controller (rev a1)
Subsystem: NVIDIA Corporation TU102 High Definition Audio Controller
Flags: bus master, fast devsel, latency 0, IRQ 122, NUMA node 1
Memory at f0050000 (32-bit, non-prefetchable) [size=16K]
Capabilities: [60] Power Management version 3
Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Endpoint, MSI 00
Capabilities: [100] Advanced Error Reporting
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel
00: de 10 f7 10 06 00 10 00 a1 00 03 04 00 00 80 00
10: 00 00 05 f0 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 de 10 9e 12
30: 00 00 00 00 60 00 00 00 00 00 00 00 ff 02 00 00
40: de 10 9e 12 00 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 ce d6 23 00 00 00 00 00
60: 01 68 03 00 08 00 00 00 05 78 80 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 10 00 02 00 e1 8d 2c 01
80: 1e 29 09 00 03 3d 45 00 43 01 01 11 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 13 00 04 00
a0: 06 00 00 00 0e 00 00 00 00 00 01 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d8:00.2 USB controller: NVIDIA Corporation TU102 USB 3.1 Host Controller (rev a1) (prog-if 30 [XHCI])
Subsystem: NVIDIA Corporation TU102 USB 3.1 Host Controller
Flags: fast devsel, IRQ 57, NUMA node 1
Memory at f0000000 (64-bit, prefetchable) [size=256K]
Memory at f0040000 (64-bit, prefetchable) [size=64K]
Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Endpoint, MSI 00
Capabilities: [b4] Power Management version 3
Capabilities: [100] Advanced Error Reporting
Kernel driver in use: xhci_hcd
Kernel modules: xhci_pci
00: de 10 d6 1a 02 04 10 00 a1 30 03 0c 20 00 80 00
10: 0c 00 00 f0 00 00 00 00 00 00 00 00 0c 00 04 f0
20: 00 00 00 00 00 00 00 00 00 00 00 00 de 10 9e 12
30: 00 00 00 00 68 00 00 00 00 00 00 00 ff 03 00 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 31 60 00 00 00 00 00 00 05 78 81 00 b8 00 e0 fe
70: 00 00 00 00 00 00 00 00 10 b4 02 00 e0 8d 2c 01
80: 1e 29 19 00 03 3d 46 00 40 00 01 11 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 13 00 04 00
a0: 06 00 00 00 0e 00 00 00 00 00 01 00 00 00 00 00
b0: 00 00 00 00 01 00 43 c8 0b 01 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d8:00.3 Serial bus controller: NVIDIA Corporation TU102 USB Type-C UCSI Controller (rev a1)
Subsystem: NVIDIA Corporation TU102 USB Type-C UCSI Controller
Flags: bus master, fast devsel, latency 0, IRQ 53, NUMA node 1
Memory at f0054000 (32-bit, non-prefetchable) [size=4K]
Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Endpoint, MSI 00
Capabilities: [b4] Power Management version 3
Capabilities: [100] Advanced Error Reporting
Kernel driver in use: nvidia-gpu
Kernel modules: i2c_nvidia_gpu
00: de 10 d7 1a 06 04 10 00 a1 00 80 0c 00 00 80 00
10: 00 40 05 f0 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 de 10 9e 12
30: 00 00 00 00 68 00 00 00 00 00 00 00 ff 04 00 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 05 78 81 00 78 00 e0 fe
70: 00 00 00 00 00 00 00 00 10 b4 02 00 e0 8d 2c 01
80: 1e 29 19 00 03 3d 46 00 40 00 01 11 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 13 00 04 00
a0: 06 00 00 00 0e 00 00 00 00 00 01 00 00 00 00 00
b0: 00 00 00 00 01 00 43 c8 0b 01 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
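As a cross-check of the "configured" reading, the first two hex rows of the 3b:00.0 dump can be decoded by hand. The sketch below is only an illustration and assumes the standard PCI type 0 header layout (vendor ID at 0x00, device ID at 0x02, command register at 0x04, BAR0 at 0x10). The helper names `parse_config_dump` and `decode_header` are made up for this example and are not part of any NVIDIA or pciutils tooling.

```python
import struct

# Command register bits from the standard PCI header (only the ones of interest here).
COMMAND_BITS = {0: "I/O space", 1: "memory space", 2: "bus master", 10: "INTx disabled"}

def parse_config_dump(rows):
    """Turn `lspci -xxx` hex rows such as '10: 00 00 00 ab ...' into bytes."""
    data = bytearray()
    for row in rows:
        offset, _, hex_bytes = row.partition(":")
        if int(offset, 16) != len(data):
            raise ValueError(f"unexpected offset in row: {row!r}")
        data += bytes.fromhex(hex_bytes)
    return bytes(data)

def decode_header(cfg):
    """Decode vendor/device ID, command register and BAR0 base (little-endian)."""
    vendor, device, command = struct.unpack_from("<HHH", cfg, 0x00)
    bar0 = struct.unpack_from("<I", cfg, 0x10)[0]
    flags = [name for bit, name in COMMAND_BITS.items() if command & (1 << bit)]
    return {
        "vendor": vendor,
        "device": device,
        "command": command,
        "flags": flags,
        # The low 4 bits of a memory BAR are type/prefetch flags, not address bits.
        "bar0_base": bar0 & 0xFFFFFFF0,
    }

# First two rows of the 3b:00.0 dump above, copied verbatim.
GPU_3B_ROWS = [
    "00: de 10 30 1e 07 00 10 00 a1 00 00 03 00 00 80 00",
    "10: 00 00 00 ab 0c 00 00 e0 bf 38 00 00 0c 00 00 f0",
]

header = decode_header(parse_config_dump(GPU_3B_ROWS))
print(hex(header["vendor"]), hex(header["device"]), header["flags"], hex(header["bar0_base"]))
```

For this GPU the command register is 0x0007, so I/O decoding, memory decoding and bus mastering are all enabled, and the BAR0 base of 0xab000000 matches the "Memory at ab000000" line in the lspci output.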
But the driver is unable to start the device:
/var/log/dmesg:
[ 5.930340] kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 508
[ 6.025948] kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module 515.65.01 Wed Jul 20 14:00:58 UTC 2022
[ 6.064369] kernel: nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 515.65.01 Wed Jul 20 13:43:59 UTC 2022
[ 6.074221] kernel: [drm] [nvidia-drm] [GPU ID 0x00003b00] Loading driver
[ 6.897433] kernel: NVRM: GPU 0000:3b:00.0: RmInitAdapter failed! (0x25:0xffff:1428)
[ 6.897579] kernel: NVRM: GPU 0000:3b:00.0: rm_init_adapter failed, device minor number 0
[ 6.898608] kernel: [drm:nv_drm_load [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00003b00] Failed to allocate NvKmsKapiDevice
[ 6.908882] kernel: [drm:nv_drm_probe_devices [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00003b00] Failed to register device
[ 6.909612] kernel: [drm] [nvidia-drm] [GPU ID 0x0000d800] Loading driver
[ 7.573856] kernel: NVRM: GPU 0000:3b:00.0: RmInitAdapter failed! (0x25:0xffff:1428)
[ 7.573941] kernel: NVRM: GPU 0000:3b:00.0: rm_init_adapter failed, device minor number 0
[ 12.784922] kernel: NVRM: GPU 0000:3b:00.0: RmInitAdapter failed! (0x23:0x65:1382)
[ 12.808434] kernel: [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:d8:00.0 on minor 1
[ 12.819560] kernel: NVRM: GPU 0000:3b:00.0: rm_init_adapter failed, device minor number 0
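To see at a glance which GPU the failures belong to, the RmInitAdapter lines can be grouped by PCI address. This is a minimal sketch: the regular expression is inferred only from the message format in this excerpt, so other driver versions may need a different pattern, and the function name `rm_init_failures` is hypothetical. The status codes are collected as-is, with no attempt to interpret them.

```python
import re
from collections import defaultdict

# Pattern inferred from the log excerpt above; treat it as an assumption.
FAILURE = re.compile(
    r"NVRM: GPU (?P<addr>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"RmInitAdapter failed! \((?P<code>[^)]*)\)"
)

def rm_init_failures(lines):
    """Map each PCI address to the list of RmInitAdapter status strings logged for it."""
    failures = defaultdict(list)
    for line in lines:
        match = FAILURE.search(line)
        if match:
            failures[match.group("addr")].append(match.group("code"))
    return dict(failures)

# A few lines copied from the dmesg excerpt above.
SAMPLE = [
    "[ 6.897433] kernel: NVRM: GPU 0000:3b:00.0: RmInitAdapter failed! (0x25:0xffff:1428)",
    "[ 6.897579] kernel: NVRM: GPU 0000:3b:00.0: rm_init_adapter failed, device minor number 0",
    "[ 7.573856] kernel: NVRM: GPU 0000:3b:00.0: RmInitAdapter failed! (0x25:0xffff:1428)",
    "[ 12.784922] kernel: NVRM: GPU 0000:3b:00.0: RmInitAdapter failed! (0x23:0x65:1382)",
    "[ 12.808434] kernel: [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:d8:00.0 on minor 1",
]

result = rm_init_failures(SAMPLE)
for addr, codes in result.items():
    print(addr, codes)
```

On this excerpt every failure is reported against 0000:3b:00.0 and none against 0000:d8:00.0, which agrees with the line showing nvidia-drm initialized for d8:00.0.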
Further information isn’t available from the logs.
Try isolating the failing GPU. Install only one GPU in the system, power it up, and note the behavior. Then power down, remove that GPU, install the other GPU in exactly the same way (same slot, same aux power dongle), power up, and note the behavior again. If one GPU works and the other doesn’t, it is probably a GPU hardware failure.
Quadro RTX cards should carry a warranty for the original purchaser; check whether it has expired.
I probably won’t be able to provide further assistance here.