nvidia-smi only shows one GPU, but there are two 2080 Ti cards in the PC

command: nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.44       Driver Version: 440.44       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  On   | 00000000:01:00.0 Off |                  N/A |
| 25%   35C    P8    12W / 257W |      0MiB / 11016MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

command: lspci |grep -i nvidia

01:00.0 VGA compatible controller: NVIDIA Corporation Device 1e04 (rev a1)
01:00.1 Audio device: NVIDIA Corporation Device 10f7 (rev a1)
01:00.2 USB controller: NVIDIA Corporation Device 1ad6 (rev a1)
01:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device 1ad7 (rev a1)
03:00.0 VGA compatible controller: NVIDIA Corporation GV102 (rev a1)
03:00.1 Audio device: NVIDIA Corporation Device 10f7 (rev a1)
03:00.2 USB controller: NVIDIA Corporation Device 1ad6 (rev a1)
03:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device 1ad7 (rev a1)

command: dmesg |grep -i nvrm

[ 1161.218136] NVRM: GPU 0000:03:00.0: RmInitAdapter failed! (0x24:0x65:1185)
[ 1161.218154] NVRM: GPU 0000:03:00.0: rm_init_adapter failed, device minor number 1
[ 1298.639679] NVRM: GPU 0000:03:00.0: RmInitAdapter failed! (0x24:0x65:1185)
[ 1298.639717] NVRM: GPU 0000:03:00.0: rm_init_adapter failed, device minor number 1
[ 1314.406979] NVRM: GPU 0000:03:00.0: RmInitAdapter failed! (0x24:0x65:1185)
[ 1314.407017] NVRM: GPU 0000:03:00.0: rm_init_adapter failed, device minor number 1

nvidia-smi is missing one of them. How can I solve this? I swapped the slots of the two GPUs and restarted my PC, but that did not help. When I put the failing GPU into another PC, it worked.
nvidia-bug-report.log.gz (1.2 MB)
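
The outputs above can be cross-checked mechanically: every NVIDIA display controller that lspci lists should either show up in nvidia-smi or appear in an NVRM error line. What follows is a minimal Python sketch of that check, not part of the original report. The function names `failed_gpus` and `nvidia_vga_devices` are hypothetical, and the sketch assumes the log lines have the same shape as the ones pasted above and that plain `lspci` output omits the PCI domain because every device is in domain 0000.

```python
import re

# Matches kernel log lines such as:
# [ 1161.218136] NVRM: GPU 0000:03:00.0: RmInitAdapter failed! (0x24:0x65:1185)
NVRM_FAIL = re.compile(
    r"NVRM: GPU (?P<addr>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]): "
    r"RmInitAdapter failed! \((?P<code>[^)]*)\)"
)

# Matches lspci lines such as:
# 03:00.0 VGA compatible controller: NVIDIA Corporation ...
LSPCI_GPU = re.compile(
    r"^(?P<addr>[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]) "
    r"(?:VGA compatible controller|3D controller): NVIDIA",
    re.MULTILINE,
)


def failed_gpus(dmesg_text):
    """Return {pci_address: set of error codes} for GPUs whose RmInitAdapter failed."""
    failures = {}
    for match in NVRM_FAIL.finditer(dmesg_text):
        failures.setdefault(match.group("addr").lower(), set()).add(match.group("code"))
    return failures


def nvidia_vga_devices(lspci_text):
    """Return NVIDIA display controllers from plain lspci output.

    Assumption: the domain is 0000, so it is prepended to match the NVRM format.
    """
    return ["0000:" + m.group("addr").lower() for m in LSPCI_GPU.finditer(lspci_text)]


# Sample lines copied from the output in this post.
SAMPLE_DMESG = "\n".join([
    "[ 1161.218136] NVRM: GPU 0000:03:00.0: RmInitAdapter failed! (0x24:0x65:1185)",
    "[ 1161.218154] NVRM: GPU 0000:03:00.0: rm_init_adapter failed, device minor number 1",
    "[ 1298.639679] NVRM: GPU 0000:03:00.0: RmInitAdapter failed! (0x24:0x65:1185)",
])
SAMPLE_LSPCI = "\n".join([
    "01:00.0 VGA compatible controller: NVIDIA Corporation Device 1e04 (rev a1)",
    "01:00.1 Audio device: NVIDIA Corporation Device 10f7 (rev a1)",
    "03:00.0 VGA compatible controller: NVIDIA Corporation GV102 (rev a1)",
    "03:00.1 Audio device: NVIDIA Corporation Device 10f7 (rev a1)",
])

if __name__ == "__main__":
    failures = failed_gpus(SAMPLE_DMESG)
    for addr in nvidia_vga_devices(SAMPLE_LSPCI):
        status = "init failed " + ", ".join(sorted(failures[addr])) if addr in failures else "no NVRM error"
        print(addr, status)
```

With the sample lines, the card at 0000:03:00.0 is the one reported as failing, which matches the bus ID that is absent from the nvidia-smi table (only 00000000:01:00.0 is listed there).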

Hello,

since this is the same problem I’m facing, and I haven’t seen a solution to this on the forum, I’m replying to this thread. Problem:

deviceQuery and nvidia-smi recognize only one of my two RTX 2080 Ti GPUs, while lspci lists both. Perhaps you found a solution? If not, it would be great if anyone else could help me. Thank you in advance!

Here are the relevant diagnostic outputs:

$ lspci -vvv

[...]
5e:00.0 VGA compatible controller: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti]
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 133
        NUMA node: 0
        Region 0: Memory at c4000000 (32-bit, non-prefetchable) [size=16M]
        Region 1: Memory at 38ffe0000000 (64-bit, prefetchable) [size=256M]
        Region 3: Memory at 38fff0000000 (64-bit, prefetchable) [size=32M]
        Region 5: I/O ports at b000 [size=128]
        Expansion ROM at c5000000 [virtual] [disabled] [size=512K]
        Capabilities: <access denied>
        Kernel driver in use: nvidia
        Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia

[...]

af:00.0 VGA compatible controller: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti]
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 135
        NUMA node: 1
        Region 0: Memory at ed000000 (32-bit, non-prefetchable) [size=16M]
        Region 1: Memory at d0000000 (64-bit, prefetchable) [size=256M]
        Region 3: Memory at e0000000 (64-bit, prefetchable) [size=32M]
        Region 5: I/O ports at e000 [size=128]
        Expansion ROM at ee000000 [virtual] [disabled] [size=512K]
        Capabilities: <access denied>
        Kernel driver in use: nvidia
        Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia

[...]
$ nvidia-smi
Tue Sep  6 17:36:19 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:AF:00.0  On |                  N/A |
|  0%   36C    P8    46W / 250W |    331MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1971      G   /usr/lib/xorg/Xorg                189MiB |
|    0   N/A  N/A      2451      G   /usr/bin/gnome-shell              140MiB |
+-----------------------------------------------------------------------------+
$ ./deviceQuery 
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA GeForce RTX 2080 Ti"
  CUDA Driver Version / Runtime Version          11.7 / 11.7
  CUDA Capability Major/Minor version number:    7.5
  Total amount of global memory:                 10986 MBytes (11519590400 bytes)
  (68) Multiprocessors, ( 64) CUDA Cores/MP:     4352 CUDA Cores
  GPU Max Clock rate:                            1545 MHz (1.54 GHz)
  Memory Clock rate:                             7000 Mhz
  Memory Bus Width:                              352-bit
  L2 Cache Size:                                 5767168 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1024
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 3 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 175 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.7, CUDA Runtime Version = 11.7, NumDevs = 1, Device0 = NVIDIA GeForce RTX 2080 Ti
Result = PASS
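
deviceQuery prints the PCI location in decimal, while lspci and nvidia-smi use hexadecimal. Bus ID 175 is 0xAF, so the one device CUDA detects is the card at af:00.0, and the card at 5e:00.0 is the one that is missing. A small Python sketch of the conversion (the function name `device_query_to_pci` is hypothetical; deviceQuery does not report the PCI function number, so it is left out):

```python
def device_query_to_pci(domain, bus, device):
    """Convert deviceQuery's decimal 'Domain ID / Bus ID / location ID'
    triple into the hexadecimal domain:bus:device form used by lspci -D."""
    return f"{domain:04x}:{bus:02x}:{device:02x}"


if __name__ == "__main__":
    # Values from the line "Device PCI Domain ID / Bus ID / location ID: 0 / 175 / 0"
    print(device_query_to_pci(0, 175, 0))  # 0000:af:00
```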

I’ve also attached the debug report.
nvidia-bug-report.log.gz (544.4 KB)
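
The comparison between what the PCI bus exposes and what the driver initialized can also be scripted, which helps when re-testing after a change. The sketch below is illustrative only. The helper names `normalize_bus_id`, `missing_from_driver`, and `query_driver_bus_ids` are hypothetical, and it assumes plain lspci addresses without a domain belong to domain 0000. It relies on `nvidia-smi --query-gpu=pci.bus_id --format=csv,noheader` to list the bus IDs the driver knows about.

```python
import subprocess


def normalize_bus_id(bus_id):
    """Normalize '5e:00.0' or '00000000:AF:00.0' to the form '0000:5e:00.0'."""
    parts = bus_id.strip().lower().split(":")
    if len(parts) == 2:
        # Assumption: lspci without -D omitted the domain, so it is 0000.
        parts.insert(0, "0")
    domain, bus, dev_fn = parts
    return f"{int(domain, 16):04x}:{bus}:{dev_fn}"


def missing_from_driver(lspci_ids, smi_ids):
    """Return GPUs that lspci lists but nvidia-smi does not, sorted."""
    seen = {normalize_bus_id(b) for b in smi_ids}
    return sorted(b for b in map(normalize_bus_id, lspci_ids) if b not in seen)


def query_driver_bus_ids():
    """Ask nvidia-smi for the bus IDs of the GPUs the driver has initialized."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=pci.bus_id", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line.strip()]


if __name__ == "__main__":
    # Bus IDs taken from the lspci and nvidia-smi output in this post.
    print(missing_from_driver(["5e:00.0", "af:00.0"], ["00000000:AF:00.0"]))
```

On a live system, `query_driver_bus_ids()` can replace the hard-coded second argument.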

Please check for GPU damage by testing each card on its own, one after the other, in the same slot.

Hi generix,

thank you for your quick reply. It would be a bit difficult to do that, since the machine is water-cooled. Is there perhaps another way to do this?

Best,

David