Second GPU not detected on Ubuntu 18.04.4

Hi everyone!

I looked for an existing topic describing the same issue but couldn’t find one, so I decided to post my problem here.

First, here is my system:

  • Motherboard: Asus WS X299 SAGE
  • GPUs: 2x NVIDIA Quadro RTX 8000
  • Driver: 450.57
  • Cuda compilation tools, release 10.0, V10.0.130
  • Ubuntu 18.04.4

Here is an example of what I got with nvidia-smi when everything was normal:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.57       Driver Version: 450.57       CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro RTX 8000     On   | 00000000:67:00.0 Off |                  Off |
| 33%   32C    P8    11W / 260W |      1MiB / 48601MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Quadro RTX 8000     On   | 00000000:68:00.0 Off |                  Off |
| 33%   36C    P8    14W / 260W |     18MiB / 48598MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    1   N/A  N/A      1434      G   /usr/lib/xorg/Xorg                 10MiB |
|    1   N/A  N/A      1510      G   /usr/bin/gnome-shell                6MiB |
+-----------------------------------------------------------------------------+

Last week my GPUs were no longer detected. When I ran nvidia-smi I got: Unable to determine the device handle for GPU 0000:67:00.0: Unknown Error. I restarted my workstation, but since then I have only been able to detect one GPU. Here is the nvidia-smi output now:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.57       Driver Version: 450.57       CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro RTX 8000     On   | 00000000:68:00.0 Off |                  Off |
| 33%   30C    P8     9W / 260W |    336MiB / 48598MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1317      G   /usr/lib/xorg/Xorg                 39MiB |
|    0   N/A  N/A      1373      G   /usr/bin/gnome-shell               29MiB |
|    0   N/A  N/A      2385      G   /usr/lib/xorg/Xorg                122MiB |
|    0   N/A  N/A      2518      G   /usr/bin/gnome-shell               95MiB |
|    0   N/A  N/A      2572      G   …mviewer/tv_bin/TeamViewer         16MiB |
|    0   N/A  N/A     27076      G   gnome-control-center               27MiB |
+-----------------------------------------------------------------------------+
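
For anyone who wants to catch this kind of drop-out automatically, here is a minimal sketch that compares the bus IDs nvidia-smi reports against the ones you expect. The function name missing_gpus is made up for illustration; the two addresses in the usage line are the ones from the tables above.

```shell
# missing_gpus EXPECTED_ADDR...
# Reads an nvidia-smi bus-ID listing on stdin and prints each expected
# PCI address that does not appear in it. (Hypothetical helper name.)
missing_gpus() {
    mg_listing="$(cat)"
    for mg_addr in "$@"; do
        # -F: match the address literally; -i: nvidia-smi prints hex digits
        # in upper case while lspci uses lower case.
        if ! printf '%s\n' "$mg_listing" | grep -qiF -- "$mg_addr"; then
            printf '%s\n' "$mg_addr"
        fi
    done
}
```

Possible usage: nvidia-smi --query-gpu=pci.bus_id --format=csv,noheader | missing_gpus 67:00.0 68:00.0
With the second output above, this should print 67:00.0.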

It would be great if you guys could give some help.

Thanks in advance.

Hi! Maybe I have the same problem!

Can you show the output of these commands:

dmidecode --type baseboard

uname -r

lspci -s 67:00.0 -vv

lspci -s 68:00.0 -vv

cat /var/log/syslog | grep 67:

cat /var/log/syslog | grep 68:
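
A bare "67:" can also match unrelated text in the log. As a rough sketch, assuming the kernel and the NVIDIA driver log the device with its full address (for example 0000:67:00.0), a narrower search could look like this. The helper name gpu_log_lines is hypothetical, and the default log path is the Ubuntu one used above.

```shell
# gpu_log_lines BUS_ADDR [LOGFILE]
# Print log lines that mention the full PCI address, e.g. 0000:67:00.0.
# (Hypothetical helper; LOGFILE defaults to /var/log/syslog.)
gpu_log_lines() {
    gl_addr="$1"
    gl_log="${2:-/var/log/syslog}"
    # -F treats the address as a literal string, so the dots in it
    # are not interpreted as regex wildcards.
    grep -F -- "$gl_addr" "$gl_log"
}
```

Possible usage: gpu_log_lines 0000:67:00.0
Reading /var/log/syslog may require sudo or membership in the adm group.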

Hi! Here are the outputs, in order:

  • dmidecode --type baseboard:

# dmidecode 3.1
Getting SMBIOS data from sysfs.
SMBIOS 3.0.0 present.

Handle 0x0002, DMI type 2, 15 bytes
Base Board Information
Manufacturer: ASUSTeK COMPUTER INC.
Product Name: WS X299 SAGE
Version: Rev 1.xx
Serial Number: 191060840100289
Asset Tag: Default string
Features:
Board is a hosting board
Board is replaceable
Location In Chassis: Default string
Chassis Handle: 0x0003
Type: Motherboard
Contained Object Handles: 0

Handle 0x0030, DMI type 10, 6 bytes
On Board Device Information
Type: Video
Status: Enabled
Description: To Be Filled By O.E.M.

Handle 0x0064, DMI type 41, 11 bytes
Onboard Device
Reference Designation: Onboard - Other
Type: Other
Status: Enabled
Type Instance: 1
Bus Address: 0000:00:00.0

Handle 0x0065, DMI type 41, 11 bytes
Onboard Device
Reference Designation: Onboard - Other
Type: Other
Status: Enabled
Type Instance: 2
Bus Address: 0000:00:04.0

Handle 0x0066, DMI type 41, 11 bytes
Onboard Device
Reference Designation: Onboard - Other
Type: Other
Status: Enabled
Type Instance: 3
Bus Address: 0000:00:04.1

Handle 0x0067, DMI type 41, 11 bytes
Onboard Device
Reference Designation: Onboard - Other
Type: Other
Status: Enabled
Type Instance: 4
Bus Address: 0000:00:04.2

Handle 0x0068, DMI type 41, 11 bytes
Onboard Device
Reference Designation: Onboard - Other
Type: Other
Status: Enabled
Type Instance: 5
Bus Address: 0000:00:04.3

Handle 0x0069, DMI type 41, 11 bytes
Onboard Device
Reference Designation: Onboard - Other
Type: Other
Status: Enabled
Type Instance: 6
Bus Address: 0000:00:04.4

Handle 0x006A, DMI type 41, 11 bytes
Onboard Device
Reference Designation: Onboard - Other
Type: Other
Status: Enabled
Type Instance: 7
Bus Address: 0000:00:04.5

Handle 0x006B, DMI type 41, 11 bytes
Onboard Device
Reference Designation: Onboard - Other
Type: Other
Status: Enabled
Type Instance: 8
Bus Address: 0000:00:04.6

Handle 0x006C, DMI type 41, 11 bytes
Onboard Device
Reference Designation: Onboard - Other
Type: Other
Status: Enabled
Type Instance: 9
Bus Address: 0000:00:04.7

Handle 0x006D, DMI type 41, 11 bytes
Onboard Device
Reference Designation: Onboard - Other
Type: Other
Status: Enabled
Type Instance: 10
Bus Address: 0000:00:05.0

Handle 0x006E, DMI type 41, 11 bytes
Onboard Device
Reference Designation: Onboard - Other
Type: Other
Status: Enabled
Type Instance: 11
Bus Address: 0000:00:05.2

Handle 0x006F, DMI type 41, 11 bytes
Onboard Device
Reference Designation: Onboard - Other
Type: Other
Status: Enabled
Type Instance: 12
Bus Address: 0000:00:05.4

Handle 0x0070, DMI type 41, 11 bytes
Onboard Device
Reference Designation: Onboard - Other
Type: Other
Status: Enabled
Type Instance: 13
Bus Address: 0000:00:08.0

Handle 0x0071, DMI type 41, 11 bytes
Onboard Device
Reference Designation: Onboard - Other
Type: Other
Status: Enabled
Type Instance: 14
Bus Address: 0000:00:08.1

Handle 0x0072, DMI type 41, 11 bytes
Onboard Device
Reference Designation: Onboard - Other
Type: Other
Status: Enabled
Type Instance: 15
Bus Address: 0000:00:08.2

Handle 0x0073, DMI type 41, 11 bytes
Onboard Device
Reference Designation: Onboard - Other
Type: Other
Status: Enabled
Type Instance: 16
Bus Address: 0000:00:14.0

Handle 0x0074, DMI type 41, 11 bytes
Onboard Device
Reference Designation: Onboard - Other
Type: Other
Status: Enabled
Type Instance: 17
Bus Address: 0000:00:14.2

Handle 0x0075, DMI type 41, 11 bytes
Onboard Device
Reference Designation: Onboard - Other
Type: Other
Status: Enabled
Type Instance: 18
Bus Address: 0000:00:16.0

Handle 0x0076, DMI type 41, 11 bytes
Onboard Device
Reference Designation: Onboard - SATA
Type: SATA Controller
Status: Enabled
Type Instance: 1
Bus Address: 0000:00:17.0

Handle 0x0077, DMI type 41, 11 bytes
Onboard Device
Reference Designation: Onboard - Other
Type: Other
Status: Enabled
Type Instance: 19
Bus Address: 0000:00:1f.0

Handle 0x0078, DMI type 41, 11 bytes
Onboard Device
Reference Designation: Onboard - Other
Type: Other
Status: Enabled
Type Instance: 20
Bus Address: 0000:00:1f.2

Handle 0x0079, DMI type 41, 11 bytes
Onboard Device
Reference Designation: Onboard - Sound
Type: Sound
Status: Enabled
Type Instance: 1
Bus Address: 0000:00:1f.3

Handle 0x007A, DMI type 41, 11 bytes
Onboard Device
Reference Designation: Onboard - Other
Type: Other
Status: Enabled
Type Instance: 21
Bus Address: 0000:00:1f.4

Handle 0x007B, DMI type 41, 11 bytes
Onboard Device
Reference Designation: Onboard - Ethernet
Type: Ethernet
Status: Enabled
Type Instance: 1
Bus Address: 0000:00:1f.6

  • uname -r:

5.4.0-53-generic

  • lspci -s 67:00.0 -vv:
    nothing returned

  • lspci -s 68:00.0 -vv:

68:00.0 VGA compatible controller: NVIDIA Corporation Device 1e30 (rev a1) (prog-if 00 [VGA controller])
Subsystem: NVIDIA Corporation Device 129e
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 106
NUMA node: 0
Region 0: Memory at d7000000 (32-bit, non-prefetchable) [size=16M]
Region 1: Memory at c0000000 (64-bit, prefetchable) [size=256M]
Region 3: Memory at d0000000 (64-bit, prefetchable) [size=32M]
Region 5: I/O ports at b000 [size=128]
[virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
Capabilities:
Kernel driver in use: nvidia
Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia

  • cat /var/log/syslog | grep 67::
    nothing returned

  • cat /var/log/syslog | grep 68::
    nothing returned
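
An empty lspci result for 67:00.0 suggests the card was not enumerated on the PCI bus at all. The same thing can be checked without lspci by looking for the device directory in sysfs. This is only a sketch: the function name pci_device_present is made up, and the optional second argument exists purely so the check can be pointed at a different directory.

```shell
# pci_device_present ADDR [SYSFS_ROOT]
# Succeeds (exit status 0) if the kernel has enumerated a PCI device at ADDR.
# ADDR must include the domain, e.g. 0000:67:00.0. (Hypothetical helper.)
pci_device_present() {
    pd_addr="$1"
    pd_root="${2:-/sys/bus/pci/devices}"
    # Each enumerated device appears as an entry named after its address.
    [ -e "$pd_root/$pd_addr" ]
}
```

Possible usage: pci_device_present 0000:67:00.0 && echo present || echo missing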

Hm… Your GPU is not visible on the PCI bus. Can you try swapping the two GPUs between their slots?
And show the output of:

lspci -tvv

cat /var/log/syslog

dmidecode --type 0

Hi!
Sorry for the delay. Today I tried a full power off followed by a cold start (instead of a restart) and the second GPU reappeared… So it seems to work again now. I don’t know what happened…

On our machine both cards (NVIDIA P4000) are detected, but the monitors connected to the second card stay black and we cannot span the desktop across both cards.

Did you have any issues getting both cards running properly under Ubuntu 18.04 LTS?
Thank you.