A100 not recognized by nvidia-smi but recognized by lspci

There are two graphics cards in my PC: an RTX 3090 and an A100. My main goal is to get the A100 working. When I installed the A100, I enabled CSM and Above 4G Decoding. However, the integrated GPU output disappeared once CSM was enabled and I could not switch back to it, so I used an RTX 3090 I had on hand for display output, and that worked. But after installing Ubuntu Server and running sudo apt install nvidia-driver-###-server (and also the -open and plain variants), only the 3090 shows up in nvidia-smi. I have done all of this over SSH, because the HDMI output is completely black. I have tried many combinations, such as drivers downloaded from the official website and both the desktop and server packages.

Below is the output of lshw, summarized with ChatGPT:

Based on the information provided in the output of the sudo lshw -short command, here are the details of the hardware components of the system:

  • CPU: 13th Gen Intel(R) Core™ i9-13900K
  • Motherboard: ROG MAXIMUS Z790 HERO
  • Graphics Card 1: GA100 [A100 PCIe 80GB]
  • Graphics Card 2: GA102 [GeForce RTX 3090]
  • PCI Device 1: Samsung SSD 980 PRO 2TB
  • PCI Device 2: Intel Corporation Network Interface (wlp0s20f3)
  • RAM: 128GB System Memory

nvidia-smi worked earlier, but as I write this post it outputs the following:
$ nvidia-smi
Unable to determine the device handle for GPU0000:02:00.0: Unknown Error
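The core symptom here (the A100 is on the PCI bus but the driver does not enumerate it) can be checked mechanically by diffing the bus IDs that lspci reports against the ones nvidia-smi reports. This is a minimal sketch, assuming the inputs come from `lspci -D -d 10de:` (NVIDIA vendor ID, with PCI domain) and `nvidia-smi --query-gpu=pci.bus_id --format=csv,noheader`. The function names are hypothetical, and the sample output is made up to match the first post: only the 02:00.0 address comes from the error above.

```python
import re
import subprocess

# A PCI bus ID with a 4- or 8-digit domain, e.g. 0000:02:00.0 or 00000000:02:00.0.
BUS_ID = re.compile(r"[0-9a-fA-F]{4,8}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]")


def normalize_bus_id(bus_id: str) -> str:
    """Bring lspci and nvidia-smi bus IDs to one form: 4-digit lowercase domain."""
    parts = bus_id.strip().lower().split(":")
    if len(parts) == 2:  # lspci without -D omits the domain
        parts.insert(0, "0")
    domain, bus, devfn = parts
    return f"{int(domain, 16):04x}:{bus}:{devfn}"


def lspci_gpus(lspci_text: str) -> set:
    """Bus IDs of NVIDIA VGA/3D controllers in `lspci -D -d 10de:` output."""
    gpus = set()
    for line in lspci_text.splitlines():
        bus_id, _, rest = line.strip().partition(" ")
        # Skip non-GPU functions such as the RTX 3090's HDMI audio controller.
        if "VGA compatible controller" in rest or "3D controller" in rest:
            gpus.add(normalize_bus_id(bus_id))
    return gpus


def smi_gpus(smi_text: str) -> set:
    """Bus IDs in `nvidia-smi --query-gpu=pci.bus_id --format=csv,noheader` output."""
    # Error lines such as "Unable to determine the device handle ..." are ignored.
    return {normalize_bus_id(line.strip()) for line in smi_text.splitlines()
            if BUS_ID.fullmatch(line.strip())}


def missing_gpus(lspci_text: str, smi_text: str) -> list:
    """GPUs that the PCI bus reports but the NVIDIA driver does not enumerate."""
    return sorted(lspci_gpus(lspci_text) - smi_gpus(smi_text))


def run(cmd: list) -> str:
    """Return a command's stdout; nvidia-smi may exit non-zero when a GPU fails."""
    return subprocess.run(cmd, capture_output=True, text=True).stdout


# Hypothetical sample: RTX 3090 at 01:00.0, A100 at 02:00.0.
SAMPLE_LSPCI = """\
0000:01:00.0 VGA compatible controller: NVIDIA Corporation GA102 [GeForce RTX 3090] (rev a1)
0000:01:00.1 Audio device: NVIDIA Corporation GA102 High Definition Audio Controller (rev a1)
0000:02:00.0 3D controller: NVIDIA Corporation GA100 [A100 PCIe 80GB] (rev a1)
"""
SAMPLE_SMI = "00000000:01:00.0\n"

print(missing_gpus(SAMPLE_LSPCI, SAMPLE_SMI))  # ['0000:02:00.0']
```

On the affected machine, `missing_gpus(run(["lspci", "-D", "-d", "10de:"]), run(["nvidia-smi", "--query-gpu=pci.bus_id", "--format=csv,noheader"]))` would list the GPUs the driver failed to bring up. The bus IDs are normalized because nvidia-smi prints an 8-digit domain while lspci prints a 4-digit one.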

My next attempt is to buy an AMD RX 580 graphics card and use it to trick the hardware and the system.

What can I do to make the A100 available? Thanks in advance for your help.


I ran into a similar issue. At first I thought it was a driver problem and tried different drivers, but I got the same error with all of them. Below is the output of nvidia-smi and lshw:

NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

$ sudo lshw -C video
*-display UNCLAIMED
description: VGA compatible controller
product: SVGA II Adapter
vendor: VMware
physical id: f
bus info: pci@0000:00:0f.0
version: 00
width: 32 bits
clock: 33MHz
capabilities: vga_controller cap_list
configuration: latency=64
resources: ioport:1070(size=16) memory:e8000000-efffffff memory:fe000000-fe7fffff memory:c0000-dffff
*-display UNCLAIMED
description: 3D controller
product: GA100 [A100 PCIe 40GB]
vendor: NVIDIA Corporation
physical id: 0
bus info: pci@0000:0b:00.0
version: a1
width: 64 bits
clock: 33MHz
capabilities: pm cap_list
configuration: latency=64
resources: memory:fc000000-fcffffff memory:e4000000-e5ffffff
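Two parts of this output are worth reading programmatically: the UNCLAIMED flag, which lshw prints when no kernel driver is bound to the device (consistent with the nvidia-smi failure), and the memory windows on the resources line, which show whether the card was given any address space above 4 GiB, the point of the Above 4G Decoding setting mentioned in the first post. A minimal sketch, assuming the `lshw -C video` text format shown above; the function names are hypothetical and the sample is an excerpt of the A100 entry.

```python
import re

FOUR_GIB = 1 << 32
MEMORY = re.compile(r"memory:([0-9a-f]+)-([0-9a-f]+)")


def parse_lshw_video(text: str) -> list:
    """Split `lshw -C video` output into one dict per *-display entry."""
    devices = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("*-"):
            # lshw appends UNCLAIMED to the header when no driver is bound.
            devices.append({"unclaimed": "UNCLAIMED" in line})
        elif devices and ": " in line:
            key, _, value = line.partition(": ")
            devices[-1][key] = value
    return devices


def memory_windows_kib(resources: str) -> list:
    """(size in KiB, reaches above 4 GiB?) for each memory:START-END resource."""
    windows = []
    for start, end in MEMORY.findall(resources):
        lo, hi = int(start, 16), int(end, 16)
        windows.append(((hi - lo + 1) // 1024, hi >= FOUR_GIB))
    return windows


def report(text: str) -> list:
    """One summary line per display device."""
    lines = []
    for dev in parse_lshw_video(text):
        state = "UNCLAIMED (no driver bound)" if dev["unclaimed"] else "claimed"
        wins = memory_windows_kib(dev.get("resources", ""))
        lines.append(f"{dev.get('product', '?')} @ {dev.get('bus info', '?')}: "
                     f"{state}, memory windows (KiB, above 4 GiB) = {wins}")
    return lines


# Excerpt of the A100 entry from the lshw output above.
SAMPLE_LSHW = """\
*-display UNCLAIMED
description: 3D controller
product: GA100 [A100 PCIe 40GB]
vendor: NVIDIA Corporation
bus info: pci@0000:0b:00.0
resources: memory:fc000000-fcffffff memory:e4000000-e5ffffff
"""

for line in report(SAMPLE_LSHW):
    print(line)
```

For the A100 entry this reports UNCLAIMED and two windows of 16 MiB and 32 MiB, both below 4 GiB. As far as I know, A100-class cards normally expose a much larger 64-bit BAR, so a listing with nothing above 4 GiB may be worth comparing against the firmware's MMIO settings (or, since the VMware SVGA adapter here suggests a virtual machine, the hypervisor's); treat that as a hint, not a diagnosis.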

I'm also on Ubuntu 22.04 and have installed the latest driver:
$ dkms status
nvidia/535.86.05, 5.19.0-1027-oracle, x86_64: installed
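The "couldn't communicate with the NVIDIA driver" message points at the kernel-module side (the message itself asks whether the driver is installed and running), so two quick checks are whether DKMS built the nvidia module for the kernel that is actually running, and whether that module is loaded. A minimal sketch, assuming `dkms status` output in the format above (older dkms versions separate module and version with a comma instead of a slash) and the standard `/proc/modules` layout; the function names and the sample `/proc/modules` line are hypothetical.

```python
import platform
import re
import subprocess

# Accepts "nvidia/535.86.05, <kernel>, <arch>: <status>" (newer dkms)
# and "nvidia, 535.86.05, <kernel>, <arch>: <status>" (older dkms).
DKMS_LINE = re.compile(
    r"(?P<module>[^/,\s]+)[/,]\s*(?P<version>[^,]+),\s*(?P<kernel>[^,]+),"
    r"\s*(?P<arch>[^:]+):\s*(?P<status>.+)"
)


def dkms_entries(dkms_text: str) -> list:
    """One dict per module line of `dkms status` output."""
    matches = (DKMS_LINE.fullmatch(line.strip()) for line in dkms_text.splitlines())
    return [m.groupdict() for m in matches if m]


def loaded_modules(proc_modules_text: str) -> set:
    """Module names from /proc/modules (the first field of each line)."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def driver_problems(dkms_text: str, running_kernel: str, proc_modules_text: str) -> list:
    """List reasons the nvidia kernel module may be unavailable."""
    problems = []
    built_for = {e["kernel"] for e in dkms_entries(dkms_text)
                 if e["module"] == "nvidia" and e["status"].startswith("installed")}
    if running_kernel not in built_for:
        problems.append(f"no installed nvidia DKMS build for running kernel "
                        f"{running_kernel}; built for {sorted(built_for)}")
    if "nvidia" not in loaded_modules(proc_modules_text):
        problems.append("nvidia kernel module is not loaded")
    return problems


def live_check() -> list:
    """Run both checks on the current Linux machine."""
    dkms = subprocess.run(["dkms", "status"], capture_output=True, text=True).stdout
    with open("/proc/modules") as f:
        # platform.release() is the running kernel, the same string as `uname -r`.
        return driver_problems(dkms, platform.release(), f.read())
```

An empty list from `live_check()` means the module is built for the running kernel and loaded, which would shift suspicion away from the driver install and toward the hardware side.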

Have any of you found a fix for this?

I had the same problem on my Linux server: one of the two installed A100 GPUs disappeared from the nvidia-smi output.
In the end, the issue was resolved by checking and re-plugging the power supply. The power supply can be critical for running an A100, whereas lspci only lists the devices that were recognized at system start. So when the A100's power consumption rises on an intermittent or insufficient power supply, the GPU may stop working correctly and disappear from the live nvidia-smi output while still appearing in lspci.
I hope this experience helps.
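If a GPU drops off the PCIe bus at runtime, as the power-supply explanation above describes, the NVIDIA driver usually records it as an Xid event in the kernel log; Xid 79 is the code NVIDIA documents as "GPU has fallen off the bus". A minimal sketch for counting Xid events per GPU, assuming messages that start with the `NVRM: Xid (PCI:...)` prefix; the text after the code differs between driver versions, so only the prefix is parsed. The function names are hypothetical and the sample log lines in the test are illustrative, not taken from this thread.

```python
import re
import subprocess
from collections import Counter

# Prefix of the driver's Xid messages, e.g. "NVRM: Xid (PCI:0000:02:00): 79, ...".
XID = re.compile(r"NVRM: Xid \(PCI:(?P<bus>[0-9a-fA-F:.]+)\): (?P<code>\d+)")

# Only the code discussed in this thread is named; look others up in NVIDIA's Xid docs.
KNOWN = {79: "GPU has fallen off the bus"}


def xid_counts(log_text: str) -> Counter:
    """Count (PCI address, Xid code) pairs in kernel log text."""
    return Counter((m["bus"].lower(), int(m["code"])) for m in XID.finditer(log_text))


def summarize(log_text: str) -> list:
    """One line per (GPU, Xid code) with its count."""
    return [f"PCI:{bus} Xid {code} x{n} ({KNOWN.get(code, 'see NVIDIA Xid docs')})"
            for (bus, code), n in sorted(xid_counts(log_text).items())]


def kernel_log() -> str:
    """Kernel messages for the current boot via journalctl (may need root)."""
    return subprocess.run(["journalctl", "-k", "--no-pager"],
                          capture_output=True, text=True).stdout
```

Running `summarize(kernel_log())` after the GPU disappears would show whether the driver saw it drop off the bus, which fits a power or seating problem better than a driver-install problem.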