Nvidia-smi is failing

nvidia-persistenced: Failed to query NVIDIA devices. Please ensure that the NVIDIA device files (/dev/nvidia*) exist, and that user 0 has read and write permissions for those files.

root@asimov-229:/var/log# nvidia-smi
NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

root@asimov-229:/var/log# systemctl status nvidia-persistenced.service
● nvidia-persistenced.service - NVIDIA Persistence Daemon
     Loaded: loaded (/lib/systemd/system/nvidia-persistenced.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Mon 2022-08-01 22:19:54 PDT; 4min 47s ago
    Process: 82025 ExecStart=/usr/bin/nvidia-persistenced --verbose (code=exited, status=1/FAILURE)
    Process: 82032 ExecStopPost=/bin/rm -rf /var/run/nvidia-persistenced (code=exited, status=0/SUCCESS)

nvidia-bug-report.log.gz (73.4 KB)
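(For anyone else debugging the same symptom: the reason the persistence daemon exits is usually visible in its journal, and checking whether the kernel module is loaded at all can save some guessing. These are generic checks, not taken from the attached bug report:

journalctl -u nvidia-persistenced --no-pager -n 50    (the daemon's own log, including why it exited)
lsmod | grep nvidia                                   (is the nvidia kernel module loaded at all?)
dmesg | grep -i nvidia                                (driver/kernel messages, e.g. failure to find devices)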

There are no NVIDIA devices visible in lspci. Please make sure the GPUs are properly seated in the PCIe slots and that the power cables are connected.
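If you want to double-check that, a couple of generic lspci invocations make it obvious whether the GPUs are on the bus at all (0x10de is NVIDIA's PCI vendor ID; nothing here is specific to your server):

lspci | grep -i nvidia    (any NVIDIA device at all)
lspci -d 10de:            (filter strictly by NVIDIA's vendor ID)
lspci -tv                 (PCIe topology as a tree, useful for spotting a missing bridge or extension board)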

Thanks for the response.

That's correct. I can't locate any GPUs with lspci …

There are 8 GPUs in that server; I'm not sure why none of them are detected. The server has been in the rack for a long time, and no one touches it.

Are there any other ways to check whether these GPUs are being detected?

Thanks

You should contact Supermicro instead. I guess the GPUs are located on an extension board that only partly shows up in lspci, so that board has a problem. The GPUs only disappeared because of that.
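A few more generic things are worth checking before (or while) you contact Supermicro; none of these are specific to that chassis, so treat them as a rough checklist:

dmesg | grep -iE 'pci|nvidia'    (PCIe link-training or power errors at boot)
ls /sys/bus/pci/devices/         (compare against a known-good node with the same GPU board)
echo 1 > /sys/bus/pci/rescan     (forces a PCI bus rescan; harmless to try, but if the board has lost power it won't come back)
ipmitool sel list                (if ipmitool is installed, the BMC event log often records GPU-board power faults)

A cold power cycle of the whole chassis, not just a reboot, is also worth one attempt, since extension boards sometimes only re-train their PCIe links on a full power-on.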