I’m trying to run an A100 under Ubuntu 20.04 in a Dell server, but unfortunately it does not show up in nvidia-smi. The setup looks like this:
- NVIDIA A100 PCIe 40 GB
- Ubuntu 20.04 Desktop
- Linux kernel 5.4.0-65-lowlatency
- NVIDIA driver: 510.108.03
When I run nvidia-smi, it reports:
NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
dmesg shows some errors:
NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR1 is 0M @ 0x0 (PCI:0000:65:00.0)
nvidia: probe of 0000:65:00.0 failed with error -1
NVRM: The NVIDIA probe routine failed for 1 device(s).
NVRM: None of the NVIDIA devices were initialized.
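The key line is "BAR1 is 0M @ 0x0": the driver found no usable address window for the GPU's BAR1 aperture. One way to check what the kernel actually assigned is to read the device's sysfs resource file (lspci -v -s 65:00.0 shows the same regions). Below is a minimal Python sketch, assuming the standard Linux sysfs layout where line N of /sys/bus/pci/devices/<address>/resource holds "start end flags" in hex for resource N. The helpers parse_resource and describe are hypothetical names, and the default address 0000:65:00.0 is taken from the dmesg output.

```python
import sys
from pathlib import Path
from typing import List, NamedTuple


class Bar(NamedTuple):
    index: int   # resource slot (0-5 are the standard BARs)
    start: int   # bus address the kernel assigned, 0 if none
    size: int    # window size in bytes, 0 if the slot is empty
    flags: int   # raw resource flags as printed by sysfs


def parse_resource(text: str, max_bars: int = 6) -> List[Bar]:
    """Parse the first max_bars lines of a sysfs 'resource' file.

    Each line is assumed to be 'start end flags' in hexadecimal.
    """
    bars = []
    for index, line in enumerate(text.splitlines()[:max_bars]):
        fields = line.split()
        if len(fields) < 3:
            continue
        start, end, flags = (int(field, 16) for field in fields[:3])
        size = end - start + 1 if end else 0
        bars.append(Bar(index, start, size, flags))
    return bars


def describe(bars: List[Bar]) -> List[str]:
    """Format each BAR in the same 'NM @ 0xADDR' style the NVRM message uses."""
    lines = []
    for bar in bars:
        # An empty slot is normal for the upper half of a 64-bit BAR,
        # but an empty BAR1 matches the "BAR1 is 0M @ 0x0" error.
        status = "assigned" if bar.start and bar.size else "EMPTY/UNASSIGNED"
        lines.append(f"BAR{bar.index}: {bar.size >> 20}M @ {bar.start:#x} [{status}]")
    return lines


if __name__ == "__main__":
    # Default PCI address taken from the dmesg output in the question.
    bdf = sys.argv[1] if len(sys.argv) > 1 else "0000:65:00.0"
    path = Path("/sys/bus/pci/devices") / bdf / "resource"
    if path.exists():
        for entry in describe(parse_resource(path.read_text())):
            print(entry)
    else:
        print(f"{path} not found")
```

On NVIDIA GPUs, BAR1 and BAR3 are typically 64-bit, so slots 2 and 4 usually show as empty; it is an empty slot 1 that lines up with the driver error and would point at the BIOS/kernel not reserving a large enough 64-bit window.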
This seems to be a fairly common error, but none of the solutions suggested in the forum fixed it for me.
I tried to:
- switch to the latest Linux kernel
- switch to the latest NVIDIA driver
- set the kernel parameters pci=realloc and pci=realloc=off
- enable addresses above 4GB in the BIOS
- change PCIe slots
- disable secure boot
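A mistyped parameter has no effect, and an edit to /etc/default/grub only applies after running update-grub and rebooting, so it may be worth confirming which pci= options the running kernel actually received. Below is a minimal Python sketch, assuming the usual /proc/cmdline interface; pci_options is a hypothetical helper name, and quoted arguments containing spaces are not handled.

```python
import sys
from pathlib import Path
from typing import List


def pci_options(cmdline: str) -> List[str]:
    """Return every option passed through pci=... on a kernel command line.

    Options may be comma-separated (e.g. pci=realloc,nocrs) and pci= may
    appear more than once; all occurrences are collected in order.
    """
    options = []
    for token in cmdline.split():
        if token.startswith("pci="):
            options.extend(opt for opt in token[len("pci="):].split(",") if opt)
    return options


if __name__ == "__main__":
    cmdline_path = Path("/proc/cmdline")
    if cmdline_path.exists():
        opts = pci_options(cmdline_path.read_text())
        print("pci= options passed to the running kernel:", opts if opts else "none")
    else:
        print("/proc/cmdline not found (not running on Linux?)", file=sys.stderr)
```

This only shows that an option was passed, not whether it changed the BAR assignment; the sysfs resource file or lspci is still needed for that.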
I have attached the NVIDIA bug report log. Any hint would be really appreciated.
Thanks so much.
nvidia-bug-report.log.gz (382.3 KB)