Hello NVIDIA experts,
My Dell R750 server is set up with two NVIDIA A40 GPU cards and is running RHEL 8.8. Each A40 GPU has been passed through to the VMs in KVM.
Below is the output from the VM's guest OS (RHEL 8.8). I have installed nvidia-driver-local-repo-rhel8-470.199.02-1.0-1.x86_64.rpm for the A40 GPU in the VM.
[root@addgpu2 ~]# dmesg | grep -i nvrm
[ 32.853639] NVRM: The NVIDIA GPU 0000:07:00.0
NVRM: (PCI ID: 10de:2235) installed in this system has
NVRM: fallen off the bus and is not responding to commands.
[ 32.857174] NVRM: The NVIDIA probe routine failed for 1 device(s).
[ 32.857176] NVRM: None of the NVIDIA devices were initialized.
[ 33.978074] NVRM: The NVIDIA GPU 0000:07:00.0
NVRM: (PCI ID: 10de:2235) installed in this system has
NVRM: fallen off the bus and is not responding to commands.
[ 33.980912] NVRM: The NVIDIA probe routine failed for 1 device(s).
[ 33.980914] NVRM: None of the NVIDIA devices were initialized.
[root@addgpu2 ~]# nvidia-settings
Unable to init server: Could not connect: Connection refused
ERROR: The control display is undefined; please run nvidia-settings --help
for usage information.
[root@addgpu2 ~]# nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
Does anyone know what the message below means?
State      Name                                    Product Name  Slot Number
Available  GPU Controller in Slot 2 of Instance 1  NVIDIA A40    2
Available  GPU Controller in Slot 7 of Instance 1  NVIDIA A40    7
Does anyone know how to fix this issue?
Thanks.
David