Hello,
I am having trouble getting vGPU set up on some of our newer systems. Specifically, these are Dell PowerEdge R7525 servers with AMD EPYC processors and 3x NVIDIA A40 GPUs.
The OS is Ubuntu 20.04 and I am using the latest NVIDIA vGPU driver, 510.108.03:
root@srv-p24-14.cloud.ccr.buffalo.edu:~# nvidia-smi
Fri Dec 2 19:35:29 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.108.03   Driver Version: 510.108.03   CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A40          On   | 00000000:25:00.0 Off |                    0 |
|  0%   30C    P8    30W / 300W |      0MiB / 46068MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A40          On   | 00000000:81:00.0 Off |                    0 |
|  0%   29C    P8    32W / 300W |      0MiB / 46068MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA A40          On   | 00000000:E2:00.0 Off |                    0 |
|  0%   30C    P8    29W / 300W |      0MiB / 46068MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
I am following the guide that comes with the driver, and when I try to enable the virtual functions with the sriov-manage utility I get the following error:
root@srv-p24-14.cloud.ccr.buffalo.edu:~# /usr/lib/nvidia/sriov-manage -e ALL
Enabling VFs on 0000:25:00.0
Cannot obtain unbindLock for 0000:25:00.0
I verified that SR-IOV and IOMMU are enabled in the BIOS, and I used displaymodeselector --gpumode to disable the display ports.
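The BIOS settings can also be cross-checked from the OS side. The following is only a minimal sketch, assuming the standard Linux sysfs layout (/sys/kernel/iommu_groups and the per-device sriov_totalvfs / sriov_numvfs files). The function name check_sriov_ready and the SYSFS_ROOT override are hypothetical names introduced for illustration and are not part of the NVIDIA tooling. It only reports what the kernel exposes and does not explain the unbindLock error by itself.

```shell
#!/bin/sh
# Sketch: report whether the kernel exposes IOMMU groups and the SR-IOV
# capability for one PCI device.
# SYSFS_ROOT is a hypothetical override (defaults to /sys) so the check
# can be pointed at a different tree.

check_sriov_ready() {
    addr="$1"                        # full PCI address, e.g. 0000:25:00.0
    root="${SYSFS_ROOT:-/sys}"
    dev="$root/bus/pci/devices/$addr"

    # When the IOMMU is active, the kernel populates iommu_groups.
    if [ -d "$root/kernel/iommu_groups" ] && \
       [ -n "$(ls -A "$root/kernel/iommu_groups" 2>/dev/null)" ]; then
        echo "iommu: groups present"
    else
        echo "iommu: no groups found"
    fi

    # SR-IOV capable devices expose sriov_totalvfs in sysfs.
    if [ -r "$dev/sriov_totalvfs" ]; then
        total=$(cat "$dev/sriov_totalvfs")
        num=$(cat "$dev/sriov_numvfs" 2>/dev/null)
        echo "sriov: totalvfs=$total numvfs=$num"
    else
        echo "sriov: capability not exposed for $addr"
    fi
}
```

On this host it would be called once per GPU, for example check_sriov_ready 0000:25:00.0, using the bus IDs from the nvidia-smi output with the four-digit PCI domain that sriov-manage prints.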
I am stumped here.
Any help would be greatly appreciated.
Thank you,
Salvatore