Hi,
We recently installed the 15.1 drivers with the Linux KVM drivers. Our chassis is a SuperMicro and the GPU is an A16. The documentation explains how to run multiple vGPUs in a single VM on an A16 GPU with Q and C series. We tested attaching from 2 to 10 vGPUs to a single VM and it works fine. But when the VM is shut down, we sometimes see the following messages in dmesg.
dmesg_vgpu.txt (16.8 KB)
After this failure, if we try to start another VM the server hangs, and the only thing we can do is reset it physically.
When this occurs, the vGPU process on the server does not exit. I found a workaround that avoids resetting the server: killing the vGPU process and resetting the GPU with nvidia-smi -r. This causes other problems: if another VM with a vGPU is running, we need to stop it while it is working, so it has to start again from the beginning. This makes the server not production ready, because we would have to shut down production VMs, and production VMs should not be stopped.
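The workaround above could be sketched roughly as follows. This is only an illustration under assumptions: the helper names `run` and `recover_gpu`, the `DRY_RUN` switch, and the idea of passing the stuck process PID and the GPU index as arguments are all hypothetical. Only `nvidia-smi -r` comes from the steps described above, and `-i` is the standard nvidia-smi option for selecting a GPU. By default the sketch only prints what it would do.

```shell
#!/bin/sh
# Hypothetical sketch of the manual recovery; not an official procedure.
# DRY_RUN defaults to 1, so the commands are only printed.
# Set DRY_RUN=0 to actually execute them (requires root).

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# recover_gpu PID GPU_INDEX
#   PID       - PID of the vGPU process that did not exit (assumed to be looked up by hand)
#   GPU_INDEX - index of the GPU to reset, as listed by nvidia-smi
recover_gpu() {
    stuck_pid="$1"
    gpu_index="$2"
    run kill "$stuck_pid"              # terminate the leftover vGPU process
    run nvidia-smi -r -i "$gpu_index"  # reset the selected GPU
}
```

For example, `recover_gpu 12345 0` would print the two commands for a stuck process with PID 12345 on GPU 0. Any other VM that still has a vGPU on that GPU would need to be stopped first, which is exactly the limitation described above.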
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.07    Driver Version: 525.85.07    CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A16          On   | 00000000:CE:00.0 Off |                    0 |
|  0%   45C    P8    16W /  62W |      0MiB / 15356MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A16          On   | 00000000:CF:00.0 Off |                    0 |
|  0%   43C    P8    15W /  62W |      0MiB / 15356MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA A16          On   | 00000000:D0:00.0 Off |                    0 |
|  0%   38C    P8    16W /  62W |      0MiB / 15356MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA A16          On   | 00000000:D1:00.0 Off |                    0 |
|  0%   36C    P8    15W /  62W |      0MiB / 15356MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+