Hello,
I have a Tesla M6 GPU (supplied by Cisco) installed in a B200 M4 blade. After a reboot, this configuration stopped working.
I can no longer power on VMs on ESXi 6.0 U3.
Error in VMware:
Failed to start the virtual machine.
Module DevicePowerOn power on failed.
Could not initialize plugin '/usr/lib64/vmware/plugin/libnvidia-vgx.so' for vGPU 'grid_m6-4q'.
nvidia-smi shows that everything is OK:
nvidia-smi
Wed Oct 11 15:59:12 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.73                 Driver Version: 384.73                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M6            On   | 00000000:81:00.0 Off |                    0 |
| N/A   48C    P8    16W / 100W |     13MiB /  7679MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
dmesg | grep -E "NVRM|nvidia"
2017-10-11T15:51:09.402Z cpu15:33461)Loading module nvidia …
2017-10-11T15:51:09.410Z cpu15:33461)Elf: 1865: module nvidia has license NVIDIA
2017-10-11T15:51:09.520Z cpu15:33461)NVRM: vmk_MemPoolCreate passed for 4194304 pages.
2017-10-11T15:51:09.771Z cpu15:33461)NVRM: loading NVIDIA UNIX x86_64 Kernel Module 384.73 Mon Aug 21 15:16:25 PDT 2017
2017-10-11T15:51:09.771Z cpu15:33461)Device: 191: Registered driver 'nvidia' from 20
2017-10-11T15:51:09.772Z cpu15:33461)Mod: 4943: Initialization of nvidia succeeded with module ID 20.
2017-10-11T15:51:09.772Z cpu15:33461)nvidia loaded successfully.
2017-10-11T15:51:10.712Z cpu30:33460)Device: 326: Found driver nvidia for device 0x590a4304eae944b8
2017-10-11T15:51:10.714Z cpu17:33479)NVRM: nvidia_associate vmgfx0
2017-10-11T15:52:27.278Z cpu3:35450)IntrCookie: 1935: cookie 0x35 moduleID 20 <nvidia> exclusive, flags 0x1d
I tried to find a solution, but found only one VMware article, which is about changing the xorg service. However, xorg is running fine on the host:
/etc/init.d/xorg status
Xorg is running
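The host-side checks above (nvidia-smi, the dmesg filter, and the xorg status) can be gathered in one pass, for example to attach to a support ticket. This is only a minimal sketch, assuming a POSIX shell such as the ESXi shell; the function names run_check and collect_gpu_diag are made up for illustration, and it runs only the commands already shown in this post.

```shell
#!/bin/sh
# Sketch: collect the three checks from this post into one report.
# run_check and collect_gpu_diag are hypothetical helper names.
# Nothing here changes host state; it only reads status and logs.

run_check() {
    # $1 = label to print, remaining arguments = command to run
    label=$1
    shift
    echo "== $label =="
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1
        echo "(exit status: $?)"
    else
        echo "(skipped: $1 not found on this host)"
    fi
}

collect_gpu_diag() {
    run_check "nvidia-smi" nvidia-smi
    run_check "xorg status" /etc/init.d/xorg status
    echo "== dmesg (NVRM|nvidia) =="
    if command -v dmesg >/dev/null 2>&1; then
        # Same filter as used earlier in this post
        dmesg 2>/dev/null | grep -E "NVRM|nvidia" || echo "(no matching lines)"
    else
        echo "(skipped: dmesg not found on this host)"
    fi
}

collect_gpu_diag
```

Redirecting the output to a file (for instance `sh script.sh > gpu-diag.txt`, where both names are placeholders) gives a single attachment with all three results.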
I can see the configuration in the Web Client, and everything in the VM configuration seems to be OK.
Does anyone have an idea what could be causing this issue?
I have opened a ticket with Cisco support, but I would also like to hear ideas from people on the NVIDIA forum.
Thanks.