I sometimes get the following error when I power on another VM (10 VMs are already running per host):
Could not initialize plugin /usr/lib64/vmware/plugin/libnvidia-vpx.so for vGPU: "passthrough device 'pciPassthru0' vGPU 'grid_m60-2q' disallowed by vmkernel: Out of memory"
The hosts have enough memory and vGPU resources left to power on the VM.
Support says it's a known issue:
But why can I sometimes power on another VM, e.g. the eleventh, and sometimes not? I think it's a different issue.
Does anyone have the same problem?
I have the same problem. In my environment, with the profile I'm using, I should be able to run 96 VMs. Sometimes, with only 85 VMs provisioned, I still get the error you mention. After hammering Power On, the VM will eventually power on. Sometimes I even have to delete another VM before I can power on my parent image.
Yes, it seems you've hit the given issue, but I disagree that this is an NVIDIA issue. From my understanding, you need to fully reserve memory for vGPU-enabled VMs on ESX. This works until there is no longer enough system memory available for the hypervisor (VMKernel). There seems to be no rule for how much memory must remain available for the hypervisor, so at the moment the only workaround is trial and error: reducing the system memory allocated to the VMs. I'll try to get some advice on what we can do here, or whether this is something that needs to be addressed by VMware (which I believe it is), as I've never heard of the same issue occurring on other hypervisors.
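A rough sketch of the arithmetic behind this (all numbers are assumed for illustration, not measured from any real host): because every vGPU-enabled VM must have all of its guest memory reserved, the eleventh VM can fail even when the host "has memory left", since the vmkernel also needs an unreserved slice of host RAM for itself.

```python
# Illustrative only: assumed host size, per-VM reservation, and hypervisor
# overhead. The point is the shape of the check, not the exact numbers.
HOST_RAM_GB = 256        # assumed physical RAM per host
VM_RESERVATION_GB = 20   # assumed full memory reservation per vGPU VM
VMKERNEL_NEED_GB = 40    # assumed memory the hypervisor keeps for itself

def can_power_on(running_vms: int) -> bool:
    """True if one more fully-reserved VM still fits next to the vmkernel."""
    reserved_after = (running_vms + 1) * VM_RESERVATION_GB
    return reserved_after + VMKERNEL_NEED_GB <= HOST_RAM_GB

# With these assumed numbers, the 10th VM fits but the 11th does not,
# even though 256 - 11*20 = 36 GB of RAM would nominally remain free.
print(can_power_on(9))   # powering on the 10th VM
print(can_power_on(10))  # powering on the 11th VM
```

This also matches the "sometimes it works after hammering Power On" behavior: if the vmkernel's own demand fluctuates, the same power-on request can sit on either side of that threshold.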