Can't power on another vGPU enabled VM

gbrunner · February 21, 2018, 3:07pm

Hello,

I sometimes get the following error if I power on another VM (10 VMs per host are running):

Could not initialize plugin /usr/lib64/vmware/plugin/libnvidia-vpx.so for vGPU " passthrough device ‘pciPassthru0’ vGPU ‘grid_m60-2q’ disallowed by vmkernel: Out of memory"

The hosts have enough memory and vGPU resources left to power on the VM.
Support says it’s a known issue:

But why can I sometimes power on another VM (e.g. the eleventh) and sometimes not? I think it’s a different issue.
Does anyone have the same problem?

Best Regards

sschaber · February 22, 2018, 6:41am

Hi,

How much system memory do the affected VMs have?

Regards

Simon

rjm4484 · February 22, 2018, 4:15pm

I have the same problem. In my environment, with the profile I’m using, I should be able to have 96 VMs running. Sometimes, with only 85 VMs provisioned, I still get the error you mention. After hammering Power On, the VM will eventually power on. Sometimes I even have to delete another VM before I can power on my parent image.

Extremely frustrating.

gbrunner · February 26, 2018, 9:27am

5 VMs with 128 GB system memory
5 VMs with 16 GB system memory

In total 720 GB.

The hosts have 960 GB system memory.
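
The figures above can be checked with a few lines of Python. This is only a sketch of the arithmetic in this post: it assumes each VM's system memory is fully counted against the host, and it does not model whatever the VMkernel itself needs, since that amount is not known here. The function names are made up for illustration.

```python
# Figures taken from the post above. How much memory the VMkernel
# needs for itself is unknown and is NOT modeled here.
HOST_MEMORY_GB = 960

# (number of VMs, system memory per VM in GB)
vm_groups = [(5, 128), (5, 16)]


def total_vm_memory_gb(groups):
    """Sum the system memory configured across all VM groups."""
    return sum(count * size_gb for count, size_gb in groups)


def remaining_host_memory_gb(host_gb, groups):
    """Host memory left over once every VM's memory is counted."""
    return host_gb - total_vm_memory_gb(groups)


allocated = total_vm_memory_gb(vm_groups)
remaining = remaining_host_memory_gb(HOST_MEMORY_GB, vm_groups)

print(f"Allocated to VMs: {allocated} GB")
print(f"Left on host:     {remaining} GB")
```

With the numbers from this post, that leaves 240 GB of host memory not assigned to any VM.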

sschaber · February 26, 2018, 1:21pm

So did you file a ticket with Nvidia and VMware?

gbrunner · May 3, 2018, 6:59am

Hello,

Yes, after the issue came back I opened cases with Nvidia and VMware.

Nvidia Case: 00007591
The Nvidia support engineer didn’t find any errors on the Nvidia side.

VMware says it’s a known Nvidia issue.

Nvidia REF: 200060499

Best Regards
Georg

sschaber · May 3, 2018, 9:57am

Hi Georg,

Yes, it seems you hit the given issue, but I disagree that this is an Nvidia issue. From my understanding, you need to "fully reserve memory" for the vGPU-enabled VMs on ESX. This works until there is no longer enough system memory available for the hypervisor (VMkernel). There seems to be no rule for how much memory needs to remain available for the hypervisor, so only trial and error with reducing the system memory allocated to the VMs seems to help. I’ll try to get some advice on what we can do here, or whether this is something that needs to be addressed by VMware (which I believe), as I’ve never heard of the same issue occurring on other hypervisors.

regards

Simon

gbrunner · May 14, 2018, 1:31pm

Hello Simon,

Any news from VMware?

Best Regards
Georg
