Issues starting VMs with K220Q and K240Q in the same server

We are testing Horizon View 6 and K2.
During the tests, we have run into a problem.
The server has four VMs running with the K220Q profile, and when we try to boot a machine with the K240Q profile, we are told that no resources are available.

Our supposition is that each running K220Q VM has been placed on a different physical GPU, and therefore the K240Q VM cannot start.
One question is: how can I see which virtual machine is running on which GPU?
I would also like to know if there is some way to move a vGPU to a different physical GPU.

The server configuration is:

  • DELL PE730
  • 2x Nvidia K2
  • 10x SSDs
  • 256GB RAM

Can anyone help me?


The "nvidia-smi" utility on the host server may offer some help. Run "nvidia-smi -h" to get a full list of options.
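For example, from the ESXi host shell (a sketch; the exact process naming depends on the driver version, but in our experience each vGPU-backed VM shows up as a vmx process under the physical GPU it was placed on):

```shell
# List the physical GPUs and the processes attached to each one.
# Each vGPU-enabled VM appears as a "vmx" process under the GPU it
# was placed on, which gives you the VM-to-GPU mapping.
nvidia-smi

# Query just utilisation and memory per GPU, refreshed every 5 seconds:
nvidia-smi -q -d UTILIZATION,MEMORY -l 5
```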


Your hunch is correct - by default, VMs will be spread amongst the physical GPUs. This snippet from NVIDIA’s documentation explains the behaviour and the workaround:

"VMware vSphere Hypervisor (ESXi) by default uses a breadth-first allocation scheme for
vGPU-enabled VMs; allocating new vGPU-enabled VMs on an available, least loaded
physical GPU. This policy generally leads to higher performance because it attempts to
minimize sharing of physical GPUs, but in doing so it may artificially limit the total
number of vGPUs that can run.
ESXi also provides a depth-first allocation scheme for vGPU-enabled VMs. The
depth-first allocation policy attempts to maximize the number of vGPUs running on each
physical GPU, by placing newly-created vGPUs on the physical GPU that can support
the new vGPU and that has the most number of vGPUs already resident. This policy
generally leads to higher density of vGPUs, particularly when different types of vGPUs
are being run, but may result in lower performance because it attempts to maximize
sharing of physical GPUs.
To switch to the depth-first allocation scheme, add the following parameter to the host's
configuration:
vGPU.consolidation = true"
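For reference, NVIDIA's GRID deployment guides place this setting in the host's /etc/vmware/config file (path assumed from those guides; the change only takes effect after the host is rebooted):

```
vGPU.consolidation = true
```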

The solution above is correct, and unfortunately VMware’s current versions do not provide any means to view where the VMs have been placed or to limit which vGPU profiles run on which physical GPUs. Today, to control where a vGPU is placed on a single host, you need to use the solution provided in the documentation and strictly control the VM startup order and running count.

If you have multiple hosts, you can manage placement by ring-fencing hosts for particular vGPU profiles and using the desktop pools to control placement on specific hosts, but it’s not very granular or resource efficient.

Tobias has a good point that nvidia-smi will allow you to see where the vGPUs are placed on the physical GPUs, but you have to do that at the CLI, and it still doesn’t allow you to control placement of specific profiles onto specific physical GPUs.
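As a sketch, newer GRID driver releases also add a dedicated vgpu subcommand to nvidia-smi that lists each vGPU instance together with the owning VM and the physical GPU it resides on (availability depends on the driver version installed on the host):

```shell
# List active vGPU instances per physical GPU (newer GRID drivers).
nvidia-smi vgpu

# Detailed query: vGPU type, owning VM name, frame buffer usage, etc.
nvidia-smi vgpu -q
```

This still only reports placement; it does not let you pin a given profile to a given physical GPU.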