Some questions regarding GRID

Hello, I am interested in using GRID to create vGPUs for multiple VMs running on a single host computer. I am looking at KVM as the hypervisor, since I am on Linux (Ubuntu rather than RHEL) and will likely run Ubuntu on the VMs as well; I am also considering XenServer or vSphere. I have a few questions that I hope you can either answer or point me towards a good, reliable source for.

How secure is each vGPU? When the software creates multiple vGPUs from the host GPU and allocates them to separate VMs, can resources on the GPU be hacked or accessed from one vGPU by another (from within the VMs)?

If not, what sort of security is used?

How does GRID allocate GPU resources to each vGPU? For example, can the GPU be split evenly ~16 ways, or can resources be allocated dynamically so that if some vGPUs are underused, others can draw on more resources?

Looking at packages and pricing, does each vGPU created from a host GPU count as one CCU (requiring a subscription and licensing from NVIDIA), or does one host GPU count as one CCU regardless of how many vGPUs are created for VMs?

Thank you in advance, and hopefully someone can provide some answers. No doubt they are out there, but I am having some difficulty finding anything concrete and explicit.

Hi

Answers below:

It’s as secure as sharing the CPU and memory of the underlying physical server through a hypervisor, if not more so. If you’re using a VDI deployment model where each user has a dedicated OS, then each VM has dedicated framebuffer allocated to it, while all VMs share the GPU’s processing cycles (much like they share the CPU of the physical server). If you’re running something like RDS / XenApp, with multiple users all accessing the same VM, then the resources allocated to that VM are shared between all of those users. The VDI approach is therefore the more secure of the two.

This is partly covered in the answer above. You can allocate portions of framebuffer (known as profiles) to VMs, and each portion of framebuffer is dedicated to its VM. The available profile sizes vary depending on the model of GPU, as different GPUs have different amounts of framebuffer (M60 = 8GB / P40 = 24GB / V100 = 32GB / T4 = 16GB; this list of GPUs is not exhaustive …). The rest of the GPU’s resources are shared amongst the VMs by scheduling algorithms that help control performance. There are three schedulers that can be configured (Best Effort / Fixed Share / Equal Share), and the chosen scheduler applies to all VMs using that specific GPU.
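To make the framebuffer arithmetic concrete, here is a minimal Python sketch. The GPU sizes are taken from the list above; the function name and structure are purely illustrative and are not an NVIDIA tool or API.

```python
# Rough sketch of vGPU framebuffer partitioning.
# Framebuffer per physical GPU, in GB (from the list above; not exhaustive).
GPU_FRAMEBUFFER_GB = {"M60": 8, "P40": 24, "V100": 32, "T4": 16}


def max_vgpus(gpu_model: str, profile_gb: int) -> int:
    """How many vGPUs of a given profile size fit on one physical GPU.

    Framebuffer is statically dedicated per VM, so the split is simple
    integer division; GPU processing cycles are then shared by the
    configured scheduler (Best Effort / Fixed Share / Equal Share).
    """
    total = GPU_FRAMEBUFFER_GB[gpu_model]
    if profile_gb <= 0 or profile_gb > total:
        raise ValueError("profile size must be between 1 GB and the GPU's framebuffer")
    return total // profile_gb


# Examples: a 32 GB V100 split into 1 GB profiles, or used as one 32 GB profile.
print(max_vgpus("V100", 1))   # -> 32
print(max_vgpus("V100", 32))  # -> 1
print(max_vgpus("T4", 2))     # -> 8
```

Note that this only models the division of framebuffer; in practice the vGPUs on a given physical GPU are created from a fixed set of supported profiles rather than arbitrary sizes.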

vGPU is licensed per CCU. A license is allocated to each VM once it has powered on and is released when the VM powers off. Users logging in or out does not affect licensing, meaning that if you simply disconnect your session or log out, a license is still allocated because the VM is still powered on. Licensing is not profile specific: a 1GB profile costs the same as a 32GB profile. The difference is that (assuming a 32GB V100) you can get 32 x 1GB users or 1 x 32GB user on the same GPU (other multiples are obviously available).
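To make the counting rule explicit, here is a tiny illustrative Python model of that description (the class and field names are my own, for illustration only, not an NVIDIA licensing API):

```python
from dataclasses import dataclass


@dataclass
class VgpuVM:
    name: str
    profile_gb: int       # framebuffer profile size; irrelevant to the license count
    powered_on: bool      # the only thing that matters for CCU counting
    users_logged_in: int  # also irrelevant; sessions don't hold the license


def licenses_in_use(vms: list[VgpuVM]) -> int:
    """One CCU license per powered-on vGPU VM, as described above."""
    return sum(1 for vm in vms if vm.powered_on)


fleet = [
    VgpuVM("vm-01", profile_gb=1, powered_on=True, users_logged_in=0),
    VgpuVM("vm-02", profile_gb=32, powered_on=True, users_logged_in=3),
    VgpuVM("vm-03", profile_gb=8, powered_on=False, users_logged_in=0),
]
print(licenses_in_use(fleet))  # -> 2 (vm-03 is powered off, so it holds no license)
```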

More detailed information is available here: NVIDIA Virtual GPU (vGPU) Software Documentation

Regards

Ben

Thank you for your quick reply