About vGPU licensing: if I have 2 XenApp VMs running on VMware, where each VM receives a T4-8A profile and hosts about 10 users, what is the right way to license it?
Should it be 2 vApps licenses, considering that I have 2 VMs, or 20 vApps licenses, considering that is the number of concurrent users logged on to XenApp?
I’m not an official NVIDIA representative nor affiliated with them (so feel free to double-check with one), but as far as I understand the NVIDIA vGPU licensing model, it’s the number of concurrent users that counts when used with Citrix XenApp (aka Citrix Virtual Apps), where multiple users share one vGPU. The per-machine (device) model applies to scenarios such as XenDesktop, where each content creator uses his own dedicated machine and thus has his own dedicated vGPU.
To answer your question: yes, there’s a huge difference between 2012 and 2019, it’s a huge jump, but there’s more to it than that. Application requirements and how users work have moved on as well. Users keeping more browser tabs open, running more applications at the same time, newer versions of things like MS Office, etc., all contribute to increased resource usage.
You can’t build a hardware spec based on a software stack that’s 2 generations old, update to current versions of that software, and expect the same density. That’s why you need to allow headroom in your hardware spec for newer software, or plan to run specific versions of software throughout the lifecycle of the environment until the next refresh.
If monitoring GPU utilisation (framebuffer in your case), the best place to monitor is nvidia-smi, as this reads directly from the GPU and will give you the most accurate results. Don’t forget, framebuffer is only one component you should be monitoring; there are lots of others to consider as well.
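As an illustration, here is a minimal polling sketch in Python, assuming nvidia-smi is on the PATH; the query fields used are standard nvidia-smi options (see nvidia-smi --help-query-gpu):

```python
# Minimal sketch: poll per-GPU framebuffer usage via nvidia-smi.
import subprocess
import time

def framebuffer_usage():
    """Return a list of (index, used_MiB, total_MiB) tuples, one per GPU."""
    out = subprocess.check_output(
        ["nvidia-smi",
         "--query-gpu=index,memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    stats = []
    for line in out.strip().splitlines():
        idx, used, total = (field.strip() for field in line.split(","))
        stats.append((int(idx), int(used), int(total)))
    return stats

if __name__ == "__main__":
    while True:
        for idx, used, total in framebuffer_usage():
            print(f"GPU {idx}: {used}/{total} MiB ({used / total:.0%})")
        time.sleep(5)  # sample every 5 seconds
```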
Regarding licensing, vGPU is per concurrent user, whether it’s RDSH or VDI. Each user “that is connected” needs a vGPU license. If you have 20 users, but only 10 are connected at a time, then you need 10 licenses. If all 20 are connected at the same time, you need 20 licenses. The only exception to this is vCS, where the software is licensed per GPU, not per user.
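Since the count is per concurrent user, license sizing comes down to finding the peak number of simultaneous sessions. A small illustrative sketch (the session data below is hypothetical, not from any real log):

```python
# Illustrative sketch: compute peak concurrent sessions from
# (login, logout) pairs to size per-concurrent-user licensing.

def peak_concurrency(sessions):
    """sessions: iterable of (login, logout) times; returns peak overlap."""
    events = []
    for login, logout in sessions:
        events.append((login, 1))    # a user connects
        events.append((logout, -1))  # a user disconnects
    current = peak = 0
    for _, delta in sorted(events):
        current += delta
        peak = max(peak, current)
    return peak

# 20 named users working two staggered shifts (times in hours of the day):
sessions = [(8, 12)] * 10 + [(13, 17)] * 10
print(peak_concurrency(sessions))  # -> 10, so 10 licenses in this example
```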
What a very interesting thread this is!
I was wondering if there are any significant changes in approach nowadays with the availability of the A16 GPU model.
I need to re-platform a 1700 concurrent user CVAD (XenApp) solution.
Current thinking is a new set of DL380 Gen11 servers with one or more GPUs.
The posts above all seem to point towards a server with a few T4 cards, associating an 8A profile with each VM.
Could the same now be achieved with a single A16 64GB card?
Any new thinking since we are 2 years further?
I would recommend using 4 VMs with a 16GB profile (A16-16A) each. Depending on the density you require, you could consider 2 A16s to get 8 VMs per host.
Context switching on RDSH VMs is very high and could cause issues, so you should try to reduce the number of concurrently running RDSH VMs sharing a single GPU.
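For context, the A16 board carries 4 physical GPUs with 16GB of framebuffer each, so a 16A profile dedicates one whole GPU to each VM and avoids vGPU context switching between VMs on the same GPU. A back-of-envelope density sketch based on the numbers above (an illustration, not a validated spec):

```python
# Density sketch for A16 cards: one A16 board = 4 physical GPUs x 16GB.
A16_GPUS_PER_CARD = 4
GPU_GB = 16

def vms_per_host(cards_per_host, profile_gb=16):
    """VMs a host can carry; smaller profiles pack more VMs per GPU,
    but reintroduce the context switching warned about above."""
    vms_per_gpu = GPU_GB // profile_gb  # 1 for a 16A profile: no sharing
    return cards_per_host * A16_GPUS_PER_CARD * vms_per_gpu

print(vms_per_host(1))  # -> 4 RDSH VMs with one A16
print(vms_per_host(2))  # -> 8 RDSH VMs with two A16s, as suggested above
```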
Thanks for the response.
Further to the multi-session use case, what is the latest on multi-monitor support when connecting to vGPU-enabled RDSH VMs?
I recall that earlier on this could be problematic. The current solution has a mix of GPU and non-GPU enabled VMs; we are now considering standardising on GPUs across the board, but need to understand the supportability of multiple monitors.
I’d appreciate any insights.
Hi Ralf. It’s been a long time since I checked this thread, and for some reason I don’t receive email notifications about updates, which is why I’m only seeing this now. But in answer to your questions:
I have been successfully running both A16s and T4s in our server park without any multi-monitor problems over the past years, and this setup has been rock-solid so far.
As long as you assign enough framebuffer memory to the virtual machines that act as your Citrix Virtual Apps and Desktops servers (I use 16GB per machine), and make sure the total framebuffer of a virtual machine is never exceeded by the combined usage of its users at any point in time (I actively monitor this using “GPU Profiler”), you won’t have issues.
If that limit is exceeded, however, the server immediately bluescreens and resets, wiping all sessions from it. The trick is to configure your Citrix load balancing correctly using Citrix policies and to keep headroom in all configs; balance out enough ‘horizontally’ for that reason.
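To make that headroom rule concrete, here is a hedged sketch of the arithmetic; the per-user framebuffer figure and the 20% margin are illustrative assumptions to be replaced with your own measurements (e.g. from GPU Profiler):

```python
# Sketch: how many users can safely share one RDSH VM's framebuffer.
VM_FRAMEBUFFER_MB = 16 * 1024  # 16GB profile per VM, as above

def max_users_per_vm(avg_user_fb_mb, headroom=0.20):
    """Keep a safety margin free so the VM never hits the bluescreen case."""
    usable = VM_FRAMEBUFFER_MB * (1 - headroom)
    return int(usable // avg_user_fb_mb)

# Example: if office users average ~600MB of framebuffer each:
print(max_users_per_vm(600))  # -> 21 users before the headroom is eaten
```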
In answer to the question of re-platforming 1700 concurrent CVAD users: I suppose this information will most likely come too late for you, but here goes anyway for anyone reading this later.
The main lesson I have learned over the past years of using this design on a daily basis is that for optimizing high density (e.g. Citrix CVAD), the trick to scaling is identifying the bottlenecks, which are… drumroll… vGPU framebuffer and threaded cores. You’ll see that at the very beginning of this thread I asked NVIDIA to make a successor to the T4 with 64GB of framebuffer memory instead of 16 in total, and they did: it’s called the A16.
So for that part the question becomes “which server allows the most of those A16 cards to be used simultaneously?” as well as “which CPU offers the most threaded cores at a reasonable frequency?”, and therein lies the main new insight. For my next design I would absolutely go with the DL385 Gen11 series instead. The newest generation of AMD processors offers a lot more scaling as well as the PCIe lanes required to host 4 A16 adapters in a single physical server, whereas the Intel-based DL380 series is limited to only 3 of those cards and too few threaded CPU cores to service many virtual machines without inducing a ridiculous slowdown.
Practically, my new future design would boil down to this: a battery of DL385 Gen11 servers running VMware ESXi, each of which holds 4 A16 cards split into 16GB profiles per machine, allowing for 16 virtual Citrix CVAD servers per physical host, plus a matching dual-CPU setup that allows for a minimum of 8, ideally 12 to 14, cores per virtual machine. That will be the highest density imho and best suited for dynamic scaling. NO overcommitting of CPUs!
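To translate that into fleet size for the 1700-user case, a rough sketch; the users-per-VM figures are assumptions you would have to validate against your own workload:

```python
# Rough fleet sizing for the design above: DL385 Gen11 hosts with
# 4x A16 each and 16GB profiles -> 16 CVAD VMs per host.
import math

TARGET_USERS = 1700
VMS_PER_HOST = 16  # 4 cards x 4 GPUs, one 16GB profile per VM

def hosts_needed(users_per_vm, n_plus_one=True):
    hosts = math.ceil(TARGET_USERS / (users_per_vm * VMS_PER_HOST))
    return hosts + 1 if n_plus_one else hosts  # spare host for failover

for upv in (8, 10, 12):
    print(f"{upv} users/VM -> {hosts_needed(upv)} hosts (incl. N+1)")
```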
In addition I’ll say this up front: out of sheer experience I’ve learned that high-density CVAD users doing office work etc. (non-CAD, without other graphical or specific needs) use very little actual GPU power (as in: the processing chip on the card) but never have enough framebuffer memory, which becomes the biggest limiting factor in the end. Therefore, the day NVIDIA releases a successor to the A16 with the exact same GPU computing power but 128GB or 256GB of framebuffer memory is the day you should start using those new cards instead, as users’ need for it goes up year after year.