About vGPU licensing: if I have 2 XenApp VMs running on VMware, where each VM receives a T4-8A profile and hosts about 10 users, what is the right way to license it?
Should it be 2 vApps licenses, considering that I have 2 VMs, or 20 vApps licenses, considering that is the number of concurrent users logged on to XenApp?
I’m not an official NVIDIA representative nor affiliated with them (so feel free to double-check with one), but as far as I understand the NVIDIA vGPU licensing model, it’s the number of concurrent users that counts when used with Citrix XenApp (aka Citrix Virtual Apps), where multiple users share one vGPU. The per-machine (device) model applies to scenarios such as XenDesktop, where each content creator uses his own dedicated machine and thus has his own dedicated vGPU.
To answer your question: yes, there’s a huge difference between 2012 and 2019, it’s a huge jump, but there’s more to it than that. Application requirements and how users work have moved on as well. Users keeping more browser tabs open, running more applications at the same time, newer versions of things like MS Office, etc., all contribute to increased resource usage.
You can’t build a hardware spec based on a software stack that’s 2 generations old, update to current versions of that software, and expect the same density. That’s why you need to allow headroom in your hardware spec for newer software, or plan to run specific versions of software throughout the lifecycle of the environment until the next refresh.
If monitoring GPU utilisation (framebuffer in your case), the best place to monitor is nvidia-smi, as this reads directly from the GPU and will give you the most accurate results. Don’t forget, framebuffer is only one component you should be monitoring; there are lots of others to consider as well.
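As an illustration, here is a minimal polling sketch in Python, assuming nvidia-smi is on the PATH; the query fields used are standard nvidia-smi options (see nvidia-smi --help-query-gpu):

```python
# Minimal sketch: poll per-GPU framebuffer usage via nvidia-smi.
import subprocess
import time

def framebuffer_usage():
    """Return a list of (index, used_MiB, total_MiB) tuples, one per GPU."""
    out = subprocess.check_output(
        ["nvidia-smi",
         "--query-gpu=index,memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    stats = []
    for line in out.strip().splitlines():
        idx, used, total = (field.strip() for field in line.split(","))
        stats.append((int(idx), int(used), int(total)))
    return stats

if __name__ == "__main__":
    while True:
        for idx, used, total in framebuffer_usage():
            print(f"GPU {idx}: {used}/{total} MiB ({used / total:.0%})")
        time.sleep(5)  # sample every 5 seconds
```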
Regarding licensing, vGPU is per concurrent user, whether it’s RDSH or VDI. Each user “that is connected” needs a vGPU license. If you have 20 users, but only 10 are connected at a time, then you need 10 licenses. If all 20 are connected at the same time, you need 20 licenses. The only exception to this is vCS, where the software is licensed per GPU, not per user.
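Since the count is per concurrent user, license sizing comes down to finding the peak number of simultaneous sessions. A small illustrative sketch (the session data below is hypothetical, not from any real log):

```python
# Illustrative sketch: compute peak concurrent sessions from
# (login, logout) pairs to size per-concurrent-user licensing.

def peak_concurrency(sessions):
    """sessions: iterable of (login, logout) times; returns peak overlap."""
    events = []
    for login, logout in sessions:
        events.append((login, 1))    # a user connects
        events.append((logout, -1))  # a user disconnects
    current = peak = 0
    for _, delta in sorted(events):
        current += delta
        peak = max(peak, current)
    return peak

# 20 named users working two staggered shifts (times in hours of the day):
sessions = [(8, 12)] * 10 + [(13, 17)] * 10
print(peak_concurrency(sessions))  # -> 10, so 10 licenses in this example
```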
What a very interesting thread this is!
I was wondering if there are any significant changes in approach nowadays with the availability of the A16 GPU model.
I need to re-platform a 1700 concurrent user CVAD (XenApp) solution.
Current thinking is a new set of DL380 Gen11 servers with one or more GPUs.
The posts above all seem to point towards a server with a few T4 cards, associating an 8A profile with each VM.
Could the same now be achieved with a single A16 64GB card?
Any new thinking since we are 2 years further?
I would recommend using 4 VMs with a 16GB profile (A16-16A) each. Depending on the density you require, you could consider 2 A16s to get 8 VMs per host.
Context switching on RDSH VMs is very high and could cause issues, so you should try to reduce the number of concurrently running RDSH VMs sharing a single GPU.
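For context, the A16 board carries 4 physical GPUs with 16GB of framebuffer each, so a 16A profile dedicates one whole GPU to each VM and avoids vGPU context switching between VMs on the same GPU. A back-of-envelope density sketch based on the numbers above (an illustration, not a validated spec):

```python
# Density sketch for A16 cards: one A16 board = 4 physical GPUs x 16GB.
A16_GPUS_PER_CARD = 4
GPU_GB = 16

def vms_per_host(cards_per_host, profile_gb=16):
    """VMs a host can carry; smaller profiles pack more VMs per GPU,
    but reintroduce the context switching warned about above."""
    vms_per_gpu = GPU_GB // profile_gb  # 1 for a 16A profile: no sharing
    return cards_per_host * A16_GPUS_PER_CARD * vms_per_gpu

print(vms_per_host(1))  # -> 4 RDSH VMs with one A16
print(vms_per_host(2))  # -> 8 RDSH VMs with two A16s, as suggested above
```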
Thanks for the response.
Further to the multi-session use case, what is the latest on multi-monitor support when connecting to vGPU-enabled RDSH VMs?
I recall that earlier on this could be problematic. The current solution has a mix of GPU and non-GPU enabled VMs; we are now considering standardising on GPUs across the board, but need to understand the supportability of multiple monitors.
I’d appreciate any insights.
Hi Ralf. It’s been a long time since I checked this thread, and for some reason I don’t receive email notifications about updates, which is why I’m only seeing this now. But in answer to your questions:
I have been successfully running both A16s and T4s in our server park without any multi-monitor problems over the past years, and this setup has been rock-solid so far.
As long as you assign enough framebuffer memory to the virtual machines that act as your Citrix Virtual Apps and Desktops servers (I use 16GB per machine), and make sure the total framebuffer of a virtual machine is never exceeded by the combined usage of its users at any point in time (I actively monitor this using “GPU Profiler”), you won’t have issues.
If that limit is exceeded, however, the server immediately bluescreens and resets, wiping all sessions from it. The trick is to configure your Citrix load balancing correctly using Citrix policies and to keep headroom in all configs; balance out enough ‘horizontally’ for that reason.
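To make that headroom rule concrete, here is a hedged sketch of the arithmetic; the per-user framebuffer figure and the 20% margin are illustrative assumptions to be replaced with your own measurements (e.g. from GPU Profiler):

```python
# Sketch: how many users can safely share one RDSH VM's framebuffer.
VM_FRAMEBUFFER_MB = 16 * 1024  # 16GB profile per VM, as above

def max_users_per_vm(avg_user_fb_mb, headroom=0.20):
    """Keep a safety margin free so the VM never hits the bluescreen case."""
    usable = VM_FRAMEBUFFER_MB * (1 - headroom)
    return int(usable // avg_user_fb_mb)

# Example: if office users average ~600MB of framebuffer each:
print(max_users_per_vm(600))  # -> 21 users before the headroom is eaten
```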
In answer to the question of re-platforming 1700 concurrent CVAD users: I suppose this information will most likely come too late for you, but here goes anyway for anyone reading this later.
The main lesson I have learned over the past years of using this design on a daily basis is that for optimizing high density (e.g. Citrix CVAD), the trick to scaling is identifying the bottlenecks, which are… drumroll… vGPU framebuffer and threaded cores. You’ll see that at the very beginning of this thread I asked NVIDIA to make a successor to the T4 with 64GB of framebuffer memory instead of 16 in total, and they did: it’s called the A16.
So for that part the question becomes “which server allows the most of those A16 cards to be used simultaneously?” as well as “which CPU offers the most threaded cores at a reasonable frequency?”, and therein lies the main new insight. For my next design I would absolutely go with the DL385 Gen11 series instead. The newest generation of AMD processors offers a lot more scaling as well as the PCIe lanes required to host 4 A16 adapters in a single physical server, whereas the Intel-based DL380 series is limited to only 3 of those cards and too few threaded CPU cores to service many virtual machines without inducing a ridiculous slowdown.
Practically, my new future design would boil down to this: a battery of DL385 Gen11 servers running VMware ESXi, each of which holds 4 A16 cards split into 16GB profiles per machine, allowing for 16 virtual Citrix CVAD servers per physical host, plus a matching dual-CPU setup that allows for a minimum of 8, ideally 12 to 14, cores per virtual machine. That will be the highest density imho and best suited for dynamic scaling. NO overcommitting of CPUs!
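To translate that into fleet size for the 1700-user case, a rough sketch; the users-per-VM figures are assumptions you would have to validate against your own workload:

```python
# Rough fleet sizing for the design above: DL385 Gen11 hosts with
# 4x A16 each and 16GB profiles -> 16 CVAD VMs per host.
import math

TARGET_USERS = 1700
VMS_PER_HOST = 16  # 4 cards x 4 GPUs, one 16GB profile per VM

def hosts_needed(users_per_vm, n_plus_one=True):
    hosts = math.ceil(TARGET_USERS / (users_per_vm * VMS_PER_HOST))
    return hosts + 1 if n_plus_one else hosts  # spare host for failover

for upv in (8, 10, 12):
    print(f"{upv} users/VM -> {hosts_needed(upv)} hosts (incl. N+1)")
```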
In addition I’ll say this up front: out of sheer experience I’ve learned that high-density CVAD users doing office work etc. (non-CAD, without other graphical or specific needs) use very little actual GPU power (as in: the processing chip on the card) but never have enough framebuffer memory, which becomes the biggest limiting factor in the end. Therefore, the day NVIDIA releases a successor to the A16 with the exact same GPU computing power but 128GB or 256GB of framebuffer memory is the day you should start using those new cards instead, as users’ need for it goes up year after year.