Adding GPUs to an existing UCS cluster

Hi Everyone,

We currently have a UCS cluster of 8 HX220 servers running VMware vSphere with Citrix for our VDI. The servers are a couple of years old, and I recently asked Cisco for a design to expand the cluster so we can add M10 cards. Every design they’ve come back with is a "new" cluster. When I asked why I can’t just add GPU capability to the existing cluster, I was told it won’t work. The last design we had was adding 10 nodes with 2 M10 cards in each, so I would have one cluster for non-GPU workloads and another for GPU. I suggested I could reduce the number of M10s needed and purchase T4 cards for the existing servers; even if they don’t scale as well, at least I’d have some GPU for the existing desktops on the existing cluster.

I was advised that the hypervisor that virtualizes the desktop (not the server) is where the limitation is. This doesn’t make sense to me. Has anyone had similar experience with Cisco UCS servers?

Thanks for any insight if anyone can help.

Hi

If you’re using the M4 architecture then you’re completely out of luck, as the HX220 doesn’t support any GPUs; you’d need to use the HX240 and be hugely limited on GPU options. If you’re using the M5 architecture then the HX220 will support either 1 or 2 T4s, so you could run 4 GPU-accelerated XenApp servers (splitting the T4s in half) or up to 32 XenDesktop VMs (assuming 1GB is enough framebuffer).
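For reference, here’s where those numbers come from: a quick back-of-envelope sketch (Python, purely illustrative; it assumes the VM count is framebuffer-bound and the T4’s 16GB is carved up evenly):

```python
# Where the per-host numbers come from (framebuffer only, illustrative).
T4_FRAMEBUFFER_GB = 16      # one T4 card
T4S_PER_HX220_M5 = 2        # max the HX220 M5 will take

total_fb_gb = T4_FRAMEBUFFER_GB * T4S_PER_HX220_M5   # 32GB per host

xenapp_vms = total_fb_gb // 8       # T4 split in half -> 8GB each = 4 XenApp VMs
xendesktop_vms = total_fb_gb // 1   # 1GB profile -> 32 XenDesktop VMs

print(f"XenApp VMs/host: {xenapp_vms}, XenDesktop VMs/host: {xendesktop_vms}")
```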

As a heads up, unless M10s are already deployed in an environment and you’re looking to add additional GPU capacity with the same architecture (regardless of the OEM Server), I would strongly advise you (and anyone else) to avoid purchasing them for a production environment at this point in their lifecycle, and look at current-generation GPUs instead. The T4 will scale as well as the M10; it’s just that the T4 has half the framebuffer, so you need 2 of them to equal a single M10.

If you’re going to retrofit GPUs into a Server, unless you’ve previously accounted for this in the specification when purchasing (which I’m assuming wasn’t done, otherwise the HX240 would have been used), it’s best to start clean. Installing GPUs typically requires larger PSUs and GPU enablement kits (low-profile CPU heat sinks and custom power cables, since (certainly with Cisco) the GPUs have to be passively cooled), and sometimes different PCIe riser configurations. CPU and memory configuration is also a huge consideration, because the GPU changes the way resources are allocated. Although it can obviously be done, retrofitting GPUs into Servers that weren’t originally spec’d to run them is something I always try to avoid, and I’ll never recommend it (unless, as mentioned, the Server had been correctly spec’d to accommodate GPUs after purchase).

My advice: take this opportunity and create a new Cluster with Servers that were correctly spec’d to support GPUs from the outset. An HX240 M5 will support up to 6 T4s, 3 times the density of the HX220 M5. One reason Cisco may have come back with a 10-Server design is that their 2U Servers only support 2 double-wide GPUs, whereas other OEMs support at least 3, which can make a huge difference to user density, architecture, licensing and overall cost.

"The Hypervisor that virtualised the desktop" … So VMware then? I’m not sure which limitation they’re referring to, but ideally your Hosts in that type of Cluster should be the same GPU hardware. I’m assuming you’re using MCS / PVS?

Regards

MG

Thank you very much for the detailed reply, much appreciated. We are using the M5 architecture, so our HX220s can take 2 T4s each. We are going to try to run mostly hosted shared desktops off Server 2016/2019 and only use pooled desktops for use cases that absolutely need them.

We use MCS, and Cisco’s designs are all M5 HX240 nodes for the "new" cluster. Again, I simply thought it would be a matter of adding the HX240s into the existing cluster, which would then allow us to provision GPU resources within it. We even thought about moving some of our existing non-GPU "task worker" desktops to Citrix Cloud to free up CPU, RAM, and storage if needed, again just to add GPU. It is at this point that Cisco is saying there is a limitation of the desktop hypervisor, though they haven’t said it isn’t possible. I feel like I’m being oversold a bit, and although I understand a new cluster is the preferred design, I’m trying to squeeze as much ROI out of the existing cluster, that’s all.

We don’t know what size of framebuffer will work. We are a post-secondary institution, so we are making assumptions based on the physical workstation specs for the programs that use GPU-dependent software (AutoCAD, Revit, ArcGIS). So we’re not entirely sure of the total amount of GPU horsepower they’ll need for learning outcomes. We plan to start small and then scale if user experience and performance demand it, but we need to start somewhere.

So I didn’t want to retrofit; I’m completely OK with adding new nodes. I just wanted to attach them to the existing cluster so we could expand GPU capability, potentially to the desktops that already exist. Some of the existing use cases do have applications where a small GPU allocation may benefit their outcomes.

Just for the record, the expansion we are planning, based on our starting assumptions, is about 250 CCU that may end up with a 2GB framebuffer and 200 CCU at 1GB. Because so much is unknown about where this will need to scale, that’s our starting point for the expansion. Cisco has been very good at trying to respond to the requirements provided, but I still feel I’m missing pieces needed to explain it to our management properly.

One last thing: are the M10s due for a new generation, or is the T4 their replacement? I understand the T4’s 2:1 ratio compared to the M10; I just thought the M10 was more bang for the buck for density. However, if they are going away soon, then buying them now might make scaling in a year or two tricky (new clusters again lol) if they are no longer available.

Again, thank you for your reply; it’s very helpful in building my understanding, and additional comments are most welcome.

Hi

Is there an issue with creating a new Cluster? Apart from the requirement for a minimum number of Hosts (I’m assuming you’re running HyperFlex), you have that covered with Cisco’s recommendation. The existing vCenter will run multiple Clusters, so no complication there. Citrix will work across Clusters; just add the new location into Studio. I’m assuming you already have the correct vSphere licenses (Enterprise Plus) for your existing Cluster, but you’ll obviously need additional vSphere Enterprise Plus licenses for the new Cluster, unless you’re running vSphere for Desktop. So I’m not initially sure what the issue is with creating a new Cluster, and it’s actually not a bad idea to have more than one …

What are the specs of your existing HX220 Hosts? (PSU rating / CPU (cores and clock, or model number and I’ll look it up) / RAM / disk / network). Have you checked whether you can fit 2 T4s in them (just to make sure 2 PCIe slots are available)?

What’s meant by the "desktop Hypervisor"? Are they referring to VMware? I don’t see the relevance unless it’s out of date. Which version of vSphere / vCenter are you running?

Which version of Citrix are you running? (MCS is my preferred method of deployment, glad you’re using it).

Hosted Shared Desktops (XenApp) - Don’t use Windows Server 2016, it’s really old! If you’re building these XenApp VMs from scratch, you should be using the latest version of Windows Server 2019. If you’re running Microsoft Volume Licensing, there’s an April 2020 fully patched version sitting on the VLSC Portal ready for use.

Regarding XenApp VM specs: now that you’ve mentioned which applications you’re running, the M10 is no longer an option for you, regardless of the Server. The applications you’ve listed are also reliant on CPU clock speed, the higher the better and typically nothing less than 3.0GHz, so make sure any new Server purchases have a 3.0GHz+ clock. You should be looking at the T4 at a minimum. Using vGPU, you’ll need the 8Q profile (not 8A) for each XenApp VM, and you’ll get 2 of those per T4. This will require QvDWS licensing (because of your applications), but as you’re an educational facility, speak to your IT distributor and you will be able to get EDU pricing for the vGPU licenses (it makes a HUGE difference). The reason this is important is that vGPU is licensed per CCU, so if you have 250 CCU, you’ll need 250 QvDWS licenses. And yes, you need QvDWS licensing with these applications; vApps won’t give the performance needed.

Regarding CPU and RAM, assuming you have adequate resources and can retrofit 2 T4s into the HX220, you’re effectively splitting the Server into quarters to support 4 XenApp VMs. Start with the following specs for each VM and then modify as needed: 8 vCPU / 32GB RAM / T4-8Q. This spec will typically support 20 – 25 users (sometimes more, sometimes less), but this is application dependent, and the applications you’re planning to run are typically delivered using XenDesktop (not XenApp) to give consistent performance, so your XenApp density will probably be lower; it depends how the applications are used. Start with those specs and see how you get on. The only vGPU options you have with XenApp are 8Q or 16Q. You wouldn’t go lower than 8Q due to framebuffer limits, and 16Q would mean only 1 VM per T4.

Pooled Desktops (XenDesktop) – Start with 4 vCPU, 8GB RAM and T4-2Q per VM. Scale up resources from there depending on requirements.
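If it helps to sanity-check density before a POC, here’s a rough sketch (Python, purely illustrative; the host figures are placeholder assumptions you’d swap for your real specs, and it ignores hypervisor / HX controller overhead and real-world user experience, so it only tells you which resource runs out first on paper):

```python
# Illustrative per-host density check: which resource runs out first?
# Host figures are placeholder assumptions - substitute your actual HX220/HX240 specs.
HOST = {"threads": 56, "ram_gb": 768, "fb_gb": 32}   # e.g. 28 cores / 56 threads, 768GB RAM, 2x T4

VM_PROFILES = {
    "XenApp (8 vCPU / 32GB / T4-8Q)":    {"vcpu": 8, "ram_gb": 32, "fb_gb": 8},
    "XenDesktop (4 vCPU / 8GB / T4-2Q)": {"vcpu": 4, "ram_gb": 8,  "fb_gb": 2},
}

for name, vm in VM_PROFILES.items():
    by_fb  = HOST["fb_gb"]  // vm["fb_gb"]
    by_ram = HOST["ram_gb"] // vm["ram_gb"]   # vGPU VMs hard-reserve all of their RAM
    by_cpu = HOST["threads"] // vm["vcpu"]    # crude 1 vCPU : 1 thread assumption
    limit = min(by_fb, by_ram, by_cpu)
    print(f"{name}: {limit} VMs/host (FB allows {by_fb}, RAM {by_ram}, CPU {by_cpu})")
```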

I assume you know about the hard RAM allocation when running vGPU? The VM’s memory is fully reserved, and the entire amount is hard-allocated on boot. This is one reason it’s important to spec the Servers correctly when purchasing with GPUs, as it changes the way resources are used.
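If you want to see how much host RAM is being hard-reserved once vGPU VMs are in play, a small pyVmomi sketch like this will report it (read-only; the vCenter hostname and credentials are placeholders):

```python
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Placeholder connection details - substitute your own vCenter and credentials.
ctx = ssl._create_unverified_context()
si = SmartConnect(host="vcenter.example.local", user="administrator@vsphere.local",
                  pwd="********", sslContext=ctx)
content = si.RetrieveContent()

# Walk every VM and total up the ones with "reserve all guest memory" set,
# which is forced on for vGPU-enabled VMs.
view = content.viewManager.CreateContainerView(content.rootFolder, [vim.VirtualMachine], True)
reserved_mb = 0
for vm in view.view:
    cfg = vm.config
    if cfg and cfg.memoryReservationLockedToMax:
        reserved_mb += cfg.hardware.memoryMB
        print(f"{vm.name}: {cfg.hardware.memoryMB} MB fully reserved")

print(f"Total hard-reserved RAM: {reserved_mb / 1024:.0f} GB")
Disconnect(si)
```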

Rather than let Cisco talk you into a new purchase at this point, you should be looking to run a POC for the XenApp and XenDesktop VMs so you know what size profiles you need (CPU / RAM / vGPU). Then you can spec your new Servers correctly to fit the maximum number of Users on them while allowing a little headroom for future performance and additional application requirements. There’s nothing worse than buying a new platform to meet your current requirements, then 6 months or a year later, when new applications are installed, realising that you didn’t allow enough performance headroom to support them.

What is your total planned CCU density for XenApp & XenDesktop? 10 servers sounds a lot just to support 250 CCU …

Unless you have reasons for not wanting to, it actually makes sense to run multiple Clusters if you have the resources to do so: one for your XenApp VMs and the other for your XenDesktop VMs. That way you keep better control of the resource allocation from each vSphere Host for a more consistent experience.

Unfortunately I can’t comment on GPU EOL / EOS. All I’ll say is that it’s not a good idea to buy and build a new Server platform around a GPU that’s 4 years old. There are also technical reasons, due to their architecture, why you wouldn’t want to use them. Basically, stick with the T4 as a minimum, design around that, and you won’t go too far wrong.

Regards

MG

No, after your explanation I think that sells the idea of a new cluster a little better. My original thought process was basically in 2 pieces: 1) KISS, keep it simple, less provisioning of new VM hosts etc.; 2) I wanted to understand why, when we were originally sold a "scalable" solution, it could be so complex to access that original scalability model (that’s my own OCD kicking in there).

The existing vCenter will run multiple Clusters, so no complication there. Citrix will work across Clusters; just add the new location into Studio. I’m assuming you already have the correct vSphere licenses (Enterprise Plus) for your existing Cluster, but you’ll obviously need additional vSphere Enterprise Plus licenses for the new Cluster, unless you’re running vSphere for Desktop. So I’m not initially sure what the issue is with creating a new Cluster, and it’s actually not a bad idea to have more than one …

Good call, we have Standard licensing on the existing nodes right now, so we’ll upgrade to Enterprise Plus. Your explanation makes sense to me, and based on that the idea of a new cluster is no longer a bottleneck in my mind.

What are the specs of your existing HX220 Hosts? (PSU rating / CPU (cores and clock, or model number and I’ll look it up) / RAM / disk / network). Have you checked whether you can fit 2 T4s in them (just to make sure 2 PCIe slots are available)?

Each node is an HX220c M5SX: dual 2.6GHz Xeon Gold 6132 processors, 768GB RAM, 6 x 1.8TB drives. I looked up the specs here https://www.nvidia.com/en-us/data-center/resources/vgpu-certified-servers/#utm_source=shorturl&utm_medium=referrer&utm_campaign=grid-certified-servers

So it’s good for 2x T4 per node, and it looks like the HX240 M5 can take 6 per node?

Hosted Shared Desktops (XenApp) - Don’t use Windows Server 2016, it’s really old! If you’re building these XenApp VMs from scratch, you should be using the latest version of Windows Server 2019. If you’re running Microsoft Volume Licensing, there’s an April 2020 fully patched version sitting on the VLSC Portal ready for use. Regarding XenApp VM specs: now that you’ve mentioned which applications you’re running, the M10 is no longer an option for you, regardless of the Server. The applications you’ve listed are also reliant on CPU clock speed, the higher the better and typically nothing less than 3.0GHz, so make sure any new Server purchases have a 3.0GHz+ clock. You should be looking at the T4 at a minimum. Using vGPU, you’ll need the 8Q profile (not 8A) for each XenApp VM, and you’ll get 2 of those per T4. This will require QvDWS licensing (because of your applications), but as you’re an educational facility, speak to your IT distributor and you will be able to get EDU pricing for the vGPU licenses (it makes a HUGE difference). The reason this is important is that vGPU is licensed per CCU, so if you have 250 CCU, you’ll need 250 QvDWS licenses. And yes, you need QvDWS licensing with these applications; vApps won’t give the performance needed. Regarding CPU and RAM, assuming you have adequate resources and can retrofit 2 T4s into the HX220, you’re effectively splitting the Server into quarters to support 4 XenApp VMs.

We are definitely looking at Server 2019 for brand-new builds.

Start with the following specs for each VM and then modify as needed: 8 vCPU / 32GB RAM / T4-8Q. This spec will typically support 20 – 25 users (sometimes more, sometimes less), but this is application dependent, and the applications you’re planning to run are typically delivered using XenDesktop (not XenApp) to give consistent performance, so your XenApp density will probably be lower; it depends how the applications are used. Start with those specs and see how you get on.

Those are our specs now, except we are at 36GB RAM and don’t have a vGPU profile. Same mindset as you suggest: we need to try it and see what the real user experience is like, and so on.

you should be looking to run a POC for the XenApp and XenDesktop VMs so you know what size profiles you need (CPU / RAM / vGPU). Then you can spec your new Servers correctly to fit the maximum number of Users on them while allowing a little headroom for future performance and additional application requirements. There’s nothing worse than buying a new platform to meet your current requirements, then 6 months or a year later, when new applications are installed, realising that you didn’t allow enough performance headroom to support them.

What is your total planned CCU density for XenApp & XenDesktop? 10 servers sounds a lot just to support 250 CCU …

I think this is solid advice; running a PoC with some T4s is the way to go, and then we can make expansion plans based on the outcomes, as we’ll have a better idea. We really don’t know our CCU count. We have around 500 physical workstations currently, but the number of unique users is 4x that amount; they are not all using them at the same time, so sizing this expansion to actual usage patterns is somewhat of a guessing game for us at this point as well.

Unfortunately I can’t comment on GPU EOL / EOS. All I’ll say is that it’s not a good idea to buy and build a new Server platform around a GPU that’s 4 years old. There are also technical reasons, due to their architecture, why you wouldn’t want to use them. Basically, stick with the T4 as a minimum, design around that, and you won’t go too far wrong.

I think you sold me; if the M10 is 4 years old, it’s pretty safe to assume it will be discontinued or replaced in short order, vs the T4, which is from 2018.

Hi

If your HX240 Nodes are only running XenApp / XenDesktop VMs, then you can use "vSphere for Desktop" licensing, which is cheaper than Enterprise Plus, but you can only run desktop workloads on those vSphere Hosts. With vCenter you can use Standard licensing.

The HX220 with a T4 will be ok for a small, controlled POC to understand initial utilisation and configuration, and will give you a rough idea of what your new VM specs and Cluster configuration will look like. It may be that you decide to scale up, not out, in which case you might use the 16Q profile with 1 XenApp VM, making each XenApp VM higher density but giving you fewer to manage overall. Framebuffer is typically what you’ll run out of first when sizing with 8Q vGPU profiles on XenApp, but you obviously get 2 of those per T4 compared to 1 16Q profile. It’s the same amount of physical resource; you’re just distributing it in a different way, and it can impact user density, so it’s worth investigating whether 2x 8Q VMs or 1x 16Q VM works better for you. The other advantage of having fewer VMs is Windows licensing; just something to be aware of depending on which Microsoft licensing model you use.
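As a quick illustration of that 8Q vs 16Q trade-off (the users-per-VM figures below are placeholder assumptions to be replaced with your own POC results, not measured numbers):

```python
# Compare 2x 8Q VMs vs 1x 16Q VM per T4; per-host figures assume 2 T4s per host.
T4S_PER_HOST = 2
USERS_PER_VM = {"T4-8Q": 22, "T4-16Q": 40}   # hypothetical averages - replace with PoC results

for profile, vms_per_t4 in (("T4-8Q", 2), ("T4-16Q", 1)):
    vms = vms_per_t4 * T4S_PER_HOST
    users = vms * USERS_PER_VM[profile]
    print(f"{profile}: {vms} XenApp VMs/host (= Windows instances), ~{users} users/host")
```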

The HX240 will support 6 T4s, but this obviously depends on PCIe riser configuration and whether you’re running anything else in those PCIe slots. Another benefit of the HX240 is the additional storage configurations you can use, due to having more onboard disk capacity.

Regarding platform usage … if you look through (Citrix) Director history, it will tell you your loadings, so you’ll have an idea of how many people use the platform and when. The other thing is how many Citrix licenses you have; that’s your max CCU limit and will give you a hard number to work with. You could also have a look through the Citrix user profiles (I assume you’re running UPM) and see when people last logged on, obviously making allowances for the way in which people are currently working (COVID).
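If you can export session start/end times from Director (or your monitoring database) to a CSV, a simple sweep gives you a rough peak-CCU figure; the file name and column names here are hypothetical:

```python
import csv
from datetime import datetime

# Hypothetical export: one row per session with ISO start/end timestamps.
events = []
with open("sessions.csv", newline="") as f:
    for row in csv.DictReader(f):          # expects columns: start, end
        events.append((datetime.fromisoformat(row["start"]), +1))
        events.append((datetime.fromisoformat(row["end"]), -1))

# Sweep through the events in time order, tracking concurrent sessions.
peak, current = 0, 0
for _, delta in sorted(events):
    current += delta
    peak = max(peak, current)

print(f"Peak concurrent sessions (CCU): {peak}")
```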

Regards

MG

I really appreciate your advice; it’s been a good sounding board for me to figure my way through this initial decision. Yes, we’re going to PoC; it’s about learning what we need to know and then expanding up or out, whichever makes sense.

Thanks again, I’m sure I’ll have more questions in the future :)

Hi

No problem at all, glad it’s been a useful discussion

Best of luck with the POC, please let us know how you get on!

Regards

MG