XenApp & Nvidia Tesla M10 profiles

Hi Community

This year we will renew our Citrix infrastructure (XenServer & XenApp, on the newest versions, with Windows Server 2016 as the OS) and we are planning to use the NVIDIA Tesla M10 GPU.

We want to use XenServer hosts with 8 XenApp VMs each and one Tesla M10 per XenServer host. Each XenApp VM will have 11 users in the normal case and 22 in the worst case (one datacenter down; we need 100% redundancy).

Our users are mainly office workers. Our graphics applications, such as Adobe Photoshop, InDesign, CAD applications, etc., are used on fat clients.

I’m new here, so I have some questions to help my understanding.

I know there are two ways to make the GPU available to our users, GPU passthrough and vGPU:

  • With GPU passthrough, the GPU is connected directly to the VM through the hypervisor (like a physical GPU?).
  • With GRID vGPU technology, the GPU is divided up among multiple virtual machines.

So, I don’t know exactly what the technical difference between the two is. Could you explain the difference to me?

Which way is the best one for us and why? What does it depend on?

Then I have a further question. If we use virtual GPU (vGPU) types, which one should we take (M10-4Q?), and what kind of profiles do I have with GPU passthrough?

Thanks in advance for your help.

Hi Rus

Welcome to the forum!

Yes, with passthrough the GPU is attached directly to the VM, and no driver is needed in the hypervisor. It’s a 1:1 mapping between the VM and the physical GPU, and no other VM can use that GPU while it’s assigned like this.

With vGPU, you need a driver in the hypervisor and in any VM that you want to give GPU access to. vGPU slices up the framebuffer to create various profiles, and it adds a lot of other functionality and flexibility.

On to your platform … Firstly, scale up, not out! …

Based on what you’ve mentioned above, if I were building it, I’d have 4 XenApp VMs, not 8. Increase the CPUs (not too much though) and the RAM for each XenApp VM, and give each one an 8A vGPU profile (vGPU profiles provide different functionality depending on the letter at the end of the profile name: A, B or Q). As the M10 has 4 GPUs on it, this makes more sense than having 8 XenApp VMs with 2 of them sharing each GPU. It’s also 50% fewer VMs to license and manage (unless you have datacenter licensing and are using MCS :-) ), there’s less CPU contention / scheduling for XenServer to deal with, and you’ll have more RAM to use overall.

The number in the vGPU profile is the amount of framebuffer that profile will allocate. So in the above case, an 8A vGPU profile allocates a full 8GB. The reason you want to do that is that each physical GPU on the M10 has 8GB, so by assigning all of that framebuffer you give sole access to the underlying GPU to that specific (XenApp) VM, which stops other VMs from using it.
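To put some numbers behind that, here’s a rough sketch (plain Python, just using the M10 figures above):

```python
# Tesla M10: 4 physical GPUs per board, 8 GB of framebuffer on each GPU.
GPUS_PER_M10 = 4
FB_PER_GPU_GB = 8

def vms_per_m10(profile_fb_gb):
    """How many VMs a single M10 board can host with a given vGPU profile size."""
    return GPUS_PER_M10 * (FB_PER_GPU_GB // profile_fb_gb)

print(vms_per_m10(8))  # 8A profile -> 4 VMs, i.e. one VM per physical GPU
print(vms_per_m10(4))  # 4A profile -> 8 VMs, two VMs sharing each physical GPU
```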

Page 4 of this will help with vGPU profiles: Virtual GPU Software User Guide :: NVIDIA Virtual GPU Software Documentation

What’s the spec of your XenServer hosts? (CPU, RAM, Network, Disk?)

Ben

Hi Ben

Thank you for your answers, they are very helpful.

Each XenServer host will probably have 2x Intel Xeon Gold 6136 CPUs (3.0GHz, 12 cores / 24 threads) and 512GB of RAM. We have ~1,600 users.

We also looked at the Intel Xeon Gold 6154 (3.0GHz, 18 cores / 36 threads).

You said it would make more sense to have 4 VMs per XenServer host, but wouldn’t we then have to buy too many XenServer hosts?

We asked a consultant, and he said we should have ~10 XenServer hosts with 8 VMs each and a user load of ~20 users per XenApp VM. In that case we would use the GRID profile "M10-4A".

What do you think about this configuration?

With the second CPU, he said we could have 12 VMs per XenServer host. With 8 VMs we would have to overcommit the CPUs, but this is not recommended.

By the way: is it possible to share the Tesla M10 GPU between 12 VMs?

Hi Rus

That post above did make me chuckle (160 users on a single M10) :-)

I’ve given my answer above on why you shouldn’t split the GPU between multiple XenApp VMs, and my answer still stands. I’m not going to comment on the other person’s opinion. However, if you’d like to see what happens to the user experience when you run 160 users on a single M10, then be my guest :-) I wouldn’t do it, and I certainly wouldn’t advise anyone else to do it in a production environment either. Yes, technically you can do it. I’m not sure if there’s a hard limit on how many users you can cram onto a single XenApp / RDS machine, but there comes a point where the user experience degrades as the resources are over-utilised …

Rather than just talk random specs, have you carried out any type of evaluation or POC yet? If not, you need to, as it will answer many of your questions.

The CPUs you’re looking at (both of them) will be a complete waste with standard office users on a XenApp deployment. They’re designed for performance, not density, and you’re designing for the latter. Drop the speed and increase the core count. There are 2.6GHz 16-core, 2.3GHz 18-core and 2.4GHz 20-core options for very similar money to what you’ve mentioned above; these would be a better choice for this use case in my opinion and would give you more options. You want to over-commit as little as possible to maintain the user experience.

Bottom line: if you want 160 users per physical server, you will need 2x M10s and more CPU cores. (Refer to my comment in the first paragraph about "Yes, technically you can do it …")

You haven’t looked at that document I linked to above, have you ;-)

You can slice up the framebuffer into a few different profiles depending on requirements. You can run 32 individual VMs per M10 with a 1GB vGPU profile (I’ve deliberately not mentioned the 64 x 512MB option), or you can run 4 with an 8GB profile, with a few different allocations in between. You can’t use what you don’t have, and GPU framebuffer can’t be over-committed, although by using XenApp you can work around some of the vGPU profile limitations. So with 8 VMs each allocated 4GB of framebuffer, the answer is no, you cannot run 12 VMs on a single M10. (12 x 4 = ???)
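To spell out that last bit (a rough sketch, using the same per-board figures as before):

```python
# An M10 has 4 GPUs x 8 GB = 32 GB of framebuffer in total, and it can't be over-committed.
TOTAL_FB_GB = 4 * 8

for profile_fb_gb in (1, 2, 4, 8):
    print(f"{profile_fb_gb} GB profile -> max {TOTAL_FB_GB // profile_fb_gb} VMs per M10")

# 12 VMs x 4 GB = 48 GB, which is more framebuffer than the board has:
print(12 * 4 > TOTAL_FB_GB)  # True -> not possible on a single M10
```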

Regards

Hi Ben

I understand your recommendation - Thank you very much for your answers - they make sense to me.

So, if I understood correctly:
You would run 4 XenApp VMs because, in that case, we can allocate the GPUs directly via passthrough and don’t have to "share" the GPU?

Would it make sense to run 8 XenApp VMs if we had two NVIDIA Tesla M10s? In that case we wouldn’t have to split them - as long as I understood you correctly :-)

As I said, we have ~1,600 users. So in our case we should run each XenServer host with 2x NVIDIA Tesla M10s and 4 XenApp VMs (maybe 8?), where every VM hosts (more than) 40 users - 160 users per XenServer host overall.

That would mean we’ll need 10 XenServer hosts for ~1,600 users. We also have to guarantee 100% redundancy, so overall we’ll need 20 XenServer hosts.
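Here is my rough calculation behind those numbers (just a sketch, assuming 160 concurrent users per host and a full second set of hosts for the redundancy):

```python
import math

total_users = 1600
users_per_vm = 40            # "more than 40" in practice
vms_per_host = 4
users_per_host = users_per_vm * vms_per_host              # 160

hosts_for_load = math.ceil(total_users / users_per_host)  # 10
hosts_with_redundancy = hosts_for_load * 2                # 20 (100% redundancy)
print(hosts_for_load, hosts_with_redundancy)
```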

What do you think about the user load with 4 XenApp VMs per host (>40 users each)? Do we really need 20 hosts, or is there a way to reduce the number of XenServer hosts? What would you recommend?

Of course, we will take your point about the CPU (cores) into account.

Thanks again for your support!

Regards,
Rus

Hi Rus

Nearly.

I’d still use vGPU, but with the 8A profile. You’ll get the benefit of monitoring through Director and some other nice things that are on their way VERY VERY soon ;-)

Slightly off topic, but if you were using ESXi as your hypervisor and you used passthrough, you would bind the VM to a specific GPU’s PCIe address. So if that GPU failed or was unable to accommodate VMs for whatever reason, the VM would fail to start. Or, if you were using MCS, you’d end up with all your XenApp VMs trying to use the same physical GPU, and all but one would fail to start. You’d then have to manually assign each one to another GPU, unless you used a clever script to automate that part. With vGPU, the system would start the VM on another functional GPU within the pool. But as XenServer works in a different way, this doesn’t apply to you. That said, I’d still use vGPU :-)

As for 4 or 8x VMs with 2x M10s, that’s an easy decision: I’d go for 8 VMs with an 8A profile on each. The M10 isn’t an expensive GPU, so adding another one to the same server to double the density doesn’t add much overall cost.

Your CPU choice will impact your server density as well. If you’re going to put 2x M10s in a server, you want the CPU to match so you don’t run out of CPU resources. Don’t forget the hypervisor needs CPU cycles to perform its duties as well, and the more load you put on it, the more resources it requires …

Look at the options I’ve mentioned above and have a think about the 20-core 2.4GHz part (the Xeon Gold 6148, if you’re interested). You could then see how allocating 10 cores to each VM performs. That’s a 2x over-commit on the physical cores, but you still stay within the bounds of the CPUs’ total thread count (80 across the 2 sockets), so it’s pretty good!
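To put numbers on that (a quick sketch, assuming 2x Gold 6148 and 10 vCPUs per XenApp VM):

```python
# 2x Xeon Gold 6148: 20 cores / 40 threads per socket.
sockets, cores_per_socket = 2, 20
physical_cores = sockets * cores_per_socket   # 40
hw_threads = physical_cores * 2               # 80 with Hyper-Threading

vms, vcpus_per_vm = 8, 10
allocated_vcpus = vms * vcpus_per_vm          # 80

print(allocated_vcpus / physical_cores)  # 2.0 -> 2x over-commit on physical cores
print(allocated_vcpus / hw_threads)      # 1.0 -> still within the total thread count
```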

However, bear in mind that testing may show that all 10 cores never get used because you run out of framebuffer or RAM beforehand. This is why testing through a proper POC is so important: you want all resources to max out at the same time so you don’t over- or under-spec and waste resources.

Memory-wise you’ve mentioned 512GB. That’s 64GB for each of the 8x VMs, except that the hypervisor will take 10+GB of it, so in practice it’s a little less than 64GB per VM - but that will still be plenty. Again, testing during your POC will show you the best way to spec them up.
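For completeness, the memory side of the same sketch (the ~10GB hypervisor overhead is only a rough estimate):

```python
host_ram_gb = 512
hypervisor_overhead_gb = 10   # rough estimate for the hypervisor / dom0
vms = 8

ram_per_vm_gb = (host_ram_gb - hypervisor_overhead_gb) / vms
print(ram_per_vm_gb)  # ~62.75 GB per XenApp VM - a little under 64 GB, but still plenty
```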

The more users you add to a single VM, the more chance there is of a single user impacting all the others with a single task (watching a YouTube video, multiple heavy browser tabs open, etc.). 20 users per VM is a nice number, and it scales quite nicely with 8x VMs per physical host and 1,600 users in total. VM management isn’t an issue if you use MCS / PVS, so it’s no problem to have more VMs (as long as you have the appropriate Windows licensing in place). Spreading the user load across more VMs will give better hardware utilisation overall, rather than trying to cram double the number of users onto half the number of VMs.

How you plan for redundancy is up to you; you know your infrastructure better than anyone else. Are you single- or multi-site? Do you want to plan for site redundancy or just N+* (* = the number of spare hosts)? There are lots of options to look at, but I can’t cover them here as I don’t know your infrastructure design :-)

It’s pretty clear that you’ll need 2x M10s per host, and, as said, you’ve already mentioned 512GB RAM, so the only option you really need to consider is the CPU. You could very easily get 3 servers with 2x M10s, 512GB RAM and the 3 CPU choices I’ve mentioned above, then try all 3 in a POC and see where any weaknesses / strengths are. You can then choose the best overall specification for your design.

Regards