M60 GPU Passthrough Scaling

According to a chat with a Citrite, the http://testdrive.cloud.com site has GRID K2 GPUs on the back end, and the Windows 10 VMs run in passthrough mode. A K2 in this configuration can reportedly handle around 20-30 sessions per engine, so is it fair to estimate that an M60 engine might handle something like 40-50 sessions in passthrough mode? Clearly these couldn’t be very heavy-duty applications, but the demos on that remote site were quite intensive, and even after running frequent tests at all sorts of times, I could not find a time when performance was at all sluggish. Here’s a video I took using a Raspberry Pi 3B (Citrix/Viewsonic edition) running Citrix Receiver/TLXOS that showcases a couple of the demos: http://bit.ly/29RQHyt

We’re interested in scaling for a student environment that would run apps like Google Earth, ArcGIS and various Autodesk products. We have two Dell R730 servers, each containing a pair of 14-core Intel Xeon E5-2680 v4 CPUs, 256 GB of RAM and one M60. Some guidelines on XenApp configurations for passthrough would be useful (number of vCPUs, RAM, and whether NUMA is a major concern). The plan was to run this under XenServer and allocate perhaps 12 vCPUs and 62 GB of RAM per XenApp VM, so that if we later added a second M60, up to four XenApp VMs could fit on each server without first running out of CPU power. Is this a realistic configuration?
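To make the arithmetic behind that layout explicit, here is a rough Python sketch. It assumes hyper-threading is enabled (56 logical CPUs per R730), that each passthrough XenApp VM owns exactly one of the M60's two GPU engines, and that whatever the VMs don't claim is left for dom0. The `Host` class and `plan()` function are hypothetical helpers, and the 40-50 sessions per engine figure is only the unverified guess from above.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical description of one XenServer host (not a real API)."""
    sockets: int
    cores_per_socket: int
    threads_per_core: int  # 2 assumes hyper-threading is enabled
    ram_gb: int
    m60_boards: int
    engines_per_board: int = 2  # a Tesla M60 board carries two GPUs

    @property
    def logical_cpus(self) -> int:
        return self.sockets * self.cores_per_socket * self.threads_per_core

    @property
    def gpu_engines(self) -> int:
        return self.m60_boards * self.engines_per_board


def plan(host: Host, vcpus_per_vm: int, ram_per_vm_gb: int,
         sessions_per_engine: tuple[int, int] = (40, 50)) -> dict:
    """Budget one passthrough XenApp VM per GPU engine and report leftovers."""
    vms = host.gpu_engines  # passthrough: one whole engine per VM
    vcpus_used = vms * vcpus_per_vm
    ram_used = vms * ram_per_vm_gb
    lo, hi = sessions_per_engine  # unverified guess, not a sizing figure
    return {
        "vms": vms,
        "vcpus_used": vcpus_used,
        "vcpus_left_for_dom0": host.logical_cpus - vcpus_used,
        "ram_used_gb": ram_used,
        "ram_left_for_dom0_gb": host.ram_gb - ram_used,
        # Rough NUMA check: can one VM's vCPUs fit on one socket's physical cores?
        "vm_fits_one_socket": vcpus_per_vm <= host.cores_per_socket,
        "sessions_estimate": (vms * lo, vms * hi),
    }


if __name__ == "__main__":
    for boards in (1, 2):
        r730 = Host(sockets=2, cores_per_socket=14, threads_per_core=2,
                    ram_gb=256, m60_boards=boards)
        print(boards, "x M60:", plan(r730, vcpus_per_vm=12, ram_per_vm_gb=62))
```

With two M60s, four 12-vCPU/62 GB VMs take 48 of the 56 logical CPUs and 248 of the 256 GB, leaving 8 threads and 8 GB for dom0, which is worth checking against XenServer's own dom0 requirements. Since 12 vCPUs is below the 14 physical cores per socket, each VM could in principle be kept within one NUMA node.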

One other question: are there any good load-testing routines out there to simulate multiple users in passthrough mode?

Hi Tobias

Any reason why you want to use passthrough rather than vGPU? With the K2, there were differences in features and performance between passthrough and the K280Q profile, but with the M60 I’ve seen no real discernible difference in performance between passthrough and the 8Q vGPU profile, and CUDA (if you need it) is available through the vGPU profiles, whereas it wasn’t with the K2.

Regarding NUMA, I’ve not run into any issues with it yet. I don’t believe it’s as much of a concern as it used to be, thanks to advances in technology and performance.

XenServer, love it, no issues there, especially the current one.

It is a realistic configuration, but (and I’m aware of who I’m talking to here, so I’m sure you’ll know all this ;-) ) there are plenty of variables that affect density and overall experience. The biggest variable you’ve listed is Autodesk, which has a wide variety of products, all with differing resource requirements. Depending on which Autodesk products you plan to use, you may struggle to reach the density you’re aiming for, especially if all 40-50 users decide to use that product at the same time. It also depends on how the individual application is used, the type of workload, etc. I’ve seen a user launch a published application, stretch it across two 4K monitors and start working, so the application effectively ran at 8K… So there are lots of variables that will affect your capacity, some of which may still be unknown.
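For a sense of scale on that last example, here is a small Python calculation: two 4K (3840×2160) panels side by side give a 7680×2160 desktop, as wide as an 8K frame and with eight times the pixels of a single 1080p screen. Treating pixel count as a proxy for rendering and encoding load is a simplifying assumption, not a measured result.

```python
# Pixel arithmetic for the "two 4K monitors" example.
# Assumption: pixel count is used only as a rough proxy for GPU/encoder load.

UHD_4K = (3840, 2160)
FHD_1080P = (1920, 1080)


def pixels(resolution: tuple[int, int]) -> int:
    width, height = resolution
    return width * height


# Two 4K panels placed side by side form one wide desktop.
spanned = (UHD_4K[0] * 2, UHD_4K[1])

print(spanned)                                # (7680, 2160)
print(pixels(spanned))                        # 16588800
print(pixels(spanned) / pixels(FHD_1080P))    # 8.0
```

Note that a full 8K UHD frame (7680×4320) would be twice this again; the "8K" here refers to the spanned width.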

There are numerous ways to design a XenApp Site, so don’t think you have to cram it all onto one VM. The best way is to have a go at it and see how you get on with capacity; what you’ve listed sounds like a good starting point. You may decide to silo your heaviest apps onto dedicated resources for guaranteed performance and minimal impact on other sessions, or to scale out instead of up. There are lots of different options to play with; it depends on what kind of experience you’re looking to deliver :-)

Hi, Ben: This is for students, so whatever CAD/CAM-type app they run, it won’t be anything fancy or overly complex. They also aren’t engineers cranking things out at lightning speed with constant flicks of the wrist; there will be a lot of pausing to look and learn, so the load from any individual won’t be that great or that continuous, and in most cases only a single monitor will be in use. With passthrough you of course have to dedicate the whole engine to one VM, so the idea is to get as much out of it as possible on a given server. I know of some sites with 128 GB devoted to a single XenApp + GPU instance. As to why passthrough, there’s also the simple matter of cost. It’s cheaper, and since the VMs are used inconsistently, it would waste resources and cost more to pin vGPUs to a number of idle VMs, taking that capacity away from others; it also opens up better options for remote BYOD-type access.

Ultimately, you’re right that it will probably take some experimentation to see how it works out, plus some tweaking to make the most of the resources. Appreciate your feedback, of course!