First things first: before you do anything hardware related, check the application requirements to see how the software uses resources. It may be able to use multiple Cores, it may put more emphasis on the GPU, it may just want a single high-speed Core, or a bit of everything. If you can't find this in the system requirements documentation, contact the software vendor and ask them directly; you can then look at appropriate hardware to support it.
If the application is single-thread limited, then yes, to make life much easier for yourself, focus on the Base Clock and forget Turbo. It's obviously possible to get Turbo to work in a virtualised environment, but there are many hoops to jump through, and to be honest, it's not worth the hassle. You're then into Single Core Boost vs All Core Boost (CPUs offer different levels of Boost depending on how many Cores Boost at the same time, and that in turn is dependent on all sorts of variables) and trying to keep them engaged so you don't get peaky performance. It's a real pain to work with when virtualising, and it's far easier to opt for a high Base Clock from the start and not have to worry about it. For CAD applications that are predominantly single threaded, 3.0GHz is the minimum I'd work with whilst maintaining a balance against Core count. There are Xeons with a higher Base Clock, but you're then compromised on Cores. The best balance at the moment is the Gen 2 Scalable Xeon Gold 6254, which is 18 Cores @ 3.1GHz. You can obviously trade up or down (Cores vs Clock) depending on your requirements, but in my opinion this is the sweet spot.
Regarding vGPU Licensing … It's really simple … vApps, vPC and QvDWS are all licensed per CCU (concurrent user). So if you have 1800 users, but only 1000 connect to the platform at any one time, then you only need 1000 licenses. As for the type of license: if you're running RDSH you'll want vApps, for your normal single-user Windows VMs you'll need vPC, and for your 3D Workstations you'll potentially want QvDWS depending on the application requirements. All of those are per CCU.
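To make the CCU point concrete: the license count is your peak concurrency per license type, not your named-user count. A trivial sketch using the 1800 / 1000 example from above (the split across license types is purely hypothetical, just to show the idea):

```python
# vGPU licenses are per concurrent user (CCU), so count peak concurrency
# per license type, not total named users.
# The per-type split below is a made-up example, not a recommendation.
named_users = 1800
peak_concurrent = {
    "vApps": 700,   # RDSH sessions
    "vPC": 250,     # single-user Windows VMs
    "QvDWS": 50,    # 3D Workstations
}

total_licenses = sum(peak_concurrent.values())
print(f"{named_users} named users -> {total_licenses} CCU licenses needed")
```

So despite 1800 named users, you're only buying licenses for the 1000 who are connected at once.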
Honestly … forget the M10 at this stage. You can absolutely still buy it and it will be supported, but unless you already have them and are looking to scale out an existing M10 deployment, give them a miss. The M10 has been superseded three times architecture-wise and is lacking in features and functionality compared to the current generation. From what you've said above about the majority of your user base being Knowledge Workers (now referred to as "Digital Workers") and that you don't have any high-end CAD users, you should be looking at T4s for all of your workloads. You can use the same model of GPU, but use the Profiles to allocate different amounts of performance, for example:
RDSH VMs will use the T4-8A and you’ll have 2 of those per T4.
Single User VMs will have either the T4-1B or T4-2B depending on your requirements, and you’ll have 8 or 16 of those per T4.
3D Single User VMs “could” have T4-4Q and you’ll have 4 of those per T4.
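The per-card counts above fall straight out of the T4's 16GB of Framebuffer: the number in the Profile name is the Framebuffer per VM in GB, and the number of VMs per card is simply 16 divided by that. A quick sketch (the Profiles listed are just the ones discussed here, not the full set NVIDIA offers):

```python
# VMs per NVIDIA T4 (16 GB framebuffer) for a given vGPU profile.
# The profile name encodes the framebuffer per VM, e.g. T4-8A = 8 GB.
# Profile list below covers only the profiles discussed in this post.
T4_FRAMEBUFFER_GB = 16

profiles = {
    "T4-8A": 8,  # RDSH VMs
    "T4-1B": 1,  # single-user VMs (lighter requirements)
    "T4-2B": 2,  # single-user VMs (heavier requirements)
    "T4-4Q": 4,  # 3D single-user VMs
}

for name, fb_gb in profiles.items():
    vms_per_card = T4_FRAMEBUFFER_GB // fb_gb
    print(f"{name}: {vms_per_card} VMs per T4")
```

That's where the 2, 8 / 16 and 4 VMs-per-T4 figures come from.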
You mention HP above (I'm assuming this is your server platform), but don't mention your server hardware or generation (DL380 Gx … ?). If you're planning on purchasing completely new HP hardware, speak to your partner about the appropriate configuration. If you're retrofitting GPUs into existing servers, first make sure they're supported and that the BIOS and Firmware are fully updated, then purchase appropriate PSUs, PCIe Risers, GPU Enablement Kits (low-profile heatsinks for the CPUs, and GPU power cables if needed), potentially high-performance fans, and the GPUs themselves. When working with vGPU, it's important to have an up-to-date software stack: Hypervisor, Management (vCenter in your case), Operating System and vGPU Drivers. This prevents a lot of issues and allows the best performance and functionality. Before proceeding with vGPU, you also need to check your licensing situation. For VMware, at a minimum you'll want vCenter Standard and vSphere Enterprise, and ideally you'll be running 6.7U3.
Before setting up vGPU, make sure you get your vGPU License Server built and then vGPU licenses ordered through your Partner. Licenses can sometimes take up to 24hrs to arrive in your NVIDIA Portal, so best to get that kicked off from the start as your vGPUs have very limited (unusable) performance until licensed.
As per your question, the vGPU software comes in two parts. A .vib file is installed on each physical vSphere Host that will be running vGPU (this is referred to as the vGPU Manager); it allows you to allocate vGPU Profiles to your VMs through vCenter. The second part is a driver that goes inside the Windows / Linux guest OS.
Regarding configuration, it's relatively straightforward, with some extremely basic maths to work out approximate user capacity per VM as a ballpark number to aim for, pending a POC where you can firm things up. These are very general guidelines, and there are a lot of variables to consider, but on average you should be aiming for 20 - 25 concurrent users per vGPU-enabled RDSH VM (sometimes slightly more, sometimes slightly less), and 2 VMs per T4. This will really depend on your applications, how the users use those applications, platform and system hardware (including hardware generation) and overall optimisations etc. If we take the lower of those two numbers as a working figure (which combined is 40: 2 VMs, each with 20 users), we can work out how many GPUs we'll need to support 1000 concurrent users, and by extension, how many servers we'll need to support that number of GPUs.
1000 (Users) / 40 (Users Per GPU) = 25 (GPUs Required)
(This bit would be handy to know your server hardware and generation, so I’m just going to use the current Gen10 …)
HP DL380 G10 will support 5 T4 GPUs ( https://www.nvidia.com/en-us/data-center/tesla/tesla-qualified-servers-catalog/ ).
25 (GPUs) / 5 (The capacity of a DL380 G10) = 5 (DL380 G10 each with 5 T4 GPUs, each supporting 40 users)
So what this means, is that if you’re able to support 20 users per RDSH VM, and get 2 of those on 1 T4, then you’ll want 25 T4 GPUs, and 5 DL380 G10 servers to support it.
This scales up / down either way. If you can get 25 users on a VM, that’s 50 users per T4, long story short, you only need 4 DL380s instead of 5. If you can only get 4 T4s in your existing server hardware, then you’re going to need more DL380s etc etc … And don’t forget your N+1 for resilience and maintenance windows, or every time you want to work on a Host, you’ll be doing it out of hours, or removing it from service and impacting your total usable capacity … ;-)
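The whole ballpark calculation, including the N+1 host, can be sketched as follows. Every input here is a working assumption from this post (20 users per VM, 2 VMs per T4, 5 T4s per DL380 G10); swap in your own numbers once your POC firms them up:

```python
import math

# Ballpark vGPU sizing using the working numbers from this post.
# All inputs are assumptions to be validated in a POC.
concurrent_users = 1000
users_per_rdsh_vm = 20   # conservative end of the 20-25 range
vms_per_gpu = 2          # two T4-8A profiles per 16 GB T4
gpus_per_server = 5      # T4 capacity of an HP DL380 G10

users_per_gpu = users_per_rdsh_vm * vms_per_gpu
gpus_needed = math.ceil(concurrent_users / users_per_gpu)
servers_needed = math.ceil(gpus_needed / gpus_per_server)
servers_with_n_plus_1 = servers_needed + 1  # resilience / maintenance

print(f"{users_per_gpu} users per GPU -> {gpus_needed} GPUs "
      f"-> {servers_needed} hosts (+1 = {servers_with_n_plus_1})")
```

Bump `users_per_rdsh_vm` to 25 and you'll see the host count drop to 4, exactly as described above; drop `gpus_per_server` to 4 for older hardware and it rises again.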
FYI, your limiting factor will be vGPU Framebuffer, as you'll split the T4 into two 8GB vGPU Profiles and run the RDSH VMs with 8GB each, using the T4-8A vGPU Profile.
Now, as you’ll also have single user desktop VMs, as well as your 3D Workstation VMs, what I would do is run all of your RDSH VMs in a dedicated vSphere Cluster, that way there are no vGPU Profile issues and it makes scaling, migration and management really easy. Then create a second vSphere Cluster for your single user VMs that have a different vGPU Profile (Probably T4-2B), and depending on how many 3D Workstations you need you may even want a 3rd vSphere Cluster to support that vGPU Profile. It is possible to manage that from a single Cluster using the breadth first / depth first configuration settings in vCenter (Performance vs Density), and if you wanted to do that, you’d be constantly running in “Density” mode, so up to you how you’d like to configure it.
The reason it can be better to split the vGPU Profiles into dedicated Clusters is that you can't mix Profiles on the same physical GPU. So if a VM with a particular Profile starts up on a GPU, only other VMs with that same Profile can use that specific GPU. Despite a lot of customer frustration, this does make sense. A vPC workload is a very different workload to QvDWS or vApps, so why would you run those on the same GPU, or in some instances, even the same physical server? We also need to remember that this isn't about cramming as many users onto a single physical server as is humanly possible; it's about giving the best user experience whilst still delivering a cost-effective solution. If anyone wants to cram users onto the platform to the detriment of the user experience, then simply remove the GPU from the system altogether and you can do just that.
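That homogeneous-Profile rule can be expressed as a simple placement check. This is purely an illustration of the rule described above, not how the vGPU Manager actually schedules VMs:

```python
def can_place(gpu_profiles: list, new_profile: str, capacity: int) -> bool:
    """A GPU accepts a new VM only if it's empty, or already running
    the same vGPU profile with spare capacity. Illustration of the
    homogeneous-profile rule only, not the real vGPU scheduler."""
    if not gpu_profiles:
        return True  # empty GPU accepts any profile
    return gpu_profiles[0] == new_profile and len(gpu_profiles) < capacity

# An empty T4 accepts any profile; once a T4-8A VM is running,
# only more T4-8A VMs (up to the card's capacity of 2) can join it.
print(can_place([], "T4-8A", 2))                  # empty card: OK
print(can_place(["T4-8A"], "T4-8A", 2))           # same profile: OK
print(can_place(["T4-8A"], "T4-2B", 8))           # mixed profile: blocked
print(can_place(["T4-8A", "T4-8A"], "T4-8A", 2))  # card full: blocked
```

This is also why a stray VM with the wrong Profile can strand a whole GPU, and why dedicated Clusters per Profile keep things tidy.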
Right, I’ve rattled on for long enough. I hope at least some of that is useful. As said, recalculate some of those numbers and apply what’s useful to your own environment and hardware. Let me know if you’d like more detail about anything …