Please don’t use 2.1GHz for AutoCAD. Yes, it will load and run, but the user experience will usually be poor when zooming in and out and manipulating designs and objects. These CPUs are for entry-level workloads, typically classed as “Digital Worker”, not CAD.
Forget about Turbo; there are far too many caveats to getting it to work consistently and properly in a virtualised environment, and just because the boost clock can reach 3.2GHz doesn’t mean your users will always get that if or when the CPU decides to boost. On a physical workstation with no hypervisor, sure, but a virtualised environment is a whole different thing. Then you’re into “All Core” vs “Single Core” boost and all sorts of other variables that make performance inconsistent. A far simpler approach is to use a CPU with a faster base clock and forget about Turbo. The users will have a far more consistent experience, as the minimum they’ll be working with is 3.xGHz.
Either of these two CPUs is fine; personally, though, I’d go for the current version (Gen 2):
Gen 2 - 18 Core 3.1GHz
Gen 1 - 18 Core 3.0GHz
I typically use the 18 Core variants because they give more flexibility per Host and provide the best balance of Cores vs Clock (there are obviously other CPU variants with either a higher Clock or Core count for more specialised use cases). With most OEMs supporting 6–8 T4s per Host, having 36 physical cores will allow you to provide the best experience and give you the most headroom for additional density (it’s far easier to add GPUs than to change out the CPUs when you need additional capacity!).
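As a rough sanity check on that headroom, here’s a back-of-the-envelope density comparison. A minimal sketch in Python; the vCPU allocation and oversubscription ratio are my assumptions for illustration, not fixed rules:

```python
# Rough host-density sanity check (illustrative numbers, not a sizing rule).
# Assumptions: 2x 18-core CPUs, 4 vCPUs per CAD VM, and a conservative
# vCPU:pCore oversubscription ratio for CAD workloads.
PHYSICAL_CORES = 2 * 18        # dual-socket, 18 cores per socket
VCPUS_PER_CAD_VM = 4           # assumption: mid-range CAD VM
OVERSUBSCRIPTION = 2.0         # assumption: 2:1 vCPU:pCore for CAD

cpu_ceiling = int(PHYSICAL_CORES * OVERSUBSCRIPTION // VCPUS_PER_CAD_VM)
print(f"~{cpu_ceiling} CAD VMs per Host on CPU alone")

# GPU-side ceiling: e.g. 8x T4 (16GB each) carved into 4GB (4Q) profiles.
T4S_PER_HOST, PROFILE_GB = 8, 4
gpu_ceiling = T4S_PER_HOST * (16 // PROFILE_GB)
print(f"{gpu_ceiling} CAD VMs per Host on GPU alone")

# The lower of the two numbers is your real ceiling. Spare CPU headroom is
# exactly what lets you add GPUs later instead of swapping out CPUs.
```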
3.xGHz is a bit of a waste for typical Digital Workers, but you need to look at the overall bigger picture for the hosting environment and decide whether you want to run RDSH and CAD Workstations on the same physical Hosts. It would be a slightly unusual configuration, but it’s completely possible, as the Workstations and RDSH VMs would run on different GPUs. Use the software to carve up the platform, not the hardware.
Regarding how you support the RDSH and CAD VMs, you can do this in a couple of ways. You can either have two different CPU specifications in your hosting environment (one running 2.1GHz CPUs and the other 3.xGHz CPUs, one per workload), or you can have a single-specification environment where all Hosts are the same and you control performance through the Hypervisor (as mentioned above, again using the software layer, not the hardware). My preference would be the single specification, as this gives you more flexibility in terms of resilience, migration, maintenance and manageability, and it gives you a single price for scaling out the environment. You may or may not be using it, but this is something Hyperconverged is really good for, as everything comes in a single Node. Yes, Hyperconverged is overkill for just this customer, but look at the bigger picture if you have multiple customers …
For the RDSH VMs, allocate each of them the 8A vGPU Profile (8GB of FB) from the T4 (1x vApps license is required per CCU) and change the vGPU Scheduler to Fixed. Changing the Scheduler is really important for the RDSH VMs. Ensure that “GPU Consolidation” (named differently depending on your Hypervisor) is configured so that the RDSH VMs all end up on the same GPU. Alternatively, depending on your Hypervisor, you can even select which vGPU Profiles run on which specific GPUs. For example, you could have two T4s that will only run 8A vGPU Profiles, while all other T4s run every other vGPU Profile (except 8A). What this does is let you configure a different Scheduler (Best Effort or Fixed) on specific GPUs, so that specific VMs (CAD or RDSH) will only start on those GPUs and have appropriate access to them. You wouldn’t want the CAD VMs running on a Fixed Scheduler, for example.
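If your Hypervisor is ESXi, NVIDIA’s vGPU documentation covers switching the scheduling policy via the RmPVMRL module parameter (with a per-GPU variant, NVreg_RegistryDwordsPerDevice, for the mixed-GPU layout described above). A minimal sketch, assuming shell access on the host; double-check the values against the docs for your vGPU release, and note the host needs a reboot afterwards:

```python
# Sketch: set the NVIDIA vGPU scheduling policy host-wide on ESXi.
# RmPVMRL values per NVIDIA's vGPU docs:
#   0x00 = Best Effort (default), 0x01 = Equal Share, 0x11 = Fixed Share.
import subprocess

SCHEDULERS = {"best_effort": "0x00", "equal_share": "0x01", "fixed_share": "0x11"}

def set_vgpu_scheduler(policy: str) -> None:
    value = SCHEDULERS[policy]
    subprocess.run(
        ["esxcli", "system", "module", "parameters", "set",
         "-m", "nvidia", "-p", f"NVreg_RegistryDwords=RmPVMRL={value}"],
        check=True,  # raise if esxcli reports an error
    )

# Fixed Share on the Hosts/GPUs dedicated to RDSH; leave CAD on Best Effort.
set_vgpu_scheduler("fixed_share")
```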
For the CAD VMs, do some homework first; it’ll save you a lot of issues in the long run. For starters, find out what kind of monitor configurations the CAD users are currently running (1080P / 2x 1080P / QHD / 4K …) and whether there are plans to upgrade these in the future (if there are, factor that in now, not at a later date). This will at least give you an indication of which vGPU Profile to use initially. Unless they’re doing something crazy, you’ll more than likely be looking at a 2Q or 4Q vGPU Profile (2GB or 4GB of FB; a QvDWS license is required per CCU, and for this specific use case it’s easier to think of it as per Workstation). If you’re unsure, go up in vGPU Profile size, not down! The reason is simple … if you go down in vGPU Profile size trying to cram on as many users as possible to hit your ROI, build out the environment to that specification based on a specific user density (defined by vGPU Profile size) and cost model, and your users suddenly start:
- Running larger models
- Receiving an application upgrade (AutoCAD 2018 > 2019 > 2020)
- Introducing additional applications that haven’t been considered or evaluated
- Upgrading monitors from 1080P to 4K
- Updating Windows 10 to a newer version (e.g. 1803 > 1909 > …)
- … (this list is not exhaustive)
… then you need to increase the vGPU Profile size by just one step (2Q > 4Q on a T4) because there wasn’t enough headroom in your original configuration. As a direct result, you halve the density of your GPUs (and therefore the platform), meaning you either need to buy more physical servers or have to limit the customers in what they want to do; neither is a good option. If you’re unsure about monitor configurations, again, always go up: configure for 4K and you know you’ll be covered.
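To put numbers on that density cliff, here’s the arithmetic on a 16GB T4. A minimal sketch; it assumes time-sliced vGPU (one profile size per physical GPU) and ignores licensing and any per-host limits:

```python
# The density cliff when a profile bump is forced on a 16GB T4.
T4_FRAMEBUFFER_GB = 16

def vms_per_t4(profile_gb: int) -> int:
    # vGPU profiles pack homogeneously: one profile size per physical GPU
    return T4_FRAMEBUFFER_GB // profile_gb

def hosts_needed(users: int, t4s_per_host: int, profile_gb: int) -> int:
    per_host = t4s_per_host * vms_per_t4(profile_gb)
    return -(-users // per_host)  # ceiling division

print(vms_per_t4(2))            # 8 VMs per T4 on 2Q
print(vms_per_t4(4))            # 4 VMs per T4 on 4Q -- density halved
print(hosts_needed(128, 8, 2))  # 2 Hosts for 128 users on 2Q
print(hosts_needed(128, 8, 4))  # 4 Hosts for the same users on 4Q
```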
Something else you’ll ideally want to do is monitor utilisation on the CAD users’ existing workstations to see how much resource they’re currently using before deciding on an overall CAD VM profile. You can do this with a great little tool called GPUProfiler ( https://github.com/JeremyMain/GPUProfiler/releases ), created by a friend of mine (Jeremy Main) who works at NVIDIA and who manages and updates it. It’s a portable .exe, so no installation is required; just set the amount of time you want it to monitor for, and you can export the results as a .csv to check resource utilisation (it’s a fantastic tool). This will give you a pretty accurate idea of how much resource is being used. Run it on as many CAD users’ workstations as possible to get the best range of metrics. Don’t forget to factor in headroom when you configure your spec based on the results!
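Once you’ve got the .csv exports, it’s worth summarising them rather than eyeballing graphs. A minimal sketch; the column name below is an assumption of mine, so adjust it to whatever headers your GPUProfiler export actually contains:

```python
# Sketch: summarise a GPUProfiler CSV export for vGPU profile sizing.
# The column name is an assumption -- check your actual export headers.
import csv
import statistics

def summarise(path: str, column: str = "GPU Memory Usage (MB)") -> None:
    with open(path, newline="") as f:
        values = [float(row[column]) for row in csv.DictReader(f) if row.get(column)]
    peak = max(values)
    p95 = statistics.quantiles(values, n=20)[18]  # 95th percentile
    print(f"{column}: peak={peak:.0f}, 95th percentile={p95:.0f}")
    # Size against the percentile plus headroom, never the average.
    print(f"suggested minimum with 30% headroom: {p95 * 1.3:.0f}")

summarise("cad_user_capture.csv")  # hypothetical export filename
```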
This is something I see a lot. Customers forget that if they get the vGPU sizing wrong by trying to save money at the POC stage then, at best, it’s going to halve their server density in a production deployment when updates happen (which they will at some point). This is really important, and something I hardly ever see talked about.
Also remember that SSD / All-Flash storage is now the de facto standard for VDI / RDSH. NVMe is sometimes required, but only for higher-performance workloads. Typically (there’s always the odd exception), mechanical spinning disks are now considered for bulk storage only.
You haven’t mentioned which protocol you plan to use for the environment (HDX 3D Pro, Blast, PCoIP, TGX)? You need to check with the CAD users about their peripherals (3D space mice, tablets, etc.) and then make sure they’re all supported by your chosen Protocol.
Remember, if in doubt, go up in vGPU Profile size. You can always scale down if it’s too much, but you can’t (easily) go up if you start too low.
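If it helps to encode that rule of thumb, here’s an illustrative helper. The thresholds are placeholders of mine, not NVIDIA sizing guidance; validate them against your GPUProfiler data and a POC:

```python
# Illustrative only: pick a starting T4 profile from the monitor setup,
# rounding UP when in doubt. Thresholds are placeholders, not vendor guidance.
def starting_profile(monitors: int, resolution: str) -> str:
    heavy = resolution in ("QHD", "4K") or monitors > 2
    return "T4-4Q" if heavy else "T4-2Q"

print(starting_profile(2, "1080P"))  # T4-2Q
print(starting_profile(1, "4K"))     # T4-4Q
```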
Out of interest, which Thin Client solution are you planning to use?
The whole point of my comments above, and how I personally work, is to mitigate future developments and changes as much as possible so you don’t have to upgrade specifications for the foreseeable future; if you do, it will typically cause issues one way or another. A lot of things that would cause future issues can be mitigated if they’re taken into account at the beginning of the deployment. There are always things that catch us out, but the majority can easily be planned for. There’s nothing worse than a customer coming along and asking for more performance from a newly deployed platform, only for the system architects / engineers to realise they don’t have it without significant changes.
Apologies, I do tend to rattle on when I get going, but there’s a lot to consider and be aware of right from the start so the correct expectations can be set. When deployments under-perform or go wrong, those deploying them have a habit of blaming the technology, which is rarely at fault; typically it’s that the person deploying it didn’t understand what they were doing. Not saying that’s the case here, just covering all bases :-)
Best of luck! Let us know if you need any guidance.