vGPU performance problems (Dell R730, VMware ESXi 6.7, Tesla P40)

Hello.
We acquired an NVIDIA Tesla P40 graphics card for use in virtualization.
Dell R730 server, E5-2667 v4, all-flash storage, VMware ESXi 6.7.0 (build 17098360).
We registered and received a 90-day trial license and installed the NVIDIA-GRID-vSphere-6.7-450.89-452.57 drivers. The licensing server runs on Windows Server 2012 R2 x64 (License Client Manager version 2020.05.0.28406365 x64).
The problem is this:
Regardless of the selected profile (1Q to 8Q), performance does not change, either in benchmarks or in day-to-day work. The main workload is CAD applications, and we measure performance with the Redway Turbine demo and the SolidWorks 2020 Performance Test. Are there performance limitations in the trial license? If not, how can we properly measure the performance of each profile?

I'm assuming you're using this in vGPU mode. Can you confirm which EUC platform you're using? Horizon or Citrix?

Also, can you confirm that:

  • Your VM starts with a hardware GPU (check the VM events in vCenter)
  • You see the video card hardware inside the VM
  • You've successfully checked out a license

I'll also note that in my testing, once I got to a suitable profile (2B), moving to a bigger profile didn't get me better performance. To be fair, I didn't do synthetic testing, just the real-world loads my users would hit. To be sure, though, I did remove the GPU to see how a non-accelerated load would perform.

One last thing: double-check the vGPU guide. NVIDIA qualifies specific vGPU/OS/hypervisor combinations, and a combination that doesn't match will give you bad performance.

  1. We are using the latest Horizon Client for Windows.
  2. All conditions are met: the vGPU is present in the VM's hardware, and the VDI VM obtains a license.
    Are there any recommended benchmarks to make sure vGPU is working properly?

Ah, not sure about benchmarks. I used real-world testing to get an idea of the load per user, monitored GPU and framebuffer usage directly, and scaled for our load.
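For the "monitor GPU and framebuffer usage directly" approach, a minimal Python sketch follows. It assumes `nvidia-smi` is on the PATH (it ships with the NVIDIA driver in the guest) and uses its standard `--query-gpu` / `--format=csv,noheader,nounits` options; inside a vGPU guest, `memory.total` should reflect the framebuffer of the assigned profile. The function names `parse_sample` and `poll` are hypothetical helpers, not part of any NVIDIA tool.

```python
import csv
import io
import subprocess
import time

# Fields accepted by `nvidia-smi --query-gpu`; `nvidia-smi --help-query-gpu` lists them all.
QUERY = "utilization.gpu,memory.used,memory.total"


def _to_int(field):
    """Convert one CSV field to int; nvidia-smi prints e.g. '[N/A]' when a value is unavailable."""
    field = field.strip()
    return int(field) if field.isdigit() else None


def parse_sample(csv_text):
    """Parse `--format=csv,noheader,nounits` output into one dict per (v)GPU."""
    samples = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 3:
            continue  # skip blank or unexpected lines
        util, used, total = (_to_int(f) for f in row)
        samples.append({"gpu_util_pct": util, "fb_used_mib": used, "fb_total_mib": total})
    return samples


def poll(interval_s=5.0, count=12):
    """Sample nvidia-smi `count` times; return peak GPU utilisation and peak framebuffer use."""
    cmd = ["nvidia-smi", "--query-gpu=" + QUERY, "--format=csv,noheader,nounits"]
    peak_util, peak_fb, fb_total = 0, 0, None
    for _ in range(count):
        out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
        for s in parse_sample(out):
            peak_util = max(peak_util, s["gpu_util_pct"] or 0)
            peak_fb = max(peak_fb, s["fb_used_mib"] or 0)
            fb_total = s["fb_total_mib"] or fb_total
        time.sleep(interval_s)
    return {"peak_gpu_util_pct": peak_util, "peak_fb_used_mib": peak_fb, "fb_total_mib": fb_total}
```

Calling `poll()` in the guest while a user works through a typical session gives a rough per-user load figure; comparing runs with and without the GPU attached gives the non-accelerated baseline.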

As for Horizon, I can't help much there. FWIW, we tried Horizon before we went with Citrix, since deployment seemed much simpler and licensing would've saved us 30%, but in a greenfield deployment that followed all the docs to a T, we found vGPU performance to be the same as pure CPU.

We're sure something was wrong with our setup, but we paid for a VMware-suggested consultant, who confirmed our setup looked perfect and referred us to support. VMware support then spent ~6 weeks throwing darts and had us rebuild and collect diagnostics before we finally asked for a refund and tried Citrix VAADS. With VAADS we were up and running, with great performance, within a day.

Hi

vGPU profiles 1Q through 24Q all have identical performance, assuming the default (Best Effort) scheduler is used and you are not framebuffer-limited. With Best Effort, a vGPU can use any GPU time the other vGPUs on the card leave idle, so a single test VM gets the whole GPU whichever profile it runs. Unless you are framebuffer-limited, increasing the amount of framebuffer with a bigger vGPU profile will not change the performance of the GPU or your application. If you start playing with the scheduler, that's a different conversation, but you already have the maximum performance available to you.
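To make the framebuffer point concrete, here is a small Python sketch of the arithmetic on a Tesla P40 (24 GB framebuffer). The Q profile sizes listed are the P40 Q-series profiles; what the profile changes is the fixed framebuffer slice and therefore how many VMs can share the card. The 90% "framebuffer-limited" threshold and 20% headroom are arbitrary assumptions for illustration, and the function names are hypothetical.

```python
# Tesla P40: 24 GB framebuffer, statically partitioned by the Q-series vGPU profile.
P40_TOTAL_GB = 24
P40_Q_PROFILES_GB = [1, 2, 3, 4, 6, 8, 12, 24]


def max_vgpus_per_gpu(profile_gb):
    """Density: the profile only sets the framebuffer slice, so VMs per GPU = total / slice."""
    if profile_gb not in P40_Q_PROFILES_GB:
        raise ValueError(f"{profile_gb}Q is not a P40 Q-series profile")
    return P40_TOTAL_GB // profile_gb


def is_framebuffer_limited(peak_fb_mib, profile_gb, threshold=0.9):
    """Heuristic (assumed threshold): peak framebuffer use at or above 90% of the profile."""
    return peak_fb_mib >= threshold * profile_gb * 1024


def smallest_adequate_profile(peak_fb_mib, headroom=0.2):
    """Smallest P40 Q profile holding the observed peak plus `headroom` (a fraction)."""
    needed_mib = peak_fb_mib * (1 + headroom)
    for gb in P40_Q_PROFILES_GB:
        if gb * 1024 >= needed_mib:
            return gb
    return None  # even 24Q is too small for this workload
```

For example, a VM peaking at about 1.5 GB of framebuffer is not framebuffer-limited on a 4Q profile, so moving it to 8Q would only halve the number of VMs per card (6 to 3) without adding GPU performance.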

A far better test would be to use a relevant benchmarking utility (Redway Turbine is not one), such as something from SPEC like their new SPECviewperf 2020 release. This is where you'll clearly see the difference between all of the vGPU profiles.

You've mentioned CAD as a main workload. CAD is typically CPU-limited, so if you require better CAD performance, you need newer hardware that can run better (newer-generation) CPUs, as the Intel v4 is an old architecture. Your clock speed is OK (faster is better), but the age of the CPU is really holding it back.

If you can't upgrade the server to an R740xd / R7525, there are a few things you can do to improve things with what you have:

1: Upgrade your virtualisation software. ESXi 6.7 is pretty old.
2: Make sure your server firmware, BIOS, etc. are all fully up to date.
3: Make sure your server BIOS is tuned for performance, and make sure ESXi can control it. There are lots of tuning guides out there for this.
4: Make sure the ESXi host itself is configured for performance (the High Performance power policy), not Balanced or other.
5: Make sure your CAD VMs are running on SSD / NVMe storage.
6: Make sure your VMs are running the latest supported version of Windows 10 (2004 at time of writing, even though 20H2 is now available), or Server 2019 (currently 1809) if running Server VDI, and that you've tuned / optimised the operating system using one of the many tools out there.
7: Make sure you're running the latest VMware Tools and vGPU software.

As a starting point, configure your CAD VMs with the following spec and optimize from there:

vCPUs: 4-6 cores @ 3.0 GHz+ (3.0 GHz minimum; faster is better, and adding more cores won't typically help)
System RAM: 12 GB (or more, depending on usage)
vGPU: 4 GB profile & QvDWS license (at least 4 GB, or more depending on model size, screen resolution, etc.)
Storage: SSD / NVMe
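The starting spec above can be written down as a simple checklist. The Python sketch below encodes only the thresholds from that spec; the `VMSpec` fields and `check_cad_spec` are hypothetical names, and the spec values would come from whatever inventory you already have (vCenter, a spreadsheet, etc.).

```python
from dataclasses import dataclass


@dataclass
class VMSpec:
    vcpus: int
    cpu_ghz: float
    ram_gb: int
    vgpu_fb_gb: int
    storage: str  # e.g. "ssd", "nvme", "hdd"


def check_cad_spec(vm):
    """Return how `vm` falls short of the suggested CAD starting spec (empty list = OK)."""
    issues = []
    if not 4 <= vm.vcpus <= 6:
        issues.append(f"vCPUs {vm.vcpus}: suggested 4-6 (more cores rarely help CAD)")
    if vm.cpu_ghz < 3.0:
        issues.append(f"clock {vm.cpu_ghz} GHz: suggested 3.0 GHz minimum")
    if vm.ram_gb < 12:
        issues.append(f"RAM {vm.ram_gb} GB: suggested 12 GB or more")
    if vm.vgpu_fb_gb < 4:
        issues.append(f"vGPU {vm.vgpu_fb_gb} GB: suggested at least 4 GB")
    if vm.storage.lower() not in ("ssd", "nvme"):
        issues.append(f"storage {vm.storage}: suggested SSD / NVMe")
    return issues
```

For instance, a 6-vCPU VM on the E5-2667 v4 (3.2 GHz base clock) with 12 GB RAM, a 4 GB vGPU profile and SSD storage returns an empty list.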

If you do that with your current platform, you've pretty much maxed it out. Any additional performance will need to come from optimizing your CAD models to suit your hardware, or from newer hardware.

Regards

MG