I should start by mentioning that I have cases open with VMware (the Horizon View team, the ESXi team, and the vCenter team), and I just opened a case with NVIDIA enterprise support. I’m exhausting all options.
I have a vSphere 6.7 environment using Tesla M10 GPUs in Shared Direct mode, with the latest 410 driver release. I am using the 4Q profile for all desktops, and each desktop is configured with 16GB of memory (all reserved), 8 vCPUs (as recommended by VMware for AutoCAD), and paravirtual disks for better disk performance.
I’ve had lots of different issues, some related and some not, but I will stick to the facts.
- Users are on a dual screen configuration (1920x1080)
- Users whose laptop GPU supports it use H.264 decoding; the others do not
- Users are hardwired
- Users connect to the View environment over an IPsec VPN tunnel (traffic is UDP) using VMware Blast. They connect to the Unified Access Gateway, which connects to an nginx load balancer, which connects to the View Connection Servers (there are two)
Users are reporting poor graphical performance and even poor general performance: lag and delays when typing, and sluggishness in Revit and AutoCAD. It seems to have gotten worse since the initial deployment. I have been trying to reproduce some of the issues from my office, but they seem fairly inconsistent. Users report the poor performance even in the evening, when 10 or fewer people are on the system.
I’ve been reviewing esxtop to check for CPU contention, and I am monitoring both sides of the VPN tunnel to watch the traffic flow (we aren’t using any QoS, but the local internet connection is used only for Horizon). I have been using the various nvidia-smi commands, but GPU utilization always seems relatively low (50% or less). I’ve been monitoring performance from within the virtual desktop as well. One curious thing I’ve noted is that the VM seems to think it has 2GB of dedicated VRAM when the 4Q profile should give it 4GB.
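One way to cross-check the 2GB versus 4GB observation is to compare the framebuffer size that nvidia-smi reports inside the desktop against the 4096 MiB the 4Q profile should provide. The following is a minimal sketch, not a definitive diagnostic: it assumes nvidia-smi is on the PATH inside the guest and returns plain numeric values for the queried fields, and the helper names (`parse_gpu_rows`, `framebuffer_mismatches`, `query_nvidia_smi`) are made up for illustration.

```python
import subprocess

# Framebuffer the M10-4Q profile is expected to expose, in MiB.
EXPECTED_FB_MIB = 4096

# Standard nvidia-smi --query-gpu fields.
QUERY_FIELDS = "name,memory.total,memory.used,utilization.gpu"


def parse_gpu_rows(csv_text):
    """Parse output of nvidia-smi --query-gpu=... --format=csv,noheader,nounits."""
    rows = []
    for line in csv_text.strip().splitlines():
        # Assumes four comma-separated fields with numeric values in the last three.
        name, total, used, util = [field.strip() for field in line.split(",")]
        rows.append({
            "name": name,
            "fb_total_mib": int(total),
            "fb_used_mib": int(used),
            "gpu_util_pct": int(util),
        })
    return rows


def framebuffer_mismatches(rows, expected_mib=EXPECTED_FB_MIB):
    """Return the rows whose reported framebuffer differs from the expected size."""
    return [row for row in rows if row["fb_total_mib"] != expected_mib]


def query_nvidia_smi():
    """Run nvidia-smi in the guest and return the parsed rows."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_rows(result.stdout)


if __name__ == "__main__":
    for row in framebuffer_mismatches(query_nvidia_smi()):
        print(f"{row['name']}: reports {row['fb_total_mib']} MiB, expected {EXPECTED_FB_MIB} MiB")
```

If the driver reports 4096 MiB here while the operating system's display properties show 2GB, the discrepancy may lie in how the guest OS reports dedicated memory rather than in the vGPU assignment itself; that is a hypothesis to verify, not a conclusion.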
I also have a weird issue where I cannot power on the base image because it says there are insufficient graphics resources available to the parent pool. However, when I increase the size of the pool, I can provision a new VM with no problem, and none of the KB articles related to this error seem to help.
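As a rough sanity check on the "insufficient graphics resources" error, the number of 4Q desktops a host can run at once can be worked out from the card layout. A Tesla M10 board carries four physical GPUs with 8 GB of framebuffer each, so a 4 GB profile fits two desktops per GPU and eight per board. The sketch below only does that arithmetic; the number of boards per host is not stated above, so `boards_per_host` is a placeholder, and the function names are made up for illustration.

```python
def max_vgpus(boards_per_host, profile_fb_mib=4096, gpus_per_board=4, fb_per_gpu_mib=8192):
    """Upper bound on simultaneously powered-on vGPU desktops on one host.

    Defaults describe a Tesla M10 (4 GPUs x 8 GiB) with a 4 GiB profile.
    """
    vgpus_per_gpu = fb_per_gpu_mib // profile_fb_mib
    return boards_per_host * gpus_per_board * vgpus_per_gpu


def free_slots(boards_per_host, powered_on_vms, **profile_kwargs):
    """Remaining vGPU slots; a powered-on base image counts as one VM."""
    return max_vgpus(boards_per_host, **profile_kwargs) - powered_on_vms


# Example with an assumed two boards per host and 15 desktops already running.
print(max_vgpus(boards_per_host=2))                      # 16
print(free_slots(boards_per_host=2, powered_on_vms=15))  # 1
```

This does not account for how vSphere places VMs across GPUs or hosts, so a non-zero result here does not guarantee that a power-on will succeed; it only rules out the case where the host is simply full.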
I honestly did not expect so many issues with this technology as I thought it was fairly mature by now, but clearly I must be missing something here?