3D load impacts NVENC performance

AHB · April 11, 2017, 2:40pm

We are observing drops in the achievable XenDesktop frame rate when the 3D load increases. We can achieve 50 FPS using NVENC with little 3D load, but this drops to 15 FPS when the 3D load increases (for example, due to increased model complexity or running something like the Unigine Heaven demo).

The issue can be observed on a dual display XenDesktop by playing a video on one monitor then introducing the Unigine Heaven demo on another.

We’ve isolated this to some sort of interference between the graphics processing and hardware encoding units on the vGPU. As 3D GPU utilisation increases, hardware encoder utilisation drops, as does the frame rate. The GPU is nowhere near maximum load and should have plenty in reserve.

Tech docs indicate the hardware encoder performance should not be affected by the CUDA load, with some exceptions such as temporal AQ. However, there is clearly significant interference occurring.

Setup is Tesla M60, XenDesktop, Windows, vGPU, vSphere/ESXi, everything recent at time of post. Dev/test network with little load.

This seems to be the last barrier preventing us from achieving near bare-metal performance for dynamic 3D model visualisation over NVIDIA Grid/XenDesktop.

Having reviewed the docs, some possible theories/strategies I’ve come up with include:

Frame buffer reading is being held up by slower redraws
This is a software/driver issue and we should report it
vGPU Frame Rate Limiter setting might help
Temporal AQ is being used and is impacted by CUDA load.
Triple buffering might help??

Any assistance is appreciated.

AB

AHB · April 11, 2017, 10:59pm

Looking into this further, we’ve noticed that performance is initially high at 60 FPS, then after a short period (10 s or so) of viewing a more complex part of the model (a 3D city model in this case) it drops dramatically to around 20 FPS.

After scrolling away from the complex area of the model (i.e. the CBD) to a less complex part (suburbs/rural), the performance returns to 60 FPS.

Previous theories are probably invalid. Now looking at:

Cooling setup
Power management, in particular correct BIOS support and fan control (this is a somewhat cobbled-together Dell eval system)
Software/Drivers

We’ll instrument the Tesla to see what insights we can gain.
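
One possible way to do that instrumentation is to log utilisation samples from `nvidia-smi dmon -s u`, which prints SM, memory, encoder and decoder utilisation per GPU. The sketch below is only an illustration: it assumes `nvidia-smi` is on the PATH of the machine where it runs, that the first `#` header line names the columns, and that unsupported values appear as `-`. The function names `parse_dmon` and `sample_utilisation` are made up for this example.

```python
import subprocess


def parse_dmon(text):
    """Parse `nvidia-smi dmon` text output into a list of dicts.

    Assumption: the first line starting with '#' names the columns
    (e.g. gpu, sm, mem, enc, dec); later '#' lines (units, repeated
    headers) are ignored.
    """
    columns = None
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            if columns is None:
                columns = line.lstrip("#").split()
            continue
        if columns is None:
            continue  # data before any header: cannot be interpreted
        values = line.split()
        if len(values) != len(columns):
            continue  # skip malformed rows rather than guess
        row = {}
        for name, value in zip(columns, values):
            # Non-numeric entries such as '-' are stored as None.
            row[name] = int(value) if value.isdigit() else None
        samples.append(row)
    return samples


def sample_utilisation(count=10):
    """Run nvidia-smi dmon for `count` samples and return parsed rows."""
    result = subprocess.run(
        ["nvidia-smi", "dmon", "-s", "u", "-c", str(count)],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_dmon(result.stdout)
```

Calling `sample_utilisation(30)` while flying over the complex part of the model would give a list of rows whose `sm` and `enc` values can be compared against the frame rate reported by HDX Monitor.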

sschaber · April 12, 2017, 7:26am

Hi AHB,

which FPS do you mean in your tests? Is it the FPS within the application or the session?

Regards

Simon

AHB · April 13, 2017, 2:28am

FPS is as reported by Citrix HDX Monitor which is polling the virtual machine from the native client OS.

Problem is not temperature - Tesla is running cool at around 44C.

Problem appears to be that the M60 hardware encoder is dying when the 3D load increases. When the vGPU load passes around 35% the hardware encoder utilization drops sharply from 20% to below 10% and the FPS plummets. Encoder utilization starts climbing again when vGPU utilization reduces below around 35%.

All data are as measured by GPUProfiler v1.04.
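
To check the apparent 35% threshold more systematically, logged samples could be grouped by GPU utilisation and the mean encoder utilisation compared per group. This is a rough sketch only: it assumes the samples are available as `(gpu_util, encoder_util)` pairs in percent (however they were exported), and the function names and the "half of baseline" cutoff are arbitrary choices for illustration, not anything GPUProfiler provides.

```python
from collections import defaultdict
from statistics import mean


def encoder_by_gpu_bucket(samples, bucket_width=10):
    """Mean encoder utilisation per GPU-utilisation bucket.

    `samples` is an iterable of (gpu_util, encoder_util) pairs in percent.
    Returns a dict mapping each bucket's lower bound to the mean encoder
    utilisation of the samples that fell into it, in ascending order.
    """
    buckets = defaultdict(list)
    for gpu_util, enc_util in samples:
        lower = (int(gpu_util) // bucket_width) * bucket_width
        buckets[lower].append(enc_util)
    return {lower: mean(values) for lower, values in sorted(buckets.items())}


def first_drop_bucket(bucket_means, fraction=0.5):
    """Lowest bucket whose mean falls below `fraction` of the baseline.

    The baseline is the mean of the lowest GPU-utilisation bucket.
    Returns None if no bucket drops that far or there is no data.
    """
    if not bucket_means:
        return None
    lowers = sorted(bucket_means)
    baseline = bucket_means[lowers[0]]
    for lower in lowers[1:]:
        if bucket_means[lower] < fraction * baseline:
            return lower
    return None
```

If the interference really is threshold-like, `first_drop_bucket` should keep returning the same bucket across runs; a drift with model or resolution would point to something other than a fixed utilisation limit.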

AHB · April 17, 2017, 8:28pm

There look to be two issues contributing to the visual degradation during the 3D model flyover:

Reduction in frame rate under 3D load
Visual jitter/flicker/tearing of 3D primitives within the model (e.g. a structure such as a column in a building).

I’ve run Unigine Heaven demo at ‘High’ quality on 1920x1200 single display and the FPS ranges from 30 to 60 and is generally close to the native FPS value within Unigine. Display quality is generally very good with none of the tearing/jitter we are seeing with the 3D city model in TerraExplorer.

I’ve investigated the following settings but none have resulted in any noticeable improvement:

Disable Aero theme on client/VDA ends.
Disable Off Screen Surfaces (using the .ini file setting on the receiver client)
Increase Display Memory Limit to max (using policy)
Set NVIDIA Frame Rate Limiter setting to 30 FPS (using registry on VDA virtual host)
CPU/RAM - VDA host has 8 cores (Xeon E5-2667 v4, 3.2/3.6 GHz), 32 GB RAM, Tesla M60; client has dual Xeon, Quadro M2000, stacks of RAM - plenty of grunt here.

Further things to investigate:

Storage bottleneck (check virtual and native storage setups are equivalent)
CPU Only Encoding (Disable NVENC in policy)
Disable HDX 3D Pro (re-install VDA without HDX 3D; depending on the outcome above, might isolate the issue to frame buffer capture)
Passthrough GPU (isolate to vGPU)
Linux Client (isolate to Windows Receiver)
System Display Memory (unlikely, but worth a check)
Legacy Graphics Mode (clutching at straws)
Alternate card - M10, M4000, M2000 (more straw clutching)

AHB · April 21, 2017, 11:16am

Thanks for the advice Martin, the references you provide give us some other things to look at.

FPS issue is not in the application as the native performance is seamless.

We’ve established that configuring CPU encoding results in significantly improved performance (> 30 FPS from memory).

We are pretty sure this is a hardware encoding issue with one of:

XenDesktop VDA
NVIDIA GRID Drivers
Hypervisor (ESXi)

NVIDIA Australia are looking into it with us.