XenDesktop vGPU PoC Application Issues

I have just set up a Proof of Concept VDI for a customer, with the aim of utilising NVidia GRID vGPU, but I have had major application compatibility issues :(
My setup is

  • HP ProLiant DL380 Gen 9, dual 10 Core CPU, 128GB RAM, 4x300GB 15k SAS (about 550GB local storage)
  • NVidia Grid K2
  • Citrix XenServer 6.5
  • Citrix XenDesktop 7.6 (recommended patches applied to server components, and VDA)
  • NVidia vGPU Drivers for XenServer 6.5 - Windows Display Driver (341.08) and GRID vGPU Manager (340.57)

I have created one base desktop image configured for vGPU, and created a Machine Catalog. I then modified the base image for passthrough GPU, and created another Machine Catalog.
This gave me side-by-side comparison of vGPU vs. vDGA, with the vGPU configured with GRID K240Q profiles and the vDGA getting one of the GPUs on the card passed through.
With the vDGA machine, essentially all of the software worked.
However, with the vGPU machine nearly anything that required OpenGL crashed in the NVOGLV64.DLL :(
The list of applications that don’t work is

  • 3DEqualiser4
  • Adobe After Effects CC 2014
  • Adobe Photoshop CC 2014 (it ran, but with no hardware acceleration)
  • Adobe Premiere Pro CC 2014
  • Autodesk AutoCAD 2015
  • Autodesk AutoCAD Architecture 2015
  • Autodesk Maya 2015
  • Hiero
  • Mari
  • MODO
  • Nuke
  • Silhouette
  • SolidWorks 2010
  • Toon Boom Harmony
  • Toon Boom Storyboard

I know that 3D acceleration is possible, as the Unigine Heaven benchmark works in both profiles (vGPU and vDGA) and in all rendering modes.
I really need some help to understand if

  • I have an issue on my setup
  • There is an issue in the NVidia VM driver
  • The applications just won't work until re-written to support vGPU environment

Most of the crashes take the following form

Faulting application name: AEGPUSniffer.exe, version: 0.0.0.0, time stamp: 0x53e05513
Faulting module name: nvoglv64.DLL, version: 9.18.13.4108, time stamp: 0x5452245c
Exception code: 0xc000001d
Fault offset: 0x0000000000d5fb10
Faulting process id: 0x1878
Faulting application start time: 0x01d049daf4c88fde
Faulting application path: C:\Program Files\Adobe\Adobe After Effects CC 2014\Support Files\AEGPUSniffer.exe
Faulting module path: C:\Windows\SYSTEM32\nvoglv64.DLL
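
For anyone collecting several of these records, here is a minimal Python sketch that splits an Application Error record into fields, names the exception code, and converts the start time to a date. The function names are my own, the NTSTATUS table is a small hand-picked subset, and I am assuming the "Faulting application start time" value is a Windows FILETIME (100 ns ticks since 1601-01-01 UTC).

```python
from datetime import datetime, timedelta, timezone

# Small hand-picked subset of well-known NTSTATUS codes (not exhaustive).
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000001D: "STATUS_ILLEGAL_INSTRUCTION",
    0xC0000094: "STATUS_INTEGER_DIVIDE_BY_ZERO",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
}


def parse_fault_record(text):
    """Split the 'Key: value' lines of a fault record into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first ': ' only, so paths like C:\... stay intact.
        key, sep, value = line.partition(": ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def decode_exception(code):
    """Map a hex exception code string to its NTSTATUS name, if known."""
    return NTSTATUS_NAMES.get(int(code, 16), "unknown")


def filetime_to_utc(hex_value):
    """Convert a hex FILETIME (assumed format) to a UTC datetime."""
    ticks = int(hex_value, 16)  # 100 ns intervals since 1601-01-01
    return datetime(1601, 1, 1, tzinfo=timezone.utc) + timedelta(microseconds=ticks // 10)


if __name__ == "__main__":
    record = (
        "Faulting module name: nvoglv64.DLL, version: 9.18.13.4108, time stamp: 0x5452245c\n"
        "Exception code: 0xc000001d\n"
        "Faulting application start time: 0x01d049daf4c88fde\n"
    )
    fields = parse_fault_record(record)
    print(fields["Faulting module name"])
    print(decode_exception(fields["Exception code"]))
    print(filetime_to_utc(fields["Faulting application start time"]))
```

Grouping records by faulting module and exception name makes it easy to see whether every crash shares the same signature.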

Some of the applications created crash dump files, and analysing those showed that the 0xc000001d exception (illegal instruction) was caused by an AVX instruction (I think). My only thought is that the memory operand of the instruction wasn’t correctly 16-byte aligned, but confirming that would require more debugging tools than I have access to.

Any help/pointers would be greatly appreciated, otherwise vGPU is pretty much of no use to this customer :(

What CPUs are you using? I’m going to guess they’re Haswell-based?

Thanks for the reply Jason.
The CPUs are Intel Xeon E5-2650 v3 @ 2.30GHz, so yes, they are Haswell-EP.
Is there a known issue with these processors with vGPU?

I have experienced the same issue, with the following setup:

NVidia Grid K1
Citrix XenServer 6.5
Citrix XenApp 6.0 (Windows 2008 R2)
NVidia vGPU Drivers for XenServer 6.5 - Windows Display Driver (341.08) and GRID vGPU Manager (340.57)

I managed to replace the Windows Display Driver with 347.52-quadro-tesla-grid-winserv2008-2008r2-2012-64bit-international-whql.exe. After that, I no longer get fault errors in nvoglv64.dll.

Similar issues have been reported since XenServer 6.5 was released, though that may just coincide with customers buying Haswell systems.

We have an updated driver package that should be released this week that has incorporated a workaround to address this issue.

Check our drivers download page later today for a new vGPU package for XenServer 6.5. Once you’ve downloaded and tested it, let us know if it resolves the issue.

Jason, you’ve made me a very happy man :)

I will do.
Thanks a lot for the update.

Preliminary Update

I have updated the XenServer driver and the Windows driver in the vGPU profile base image [NVIDIA GRID VGPU SOFTWARE RELEASE 340.78/341.44 WHQL], and initial testing has been 100% positive :)
The ones I quickly tested (it’s quite late here) are

  • Adobe Photoshop CC 2014
  • Autodesk AutoCAD 2015
  • Autodesk Maya 2015
  • Nuke
  • SolidWorks 2010

All ran with 3D acceleration. So looking very promising!
I will do some more thorough testing in a couple of days, when I visit the customer’s site.

Many thanks again for the information, and the heads-up on the new driver release.

Full Update

I dialled back in and tested all the remaining applications on my "crash" list

  • 3DEqualiser4
  • Adobe After Effects CC 2014
  • Adobe Premiere Pro CC 2014
  • Autodesk AutoCAD Architecture 2015
  • Hiero
  • Mari
  • MODO
  • Silhouette
  • Toon Boom Harmony
  • Toon Boom Storyboard

and all of them ran without crashing :)
So my issue has certainly been resolved by the driver update.

As an aside: currently CUDA/OpenCL is not supported in vGPU mode. Is this a technical issue (hardware limitation, etc.), or a driver limitation? There are a few applications that my customer is testing that do use CUDA/OpenCL for raytracing, etc., and while CPU is always a fallback, it would have been interesting to benchmark/compare CPU vs. vGPU to see what performance gains could be had.
Is there a roadmap to add support for CUDA/OpenCL to vGPU, or is there not enough perceived demand for it and just concentrating on OpenGL/DirectX visuals (rather than compute)?

Excellent, thanks for letting us know it’s resolved.

Onto the CUDA question.

First, it’s important to understand that vGPU shares resources based on a scheduler. It doesn’t allocate blocks of CUDA cores; instead, each VM is allocated a "slice" of the clock schedule. This allows us to increase a VM’s clock time if the GPU is not fully utilised, giving users a bump in performance when other users aren’t using the GPU fully.

Now, when using CUDA you would essentially be sending code directly to the GPU, and it will run until completion. If this exceeds the user’s scheduled time, it just keeps running and locks out the GPU resources for other users. Today there’s no mechanism to suspend or pre-empt the code before it completes, so it’s not a good situation for multiple users sharing resources!

This is the reason why CUDA support is not available for vGPU today, only for passthrough.
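
To make the lock-out effect concrete, here is a toy Python simulation. It is only a sketch of the idea: it is not how the GRID scheduler is implemented, and the round-robin order, the 4 ms slice, the work durations and the names (`simulate`, `vm_a`, ...) are all made up for illustration. Work marked preemptible yields at the end of its slice; work marked non-preemptible (standing in for a long compute kernel) runs to completion.

```python
from collections import deque


def simulate(queues, slice_ms):
    """Toy round-robin over per-VM work queues.

    queues: dict of vm name -> list of (duration_ms, preemptible) items.
    Returns dict of vm name -> time (ms) at which that VM first got the GPU.
    """
    pending = {vm: deque(items) for vm, items in queues.items()}
    order = deque(pending)  # VMs take turns in insertion order
    clock = 0
    first_run = {}
    while order:
        vm = order.popleft()
        first_run.setdefault(vm, clock)
        budget = slice_ms
        work = pending[vm]
        while work and budget > 0:
            duration, preemptible = work[0]
            if preemptible and duration > budget:
                # Run part of the item, then yield at the slice boundary.
                work[0] = (duration - budget, True)
                clock += budget
                budget = 0
            else:
                # Either it fits in the slice, or it cannot be interrupted
                # and simply runs to completion, overrunning the slice.
                work.popleft()
                clock += duration
                budget -= duration
        if work:
            order.append(vm)  # still has work: rejoin the back of the queue
    return first_run


if __name__ == "__main__":
    light = [(2, True)]
    # vm_a submits 500 ms of work that can be sliced: others wait a few ms.
    print(simulate({"vm_a": [(500, True)], "vm_b": light, "vm_c": light}, 4))
    # Same work, but it cannot be pre-empted: others wait the full 500 ms.
    print(simulate({"vm_a": [(500, False)], "vm_b": light, "vm_c": light}, 4))
```

In the first run the other two VMs get the GPU after 4 ms and 6 ms; in the second they wait 500 ms and 502 ms, which is the behaviour described above.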

Is it being developed for the future? Absolutely, we’re keen to ensure that vGPU can offer identical capabilities to a passthrough GPU including use for GPGPU, and it is a roadmap item, though I can’t share timelines at present.

Hi Jason,

Thank you for the explanation on CUDA/vGPU.
Yes, I can see that a pre-emptive scheduler would be required to handle the correct allocation of resource between vGPUs in the case of CUDA. I had read up on the "time-slicing" of the compute cores to each vGPU (after I noticed that the vGPU was reporting all cores to the VM, not a subset of them, unlike the RAM allocation), but didn’t know how it was actually achieved.
I am guessing it is some kind of round-robin queuing in the dom0 driver? As in, it accepts graphics "operations" from each VM and then goes round executing "operations" from each VM queue in sequence that has a pending operation? Or is it strictly timed via the dom0? If so, what is the size of the time-slices used? Pure curiosity on my part, so if "secret sauce" I understand if you don’t want to say :) Interesting to know the effect that the VM "sees" as any time-scheduling will cause a "pulsing" in activity: most of the time nothing, then pulses of "full power" on the GPU. Must be fun ensuring that this doesn’t cause any adaptive timing issues in the VM :)

The scheduler is actually in the GPU hardware, Dom0 isn’t aware of it because it happens at the hardware level. It works in exactly the same way on vSphere as it does on XenServer, no hypervisor involvement in the GPU virtualisation at all.

When a VM boots and the vGPU profile is attached to the physical GPU it’s effectively given a minimum guaranteed slice of time, but if more is available it can be utilised. All done in hardware so it’s really fast.
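
As a rough illustration of "guaranteed minimum, plus whatever is spare", here is a small Python model. It is purely a sketch under my own assumptions: the real work happens in hardware, and the equal 1/N guarantee, the even split of spare time and the name `allocate` are hypothetical choices, not NVIDIA’s algorithm.

```python
def allocate(demands):
    """Toy model of a guaranteed share with borrowing of idle time.

    demands: dict of vm name -> fraction of GPU time wanted (0.0 to 1.0).
    Assumes every VM is guaranteed an equal 1/N share (an assumption for
    this sketch); time a VM does not use is split among VMs wanting more.
    """
    if not demands:
        return {}
    guarantee = 1.0 / len(demands)
    # Step 1: nobody gets more than they ask for or than the guarantee.
    alloc = {vm: min(want, guarantee) for vm, want in demands.items()}
    spare = 1.0 - sum(alloc.values())
    hungry = [vm for vm, want in demands.items() if want > alloc[vm]]
    # Step 2: hand out spare time evenly until it is gone or nobody wants it.
    while spare > 1e-9 and hungry:
        share = spare / len(hungry)
        for vm in list(hungry):
            extra = min(share, demands[vm] - alloc[vm])
            alloc[vm] += extra
            spare -= extra
            if demands[vm] - alloc[vm] <= 1e-9:
                hungry.remove(vm)
    return alloc


if __name__ == "__main__":
    # One busy VM and three mostly idle ones sharing a physical GPU.
    print(allocate({"vm_a": 1.0, "vm_b": 0.1, "vm_c": 0.1, "vm_d": 0.0}))
    # Everyone busy: each VM falls back to its guaranteed quarter.
    print(allocate({"vm_a": 1.0, "vm_b": 1.0, "vm_c": 1.0, "vm_d": 1.0}))
```

In the first case the busy VM ends up with roughly 80% of the GPU rather than its guaranteed 25%; in the second, each VM gets exactly its guaranteed share.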

There’s a lot of clever behaviour in the cards and driver that is there to smooth things out for the application, and we have the Frame Rate Limiter which prevents users experiencing wild swings in FPS when they’re the only user on a physical GPU.