XenDesktop 7.1 VDI slow graphic performance with 3D Applications

We have a K1 GRID Card installed in a PowerEdge R720.
We’ve configured the vGPU and pass-through appears to be working correctly.
However, when we run Solid Edge ST4, the rendering is choppy when we try to rotate the image.
On a desktop, it’s smooth.

What applications are performing slowly?
Solid Edge ST4 (Basically any program that uses 3D Graphics)

What versions of XD/XS are you running?
XenDesktop 7.1
XenServer 6.2 (Build 1377)

Are you using pass-through or vGPU?
vGPU (1GB Profile)

What vGPU driver version?

What NVIDIA Driver version in VM?

What client are you using for the passthrough instance? If you are running from a XenDesktop 7.1 VM instance, you would normally have a XenApp host configured to tap into if you are using passthrough mode, or you would otherwise be assigning the full passthrough of one of the K1 engines to a single XenDesktop VM. Maybe you can go into a little more detail on your configuration. Also, if you run nvidia-smi.exe on the Windows instance through which the passthrough is supposedly running, do you see a load on the K1? And the rendering: is it OpenGL, DirectX, or what exactly?

It sounds like the connection is the likely cause, but let's eliminate a few other things first.

Are you seeing the same choppy performance with passthrough as with vGPU?

Can you report on the framerate inside the VM?

What workstation card are you comparing it to?

Remember that each GPU on a K1 is equivalent to a K600, the entry-level card in the current Quadro range. If you're using a K140Q profile, with 4 users per GPU, you're making 25% of that GPU available to each user.

What's the load on the CPU, and what is its clock speed?

We’re using the Citrix Receiver to connect to the VDI Session.
Currently, we’re using the K140Q Profile.
One end user tries it in the VDI session and on his Precision T1700 with an NVIDIA Quadro K600.
Another is using a Vostro 220 with an NVIDIA GeForce 8400 GS.

The first user is in Colorado while the second user is in the same building as the server.

We have dual 10-core E5-2680 v2 CPUs and are using less than 20% of the total resources.

As far as nvidia-smi goes, the results are below.

C:\Program Files\NVIDIA Corporation\NVSMI>nvidia-smi.exe
Thu Oct 09 11:03:39 2014
+------------------------------------------------------+
| NVIDIA-SMI 332.83     Driver Version: 332.83         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        TCC/WDDM     | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GRID K140Q        WDDM   | 0000:00:0B.0     Off |                  N/A |
| N/A   N/A    P8    N/A /  N/A |    915MiB /   959MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                          Usage |
|=============================================================================|
|    0       708  Insufficient Permissions                                N/A |
|    0      3400  Insufficient Permissions                                N/A |
|    0      2440  C:\Program Files\Solid Edge ST4\Program\Edge.exe        N/A |
+-----------------------------------------------------------------------------+

C:\Program Files\NVIDIA Corporation\NVSMI>

You have to run nvidia-smi at the XenServer console for vGPU sessions, not inside the VM itself.

Can you also measure frame rate inside the VM?

@Jason: But for GPU passthrough, you do have to run nvidia-smi.exe on the XenApp host if it is hosting the service, right? It was confusing (at least to me) to properly interpret what Dave's configuration looks like. I thought for some reason he had a XenApp server set up in GPU passthrough mode. @Dave: Are you just running a VM on XenServer as either a vGPU or GPU passthrough instance?

Hi Tobias,

Yes, for passthrough you run nvidia-smi inside the VM; for vGPU, it's at the host console.
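To make the distinction concrete, here's a rough sketch of where to run the tool in each mode (paths and prompts are the typical defaults, not taken from Dave's setup; adjust for your environment):

```shell
# vGPU: run nvidia-smi in dom0 at the XenServer console. Each
# /usr/lib/xen/bin/vgpu process listed corresponds to one vGPU-backed VM.
nvidia-smi

# Passthrough: run nvidia-smi.exe inside the Windows VM that owns the GPU
# (default driver install path shown; adjust if yours differs).
"C:\Program Files\NVIDIA Corporation\NVSMI\nvidia-smi.exe"
```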

Dave described his environment in his first post: it's XenDesktop with vGPU. That's also reinforced by the K140Q profile and the "Insufficient Permissions N/A" entries in the nvidia-smi report above.

Thanks everyone for your responses.
Can anyone recommend a program/utility that I can use to test the frame rate on the VDI desktop?

Below is the nvidia-smi report from the XenServer Console.

[root@ie-xen01 ~]# nvidia-smi
Fri Oct 10 08:10:55 2014
+------------------------------------------------------+
| NVIDIA-SMI 331.59     Driver Version: 331.59         |
|-------------------------------+----------------------+----------------------+
| GPU  Name     Persistence-M   | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GRID K1             On   | 0000:44:00.0     Off |                  N/A |
| N/A   42C    P8   10W /  31W  |      9MiB /  4095MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GRID K1             On   | 0000:45:00.0     Off |                  N/A |
| N/A   42C    P8   10W /  31W  |   3884MiB /  4095MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GRID K1             On   | 0000:46:00.0     Off |                  N/A |
| N/A   31C    P8   10W /  31W  |      9MiB /  4095MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GRID K1             On   | 0000:47:00.0     Off |                  N/A |
| N/A   35C    P8   10W /  31W  |    265MiB /  4095MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                          Usage |
|=============================================================================|
|    1     20635  /usr/lib/xen/bin/vgpu                                960MiB |
|    1     16295  /usr/lib/xen/bin/vgpu                                960MiB |
|    1     18106  /usr/lib/xen/bin/vgpu                                960MiB |
|    1     24440  /usr/lib/xen/bin/vgpu                                960MiB |
+-----------------------------------------------------------------------------+

@Dave: We use various Unigine routines, in particular, the "Heaven" benchmark.

Hello Dave,

Some things to explore besides looking at the framerate and passing through the whole GPU for comparison as pointed out above:

Looking at the system requirements for Solid Edge ST4, and considering that you report one of your users getting a good local experience with a K600 card, you should be getting a good remoting experience with your current setup as long as only one user is connected per GPU. It is, however, on the low side of the recommendations. When you later scale out to 4 simultaneous users per K1 GPU (16 users per board), I suspect you will notice performance issues, as the users will then be sharing the cores as well as the memory. Given Solid Edge ST4's system requirements (a high processing-to-memory ratio), I would probably have tried slicing a K2 GPU into 8 vGPUs instead (the K220Q profile). Just something to keep in mind in case you do see performance degradation later on when you scale up. If not, then great!

You report low CPU usage, but is that at the VM level or the host level? If you have hyper-threading enabled on the host (recommended), haven't configured the VM with 4 vCPUs (including enabling Windows to use them all by setting the cores-per-socket parameter), and/or haven't disabled Aero by policy, it might be a CPU-related performance issue as well. (Disabling Aero is also a Solid Edge recommendation when working with View and Markup or the Solid Edge Viewer.) Note that you might be able to use fewer vCPUs per VM without limiting Solid Edge ST4 performance if the users aren't running many other processes simultaneously, but you should start testing with 4.
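As a sketch of the vCPU part using XenServer's xe CLI (the VM name below is hypothetical, and the VM must be halted before changing its vCPU count):

```shell
# Look up the VM's UUID (replace the name-label with your VM's name)
xe vm-list name-label=SolidEdge-VM params=uuid

# With the VM halted, assign 4 vCPUs
xe vm-param-set uuid=<vm-uuid> VCPUs-max=4
xe vm-param-set uuid=<vm-uuid> VCPUs-at-startup=4

# Present them as a single 4-core socket so client editions of Windows
# (limited to 2 sockets) can actually use all four
xe vm-param-set uuid=<vm-uuid> platform:cores-per-socket=4
```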

You also state that one of your users is connecting from the same building, which I assume is by cable over your LAN. Is that correct?

One more thing: You mentioned that the remoting experience is "choppy". What Citrix policies do you have applied at the moment?

You will be able to get a great remoting experience with Solid Edge so don’t give up.

I have been working with Dave on this issue. We have built a fresh VM with 4 vCPUs (I made the cores-per-socket change in XenServer to get them all present). This does result in Solid Edge loading faster.

We have set up this VM to use the 1 GB profile as well as GPU pass-through. This has made no difference (in fact, the entire VM seems worse than before).

As for Citrix policies, we have various printer policies set and the following: https://www.dropbox.com/s/o6x09k3xnvemqoh/Capture.JPG?dl=0

Hi Michael,

Quick reply while on the run:
That’s good to hear. Getting a bit closer.

When switching between vGPU and passing through the entire GPU, remember to update the driver. You can find both drivers here, under GRID: http://www.nvidia.com/Download/index.aspx?lang=en-us

However, looking at your current policies, I would start by changing those. The exact policies you should apply depend on your use cases and may vary between user groups, WAN, LAN, peripherals such as 3D mice, etcetera. The Citrix eDocs are an excellent resource for finding out which policies work together or oppose each other, and in which cases they are applicable.

As a quick test, try the following:
Uncheck: color compression (back to unconfigured/default)
Edit: audio to medium, display memory to 131072, frame rate to 30 (or higher if you have the bandwidth)

I'm only able to use my phone right now and can't review it properly, but this should give you a better baseline than the one you are currently using, before you start optimizing. How does that work for you?

Also, don't use a thin client as a test device, if you are. They often require a bit of optimization as well before working properly (and in some cases may not work at all), depending on the model and specs. Your Flash setting somehow made me think of it. If you're using a 'normal' PC or better (with the latest Receiver installed), then disregard this last note.

Also, a quick note:
Desktop composition redirection: disabled; visual quality: high; dynamic windows preview: disabled

Hi Dave,

To capture the frame rate, you should be able to see it in the application itself. I'm at a conference this week so don't have a system in front of me to check, but I'm sure a colleague or forum member could confirm. Alternatively, look at something like FRAPS (free to download).

Looking at your nvidia-smi output, I can see no load on the GPU that has the vGPUs associated with it (0000:45:00.0). This could be for two reasons.

First, and most likely, the VMs didn't have anyone using Solid Edge while you took that measurement.

Second, the application isn't using the GPU. This is unlikely but possible: some applications disable hardware acceleration when they detect insufficient resources, and that may be the case here. It needs testing, though.
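One way to test it is to leave nvidia-smi refreshing in dom0 while a user rotates a model; if GPU-Util stays at 0% on the GPU hosting the vGPUs, the application isn't touching it. A rough sketch (the --query-gpu form is an assumption; it depends on the driver build supporting those flags):

```shell
# Refresh the full nvidia-smi report every 2 seconds while the user works
nvidia-smi -l 2

# If this driver build supports it, log just the interesting columns instead
nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv -l 2
```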

As a couple of posts have raised above, a K1 in passthrough really is the minimum starting point for this application, so vGPU on the K1 is possibly not suitable; you'd fare better either with the K1 in passthrough or with a K2 using vGPU. I'd strongly recommend checking performance with a passthrough GPU on the K1 card.

If we can see good frame rates in the VM, then we can dive into the protocol. I have an old post on the forum, linked below, that suggests a Citrix policy as a starting point; it includes the points Tony refers to above.


Take a look at all of this and let us know how you get on.

I applied the policy settings here and also got assistance from Jared at NVIDIA directly. We isolated the issue down to Solid Edge not using the GPU, so we are working on that. I would like to thank everyone for their assistance.

Quick update on Solid Edge not using the GPU, for anyone else seeing this problem.

Siemens added the GRID card(s) to the AutoConfigure.txt file in ST7, so if you are running an earlier version, Solid Edge will more than likely default to Backing Store (no graphics acceleration).

However, users should be able to manually set the application display setting to Graphics Card Advanced. You could also set this in the gold image when setting up your VDI environment.

To change it:

Application Button -> Solid Edge Options -> View -> Application Display (disable automatic selection, choose Graphic card driven Advanced)