Poor performance with XenDesktop + NVIDIA GRID K1

Hello,

We’ve set up a PoC server with XenDesktop 7.6, XenServer 6.5, an NVIDIA GRID K1 and Windows 7 64-bit.

We’ve installed Revit 2014 to test performance. We expected it to perform the same as our desktops, which all use NVIDIA Quadro K600 cards, but the performance is much worse. We ran RFOBenchmark on the desktops and on the XenDesktop VM, and apart from the rendering test, the desktops scored much better.

The profile I’m currently using on the GRID card is K180Q, which should be roughly the same as a K600, right?

I’ve tried following the Citrix policies guide on the forums here, but the performance is still pretty low. I don’t think it actually made a difference.

Does anyone have any pointers, or places to start looking? What sort of benchmarks does anyone else get with Revit 2014 on XenDesktop?

The Virtual Desktop has:

2 Processors with 4 Cores (Intel Xeon E5-2667v2 @ 3.30GHz)
8GB RAM
K180Q GPU
30GB disk (local storage on XenServer, 10k drives in RAID1)

Our normal desktops:

1 Core i7 @ 3.4GHz
8GB RAM
Quadro K600
500GB SATA III 7.2k RPM drive

I thought Revit uses CUDA, which isn’t leveraged unless you use GPU passthrough mode rather than a vGPU.

Did you run the benchmark with the Frame Rate Limiter disabled? (vGPU release notes, page 9)

vGPU profiles are all limited to a maximum of 60fps, even the x80Q profiles, so any benchmark that uses frame rate as a factor in its final result will be affected by this.
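To illustrate how the cap skews a frame-rate-based score, here is a minimal sketch that averages per-scene frame rates with and without a 60fps ceiling. The per-scene numbers are made up for illustration; they are not RFOBenchmark results, and real benchmarks may weight scenes differently.

```python
# Hypothetical per-scene frame rates (fps), invented for illustration only.
FRL_CAP = 60.0  # vGPU frame rate limiter ceiling mentioned above


def capped(fps_values, cap=FRL_CAP):
    """Frame rates as a frame-rate-limited vGPU would report them."""
    return [min(f, cap) for f in fps_values]


def average_fps(fps_values):
    """Simple mean, standing in for a benchmark's frame-rate score."""
    return sum(fps_values) / len(fps_values)


uncapped = [120.0, 90.0, 45.0]  # same GPU with no limiter
limited = capped(uncapped)      # every scene above 60fps is clipped to 60

print(f"uncapped average: {average_fps(uncapped):.1f} fps")
print(f"capped average:   {average_fps(limited):.1f} fps")
```

The GPU does the same work in both cases; only the scenes that would have run above 60fps lose points, which is why the limiter should be off for benchmarking.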

Also, you’ve only given the VM 2 vCPUs. Best practice is a minimum of 4, because you have to allow for the HDX encoding (ctxgfx.exe), which will consume a whole vCPU when it has to deliver 30-60fps to the client. You’re effectively trying to run the OS and the applications on a single vCPU.
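The arithmetic behind that advice, as a rough sketch: it assumes, as stated above, that the HDX encoder takes about one full vCPU at 30-60fps, and treats everything else as available to Windows and Revit. The one-vCPU figure is the post’s rule of thumb, not a measured value.

```python
# Assumption from the post above: ctxgfx.exe uses roughly one whole vCPU at 30-60fps.
HDX_ENCODER_VCPUS = 1


def vcpus_left_for_workload(total_vcpus, encoder_vcpus=HDX_ENCODER_VCPUS):
    """vCPUs remaining for the OS and applications after HDX encoding."""
    return max(total_vcpus - encoder_vcpus, 0)


for total in (2, 4, 8):
    left = vcpus_left_for_workload(total)
    print(f"{total} vCPUs -> {left} for the OS and applications ({left / total:.0%})")
```

With 2 vCPUs, half the VM goes to encoding; with 8, the encoder’s share becomes small.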

Since Revit is CPU-intensive, give it more vCPUs.

And finally… Though you have 10K drives, how many other VMs are accessing these drives at the same time and affecting the IO queue? Have you implemented any IO acceleration technologies to offset the fact that you’ll have multiple VMs on the same drives? This will come into play much more when you move to multi-VM testing and production usage.

Hi Guys,

Thanks for your replies.

Is that true regarding CUDA cores and Revit?

I’ll try it with the Frame Rate Limiter disabled and see what it does.

I’ve actually given the VM 8 vCPUs; they’re just presented as 2 sockets with 4 cores each. I thought Windows couldn’t use more than 2 sockets? I tried giving it 6 sockets with 2 cores each, but it still only displays 2 sockets with 4 cores.
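On the socket question, a sketch of why the layout matters. It assumes Windows 7 Professional/Enterprise/Ultimate will use at most two physical sockets and simply ignores any extra ones, so a 6 × 2 layout would leave the guest with only 4 usable vCPUs while 2 × 4 exposes all 8. The helper names are hypothetical; on XenServer the chosen value would go into the VM’s `platform:cores-per-socket` setting (check the documentation for your version).

```python
# Assumption: Windows 7 Professional/Enterprise/Ultimate use at most 2 sockets.
WINDOWS7_MAX_SOCKETS = 2


def usable_vcpus(sockets, cores_per_socket, max_sockets=WINDOWS7_MAX_SOCKETS):
    """vCPUs the guest can schedule on if sockets beyond the limit are ignored."""
    return min(sockets, max_sockets) * cores_per_socket


def pick_cores_per_socket(total_vcpus, max_sockets=WINDOWS7_MAX_SOCKETS):
    """Smallest cores-per-socket value that divides total_vcpus and fits the socket limit."""
    if total_vcpus < 1 or max_sockets < 1:
        raise ValueError("need at least one vCPU and one socket")
    for cores in range(1, total_vcpus + 1):
        if total_vcpus % cores == 0 and total_vcpus // cores <= max_sockets:
            return cores
    raise ValueError("no valid layout")  # not reached: cores == total_vcpus always fits


print(usable_vcpus(6, 2))        # 4: only 2 of the 6 sockets are used
print(usable_vcpus(2, 4))        # 8: all vCPUs usable
print(pick_cores_per_socket(8))  # 4: i.e. 2 sockets x 4 cores
```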

We don’t have any IO acceleration yet; however, we’re currently running just one virtual machine to test the maximum possible performance, which is why we’ve been a bit startled by the results so far.

Even when I use Remote Desktop to connect to a machine with a K600 card, the experience is much better than connecting to the VM via XenDesktop. I’ve followed all the recommendations, so I don’t think I’ve done anything that would make the performance this low. The majority of the use cases I’ve read and videos I’ve watched are all really positive.

Perhaps my expectations are too high? Would you expect one of the XenDesktop VMs with these specs to be on par with one of the desktops? I know it’s only a GRID K1 card, but the models are only ~100MB and the desktops use a K600.

The vCPU count is fine then; I misread your config as just 2 vCPUs rather than 2 sockets with 4 cores each.

Revit doesn’t use CUDA, so you don’t need passthrough. Though there would be no harm in trying it, just make sure you switch out the drivers.

The K1 in passthrough is the same GPU as the K600, but with more memory, so the GPU performance will be similar. The K180Q effectively gives you the same performance as passthrough, but with the flexibility of vGPU; however, you do have to account for the FRL. XenDesktop/XenApp can’t currently send more than 60fps via HDX anyway, so the cap makes little difference to what the user sees, but it does skew benchmarks…

So, in short, if removing the FRL doesn’t bring your results back in line with the K600-powered workstation, I’d be looking at the rest of the configuration.

Hi Max

What’s your server hardware and generation? Are you running the latest BIOS and firmware, and have you changed the server BIOS from the default "Balanced" profile to "Maximum Performance" (CPU, memory, power management and cooling… turn it all up to maximum!)? In XenServer, have you switched to "Performance Mode" and enabled "Turbo Mode"?

Performance Mode: /opt/xensource/libexec/xen-cmdline --set-xen cpufreq=xen:performance

Turbo Mode: xenpm enable-turbo-mode

Reboot XenServer after running both of these commands.

Regarding I/O acceleration: if you’re only running on 10k disks (even in a RAID1 with 1 VM), you’ll want to get your hands on something much faster (yes, I did notice your standard desktop runs a 7.2k disk :-) ). Look at something like Atlantis (works very well on XenServer) or Fusion-io (depending on your server hardware), or both ;-), and start running stuff out of flash. It makes a huge difference to the overall experience, and evaluation kits/licenses won’t cost much, if anything.
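To put rough numbers on the disk question, a common rule of thumb estimates an array’s usable IOPS from per-disk IOPS, the read/write mix and the RAID write penalty (2 for RAID1, since every write lands on both mirrors). The per-disk figures (~140 IOPS for a 10k drive, ~80 for a 7.2k drive) and the 70/30 read/write mix below are assumptions, not measurements from this setup; the point is how quickly per-VM headroom shrinks once several VMs share the same array.

```python
def functional_iops(disks, iops_per_disk, read_pct, write_penalty):
    """Rule-of-thumb usable IOPS for an array, given a read percentage (0-100)."""
    raw = disks * iops_per_disk
    write_pct = 100 - read_pct
    # Reads are served at full speed; each logical write costs `write_penalty` disk writes.
    return raw * read_pct / 100 + raw * write_pct / 100 / write_penalty


# Assumed per-disk figures and workload mix (rules of thumb, not measurements).
raid1_10k = functional_iops(disks=2, iops_per_disk=140, read_pct=70, write_penalty=2)
single_7k2 = functional_iops(disks=1, iops_per_disk=80, read_pct=70, write_penalty=1)

print(f"desktop 7.2k disk: {single_7k2:.0f} IOPS")
for vms in (1, 4, 8):
    print(f"RAID1 10k, {vms} VM(s): {raid1_10k / vms:.0f} IOPS per VM")
```

Under these assumptions a single VM has roughly three times the desktop’s disk throughput, but at 4 VMs the per-VM share drops below that of the single desktop drive.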

Just out of interest, when you look at the benchmarks, does what you’re actually seeing correlate to the FPS the benchmark is telling you (does the FPS match the quality of the display)? Or are you seeing high FPS but the user experience isn’t matching it?

Have you tried other benchmarks, and if so, how did the results compare with the PC’s?

NVIDIA Demos - Download NVIDIA Tech Demos
Unigine Valley - UNIGINE Benchmarks
Unigine Heaven - UNIGINE Benchmarks
Redway Turbine (not Watch) - Redway3d - Reliable and versatile graphics engine for independent software

Many others are obviously available, but these are all quick-to-install .exe files, so they’re very fast to test. Also, install the same benchmarks on your physical PC to get a reference / comparison / baseline.
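Once the same benchmarks have run on both machines, expressing each VM result as a percentage of the physical PC makes the comparison easy to read. A minimal sketch; the scores are hypothetical placeholders, and it assumes higher is better, so time-based results (seconds) should be inverted before being passed in.

```python
def relative_to_baseline(vm_scores, pc_scores):
    """VM score as a percentage of the physical PC's score, per benchmark.

    Assumes higher is better for every benchmark; invert time-based
    results (seconds) before passing them in.
    """
    return {
        name: 100.0 * vm_scores[name] / pc_scores[name]
        for name in vm_scores
        if name in pc_scores and pc_scores[name] > 0
    }


# Hypothetical placeholder scores; substitute your own VM and PC results.
pc = {"Unigine Valley": 1000.0, "Unigine Heaven": 800.0}
vm = {"Unigine Valley": 650.0, "Unigine Heaven": 720.0}

for name, pct in relative_to_baseline(vm, pc).items():
    print(f"{name}: {pct:.0f}% of the PC baseline")
```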

How are you accessing the VM? Are you using a Thin Client? PC? Laptop? What size screens and how many do you have? Also, what resolution are you running on your screens?

Probably the most important one of all this… What is your network speed (back end and also to desk)? Are you accessing your VM over a WAN? Is it Public or Private? Do you have any bandwidth optimization in place? (Citrix Branch Repeater or other). Any switches or firewalls restricting throughput?

Don’t underestimate your bandwidth requirements! This is one of the most common issues we are seeing. Generating high FPS and generally a great user experience is now pretty easy (thanks NVIDIA! :-) ). However, getting all that across the network (most of which was not designed with high-end graphics, video and media in mind) is proving to be the most challenging part, even with HDX or PCoIP.
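A back-of-the-envelope sketch of why the network is the hard part: it computes the uncompressed pixel rate for a given resolution and frame rate, assuming 24-bit colour. Real HDX or PCoIP traffic is far smaller because it is compressed; the 100:1 ratio used here is purely an illustrative assumption, not a measured figure for either protocol, and actual rates vary widely with screen content.

```python
def raw_mbps(width, height, fps, bits_per_pixel=24):
    """Uncompressed display stream in megabits per second."""
    return width * height * bits_per_pixel * fps / 1_000_000


def compressed_mbps(width, height, fps, ratio, monitors=1):
    """Estimated stream after compression; `ratio` is a hypothetical codec ratio."""
    return raw_mbps(width, height, fps) * monitors / ratio


raw = raw_mbps(1920, 1080, 60)
dual = compressed_mbps(1920, 1080, 60, ratio=100, monitors=2)
print(f"raw 1080p at 60fps: {raw:,.0f} Mbit/s")
print(f"two 1080p screens at 60fps, assumed 100:1 compression: {dual:.0f} Mbit/s")
```

Even with aggressive compression, multiple high-resolution screens at 60fps add up quickly, which is why screen count, resolution and the path to the desk all matter.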

A few things there to look into; hope that helps a little.

Regards

Ben