Performance issue with GPU passthrough on XS 6.5 and XD 7.6

I have set up a PoC server and passed the K1 card through to a VM, but the performance is very poor: running Unigine Valley, I only get about 1/4 of the FPS my laptop (equipped with a GTX 850M) achieves.

The bare-metal host:

cpu: E5-2620
memory: 32GB
motherboard: Supermicro X9DRG-HF
GPU: Grid K1

VM:

vCPU: 4 cores in one socket
memory: 4GB
GPU Driver: 332.76
OS: Windows 7 Ultimate 64bit
VDI: XenDesktop 7.6 VDA with HDX 3D pro

Any advice will be greatly appreciated!

I think that is expected.
The GRID K1 is a low-end (and high-priced) card: 4x entry-level Kepler GK107, very similar to a "GT 630"/"Quadro K600"/"Quadro K1000M". Add to that the virtualization overhead (time-shared units with a frame rate limiter if you use vGPU instead of passthrough), encoding, power management and other penalties.
The GTX 850M has more: a Maxwell GM107 (newer architecture) with about 3x more units and a faster core clock.
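To put the "3x more units" into rough numbers: each GK107 on a GRID K1 has 192 CUDA cores, and in passthrough a VM is given just one of the board's four GPUs, while the GM107 in the GTX 850M has 640. The sketch below only does that arithmetic. Kepler and Maxwell cores are not directly comparable, so treat the ratio as a rough indicator rather than a performance prediction.

```python
# Published CUDA core counts (per physical GPU); rough arithmetic only,
# since Kepler and Maxwell cores are not equivalent.
GK107_CORES = 192        # one GPU on a GRID K1 board
K1_GPUS_PER_BOARD = 4    # a GRID K1 carries four GK107 GPUs
GTX_850M_CORES = 640     # GM107 as configured in the GeForce GTX 850M


def core_ratio(a: int, b: int) -> float:
    """How many times more CUDA cores `a` has than `b`."""
    return a / b


board_total = GK107_CORES * K1_GPUS_PER_BOARD  # whole board
vm_cores = GK107_CORES  # passthrough assigns a single GK107 to the VM

print(f"GRID K1 board total:   {board_total} cores")
print(f"Passthrough VM sees:   {vm_cores} cores")
print(f"GTX 850M vs one GK107: {core_ratio(GTX_850M_CORES, vm_cores):.2f}x")
```

The key point is the second line: the whole-board figure is shared across up to four VMs, so a single passthrough VM works with a quarter of it.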

http://www.techpowerup.com/gpudb/1699/grid-k1.html
http://www.techpowerup.com/gpudb/2538/geforce-gtx-850m.html
http://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units
http://www.videocardbenchmark.net/compare.php?cmp[]=2617&cmp[]=2859

Hi, mcerveny

Thanks very much for the reply, it helped me a lot!

Hi

What are the specs of your laptop in comparison (CPU, RAM, disk, screen resolution), and what FPS are you getting (Hi / Low; it says at the end of the benchmark)?

Have you updated the Firmware on the Hypervisor host?
Have you tuned the BIOS for Maximum Performance?
Have you tuned XenServer for performance?
Are there any other VMs running on the XenServer host?
How are you connecting to your VM to run the benchmark?
What’s your network speed (to desk + backend)?
Are you connecting over a LAN or WAN?
What FPS are you getting (Hi / Low)?
What are you using for storage?
What screen resolution is the VM running?
Any Citrix policies applied?

The K1 is capable of running that benchmark, but there are limitations and also things you can do to make it perform better. It’s not just a case of throwing a GPU in a server, assigning it to a VM and away you go. There are more variables to consider to get the best out of it. In order for us to help, you’ll have to give us a little more information than you have, hence all the questions above ;-)

Regards

Ben

Hi, Benji

Thanks for your reply!

The specs of the laptop:

CPU: Intel Core i7-4710MQ
RAM: 8GB
disk: WDC WD10SPCX 7.2K SATA disk
OS: Windows 7 Ultimate
screen resolution: 1920 * 1080 in full-screen mode (the same as the VM)

  1. Have you updated the Firmware on the Hypervisor host?
    No. Could the BIOS be an issue?
    BIOS Information
        Vendor: American Megatrends Inc.
        Version: 3.0b
        Release Date: 01/02/2014
        Address: 0xF0000
        Runtime Size: 64 kB
        ROM Size: 12288 kB
        Characteristics:
                PCI is supported
                BIOS is upgradeable
                BIOS shadowing is allowed
                Boot from CD is supported
                Selectable boot is supported
                BIOS ROM is socketed
                EDD is supported
                Print screen service is supported (int 5h)
                8042 keyboard services are supported (int 9h)
                Serial services are supported (int 14h)
                Printer services are supported (int 17h)
                ACPI is supported
                USB legacy is supported
                BIOS boot specification is supported
                Function key-initiated network boot is supported
                Targeted content distribution is supported
                UEFI is supported
        BIOS Revision: 3.11

  2. Have you tuned the BIOS for Maximum Performance?
    Yes! I also disabled Hyper-Threading (no luck, whether it was enabled or disabled).

  3. Have you tuned XenServer for performance?
    Yes, I also turned Turbo Mode on.

  4. Are there any other VMs running on the XenServer host?
    No, only one VM is running on the XenServer host.

  5. How are you connecting to your VM to run the benchmark?
    Citrix Receiver on the laptop, XenDesktop 7.6 VDA + HDX 3D Pro on the VM side.

  6. What’s your network speed (to desk + backend)?
    1Gb Ethernet.

  7. Are you connecting over a LAN or WAN?
    LAN.

  8. What FPS are you getting (Hi / Low)?
    On the laptop I get 34.1 FPS, and on the VM 9.1 FPS (with the FRL disabled)!

  9. What are you using for storage?
    Intel SSDSC2BB12. I also checked the I/O load on Dom0 while the benchmark was running:

    Linux 3.10.0+2 (localhost)      04/10/2015

    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
               0.26    0.00    0.44    0.01    0.11   99.17

    Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
    sda              39.78      1313.05       836.88    5352310    3411323

  10. What screen resolution is the VM running?
    1920 * 1080, in full-screen mode.

  11. Any Citrix policies applied?
    Yes. I searched for advice and changed the policies (previously none) to the following, all with no luck:

Desktop Composition Redirection:  Disabled
HDX3DPro quality settings: 6553680 (I changed the min and max values, but it didn't take effect)
Lossy compression level: None
Lossy compression threshold value: 10240 Kbps
Minimum image quality: Very High
Moving image compression: Disabled
Queuing and tossing: Disabled
Target frame rate: 60fps
Target minimum frame rate: 20 fps
Visual quality: High

Valley-Win7-Local-GTX850M.html (2.68 KB)
Valley-Win7-XS65-HDX-K1.html (2.86 KB)
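A side note on the HDX3DPro quality settings value: 6553680 is 0x00640050 in hex, i.e. 100 in the upper 16 bits and 80 in the lower 16 bits. One plausible reading is that the policy packs a maximum and a minimum into one integer, but that is an assumption based only on the arithmetic, not documented Citrix behavior. The helpers below are hypothetical and simply split or rebuild such a value, which may help check whether an edited min/max actually changed the stored number.

```python
def split_16bit_fields(value: int) -> tuple[int, int]:
    """Split a 32-bit integer into (upper 16 bits, lower 16 bits).

    Hypothetical helper: assumes the policy value packs two 16-bit numbers.
    """
    upper = (value >> 16) & 0xFFFF
    lower = value & 0xFFFF
    return upper, lower


def pack_16bit_fields(upper: int, lower: int) -> int:
    """Inverse of split_16bit_fields (same packing assumption)."""
    if not (0 <= upper <= 0xFFFF and 0 <= lower <= 0xFFFF):
        raise ValueError("both fields must fit in 16 bits")
    return (upper << 16) | lower


value = 6553680  # as reported in the policy list above
print(hex(value))                  # 0x640050
print(split_16bit_fields(value))   # (100, 80)
print(pack_16bit_fields(100, 80))  # 6553680
```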

Hi Weizhang

Ok, the laptop has a much better spec than your VM:

*CPU - 2.5GHz compared to 2.0GHz (2.0GHz is very slow these days; yes, I realize there’s more to it than pure MHz, but this should not be overlooked)
*RAM - 8GB compared to 4GB
*GPU - GTX 850M compared to K1 passthrough

And to top it all off, it’s all running locally without a Hypervisor in the way.
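A quick check on the numbers posted above, as a rough sketch: even if the benchmark were purely CPU-bound and scaled linearly with base clock (an optimistic assumption that ignores Turbo and architecture differences), the 2.5GHz vs 2.0GHz gap would explain only a 1.25x difference, while the measured gap is about 3.75x.

```python
laptop_fps = 34.1   # Valley average on the laptop (GTX 850M)
vm_fps = 9.1        # Valley average in the K1 passthrough VM
laptop_ghz = 2.5    # laptop CPU base clock, as quoted above
vm_ghz = 2.0        # server CPU base clock, as quoted above

fps_ratio = laptop_fps / vm_fps        # measured gap
clock_ratio = laptop_ghz / vm_ghz      # best case explained by clock alone
remaining = fps_ratio / clock_ratio    # left for GPU, hypervisor, remoting, etc.

print(f"FPS ratio:            {fps_ratio:.2f}x")
print(f"Clock ratio:          {clock_ratio:.2f}x")
print(f"Unexplained by clock: {remaining:.2f}x")
```

Most of the gap therefore has to come from somewhere other than CPU clock speed, which is consistent with the GPU being the main difference.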

Unless otherwise advised by the server hardware vendor, NVIDIA or Citrix, yes, you should absolutely be running the latest firmware / BIOS.

The Hypervisor relies on the BIOS being configured appropriately; otherwise it cannot make use of the performance settings or platform-specific features. Firmware should also be kept up to date for correct functionality. When tuning the BIOS for performance, don’t forget the cooling: if it is not configured correctly, it can throttle back the overall performance on some servers.

Personally, I always leave Hyperthreading enabled and have always had positive results. There are some recommendations out there to disable it, but this is application specific, and you should check with the application vendor about which is best. Certainly for Unigine benchmarks, though, leaving it enabled will be fine.

1Gb to the desk is OK, but nothing less than that… and make sure it’s hardwired for consistency.

So the FPS generated by the VM there is pretty low. Is that a maximum FPS?

If you’re running Passthrough, disabling FRL won’t make any difference, this is only for vGPU profiles.

Ignore the FPS for a second… What did the benchmark actually look like when you ran it on your laptop? At over 30FPS, I’m guessing it ran quite smoothly and looked OK? So why the need to try to run it at 60FPS on the VM? What I’m getting at here is that just because there’s an option to run it at 60FPS doesn’t mean it needs to be run that high. Don’t focus on the numbers; focus on the quality of what you’re seeing, the user experience, and whether it’s good enough.

You can lose (disable) most of those Citrix policies, as with your current configuration they won’t really be helping. You’re looking to conserve bandwidth, not try to consume it with 60FPS :-)

When you run the Unigine benchmark, what settings are you configuring? Do you run it at the defaults, or do you configure the quality? I’d expect around 20 – 25FPS without much tuning (although results will vary with different hardware setups), and although not exactly “amazing!”, that is much more watchable than your 9.1FPS :-) Don’t get me wrong, it will never be in the same league as a GRID K2 (which will push FPS straight into the hundreds on this particular benchmark), as they are designed for completely different things, but as a basic GPU, if used for the correct tasks, it is certainly adequate.

Regards

Ben

The target FPS has no bearing on bandwidth unless you’re actually achieving that level in the VM. As Weizhang is only achieving 9fps, only 9fps will be transmitted.
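To illustrate that the transmitted volume follows the frames actually produced, here is the raw, uncompressed 24-bit RGB data rate of 1920x1080 frames at 9fps versus 60fps. HDX 3D Pro compresses the stream heavily, so real network traffic is far lower than this; the sketch only shows that the volume scales with the delivered frame rate, not the target.

```python
def raw_bandwidth_mbps(width: int, height: int, fps: float,
                       bytes_per_pixel: int = 3) -> float:
    """Uncompressed frame data rate in megabits per second (assumes 24-bit RGB)."""
    bits_per_frame = width * height * bytes_per_pixel * 8
    return bits_per_frame * fps / 1_000_000


for fps in (9, 60):
    rate = raw_bandwidth_mbps(1920, 1080, fps)
    print(f"{fps:>2} fps: {rate:8.1f} Mbit/s uncompressed")
```

The 60fps figure only applies if the VM actually renders 60fps; at 9fps the encoder has about 6.7x fewer frames to send.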

What that setting does allow for is:

  1. Ensuring that what the VM generates is what the end user sees, as it matches the FRL and the maximum the protocol allows.
  2. Mitigating lag with software cursors. At 60fps a frame arrives roughly every 16.7ms, at 30fps every 33ms, and at 10fps every 100ms.
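The frame-time figures in point 2 are simply 1000 ms divided by the frame rate (strictly, 60fps works out to about 16.7ms). With a software cursor drawn into the frame, a cursor movement can wait up to roughly one frame interval before it shows up in a transmitted frame, which is why a higher target helps. A minimal sketch:

```python
def frame_interval_ms(fps: float) -> float:
    """Time between consecutive frames, in milliseconds."""
    return 1000.0 / fps


for fps in (60, 30, 10):
    print(f"{fps:>2} fps -> {frame_interval_ms(fps):5.1f} ms between frames")
```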

So there’s good reason to set that value there.

The policy looks like one I posted here and on the Citrix forums (where there’s an explanation of the reasoning for each setting), and a large part of it is there for fallback purposes, in the event that a user connects in legacy mode due to client capability.

The key values, though, are:

DCR - Off
Visual Quality - High
Target FPS - 60

None of those will affect benchmark scores, though. The Unigine benchmark will be affected by CPU clock speed (it’s single threaded) and GPU resources.

What applications are going to be delivered? Unigine Heaven and Valley don’t reflect typical enterprise usage, so I’d suggest picking a more appropriate benchmarking tool and a set of measures that more truly represent the intended use case.

Hi, all

Thanks for all your kind help!

The 9.1 FPS is the average FPS, and I used the default configuration when running Valley. Maybe the resolution (1920 * 1080) is too high for a good user experience, and I was overloading the K1!

We plan to deliver an app that would handle about one million triangular facets (a 3D human model), and it may need a more powerful GPU. Would the K2 or K5 be the right choice?

Jason - That’s fair enough, I stand corrected. Every day’s a school day :-)

That’s a good point, if the GPU’s not hitting the desired FPS, then at this stage the bandwidth has nothing to do with it. I overlooked that bit, and was thinking of the issues I’ve experienced with the K2 and bandwidth.

Weizhang - As Jason mentions above, it’s best to test with applications that are at least similar to, if not the same as, the application that is going to be used. Otherwise you can get caught up chasing things that bear no relevance to production usage. If you’re going to be delivering anything like the 3D human model, you’ll absolutely want the K2. That said, try it with the K1 too so you have a reference between the two cards (have a play with this: http://www.nvidia.co.uk/coolstuff/demos#!/lifelike-human-face-rendering). You may also want to revise your server specs if that’s possible at this stage…

1920x1080 is not too high for the K1 if you use it for its intended purpose, but the K1’s purpose is not to run benchmarks like that; that’s the K2’s job ;-) That being said, the screen resolution plays a massive part in the overall requirements and experience (drop Unigine Valley down to 1024x768 and see the difference). What screen resolution do you plan to run in production?
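To put the resolution point in numbers: much of the per-frame GPU work in a benchmark like Valley, and roughly the per-frame encoding work, grows with the number of pixels. This is only a rough guide, since not every stage scales with resolution:

```python
def pixels_per_frame(width: int, height: int) -> int:
    """Number of pixels the GPU renders (and the encoder processes) per frame."""
    return width * height


full_hd = pixels_per_frame(1920, 1080)  # resolution used in the test
xga = pixels_per_frame(1024, 768)       # lower resolution suggested for comparison
ratio = full_hd / xga

print(f"1920x1080: {full_hd:,} pixels per frame")
print(f"1024x768:  {xga:,} pixels per frame")
print(f"1920x1080 has {ratio:.2f}x as many pixels")
```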

Regards

Ben

Hi Benji,

Unfortunately the app is still in development, and I need to make sure that this platform can meet our performance demands.

I will try the new benchmark tools, and thanks for your kind help :)

Sorry for the repeated reply :)

No worries, happy to try and help where I can :-)

If you plan on running 3D models, though, I’d definitely look at a revised server spec at this stage. Even if it’s just to rule it out, it’s good to know what all the options are before production.

Let us know how you get on with your testing…

Regards

Ben