Doubling Down: Further Experiments and Benchmarks with the NVIDIA GRID K1 GPU/vGPU

Here are some preliminary results that may be of interest. The idea was to see whether pairing a passthrough GPU with a vGPU would benefit a VM that needed an extra boost. The results were somewhat surprising.

These are additional benchmarks, run as before on a setup with a Dell/Wyse Xenith 2 thin client and NVIDIA GRID driver version 331.59. The VMs ran Windows 7 under XenDesktop 7.1. The same Unigine Heaven 4.0 benchmark was used. For the original posting, please see https://gridforums.nvidia.com/default/topic/22/xendesktop-with-nvidia-grid/some-quick-benchmark-results-of-k140q-vs-k120q/

The original table looked like this:

.                           K140Q                                              K120Q       
                                                                              DIFF: K120Q
Type        Frame size  FPS: AVE  MIN  MAX  Score FPS: AVE   MIN   MAX  SCORE  vs. K140Q
----------+------------+--------------------------+---------------------------+---------- 
OpenGL       800x600        24.5  7.3  41.3   618     23.5   7.3  37.1   593   -4 %
DirectX 11  1280x800        13.0  5.6  22.5   326     13.0   5.5  22.4   327    0 %
OpenGL      1280x960        10.5  5.1  17.8   254      8.6   3.7  13.9   216  -15 %

Here are some new benchmarks. In all cases the front end was a XenApp 7.5 server running Windows Server 2012 R2, feeding a XenDesktop VM hosted under XenServer 6.2 SP1. The XenApp instance used GPU passthrough to an entire engine (one of four) of a GRID K1, so it had 4 GB of GPU RAM. The XenApp server and the XenDesktop VMs all resided on the same physical Dell R720 server, but each GPU/vGPU was assigned to a separate engine on the K1. No other VMs were active during testing. The application therefore ran under XenApp and was windowed into the XenDesktop instance, as opposed to running natively on the XenDesktop VM itself.

The values below show the worst-case run for each configuration (runs with the same parameters generally differed by no more than a couple of percent). The percentage difference compares the overall "scores":

.                                                  DIFF vs. K140Q VM
Type       Frame size  FPS: AVE   MIN   MAX Score  with vGPU alone
----------+------------+--------------------------+-----------------
GPU PT      800x600         25.1  13.0  41.2   632    +2 %
GPU+K140Q   800x600         28.2   7.6  48.7   710   +15 %
GPU+K120Q   800x600         10.5   1.5  36.8   321   -48 %

where:
GPU PT = XenApp GPU passthrough alone to the VM
GPU+K140Q = XenApp GPU passthrough + VM with K140Q vGPU
GPU+K120Q = XenApp GPU passthrough + VM with K120Q vGPU
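
As a quick sanity check, the percentage columns in both tables can be reproduced from the scores alone. The sketch below is only an illustration: the function and variable names are made up here, and it assumes the percentages are simple relative differences against the K140Q score, rounded to a whole percent.

```python
# Minimal sketch: recompute the DIFF columns from the Heaven scores.
# Assumption: DIFF = (score - baseline) / baseline, rounded to a whole percent.

def pct_diff(score, baseline):
    """Relative difference of score vs. baseline, in whole percent."""
    return round(100.0 * (score - baseline) / baseline)

# Original table: (K140Q score, K120Q score) per test
original_scores = {
    "OpenGL 800x600": (618, 593),
    "DirectX 11 1280x800": (326, 327),
    "OpenGL 1280x960": (254, 216),
}

# New table: scores compared against the K140Q-alone score at 800x600
K140Q_ALONE_800x600 = 618
combined_scores = {
    "GPU PT": 632,
    "GPU+K140Q": 710,
    "GPU+K120Q": 321,
}

for test, (k140q, k120q) in original_scores.items():
    print(f"{test:<20} K120Q vs. K140Q: {pct_diff(k120q, k140q):+d} %")

for config, score in combined_scores.items():
    print(f"{config:<20} vs. K140Q alone: {pct_diff(score, K140Q_ALONE_800x600):+d} %")
```

Because only the scores are compared, the FPS columns play no part in the percentages.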

Notes:

The VM using only GPU passthrough on the XenApp server performs about the same as the VM with only the K140Q vGPU. This is a bit surprising, as the GPU passthrough can tap into all 4 GB of RAM on its engine, while the K140Q is limited to 1 GB of GPU memory.

The combination of GPU passthrough + K140Q boosted performance by 15%, probably because the K140Q can keep up better with what is being fed to it via the GPU passthrough. The K140Q vGPU ran fairly steadily at 93-100% load when front-ended by the GPU passthrough application.

The combination of GPU passthrough + K120Q slowed things down: there were times when things came to a standstill, slowly ramped up, and then abruptly stalled again. This is evident in the low minimum FPS rate. The likely cause is a bottleneck, with the GPU passthrough feeding in data too quickly for the K120Q to keep up (unlike the K140Q). This showed up as the vGPU load dropping to zero at times and then slowly ramping back up to 100%.

Conclusions:

The K120Q by itself pretty much equals both the K140Q and GPU passthrough at the smaller frame size, and it is definitely better as its own vGPU instance than paired with a GPU passthrough front end. Pairing a GPU with a vGPU seems to have only fairly minimal benefits, and only in very specific cases; given the extra cost for the minimal gains it can sometimes produce, it hardly seems worth it. Things do not always scale as expected, and more can sometimes result in less!