Some quick benchmark results of K140Q vs. K120Q

FYI, here is a benchmark comparison using the Unigine "Heaven" GPU benchmark, viewed from a Xenith 2 thin client and running on a Dell R720 under Windows 7 with the latest 331.59 GRID driver release:

Format    | Frame    |           K140Q          |           K120Q          | Score Difference
Type      | Size     | FPS: AVE  MIN  MAX Score | FPS: AVE  MIN  MAX Score | K120Q vs. K140Q
OpenGL    |  800x600 |      24.5  7.3 41.3  618 |      23.5  7.3 37.1  593 |  -4 %
DirectX11 | 1280x800 |      13.0  5.6 22.5  326 |      13.0  5.5 22.4  327 |   0 %
OpenGL    | 1280x960 |      10.5  5.1 17.8  254 |       8.6  3.7 13.9  216 | -15 %
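The "Score Difference" column is just the percent change of the K120Q score relative to the K140Q score; a quick sketch using the scores from the table above:

```python
# Percent change of K120Q score relative to K140Q score (values from the table).
tests = [
    ("OpenGL 800x600",    618, 593),
    ("DirectX11 1280x800", 326, 327),
    ("OpenGL 1280x960",   254, 216),
]
for name, k140q_score, k120q_score in tests:
    diff = (k120q_score - k140q_score) / k140q_score * 100
    print(f"{name}: {diff:+.0f} %")  # -4 %, +0 %, -15 %
```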

The difference is that the K140Q profile allocates 1 GB of GPU memory per vGPU, so you can have a maximum of 16 vGPUs per NVIDIA GRID K1 card. The brand new K120Q profile is limited to the same maximum frame size, but allocates just 512 MB per vGPU and hence allows up to 32 vGPUs per board. By contrast, the K100 also supported 32 vGPUs, but with only 256 MB of memory each (so half of the board's memory went unused).
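The density numbers fall straight out of the memory math. A minimal sketch, assuming a K1 board with 4 physical GPUs of 4 GB each, and an additional per-GPU cap of 8 vGPUs that keeps the K100 from using all of its memory:

```python
# Illustrative arithmetic only; assumes a GRID K1 = 4 physical GPUs x 4 GB,
# with the K100 additionally capped at 8 vGPUs per physical GPU.
K1_GPUS = 4
MEM_PER_GPU_MB = 4096

def max_vgpus(profile_mb, per_gpu_cap=None):
    """Max vGPUs per K1 board for a given per-vGPU framebuffer size."""
    per_gpu = MEM_PER_GPU_MB // profile_mb
    if per_gpu_cap is not None:
        per_gpu = min(per_gpu, per_gpu_cap)
    return per_gpu * K1_GPUS

print(max_vgpus(1024))                 # K140Q -> 16
print(max_vgpus(512))                  # K120Q -> 32, using all 16 GB
print(max_vgpus(256, per_gpu_cap=8))   # K100  -> 32, but only 8 GB in use
```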

This shows only a modest penalty for the smaller profile even at large frame sizes, and essentially no difference with either smaller frames or the somewhat less demanding DirectX 11 test. In other words, the K120Q supports as many users as the weaker K100 while sacrificing little performance compared with the K140Q. This is a nice, and somewhat surprising, result.

Great job Tobias! So is it the goldilocks card?

This is interesting. Right now we are running some of our Dell R720s with 2 x K1 and 16 x K140Q-enabled Win 7 VMs on each card (32 VMs per server).

If I could only use one K1 per server then I would not have to worry about the boot issue on our R720 :-).

Indeed. The big limitation of the K100 profile was performance, and the K120Q does seem to fill a sweet spot by using all of the GPU memory while still allowing just as many (32) vGPU instances. For what most people are doing, these are great numbers. The caveat, of course, is that load and user experience will look different if all 32 users are running something demanding at the same time, since each K1 GPU engine would then be shared among up to 8 users. It would be interesting to test that with a large number of users to see whether the GPU itself becomes overtaxed. In other words, it may still take more than one K1 to provide adequate performance on a given server.
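The "up to 8 users" figure is just the density divided across the board's physical GPU engines; a rough sketch, assuming a K1 with 4 physical GPUs and an even spread of load:

```python
# Rough contention estimate; assumes GRID K1 = 4 physical GPU engines
# and vGPUs spread evenly across them.
vgpus_per_board = 32  # K120Q density
physical_gpus = 4
users_per_engine = vgpus_per_board // physical_gpus
print(users_per_engine)  # -> 8 users sharing each GPU engine at full load
```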

I have some interesting new benchmarks that I will post to a new thread, also related to GPU/vGPU tests.

An additional note: on a XenApp 7.5 Windows 2012 R2 instance running OpenGL in GPU pass-through mode (on the same GRID K1), with the same Unigine "Heaven" benchmark at 800x600 and high quality settings as in the cases above, we were able to achieve:

Ave FPS: 25.3
Min FPS: 7.5
Max FPS: 41.5
Score: 637

running from Receiver on a decent Windows desktop PC that itself has only a medium-quality internal GPU. There was no hesitation, jerking, or tiling taking place. Bandwidth usage seemed to average around 100-200 kB/sec, though it sometimes spiked much higher.

It would be very interesting to see how this scales as additional users are added. We also ran the OpenGL version of Google Earth, which also performed very well, with quick responses to zooming and panning.