We had some odd results with the Redway3D benchmark. I'd love it if a vSphere/View expert could explain them; it's been driving me nuts for months as to what I missed, and I never got to the bottom of it: http://blogs.citrix.com/2014/01/03/comparing-vgpu-and-vsga-on-nvidia-grid-using-the-redway3d-turbine-demo-observations-and-questions/
The vSGA software has a shim driver within the VM that sits between the application and, ultimately, the NVIDIA hardware driver in the hypervisor. This differs from vGPU, where there is a full NVIDIA driver inside the VM and another component inside the hypervisor. This can yield some key differences:
The vSGA driver in the VM exposes a certain level of API support, which for vSGA is DirectX 9.0c and OpenGL 2.1. So apps asking to create a DX10 or DX11 device will fail, and likewise calls to OpenGL functions defined beyond 2.1, or to extensions not supported by the vSGA driver, will fail.
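To make that concrete, here is a minimal sketch of how an application typically gates its rendering path on the driver's advertised version and extension list. The helper names and the sample strings are hypothetical; only the "2.1" version and extension-check pattern come from the text above.

```python
def parse_gl_version(version_string):
    """Parse the leading "major.minor" out of a GL_VERSION-style string."""
    major, minor = version_string.split()[0].split(".")[:2]
    return int(major), int(minor)

def supports(requested, advertised):
    """True if the advertised version is at least the requested one."""
    return advertised >= requested

# vSGA advertises OpenGL 2.1; an app requiring 3.x must fall back or fail.
advertised = parse_gl_version("2.1 NVIDIA-vSGA")  # hypothetical string -> (2, 1)
print(supports((2, 0), advertised))  # app asks for OpenGL 2.0 -> True
print(supports((3, 0), advertised))  # app asks for OpenGL 3.0 -> False

# Extensions work the same way: anything absent from the advertised set fails.
extensions = {"GL_ARB_vertex_buffer_object", "GL_ARB_multitexture"}
print("GL_ARB_framebuffer_object" in extensions)  # not advertised -> False
```

An app that checks up front can degrade gracefully; one that blindly issues post-2.1 calls simply gets errors, which matches the failures described above.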
In both APIs, data is frequently passed from the application through the API into the driver and ultimately to the GPU, where it is used to draw pixels. In the case of vSGA there are likely to be more copies before the data reaches the GPU. Depending on the amount of data and how frequently the application makes these calls, the additional data transfer can have an impact.
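A toy model of that copy chain, assuming hop names for illustration only (the actual number and names of layers in vSGA/vGPU are not documented in the text), shows why per-call data volume matters:

```python
def transfer(data, hops):
    """Copy `data` once per software hop; return (final_data, copies_made)."""
    copies = 0
    for _ in hops:
        data = bytes(data)  # stand-in for a real memcpy at this layer
        copies += 1
    return data, copies

payload = b"\x00" * 64  # a small vertex buffer

# Hypothetical paths: vGPU crosses fewer layers than vSGA before the GPU.
_, vgpu_copies = transfer(payload, ["guest NVIDIA driver", "GPU"])
_, vsga_copies = transfer(payload, ["guest shim", "hypervisor graphics layer",
                                   "NVIDIA driver", "GPU"])
print(vgpu_copies, vsga_copies)  # 2 4
```

The data arrives intact either way; the cost is the extra movement, multiplied by every draw-heavy frame the benchmark renders.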
In some cases the vSGA driver will translate the API calls made by the application into a different set of calls which it issues to the NVIDIA driver in the hypervisor, e.g. into DirectX calls. This translation is another potential source of performance impact.
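A translation layer of the kind described would maintain a mapping between the two APIs' concepts. The enum names below are real OpenGL and Direct3D 9 identifiers, but the lookup itself is a hypothetical sketch, not VMware's implementation:

```python
# Pairing of real GL primitive modes with their D3D9 topology equivalents.
GL_TO_D3D_TOPOLOGY = {
    "GL_TRIANGLES": "D3DPT_TRIANGLELIST",
    "GL_TRIANGLE_STRIP": "D3DPT_TRIANGLESTRIP",
    "GL_LINES": "D3DPT_LINELIST",
    "GL_POINTS": "D3DPT_POINTLIST",
}

def translate_draw(gl_primitive):
    """Translate one GL primitive mode; fail if no D3D equivalent is known."""
    try:
        return GL_TO_D3D_TOPOLOGY[gl_primitive]
    except KeyError:
        raise ValueError(f"no translation for {gl_primitive}")

print(translate_draw("GL_TRIANGLES"))  # D3DPT_TRIANGLELIST
```

Even when a clean mapping exists, every translated call adds a lookup and often some data reshaping; concepts with no direct equivalent (e.g. `GL_QUADS`, absent from D3D9) need costlier emulation or simply fail.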
It would take more work to examine the exact graphics calls made by the Redway benchmark, but through a combination of the above three key differences there could be a significant overhead when comparing vSGA to the vDGA, vGPU and bare-metal alternatives.
Certain OpenGL data types (even in OpenGL 2.1), such as buffer objects, allocate and, when used, reference GPU memory. In a virtualized environment these may trigger extra work in the hypervisor to ensure the memory mappings are correct for the VM that is executing, which in turn can cost more than on bare metal. This applies in the case of vGPU; in the case of vSGA there may be additional implications for these data types.
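The mapping cost can be modeled with a small sketch. This is a hypothetical toy, not actual hypervisor code: it just counts the validation work triggered the first time a VM's buffer pages are touched, work that does not exist on bare metal.

```python
class Hypervisor:
    """Toy model: tracks which (VM, page) mappings have been validated."""
    def __init__(self):
        self.mapped = set()      # (vm, page) pairs already validated
        self.mapping_faults = 0  # counter for the extra hypervisor work

    def ensure_mapped(self, vm, pages):
        """Validate guest pages backing a buffer for the executing VM."""
        for page in pages:
            if (vm, page) not in self.mapped:
                self.mapping_faults += 1  # extra work vs. bare metal
                self.mapped.add((vm, page))

hv = Hypervisor()
buffer_pages = range(4)                 # pages backing one GL buffer object
hv.ensure_mapped("vm1", buffer_pages)   # first use: 4 mappings validated
hv.ensure_mapped("vm1", buffer_pages)   # reuse: already validated, no extra work
print(hv.mapping_faults)  # 4
```

The point of the model is the asymmetry: first use of a buffer is where the virtualization tax lands, and a benchmark that churns through many buffer objects pays it repeatedly.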
So even though the same hardware sits underneath both comparisons, there are a lot of software layers, and in the vSGA case they are very different and there are more of them compared with vGPU. As I say, it would require detailed investigation to say definitively what the bottleneck is; however, the extra layers have the potential for performance impact.