VDI: performance impact of NUMA non-locality (multi-socket servers)

Hi.

I’m currently doing some research on vGPU + VDI.

It looks like ESXi does a really bad job of handling NUMA locality for PCI devices.
Unless you pin your VM to a specific socket and ensure that it really picks the local GPU, you might end up with multiple VMs driving their vGPU across the socket interconnect.
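
For anyone wanting to check their own hosts: a rough pyVmomi sketch along these lines should list which VMs carry a vGPU profile and whether they have any numa.nodeAffinity advanced setting at all (the vCenter hostname and credentials are placeholders, and it assumes a standard pyVmomi install). On the host side, esxcli hardware pci list should show which NUMA node each physical GPU sits on, which is what you'd compare against.

```python
# Rough sketch: list VMs with an NVIDIA vGPU device and show whether they
# have a numa.nodeAffinity advanced setting. Hostname/credentials are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab only; use proper certs otherwise
si = SmartConnect(host="vcenter.example.local", user="administrator@vsphere.local",
                  pwd="***", sslContext=ctx)
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(content.rootFolder,
                                               [vim.VirtualMachine], True)
for vm in view.view:
    if not vm.config:
        continue
    # vGPU devices show up as PCI passthrough devices with a vmiop backing
    profiles = [dev.backing.vgpu
                for dev in vm.config.hardware.device
                if isinstance(dev, vim.vm.device.VirtualPCIPassthrough)
                and isinstance(dev.backing,
                               vim.vm.device.VirtualPCIPassthrough.VmiopBackingInfo)]
    if not profiles:
        continue
    affinity = next((opt.value for opt in vm.config.extraConfig
                     if opt.key == "numa.nodeAffinity"), "not set")
    print(f"{vm.name}: vGPU={profiles} numa.nodeAffinity={affinity}")

view.Destroy()
Disconnect(si)
```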

I haven’t really tested the performance impact here. I’m aware that this depends on the workload.

Has anyone done any testing in mixed VDI environments (typical office users / CAD users)?
How much impact does running through the interconnect have, especially when using NVENC to encode the stream for Citrix/Horizon?

I’m aware that pinning VMs to specific sockets might be the way to go, but that won’t scale in a big environment where VMs are moved, re-specced, etc.
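
If pinning does turn out to be the answer, I’d rather script it than set it by hand per VM. A minimal sketch, assuming the documented numa.nodeAffinity advanced option and with the VM name and node number as placeholders:

```python
# Rough sketch: constrain a VM to NUMA node 0 via the numa.nodeAffinity
# advanced option. VM name, node number and credentials are placeholders;
# the setting applies on the next power cycle as far as I can tell.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

def set_numa_affinity(si, vm_name, node="0"):
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(content.rootFolder,
                                                   [vim.VirtualMachine], True)
    vm = next(v for v in view.view if v.name == vm_name)
    view.Destroy()
    spec = vim.vm.ConfigSpec(extraConfig=[
        vim.option.OptionValue(key="numa.nodeAffinity", value=node)])
    return vm.ReconfigVM_Task(spec=spec)

ctx = ssl._create_unverified_context()  # lab only
si = SmartConnect(host="vcenter.example.local", user="administrator@vsphere.local",
                  pwd="***", sslContext=ctx)
set_numa_affinity(si, "cad-vdi-01", node="0")
Disconnect(si)
```

Even scripted, the affinity still has to match whichever node the local GPU sits on after a VM is moved or re-specced, which is exactly the part that doesn’t scale on its own.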