"An emulator required to run this VM failed to start"

This error prevents VMs with a vGPU from running on one particular host:

Steps and tests taken:
1. Host (K104) has been rebuilt and tested in and out of the pool with various VMs.
2. VMs run without issues whether the host is in or out of the pool, but only with no vGPU enabled or with "Pass-through whole GPU" enabled. If any of the GRID K*** vGPU options is selected, the VM fails to start with the "An emulator required to run this VM failed to start" error message.
3. The bootloader configuration (/etc/grub.conf) has been edited to contain iommu=dom0-passthrough (as suggested here: https://discussions.citrix.com/topic/382259-nvidia-grid-k2-on-hp-gen8-xenserver-70-an-emulator-required-to-run-this-vm-failed-to-start/ and here: https://nvidia.custhelp.com/app/answers/detail/a_id/4249/~/nvidia-grid-vgpu-drivers-will-fail-to-load-when-used-with-xenserver-7.0-on).
4. The XenServer health check detected no issues.
5. Have removed, dusted, and re-seated the two GPUs and checked cable connections (as suggested here: https://discussions.citrix.com/topic/380940-xenserver-unable-to-start-vms-with-a-vgpu-after-crash/).
6. Host was rebuilt again with no change; the same error still occurs.
7. All currently working K1 hosts (3 of them in a pool) and their VMs are on the older driver versions listed below and work fine with vGPUs. I tried the latest NVIDIA driver version on the isolated K104 host to see whether an update would fix it, but there was no change (GRID vGPU Manager 367.124 -> 367.128, Windows Display Driver 370.21 -> 370.28). The same error occurs on both versions.
8. Currently have the host in (what should be) a working condition outside of the pool with a VM for testing purposes.
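
For step 3, one way to double-check that the option really ended up on the hypervisor line of the bootloader file is a small script like the one below. This is only a sketch: it assumes the hypervisor line is the one that mentions xen.gz, and the SAMPLE text is a made-up excerpt, not copied from a real host. The function names are hypothetical too.

```python
def xen_boot_lines(config_text):
    """Return non-comment lines that load the hypervisor.

    Assumption: such lines are recognisable because they mention xen.gz.
    """
    lines = []
    for raw in config_text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#") and "xen.gz" in line:
            lines.append(line)
    return lines


def option_present(config_text, option="iommu=dom0-passthrough"):
    """True only if hypervisor lines exist and each carries the option as a whole token."""
    lines = xen_boot_lines(config_text)
    return bool(lines) and all(option in line.split() for line in lines)


# Hypothetical excerpt for illustration only.
SAMPLE = """\
# made-up example, not from a real host
menuentry 'XenServer' {
    multiboot2 /boot/xen.gz dom0_mem=4096M iommu=dom0-passthrough
    module2 /boot/vmlinuz-4.4-xen root=LABEL=root
}
"""

print(option_present(SAMPLE))
```

On a real host the text would come from reading the file that was edited (here /etc/grub.conf), and the option only takes effect after the host has been rebooted.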

Some extra information:
Server model: HP Proliant DL380 Gen9
CPU: Intel® Xeon® CPU E5-2687W v3 @ 3.10GHz
RAM: 384GB 1600MHz
BIOS on host: P89
XenServer: 7.1.1
GRID vGPU Manager: 367.124 & 367.128 both tried
Windows Display Driver: 370.21 & 370.28 both tried

Any suggestions greatly appreciated, thanks in advance.

Is XenServer properly licensed?
Is the vGPU manager installed properly on this host? Check with nvidia-smi.

Hi sschaber,

Yep, XenServer is fully licensed on all the hosts, and I have compared the vGPU manager on all hosts using nvidia-smi in the CLI: it is the same version everywhere and installed with no issues or differences.

As far as I can see there are no differences at all between any of the hosts and their versions/updates anywhere, yet this one keeps failing with the emulator error.
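
For anyone wanting to make the same comparison less manually, here is a rough sketch. It assumes you have collected the output of `nvidia-smi --query-gpu=name,driver_version --format=csv,noheader` from each host and pasted it into a dictionary; the host names and the sample rows are invented for illustration, and the function names are hypothetical.

```python
from collections import Counter


def parse_gpus(csv_text):
    """Parse 'name, driver_version' rows into a list of (name, version) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        name, version = [field.strip() for field in line.rsplit(",", 1)]
        rows.append((name, version))
    return rows


def summarize(outputs_by_host):
    """Map each host to (number of GPUs reported, set of driver versions)."""
    summary = {}
    for host, text in outputs_by_host.items():
        rows = parse_gpus(text)
        summary[host] = (len(rows), frozenset(version for _, version in rows))
    return summary


def odd_hosts(outputs_by_host):
    """Hosts whose GPU count or driver versions differ from the most common combination."""
    summary = summarize(outputs_by_host)
    if not summary:
        return []
    baseline, _ = Counter(summary.values()).most_common(1)[0]
    return sorted(host for host, key in summary.items() if key != baseline)


# Invented sample data; real output would come from each host.
SAMPLE_OUTPUTS = {
    "hostA": "GRID K1, 367.124\nGRID K1, 367.124\n",
    "hostB": "GRID K1, 367.124\nGRID K1, 367.124\n",
    "hostC": "GRID K1, 367.128\nGRID K1, 367.128\n",
}

print(odd_hosts(SAMPLE_OUTPUTS))
```

An empty result only says the hosts report the same GPU count and driver version. It does not rule out a hardware fault on a card that is still detected.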



Just for anyone who comes across this while trying to troubleshoot the same error…

After swapping several cards between identical hosts, it became clear that a single K1 card was causing the problem. A replacement part solved the issue completely, so in this case it appears to have been a GPU hardware failure. Good luck!
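
The swap test boils down to simple elimination, which could be written down roughly as follows. This is a sketch under my own assumptions: each trial is recorded as (host, card, whether the vGPU VM started), and the host and card labels are placeholders rather than the real ones.

```python
from collections import defaultdict


def isolate_fault(trials):
    """Return (suspect_cards, suspect_hosts) from (host, card, ok) trials.

    A card is suspect if it failed in every trial and was tried in at least
    two different hosts; the same rule is applied to hosts across cards.
    """
    by_card = defaultdict(list)
    by_host = defaultdict(list)
    for host, card, ok in trials:
        by_card[card].append((host, ok))
        by_host[host].append((card, ok))

    def always_fails(results):
        partners = {partner for partner, _ in results}
        return len(partners) >= 2 and not any(ok for _, ok in results)

    bad_cards = sorted(c for c, r in by_card.items() if always_fails(r))
    bad_hosts = sorted(h for h, r in by_host.items() if always_fails(r))
    return bad_cards, bad_hosts


# Placeholder trial log for illustration only.
TRIALS = [
    ("host1", "cardA", False),
    ("host2", "cardA", False),
    ("host1", "cardB", True),
    ("host2", "cardB", True),
]

print(isolate_fault(TRIALS))
```

If the failure follows the card from host to host, the card is the likely culprit; if it stays with one host regardless of card, look at the host instead.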