I am testing a vGPU configuration and automated recovery after a host failure with vSphere HA. When an ESXi host dies, the running VMs fail to start automatically on another vGPU-enabled host.
The error given by HA is:
"Insufficient resources to fail over this virtual machine. vSphere HA will retry the fail over when enough resources are available. Reason: Unable to find healthy compatible hosts for the VM"
When all hosts have recovered and the VM is powered back on, it shows this error in the event log:
"Hardware GPU resources are not available. The virtual machine will use software rendering."
Which is in fact a total lie: gpuvm shows that the VM is using the hardware GPU, nvidia-smi shows it, and dxdiag in the guest VM shows it as well.
Any ideas?
[root@smview1:/vmfs/volumes/9d2dede8-7f3698ef/Training-0004] grep -i 'mks|' vmware.log
2016-06-09T15:41:56.342Z| mks| I120: VTHREAD start thread 2 "mks" pid 39369
2016-06-09T15:41:56.342Z| mks| I120: MKS thread is alive
2016-06-09T15:41:56.343Z| mks| I120: MKS-RenderMain: RenderMain: PowerOn allowed 0 1 1 1 0
2016-06-09T15:41:56.343Z| mks| I120: MKS-RenderMain: Collecting RenderOps caps...
2016-06-09T15:41:56.344Z| mks| W110: GLWindow: Unable to reserve host GPU resources
2016-06-09T15:41:56.351Z| mks| I120: MKS-SWP: plugin started - llvmpipe (LLVM 3.3, 256 bits)
2016-06-09T15:41:56.351Z| mks| I120: Started Shim3D
2016-06-09T15:41:56.352Z| mks| I120: Stopped Shim3D
2016-06-09T15:41:56.352Z| mks| I120: MKS-SWP: plugin stopped
2016-06-09T15:41:56.352Z| mks| I120: MKS-RenderMain: Starting MKSBasicOps
2016-06-09T15:41:56.352Z| mks| I120: KHBKL: Unable to parse keystring at: ''
2016-06-09T15:41:56.352Z| mks| I120: MKS-RemoteMgr: Set default display name: Training-0004
2016-06-09T15:41:56.352Z| mks| I120: MKS-RemoteMgr: Loading VNC Configuration from VM config file
[root@smview1:/vmfs/volumes/9d2dede8-7f3698ef/Training-0004] grep -i 'mks' vmware.log
2016-06-09T15:41:55.633Z| vmx| I120: MKSXlib: Initialized thread-safe Xlib
2016-06-09T15:41:55.701Z| vmx| I120: DICT mks.enable3d = "TRUE"
2016-06-09T15:41:55.701Z| vmx| I120: DICT mks.use3dRenderer = "automatic"
2016-06-09T15:41:56.342Z| vmx| I120: MKS PowerOn
2016-06-09T15:41:56.342Z| mks| I120: VTHREAD start thread 2 "mks" pid 39369
2016-06-09T15:41:56.342Z| mks| I120: MKS thread is alive
2016-06-09T15:41:56.343Z| mks| I120: MKS-RenderMain: RenderMain: PowerOn allowed 0 1 1 1 0
2016-06-09T15:41:56.343Z| mks| I120: MKS-RenderMain: Collecting RenderOps caps...
2016-06-09T15:41:56.344Z| mks| W110: GLWindow: Unable to reserve host GPU resources
2016-06-09T15:41:56.351Z| mks| I120: MKS-SWP: plugin started - llvmpipe (LLVM 3.3, 256 bits)
2016-06-09T15:41:56.351Z| mks| I120: Started Shim3D
2016-06-09T15:41:56.352Z| mks| I120: Stopped Shim3D
2016-06-09T15:41:56.352Z| mks| I120: MKS-SWP: plugin stopped
2016-06-09T15:41:56.352Z| mks| I120: MKS-RenderMain: Starting MKSBasicOps
2016-06-09T15:41:56.352Z| mks| I120: KHBKL: Unable to parse keystring at: ''
2016-06-09T15:41:56.352Z| mks| I120: MKS-RemoteMgr: Set default display name: Training-0004
2016-06-09T15:41:56.352Z| mks| I120: MKS-RemoteMgr: Loading VNC Configuration from VM config file
2016-06-09T15:41:56.353Z| vmx| I120: [msg.mks.noGPUResourceFallback] Hardware GPU resources are not available. The virtual machine will use software rendering.
2016-06-09T15:41:56.354Z| vmx| I120: Vigor_MessageRevoke: message 'msg.mks.noGPUResourceFallback' (seq 24717) is revoked
2016-06-09T15:41:56.500Z| vmx| I120: OvhdMem OvhdUser_VmxMks : 33 33 - | 2 2 -
2016-06-09T15:41:56.500Z| vmx| I120: OvhdMem OvhdUser_VmxMks3d : 180224 180224 - | 0 0 -
2016-06-09T15:41:56.500Z| vmx| I120: OvhdMem OvhdUser_VmxMksGLRenderer : 12288 12288 - | 0 0 -
2016-06-09T15:41:56.500Z| vmx| I120: OvhdMem OvhdUser_VmxMksGLTransient : 65536 65536 - | 0 0 -
2016-06-09T15:41:56.500Z| vmx| I120: OvhdMem OvhdUser_VmxMksLLVM : 8192 8192 - | 0 0 -
2016-06-09T15:41:56.500Z| vmx| I120: OvhdMem OvhdUser_VmxMksScreenTemp : 36866 36866 - | 0 0 -
2016-06-09T15:41:56.500Z| vmx| I120: OvhdMem OvhdUser_VmxMksVnc : 19362 19362 - | 0 0 -
2016-06-09T15:41:56.500Z| vmx| I120: OvhdMem OvhdUser_VmxMksScreen : 32769 32769 - | 0 0 -
2016-06-09T15:41:56.500Z| vmx| I120: OvhdMem OvhdUser_VmxMksSVGAVO : 4096 4096 - | 0 0 -
2016-06-09T15:41:56.500Z| vmx| I120: OvhdMem OvhdUser_VmxThreadMks : 512 512 - | 512 512 -
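For anyone triaging the same symptom: the lines that matter in the grep output above are the W110 "Unable to reserve host GPU resources" warning, the llvmpipe software plugin start, and the msg.mks.noGPUResourceFallback message. Below is a minimal Python sketch that pulls those out of a vmware.log. The function name, the marker list, and the line regex are assumptions inferred only from the excerpts in this post, not from any documented VMware log format.

```python
import re

# Substrings seen in the vmware.log excerpts above. Treating them as the
# full set of "software rendering fallback" indicators is an assumption.
FALLBACK_MARKERS = (
    "Unable to reserve host GPU resources",
    "MKS-SWP: plugin started",
    "msg.mks.noGPUResourceFallback",
)

# Line layout inferred from the excerpts:
#   "<timestamp>| <thread>| <level>: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>[^|]+)\|\s*(?P<thread>[^|]+)\|\s*(?P<level>[A-Z]\d+):\s*(?P<msg>.*)$"
)


def find_fallback_lines(lines):
    """Return (timestamp, thread, level, message) for each fallback-related line."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip shell prompts and anything that is not a log line
        msg = match.group("msg")
        if any(marker in msg for marker in FALLBACK_MARKERS):
            hits.append((match.group("ts"), match.group("thread").strip(),
                         match.group("level"), msg))
    return hits


# Four lines copied from the log above: one unrelated, three fallback-related.
SAMPLE = """\
2016-06-09T15:41:56.342Z| vmx| I120: MKS PowerOn
2016-06-09T15:41:56.344Z| mks| W110: GLWindow: Unable to reserve host GPU resources
2016-06-09T15:41:56.351Z| mks| I120: MKS-SWP: plugin started - llvmpipe (LLVM 3.3, 256 bits)
2016-06-09T15:41:56.353Z| vmx| I120: [msg.mks.noGPUResourceFallback] Hardware GPU resources are not available. The virtual machine will use software rendering.
"""

if __name__ == "__main__":
    for ts, thread, level, msg in find_fallback_lines(SAMPLE.splitlines()):
        print(ts, thread, level, msg)
```

To run it against a real log instead of the sample, pass an open file object (for example `open("vmware.log")`) to `find_fallback_lines`; it only needs an iterable of lines.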
[root@smview1:/vmfs/volumes/9d2dede8-7f3698ef/Training-0004] grep -i 'wddm' vmware.log
2016-06-09T15:42:29.264Z| vcpu-3| I120: Guest: vm3d: SVGA WDDM Full Display driver, Version: 8.15.01.0045, Build Number: 3471414
2016-06-09T15:42:29.264Z| vcpu-3| I120: Guest: vm3d: WDDM OS version: 6.1, build number: 7601, service pack version: 1.0, platform Id: 2, product type: 1, suite mask: 0x110
2016-06-09T15:42:29.270Z| vcpu-3| I120: Guest: vm3d: WDDM Guest backed surface is enabled.
2016-06-09T15:42:29.272Z| vcpu-3| I120: Guest: vm3d: WDDM 3D is enabled.
2016-06-09T15:42:29.272Z| vcpu-3| I120: Guest: vm3d: WDDM DX10 context is disabled.
2016-06-09T15:42:29.272Z| vcpu-3| I120: Guest: vm3d: WDDM GL3 is disabled.
2016-06-09T15:42:29.272Z| vcpu-3| I120: Guest: vm3d: WDDM DX cap is disabled.
2016-06-09T15:42:29.272Z| vcpu-3| I120: Guest: vm3d: WDDM Guest backed primary in aperture is disabled.
2016-06-09T15:42:29.273Z| vcpu-3| I120: Guest: vm3d: WDDM GDI HW Acceleration is enabled.
2016-06-09T15:42:29.273Z| vcpu-3| I120: Guest: vm3d: WDDM GDI HW Acceleration Patch is enabled.
2016-06-09T15:42:29.273Z| vcpu-3| I120: Guest: vm3d: WDDM primary bounding box mem 16384KB.
2016-06-09T15:42:29.273Z| vcpu-3| I120: Guest: vm3d: WDDM VRAM 49152KB.
2016-06-09T15:42:29.276Z| vcpu-3| I120: Guest: vm3d: WDDM using 152KB memory for OTable.
2016-06-09T15:42:29.276Z| vcpu-3| I120: Guest: vm3d: WDDM GMR memory segment 262144KB.
2016-06-09T15:42:29.276Z| vcpu-3| I120: Guest: vm3d: WDDM Aperture memory 524288KB.
[root@smview3:~] nvidia-smi
Thu Jun 9 15:57:40 2016
+------------------------------------------------------+
| NVIDIA-SMI 361.45     Driver Version: 361.45.09      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GRID K2             On   | 0000:84:00.0     Off |                  Off |
| N/A   34C    P8    29W / 117W |    846MiB /  4095MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GRID K2             On   | 0000:85:00.0     Off |                  Off |
| N/A   31C    P8    28W / 117W |    426MiB /  4095MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GRID K2             On   | 0000:8A:00.0     Off |                  Off |
| N/A   24C    P8    28W / 117W |    426MiB /  4095MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GRID K2             On   | 0000:8B:00.0     Off |                  Off |
| N/A   36C    P8    28W / 117W |    426MiB /  4095MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     37053  C+G   Training-0001                                  416MiB |
|    0     37054  C+G   VDI-0047                                       416MiB |
|    1     37055  C+G   Training-0003                                  416MiB |
|    2     37056  C+G   Training-0002                                  416MiB |
|    3     39359  C+G   Training-0004                                  416MiB |
+-----------------------------------------------------------------------------+
[root@smview3:~] gpuvm
Xserver unix:0, PCI ID 0:132:0:0, vGPU: 0x11b0:0x109d, GPU maximum memory 4184024KB
pid 37053, VM "Training-0001", reserved 425984KB of GPU memory.
pid 37054, VM "VDI-0047", reserved 425984KB of GPU memory.
GPU memory left 3332056KB.
Xserver unix:1, PCI ID 0:133:0:0, vGPU: 0x11b0:0x109d, GPU maximum memory 4184024KB
pid 37055, VM "Training-0003", reserved 425984KB of GPU memory.
GPU memory left 3758040KB.
Xserver unix:2, PCI ID 0:138:0:0, vGPU: 0x11b0:0x109d, GPU maximum memory 4184024KB
pid 37056, VM "Training-0002", reserved 425984KB of GPU memory.
GPU memory left 3758040KB.
Xserver unix:3, PCI ID 0:139:0:0, vGPU: 0x11b0:0x109d, GPU maximum memory 4184024KB
pid 39359, VM "Training-0004", reserved 425984KB of GPU memory.
GPU memory left 3758040KB.
[root@smview3:~]
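The gpuvm and nvidia-smi outputs above can be cross-checked by hand: the second field of each gpuvm PCI ID is the bus number in decimal (132 is hex 84, matching 0000:84:00.0), and each 425984KB reservation is exactly 416MiB, matching the nvidia-smi process table. Here is a small Python sketch that automates that check. The function names and regexes are hypothetical and based only on the gpuvm text pasted in this post, so treat it as a sanity check under those assumptions rather than a supported parser.

```python
import re

# Patterns inferred from the gpuvm output above (an assumption, not a spec).
GPU_RE = re.compile(
    r"^Xserver (?P<xserver>\S+), PCI ID \d+:(?P<bus>\d+):\d+:\d+, .*"
    r"GPU maximum memory (?P<max_kb>\d+)KB"
)
VM_RE = re.compile(
    r'^pid (?P<pid>\d+), VM "(?P<name>[^"]+)", reserved (?P<kb>\d+)KB of GPU memory'
)
LEFT_RE = re.compile(r"^GPU memory left (?P<left_kb>\d+)KB")


def parse_gpuvm(text):
    """Parse gpuvm-style text into a list of per-GPU dicts."""
    gpus = []
    for raw in text.splitlines():
        line = raw.strip()
        gpu = GPU_RE.match(line)
        if gpu:
            gpus.append({
                "xserver": gpu.group("xserver"),
                # gpuvm prints the PCI bus in decimal; nvidia-smi uses hex.
                "bus_hex": format(int(gpu.group("bus")), "02X"),
                "max_kb": int(gpu.group("max_kb")),
                "vms": {},
                "left_kb": None,
            })
            continue
        if not gpus:
            continue  # ignore anything before the first Xserver line
        vm = VM_RE.match(line)
        if vm:
            gpus[-1]["vms"][vm.group("name")] = int(vm.group("kb"))
            continue
        left = LEFT_RE.match(line)
        if left:
            gpus[-1]["left_kb"] = int(left.group("left_kb"))
    return gpus


def accounting_ok(gpu):
    """True if maximum memory minus all reservations equals the reported remainder."""
    return gpu["max_kb"] - sum(gpu["vms"].values()) == gpu["left_kb"]


# The gpuvm output from this post, copied verbatim.
SAMPLE = """\
Xserver unix:0, PCI ID 0:132:0:0, vGPU: 0x11b0:0x109d, GPU maximum memory 4184024KB
pid 37053, VM "Training-0001", reserved 425984KB of GPU memory.
pid 37054, VM "VDI-0047", reserved 425984KB of GPU memory.
GPU memory left 3332056KB.
Xserver unix:1, PCI ID 0:133:0:0, vGPU: 0x11b0:0x109d, GPU maximum memory 4184024KB
pid 37055, VM "Training-0003", reserved 425984KB of GPU memory.
GPU memory left 3758040KB.
Xserver unix:2, PCI ID 0:138:0:0, vGPU: 0x11b0:0x109d, GPU maximum memory 4184024KB
pid 37056, VM "Training-0002", reserved 425984KB of GPU memory.
GPU memory left 3758040KB.
Xserver unix:3, PCI ID 0:139:0:0, vGPU: 0x11b0:0x109d, GPU maximum memory 4184024KB
pid 39359, VM "Training-0004", reserved 425984KB of GPU memory.
GPU memory left 3758040KB.
"""

if __name__ == "__main__":
    for gpu in parse_gpuvm(SAMPLE):
        names = ", ".join(sorted(gpu["vms"]))
        print(gpu["xserver"], "bus", gpu["bus_hex"], names, accounting_ok(gpu))
```

If Training-0004 shows up with a reservation and the accounting holds, the host did hand the VM hardware GPU memory, which is the contradiction with the event log message described at the top of this post.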