vSphere 6.7 U1 - vMotion

I just upgraded my View environment to 7.6 with vSphere 6.7 U1 (both vCenter and ESXi). I have M10 cards in my servers set up for Shared Direct, and I also enabled the vgpu.hotmigrate.enabled advanced setting in vCenter to allow vMotion. However, when I try to vMotion a VM, I get this error: "A required migration feature is not supported on the "Source" host". The host driver version is 390.94-1OEM.670.0.0.8169922.

Is this not supported yet with GRID? Or am I missing something else?

Thanks!

Yes. You’re on the wrong branch of the vGPU driver. You need to be running 410.68 (vGPU 7.0).

Yeah, right after I posted this I logged in and saw that 410/vGPU 7 had been released. I upgraded all hosts and the guest OS, but I still get the same error. Is there anything other than the vgpu.hotmigrate.enabled option that needs to be done?

The only changes you need to make are listed here: https://docs.nvidia.com/grid/7.0/grid-vgpu-user-guide/index.html#configuring-vgpu-migration-vmware-vsphere.

Make sure you do it on all vSphere Hosts in the Pool.

I assume the Hosts are all identical, and that you rebooted them after upgrading to vGPU 7.0. Did you check with SMI on each host to ensure the upgrade had actually been applied?

If everything else is as it should be, then it may be worth completely uninstalling the vGPU software from each host, rebooting the hosts, reinstalling the vGPU software, rebooting the hosts again, validating with SMI that it’s installed and working, and then trying a migration again.

I verified with nvidia-smi that all hosts have 410.68 installed, and each host was rebooted after the install. I followed that guide, but it describes the setting as something you configure on each ESXi host, when it is actually a vCenter setting. There’s no such ESXi setting. When I disable the ‘vgpu.hotmigrate.enabled’ option, the error says that it’s not compatible because vGPU migration is disabled. When I enable that option, the error just says "A required migration feature is not supported on the "Source" host". I have other non-vGPU VMs on these hosts that migrate with no issues.

Also - I ran this command on each ESXi host:

[root@esx23:~] nvidia-smi vgpu -m
GPU 00000000:3D:00.0
Migration capability : Yes
GPU 00000000:3E:00.0
Migration capability : Yes
GPU 00000000:3F:00.0
Migration capability : Yes
GPU 00000000:40:00.0
Migration capability : Yes
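
For anyone checking several hosts, the output above can be verified with a small script rather than by eye. This is only a minimal sketch: it assumes the output has exactly the two-line "GPU <address>" / "Migration capability : <Yes|No>" layout shown in this post, and the function name parse_migration_capability is made up for illustration.

```python
import re
from typing import Dict


def parse_migration_capability(output: str) -> Dict[str, bool]:
    """Map each GPU PCI address to True if it reports migration capability.

    Assumes the layout pasted in this thread: a "GPU <address>" line
    followed by a "Migration capability : Yes|No" line.
    """
    result: Dict[str, bool] = {}
    current_gpu = None
    for raw_line in output.splitlines():
        line = raw_line.strip()
        gpu_match = re.match(r"^GPU\s+(\S+)$", line)
        if gpu_match:
            current_gpu = gpu_match.group(1)
            continue
        cap_match = re.match(r"^Migration capability\s*:\s*(\S+)$", line)
        if cap_match and current_gpu is not None:
            result[current_gpu] = cap_match.group(1).lower() == "yes"
    return result


# Output copied from the post above (host esx23).
SAMPLE = """\
GPU 00000000:3D:00.0
Migration capability : Yes
GPU 00000000:3E:00.0
Migration capability : Yes
GPU 00000000:3F:00.0
Migration capability : Yes
GPU 00000000:40:00.0
Migration capability : Yes
"""

if __name__ == "__main__":
    for gpu, capable in parse_migration_capability(SAMPLE).items():
        print(gpu, "OK" if capable else "NOT migration capable")
```

A "Yes" here only confirms that the GPU and the vGPU driver are capable of migration. It does not tell you whether the ESXi build itself supports it.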

I have rebooted the vCenter appliance, but I can’t reboot the ESXi hosts because there are VMs on them that cannot be migrated off.

I found the issue. When I upgraded from 6.5 to 6.7, I accidentally applied the "security only update" image profile (ESXi-6.7.0-20181001001s-standard) instead of the full update (ESXi-6.7.0-20181002001-standard).

I reapplied the upgrade with the full profile and was able to vMotion with no issues.
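
In case it helps someone else catch this sooner: the two profile names differ only by the "s" after the build stamp. Below is a rough sketch of a check for that suffix. It assumes the naming pattern of the two profiles quoted in this thread holds in general, and the helper name is_security_only is hypothetical. The profile name itself should be visible on the host via `esxcli software profile get`.

```python
import re

# Pattern inferred from the two profile names in this thread:
#   ESXi-<version>-<build stamp>[s]-<flavor>
# A trailing "s" on the build stamp is taken to mean "security only".
PROFILE_RE = re.compile(r"ESXi-(\d+\.\d+\.\d+)-(\d+)(s?)-(\S+)")


def is_security_only(profile_name: str) -> bool:
    """Return True if the image profile name carries the 's' suffix."""
    match = PROFILE_RE.search(profile_name)
    if match is None:
        raise ValueError(f"Unrecognized image profile name: {profile_name!r}")
    return match.group(3) == "s"


if __name__ == "__main__":
    for name in (
        "ESXi-6.7.0-20181001001s-standard",  # the profile applied by mistake
        "ESXi-6.7.0-20181002001-standard",   # the full update
    ):
        kind = "security only" if is_security_only(name) else "full"
        print(f"{name}: {kind}")
```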

:-)