Hi,
For the last 5 days I have been fighting an issue where VMs crash or become unstable after a random length of time.
Here is a sample of the (many) errors in the vmware.log of each VM that uses a vGPU (v100D_2Q profile):
2019-12-07T14:14:01.622Z| vcpu-2| W115: Memory regions (0xfc000000, 0xfcfff000) and (0xfc810000, 0xfc81f000) overlap (0x54f0024000 0x5520026000).vcs = 0xfff, vpcuId = 0xffffffff
2019-12-07T14:14:01.627Z| vcpu-0| W115: Memory regions (0xfc000000, 0xfcfff000) and (0xfc810000, 0xfc81f000) overlap (0x54f0024000 0x5520026000).vcs = 0xfff, vpcuId = 0xffffffff
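Since the full vmware.log can't leave the site, a short summary of these warnings might still be shareable. The following is a minimal Python sketch, a hypothetical helper and not an NVIDIA or VMware tool. It counts the "Memory regions ... overlap" warnings per vCPU, counts the distinct region pairs, and reports the first and last timestamps. It assumes every warning follows exactly the format of the two sample lines above. Other ESXi builds may word the line differently, in which case the regex needs adjusting.

```python
import re
import sys
from collections import Counter

# Pattern is an assumption based only on the two sample lines in this post.
OVERLAP_RE = re.compile(
    r"^(?P<ts>[^|]+)\| (?P<vcpu>vcpu-\d+)\| \w+: Memory regions "
    r"\((?P<a_start>0x[0-9a-fA-F]+), (?P<a_end>0x[0-9a-fA-F]+)\) and "
    r"\((?P<b_start>0x[0-9a-fA-F]+), (?P<b_end>0x[0-9a-fA-F]+)\) overlap"
)


def parse_overlaps(lines):
    """Yield (timestamp, vcpu, region_a, region_b) for each overlap warning."""
    for line in lines:
        m = OVERLAP_RE.match(line)
        if not m:
            continue  # not an overlap warning; skip it
        a = (int(m["a_start"], 16), int(m["a_end"], 16))
        b = (int(m["b_start"], 16), int(m["b_end"], 16))
        yield m["ts"], m["vcpu"], a, b


def summarize(lines):
    """Return counts and the time span of the overlap warnings."""
    events = list(parse_overlaps(lines))
    per_vcpu = Counter(vcpu for _, vcpu, _, _ in events)
    pairs = Counter((a, b) for _, _, a, b in events)
    return {
        "count": len(events),
        "per_vcpu": dict(per_vcpu),
        "distinct_pairs": len(pairs),
        "first": events[0][0] if events else None,
        "last": events[-1][0] if events else None,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python overlap_summary.py /vmfs/volumes/<datastore>/<vm>/vmware.log
    with open(sys.argv[1], errors="replace") as f:
        print(summarize(f))
```

If the warnings always show the same region pair regardless of vCPU, that alone may be a useful detail to post without exposing the rest of the log.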
While the screen is rendering, it is choppy/laggy and eventually freezes. At that point the only way to stop the VM is esxcli vm process kill on the host.
VM Specs:
EFI Boot
4x vCPU
16GB vRAM - Reserved
Paravirtual SCSI Adapter
Shared PCI Device vGPU
These VMs serve desktops through Horizon 7.11. Both Windows 10 1903 and Ubuntu 18.04 LTS generate the same errors in their respective vmware.log files.
Ubuntu seems to crash more often than Windows, which I think is due to its higher video RAM usage on the vGPU.
GRID driver versions tried (host driver with the matching guest driver):
10.1
10.0
9.2
ESXi versions tried:
6.7 U3
6.7 U2
6.7 U1
We are fully licensed for Quadro vDWS and GRID vPC.
HPE RBSU (BIOS) profile: Virtualization - Max Performance (SR-IOV and VT-d enabled)
I cannot provide a full vmware.log because this is a dark site.
I'm looking for any assistance or hints to solve this.
Thanks in advance