We have upgraded an ESXi host to 6.5 and the VIB to the supported NVIDIA-kepler-vSphere-6.5-367.64-369.71 downloaded from NVIDIA’s website, but the base machine will not start with the GPU (shared PCI device) enabled; it complains about not enough GPU memory. Running ‘nvidia-smi’ on the host shows the cards:
nvidia-smi
Thu Nov 24 00:04:52 2016
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.64                 Driver Version: 367.64                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GRID K2             On   | 0000:05:00.0     Off |                  Off |
| N/A   25C    P8    28W / 117W |     18MiB /  4095MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GRID K2             On   | 0000:06:00.0     Off |                  Off |
| N/A   23C    P8    27W / 117W |     18MiB /  4095MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GRID K2             On   | 0000:84:00.0     Off |                  Off |
| N/A   26C    P8    28W / 117W |     18MiB /  4095MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GRID K2             On   | 0000:85:00.0     Off |                  Off |
| N/A   24C    P8    27W / 117W |     18MiB /  4095MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     68574    G   Xorg                                             7MiB |
|    1     68600    G   Xorg                                             7MiB |
|    2     68641    G   Xorg                                             7MiB |
|    3     68660    G   Xorg                                             7MiB |
+-----------------------------------------------------------------------------+
[root@k2-3:~]
Um, Xorg? The older ESXi host doesn’t show that. Output from ‘gpuvm’:
gpuvm
Xserver unix:0, PCI ID 0:5:0:0, vSGA mode, GPU maximum memory 4173824KB
GPU memory left 4173824KB.
Xserver unix:1, PCI ID 0:6:0:0, vSGA mode, GPU maximum memory 4173824KB
GPU memory left 4173824KB.
Xserver unix:2, PCI ID 0:132:0:0, vSGA mode, GPU maximum memory 4173824KB
GPU memory left 4173824KB.
Xserver unix:3, PCI ID 0:133:0:0, vSGA mode, GPU maximum memory 4173824KB
GPU memory left 4173824KB.
To me, this suggests the VIB is not correct, but it is the only one available on NVIDIA’s website. Downgrading to NVIDIA-GRID-vGPU-kepler-vSphere-6.0-367.64-369.71 on the ESXi host allows the base machine to start with the GPU enabled, but View won’t compose a pool because it does not recognize the older GPU.
Anyway, has anyone else upgraded to vSphere 6.5 and run into this issue, or are we missing something simple?
Never mind: the host graphics setting on each ESXi host that had been updated to 6.5 had reverted to "Shared" instead of "Shared Direct". After setting the host to "Shared Direct" and restarting Xorg, all is well.
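In hindsight, the ‘gpuvm’ output quoted above already pointed at this: every GPU is reported in "vSGA mode", which corresponds to the "Shared" setting. For anyone checking several hosts, here is a minimal Python sketch that flags such GPUs in captured ‘gpuvm’ output. It assumes the output has the same line format as shown in this thread; the function name and the sample text are illustrative only, and I have not verified what ‘gpuvm’ prints in other modes.

```python
import re

# Matches lines shaped like the gpuvm output quoted in this thread, e.g.
# "Xserver unix:0, PCI ID 0:5:0:0, vSGA mode, GPU maximum memory 4173824KB".
# Assumption: other hosts print the same layout.
GPU_LINE = re.compile(
    r"Xserver\s+(?P<xserver>\S+),\s*PCI ID\s+(?P<pci>[\d:]+),\s*"
    r"(?P<mode>\S+)\s+mode,\s*GPU maximum memory\s+(?P<max_kb>\d+)KB"
)


def gpus_in_vsga_mode(gpuvm_output):
    """Return the PCI IDs of GPUs that the given gpuvm text reports in vSGA mode."""
    return [
        match.group("pci")
        for match in GPU_LINE.finditer(gpuvm_output)
        if match.group("mode").lower() == "vsga"
    ]


# Two entries copied from the output posted above (hypothetical capture file).
SAMPLE = """\
Xserver unix:0, PCI ID 0:5:0:0, vSGA mode, GPU maximum memory 4173824KB
GPU memory left 4173824KB.
Xserver unix:1, PCI ID 0:6:0:0, vSGA mode, GPU maximum memory 4173824KB
GPU memory left 4173824KB.
"""

if __name__ == "__main__":
    for pci in gpus_in_vsga_mode(SAMPLE):
        print(f"GPU at PCI ID {pci} is in vSGA (Shared) mode")
```

Save the output of ‘gpuvm’ from each host to a text file and pass its contents to gpus_in_vsga_mode(); any PCI ID it returns is a GPU still set to "Shared" rather than "Shared Direct".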
vSphere 6.5 and November 2016 GRID drivers (both Kepler and Maxwell) require changing the default GPU mode from “Shared” (vSGA) to “Shared Direct” (vGPU) via vCenter to enable vGPU support for VMs.
If this is not changed, VMs with a vGPU profile assigned will fail to start with the standard “graphics resources not available” error.
For those that may be starting to evaluate the November 2016 GRID drivers with vSphere 6.5, an additional step to configure the GPU mode is required.
Procedure:
Select the ESXi 6.5 host in vCenter 6.5, then select the “Configure” tab and scroll down to “Graphics”.
Highlight each GPU that you want to use for vGPU, then select the edit icon to modify the graphics device settings.
Select “Shared Direct” for vGPU.
The host will need to be rebooted for the changes to take effect; after that, your vGPU VMs should start normally.
This new requirement and procedure will be added to the documentation shortly. Thank you for reporting this issue.
I found this and configured my server this way. It caused issues for all my VMs that are set to use VMware SVGA. I don’t need them to use the GPU at all; I only wanted to enable it for some.
Is this the new way we need to configure it: all VMs using the GPU, whether they need it or not?
This happened to VMs that did not have the Shared PCI device added with a profile.
@Taskman: There are different versions of the vGPU manager. Our documentation is fully correct. We reference the Maxwell-based vGPU manager (>GRID 2.0), but the Kepler one is still available for public download, as that version is for K1/K2 only and doesn’t require a GRID license.
@Jmain: I tried to follow your procedure to change the GPU from "Shared" to "Shared Direct", but I don’t see an Edit option under the Graphics settings for my ESXi host. I am running vSphere 6.0.0. Where else can I change the graphics settings?
Hi - Any thoughts on how to fix the ‘GPU memory’ error if we are not on vSphere 6.5? I am on vSphere 6.0.0 rev 3018524. I just upgraded some ESXi hosts to 6.0 Patch 5 (i.e. rev 5572656), and I now cannot power on any VMs with a K2 card. Do I need to force them to vGPU mode? I’m trying to figure out how to do that with the revs I’m at. Any ideas? Thanks.
@bobtheslob: We have the same issue after upgrading to ESXi 6.0 Patch 5 (Build 5572656). I’ve opened a case with VMware. I will let you know if I have any news.
Hi - I’ve received an answer from VMware, they have sent me a link to kb2150498: https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2150498
I followed the instructions and copied the attached xorg file; after that I was able to start the service and the VMs again without changing the graphics settings. It seems that there is no other fix for this issue on ESXi 6.0 Patch 5 (Build 5572656) with vCenter 6.
I performed this extra step and rebooted, but my VM still does not power on and gives the error "graphics resource not available". Any suggestions? Which documentation outlines these steps, btw?
OK, I’m getting conflicting info. For vGPU to work, should the Tesla M60 GPU card be in PCI passthrough on the ESXi host, or not? The GRID requirements used to state GPU passthrough, but what about the Tesla M60? Does the VIB take care of all of that? When I place the card into passthrough, Xorg won’t start and nvidia-smi complains of an initialization error.
So, what is the correct procedure for ESXi 6.5 with a Tesla M60 and Horizon View 7.3 in order to use vGPU?