NVIDIA VMware vSphere 6.5

We have upgraded an ESXi host to 6.5 along with the supported VIB, NVIDIA-kepler-vSphere-6.5-367.64-369.71, downloaded from NVIDIA's website, but the base machine will not start with the GPU (PCI shared device) enabled, complaining about not enough GPU memory. Running 'nvidia-smi' on the host shows the cards:

Thu Nov 24 00:04:52 2016
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.64                 Driver Version: 367.64                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GRID K2             On   | 0000:05:00.0     Off |                  Off |
| N/A   25C    P8    28W / 117W |     18MiB /  4095MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GRID K2             On   | 0000:06:00.0     Off |                  Off |
| N/A   23C    P8    27W / 117W |     18MiB /  4095MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GRID K2             On   | 0000:84:00.0     Off |                  Off |
| N/A   26C    P8    28W / 117W |     18MiB /  4095MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GRID K2             On   | 0000:85:00.0     Off |                  Off |
| N/A   24C    P8    27W / 117W |     18MiB /  4095MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     68574    G   Xorg                                             7MiB |
|    1     68600    G   Xorg                                             7MiB |
|    2     68641    G   Xorg                                             7MiB |
|    3     68660    G   Xorg                                             7MiB |
+-----------------------------------------------------------------------------+

Um, Xorg? The older ESXi host doesn't show that. Output from 'gpuvm':

Xserver unix:0, PCI ID 0:5:0:0, vSGA mode, GPU maximum memory 4173824KB
GPU memory left 4173824KB.
Xserver unix:1, PCI ID 0:6:0:0, vSGA mode, GPU maximum memory 4173824KB
GPU memory left 4173824KB.
Xserver unix:2, PCI ID 0:132:0:0, vSGA mode, GPU maximum memory 4173824KB
GPU memory left 4173824KB.
Xserver unix:3, PCI ID 0:133:0:0, vSGA mode, GPU maximum memory 4173824KB
GPU memory left 4173824KB.

To me, this implies the VIB is not correct, but that is the only one available on NVIDIA's website. Downgrading the ESXi host to NVIDIA-GRID-vGPU-kepler-vSphere-6.0-367.64-369.71 allows the base machine to start with the GPU enabled, but View won't compose a pool because it does not recognize the older GPU driver.
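As a sanity check, the VIB actually installed on the host can be listed directly with esxcli; this is a generic query from an SSH session on the ESXi host, not specific to this issue:

```shell
# List installed VIBs and filter for the NVIDIA vGPU manager
esxcli software vib list | grep -i nvidia
```

The version shown should match the driver version reported by nvidia-smi (367.64 here).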

Anyway, has anyone else upgraded their vSphere to 6.5 and run into this issue, or are we missing something simple?


Never mind: the host graphics setting on each ESXi host that had been upgraded to 6.5 had reverted to "Shared" instead of "Shared Direct". After setting each host to "Shared Direct" and restarting xorg, all is well.
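For anyone else chasing this, the current mode can also be checked from the host shell; a minimal sketch, assuming SSH access to the ESXi 6.5 host (in esxcli output, "Shared Direct" appears as SharedPassthru):

```shell
# Show the host's default graphics type ("Shared" = vSGA,
# "SharedPassthru" = Shared Direct / vGPU)
esxcli graphics host get

# List the GPUs and the mode each device is currently in
esxcli graphics device list
```

If the default type still reads "Shared" after the 6.5 upgrade, that matches the symptom described above.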

This is exactly the problem I was running into; thanks for sharing the solution.

vSphere 6.5 and November 2016 GRID drivers (both Kepler and Maxwell) require changing the default GPU mode from “Shared” (vSGA) to “Shared Direct” (vGPU) via vCenter to enable vGPU support for VMs.

Not changing this will cause VMs with a vGPU profile assigned to fail to start with the standard "graphics resources not available" error.

For those that may be starting to evaluate the November 2016 GRID drivers with vSphere 6.5, an additional step to configure the GPU mode is required.


  •      Select the ESXi 6.5 host in vCenter 6.5, then select the "Configure" tab and scroll down to "Graphics".
  •      Highlight each GPU that you want to use for vGPU, then select the edit icon to modify the graphics device settings.
  •      Select "Shared Direct" for vGPU.
  •      Reboot the host for the changes to take effect; after that, your vGPU VMs should start normally.

This new requirement and procedure will be added to the documentation shortly. Thank you for reporting this issue.

I found this and configured my server this way, and it caused issues for all my VMs that use the VMware SVGA adapter. I don't need those VMs to use the GPU at all; I only wanted to enable it for some.

Is this the new way we need to configure it? Do all VMs now use the GPU, whether they need it or not?

This happened to VMs that did not have a shared PCI device with a vGPU profile added.


Thanks a lot for this info. I was working with the NVIDIA support team on SR 161202-000639 to no avail until I came across this community post.

Once again, thanks a lot, Jeremy Main.

This worked perfectly; make sure to restart xorg as mentioned by Yem above. I have edited my comments per @Sschaber below.

@Taskman: There are different versions of the vGPU Manager, and our documentation is fully correct. It references the Maxwell-based vGPU Manager (GRID 2.0 and later), but the Kepler one is still available for public download because that version is for the K1/K2 only and doesn't require a GRID license.



@Jmain: I tried to follow your procedure to change the GPU from "Shared" to "Shared Direct", but I don't see an Edit option under the Graphics settings for my ESXi host. I am running vSphere 6.0.0. Where else can I change the graphics settings?

Hi, this option exists only in vSphere 6.5; you won't find it on 6.0.

Hi - any thoughts on how to fix the 'GPU memory' error if we are not on vSphere 6.5? I am on vSphere 6.0.0 build 3018524 and just upgraded some ESXi hosts to 6.0 Patch 5 (build 5572656). I now cannot power on any VMs with a K2 card. Do I need to force them into vGPU mode? I'm trying to figure out how to do that with the builds I'm on. Any ideas? Thanks.

@bobtheslob: We have the same issue after upgrading to ESXi 6.0 Patch 5 (build 5572656). I've opened a case with VMware and will let you know if I have any news.

Friends, we have the same problem with the K1 card. I worked around it temporarily via Shared Direct. We are waiting for a fix.

Hi - I've received an answer from VMware; they sent me a link to KB 2150498: https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2150498
I followed the instructions and copied the attached xorg file, and after that I was able to start the service and the VMs again without changing the graphics settings. It seems there is no other fix for this issue on ESXi 6.0 Patch 5 (build 5572656) with vCenter 6.

@Neo2k4: Thanks for the link to the article. I will keep an eye on the resolution.

Thank you! This resolved our issue.

I performed this extra step and rebooted, but my VM still won't power on and gives the error "graphics resource not available". Any suggestions? Which documentation outlines these steps, by the way?

OK, I'm getting conflicting info. Should the Tesla M60 card be in PCI passthrough on the ESXi host for vGPU to work, or should it not be? The GRID requirements used to state GPU passthrough, but what about the Tesla M60? Does the VIB take care of all of that? When I place it into passthrough, I notice xorg won't start and nvidia-smi complains of an initialization error.
So, what is the correct procedure for ESXi 6.5 with a Tesla M60 and Horizon View 7.3 in order to use vGPU?

You definitely cannot run the GPU in PCI passthrough if you want to use vGPU.

I'm on ESXi 6.5 without vCenter; how do I enable Shared Direct graphics? I can't find the option anywhere.

I tried to do it through esxcli, but it says I can only set the GPU to Shared or SharedPassthru.
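For what it's worth, "SharedPassthru" in esxcli corresponds to "Shared Direct" in the vSphere Web Client, so on a standalone host the change can be sketched like this (assuming SSH access to the ESXi 6.5 host):

```shell
# "SharedPassthru" is the esxcli name for "Shared Direct" (vGPU) mode
esxcli graphics host set --default-type SharedPassthru

# Restart xorg so the new mode is picked up without a full reboot
/etc/init.d/xorg restart

# Confirm the setting took effect
esxcli graphics host get
```

So the esxcli output you saw is expected; SharedPassthru is the option you want for vGPU.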