NVidia Grid Tesla M10 VMware 6.7 vGPU profile problem

We are trying to evaluate NVidia GRID. We have bought the cards, but Dell forgot to sell us the NVidia software with our new servers, and we are waiting on them to get us the correct quote.
In the meantime, I got an evaluation license for our setup.

We are running 4 Dell R740xd servers
We have 4 Tesla M10 cards, 1 in each server
We have installed VMware 6.7
We have installed the NVidia GRID vSphere 6.7-390.57-391.58 software on our ESXi hosts

When I run nvidia-smi, I see the driver is loaded and everything there looks good.

However, when I try to add the PCI device NVidia GRID vGPU (Edit Settings on the VM), the vGPU profile drop-down box is empty.

I’m unsure how to fix this problem. I would love some help fixing it.

Stephen

Hi Stephen,

Have you switched the GPU setting in vCenter from "Shared" to "Shared Direct"?
In addition, you need an Enterprise Plus license on ESX.

Regards

Simon

I just checked all of the hosts: we are using "Shared Direct" (Vendor shared passthrough graphics) and not "Shared" (VMware shared virtual graphics), and we are running Enterprise Plus.

Stephen

OK, I fixed it. I’m not sure why I had to do this, since I had already rebooted my ESXi hosts a few times, but it now works after I restarted the xorg service.

Thanks again.

Stephen

Good Morning,

I’m having an issue where Nvidia Tesla M10 cards are not being recognised within the OS. In the configuration in VMware 7 Administrator I am able to complete the setup, select the GPU and click OK, but as soon as I boot the VM the GPU is not recognised in the OS and it just uses a generic graphics driver.

Has anyone come across this problem before?

Kind Regards

Santokh

I am glad I found this topic.

I too am having the exact same problem with my M10 cards.

We have a production environment with a bunch of M10 hosts on ESX 6.0 and Horizon 6.2.5, but I am now doing a POC with ESX 6.7 and Horizon 7.5.

I know everything worked fine in ESX 6.5 in my last test, but GRID refuses to work in ESX 6.7.

I am using the GRID 6.1 driver package for ESX 6.7.

Has anyone found a reliable way to get this working? I have set up many hosts over the last few years and never seen an issue like this. You can add the Shared PCI Device and it says Nvidia GRID, but the profile selection field is blank. I have done numerous host restarts, vCenter restarts and Xorg service restarts, but nothing has helped.

I have attached my nvidia-smi output (it shows Xorg running on each core). Kinda strange… never seen that before.

As an update to those with the issue.

Even though the field was blank I manually typed in the profile name.

Example: M10-1B

and pressed Enter. It then accepted it and added the profile. I suspect there is a problem with vCenter not properly displaying the menu options (they might be there, just not visible or selectable).

Also, make sure that "Expose hardware assisted virtualization to the guest operating system" is not enabled under the CPU options.

Update: Even at this point the VM will not power on. You will get the message: "No Host is compatible with the virtual machine"

So the ESX host is having an issue with the driver(s).

Hey Guys,

I have figured out the issue.

In ESX 6.7 the host is set to "PCI Shared" by default instead of "PCI Shared Direct". I have no idea why this changed from ESX 6.5 to 6.7.

As soon as I changed this in the client and rebooted the host, everything worked as expected.

I am glad I got this figured out, but I do not understand why the "default" configuration for vGPU has changed for no real reason.
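
For anyone who wants to check this setting on many hosts without clicking through the client, here is a minimal sketch in Python. It assumes the host shell command `esxcli graphics host get` prints a "Default Graphics Type:" line, with the value SharedPassthru corresponding to "Shared Direct" and Shared corresponding to "Shared". The function names and the sample text are made up for illustration, so compare them against what your own hosts actually print.

```python
from typing import Optional

# Assumed shape of `esxcli graphics host get` output; verify on your own host.
SAMPLE_OUTPUT = """\
   Default Graphics Type: SharedPassthru
   Shared Passthru Assignment Policy: Performance
"""


def default_graphics_type(esxcli_output: str) -> Optional[str]:
    """Return the value of the 'Default Graphics Type' line, or None if it is missing."""
    for line in esxcli_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "default graphics type":
            return value.strip()
    return None


def is_shared_direct(esxcli_output: str) -> bool:
    """True when the host default is the vendor passthrough ("Shared Direct") type."""
    return default_graphics_type(esxcli_output) == "SharedPassthru"


print(is_shared_direct(SAMPLE_OUTPUT))
```

If a host reports Shared here, the setting can be changed in the client as described above. As far as I know, the NVIDIA vGPU documentation also describes `esxcli graphics host set --default-type SharedPassthru` as a command-line alternative, followed by a restart of the xorg service or a host reboot.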

As a side note for those who have not noticed: at the time of this post GRID 6.1 is only certified up to Horizon 7.4.x. Has anyone tested it with Horizon 7.5? It is currently unknown whether a new GRID 6.2 is coming down the pipe soon (or an official thumbs up on 6.1 with Horizon 7.5).

Thank You!

GRID 6.2 is already released with support for 7.5. In addition, I don’t understand your post above. Shared is already the default in 6.5, and you always need to change to Shared Direct. This was introduced with 6.5 and is not new in 6.7.

Regards

Simon

When we did testing on ESX 6.5 (at least the versions we tried) this was NOT the default. The default was the same as in ESX 6.0.

For us ESX 6.7 was the first time we have seen this change in default settings.