Nvidia VMware vSphere-6.7

Hi,

I have installed VMware-VMvisor-Installer-6.7.0.update02-13006603.x86_64 on my server, along with the supported vib NVIDIA-VMware_ESXi_6.7_Host_Driver-430.27-1OEM.670.0.0.8169922.x86_64.vib, downloaded from the NVIDIA Enterprise website, but the virtual machine will not start with vGPU.

When running ‘nvidia-smi’ on the host, it shows the cards:

[root@localhost:~] nvidia-smi
Tue Jul  2 09:35:54 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.27       Driver Version: 430.27       CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro RTX 6000     On   | 00000000:1A:00.0 Off |                  Off |
| 34%   36C    P8   146W / 260W |    159MiB / 24575MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Quadro RTX 6000     On   | 00000000:1B:00.0 Off |                  Off |
| 34%   37C    P8   150W / 260W |    159MiB / 24575MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Quadro RTX 6000     On   | 00000000:60:00.0 Off |                  Off |
| 33%   36C    P8   137W / 260W |    159MiB / 24575MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Quadro RTX 6000     On   | 00000000:61:00.0 Off |                  Off |
| 34%   37C    P8   140W / 260W |    159MiB / 24575MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Quadro RTX 6000     On   | 00000000:B1:00.0 Off |                  Off |
| 34%   37C    P8   147W / 260W |    159MiB / 24575MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Quadro RTX 6000     On   | 00000000:B2:00.0 Off |                  Off |
| 34%   37C    P8   146W / 260W |    159MiB / 24575MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Quadro RTX 6000     On   | 00000000:DA:00.0 Off |                  Off |
| 33%   31C    P8   140W / 260W |    159MiB / 24575MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Quadro RTX 6000     On   | 00000000:DB:00.0 Off |                  Off |
| 33%   35C    P8   144W / 260W |    159MiB / 24575MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

I changed the default GPU mode from “Shared” (vSGA) to “Shared Direct” (vGPU) via vCenter to enable vGPU support for VMs.

Here is the error message:

Failed to start the virtual machine.
Module 'DevicePowerOn' power on failed.
Could not initialize plugin '/usr/lib64/vmware/plugin/libnvidia-vgx.so' for vGPU 'grid_rtx6000-24q'
passthrough device 'pciPassthru0' vGPU 'grid_rtx6000-24q' disallowed by vmkernel

Thanks for your help

Hi

Which server chassis are you running?

What happens when you try a smaller profile? Try a 1Q and see if the VM powers on.

Also …

Have you disabled ECC? …

Check it by running: nvidia-smi -q

Disable it by running: nvidia-smi -e 0

You’ll need to reboot the chassis after running this command.
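If you want to check all eight cards at once, the ECC state can be pulled out of the ‘nvidia-smi -q’ output with a small parsing sketch (the helper name and the sample output fragment below are illustrative, not from NVIDIA's tooling):

```python
import re

# Illustrative fragment of `nvidia-smi -q` output (one GPU shown).
SAMPLE = """\
GPU 00000000:1A:00.0
    Ecc Mode
        Current                 : Enabled
        Pending                 : Disabled
"""

def ecc_states(query_output):
    """Return the 'Current' ECC mode reported for each GPU section."""
    return re.findall(r"Current\s*:\s*(\w+)", query_output)

# 'Enabled' here means a host reboot is still needed before the
# Pending 'Disabled' state takes effect.
print(ecc_states(SAMPLE))
```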

You can also try adding the following to the VM's "Advanced Configuration":

pciPassthru.use64bitMMIO = "TRUE"

pciPassthru.64bitMMIOSizeGB = "64"
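As a rule of thumb, the 64-bit MMIO space needs to cover the total GPU memory mapped into the VM, rounded up to the next power of two, which is why 64 GB comfortably covers a single 24 GB RTX 6000. A quick sketch of that arithmetic (the function name is mine, and treat the rounding rule as guidance rather than an exact VMware requirement):

```python
def mmio_size_gb(num_gpus, gb_per_gpu):
    """Total GPU memory rounded up to the next power of two (in GB),
    as a starting point for pciPassthru.64bitMMIOSizeGB."""
    total = num_gpus * gb_per_gpu
    size = 1
    while size < total:
        size *= 2
    return size

# A single 24 GB RTX 6000: 32 GB would be the minimum, so 64 GB is safe.
print(mmio_size_gb(1, 24))
# All eight RTX 6000s passed through to one VM would need far more:
print(mmio_size_gb(8, 24))
```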

Regards

Ben

Hi

My server is a TYAN Model: B7109F77DV10E4HR-2T-N

I checked with 1Q and it's the same error.
I have disabled ECC.
And I have added the following to the VM's "Advanced Configuration":

pciPassthru.use64bitMMIO = "TRUE"
pciPassthru.64bitMMIOSizeGB = "64"

Always the same error

Thanks

Hyssam

Hi

Just checking … but when you added those entries, I assume you added the values without the quotation marks on each end? " "

When you configured the VM, did you select the option on the VM to "Reserve all guest memory" ?

Also make sure that the memory allocated to the VM vs the "reserved memory" values are the same. If you’ve changed the amount of memory allocated to the VM, you need to un-check, then re-check the "reserved memory" option, as it doesn’t automatically update and the VM will then fail to power on.
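For reference, a full guest-memory reservation ends up in the VM's .vmx along these lines (the 32 GB value is illustrative; the point is that sched.mem.min should match memSize):

```
memSize = "32768"
sched.mem.min = "32768"
sched.mem.pin = "TRUE"
```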

I’ve had it in the past that when changing from "Shared" to "Shared Direct" a host reboot has been required. Although you can manually restart Xorg, sometimes this isn't enough and a full reboot has made the difference.

Something you could try just to see whether it’s vGPU or System related … Put one of the GPUs into Passthrough mode, replace the vGPU profile on the VM with it and try powering it on.

Regards

Ben

Hi

I have added the values without the quotation marks.

I have selected the "Reserve all guest memory" option on the VM.

The reserved memory and the allocated memory are the same.

I have manually restarted Xorg and also rebooted ESXi.

How do I put one of the GPUs into Passthrough mode?

thanks

Hyssam

Hi

As I can’t actually see how you’ve configured things, I’m not able to suggest anything else.

Can you take a few screenshots of your VM configuration and also your GPU configuration from vCenter and post them on here? Maybe they will show a configuration issue somewhere.

Regards

Ben

VM configuration [screenshot]

vCenter configuration [screenshot]

VMware ESXi configuration [screenshot]

Thanks for taking the time to do that.

The VM has a lot of vCPUs added, but that won’t stop it powering on. Apart from that, the general config looks ok initially with no obvious issues to me.

Have you made any changes in the BIOS? Can you have a look at the MMIO settings and make sure they're configured correctly? I've not used a Tyan before, so am unsure what options are available, but here's a reference on what you should be looking for: Incorrect BIOS settings on a server when used with a hypervisor can cause MMIO address issues that result in GRID GPUs failing to be recognized. | NVIDIA

Regards

Ben

Just to be sure, what license is your vCenter?

Also, do you have a GPU profile that ends with "A"?
Does this work or does it give the same error?

You solved my problem: in the BIOS, Intel VT for Directed I/O (VT-d) had been disabled.

I activated the option and my virtual machine works.

Thanks for your help

Hyssam

No worries, glad it’s now working :-)

By the way … that’s a kick-ass configuration! Just out of interest, are you able to say what you plan to use it for?

And FYI, you can put 4 of those RTX 6000s with the 24Q profile inside the same VM if using vGPU, as vGPU now supports Multi-GPU configurations with up to 4 GPUs (but you have to use the top profile, in this case 24Q). But if you switch to Passthrough, then you can put all of them inside a single VM !! … :-D

Regards

Ben

It's for a server certification (3D virtualisation), and all GPUs are allocated inside a single VM.

thank you for all

Hyssam

Nice!

Thanks for the information

Best of luck with your project!

Regards

Ben

Hi, I am unable to download the VIB for ESXi 6.7. I have a TESLA V100d.

Can anyone help?