Hi - I did a quick check on NVIDIA's website… is there official vGPU support? I can't contact VMware for that, correct? Any idea where / how to contact official NVIDIA vGPU support? We purchased nine of these K2 cards - I just don't see any obvious way to get ahold of them.
Hi Jason. Thank you for your reply. We purchased them through Dell and have Gold+ support with them… they don't seem to know a great deal about these cards, though. The tech I got had only handled about two calls on them so far.
We have three Dell T630 servers. Our Dell rep confirmed we could put three K2 cards in each. They have two 1600 W power supplies. We are using these for AutoCAD in our VMware View environment. We are currently using the K2s via passthrough (vDGA). We have one server with three K2 cards that don't have passthrough configured, and I am trying to get those to work with vGPU. (They are all in the same cluster.)
I have installed the VIB (once via VUM, then uninstalled it manually and reinstalled it manually).
I have confirmed via esxcli that the server can see the six GPU devices - two per card.
I seem to have to install the VIB manually.
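For reference, this is roughly what I've been running from the ESXi shell - the datastore path is just an example of wherever the .vib was copied to, and I had the host in maintenance mode and rebooted afterwards:

    # check whether the NVIDIA vGPU Manager VIB shows as installed
    esxcli software vib list | grep -i nvidia

    # manual install (path is an example - point it at wherever you copied the .vib)
    esxcli software vib install -v /vmfs/volumes/datastore1/NVIDIA-vgx-VMware_ESXi_6.0_Host_Driver_346.68-1OEM.600.0.0.2494585.vib

    # confirm the host sees all six physical GPUs (two per K2 board)
    lspci | grep -i nvidia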
Xorg will NOT load… it won't stay running. /etc/init.d/xorg status shows it as not running… however, vCenter says the service is running… odd.
I'm currently trying to get Xorg to stay running.
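Here's basically how I'm poking at it - the Xorg log path is from memory, so it may differ on this build:

    /etc/init.d/xorg status        # reports not running at the shell
    /etc/init.d/xorg start         # try to start it by hand and watch it die

    # then look for errors in the X server and vmkernel logs
    tail -n 50 /var/log/Xorg.log
    grep -i nvidia /var/log/vmkernel.log | tail -n 20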
I turned off the "Memory Mapped I/O above 4GB" setting in the BIOS (I missed it at first… it's hiding at the very bottom of the BIOS options screen).
Once I did that I got the UEFI0134 error - unable to allocate MMIO resources for one or more PCIe devices…
Will Xorg not load if I keep the "Memory Mapped I/O above 4GB" BIOS option enabled?
I changed the BIOS settings so the PSUs are no longer redundant… so hopefully both are engaged.
The BIOS we are on is 1.1.4 for the T630.
We purchased these servers specifically to run three K2 cards in each… Does it sound like we need to pull one? Dell support says only slots 3 and 7 should be used… but then they looked at the PSUs again and said that if we turned off the redundancy it should work.
So… what I think I'm going to try next is:
Look for a new BIOS update (quick check of the running version from the ESXi shell below).
Pull one card out - keep cards only in slots 3 and 7.
Disable "Memory Mapped I/O above 4GB" again.
And see if I get the UEFI0134 error again.
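For the BIOS check, I believe the currently running version can also be read from the ESXi shell - the exact section label in the smbiosDump output may differ by build:

    smbiosDump | grep -i -A 4 'BIOS'       # look for the BIOS Vendor / Version / Date fields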
My two symptoms are: Xorg will not load / stay running… and 'nvidia-smi' doesn't show anything at all.
I get "Failed to initialize NVML: Unknown Error" when I run 'nvidia-smi'.
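These are the only other checks I know to run for that - happy to post the output if it helps:

    # is the NVIDIA vmkernel module actually loaded?
    vmkload_mod -l | grep -i nvidia

    # any NVRM errors logged when the module tried to load?
    grep -i nvrm /var/log/vmkernel.log | tail -n 20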
Sorry - I hope this is at least somewhat coherent… I'm really tired.
Did you remove the vDGA / PCI passthrough assignment? If not, then vGPU will have no GPU resources.
Which .vib? Can you paste the full name of the .vib you installed on the host?
That should not be necessary for ESXi 6.
Dell have certified the chassis for up to 4 cards, so they should be able to advise on the correct configuration to get all cards working.
I suspect that you still have all the GPUs configured for PCI passthrough. In this scenario nvidia-smi at the console will see nothing and fail, the vGPU Manager in the host will have no resources to start, and so you will have no vGPU present on the host.
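If you want to check from the shell rather than the Web Client, something along these lines should show which devices are still owned by passthrough - the exact esx.conf entries vary between builds, so treat this as a rough sketch:

    # devices flagged for PCI passthrough are recorded in the host's config
    grep -i passthru /etc/vmware/esx.conf

    # to release them, untick the GPUs under the host's PCI device / DirectPath I/O
    # settings in the vSphere client, reboot the host, then re-run nvidia-smi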
NVIDIA-vgx-VMware_ESXi_6.0_Host_Driver_346.68-1OEM.600.0.0.2494585.vib is the exact VIB I used.
Do the cards hold the passthrough config? Or is it the cluster?
We have a cluster called Graphics-K2… it has three servers in it, each with three K2 cards.
The server I'm working on has cards that might have been set up for passthrough… but no users are currently using the cards I'm trying to get working.
Can I have BOTH in a cluster? Two servers with passthrough and one server with vGPU in the same cluster?
What specifically holds the passthrough config? I wasn't the one who set this up initially (if that's not glaringly obvious :)
Do I change it in the cluster settings or on the card itself?
Thanks again for the help… Dell (who we purchased these from) only really knows the hardware side - whether the cards will go in their servers, etc.
This is simply indicating that not all the hosts in the cluster are configured for vGPU.
Once all the hosts are configured, the alert goes away. VMware implements it to warn you so that you don't move a vGPU-enabled VM to a host that can't run a vGPU session.
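For reference, a "vGPU-enabled" VM is simply one that has an NVIDIA GRID vGPU shared PCI device attached. In the VM's .vmx that typically shows up as something like the lines below - the profile name here is only an example of a K2 profile:

    pciPassthru0.present = "TRUE"
    pciPassthru0.vgpu = "grid_k220q"

Any host that VM lands on needs a working vGPU Manager VIB, which is why vCenter keeps warning until every host in the cluster is configured.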