NVIDIA GRID VGPU support does not match desktop setting + ESXi console blank

Hi there, I have 2 x Tesla M60 cards in an ESXi 6.5 host. I have allocated the VMs etc. all fine, but I have 2 issues thereafter.

  1. Once I go back to the Horizon connection server to allocate a new VM to a pool, I get the "NVIDIA GRID VGPU support does not match desktop setting" message intermittently. It seems random and I’m not sure what is happening, but the newly configured VMs don’t initially show up. I left it overnight and 1 VM then showed up OK.
    I created a new VM, restarted the server this morning and it showed up. Is there anything I’m missing to get them to appear instantly? When it says unavailable, even the ones I have successfully created don’t show up as available…

  2. I have lost ESXi console access to the VM once the NVIDIA driver is installed (Windows 7 machine). RDP is OK.

Oh, and 3. I sometimes get a blank screen when using the Horizon Client to connect to a VM with Blast Extreme set?

thanks guys in advance…

Hi Peter

I’m more XenDesktop / XenApp than Horizon, but I’ll see if I can help …

Q1 & Q3 - Which version of Horizon / Horizon Client are you using?

Q2 - This is to be expected as a result of using the NVIDIA driver, so don’t worry about it.

Something that may help: when installing VMware Tools at the start of the build, I personally don’t install the SVGA driver, as the NVIDIA driver will supersede it once installed. Also, once the NVIDIA driver has been installed and the VM rebooted, I go into Windows Device Manager and disable the other display adapter, leaving the NVIDIA adapter as the only choice. Then I enable "Show Hidden Devices" and remove all ghost adapters as well.

See if that works for you.

Regards

Ben

Hi Ben, thanks for the quick reply.
Q1 and 3.
Horizon Client: 4.4.0 build-5171611
Horizon: 7.1.0 build-5170113

Q2 - I will disable, thanks.

I also have to tag on a 4th question (sorry) that I have unfortunately run into!

Q4. I have 10 x Win 7 VMs. I want to allocate 6 x VMs with a 4GB profile and 4 x VMs with a 2GB profile (I have 2 x Tesla M60 GPUs in one ESXi host). I have configured 4 x VMs with the profile "grid_m60-4q". When I tried to configure the next VM, this time with the profile "grid_m60-2q", it said "The amount of graphics resource available in the parent resource pool is insufficient for the operation". I went out to lunch, came back and was able to assign a 4q profile. I thought, OK, I’ll switch it back to the 2q. It failed again.

Thanks…

As you’re running differing vGPU Profiles on the same Host, you’ll need to configure the VM deployment / GPU allocation differently. By default, GRID is configured for performance, so it will spread VMs on to as many different physical GPUs as possible. You need to change that so that it’s the opposite and will consolidate as many VMs on to the same GPU as possible.

In vCenter, select your GPU Host and navigate to:

Configure > Hardware > Graphics

Make sure you’re on the "Host Graphics" tab, and on the right click "Edit" then select "Group VMs on GPU until full (GPU Consolidation)".

I’ve attached an image of my settings …
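
To see why the default setting can produce the "insufficient" error, here is a minimal Python sketch of the two placement policies. It is only a toy model, not VMware's actual scheduler: the names PhysicalGpu and place are made up for illustration, and it assumes each physical GPU has 8 GB of frame buffer and can only host one vGPU profile size at a time.

```python
from dataclasses import dataclass, field
from typing import List, Optional

GPU_MEMORY_GB = 8  # assumed frame buffer per physical GPU on an M60


@dataclass
class PhysicalGpu:
    index: int
    profile_gb: Optional[int] = None  # profile size in use, None while empty
    vms: List[str] = field(default_factory=list)

    def can_host(self, profile_gb: int) -> bool:
        # Assumption: all vGPUs on one physical GPU must use the same profile.
        if self.profile_gb not in (None, profile_gb):
            return False
        return (len(self.vms) + 1) * profile_gb <= GPU_MEMORY_GB

    def add(self, vm: str, profile_gb: int) -> None:
        self.profile_gb = profile_gb
        self.vms.append(vm)


def place(requests, gpu_count, consolidate):
    """Place (vm_name, profile_gb) requests; return (gpus, failed_vm_names)."""
    gpus = [PhysicalGpu(i) for i in range(gpu_count)]
    failed = []
    for vm, profile_gb in requests:
        candidates = [g for g in gpus if g.can_host(profile_gb)]
        if not candidates:
            failed.append(vm)  # nothing left that fits this profile
            continue
        if consolidate:
            # Fill the busiest eligible GPU first (lowest index on a tie).
            target = max(candidates, key=lambda g: (len(g.vms), -g.index))
        else:
            # Spread: pick the emptiest eligible GPU (lowest index on a tie).
            target = min(candidates, key=lambda g: (len(g.vms), g.index))
        target.add(vm, profile_gb)
    return gpus, failed


# Four 4 GB VMs first, then one 2 GB VM, on 4 physical GPUs (2 x M60).
requests = [("VM1", 4), ("VM2", 4), ("VM3", 4), ("VM4", 4), ("VM5", 2)]
for mode in (False, True):
    gpus, failed = place(requests, gpu_count=4, consolidate=mode)
    label = "consolidate" if mode else "spread"
    print(label, [g.vms for g in gpus], "failed:", failed)
```

Under these assumptions, the spread policy puts one 4 GB VM on each physical GPU and leaves the 2 GB VM with nowhere to go, while consolidation packs the 4 GB VMs onto two GPUs and keeps the others free.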

Regards

Ben

Ben, unfortunately I still get the same message, "The amount of graphics resource available in the parent resource pool is insufficient for the operation", when trying to power on the VM.

Were all VMs powered off before making that change?

Did you restart Xorg after making the change? Failing that, reboot your host, then double check the config change has persisted after reboot.

Use PuTTY to SSH to your ESXi host, run "nvidia-smi" and you’ll see where each VM is located on the M60s. If you start all your 2GB VMs first, they should all be on the same physical GPU.

Regards

Output below. Am I missing something, based on my original plan above for the 10 VMs and the allocation of resources?

Should I not see 8 outputs, not 4?

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.92                 Driver Version: 367.92                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M60           On   | 0000:06:00.0     Off |                  Off |
| N/A   31C    P8    24W / 150W |   4099MiB /  8191MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla M60           On   | 0000:07:00.0     Off |                  Off |
| N/A   29C    P8    24W / 150W |   4099MiB /  8191MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla M60           On   | 0000:84:00.0     Off |                  Off |
| N/A   34C    P8    25W / 150W |   4099MiB /  8191MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla M60           On   | 0000:85:00.0     Off |                  Off |
| N/A   28C    P8    24W / 150W |   4099MiB /  8191MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0    697303  C+G   VM1                                           4080MiB |
|    1    862898  C+G   VM2                                           4080MiB |
|    2    872777  C+G   VM3                                           4080MiB |
|    3    874792  C+G   VM4                                           4080MiB |
+-----------------------------------------------------------------------------+

The Xorg service failed to restart - I may need to reboot the host.

That’s still set for performance. They should be on 2 GPUs, not 4.

You will need a reboot. It says at the top of the .jpg I attached: Settings will take effect after restarting the "Xorg" service :-)

Ben, magic - thanks. Looking better after a reboot… I’ll report back once I configure and spin up the others.

I’ll also see if the Q1 above is resolved after rebooting.

Thanks for your speedy reply on this topic.
Peter

No worries mate, glad you can now make full use of the GPUs!

Keep us posted!

Regards

UPDATE: I have all the allocated resources as planned! :)
My issue 1 above still exists, but I rebooted the connection server after every VM installation and configuration was complete. It seems that when I add anything new, it does not appear automatically in the pool. Anyway, once everything is set up it’s technically not an issue. Maybe it’s just a bug in this version, which can hopefully be sorted.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.92                 Driver Version: 367.92                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M60           On   | 0000:06:00.0     Off |                  Off |
| N/A   32C    P8    25W / 150W |   8179MiB /  8191MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla M60           On   | 0000:07:00.0     Off |                  Off |
| N/A   30C    P8    24W / 150W |   8179MiB /  8191MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla M60           On   | 0000:84:00.0     Off |                  Off |
| N/A   35C    P8    26W / 150W |   8179MiB /  8191MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla M60           On   | 0000:85:00.0     Off |                  Off |
| N/A   30C    P8    24W / 150W |   8179MiB /  8191MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     69266  C+G   VMPC001                                       4080MiB |
|    0     69267  C+G   VMPC004                                       4080MiB |
|    1     69516  C+G   VMPC003                                       4080MiB |
|    1     72818  C+G   VMPC005                                       4080MiB |
|    2     70461  C+G   VMPC006                                       2040MiB |
|    2     70574  C+G   VMPC007                                       2040MiB |
|    2     73558  C+G   VMPC002                                       2040MiB |
|    2     85313  C+G   VMPC010                                       2040MiB |
|    3     79437  C+G   VMPC008                                       4080MiB |
|    3     84319  C+G   VMPC009                                       4080MiB |
+-----------------------------------------------------------------------------+
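
For anyone who wants to check the VM-to-GPU placement without reading the table by eye, here is a rough Python sketch that groups the process rows by GPU index. The function name vms_per_gpu and the regular expression are my own, it assumes process names contain no spaces, and the sample rows are copied from the output above.

```python
import re
from collections import defaultdict

# Matches process rows such as "| 0 69266 C+G VMPC001 4080MiB |".
# GPU summary rows do not match because their second field is not a PID.
PROCESS_ROW = re.compile(
    r"^\|\s*(\d+)\s+(\d+)\s+(\S+)\s+(\S+)\s+(\d+)MiB\s*\|$"
)


def vms_per_gpu(smi_text):
    """Return {gpu_index: [(vm_name, used_mib), ...]} from nvidia-smi text."""
    placement = defaultdict(list)
    for line in smi_text.splitlines():
        match = PROCESS_ROW.match(line.strip())
        if match:
            gpu, _pid, _kind, name, mib = match.groups()
            placement[int(gpu)].append((name, int(mib)))
    return dict(placement)


sample = """\
| GPU PID Type Process name Usage |
| 0 Tesla M60 On | 0000:06:00.0 Off | Off |
| 0 69266 C+G VMPC001 4080MiB |
| 0 69267 C+G VMPC004 4080MiB |
| 1 69516 C+G VMPC003 4080MiB |
| 1 72818 C+G VMPC005 4080MiB |
| 2 70461 C+G VMPC006 2040MiB |
| 2 70574 C+G VMPC007 2040MiB |
| 2 73558 C+G VMPC002 2040MiB |
| 2 85313 C+G VMPC010 2040MiB |
| 3 79437 C+G VMPC008 4080MiB |
| 3 84319 C+G VMPC009 4080MiB |
"""

for gpu, vms in sorted(vms_per_gpu(sample).items()):
    total = sum(mib for _name, mib in vms)
    print(gpu, [name for name, _mib in vms], total, "MiB")
```

On a live host the same function could be fed the text captured from "nvidia-smi" instead of the pasted sample.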

Nice work.

I’ll do some more digging for you on Issue 1 tomorrow.

Just out of interest, what applications are you planning to run on them?

The versions you mention look up to date, but for sanity, can you double check you’re running the latest components from here: https://my.vmware.com/web/vmware/details?downloadGroup=VIEW-710-ENT&productId=641&rPId=16353

After that, can you have a look at this and see if it helps: View 7.1.0 - VGPU Desktop Pool Limit? - VMware Technology Network VMTN

And if neither of those options work, can you have a look at some of these and see if they help: NVIDIA GRID VGPU support does not match desktop setting - Google Search

Let me know how you get on

Regards

Morning Ben,

We are using it mainly for ArcGIS (including Pro), CAD, Trimble and a few others.

I seem to be all good for the versions. The Client and Agent are fully up to date. I have not yet installed the Direct Agent, but I have the latest version and it’s on the to-do list today. Funnily enough, I had already clicked on most of the links you sent above yesterday. Thanks for sending them. All my VMware Tools etc. are up to date.

This morning I am running into a few bugs… some NVIDIA related, some Horizon:

  1. I seem to be able to access the VMs now via the ESXi console…
  2. I can’t launch the NVIDIA Control Panel or display settings on each VM. This is quite important, as I need to register the license on the VMs with my license server - it appears it’s not using the NVIDIA card?
  3. Horizon Client stalls and goes into a non-responding state on log off of the VMs.
  4. Using Blast Extreme to connect to a PC on 2 monitors gives a black screen.

Hopefully not anything else…

cheers
Peter

It’s definitely not utilizing the NVIDIA card anymore. Researching this now, but I would appreciate any feedback on what the issue may be.
Regards

Update: I have resolved my NVIDIA display issues. I had disabled all display drivers, including the VMware SVGA one, so that only NVIDIA was shown. Once I re-enabled the VMware SVGA adapter, it came back to life.

Hi

Sorry I’ve not been able to get on here, busy day!

That’s strange, disabling the VMware SVGA adapter should not impact the NVIDIA adapter, they’re completely separate. As said, I don’t even install the driver for it. I’ve taken a screen grab (attached) of my Display Adapter settings in Device Manager. This works without issue and I can run multiple screens (I regularly run up to 4) from it.

On your questions …

Q1 - I seem to be able to access the VMs now via the ESXi console…

A1 - Strange, but OK. Shouldn’t cause any issues. For reference, I have ESXi 6.5 in some of our deployments and can’t use the Console; "Remote Connections" from the vCenter Flash Client are fine though.

Q2 - I can’t launch the NVIDIA Control Panel or display settings on each VM. This is quite important, as I need to register the license on the VMs with my license server.

A2 - Funny you should bring this up. I was chatting to one of the GRID Product Managers about this on Monday :-) Ideally you should have done this on your Master Image immediately after the GRID driver install so you don’t forget it, but definitely before you cloned it. That way, you only have to do it once. Also, when you’re ready to set this value, you need to log in with a different protocol to RDP (something like TightVNC) so that when you "Right Click" on the Desktop, the NVIDIA Control Panel is visible and opens when selected. You can then set the license server value.

However, a simpler way to do it, is just create a registry key and merge it with your Master Image before you create clones from it. Save it on a file-share so you can use it again with your next Master Image. Much much easier :-D

Advice -

If you have non-persistent VMs, don’t bother trying to do it with a GPO. The NVIDIA Display Driver Service starts before the GPO can be applied. So when you connect to your VM using Blast, HDX etc etc and open the NVIDIA Control Panel, there won’t be a value present and GRID won’t be licensed. If you check in the registry, the value will be there, but as said, the service starts before the GPO is applied so the value is not captured (and yes, that’s a Computer GPO, not User).

If you have persistent VMs, then you CAN use a GPO to set this value, however, note that the first time you start the VM, when you check this value in the NVIDIA Control Panel, for the reasons mentioned above, it will not be present. You will either have to reboot the VM so the service picks up the new value applied by the GPO, or manually restart the NVIDIA Display Driver Service. As the VMs are persistent, the change will be persistent. You’ll only have to do that once (for each VM).

It’s a bit of a faff and defeats the point in trying to apply it with a GPO, as most larger deployments are probably going to be non-persistent, so for this option, it’s easier to just manually add it to the Master Image before Cloning. Page 14 - 16 of this guide will help you create the registry key (it’s very simple). You can either enter the details manually into a .txt file then change it to a registry key and merge it, or just add the details directly into the registry of your master VM and then export the key, that way you know the settings are spot on.

The only 3 entries you really need are: "ServerAddress", "ServerPort" and "FeatureType". If you want to hide the licensing server name from the NVIDIA Control Panel, then set the "NvCplDisableManageLicensePage" value.

Actually, I’ll save you the hassle. Copy / paste the text below into a .txt file. Replace "YOUR_LIC_SERVER_FQDN" with your GRID license server FQDN and save it as a .reg, then just merge it on the Master VM. Licensing done :-)

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\NVIDIA Corporation\Global\GridLicensing]
"ServerAddress"="YOUR_LIC_SERVER_FQDN"
"ServerPort"="7070"
"FeatureType"=dword:00000001
"CurrentFeatureType"=dword:00000001

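If you would rather generate that .reg text than edit it by hand, a small Python sketch along these lines should do it. The function name grid_license_reg and the example hostname are made up for illustration; the key and value names are simply the ones from the snippet above.

```python
def grid_license_reg(server_fqdn, port=7070, feature_type=1):
    """Build the text of a .reg file with the GRID licensing values above."""
    key = r"HKEY_LOCAL_MACHINE\SOFTWARE\NVIDIA Corporation\Global\GridLicensing"
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key}]",
        f'"ServerAddress"="{server_fqdn}"',
        f'"ServerPort"="{port}"',  # stored as a string, as in the snippet
        f'"FeatureType"=dword:{feature_type:08x}',
        f'"CurrentFeatureType"=dword:{feature_type:08x}',
        "",
    ]
    # .reg files conventionally use Windows line endings.
    return "\r\n".join(lines)


# Hypothetical license server name, replace with your own FQDN.
print(grid_license_reg("gridlic.example.local"))
```

Save the returned text as a .reg file and merge it on the Master VM as described above.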
Q3 - Horizon Client stalls and goes into a non-responding state on log off of the VMs.

A3 - Not sure about this one.

Q4 - Using Blast Extreme to connect to a PC on 2 monitors gives a black screen.

A4 - Nearly all black screen issues that I can find in searches are related to the PCoIP connection protocol being used. Not sure why Blast does it.

I obviously have some troubleshooting knowledge gaps with Horizon as I’ve only ever tried Horizon once, and that was very briefly before deciding it wasn’t for me. Maybe someone who does use Horizon can offer some guidance …

Regards
Display Adapters.jpg

Morning,

The VM display via ESXi console issue stopped after I re-enabled that driver and the NVIDIA setup kicked in. So it went back to not being accessible - as expected.

I managed to log in via Horizon to set the license on each VM.

We have gone a slightly different way for the VMs right now. This is likely to change, but we already had provisioned VMs on our hosts and have linked these into Horizon to use the same machines.

We will likely deploy via Horizon in the future.

I very much appreciate the level of information you are providing. Thanks for the above also re the reg key!

We are making progress but still have a few lingering issues.

  1. The display is still blank using Blast with 2 monitors. Resizing seems to fix this and I get a display OK.

  2. Access from outside is not working properly. Inside the network it’s fine. We have ensured all FW ports are set. We actually get presented with the VMs from the Horizon app or the web page HTML access, but when we try to connect it always fails…

regards

Hi

No worries about the detail, glad it’s useful.

When you resize the displays, what resolution were they originally, and what resolution do you resize them to so they work? Sounds like a display memory issue / limit that’s being hit … I think you can adjust that in one of the management consoles. Does this help …

Access from outside the network, yeah my first guess would be a Firewall Port(s) that’s not open. Don’t forget it uses TCP & UDP … Have a look at this to make sure nothing’s been missed:

And this:

Just out of interest, have you followed any Horizon / GRID deployment guides or are you just having a go and seeing if you can make it work first before resorting to the instructions? :-)

Oh, and those Apps you listed earlier, they should fly on this system! Long as your CPUs and storage are good as well, no worries!

Regards

Hi Ben,

The display memory has helped. Thank you.
Access from outside was resolved with 1 x firewall port.

I did read the guides, yes. Funnily, I have found inconsistencies between guides from VMware, Horizon configuration documents and NVIDIA. I had to rely on some other sources to get information, which you can’t always guarantee is correct. Some of those can set you back, but it’s all good learning. I will continue to optimize the setup and explore some of the features to get a better understanding, but that’s what it’s all about. We have had some good feedback from users today who were testing, so I am pleased so far…
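
For anyone chasing a similar problem, a quick TCP reachability check can narrow down which port is blocked. This is only a sketch: the helper name tcp_port_open is mine, it only tests TCP (a UDP port cannot be verified with a simple connect), and which ports matter for your Horizon setup is something to confirm against VMware's port documentation. The demo below just probes a throwaway local listener so it runs anywhere.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Self-contained demo: open a temporary local listener and probe it.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)
demo_port = listener.getsockname()[1]
print("listener reachable:", tcp_port_open("127.0.0.1", demo_port))
listener.close()
```

Run from a machine outside the network, the same function could be pointed at your external Horizon address and each port the firewall is supposed to allow.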