We have been running vSGA for a couple of years on NVIDIA GRID K1s. We upgraded to Horizon View 6.2 and have been testing vGPU profiles. Initial testing went very well, but once I scaled my pool out, several VMs failed customization. No error appeared in Horizon; they simply remained in a customization status.
If I forced a power reset and then told Windows recovery to boot normally, the VM would usually continue customization and finish. The best part is that because the VM never finishes booting, I can’t VNC into it: the console is disabled while the vGPU K100 profile is attached to the VM. It is very odd behavior, and I have an open ticket with VMware.
As I dug through the logs, I found this interesting item in the vmware.log for the VM:
2016-04-27T01:44:09.329Z| mks| W110: GLWindow: Unable to reserve host GPU resources
2016-04-27T01:44:09.339Z| vmx| I120: [msg.mks.noGPUResourceFallback] Hardware GPU resources are not available. The virtual machine will use software rendering.
Looking at the workaround, where I power-reset the VM and it eventually works, it appears that some VMs fail to be assigned GPU resources on the K1s during power-on. I haven’t found anything online referring to this issue. I’ll keep this post updated.
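If you want to check your own pool for the same symptom, something along these lines from the ESXi shell should surface affected VMs (a rough sketch; the datastore name and VM folder layout are assumptions, so adjust the path for your environment):

  # List every VM whose current log shows the silent fallback to software rendering
  grep -l "Unable to reserve host GPU resources" /vmfs/volumes/<datastore>/*/vmware.log

Any VM that matches was dropped to software rendering at power-on, even though Horizon reports nothing.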
System Environment
Can you check the vBIOS of the K1 cards installed and ensure they’re at the latest version?
You may need to request this update from SuperMicro.
Also, why use K100?
The K120Q is a better choice: more graphics memory and exactly the same density, since each GPU only supports a maximum of 8 vGPU sessions (so that’s 32 on a K1).
Thanks, Jason. I contacted SuperMicro, but they are not aware of any "authorized" BIOS updates for the NVIDIA GRID cards. I also looked online but couldn’t find a vBIOS version history for the GRID cards.
Running the nvidia-smi command, the cards report the following:
VBIOS Version : 80.07.BE.00.04
MultiGPU Board : Yes
Board ID : 0x8300
GPU Part Number : 900-52401-0020-000
Inforom Version
Image Version : 2401.0502.00.02
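(For reference, the fields above come from the full device query; something like the following pulls just the vBIOS line per GPU. Treat the exact pattern as a sketch, since field names can shift slightly between driver releases.)

  nvidia-smi -q | grep -i "vbios version"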
Do you know where I could find that info? Also, regarding the K100 choice: I agree. We only wanted to test the K100 and K120Q separately to understand the performance gains for the applications being used. I plan to go with the K120Q for production since we get the same user density.
Thanks for the quick response!
You’re on the latest vBIOS, so no update is required.
I would avoid the K100 profile; it’s only there for legacy support, and I’d recommend that all new projects and deployments not use it.
Out of interest, how many VMs do you have in the pool you’re creating, and how many K1s are available in those hosts?
Thanks for checking on the vBIOS.
I will test out the K120Q then and report back. Regarding the pool size, it was planned to be 55 VMs, with the target host having two K1s (at 32 K120Q sessions per K1, two cards give 64 seats, comfortably above 55). A second host with two K1s would be a standby in case of host failure in the cluster (I know vMotion isn’t supported).
What are the pool settings?
We are good to go; the K100 was the issue. Once I switched over to the K120Q and tested re-provisioning, all VMs came up normally. We had users on the new vGPU profile today without issues.
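For anyone retracing this, here is roughly how I confirmed the profile change had taken effect on the host side (a sketch, assuming the standard pciPassthru0.vgpu VMX key and default datastore paths; your key index and folder layout may differ):

  # Show which vGPU profile each desktop VM requests in its VMX
  grep -i "vgpu" /vmfs/volumes/<datastore>/*/*.vmx
  # Expected after the change: pciPassthru0.vgpu = "grid_k120q"

  # Then run plain nvidia-smi on the host to see the vGPU VMs active on each K1 GPU
  nvidia-smi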
I think NVIDIA should drop the K100 from their deployment documentation, as it definitely impacted us. I realize the K100 and K120Q have the same user density, but a good POC means you step up in complexity, and I would have avoided the K100 if it had been marked as legacy.
Thanks Jason for being very responsive and informative! That was the exact info I needed to help root cause the issue.
Good to know it’s resolved!
I’ll raise the point about dropping the K100. There are some reasons for it to persist, but we can always ask…