vGPU on VMware Horizon 6.1 zero-client disconnections

We've just begun rolling out several vGPU-enabled Horizon 6.1 desktop pools with Dell P25 PCoIP zero clients and dual displays. Many users are now reporting frequent session disconnects, seemingly at random: the screen goes black for a few seconds, then a message appears saying you have been disconnected. The same error appears to be described as known issue #59 in the release notes for the latest NVIDIA driver bundle for vGPU on vSphere 6 (http://www.nvidia.com/download/driverResults.aspx/85390/en-us). Is a fix coming in the near future, or will we need to keep telling our engineers and designers to just deal with it? Any help on a workaround would be appreciated as well. Thank you.

In conjunction with VMware, we believe we've now identified the cause of the issue and are working to release an update as soon as we can.

I don't have a timeline at present, but it is a high priority.

Hi, I have the same issue. Is there any news on how I can solve it, or on when the new driver will be available?
Thank you for an answer.

Yesterday we released a new set of passthrough drivers which should resolve this issue if you're encountering it in passthrough / vDGA.

The vGPU drivers will follow soon; because it is a two-part package, it goes through an extended QA cycle, which is why it takes a little longer.

Release is close, watch the driver pages over the next week or so and as soon as I know the vGPU package is available I’ll update the thread.

Do you know where we can get those passthrough drivers?

The drivers are always available from the driver downloads page on the NVIDIA website.

It's now been more than two weeks and no new driver is available. When will the new driver be online? This is a very important point for me, because the users cannot work correctly in the Horizon View environment.

OK, new drivers are now available. Multi-monitor setups now work correctly.

ManuelS, did you log a bug or support call? If so, you would have been notified.

The driver was released and made available for download Friday 10th July.

When updating your environment check the version of vGPU Manager you already have in your ESX hosts. If it is already 346.68 you will not need to update the vGPU manager, only the driver in the VM / Master image.

If you are running an earlier release in the host e.g. 346.42, then you need to update vGPU Manager in the host before updating your VM / Master image.
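For reference, a quick sketch of how to check which vGPU Manager version is already installed on a host. The exact VIB name varies by GRID release, so grep for "nvidia" rather than a specific name:

```shell
# List installed VIBs on the ESXi host and look for the NVIDIA vGPU Manager;
# the Version column shows the release (e.g. 346.42 vs 346.68)
esxcli software vib list | grep -i nvidia

# The host driver also reports its version in the nvidia-smi header
nvidia-smi
```

Run these from an SSH session on the ESXi host before deciding whether the host-side vGPU Manager needs updating.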

Hi Jason, some of my users are reporting these disconnects and black screens when starting a new session, even without multiple monitors, and on both Windows and OS X.
It works fine at a resolution of 1280x1024, but anything greater can result in a client crash.

I have updated my Master image with the latest driver 348.27 and vGPU Manager is 346.68.
We're using NVIDIA GRID K1 (K120Q).

I’m getting these errors:
IMG_FRONTEND :configure_displays: 1 display(s) initially reported!
IMG_FRONTEND :configure_displays: (1) single display[0]: pos:(0, 0) w: 2560, h: 1600, bpp: 32
IMG_FRONTEND :Calling open display in Tera2 mode.
IMG_FRONTEND :sw_host_get_display_modes: - warning - primary display not found

As described in issue #59, the PCoIP server also crashes on the VM.

Any solutions for this problem?

Thank you
Patrick

We are seeing the exact same issue as Patrick is having.

We just upgraded the vGPU Manager VIB to 346.68 and installed driver 348.27 on the VM/parent (Win 7) image. That pair of drivers resolved one known issue, and now we're running into this one. What is going on? This is delaying our VDI project.

What's the workaround? Force a single monitor and keep the screen resolution at 1280x1024 or below?

Please resolve this. Thanks.

The NVIDIA driver release resolved the disconnect issues that were caused by the mouse crashing the NvFBC API.

If you’re still experiencing disconnects please log a support call with VMware detailing the reproduction steps.

Cheers

My VMware support request sent me to NVIDIA; now, after lots of questions and log files, I'm back with VMware.
Something is still wrong with NvFBC:

1.189 | NvFBC | PID 4700 ## : CheckIfCanCreateD3D :: IDirect3D9::CreateDevice() for adapter ordinal 1 failed with error 0x8876086c

1.190 | NvFBC | PID 4700 ## : CheckIfCanCreateD3D :: Error (NVFBC_ERROR_GENERIC). Could not Create DX9 Device…

I hope to get some answers soon, as we are now seeing disconnects almost constantly, even with a single monitor at 1280x1024…

Thanks for the update Patrick. Please keep us updated if you get any new info. I have a SR open with VMware for this issue, but I haven’t heard much from them. They must be all at the VMworld conference.

Have you logged a support query for this? If not you need to do this asap.

OK, I got it resolved, and you may want to try this as well. When I installed the 346.68 VIB driver on a host with a K1, I had installed it on top of the previous version. You need to do a fresh install of the 346.68 VIB.

In summary:
uninstall the existing VIB from the ESXi host > reboot the host > install the 346.68 VIB driver on the host > reboot the host again. Voila!

You can find the exact command for uninstall in grid-vgpu-deployment-guide.pdf

  • They really should mention this in the "346.68-348.27-nvidia-grid-quick-start-guide.pdf"
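For anyone following along, the remove/reinstall sequence above can be sketched as below. The angle-bracket values are placeholders: substitute the VIB name reported by your own `esxcli software vib list` output and the datastore path where you uploaded the driver (the exact uninstall command is documented in the grid-vgpu-deployment-guide.pdf):

```shell
# Put the host into maintenance mode first, then:

# 1. Find the exact name of the installed NVIDIA VIB
esxcli software vib list | grep -i nvidia

# 2. Remove the existing VIB (use the name reported above)
esxcli software vib remove -n <NVIDIA-vib-name>

# 3. Reboot the host
reboot

# 4. After the reboot, fresh-install the 346.68 VIB from a datastore path
esxcli software vib install -v /vmfs/volumes/<datastore>/<NVIDIA-346.68>.vib

# 5. Reboot the host again, then verify the version with vib list / nvidia-smi
reboot
```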

Thank you joeVM for your input. I will test it as soon as possible.
VMware support is still asking me for more log bundles, and so far there is no solution…

Thanks for sharing this.

It really shouldn't be necessary to uninstall the .vib, but we'll feed this back to QA to make sure they test upgrades as well as remove / install.

Thanks Jason for your comment.

That's what I thought, and this really stumped me. The ESXi host (K1) was put into VMware maintenance mode before the upgrade, the verification commands all showed the driver version as 346.68 after the upgrade, and then the host was rebooted.

I reinstalled the driver on every host, but with no luck.
The first time was on clean vSphere 6 hosts with VUM; this time as suggested in the quick start guide.

Back to collecting log files I guess ;-)