K2 vGPU on ESXi 6.0 and HV 6.2 driver 354.97 Win 10 x64 - Issues with random system lock ups.

mohb60 · July 29, 2016, 10:55pm

We have deployed a Horizon View setup on a Dell R730 with a GRID K2. Our Win 10 Ent x64 pool is experiencing a few issues that make the user experience unreliable.

Summary of environment

ESXi version 6.0
Horizon View 6.2
Dell PE R730
NVIDIA K2; running 354.97 driver both on Host and vGPU
Client pool running Win 10 Ent 1511
WYSE P45 Zero Client running PCoIP - Tera2 chip - 5.2 firmware from Teradici
Dual monitor setup, 1920x1200

Issue 1.

Full screen video within a web browser (any web browser) freezes up within seconds of going full screen. The only way to restore it is to escape out of full screen. The workaround is to disable hardware acceleration within the browser, but this defeats the purpose of investing in the GRID infrastructure.

Issue 2.

At least once a day, at random, the Win 10 VDI client experiences a major failure of the graphics driver. This manifests itself first as a lock up of the screen; the audio then fails, followed by the session ending, and the session cannot be reinitialized through the Zero Client menu. The workaround is to initiate a restart of the VM from the Zero Client menu, which reboots the entire machine and loses all unsaved work. It seems that the OS itself has not crashed, as the reboot, when initiated, is a clean reboot of the OS and not a hard reset of the virtual hardware.

Can anyone please help us address this issue or point us to a known stable build of the GRID drivers?

RachelBerry · August 1, 2016, 2:25pm

Hi Mohb60,

I don’t know of any known issue like this and the drivers should be stable. The best thing you can do is to raise a support ticket. Because this is a GRID 1.0 product (K2/K1 boards), you need to do this via the OEM who supplied the board (Dell in this case), and they in turn can escalate it to NVIDIA engineering if it’s a driver issue.

With GRID 2.0, SUMS support is available, so there’s a process to raise tickets directly with NVIDIA, but under the older hardware sales model I’m afraid you do need to go via the OEM who sold you the card.

Best wishes,
Rachel

mohb60 · August 4, 2016, 7:36pm

Hi Rachel,

Thanks for your feedback. Anyway, I have some good news: we managed to find a workaround for issue #1 in our environment. From what I can tell, the video freezing up during full screen playback in a browser was due to the image quality tolerance settings. Counter-intuitively, it seems that setting the lower threshold of the image quality tolerance too low on the PCoIP client produces the issue. Setting the bar from 80% to perceptually lossless seems to be the best setting on our setup and no longer results in the screen locking up during full screen playback. I hope that this helps others who have a similar setup and are facing the same issue.

We are still looking into issue #2.

mohb60 · September 8, 2016, 6:18am

This issue still persists in our environment. We have taken the following steps and concluded that the newer Nvidia drivers are unstable.

Our environment is the same as what is listed above.

To investigate, our initial step was to look into the logs to determine why VMs were randomly crashing and rebooting. The logs show very little. The Windows logs show that Windows had recovered from an unclean shutdown after experiencing a failure of the VM’s OS. The VM logs show an error with approximately the same time stamp, which correlates the failure with what’s seen in the Windows logs.

This instability was introduced into our environment after upgrading both the VIB and vGPU drivers from 348.27 to the newer 354.97. This was confirmed by rolling back the VIB on one of our hosts and creating a new, identical vGPU pool with 348.27 drivers installed. The new pool has been stable.

Can anyone from Nvidia give us a reason for the instability in the newer drivers?

RachelBerry · September 8, 2016, 3:03pm

Thanks for the feedback, very helpful - I’ll see what is known internally.
Rachel

Oletho · September 13, 2016, 6:45am

I have a customer with the same scenario as your problem #2.

Our problems began after upgrading from vGPU-346.68-348.27 to vGPU-361.45.09-362.56. Since then we tried vGPU-367.43-369.17 but no luck.

VMware has closed the case and pointed at Nvidia, and Nvidia is referring us to HP, who sold us the GRID K2. So far the only suggestion from HP was to upgrade the ProLiant server BIOS. Sigh!

What a big disappointment. I have several customers who could benefit from this technology, but I am not touching it again before this is solved.

Oletho · September 13, 2016, 6:46am

Oh, forgot https://gridforums.nvidia.com/default/topic/974/how-to-get-support-problem-with-win10-k2-vgpu-and-view7/#3417

mohb60 · September 14, 2016, 10:19pm

I have an update from my end. Working with VMware, I have come to the conclusion that Issue #2 occurs in one particular scenario: a Win 10 1511 VDI running the newer NVIDIA drivers with an emulated SATA CD-ROM and SATA controller present on the guest VM. As an experiment, I removed the SATA CD-ROM and controller from the guest VM and ran pools with both the older and newer drivers. Both have proven to be stable.

@Oletho, try removing your SATA CD-ROM and controller; perhaps this will work for you. I have been told that ESXi 6.0.0 Update 2 addresses the issue with the CD-ROM emulation crashing the VM. I have not had a chance to confirm this in my environment.
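
For anyone who wants to check whether a guest still has the emulated SATA CD-ROM attached, here is a minimal sketch (not from VMware or NVIDIA) that scans the text of a VM's .vmx file for SATA CD-ROM entries. It assumes the usual `key = "value"` layout with entries such as `sata0:0.present` and `sata0:0.deviceType`; the helper names `parse_vmx` and `find_sata_cdroms` and the sample contents are made up for illustration, so verify against your own .vmx before relying on it.

```python
import re

# Matches .vmx lines of the form: key = "value"
_VMX_LINE = re.compile(r'^\s*([^#\s=]+)\s*=\s*"(.*)"\s*$')


def parse_vmx(text):
    """Parse .vmx text into a dict of lower-cased key -> value."""
    settings = {}
    for line in text.splitlines():
        match = _VMX_LINE.match(line)
        if match:
            settings[match.group(1).lower()] = match.group(2)
    return settings


def find_sata_cdroms(settings):
    """Return SATA slots (e.g. 'sata0:0') that are present CD-ROM devices."""
    slots = []
    for key, value in settings.items():
        match = re.fullmatch(r'(sata\d+:\d+)\.devicetype', key)
        if not match:
            continue
        slot = match.group(1)
        present = settings.get(slot + '.present', 'FALSE').upper() == 'TRUE'
        # Assumed CD-ROM device types: cdrom-image, cdrom-raw, atapi-cdrom
        if present and value.lower().startswith(('cdrom', 'atapi-cdrom')):
            slots.append(slot)
    return sorted(slots)


# Hypothetical sample contents, for illustration only.
SAMPLE_VMX = '''\
scsi0.present = "TRUE"
scsi0:0.deviceType = "scsi-hardDisk"
sata0.present = "TRUE"
sata0:0.present = "TRUE"
sata0:0.deviceType = "cdrom-image"
'''

if __name__ == "__main__":
    print(find_sata_cdroms(parse_vmx(SAMPLE_VMX)))
```

Any slot it reports is a candidate for removal, together with its SATA controller, through the normal VM settings dialog while the VM is powered off. The script only reads text and changes nothing.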

Oletho · September 20, 2016, 11:47am

I just saw your reply and will immediately set up with the customer to test the CD-ROM solution. Thanks a lot.

We are already running Update 2 with the latest patches.
