K120Q VDI vSphere desktops - failing to use NVIDIA adapter at login

I’ve also raised this at the VMware NVIDIA community forums.
We’re deploying a vSphere 6/Horizon 6.2.0 solution using HP Gen9 Servers fitted with GRID K1 cards.
We’re experiencing an issue where the NVIDIA adapter is not always identified, i.e. the VDI falls back to the standard VMware VGA adapter instead.
The behaviour is strange but consistent:
Connect to View Server > Login > VDI uses VMware SVGA
Log Off > Login > VDI uses VMware SVGA
But…
Connect to View Server > Login > VDI uses VMware SVGA
Disconnect > Login > VDI uses NVIDIA K120Q
Disconnect > Login > VDI uses NVIDIA K120Q
Log Off > Login > VDI uses VMware SVGA

So, in summary: on any login following a reboot or log off, the VDI uses the VMware SVGA adapter, whereas on any login following a disconnect it uses the NVIDIA adapter.
The guest OS is Windows 7 Professional x64.
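The pattern can be restated as a tiny model. This is only a sketch of the observed behaviour, not an explanation of the cause; the function name and the event labels ("connect", "logoff", "reboot", "disconnect") are made up for illustration.

```python
# Illustrative model of the observed behaviour only.
# All names here are hypothetical; nothing is queried from a real desktop.
VMWARE_SVGA = "VMware SVGA"
NVIDIA_K120Q = "NVIDIA K120Q"


def adapter_after_login(previous_event: str) -> str:
    """Return the adapter seen at login, given what happened just before it."""
    if previous_event in ("connect", "logoff", "reboot"):
        # Fresh session: the desktop comes up on the VMware adapter.
        return VMWARE_SVGA
    if previous_event == "disconnect":
        # Reconnect to an existing session: the NVIDIA adapter is used.
        return NVIDIA_K120Q
    raise ValueError(f"unknown event: {previous_event!r}")


def replay(events):
    """Map a sequence of events to the adapter seen at each following login."""
    return [adapter_after_login(event) for event in events]


# The second sequence from the post: connect, disconnect, disconnect, log off.
sequence = ["connect", "disconnect", "disconnect", "logoff"]
for event, adapter in zip(sequence, replay(sequence)):
    print(f"{event:>10} > login > {adapter}")
```

Written this way, the only input that matters is whether the login starts a new Windows session or reattaches to an existing one.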

As things stand, we would be asking end users to log in, disconnect, then reconnect in order to make use of the NVIDIA adapter. Not a good look.
Thanks in advance for any input or ideas.

UPDATE
Having scanned through the Process Tree in ProcMon, and done another review of the Event log, I am at this point entirely bereft of ideas as to how to troubleshoot and resolve this.
Does anybody have any clues? I’ll settle for the smallest grain of hope at this point.
Also, does anyone know where we go for support on a case like this? Is it to NVIDIA or to VMware?
For reference, further details on the problem are here:
https://communities.vmware.com/message/2609971

UPDATE
I think this issue is a match with:
https://gridforums.nvidia.com/default/topic/777/nvidia-grid-vgpu/standard-vga-graphics-adapter-being-used-instead-of-nvidia-grid-k120q-vgpu/?offset=13#3176
If anyone has a solution, please let me know.
thanks
Robert

Because I could not find any option for "GRID" on the NVIDIA Support pages, I have now filed this as a bug, reference 160724-000143.

Cross referencing this:
https://communities.vmware.com/message/2610200#2610200
Not sure if anyone would be able to comment on the driver releases?
thanks

Hi rellis,

Support calls for the K1 are not provided by NVIDIA; you need to raise an issue via the OEM or whoever you purchased the cards from (e.g. a server vendor such as Dell or Cisco), and they can raise a case. This is the disadvantage of selling GPUs as "hardware", and in our newer M6/M60/M10 we have moved to a license model to provide direct support.

We will continue to do our best with K1/K2 customers, but really your support contract is with the OEM who provided the card, and you have to raise an issue with them.

I actually think your best call is to VMware, as this sounds like a configuration issue: we have lots of customers using the same server and cards without issue. The new licensing model has funded an increase in support material for all customers, and you might want to search our knowledge base http://nvidia.custhelp.com/app/answers/list/st/5/kw/vsga/page/1 and use resources like the configuration guide video: http://nvidia.custhelp.com/app/answers/detail/a_id/4190/kw/vsga

Best wishes,
Rachel

There is a known issue with Win7 and its driver selection, which I believe VMware View has a workaround for but Citrix does not (http://support.citrix.com/article/CTX201804).

As the problem is associated with the disconnect/login workflow, it’s really likely to be Microsoft (it’s the OS that picks the driver) and/or the VMware stack (as that can influence and trigger the OS picking the driver).

The GPU is a resource below that and as such it’s generally not us dictating whether our driver is picked up.

Best wishes,
Rachel

Thanks for this Rachel.
Is anyone at your end able to offer any clarification around the K1 vGPU driver package that appears in the "old drivers" section of the NVIDIA web site with a release date of 2nd June 2016?
The release date of this driver appears to be later than the release date of the "current" driver; but the version numbers are lower.
thanks

I’ll ask the driver release team.

Rachel

OK, the latest driver is GRID 3.1:

Version: 362.56 WHQL
Release Date: 2016.5.24
Operating System: VMware vSphere ESXi 6.0
Language: English (US)
File Size: 820.82 MB

The latest “maintenance” driver is 354.97, released June 2, 2016. That’s the GRID 2.3 driver, i.e. a maintenance release on the GRID 2.x R352 driver branch. The date is later because it was indeed released after 3.1.

So these are two different drivers off different build trees: the maintenance one takes fixes only, while the newer branch also takes new functionality.
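To see why "latest" depends on what you sort by, here is a minimal sketch using only the two releases quoted above. The dictionary layout and variable names are illustrative assumptions and do not reflect how NVIDIA actually catalogues its drivers.

```python
from datetime import date

# Version numbers and dates are the ones quoted in this thread;
# the data structure itself is purely illustrative.
releases = [
    {"grid": "GRID 3.1", "version": (362, 56), "released": date(2016, 5, 24)},
    {"grid": "GRID 2.3", "version": (354, 97), "released": date(2016, 6, 2)},
]

# Sorting by release date picks the maintenance release on the older branch.
newest_by_date = max(releases, key=lambda r: r["released"])

# Sorting by version number picks the newer feature branch.
newest_by_version = max(releases, key=lambda r: r["version"])

print("Newest by date:   ", newest_by_date["grid"])
print("Newest by version:", newest_by_version["grid"])
```

The two orderings disagree because a maintenance release on an older branch can ship after a feature release on a newer branch, so a later date does not imply a higher version number.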

I can’t rule out a faint possibility that something in the way the driver presents itself could prevent it being picked up, but it’s very unlikely: we would have heard a lot more about it, I’d have thought, and technically it’s just the driver sitting there.

As the login workflow triggers a change, my best guess is that it is in the VMware code that works around the underlying fault in Windows 7, which randomly picks up the wrong driver. I would ask VMware whether disabling the vSGA driver is appropriate.

Thank you for the information, Rachel.

VMware are continuing with the support case and we may consider installing the maintenance driver branch to see what difference, if any, that makes.

Just a note on Rachel’s post:

I typically do this on all my VMs with a GPU, regardless of operating system. I do a custom install of VM Tools and do not install the display driver; then, once the NVIDIA driver is installed, I have no use for the basic display adapter, so I disable it. That way, I am only presenting one choice to the operating system. We use XenApp / XenDesktop for our platforms, so I’m unsure whether VMware places any requirements on having the base adapter available.

I’ve not experienced any issues (performance or other) by doing this so far.

I’m not sure this is the answer you’re looking for, but I hope it helps in some way.

Regards

Thanks for this. I’m imagining that if we could start from scratch with a “clean” gold image, we might be experiencing a lot less pain all the way around. Unfortunately, we don’t readily have that option because we work exclusively with pools of persistent desktops. We are going to continue with the VMware support ticket effort for a few days longer, but if we don’t get a positive outcome and it becomes a choice between dumping NVIDIA and rebuilding all new desktops, we will dump NVIDIA. We really want the K1 to work, because it adds a real snappiness and sharpness to the user experience; but we can live without it. I’m not re-building all our desktops from scratch just to make it work.

Thanks anyway, for the additional advice. I do appreciate it.

I do suspect the underlying issue is the known Win7 issue, which thankfully Microsoft resolved in Windows 8.x and up. Windows 7 is going away and migrations are speeding up, so hopefully this will be resolved in the long term :-/ https://support.microsoft.com/en-us/help/13853/windows-lifecycle-fact-sheet

I believe VMware work around it by disabling the SVGA adapter, so doing it manually should have no ill effect (but you should check with VMware). The one effect it has under XenDesktop is to lower the console performance, but I have a vague idea VMware doesn’t have that problem.

Really though I think the people with the code that controls this are VMware and that’s where the support issue is most likely to be resolved… if you have a ticket reference I can ping VMware support and ask them if they need assistance. What have you actually been told by VMware support?

Rachel

Hello Rachel,

Thanks for that. The VMware support case reference is #16195560707. If you think you may be able to somehow assist them, that would be great.

As yet we have not been told anything conclusive by VMware. They have analysed our PCoIP logs and determined that, "in the initial session the driver was not loaded to be detected by the PCOIP session. But when you re-launch the session the drivers were detected"

The actions we have taken under the support case under their guidance are listed here, point 23 onward:
https://communities.vmware.com/message/2610377#2610377

We have also received an initial response from NVIDIA following our bug submission, but that is in the very early stages. The last email notification I received stated that it was going to "Level 2".

Although I’m keeping an open mind, I’d be more convinced that this was a Microsoft issue were we not getting the display adapter to work at all. The fact that it seems to work flawlessly on a reconnect but not on a login is what makes me lean more toward this being a VMware/NVIDIA problem rather than a ‘Windows’ problem. But yes, always keeping an open mind.

Any assistance anyone at your end can offer will be gratefully acknowledged by us. It may be that events will overtake this issue going in to next week, but I would ideally like to get it resolved.

thanks
Robert

Hi Rachel,
FYI, this no longer carries any urgency. The VMware ticket is now being escalated to engineering and given the potential timescales involved, I have decided to drop GRID for our current deployment. I will consider rolling GRID out for our users as and when there is demand for ‘new’ desktops only.
I will keep the VMware ticket running and if any solution is ultimately forthcoming I will update this thread accordingly.
Writing this off as a bad experience and lesson learnt.
Thanks for your assistance, I appreciate it.
Robert