Tesla M10 - Server 2016 RDSH on ESXi 6.5 Host - Problem with dwm.exe

Hi all

I just ran into an issue which freaks me out.

Some users can’t log in correctly on the RDSH host.
When they log in, a cmd window appears first, then the screen goes grey, and then the session is closed automatically.

As a result, the user is “half logged in”: the user is shown in Task Manager, but you can’t log them off and no task is open for them.

My deployment:

  • HPE Gen10 server with current firmware
  • Nvidia GRID Tesla M10
  • ESXI 6.5 U1
  • Nvidia Host Driver: NVIDIA-VMware_ESXi_6.5_Host_Driver_384.111-1OEM.650.0.0.4598673
  • Server 2016 RDSH
  • Server 2016 Driver: 386.09_grid_win10_server2016_64bit_international

The event log shows the following two errors:

  1. The Desktop Window Manager process has exited. (Process exit code: 0x000000ff, Restart count: 7, Primary display device ID: NVIDIA GRID M10-0B)
    | Source: Dwminit
    | Event ID: 0

  2. Exception code: 0xc00001ad
    Fault offset: 0x00000000000f5956
    Faulting process id: 0x404c
    Faulting application start time: 0x01d3935419b6c0d1
    Faulting application path: C:\Windows\system32\dwm.exe
    Faulting module path: C:\Windows\system32\dwmcore.dll
    | Source: Application Error
    | Event ID: 1000
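
For anyone correlating these crashes with logon times: the application start time in the Application Error event appears to be a Windows FILETIME written in hex (100 ns ticks since 1601-01-01 UTC). That interpretation is an assumption, not something stated in the thread. A minimal Python sketch to decode it (the function name is made up for illustration):

```python
from datetime import datetime, timedelta, timezone

# Windows FILETIME epoch: 1601-01-01 00:00 UTC, counted in 100 ns ticks.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_hex_to_utc(value: str) -> datetime:
    """Convert a hex FILETIME string (e.g. from event ID 1000) to a UTC datetime."""
    ticks = int(value, 16)  # int() accepts the leading "0x" when base is 16
    # 10 ticks of 100 ns make one microsecond; sub-microsecond precision is dropped.
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


# Value copied from the event log entry above.
start = filetime_hex_to_utc("0x01d3935419b6c0d1")
print(start.isoformat())
```

The result is in UTC, so it has to be shifted to the server's local time zone before comparing it with what users report.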

Has anyone seen the same issue?

PS: VMware Horizon is not used; there are about 20 users on the RDSH host.

Thanks!

Did you open an ESP ticket for this already? Sounds like a bug/regression.
Will try to repro if I find some spare time.

Regards

Simon

Hi Simon

Thanks for your answer.
No, I haven’t created a ticket yet, but if I can’t resolve it myself or with the community’s help in the next few days, I will create one.

Regards Dominic

Hi,

I just noticed you’re trying to use an M10-0B profile for RDSH? This is neither supported nor working. You should use at least 4GB FB for this number of users, and you need to use an A profile for RDSH. I just tested with the M10-8A profile and was able to start 35 concurrent user sessions without issues.

Regards

Simon

Hi Simon

I just checked the user guide for vGPU: M10-8A is intended for virtual application users, with only 1 display head and a max resolution of 1280x1024.

In our environment most users are using 2xFHD displays with a direct session to a desktop, not RDP apps.

What do you recommend for this situation?

Do you think the wrong profile is causing my problem?

Thanks for your help!

Regards Dominic

Hi Dominic,

you need to read the documentation properly. The 1 display/1280 resolution limit doesn’t apply to RDS sessions.
snip
A-series virtual GPU types are targeted at virtual applications users.

A-series NVIDIA vGPUs support a single display at low resolution because they are intended to support remote application environments such as RDSH and Xenapp. In these environments, virtualized applications are typically rendered in an off-screen buffer. Therefore, the maximum resolution for the A-series NVIDIA vGPUs is independent of the maximum resolution of the display head.

Read more at: http://docs.nvidia.com/grid/5.0/grid-vgpu-user-guide/index.html#ixzz54vThj6rz
snip

You can certainly use 2xFullHD with RDSH. Because you use RDSH, you need vApps licensing, which means a vApps license for every CCU on the host, and yes, you need to use the A profile. In this case it is not the profile type as such but the FB requirement that is causing the issue.
The 0B profile is intended for Win7 VDI or Win10 with a single screen, so you can imagine that it is not sufficient for 20 users running RDSH…
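
To make the framebuffer point concrete, here is a rough Python sketch of how much FB each session would get if a profile were split evenly across 20 RDSH users. It assumes the usual Tesla M10 naming, where the number in the profile name is the framebuffer in GB and 0 stands for 512 MB. That convention comes from the vGPU user guide, not from this thread, and the helper name is made up for illustration. Real sessions do not share memory evenly, so treat the result as an order-of-magnitude check only.

```python
import re


def profile_framebuffer_mib(profile: str) -> int:
    """Return the framebuffer size in MiB encoded in a name like 'M10-8A'."""
    match = re.fullmatch(r"[A-Za-z0-9]+-(\d+)[ABQ]", profile)
    if match is None:
        raise ValueError(f"unrecognised vGPU profile name: {profile!r}")
    size_gb = int(match.group(1))
    # Assumption: '0' denotes the 512 MiB profiles; other numbers are whole GiB.
    return 512 if size_gb == 0 else size_gb * 1024


users = 20  # roughly the number of sessions on the RDSH host in this thread
for profile in ("M10-0B", "M10-4A", "M10-8A"):
    fb = profile_framebuffer_mib(profile)
    print(f"{profile}: {fb} MiB total, {fb / users:.1f} MiB per user at {users} users")
```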

Regards

Simon

Hi Simon

I see, that’s my mistake.
I will give it a try in my test environment.

Thanks!

Regards Dominic

Hi Simon

I was able to reproduce the issue with the M10-0B profile on my test environment.
The problem occurred when about 8-9 users were logged in.

Since I changed the profile to M10-4A, I can’t reproduce it anymore.

Thanks for your help.

Is there any rule of thumb for video memory per user?

Regards Dominic

Hi Dominic,

good to hear that it’s working as expected. Framebuffer requirements for RDSH are pretty hard to pin down, as they really depend on the workload, but I would expect 100-200MB per user. My tests with the 8A profile run fine for >35 users. But if you run, for example, AutoCAD in RDS sessions, the numbers might be totally different.
The best way would be to test with a number of users and your specific workload, check the FB usage with, for example, nvidia-smi or GPUProfiler, and then calculate your max numbers.
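
One possible way to collect the FB numbers for such a test is to poll nvidia-smi in CSV mode and parse the result. The sketch below is only an illustration: the helper names are made up, the sample string is hypothetical rather than measured, and `read_gpu_memory()` only works on a machine where nvidia-smi is on the PATH.

```python
import subprocess

# Query used and total framebuffer in MiB, one CSV line per GPU, without header or units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory_csv(output: str) -> list[tuple[int, int]]:
    """Parse lines like '2048, 8192' into (used_mib, total_mib) tuples."""
    rows = []
    for line in output.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def read_gpu_memory() -> list[tuple[int, int]]:
    """Run nvidia-smi and return (used_mib, total_mib) for each visible GPU."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_memory_csv(result.stdout)


# Hypothetical sample output, used here so the parser can be tried without a GPU.
sample = "2048, 8192\n"
print(parse_memory_csv(sample))
```

Logging these values together with the current session count at regular intervals during a normal working day should give a usable per-user figure.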

BTW: I’m always happy to get some real usage numbers from the field… so if you test it in your environment and can share them, that might help others…

Best regards

Simon

Hi Simon

Yesterday evening I configured the correct GPU profile (M10-8A) for the production terminal server.
Currently 15 users are logged in and working.

nvidia-smi on the terminal server reports a memory usage of ~4325MiB.
In this environment I would say ~300MiB/user could be fine.
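
As a quick check of that estimate, the arithmetic can be written out in Python. The 8192 MiB total for the M10-8A profile is an assumption taken from the profile name, and the calculation treats usage as scaling linearly with the user count, ignoring any fixed baseline, so it is only a rough guide.

```python
# Numbers reported above: ~4325 MiB in use with 15 users logged in.
used_mib = 4325
users = 15

# Assumption: the M10-8A profile exposes 8 GiB of framebuffer.
total_mib = 8192

per_user_mib = used_mib / users
print(f"measured average: {per_user_mib:.0f} MiB per user")

# Capacity if each user is budgeted the rounded-up 300 MiB figure.
budget_mib = 300
max_users_at_budget = total_mib // budget_mib
print(f"users that fit at {budget_mib} MiB each: {max_users_at_budget}")
```

Budgeting the rounded-up figure rather than the measured average leaves a little headroom for heavier sessions.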

Software used on the terminal server:

  • Main application - Java-based
  • Office suite
  • Chrome / Firefox
  • some other smaller apps - e.g. CTI client

I hope this helps you :-)

Regards Dominic

Thanks for the feedback!

Hi

I am having the same issue while running the card in vDGA (passthrough) mode. Does anyone have an idea how to solve this?

VMware 6.5
Tesla M10 card
RDS 2016 ~20 sessions
Windows drivers: 370.21_grid_win10_server2016_64bit_international.exe
No VIB driver installed.

Faulting application name: dwm.exe, version: 10.0.14393.0, time stamp: 0x578999ab
Faulting module name: dwmcore.dll, version: 10.0.14393.2248, time stamp: 0x5ae3f893
Exception code: 0xc00001ad
Fault offset: 0x00000000000f5990
Faulting process id: 0xc4a0
Faulting application start time: 0x01d4023546237d0d
Faulting application path: C:\Windows\system32\dwm.exe
Faulting module path: C:\Windows\system32\dwmcore.dll
Report Id: 53c2493e-0110-49c1-9fd2-0c2036e94f2f
Faulting package full name:
Faulting package-relative application ID:

The Desktop Window Manager process has exited. (Process exit code: 0x000000ff, Restart count: 1, Primary display device ID: NVIDIA Tesla M10)

Hi TTX,

you should use a newer driver that includes a fix for this issue. Please use the latest GRID 6.1 driver, for example, from the R390 branch.

Regards

Simon

Where do I find this driver pack? When I log in to the GRID license page, I see an old version offered as the latest.

https://nvidia.flexnetoperations.com/

Hi TTX,

do you have a valid license? Otherwise you won’t see the latest releases. You need vApps licenses for your use case, even if you run passthrough only.

Regards

Simon

Hi Sschaber,

Thanks! I have found the location of the latest drivers on my license login site.
NVIDIA-GRID-vSphere-6.5-390.42-391.03.zip

I don’t need to install the ESXi host VIB driver when running passthrough, right?

What are the effects of running passthrough unlicensed?

Correct, no need for the host driver with Passthrough.
Without a license you are in "degraded mode" with limited resolution and 3fps.

Regards

Simon

I have the same problem here.