GPU-Accelerate >>one<< RDSH (Windows Server 2019) on VMware Essentials Plus

Dear NVIDIA experts.

Since moving to Windows Server 2016/2019 RDSH we have been experiencing high CPU load at many customers with the same number of users, where Server 2012 R2 had no problems at all.
After a lot of testing and talking with others, we have come to the conclusion that 2016/2019 simply needs GPU performance.

Now we want to accelerate NORMAL RDSH users (no CAD applications, etc.) on ONE RDSH host with an NVIDIA GPU at the lowest possible licensing cost (so NO GRID and NO VMware Enterprise Plus, if possible).
As hypervisor we run VMware vSphere 7.0 (Essentials Plus license).
We know that without GRID and VMware Enterprise Plus we can only pass the full GPU through directly to one VM, and we also lose functions like vMotion, snapshots, etc.

We’ve checked the VMware Compatibility Guide, and the
TESLA T4
would be supported in our hardware/VMware constellation with vDGA. So we’ve already put a T4 into one of our VMware hosts, and it’s recognized. But now we have some questions:

  • There are now around 20 (or more) PCI devices in the server, all called T4; some can be configured for direct passthrough, some cannot. Why is this?
  • Is it possible to run the TESLA T4 without any GRID licenses, passing it through directly to ONE VM?
  • Which driver should we use to enable GPU acceleration on Windows Server 2016/2019 with RDSH?
  • Do we have to change the default graphics device to the T4 on that RDSH VM to get the better performance?
  • Are there any other settings that should be enabled for RDSH optimization in combination with the T4 GPU?

Hi

You’re using the wrong GPU.

If you want to use any Tesla GPU with Graphics, you’ll need to pay for that feature. If you don’t want to pay for any licenses, you need to use a Quadro GPU in Passthrough; then you can use the standard Quadro driver from the public website.

You need to configure the default Graphics adapter and you can do that with a GPO.
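
The policy usually meant here is “Use hardware graphics adapters for all Remote Desktop Services sessions” (Computer Configuration > Administrative Templates > Windows Components > Remote Desktop Services > Remote Desktop Session Host > Remote Session Environment). As a quick way to verify it took effect on the RDSH VM, here is a minimal Python sketch, assuming the policy is backed by the bEnumerateHWBeforeSW registry value (double-check that against your own gpresult output):

```python
# check_rds_gpu_policy.py -- run inside the RDSH VM (needs Python for Windows).
# Reads the registry value assumed to back the GPO
# "Use hardware graphics adapters for all Remote Desktop Services sessions".
import winreg

KEY = r"SOFTWARE\Policies\Microsoft\Windows NT\Terminal Services"

def hw_gpu_policy_enabled() -> bool:
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY) as k:
            value, _ = winreg.QueryValueEx(k, "bEnumerateHWBeforeSW")  # assumed value name
            return value == 1
    except FileNotFoundError:
        return False  # key or value missing -> policy not configured

if __name__ == "__main__":
    print("Hardware graphics adapter policy enabled:", hw_gpu_policy_enabled())
```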

Regards

MG

Hi,

Sorry for the delay, I didn’t get the notification.
So a TESLA GPU is only possible with a GRID implementation and licenses, right?

For our purpose (one 2019 RDSH, fixed to one ESXi 7.0 host), would you say the Quadro adapter is the better choice if we simply see high CPU load because 2016/2019 seems to expect a GPU for a lot of things?
Would an NVIDIA Quadro RTX4000 be usable WITHOUT GRID for our purposes?

Another thing I’m confused about: how can you change the default graphics adapter?
I only found the GPO that tells RDP to use the default graphics adapter, but not how to define an NVIDIA card as the default graphics adapter (instead of the VMware one).

And a second question about using GRID licenses:

Of course we would be willing to pay for the GRID licenses, as they are not that expensive.
But is there actually any way to use the T4 with GRID WITHOUT a VMware Enterprise license?

At the moment we just have VMware vSphere 7 Essentials Plus licensed at the customer, and the really expensive part seems to be the VMware Enterprise license, not GRID itself.

Passing the T4 through to just one VM, limited to one ESXi host, wouldn’t be a problem for us; nor would losing vMotion or having to reserve the full amount of VM memory.

Because upgrading the license to VMware Enterprise is the really expensive part here, I think.

Or what would you recommend?

Hi

Yes, the RTX4000 will be fine for initial testing, but it only has 8GB of framebuffer. That may be OK depending on your workload and user density, but it will be the first thing you max out.

You don’t need to explicitly define the Default Graphics Adapter; just configuring the GPO will be sufficient.

If you run the T4 in Passthrough, you should be able to use the vGPU driver with it and then license it accordingly (QvDWS / vApps / vCS) per CCU depending on your workload. As you’re not virtualizing the GPU, you won’t need (VMware) Enterprise Plus licensing. Obviously, you’ll then run into all the usual limitations of not virtualizing, but at least it should work.
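
Once the vGPU driver is installed in the VM and pointed at your license server, a quick sanity check of the licensing state is to look at what nvidia-smi reports. The rough sketch below just filters `nvidia-smi -q` output for license-related lines; the exact field names vary between driver releases, so treat it as an illustration rather than an official check:

```python
# license_check.py -- rough check of the vGPU software licensing state inside the VM.
# Filters "nvidia-smi -q" output for license-related lines; the field names
# differ between driver versions, so adjust the keyword if needed.
import subprocess

def license_lines():
    out = subprocess.run(["nvidia-smi", "-q"],
                         capture_output=True, text=True, check=True).stdout
    return [line.strip() for line in out.splitlines() if "License" in line]

if __name__ == "__main__":
    for line in license_lines():
        print(line)
```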

Regards

MG

Hi Mr. Grid,

So in our case I think you would recommend using the T4, as there is a “cheaper” way to use it without VMware Enterprise Plus, as you described, and if it turns out to be too little, we still have the option to move to more complex GPU virtualization scenarios, like cascading T4s and using GPU virtualization with an Enterprise Plus license, right?

I’m a little bit afraid of the RTX4000, because you can find some reports of dwm.exe crashing when there are more than 15-30 users on an RDSH, so it could be a one-way street with problems:

Or is there a third way you could recommend in our case?

Hi,

We’ve successfully passed the T4 through to Windows Server on ESXi 7.0 and installed the GRID driver, without a VMware Enterprise license.
The GPU is responsive and GRID licensing is working.
Just a last question:

In the PCI device passthrough section of the ESXi host I see 32 device IDs for one T4 card.
First of all, I cannot select all of them for passthrough; is it enough to select ONE device ID so that the whole GPU is assigned to my VM?
And how can I check that the full GPU performance is available in my VM?


Hi

If that link is the only source you have of a reported issue with the RTX4000, then I really wouldn’t worry about it. Besides, with only 8GB of FB and 30 users crammed on to it, the user you’ve linked to is probably running out of Framebuffer which is causing his issue. He states he tested with lower resolutions compared to his production ones. No idea why you’d test with one use case, then use a different one in production? …

For the T4, adding just one of them will be sufficient. You can test whether it’s working by using tools like GPUProfiler: https://github.com/JeremyMain/GPUProfiler/releases
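
If you just want to confirm that the VM sees the full T4 (name, driver, total framebuffer) before looking at per-session load in GPUProfiler, a small sketch like the one below works too; it only uses standard `nvidia-smi --query-gpu` fields and assumes the NVIDIA driver is already installed in the VM:

```python
# gpu_sanity_check.py -- confirm the passed-through GPU is fully visible inside the VM.
import subprocess

FIELDS = "name,driver_version,memory.total,utilization.gpu"

def gpu_info():
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={FIELDS}", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [dict(zip(FIELDS.split(","), line.split(", ")))
            for line in out.strip().splitlines()]

if __name__ == "__main__":
    for gpu in gpu_info():
        print(gpu)  # expect a single entry showing the T4 with roughly 16 GB memory.total
```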

Regards

MG

Hi,

Thank you, so checking one T4 hardware ID in the passthrough section is sufficient.

Regarding dwm.exe, there are several reports of dwm.exe crashing with the RTX4000, for example:

https://social.technet.microsoft.com/Forums/lync/en-US/6779b586-c158-491c-b76b-353d5a490642/server-2016-rds-connections-maxing-out-and-crashing-dwmexe?forum

In the meantime we have experienced dwm.exe crashes ourselves, with T4 passthrough and the GRID driver: when around 50-55 users are logged on to the server and the roughly 15 GB of GPU memory is almost completely used, that’s when dwm.exe crashes.
Is there any way to avoid that? If we put a second T4 into the system and pass it through as well, can we use both T4s and double the memory through the GRID driver, and so avoid the dwm.exe crashes, or not?
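
To correlate the crashes with framebuffer usage, we log framebuffer usage and session count over time with a small script roughly like this (assuming nvidia-smi and the Windows `query session` command are available inside the VM):

```python
# fb_vs_sessions.py -- log GPU framebuffer usage and RDSH session count once a minute,
# so dwm.exe crashes can later be correlated with framebuffer exhaustion.
import datetime
import subprocess
import time

def fb_usage_mib():
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    used, total = out.strip().splitlines()[0].split(", ")
    return int(used), int(total)

def session_count():
    out = subprocess.run(["query", "session"], capture_output=True, text=True).stdout
    # Count sessions in the "Active" state; adjust the filter for non-English systems.
    return sum(1 for line in out.splitlines() if "Active" in line)

if __name__ == "__main__":
    while True:
        used, total = fb_usage_mib()
        stamp = datetime.datetime.now().strftime("%H:%M:%S")
        print(f"{stamp} sessions={session_count()} framebuffer={used}/{total} MiB")
        time.sleep(60)
```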

Hi

You can’t put 2x GPUs in the same VM and split the load across the GPUs; that’s not how RDSH works. You’ll need another VM and run the additional T4 in Passthrough with that. Then you’ll need to load-balance the VMs so you get an even user / workload distribution.

Regards

MG


Okay, so that won’t solve my problem.
What about the crashing dwm.exe? Is it actually connected to the size of the framebuffer and whether it is full, or is it about something else that wouldn’t be solved by a SINGLE GPU with 32GB of memory, for example?
Offloading the users to two RDSH hosts would of course be a solution too, but we would normally have just one…

Btw: are you sure RDSH does not support multiple GPUs?
Because Microsoft says that load balancing across multiple GPUs presented to the OS has been supported since Server 2019.

Yes, I’m sure. Try it and see for yourself if you like :-)

Is 50 concurrent users on a single VM not enough? The whole idea of this is that you then have multiple VMs of the same spec running on the host and scale out across the physical host. With most modern servers supporting at least 6x T4s, if not more, that would be 300 concurrent users per host, assuming you didn’t run into CPU or Storage contention before hitting that number.

Regards

MG


I’ve built a lot of VM infrastructure for high-density deployments in data center hosting environments. RAM and IOPS are always the first two things to run out of unless you solve for both by providing ample RAM and fast SSD or NVMe disks. Honestly, you should be quite happy with 50 concurrent users on a single VM. For higher density you can push it, but you need more physical resources to realistically accommodate that requirement. You’re trying to push Niagara Falls through a garden hose there.

Hello,

I have read this thread completely and we are in a similar situation, but in our case it is a little bit strange. We have changed our server setup from 3x Quadro M4000 to 2x RTX A4500. After replacing the cards we have the issue that we cannot have more than 16 user sessions; after the 16th user, dwm.exe starts to crash. From my point of view it looks like a limitation in the firmware??? In our tests one M4000 can easily handle 22 users, and with 2x M4000 we get nearly double the users. With one RTX A4500 we are not able to get more than 8 users online before getting a black screen and an error message. When I compare the hardware between the M4000 and the A4500, it should not be a resource problem. Are there any settings to be made in Windows or the driver to get this running? Or is there any explanation why this is the case with the A4500?

We are running Windows Server 2022. I have also tried to get support directly from NVIDIA, but there was no way to get help, which is really frustrating.

  • 32-core AMD EPYC CPU
  • 256GB of RAM
  • 2x NVIDIA RTX A4500
  • 2TB NVMe storage
  • Hyper-V
  • GPU DDA to one VM

We have created 30 test users and set the following group policies (a small script to dump the resulting registry values follows the list):

  1. Disabled the UDP protocol (TCP only)
  2. Disabled the WDDM driver
  3. Set the physical graphics adapter to be used for all RDP sessions
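
In case it helps, this is the small script we use to dump the registry values behind these three policies; the value names (SelectTransport, fEnableWddmDriver, bEnumerateHWBeforeSW) are our assumption of how the policies are stored, so please double-check them against gpresult:

```python
# dump_rds_policies.py -- print the registry values assumed to back the three RDS policies.
import winreg

KEY = r"SOFTWARE\Policies\Microsoft\Windows NT\Terminal Services"
VALUES = ["SelectTransport", "fEnableWddmDriver", "bEnumerateHWBeforeSW"]  # assumed names

try:
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY) as key:
        for name in VALUES:
            try:
                value, _ = winreg.QueryValueEx(key, name)
                print(f"{name} = {value}")
            except FileNotFoundError:
                print(f"{name} = <not configured>")
except FileNotFoundError:
    print("Terminal Services policy key not found - no policies applied?")
```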

Many thanks in advance!

Hi,

I don’t understand what you mean by “disabled WDDM driver”. WDDM is necessary for Windows to run properly.
There is definitely no restriction in terms of firmware, but it looks like a resource issue. Unfortunately you are not using a data center GPU with vGPU licenses for this use case, so this setup is not eligible for enterprise support.
Did you also run the M4000 on Win2022 for your comparison test? I have already seen lower CCU density with 2022 compared to 2019 or 2016.
Keep in mind that there are a lot of dependencies, like the GPU channel count, that might be relevant here.

Hello @sschaber,

thanks for your reply. I mean we have deactivated the Windows group policy:


Use WDDM graphics display driver for Remote Desktop Connections

This policy setting lets you enable WDDM graphics display driver for Remote Desktop Connections.

If you enable or do not configure this policy setting, Remote Desktop Connections will use WDDM graphics display driver.

If you disable this policy setting, Remote Desktop Connections will NOT use WDDM graphics display driver. In this case, the Remote Desktop Connections will use XDDM graphics display driver.

For this change to take effect, you must restart Windows.

Regarding your question about the M4000: we have now built a test lab with the same server and one VM, only replacing the GPU. The server, VM and configuration are always the same. With the M4000 we have also logged on 20 users and started different applications inside the VM without any issue. We have repeated the tests with the following cards:

RTX A4500

Quadro P6000

Quadro M5000

Quadro M4000

Quadro M2000

Quadro K2200

All the older GPUs can handle more users than the A4500. From my point of view we are definitely not running into a hardware limitation?! Otherwise GPUs with far fewer resources should not be able to run a higher number of users. I know that the A4500 is not vGPU capable, but after discussing our project with an NVIDIA staff member, he mentioned it should be fine to use this card in DDA mode without needing to use the RTX A5000 with vGPU, as we want to attach the card directly to only one VM.

I would never have thought that we would have such problems with the hardware the A4500 offers, given that we previously used the P6000 and M4000. I thought we would get at least the same number of users, if not more.

Regarding “… dependencies like GPU channel count that might be relevant here”: what exactly do you mean by that? Our board can handle up to 40 PCIe lanes and we have 2x NVMe drives and the 2x GPUs as described, so I think PCIe lanes and bandwidth should be able to handle this.

Thanks!

Please re-enable the WDDM policy. I have been working with RDSH for around 20 years now and have never disabled this policy.
For the GPU channel count: this is a pure GPU hardware topic and has nothing to do with PCIe lanes or other server hardware.
See here for example:

This is a known issue for vGPU, but it certainly applies in the same way to RDSH, as each user on RDSH requires GPU channels, and W2022 in particular requires a massive number of these channels.
In addition, there is an architectural change in channel handling between Ampere and pre-Ampere generations because of SR-IOV. So this could explain why it worked fine in the past with older GPUs but now causes an issue.
At this point it is just an assumption that you may be hitting the channel count limit. Without enterprise support and the option to work with our engineering, it is pretty hard to prove whether this is the cause of your issue.