Memory consumption of ctxgfx.exe and wdm.exe

OK, thanks for the additional information. So we can at least rule out FB exhaustion. But I noticed that you still seem to have ECC memory active. I would strongly recommend disabling ECC memory on the T4s.
On the host you can simply do this with nvidia-smi -e 0 and reboot afterwards. This massively reduces the FB overhead and ECC is never needed for graphics.

We disabled ECC yesterday afternoon and rebooted, thank you for the tip.

I found something interesting. Every test was performed with one XenCenter console session and one RDP session.

Test 1: vanilla Server 2019, PVS booted → Memory usage normal, commit charge roughly equal to the used memory
Test 2: as Test 1, plus VDA 1912 → Memory usage normal, commit charge roughly equal to the used memory
Test 3: as Test 2, plus NVIDIA driver 472.39 → Memory usage normal, commit charge double the used memory

Screenshot on the left Test 2 and on the right Test 3:

This is the basic behavior we see that is causing the problems on the production machines: commit charge far higher than used memory.

At the moment I have one production machine with 8 users where used memory is at 13 GB and commit charge is already at 35.9/40 GB. Compared to the screenshot, this is the same behavior we now see on the vanilla machine with the vGPU driver installed.
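To compare commit charge and used memory without reading them off Task Manager, a minimal sketch along the following lines could be run inside a worker. It assumes Python is available on the VM and calls the Win32 function GetPerformanceInfo from psapi.dll through ctypes. The helper names summarize_gb and read_commit_status are made up for this illustration, and the "used" figure is simply total minus available physical memory, which may not match Task Manager's "In use" value exactly.

```python
import ctypes
import sys


class PerformanceInformation(ctypes.Structure):
    """Mirror of the Win32 PERFORMANCE_INFORMATION structure (sizes are in pages)."""
    _fields_ = [
        ("cb", ctypes.c_uint32),
        ("CommitTotal", ctypes.c_size_t),
        ("CommitLimit", ctypes.c_size_t),
        ("CommitPeak", ctypes.c_size_t),
        ("PhysicalTotal", ctypes.c_size_t),
        ("PhysicalAvailable", ctypes.c_size_t),
        ("SystemCache", ctypes.c_size_t),
        ("KernelTotal", ctypes.c_size_t),
        ("KernelPaged", ctypes.c_size_t),
        ("KernelNonpaged", ctypes.c_size_t),
        ("PageSize", ctypes.c_size_t),
        ("HandleCount", ctypes.c_uint32),
        ("ProcessCount", ctypes.c_uint32),
        ("ThreadCount", ctypes.c_uint32),
    ]


def summarize_gb(info):
    """Convert page counts to GiB (hypothetical helper for this example)."""
    gib = 1024 ** 3
    page = info.PageSize
    return {
        "commit_gb": info.CommitTotal * page / gib,
        "commit_limit_gb": info.CommitLimit * page / gib,
        "used_gb": (info.PhysicalTotal - info.PhysicalAvailable) * page / gib,
    }


def read_commit_status():
    """Query the running Windows system; raises OSError on other platforms."""
    if sys.platform != "win32":
        raise OSError("GetPerformanceInfo is only available on Windows")
    info = PerformanceInformation()
    info.cb = ctypes.sizeof(info)
    psapi = ctypes.WinDLL("psapi", use_last_error=True)
    if not psapi.GetPerformanceInfo(ctypes.byref(info), info.cb):
        raise ctypes.WinError(ctypes.get_last_error())
    return summarize_gb(info)


if __name__ == "__main__" and sys.platform == "win32":
    status = read_commit_status()
    print("used: {used_gb:.1f} GB, commit: {commit_gb:.1f}/{commit_limit_gb:.1f} GB".format(**status))
```

Logging this at intervals on a test worker would show whether the gap between commit charge and used memory grows with each additional session.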

Is my assumption somewhat understandable?

Got it. Sounds like normal behavior, as the OS needs to be able to page more data once the GPU is present. But why is this a problem? Commit charge doesn’t mean the memory is actually allocated. You can still oversubscribe?! Or what am I missing? My understanding of commit charge:
snip
The total amount of virtual memory which Windows has promised could be backed by either physical memory or the page file
snip
As long as you don’t see a heavy increase in physical memory usage, I don’t think this is an issue at all. That is most likely why nobody else has ever reported this as an issue.

I think you may only see this as an issue if you disabled the pagefile for the OS. Otherwise it would use the pagefile instead of physical memory for the commit charge (reservation).

Your definition is absolutely right, but that is not what happens in our case. When commit charge reaches 100%, browsers start crashing, sessions show glitches, sessions start crashing, applications can't be started, etc.

This is a production worker from this morning with 8 users, right before it starts to crash applications. Not even 14 GB in use, 16 GB available, 17 GB in standby, with 32 GB RAM and an 8 GB pagefile. Still, as soon as commit reaches 40/40 GB, things go bad with the next user connecting or starting a new browser or application.
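The 40 GB ceiling follows from how Windows derives the commit limit, which is roughly physical RAM plus the pagefile. The following is a minimal sketch of that arithmetic using the figures from this worker; the function names are made up for the example and the small amount of RAM reserved by hardware is ignored.

```python
def commit_limit_gb(ram_gb, pagefile_gb):
    """Approximate system commit limit: physical RAM plus pagefile size."""
    return ram_gb + pagefile_gb


def commit_headroom_gb(ram_gb, pagefile_gb, commit_charge_gb):
    """Commit that can still be promised before allocations start to fail."""
    return commit_limit_gb(ram_gb, pagefile_gb) - commit_charge_gb


# Figures from the production worker described above.
ram_gb, pagefile_gb = 32, 8
in_use_gb, commit_charge_gb = 14, 40

limit = commit_limit_gb(ram_gb, pagefile_gb)  # 40
headroom = commit_headroom_gb(ram_gb, pagefile_gb, commit_charge_gb)  # 0
unused_ram = ram_gb - in_use_gb  # 18

print(f"commit limit: {limit} GB, headroom: {headroom} GB, RAM not in use: {unused_ram} GB")
```

With zero headroom, new allocations are refused even though more than half of the physical RAM is not in use, which matches the crashes seen once commit hits 40/40 GB.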

Understood. Thanks for the clarification. So I would summarize that adding the GPU requires at least more pagefile space to prevent these issues from the start. But disk space should be way cheaper than sysmem.

I have some production workers with 36 GB RAM and an 8 GB pagefile that can handle one or two more users. Your summary is correct: throwing more pagefile at the problem will somewhat prevent the issue. But won't this hurt performance?

Is it expected behavior to only be able to fit 8 office workers on a server with 32 GB RAM and an 8 GB pagefile?


I mean, I could at least increase the pagefile from 8 to 16 or even 24 GB. But that sounds so excessive compared to what our workload was before vGPU. I just checked, and 11 users rarely crack 25 GB commit charge on an old worker. I would need 20 to 30 GB more (either in RAM or pagefile) as overhead for the graphics card. If this is normal, then I can try it, but it sounds like so much more.
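The 20 to 30 GB estimate can be sanity-checked with a rough projection. This sketch assumes commit charge scales linearly with the number of users, which is a simplification and not something measured in this thread, and it uses 40 GB for 8 users on a vGPU worker as the reference point. The function names are made up for the example.

```python
def projected_commit_gb(observed_commit_gb, observed_users, target_users):
    """Scale an observed commit charge linearly to another user count (assumption)."""
    return observed_commit_gb / observed_users * target_users


def pagefile_needed_gb(commit_gb, ram_gb):
    """Pagefile size needed so that RAM + pagefile covers the commit charge."""
    return max(0.0, commit_gb - ram_gb)


old_commit_11_users = 25.0                                # old worker without vGPU
vgpu_commit_11_users = projected_commit_gb(40.0, 8, 11)   # 55.0
extra_commit = vgpu_commit_11_users - old_commit_11_users  # 30.0
pagefile = pagefile_needed_gb(vgpu_commit_11_users, 32.0)  # 23.0

print(f"projected commit for 11 vGPU users: {vgpu_commit_11_users} GB")
print(f"extra commit versus old worker: {extra_commit} GB")
print(f"pagefile needed with 32 GB RAM: {pagefile} GB")
```

Under that assumption, 11 users on a vGPU worker with 32 GB RAM would need a pagefile of roughly 23 GB, which is in line with the 24 GB figure mentioned above.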

We tested over the past two days with more RAM and more Pagefile and are able to combat the issue.

We are thinking about overcommitting memory on the Citrix Hypervisor to maybe 64 GB RAM on the workers. When we look in XenCenter, we only see about 10 to 15 GB of active memory.

Hi,
Sounds good. Exactly what I thought, but I’m not sure if XS allows memory overcommitment. At least they had the dynamic memory option, which seems to be deprecated in 8.2, as I always get errors on my machines where I still have this active.

regards
Simon