Performance difference P2000 vs. T4-2Q

Hello all,

in my company, an HDX environment was set up for working from home.
We use the NVIDIA T4 as a virtual GPU with the 2Q profile (T4-2Q), which means eight users share one card.

Previously, we had desktop PCs with an NVIDIA P2000.
My question now is: how large are the performance losses here?
We employees notice that it runs much slower than before.
We work with very large assemblies.

Our IT tells me that the T4 is more powerful than the P2000.
I agree, but not once it is divided among eight people.
Each user has only 2 GB of VRAM available, whereas the P2000 had 5 GB.
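To make the framebuffer arithmetic concrete, here is a small sketch that lists how many users fit on one T4 per Q profile and whether each user's share reaches the P2000's 5 GB. It assumes the T4 has 16 GB of framebuffer and that the number in the profile name is the per-VM framebuffer in GB; `vms_per_card` is a hypothetical helper for illustration only.

```python
# Framebuffer per VM for T4 vGPU Q profiles (assumption: 16 GB on the T4,
# and the number in "T4-<n>Q" is the per-VM framebuffer in GB).
T4_FB_GB = 16
P2000_FB_GB = 5  # the desktop card the users had before


def vms_per_card(profile_fb_gb: int, card_fb_gb: int = T4_FB_GB) -> int:
    """Maximum number of VMs that fit on one card with the given profile."""
    return card_fb_gb // profile_fb_gb


for profile_gb in (2, 4, 8, 16):
    users = vms_per_card(profile_gb)
    status = "meets" if profile_gb >= P2000_FB_GB else "below"
    print(f"T4-{profile_gb}Q: {users} users per card, {profile_gb} GB each "
          f"({status} the P2000's {P2000_FB_GB} GB)")
```

Under these assumptions, matching the P2000's framebuffer would require at least the 8Q profile, i.e. two users per T4 instead of eight.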

The goal is to build a system that at least matches the performance of the P2000. What would be necessary for this?

Thanks in advance for your help.

In general, the performance should be fine. The “magic” of vGPU is the scheduler: every VM gets 100% of the GPU for the timeslice of its rendering job. But you need to make sure that the FB is sufficient. Please check with a tool like GPUProfiler that you don’t run out of FB; running out results in swapping into system memory and reduces performance massively. Or have you already tested with the 4Q profile to see whether it runs better?
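As an alternative to GPUProfiler, a rough sketch of an FB check from inside a VM might look like the following. It assumes the NVIDIA guest driver's `nvidia-smi` is available in the VM and reports the vGPU's framebuffer; the 90% warning threshold and the helper names are hypothetical choices for illustration.

```python
import subprocess

# Hypothetical threshold: warn when more than 90% of the framebuffer is used.
WARN_RATIO = 0.9


def parse_memory(csv_line: str) -> tuple[int, int]:
    """Parse 'used, total' (MiB) as printed with --format=csv,noheader,nounits."""
    used, total = (int(field.strip()) for field in csv_line.split(","))
    return used, total


def fb_near_exhausted(used_mib: int, total_mib: int, ratio: float = WARN_RATIO) -> bool:
    """True if framebuffer usage is at or above the warning ratio."""
    return used_mib >= ratio * total_mib


def query_fb() -> list[tuple[int, int]]:
    """Ask nvidia-smi for used/total framebuffer of every visible GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [parse_memory(line) for line in out.splitlines() if line.strip()]


if __name__ == "__main__":
    try:
        readings = query_fb()
    except (FileNotFoundError, subprocess.CalledProcessError) as err:
        print(f"Could not query nvidia-smi: {err}")
        readings = []
    for used, total in readings:
        flag = "WARNING: close to FB exhaustion" if fb_near_exhausted(used, total) else "ok"
        print(f"{used}/{total} MiB used: {flag}")
```

Running it while users rotate large assemblies (or in a loop) would show whether the 2 GB profile is filling up.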


Hi Simon,

Thank you for your quick reply.

I should have given some more information before.
We use the vGPU to run Siemens NX 11.
It often happens that eight users rotate their large models at the same time.
That would mean that in this case only 12.5% of the GPU performance is available to each user.
Does this still correspond to the performance of a single P2000?

FB stands for framebuffer, right?
I will pass on the tip so that IT can take a closer look and check whether it spills over into system memory.

I have already suggested changing the T4 profile from T4-2Q to T4-4Q or even T4-8Q. Unfortunately, there are not enough machines available at the moment to test this. I think it would help, but since more and more people have to work from home, the test is currently not possible.

Otherwise, I would also have thought that for our needs we need an A40 rather than the T4.

Thanks in advance for the further help.

Thanks for the additional information. Which CPU are you using? Hopefully one with a clock speed above 3 GHz? NX also depends heavily on single-thread CPU performance.
You always have 100% of the GPU for each timeslice (1 ms), no matter whether you run a single VM or multiple VMs on the same GPU. Even if you think the users are doing the same tasks simultaneously, that is not the case in reality. I would rather assume that you are running into FB (framebuffer) exhaustion.
An A40 would only help if you run larger profiles. The best idea would be to add more T4s, but they are almost impossible to get anymore.
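To illustrate the timeslice point, here is a deliberately simplified round-robin model: in each 1 ms slice the whole GPU serves one VM that has pending work. This is an assumption-laden sketch, not NVIDIA's actual vGPU scheduler (which offers several scheduling policies and is more involved); `simulate` and the job numbers are hypothetical.

```python
from collections import deque

SLICE_MS = 1  # timeslice length, as stated in the reply above


def simulate(jobs: dict[str, tuple[int, int]]) -> dict[str, int]:
    """Simplified round-robin time-slicing (not the real vGPU scheduler).

    jobs maps a VM name to (arrival_ms, gpu_work_ms). In each slice the
    whole GPU serves one VM with pending work. Returns each VM's
    completion time in ms.
    """
    remaining = {vm: work for vm, (_, work) in jobs.items()}
    pending = sorted(jobs, key=lambda vm: jobs[vm][0])  # by arrival time
    queue: deque[str] = deque()
    done: dict[str, int] = {}
    t = 0
    while len(done) < len(jobs):
        # Enqueue every VM whose job has arrived by time t.
        while pending and jobs[pending[0]][0] <= t:
            queue.append(pending.pop(0))
        if not queue:
            t += SLICE_MS  # GPU idle during this slice
            continue
        vm = queue.popleft()  # this VM gets 100% of the GPU for one slice
        remaining[vm] -= SLICE_MS
        t += SLICE_MS
        if remaining[vm] <= 0:
            done[vm] = t
        else:
            queue.append(vm)  # back of the line for the next slice
    return done


# One VM alone, eight VMs perfectly in sync, and eight VMs staggered by 10 ms.
print(simulate({"vm0": (0, 10)}))
print(simulate({f"vm{i}": (0, 10) for i in range(8)}))
print(simulate({f"vm{i}": (10 * i, 10) for i in range(8)}))
```

In this model, a 10 ms job takes up to 80 ms only if all eight users submit work at exactly the same moment; when their work is even slightly staggered, each job still finishes in 10 ms because every active VM gets the full GPU during its slice.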


Hi Simon,

the CPU is an Intel Xeon Gold 6226R @ 2.9 GHz.
So, besides the framebuffer, this could also be a bottleneck.

What kind of CPU would you recommend, especially for large assemblies? Just one with more than 3 GHz? Only 100 MHz more will probably not be enough.

Once I have the answer, I will pass it on to our IT and then report back here with the results.

By the way, 32 GB of RAM is available.

Many thanks in advance.

Most customers aim for 3.2 GHz, but it is always a trade-off between core count and clock speed. So 2.9 GHz is not that bad, but it is most likely not comparable to the workstations you had before (3.6 GHz or higher).
Yes, please check the FB usage as a starting point, and then we can discuss further options.
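For a rough sense of the clock-speed gap, the sketch below compares base clocks under the naive assumption that single-thread performance scales linearly with clock speed. That assumption ignores turbo boost, IPC differences between CPU generations, and cache, so treat the numbers as ballpark only; `relative_single_thread` is a hypothetical helper.

```python
def relative_single_thread(server_ghz: float, workstation_ghz: float) -> float:
    """Naive estimate: single-thread speed scales linearly with clock speed.

    Ignores turbo boost, IPC differences between CPU generations, cache, etc.
    """
    return server_ghz / workstation_ghz


# 2.9 GHz (current server CPU) and 3.2 GHz (common target) vs. a 3.6 GHz workstation.
for target in (2.9, 3.2):
    print(f"{target} GHz vs 3.6 GHz: {relative_single_thread(target, 3.6):.0%}")
```

By this crude measure, 2.9 GHz lands at roughly 81% of a 3.6 GHz workstation and 3.2 GHz at roughly 89%.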