Physical GPU shared between user/license types


We are in the final stage of preparing a quote for a large virtual desktop environment that will make heavy use of GRID.
This environment involves close to 90 branch locations, and we expect the transition to move in phases. Our proposal involves hosting the virtual desktops at the branch locations as well as some in our datacenter.
We are building the pricing model now, and because of the need for hundreds of high-end users we have many nodes configured for only 2 users per physical GPU (M60).
This presents a problem with pricing if a branch office rolls out VMware Horizon and we have an odd number of high-end users. We would essentially have a single high-end user with a node dedicated to them, unless we could also place several standard users on the same physical GPU.
I found a PDF titled nvidia-grid-vgpu-user-guide which states on page 51: “Note: Due to vGPU’s requirement that only one type of vGPU can run on a physical GPU at any given time, not all physical GPUs may be available to host the vGPU type required by the new VM.”

Not sure what the question is…

Looks like you’re running everyone with M60-4Q profiles (2 per GPU, so 4 per board, 8 per server)

You’re aware of homogeneous vGPU profiles, and may have an odd number of users in a location, but I don’t see the question you’re asking.

Can you clarify?

If we put out pricing for this customer based on 2 high-end users per physical GPU and they expand by an odd number of users, we are paying for hardware that the customer isn’t paying for, unless we can put other types of users onto that same physical GPU.
This is our first go-round with VMware Horizon and GRID, and we’re trying to put pricing together that takes these kinds of situations into account.
They have over 80 branch offices, and our proposal puts much of the virtual desktop footprint at each office, so it is VERY likely they could have odd numbers of users at these offices. So if we pay for hardware to support 8 high-end users and they only have 7 at, let’s say, 50 locations, that is a tremendous amount of resources going to waste if we can’t mix the type of user per physical GPU.
I hope that clarifies it better.

I found this document
but it only talks about K-series cards, which I was told are end of life. We’re building our proposal using M60 cards.
The link above was from this nVidia page.

You cannot mix vGPU profile types on the physical GPU, but that doesn’t exclude another user with the same vGPU profile being run on the physical GPU.

It looks like you’re setting up as a Cloud Service Provider, or at least operating in that manner as you’re owning the hardware. However, you refer to the virtual desktops being at the branch sites, which suggests you’re not hosting them centrally but shipping servers to those locations?

A centralised architecture would help to minimise the issues you’re describing.

We will be releasing a Service Provider licensing model in the future, and I would strongly recommend taking a look at the NPN Partner program, as this will connect you into the various program options available for the licensing.


Just to be sure you’re talking in the same terminology as Jason: each M60 board has 2x physical GPUs on it (8GB each). So where you say 2x users per physical GPU (as Jason states above), that would indicate you intend to give each user a 4Q profile to fully populate the M60, i.e. 4x users per M60 board.

The M60 can run different profiles between the GPUs (not on the GPU). So on a single M60, one GPU could be running a full 8Q profile or Passthrough, and the other GPU could be running multiple 4Q, 2Q, 1Q or 0Q profiles depending on the application requirements.
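The placement rule above (homogeneous profiles per physical GPU, but the two GPUs on one board may differ) can be sketched as a quick model. Framebuffer sizes are from the M60 profile names; the class and function names here are illustrative, not an NVIDIA API.

```python
# Sketch of the M60 placement rule: profiles must be homogeneous per
# physical GPU, but each GPU on a board may run a different profile.
M60_PROFILE_FB_GB = {"M60-8Q": 8, "M60-4Q": 4, "M60-2Q": 2, "M60-1Q": 1, "M60-0Q": 0.5}
GPU_FB_GB = 8  # each M60 board has 2 physical GPUs with 8 GB apiece

class PhysicalGpu:
    def __init__(self):
        self.profile = None   # fixed by the first VM placed on this GPU
        self.vms = 0

    def can_place(self, profile):
        if self.profile is not None and self.profile != profile:
            return False      # mixing profile types on one GPU is not allowed
        used = self.vms * M60_PROFILE_FB_GB[self.profile or profile]
        return used + M60_PROFILE_FB_GB[profile] <= GPU_FB_GB

    def place(self, profile):
        if not self.can_place(profile):
            raise ValueError(f"cannot place {profile}")
        self.profile = profile
        self.vms += 1

# One M60 board: GPU 0 runs a single 8Q user, GPU 1 runs 2Q users.
board = [PhysicalGpu(), PhysicalGpu()]
board[0].place("M60-8Q")
board[1].place("M60-2Q")
board[1].place("M60-2Q")
print(board[0].can_place("M60-2Q"))  # False: GPU 0 is full and typed 8Q
print(board[1].can_place("M60-2Q"))  # True: room for 2 more 2Q users
```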

The document you mention above is correct and fully up to date. The bottom of page 4 / top of page 5 is what you’re looking for. It even lists the brand new M10 (which has 4 GPUs per board) so may be an option on some servers where you are looking for more density with less performance.

Where you intend to allocate different GPU profiles on the M60, be careful you don’t run out of CPU resource. If you’re allocating large GPU profiles to high-end users, you’ll usually want a nice helping of un-contended CPU to go with it. If that same CPU is also being contended for by multiple low-end users, you may run into contention issues giving a less than optimal experience. For example, with an 8Q profile on one GPU and 0Q profile on the second GPU, that’s potentially 16 low-end users contending for the same CPU resource as the single high-end user… Just something to consider.
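The contention example above can be put into rough numbers. The core count matches a dual-socket E5-2680 v4 host (2 x 14 cores, as mentioned later in the thread); the per-user vCPU allocations are purely assumed figures for illustration.

```python
# Back-of-envelope check of the 8Q + 0Q contention example: one high-end
# user on GPU 0, a GPU 1 full of 0Q users, all sharing the host CPUs.
fb_gb = {"M60-8Q": 8, "M60-0Q": 0.5}
gpu_fb_gb = 8

users_0q = int(gpu_fb_gb // fb_gb["M60-0Q"])   # 16 low-end users fit on the second GPU
total_users = 1 + users_0q                      # plus the single 8Q user

host_cores = 28                                 # dual 14-core E5-2680 v4 sockets
# Assumed vCPU sizing: 8 vCPU for the high-end user, 2 vCPU per low-end user.
vcpus = 8 * 1 + 2 * users_0q
print(users_0q, total_users)            # 16 low-end users, 17 users in total
print(round(vcpus / host_cores, 2))     # vCPU:pCPU oversubscription ratio
```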



There is nothing to stop you adding lots of non-vGPU enabled VMs to hoover up any spare CPU/RAM capacity on the server.

If you have a server provisioned with 4Q profiles and only one is in use, the GPU compute time will be reallocated to the sole user, so you’ll effectively have the GPU compute time of an 8Q.

It is an interesting request and one I will highlight to product management. It isn’t one that I’ve come across as a significant issue, though: as solutions scale, if you have a lot of servers, not fully using half a GPU on one of them becomes noise for many.

Best wishes,

Thanks all. BJones, I was using the same terminology as Jason, at least as far as I understood what he was saying.
Like I said, our model seems to indicate that we’ll need to put the servers at the office locations, based on this customer’s decision to move from MPLS to IPsec for their connectivity. The numbers just don’t pan out if they try to stream several high-end systems from a datacenter as well as several low- and mid-range systems. So in our case, the more granular the solution the better, because our pricing model has to take into account our full cost of hardware and not what the customer is actually consuming.
So if I have 5 M60-2Q users I am fully using 1 physical GPU on the card, and only 25% of the other GPU on the card and I have to price that based on the total cost of hardware. Then if they open another office with 8 M60-2Q users which will fully use the entire card and we give them the same price…
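The utilisation maths in that example works out as follows (a minimal sketch, assuming M60-2Q profiles on a board with two 8 GB GPUs):

```python
# Board utilisation for M60-2Q users: 4 users per GPU, 8 per board.
PROFILE_FB_GB = 2                # M60-2Q framebuffer
GPU_FB_GB, GPUS_PER_BOARD = 8, 2

users_per_gpu = GPU_FB_GB // PROFILE_FB_GB            # 4
board_capacity = users_per_gpu * GPUS_PER_BOARD       # 8

def board_utilisation(users):
    """Fraction of the board's vGPU capacity a given user count consumes."""
    return users / board_capacity

print(board_utilisation(5))   # 0.625 -> one GPU full, the other only 25% used
print(board_utilisation(8))   # 1.0   -> whole board in use, same hardware cost
```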
The other thing I’m trying to figure out now is the amount of overhead created by Blast Extreme. If it offloads to the GPU, how much GPU processing do I really have left on the card per profile used? If I’m using M60-0B profiles compared to M60-2Q profiles?
This deployment would end up being for 3,000 GPU-enabled desktops, so the math is EXTREMELY important. Our initial proposal is for a 250 user standup.

Yes. The more users you add on to the same hardware, the cheaper it is per seat. If you really want to get the maximum out of the hardware, use an RDS model and assign a GPU to the server. That way, you’re not bound by the limitation of GPU profiles.

The M60, M6 and M10 have dedicated H.264 encoders to handle this process, so the graphics processing won’t be affected by the encode process. Specifically, the M60 has the following: 2 GPUs x 18 H.264 streams per GPU = 36 simultaneous 1080p30 H.264 streams.
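Those NVENC figures make the encoder budget easy to sanity-check per board. The session mix below is an illustrative example, and the one-stream-per-session assumption holds only for single-monitor 1080p30 clients.

```python
# NVENC capacity check for one M60 board: 2 GPUs x 18 = 36 simultaneous
# 1080p30 H.264 streams, per the figures quoted above.
STREAMS_PER_GPU, GPUS_PER_BOARD = 18, 2
board_streams = STREAMS_PER_GPU * GPUS_PER_BOARD  # 36

# Example mix: 16 x 0B users on one GPU plus 4 x 2Q users on the other,
# each assumed to consume a single 1080p30 stream.
sessions = 16 + 4
print(sessions <= board_streams)  # True: encode capacity is not the bottleneck here
```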



Thanks BJones. I’m not sure if RDS will work for us. This customer is an engineering/architectural firm and the majority of their users require GPU acceleration. I’m afraid the RDS model with shared GPU would not come close to giving a satisfactory experience for them.
Your 2nd point is fully understood, and I appreciate the input.

No problem, happy to make suggestions if it helps, even if it’s just to rule things out.

Regarding your client, you mention Architectural / Engineering, would they be using Autodesk or Dassault suites? Or something else?

Here is some of their software throughout the company.

Adobe CS
ArcGIS Desktop 10
AutoCAD (various products and versions)
Autodesk 3ds Max
AutoTURN 8 and 9
Blender 2
Google Earth
Microsoft Office 2013
Revit (Autodesk)


Great, thanks.

I touched on it earlier, but some of those applications are more CPU intensive than GPU. Revit and AutoCAD in particular both require a fast CPU (3.0 GHz+), and also quite a few cores depending on what they’re being used for, Revit especially. Blender can also require quite a few cores in production environments. I’ve seen these applications (AutoCAD / Revit) perform quite poorly on systems where the CPU has been under-spec’d. Sometimes designers will trade clock speed for core count to gain density, which can be fine, but when the application relies on clock speed for performance, it’s obviously not great.

I fully understand your apprehension with delivering some of these applications on RDS; however, I’ve seen some really good results by doing this, plus it solves your GPU profile issue, and quite a few of those applications are supported on RDS. If you have the time, and have not already done so, I would strongly recommend evaluating some of these applications for yourself beforehand to fully understand how they perform, especially on contended resources.

Careful with SketchUp, I don’t believe that’s supported on a virtual platform yet.



Thanks for all the info. Nutanix is our hardware partner/supplier, and they used the E5-2680 v4 processor in the proposal. (Not my choice; as you said, usually you want higher clocks.)
The good thing is that this is for the initial rollout of 250 desktops, and we have the flexibility of altering the technology for the remaining 2750 desktops as they are needed. Our current VDI customers are all running on Parallels Virtuozzo Containers.

No worries, as said, happy to help :-)

E5-2680v4… Echoing my point exactly ;-)

Although they’re Broadwell, you can’t get away from that base clock speed. And with multi-core applications, Turbo Boost won’t help much, since boost clocks drop as more cores become active.

I’m glad you have the option of altering the specs at a later date to suit, I think that will be very beneficial.

Let us know how you get on.



Two items to keep in mind with Blast Extreme using H.264 offload: first, this currently only works with single-monitor clients, and CAD users typically have two or more monitors. Second, you need to be careful that H.264 encoding provides the level of image quality expected by users of these types of applications (see:

Wow, I didn’t know Blast Extreme only works with single monitor clients. I guess they hide that info in all the marketing fluff about that protocol.
One more thing to go back and look at I guess.

Some details of Blast Extreme were covered in a recent webinar, including the single-monitor limitation:

Worth a read and finding the webinar recording. The VMware NVIDIA forum is a good place to ask such questions and highlight your need for > 1 monitor!


First a clarification. Blast Extreme does work with multiple monitors except when an NVIDIA GRID GPU is present and using the NVENC H.264 encoder to offload the CPU-heavy H.264 encoding. This is a VMware restriction, not an NVIDIA restriction, so it will probably go away some day. However, VMware confirmed at VMworld in their public session that this restriction will not be fixed in the upcoming release that is coming out "soon".