Can Hyper-Q/MPS be used by processes from multiple virtual machines?

Hyper-Q is an excellent technique for maximizing GPU utilization. If I have multiple virtual machines offloading tasks to a GPU, can they use Hyper-Q, and if so, how?


The only current method by which a GPU may be accessed from a virtual machine for CUDA/compute tasks is PCI passthrough. PCI passthrough places the GPU hardware resource entirely inside the VM, so it becomes invisible to other VMs.

Currently, a VM cannot access a GPU that is external to the VM for CUDA/compute tasks. (This could change in the future.)

Thanks for your reply.

Another question about Hyper-Q:
The MPS documentation states that dynamic parallelism cannot be used with MPS. Does that restriction apply only to MPS, or does it also apply to Hyper-Q, even with a single process?

MPS and Hyper-Q are two separate but related concepts. Hyper-Q is basically a hardware feature that is always available on any cc 3.5+ GPU. MPS is a software entity, basically a multi-process software funnel to a single GPU.

Dynamic parallelism can be used in an application that runs on a Hyper-Q enabled (i.e. cc 3.5+) GPU when MPS is not involved. Dynamic parallelism cannot be used in an application that accesses a GPU via MPS.

Thanks! very helpful.

I have another question on Hyper-Q…

Under what circumstances will kernels execute concurrently on a GPU?
For example, if I have a GPU with 8 multiprocessors, a first kernel with 8 thread blocks, and a second kernel with 2 thread blocks, will the first kernel take all 8 multiprocessors even if the workload of each thread is very low?

Put another way: if I want kernels to execute concurrently, do I have to limit the number of thread blocks in each kernel so that no single kernel fully occupies the GPU?

Thanks a lot.

There’s nothing magic here. When the momentary capacity of the GPU is filled up, new work will just be queued up, waiting for resources.

An SM can actually have multiple threadblocks “open” or “resident” on it at any given point in time.

But in general, once a resource fills up (shared memory, registers, or threadblock execution slots, to name a few), additional kernels will not run concurrently if they depend on that resource.
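To make the resource argument concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch: the per-SM limits in `SMLimits` are hypothetical rather than taken from a specific GPU, the names `SMLimits`, `Kernel`, `blocks_that_fit` and `room_for_second` are made up for illustration, and the round-robin placement of the first kernel's blocks is an assumption, since the real block scheduler's policy is not documented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SMLimits:
    """Per-SM limits. The defaults are hypothetical, not from a specific GPU."""
    max_blocks: int = 16
    max_threads: int = 2048
    registers: int = 65536
    shared_mem_bytes: int = 49152


@dataclass(frozen=True)
class Kernel:
    blocks: int                  # grid size in threadblocks
    threads_per_block: int
    regs_per_thread: int
    shared_mem_per_block: int = 0


def blocks_that_fit(k, slots, threads, regs, smem):
    """Blocks of kernel k that fit into the given free resources of one SM."""
    limits = [
        slots,
        threads // k.threads_per_block,
        regs // (k.regs_per_thread * k.threads_per_block),
    ]
    if k.shared_mem_per_block > 0:
        limits.append(smem // k.shared_mem_per_block)
    return max(0, min(limits))


def room_for_second(first, second, sm, num_sms):
    """GPU-wide count of `second` blocks that fit while `first` is resident.

    Assumption: blocks of `first` are spread round-robin over the SMs.
    """
    per_sm_cap = blocks_that_fit(first, sm.max_blocks, sm.max_threads,
                                 sm.registers, sm.shared_mem_bytes)
    total = 0
    for i in range(num_sms):
        share = first.blocks // num_sms + (1 if i < first.blocks % num_sms else 0)
        share = min(share, per_sm_cap)  # excess blocks of `first` wait in the queue
        total += blocks_that_fit(
            second,
            sm.max_blocks - share,
            sm.max_threads - share * first.threads_per_block,
            sm.registers - share * first.regs_per_thread * first.threads_per_block,
            sm.shared_mem_bytes - share * first.shared_mem_per_block,
        )
    return total


sm = SMLimits()
small = Kernel(blocks=2, threads_per_block=256, regs_per_thread=32)

# The case from the question: 8 SMs, an 8-block kernel, then a 2-block kernel.
light = Kernel(blocks=8, threads_per_block=256, regs_per_thread=32)
print(room_for_second(light, small, sm, num_sms=8) >= small.blocks)

# A first kernel large enough to fill every SM's thread capacity.
heavy = Kernel(blocks=128, threads_per_block=1024, regs_per_thread=32)
print(room_for_second(heavy, small, sm, num_sms=8) >= small.blocks)
```

Under these assumed limits, the 8-block kernel leaves plenty of room on every SM, so the 2-block kernel could be resident at the same time. The 128-block kernel with 1024 threads per block fills each SM's thread capacity, so the second kernel's blocks would have to wait. The model only checks whether resources are free; the kernels still have to be launched in a way that permits concurrency, for example into different streams.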


Can MPS be used by multiple virtual machines in 2021? In other words, has support for multiple VMs been added?

Can MPS provide full isolation when we have some containers that use MPS?


MPS can run within a single VM. Any clients within that VM can use/share that MPS instance. You cannot have a single MPS instance that is servicing 2 different virtual machines. The NVIDIA GPU technology to share a GPU among two different virtual machines is vGPU/vCS.

MPS running on Volta and later GPUs provides inter-process isolation.

Thanks so much for your answer.

I am studying the NVIDIA documentation on vGPU/vCS.

I have another question, and I would be grateful if you could answer it.

Do multiple virtual machines share a GPU spatially or temporally (time sharing)?

Thanks in advance.

There may be some of both going on (certain resources e.g. video encode/decode units may be assigned to a particular VM, other resources may be temporally shared or time-sliced). Generally speaking, I think time-slicing is the mental model to use here. However, a GPU that provides MIG slices is essentially spatially shared. Also the vGPU scheduler has user-accessible controls which may affect the actual behavior of time-slicing. Beyond that, I probably won’t be able to answer further questions about the detailed design of vGPU sharing. You might wish to read the documentation. If your questions are not answered there, I probably wouldn’t be able to answer them.

Also note that for CUDA usage, only certain profiles are supported depending on the GPU. For some GPUs, CUDA is only supported in profiles that effectively assign the entire GPU to a single VM.

Happy new year.

As I understand it, multiple processes can share a GPU spatially with MPS. But I have some more questions.
1. Can multiple containers share the GPU spatially? How about temporally?
2. Can processes from multiple containers share the GPU spatially? How about temporally?
3. How long does a context switch take in each of these cases?

Thanks so much

Virtual machines and containers are not the same thing.
This may be of interest.

1. Spatially: yes, with MPS. Temporally: yes, without MPS.

2. The answers are the same as the previous two answers.

3. Not documented.

I won’t be able to give recipes or examples, other than the link I indicated above.

Thanks for your reply.

But according to the “multi-process-service” document, it seems two users can’t use the GPU cores simultaneously.
In the “provisioning sequence” section, Alice and Bob can’t use the GPU concurrently.
Likewise, the “limitations” section says “…, leading to serialized exclusive access of the GPU between users …”

Could you please tell me under what circumstances the above statements are true?

Can multiple containers belonging to two different users share the GPU and run concurrently?

Thanks in advance.

This is the first time that you have mentioned multiple users. That is correct, an MPS server has an association to a particular user.

One more question:

When multiple containers of one user share the GPU spatially with MPS, can we migrate one of these containers? (There is no dependency between these containers.)

Thank you for your help

I’m not sure what “migrate” means. Are you referring to something like VMware vMotion?


Thank you

Your question doesn’t make sense (to me), then. VMware vMotion implies that you are using VMs. I can think of two possible configurations:

  1. Each VM has one container, and you want to do vMotion on that VM. In this case, “multiple containers of one user share the GPU spatially with MPS” is not possible. MPS cannot be used outside the VM to service requests coming from within one or more VMs. Therefore MPS would have to be inside the VM, and in that case you would not have multiple containers of one user sharing a GPU.

  2. You have a VM with multiple containers. In that case, MPS could be running in that VM, sharing a GPU that has already been passed into that VM across those containers. Doing vMotion on that VM would move all the containers and everything else associated with that VM. You could not “migrate one of these containers”. You would have to migrate them all.
