Can Hyper-Q/MPS be used by processes from multiple virtual machines?

Hyper-Q is an excellent technique for maximizing GPU utilization. If I have multiple virtual machines offloading tasks to the GPU, can they utilize Hyper-Q, and how?

Thanks.

The only current method by which a GPU may be accessed from a virtual machine for CUDA/compute tasks is via PCI passthrough. PCI passthrough places the GPU hardware resource entirely inside the VM, so it becomes invisible to other VMs.

A VM currently cannot access a GPU that is external to the VM for CUDA/compute tasks. (This could change in the future.)

Thanks for your reply.

Another question about Hyper-Q:
The MPS documentation (https://docs.nvidia.com/deploy/pdf/CUDA_Multi_Process_Service_Overview.pdf) states that dynamic parallelism cannot be used with MPS. Does that mean dynamic parallelism is incompatible only with MPS, or also with Hyper-Q, even for a single process?

MPS and Hyper-Q are two separate but related concepts. Hyper-Q is basically a hardware feature that is always available on any cc 3.5+ GPU. MPS is a software entity, basically a multi-process software funnel to a single GPU.
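As a side note, here is a minimal sketch (device index 0 assumed, purely illustrative) of querying the device properties relevant to Hyper-Q, i.e. the compute capability and concurrent-kernel support:

```cpp
// Hedged sketch: query device 0 for compute capability and
// concurrent-kernel support. Hyper-Q is available on cc 3.5+ parts.
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("Device: %s, compute capability %d.%d\n",
           prop.name, prop.major, prop.minor);
    printf("concurrentKernels = %d\n", prop.concurrentKernels);
    return 0;
}
```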

Dynamic parallelism can be used in an application that runs on a Hyper-Q-enabled (i.e. cc 3.5+) GPU when MPS is not involved. Dynamic parallelism cannot be used in an application that will access a GPU via MPS.
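For what it's worth, here is a minimal dynamic-parallelism sketch (kernel names are illustrative). It requires a cc 3.5+ GPU and relocatable device code (e.g. `nvcc -arch=sm_35 -rdc=true dp_example.cu -lcudadevrt`), and per the statement above it would not be expected to work when the GPU is accessed via MPS:

```cpp
#include <cstdio>

// Child kernel launched from the device by the parent kernel.
__global__ void childKernel(int parentBlock)
{
    printf("child of parent block %d, thread %d\n", parentBlock, threadIdx.x);
}

// Parent kernel: one thread per block launches a small child grid.
__global__ void parentKernel()
{
    if (threadIdx.x == 0) {
        childKernel<<<1, 4>>>(blockIdx.x);
    }
}

int main()
{
    parentKernel<<<2, 32>>>();
    cudaDeviceSynchronize();
    return 0;
}
```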

Thanks! Very helpful.

I have another question on Hyper-Q…

Under what circumstances will kernels execute concurrently on a GPU?
For example, suppose I have a GPU with 8 multiprocessors, a first kernel with 8 thread blocks, and a second kernel with 2 thread blocks. Will the first kernel take all 8 multiprocessors even if the workload of each thread is very low?

Put another way: if I want to execute the kernels concurrently, do I have to limit the number of thread blocks of each kernel so that no single kernel fully occupies the GPU?

Thanks a lot.

There’s nothing magic here. When the momentary capacity of the GPU is filled up, new work will just be queued up, waiting for resources.

An SM can actually have multiple threadblocks “open” or “resident” on it at any given point in time.

But in general, once a resource fills up, such as shared memory, registers, or threadblock execution slots (to name a few), additional kernels will not run concurrently if they depend on those limited resources.
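To make that concrete, here is a minimal sketch (kernel names, sizes, and the 8-block/2-block split are illustrative, mirroring the example above) of launching two kernels into separate streams. They are eligible to run concurrently, but they will actually overlap only if the first kernel leaves SM resources free:

```cpp
#include <cuda_runtime.h>

__global__ void kernelA(float *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 2.0f;
}

__global__ void kernelB(float *y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] += 1.0f;
}

int main()
{
    const int n = 1 << 20;
    float *x, *y;
    cudaMalloc(&x, n * sizeof(float));
    cudaMalloc(&y, n * sizeof(float));
    cudaMemset(x, 0, n * sizeof(float));
    cudaMemset(y, 0, n * sizeof(float));

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    // Kernels in different non-default streams are eligible to overlap;
    // the hardware runs them concurrently only if enough SM resources
    // (block slots, registers, shared memory) remain free.
    kernelA<<<8, 256, 0, s1>>>(x, n);  // "big" kernel: 8 blocks
    kernelB<<<2, 256, 0, s2>>>(y, n);  // "small" kernel: 2 blocks

    cudaDeviceSynchronize();
    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```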

Hello

Can MPS be used by multiple virtual machines in 2021? In other words, has this capability been added for use with multiple VMs?

Can MPS provide full isolation when several containers use MPS?

Thanks.

MPS can run within a single VM. Any clients within that VM can use/share that MPS instance. You cannot have a single MPS instance that is servicing 2 different virtual machines. The NVIDIA GPU technology to share a GPU among two different virtual machines is vGPU/vCS.

MPS running on Volta and beyond provides inter-process isolation.

Thanks so much for your answer.

I am studying the NVIDIA documents about vGPU/vCS.

I have another question. I would be so grateful if you answer me.

Do multiple virtual machines share a GPU spatially or temporally (time-sharing)?

Thanks in advance.

There may be some of both going on (certain resources, e.g. video encode/decode units, may be assigned to a particular VM, while other resources may be temporally shared, i.e. time-sliced). Generally speaking, I think time-slicing is the mental model to use here. However, a GPU that provides MIG slices is essentially spatially shared. Also, the vGPU scheduler has user-accessible controls which may affect the actual behavior of time-slicing. Beyond that, I probably won’t be able to answer further questions about the detailed design of vGPU sharing. You might wish to read the documentation. If your questions are not answered there, I probably wouldn’t be able to answer them.

Also note that for CUDA usage, only certain profiles are supported depending on the GPU. For some GPUs, CUDA is only supported in profiles that effectively assign the entire GPU to a single VM.
