Fine-grained kernel scheduling with MPS

I am working on using MPS to run multiple TensorFlow inference jobs on the same GPU. For my use case, when the GPU is shared by more than one process, I sometimes need to prioritize the execution of one process's kernels over the other's. Is this supported?

To explain the problem in more detail, consider an example in which two processes, p1 and p2 (each with just one kernel execution stream), share a GPU.

Scenario: there are one or more kernels in the ready queue for both p1 and p2.
Default MPS behavior (my understanding):
If there are enough resources, kernels from both p1 and p2 execute at the same time.

Desired behavior:
The ability to decide, based on priority, whether to:

  1. Execute p1's kernel first, then p2's.
  2. Execute p2's kernel first, then p1's.
  3. In case there are enough resources, execute kernels from both p1 and p2 at the same time.

If this kind of customized scheduling is not supported, it would be great if someone could point out what code changes would be needed to make it work.


MPS doesn’t support priority assignment for kernels or processes. MPS does allow a limited amount of resource reservation; for that I suggest reading the MPS doc or this.
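As a sketch of the resource-reservation approach (which partitions compute capacity rather than ordering kernels), each MPS client's SM usage can be capped with CUDA_MPS_ACTIVE_THREAD_PERCENTAGE, documented in the MPS guide. The two inference scripts and the 70/30 split below are hypothetical:

```
# Start the MPS control daemon (must be done before clients connect)
nvidia-cuda-mps-control -d

# Client p1: cap at roughly 70% of the SMs (illustrative value)
CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=70 python infer_p1.py &

# Client p2: cap at roughly 30% of the SMs (illustrative value)
CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=30 python infer_p2.py &
```

Note that this only limits how many SMs each client may occupy; it does not give p1's kernels scheduling priority over p2's when both are ready.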

Thanks for the response!

Is MPS open source? Can I modify MPS to add this functionality?
In case it’s not open source, is there a way to add a wrapper on top to enable it?



Nothing comes to mind. Others may have ideas.


Does stream priority work across multiple processes? CUDA Runtime API :: CUDA Toolkit Documentation

No. That priority is only applied within a process.
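For completeness, within a single process stream priorities can be expressed with the CUDA runtime API along these lines (a minimal sketch; error checking elided, and the `work` kernel is hypothetical):

```cuda
#include <cuda_runtime.h>

__global__ void work(float *x) { /* placeholder kernel */ }

int main(void) {
    // Query the supported priority range; numerically lower = higher priority,
    // so "greatest" holds the highest-priority value.
    int least, greatest;
    cudaDeviceGetStreamPriorityRange(&least, &greatest);

    cudaStream_t high, low;
    cudaStreamCreateWithPriority(&high, cudaStreamNonBlocking, greatest);
    cudaStreamCreateWithPriority(&low,  cudaStreamNonBlocking, least);

    float *d;
    cudaMalloc(&d, sizeof(float));
    work<<<1, 1, 0, low>>>(d);   // kernels here may yield to...
    work<<<1, 1, 0, high>>>(d);  // ...kernels on the higher-priority stream
    cudaDeviceSynchronize();

    cudaStreamDestroy(high);
    cudaStreamDestroy(low);
    cudaFree(d);
    return 0;
}
```

As stated above, this priority is only honored among streams of the same process; it does not order kernels across MPS clients.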

I have another follow-up question. If a process p1 is running on a 4-GPU machine and is using all 4 GPUs, can I limit its thread usage (i.e., CUDA_MPS_ACTIVE_THREAD_PERCENTAGE) for just one of the GPUs instead of all of them? Is this supported?