I am using MPS to run multiple TensorFlow inference jobs on the same GPU. For my use case, when the GPU is shared by more than one process, I sometimes need to prioritize the kernel execution of one process over the other. Is this supported?
To explain the problem in more detail, consider an example with two processes, p1 and p2 (each with a single kernel execution stream), sharing a GPU.
Scenario: there are one or more kernels in the ready queue for both p1 and p2.
Default MPS behavior (my understanding):
If there are enough resources, kernels from both p1 and p2 are executed concurrently.
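For context, the closest mechanism I have found so far is MPS execution resource provisioning via the CUDA_MPS_ACTIVE_THREAD_PERCENTAGE environment variable, which caps the fraction of SMs a client process may use; note that this limits resources rather than imposing a scheduling priority. A sketch of how I would set it up (script names and percentages are illustrative):

```shell
# Start the MPS control daemon (assumes exclusive access to GPU 0)
export CUDA_VISIBLE_DEVICES=0
nvidia-cuda-mps-control -d

# Cap p2's clients at ~30% of the SMs, leaving headroom for p1.
# This is resource partitioning, NOT priority-based scheduling.
CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=30 python run_inference_p2.py &
CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=100 python run_inference_p1.py &
```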
Desired behavior:
The ability to decide, based on priority, whether to:
- execute a kernel of p1 first, then p2;
- execute a kernel of p2 first, then p1;
- in case there are enough resources, execute kernels from both p1 and p2 concurrently.
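Within a single process, CUDA does expose stream priorities, which hint the hardware scheduler to prefer kernels from higher-priority streams; my understanding is that these priorities do not extend across MPS client processes, but for reference, a minimal sketch of the per-process mechanism (the kernel here is a placeholder):

```cuda
#include <cuda_runtime.h>

__global__ void dummyKernel(float *x) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    x[i] = x[i] * 2.0f;  // placeholder work
}

int main() {
    // Query the supported priority range; numerically lower values
    // mean higher priority.
    int leastPri, greatestPri;
    cudaDeviceGetStreamPriorityRange(&leastPri, &greatestPri);

    cudaStream_t hiStream, loStream;
    cudaStreamCreateWithPriority(&hiStream, cudaStreamNonBlocking, greatestPri);
    cudaStreamCreateWithPriority(&loStream, cudaStreamNonBlocking, leastPri);

    float *d;
    cudaMalloc(&d, 1024 * sizeof(float));

    // Kernels on hiStream are preferentially scheduled over those on
    // loStream -- but only relative to streams in this same process.
    dummyKernel<<<4, 256, 0, hiStream>>>(d);
    dummyKernel<<<4, 256, 0, loStream>>>(d);

    cudaDeviceSynchronize();
    cudaFree(d);
    cudaStreamDestroy(hiStream);
    cudaStreamDestroy(loStream);
    return 0;
}
```

What I am asking about is effectively this kind of priority hint, but across MPS client processes rather than within one.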
If this kind of customized scheduling is not supported, it would be great if someone could point out what code changes would be needed to make it work.