How to use CUDA Green Context with MPS

Hello,
I’m running an experiment in which two processes execute identical matrix-multiplication (MMUL) workloads on separate SMs using the Green Context API. Without MPS enabled, the two processes do not execute in parallel, as expected. When I enable MPS, however, I hit a limitation: the SM count I specify through the Green Context API no longer seems to take effect. Instead, MPS takes over, and the CUDA_MPS_ACTIVE_THREAD_PERCENTAGE environment variable appears to override the SM allocation.

Could you clarify whether this behavior is expected? Is there a recommended way to keep explicit SM allocation when using MPS alongside the Green Context API?

Yes, this is expected behavior. SM affinity has to be specified through either MPS’s dynamic active thread percentage or the static partitioning of green contexts. If you want to keep the static SM allocation and still avoid context switches between processes, you can use MPS together with green contexts, but the CUDA_MPS_ACTIVE_THREAD_PERCENTAGE variable must be either unset or set to 100.
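As a rough illustration of the "unset or set to 100" rule, here is a minimal host-side sketch of a guard that a process could call before creating its green contexts. The helper names are hypothetical and not part of CUDA. Treating an empty value as unset and rejecting unparseable values are assumptions of this sketch, not documented MPS behavior.

```cpp
#include <cstdlib>

// Hypothetical helper (not a CUDA API): decides whether a given value of
// CUDA_MPS_ACTIVE_THREAD_PERCENTAGE leaves the green context SM partition
// in control. Pass nullptr when the variable is unset.
bool mps_percentage_allows_green_ctx(const char* value) {
    // Unset: nothing overrides the static partition.
    // Assumption: an empty string is treated the same as unset.
    if (value == nullptr || *value == '\0') {
        return true;
    }

    char* end = nullptr;
    const double pct = std::strtod(value, &end);

    // Assumption: a value that is not a plain number is treated as a
    // misconfiguration and rejected.
    if (end == value || *end != '\0') {
        return false;
    }

    // Only a value of 100 keeps the static SM allocation in effect.
    return pct == 100.0;
}

// Convenience wrapper that reads the variable from this process's environment.
bool green_ctx_partition_is_in_control() {
    return mps_percentage_allows_green_ctx(
        std::getenv("CUDA_MPS_ACTIVE_THREAD_PERCENTAGE"));
}
```

A process could call green_ctx_partition_is_in_control() at startup and log a warning or exit if it returns false. Note that this only inspects the client process's own environment; as far as I know, a limit configured on the MPS control daemon itself would not be visible this way.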

Let me know if you have any further questions!