Hello,
I’m running an experiment in which two processes execute identical MMUL workloads on separate SMs using the Green Context API. Without MPS enabled, the two processes do not execute in parallel as expected. However, when I enable MPS, I hit a limitation: I no longer seem to be able to specify the number of SMs through Green Context. Instead, MPS takes over, and the CUDA_MPS_ACTIVE_THREAD_PERCENTAGE environment variable appears to override the SM allocation.
Could you clarify whether this behavior is expected, and whether there is a recommended way to maintain explicit SM allocation when using MPS alongside the Green Context API?