Inaccurate # SMs when using MPS

Hello,
I’ve recently been studying how the # of SMs changes when using the Multi-Process Service (MPS).
As far as I know, the # of SMs changes according to the MPS percentage.
However, it does not change the way I thought, and it seems to follow a specific rule.

For example, I’m using an RTX 5060 Ti, which has 36 SMs in total. I ran a test with the MPS percentage set to 60%.
AFAIK, the # of SMs should be 21 or 22, since 36 * 0.6 (60%) is 21.6. However, Nsight Compute tells me the # of SMs is 20, which is not what I expected.

Also, when I run the test with an MPS percentage of 20%, the # of SMs gets rounded “up” and becomes 8 (36 * 0.2 = 7.2, 7.2 → 8).
However, when I run the test with an MPS percentage of 80%, the # of SMs gets rounded “down” and becomes 28 (36 * 0.8 = 28.8, 28.8 → 28).
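The arithmetic above can be tabulated in a few lines. This is just a sketch comparing the values I observed in Nsight Compute against plain floor/ceil of `total_SMs * percentage`; the `observed` numbers are the ones reported in this post, not something the driver documents.

```python
import math

total_sms = 36  # RTX 5060 Ti
# SM counts reported by Nsight Compute at each MPS percentage (from this post)
observed = {0.20: 8, 0.60: 20, 0.80: 28}

for pct, seen in sorted(observed.items()):
    raw = total_sms * pct
    print(f"{pct:.0%}: raw={raw:.1f}  floor={math.floor(raw)}  "
          f"ceil={math.ceil(raw)}  observed={seen}")
```

Neither plain floor nor plain ceil reproduces all three observed values, which is exactly why the behavior looks like it follows some other policy.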

It seems like the # of SMs changes according to a policy that I’m not aware of.
I’d appreciate it if someone could explain it or point me to a link that might help.

Thank you!

Hi, @namch0101

Yes. The value gets rounded up or down in different cases.

When using the environment variable CUDA_MPS_ACTIVE_THREAD_PERCENTAGE, the value is rounded down.
When using cuCtxCreate_v3 (or green contexts), it is rounded up. Refer here: https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__GREEN__CONTEXTS.html
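The two rounding behaviors described above can be sketched as simple helper functions. This is only an illustration of the stated policy (floor for the env variable, ceil for cuCtxCreate_v3/green contexts), not the driver's actual implementation:

```python
import math

TOTAL_SMS = 36  # RTX 5060 Ti from the question

def sms_via_env_var(pct):
    """CUDA_MPS_ACTIVE_THREAD_PERCENTAGE: rounds down (per this reply)."""
    return math.floor(TOTAL_SMS * pct / 100)

def sms_via_green_context(pct):
    """cuCtxCreate_v3 / green contexts: rounds up (per this reply)."""
    return math.ceil(TOTAL_SMS * pct / 100)

print(sms_via_env_var(60), sms_via_green_context(60))  # 21 vs 22
```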

The reason the environment variable is rounded down is that if you launch two applications with CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=50, you expect both to fit exactly, not to oversubscribe the GPU. So if the result is something like 21.6, you do not want to use 22, as that would most likely mean 2 SMs overlap and are shared across those two clients.
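The no-oversubscription argument works out numerically like this (again just arithmetic on the numbers from this thread, not a driver API):

```python
import math

total_sms = 36

# Two clients at 50%, rounded down: 18 + 18 = 36, an exact fit.
half = math.floor(total_sms * 0.50)
print(half + half, half + half <= total_sms)

# If 60% (21.6) were rounded up to 22 instead, two such clients
# would request 44 SMs on a 36-SM GPU, forcing SMs to be shared.
up = math.ceil(total_sms * 0.60)
print(up * 2, up * 2 > total_sms)
```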

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.