Misunderstanding about MPS non-uniform partitioning

How should I understand non-uniform partitioning as described in the MPS documentation? It says:

The limit constrained by the non-uniform active thread percentage is configured for every client CUDA context and can be changed throughout the client process.

From what I understand, setting CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=25 is uniform partitioning: once I start the process with this environment variable set to 25, I cannot change the percentage allocated to that process. With non-uniform partitioning, it seems you can change the active thread percentage after the process has started. Is that true, and if so, how do I change the active thread percentage while the process is running? (The same question was asked in https://forums.developer.nvidia.com/t/mps-set-default-active-thread-percentage-not-working-as-expected/194593?u=byte.xiaobin)

You cannot edit the percentage assigned to a process today. We are actively looking into it.
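
Since the percentage cannot be changed for a running process, it has to be chosen when the client is launched. Below is a minimal Python sketch of that, assuming the MPS control daemon is already running. The helper names `mps_client_env` and `launch_client` are made up for illustration, and the child command is a stand-in that only echoes the variable; replace it with your real CUDA application.

```python
import os
import subprocess
import sys


def mps_client_env(percentage, base_env=None):
    """Return a copy of the environment with the MPS thread percentage set.

    Hypothetical helper: it only prepares the environment for a new client
    process; it does not talk to the MPS daemon.
    """
    if not 0 < percentage <= 100:
        raise ValueError("percentage must be in the range (0, 100]")
    env = dict(os.environ if base_env is None else base_env)
    # Read by the CUDA client when the process starts.
    env["CUDA_MPS_ACTIVE_THREAD_PERCENTAGE"] = str(percentage)
    return env


def launch_client(cmd, percentage):
    """Start a client process with the given active thread percentage."""
    return subprocess.Popen(cmd, env=mps_client_env(percentage))


if __name__ == "__main__":
    # Stand-in for a real CUDA application: it just prints the value it sees.
    stand_in = [
        sys.executable,
        "-c",
        "import os; print(os.environ['CUDA_MPS_ACTIVE_THREAD_PERCENTAGE'])",
    ]
    child = launch_client(stand_in, 25)
    child.wait()
```

To give a client a different share, you would stop it and launch it again with another value, since the setting is applied per process at startup.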

If you control the application source code, you could work around this by creating multiple contexts, each with a different limit, and switching which context you submit work to. This is much easier to accomplish now with execution contexts (see the CUDA Runtime API section of the CUDA Toolkit Documentation).