How should I understand non-uniform partitioning in the MPS documentation?
The MPS documentation states: "The limit constrained by the non-uniform active thread percentage is configured for every client CUDA context and can be changed throughout the client process."
From what I understand, setting CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=25 is uniform partitioning: once I start the process with this environment variable set to 25, I cannot change the percentage allocated to that process. With non-uniform partitioning, it seems the active thread percentage can be changed after the process has started. Is that true, and if so, how do I change the active thread percentage while the process is running? (The same question was asked in https://forums.developer.nvidia.com/t/mps-set-default-active-thread-percentage-not-working-as-expected/194593?u=byte.xiaobin)