Can anybody explain how set_default_active_thread_percentage works when configuring the MPS server daemon?
For example, when I use set_default_device_pinned_mem_limit, it just limits memory, nothing else. I assumed set_default_active_thread_percentage behaved the same way, but for compute resources only. Yet when I use this limit, it affects memory too. Without any limits my application uses 229 MiB of memory, but with set_default_active_thread_percentage 50 it uses 190 MiB (and 30 MiB if I use 3%), while set_default_device_pinned_mem_limit was left untouched the whole time.
Is there any way to limit compute and memory with two separate settings that don't affect each other?
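For context, here is a hedged sketch of how I am setting the two limits through nvidia-cuda-mps-control (the values are placeholders; this assumes the control daemon is already running and the limits are applied before the server serves clients):

```shell
# Sketch, assuming a running MPS control daemon; values are placeholders.
# Compute limit: default fraction of SM threads available to each client.
echo "set_default_active_thread_percentage 50" | nvidia-cuda-mps-control

# Memory limit: default pinned device memory limit for device 0.
echo "set_default_device_pinned_mem_limit 0 1G" | nvidia-cuda-mps-control
```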
I think it's possible that the thread-percentage limit could affect an application's memory use in an indirect way.
Of course, if your application does a cudaMalloc for 229 MiB of memory, that should be unaffected. But if your application has memory usage that scales with the number of active threads, then that usage will shrink when the thread limit shrinks. Two examples of this that I can think of are local memory usage per thread, as well as dynamic in-kernel memory allocation.
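As a hypothetical illustration (this is a sketch I made up, not your code): both effects below scale with the number of threads the GPU can keep resident, which is exactly what the active-thread-percentage limit reduces.

```cuda
#include <cuda_runtime.h>

// Each thread carries a per-thread footprint in two ways:
//  1. the `local` array lives in local memory, which the driver backs with
//     device memory sized for the maximum number of resident threads;
//  2. the in-kernel malloc draws from the device heap, one allocation per
//     thread that actually runs.
// Capping the active thread percentage reduces both, even though no
// explicit memory limit was set.
__global__ void perThreadFootprint(char **slots, size_t bytesPerThread)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;

    char local[256];                 // local memory, per resident thread
    local[tid % 256] = (char)tid;

    slots[tid] = (char *)malloc(bytesPerThread);  // device-heap allocation
    if (slots[tid] != nullptr)
        slots[tid][0] = local[tid % 256];
}
```

If your kernels look anything like this, a lower thread percentage directly translates into a smaller device-memory footprint.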
It may also be the case that the application does something similar using host allocation methods (e.g. querying the active threads, then sizing an allocation accordingly), although that seems less likely to me.
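That host-side pattern might look something like the following sketch (again an assumption on my part, not something from your post): the allocation is sized from how many threads the kernel can keep resident, so anything that reduces available SMs or occupancy shrinks it.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel whose occupancy we query.
__global__ void work(float *buf)
{
    buf[blockIdx.x * blockDim.x + threadIdx.x] *= 2.0f;
}

int main()
{
    int device = 0, smCount = 0, blocksPerSM = 0;
    const int threadsPerBlock = 256;

    cudaGetDevice(&device);
    cudaDeviceGetAttribute(&smCount, cudaDevAttrMultiProcessorCount, device);
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocksPerSM, work,
                                                  threadsPerBlock, 0);

    // One element per thread the kernel can keep resident: fewer available
    // SMs (or lower occupancy) means a smaller allocation.
    size_t n = (size_t)smCount * blocksPerSM * threadsPerBlock;
    float *buf = nullptr;
    cudaMalloc(&buf, n * sizeof(float));
    printf("allocated %zu elements\n", n);
    cudaFree(buf);
    return 0;
}
```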
If that is the effect you are observing, there is no way to separate the memory usage from the number of active threads permitted.