I am using MPS in multi-threaded mode, with each thread owning its own CUDA context. I would like to know why its performance is worse than a non-MPS service that uses one CUDA context and multiple streams.
MPS is designed for multi-process situations, where work from several processes needs to share one GPU.
There is no suggestion that MPS is a better or equivalent approach to doing the same work in a single process with a single context and multiple streams.
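For comparison, here is a minimal sketch of the single-process pattern described above. It assumes the CUDA runtime API on device 0. Several host threads share the device's primary context, which the runtime uses by default, and each thread issues its copies and kernel on its own non-blocking stream. The kernel `scale` and the function `run_multistream` are hypothetical names chosen for illustration. The sketch only shows the structure and makes no claim about the speedup on any particular workload.

```cuda
#include <cuda_runtime.h>
#include <thread>
#include <vector>

// Hypothetical example kernel: multiply every element by a factor.
__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: one host thread per worker, all sharing the primary
// context of device 0, each worker using its own stream.
// Returns true if every worker got the expected results back.
bool run_multistream(int num_workers, int n) {
    std::vector<std::thread> workers;
    std::vector<int> ok(num_workers, 0);  // one slot per worker, no sharing

    for (int w = 0; w < num_workers; ++w) {
        workers.emplace_back([w, n, &ok] {
            // Same device in every thread -> same primary context.
            if (cudaSetDevice(0) != cudaSuccess) return;

            cudaStream_t stream;
            if (cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking) != cudaSuccess) return;

            float* h = nullptr;
            float* d = nullptr;
            size_t bytes = static_cast<size_t>(n) * sizeof(float);
            // Pinned host memory so the async copies can overlap with other streams.
            if (cudaMallocHost(&h, bytes) != cudaSuccess) {
                cudaStreamDestroy(stream);
                return;
            }
            if (cudaMalloc(&d, bytes) != cudaSuccess) {
                cudaFreeHost(h);
                cudaStreamDestroy(stream);
                return;
            }

            for (int i = 0; i < n; ++i) h[i] = 1.0f;
            float factor = static_cast<float>(w + 2);

            // All work for this worker is queued on its private stream.
            cudaMemcpyAsync(d, h, bytes, cudaMemcpyHostToDevice, stream);
            scale<<<(n + 255) / 256, 256, 0, stream>>>(d, n, factor);
            cudaMemcpyAsync(h, d, bytes, cudaMemcpyDeviceToHost, stream);
            bool good = (cudaStreamSynchronize(stream) == cudaSuccess);

            for (int i = 0; good && i < n; ++i) good = (h[i] == factor);
            ok[w] = good ? 1 : 0;

            cudaFree(d);
            cudaFreeHost(h);
            cudaStreamDestroy(stream);
        });
    }

    for (auto& t : workers) t.join();
    for (int v : ok) {
        if (!v) return false;
    }
    return true;
}
```

Because every thread sets the same device, the runtime gives them all the same primary context, so kernels on different streams can run concurrently without any context switching between threads.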
What does MPS really do? I thought it would also reduce mutex contention among multiple CUDA contexts within one process, not only across multiple processes. In our tests, it helped a lot when running TensorFlow with multiple CUDA contexts and CUDA streams. Am I right?