Why is MPS not the default?

I was experimenting with MPS, and it did improve performance when a single task doesn't saturate the GPU.
I'm wondering whether there is a performance reason MPS is not enabled by default when running multiple processes.
Or is it just convention to give all of the GPU's resources to one process so that it finishes as soon as possible?

Also, what is the difference between using MPS to share the hardware across processes and using multiple CUDA streams within one process, in terms of scheduling and workload balancing?


There are some limitations on CUDA behavior when MPS is used. These are covered in the MPS manual.