GPU memory is sufficient when using MPS, but CUDA reports out of memory

I run inference on a P100 machine with 16,000 MB of GPU memory. With MPS enabled and 8 processes, everything works fine. But with 9 inference processes, even though only about 6,000 MB of GPU memory is in use, CUDA reports an out-of-memory error.
Many thanks!

I am facing the same problem. On a T4, with only about 10% of GPU memory in use, I start getting out-of-memory errors once I have 4 MPS client processes. Everything works with 3 or fewer.

The documentation says this is a known issue. Following the suggestions there, I call cudaSetDevice(0) on the first line of my main function and compile my executable with -fPIE and -fPIC. However, I am still getting the same problem.

Any suggestions on how I can fix this problem? Thank you.
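For reference, here is a minimal sketch of the workaround described above. This is only an illustration of the suggested sequence (select the device first, then force context creation), not a guaranteed fix; whether it helps depends on the driver and MPS version, and the rest of the inference setup is assumed.

```cpp
// main.cu — establish the CUDA context explicitly before any other
// CUDA work. Build with position-independent code, e.g.:
//   nvcc -Xcompiler -fPIE -Xcompiler -fPIC -o infer main.cu
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Bind this process to device 0 as the very first CUDA call,
    // so the context is created up front under the MPS server.
    cudaError_t err = cudaSetDevice(0);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaSetDevice failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }

    // cudaFree(0) is a common idiom to force context creation
    // immediately, rather than lazily on the first kernel launch.
    err = cudaFree(0);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "context creation failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }

    // ... rest of the inference setup ...
    return 0;
}
```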

Hi there @nkwkelvin,

since the original post is more than a year old, would you mind creating a new post in the CUDA forums? That is a better place to ask for advice on how to implement these workarounds in CUDA.

Thanks!