I run inference on a P100 machine with 16000 MB of GPU memory. With MPS enabled and 8 processes, everything works. But with 9 inference processes, with 6000 MB of GPU memory in use, CUDA reports an out-of-memory error.
Many thanks!
I am facing the same problem. On a T4, with only 10% of the memory in use, I start getting out-of-memory errors once I have 4 MPS processes. Things work fine with 3 or fewer.
The documentation says this is a known issue. Following the suggestions there, I call cudaSetDevice(0) as the first line of my main function, and I also compile my executable with -fPIE and -fPIC. However, I am still getting the same problem.
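For reference, the workaround I applied looks roughly like the following. This is a minimal sketch, not my actual program; the point is only that the device is selected explicitly before any other CUDA call:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Suggested workaround: select the device explicitly as the very
    // first statement in main(), before any other CUDA runtime call.
    cudaError_t err = cudaSetDevice(0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaSetDevice failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    // ... rest of the inference code runs here ...

    return 0;
}
```

I build it with nvcc, passing the position-independent-code flags through to the host compiler with -Xcompiler (something like `nvcc -Xcompiler -fPIE -Xcompiler -fPIC app.cu -o app`).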
Any suggestions on how I can fix this problem? Thank you.
Hi there @nkwkelvin,
Since the original post is more than a year old, would you mind creating a new post in the CUDA forums? That is a better place to ask for advice on how to implement certain workarounds in CUDA.
Thanks!