What is the maximum number of MPI processes that can be used with the Multi-Process Service (MPS)?

Hi everyone,

I’m using MPS with MPI processes (my GPU is a K20M). I can increase the number of processes to 10, but beyond that cudaMalloc fails with an error (memory cannot be allocated).

In my program, each process simply calls cudaMalloc for 4 bytes. I increased CUDA_DEVICE_MAX_CONNECTIONS to 16 and 32, but it does not help. I expected to be able to raise CUDA_DEVICE_MAX_CONNECTIONS to 32, but when the MPS server is in use I cannot go beyond 10 processes.
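
For reference, a test of this kind might look roughly like the sketch below. It is not the exact program, only a minimal reconstruction under the assumption that every rank uses the default device and allocates 4 bytes once. The barrier is an addition so that all ranks hold their allocation at the same time, and printing cudaGetErrorString shows the exact error each failing rank receives.

```cuda
// Minimal sketch (assumed reconstruction, not the original program):
// every MPI rank allocates 4 bytes on the default GPU and reports the result.
#include <cstdio>
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // The first CUDA runtime call creates the context; under MPS this is
    // where the process connects to the MPS server as a client.
    void *d_ptr = NULL;
    cudaError_t err = cudaMalloc(&d_ptr, 4);

    if (err != cudaSuccess) {
        fprintf(stderr, "rank %d of %d: cudaMalloc failed: %s\n",
                rank, size, cudaGetErrorString(err));
    } else {
        printf("rank %d of %d: cudaMalloc succeeded\n", rank, size);
    }

    // Keep every rank's allocation alive until all ranks have tried,
    // so the number of simultaneous clients equals the number of ranks.
    MPI_Barrier(MPI_COMM_WORLD);

    if (d_ptr != NULL) {
        cudaFree(d_ptr);
    }

    MPI_Finalize();
    return 0;
}
```

Running this with an increasing number of ranks shows at which count the first rank fails and with which error string.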

Do you know what the problem is? I am using Fedora 20 and CUDA 6.5. Do you think upgrading to CUDA 7.0 might resolve this? Any suggestions are appreciated.


MPS underwent some updates for CUDA 7. I certainly think CUDA 7 is worth a try.

According to the MPS documentation for the pre-7.0 version, the maximum connection limit (number of clients) should be 16:


refer to section