I'm running MPI processes under the MPS service (my GPU is a K20m). I can increase the number of processes to 10, but beyond that cudaMalloc fails with an error (memory cannot be allocated).
In my program, each process simply calls cudaMalloc for 4 bytes. I increased CUDA_DEVICE_MAX_CONNECTIONS to 16 and then to 32, but it does not help. I expected to be able to raise CUDA_DEVICE_MAX_CONNECTIONS to 32, but when the MPS server is in use I cannot go beyond 10 processes.
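For reference, here is a minimal sketch of the kind of test described: one 4-byte cudaMalloc per MPI rank. It is an illustrative reconstruction, not the exact program; the variable names and printed messages are assumptions made for the example.

```cuda
// Sketch of a reproducer: every MPI rank allocates 4 bytes on the GPU.
// Names and messages are illustrative, not taken from the original program.
#include <cstdio>
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // The first CUDA runtime call also creates the context,
    // so a context-creation failure would surface here as well.
    int *d_buf = NULL;
    cudaError_t err = cudaMalloc((void **)&d_buf, sizeof(int));

    if (err != cudaSuccess) {
        fprintf(stderr, "rank %d of %d: cudaMalloc failed: %s\n",
                rank, size, cudaGetErrorString(err));
    } else {
        printf("rank %d of %d: cudaMalloc of 4 bytes succeeded\n", rank, size);
        cudaFree(d_buf);
    }

    MPI_Finalize();
    return (err == cudaSuccess) ? 0 : 1;
}
```

Printing the string from cudaGetErrorString per rank shows exactly which error code the failing processes receive, which may help narrow down whether the limit comes from memory or from the number of MPS clients.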
Do you know what the problem is? I am using Fedora 20 and CUDA 6.5. Do you think upgrading to CUDA 7.0 might resolve this? Any suggestions are appreciated.