cudaMalloc fails when more than two Linux processes access GPU 0

I have a machine with two Tesla C1060 cards.
I would like to share one GPU between more than one Linux process.
I have four processes with one thread per process. In each thread, I call cudaSetDevice(0).

I launch the four processes at the same time. Each process creates one thread, and that thread first calls cudaSetDevice(0), followed by three cudaMalloc() calls.
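A minimal reproduction of this setup might look like the sketch below. The 64 MB allocation size, the 2-second hold, the use of fork() to start the processes, and the command-line arguments (argv[1] = process count, default 4; argv[2] = device index, default 0) are assumptions for illustration, not details from the post. Something like `nvcc repro.cu -o repro -lpthread`, then `./repro 4 0` versus `./repro 4 1`, should show whether the difference between the two GPUs reproduces.

```cuda
// Reproduction sketch: N processes, one pthread each, all allocating on one GPU.
// Assumptions (not from the post): 64 MB per buffer, a 2 s hold so the
// processes' contexts overlap, argv[1] = process count, argv[2] = device.
#include <cuda_runtime.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define NUM_ALLOCS 3
#define ALLOC_BYTES ((size_t)64 << 20) /* hypothetical: 64 MB per buffer */

static int g_device = 0; /* set in main before fork(), inherited by children */

/* Parse a positive integer; return fallback for NULL, junk, or values <= 0. */
static int parse_count(const char *s, int fallback) {
    if (s == NULL) return fallback;
    char *end;
    long v = strtol(s, &end, 10);
    if (end == s || *end != '\0' || v <= 0) return fallback;
    return (int)v;
}

/* Thread body: select the device first, then make three cudaMalloc() calls. */
static void *worker(void *arg) {
    int *failures = (int *)arg;
    int pid = (int)getpid();
    cudaError_t err = cudaSetDevice(g_device);
    if (err != cudaSuccess) {
        fprintf(stderr, "[pid %d] cudaSetDevice(%d): %s\n", pid, g_device, cudaGetErrorString(err));
        ++*failures;
        return NULL;
    }
    void *bufs[NUM_ALLOCS] = {NULL, NULL, NULL};
    for (int i = 0; i < NUM_ALLOCS; ++i) {
        err = cudaMalloc(&bufs[i], ALLOC_BYTES);
        printf("[pid %d] cudaMalloc #%d on GPU %d: %s\n", pid, i, g_device, cudaGetErrorString(err));
        if (err != cudaSuccess) { bufs[i] = NULL; ++*failures; }
    }
    sleep(2); /* hold the buffers so all processes are active at once */
    for (int i = 0; i < NUM_ALLOCS; ++i)
        if (bufs[i] != NULL) cudaFree(bufs[i]);
    return NULL;
}

int main(int argc, char **argv) {
    int nproc = parse_count(argc > 1 ? argv[1] : NULL, 4);
    g_device = argc > 2 ? atoi(argv[2]) : 0;

    /* No CUDA call happens in the parent, so each child creates its own context. */
    for (int p = 0; p < nproc; ++p) {
        pid_t pid = fork();
        if (pid < 0) { perror("fork"); return 1; }
        if (pid == 0) {
            int failures = 0;
            pthread_t t;
            if (pthread_create(&t, NULL, worker, &failures) != 0) exit(2);
            pthread_join(t, NULL);
            exit(failures == 0 ? 0 : 1);
        }
    }

    int status, bad = 0;
    while (wait(&status) > 0)
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) ++bad;
    printf("%d of %d processes reported errors\n", bad, nproc);
    return bad == 0 ? 0 : 1;
}
```

Because the parent never touches the CUDA runtime before fork(), each child gets its own context, which matches the "separate processes, one thread each" setup rather than several threads sharing one process.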

Every cudaMalloc() call fails with “unknown error”.
If I reduce the number of processes to three, I get the same error. But if I launch only two processes, all cudaMalloc() calls succeed without any “unknown error”.

Stranger still, if I use the second GPU (cudaSetDevice(1)), I can run all four processes without any problem: all four call cudaMalloc() on GPU 1 and everything works.
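Since GPU 1 works and GPU 0 does not, comparing what the runtime reports for each card is a natural first diagnostic. The sketch below prints each device's name, compute mode (`cudaDeviceProp::computeMode`, a setting that can be changed with nvidia-smi), total memory, and `kernelExecTimeoutEnabled`. Which of these fields, if any, explains the difference is an assumption to check on the actual machine, not something the post establishes.

```cuda
// Device comparison sketch: print properties that might differ between GPU 0
// and GPU 1. The choice of fields is an assumption, not from the post.
#include <cuda_runtime.h>
#include <stdio.h>

/* Map the computeMode field to a readable name. */
static const char *compute_mode_name(int mode) {
    switch (mode) {
    case cudaComputeModeDefault:    return "Default (multiple contexts allowed)";
    case cudaComputeModeExclusive:  return "Exclusive (one context at a time)";
    case cudaComputeModeProhibited: return "Prohibited (no contexts)";
    default:                        return "Other";
    }
}

int main(void) {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        struct cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            fprintf(stderr, "GPU %d: %s\n", dev, cudaGetErrorString(err));
            continue;
        }
        printf("GPU %d: %s\n", dev, prop.name);
        printf("  compute mode:        %s\n", compute_mode_name(prop.computeMode));
        printf("  total global memory: %zu MB\n", prop.totalGlobalMem >> 20);
        /* Nonzero usually means a display watchdog applies (relevant to the X question). */
        printf("  kernel exec timeout: %s\n", prop.kernelExecTimeoutEnabled ? "yes" : "no");
    }
    return 0;
}
```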

Bug or feature?


Is X running while you’re doing this?

No, X is not running on either GPU.