MPI + CUDA Problem

Hi Everyone,

I was wondering if someone could help me with my problem.
I am writing a solver for a CFD application with CUDA and MPI, using the cuBLAS and cuSPARSE libraries, and I am testing on a single machine with 4 cores and one GTX 590 (2 GPUs). If I run the program with one core I have no problems, but when I try to run it with 4 cores the program behaves randomly: sometimes it gives correct results and sometimes it returns only garbage. I traced the problem, but the program does not always fail at the same step or iteration of the solver. It looks like a shared-memory problem, but I am only accessing the GPUs through cuBLAS and cuSPARSE.

Does anyone know if there is some limitation in CUDA when you use it with MPI? I mean, considering that the GTX 590 has only 2 GPUs, maybe I cannot access it with more than 2 processes?

I hope you can help me to solve this problem.