I’d like to run CUDA-aware MPI (OpenMPI). However, I’m running into a simple problem on my development laptop:
when I launch “mpirun -np <nprocs> ./myprogram” on my machine with a single GPU and call cudaMalloc, each process ends up with a pointer to the same memory location (I can provide an MWE).
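Roughly, the MWE looks like the sketch below (names such as myprogram, the file name mwe.c, and the buffer size are just placeholders): each rank selects device 0, calls cudaMalloc, and prints the device pointer it gets back.

```c
/* mwe.c -- minimal sketch (assumed structure): each rank allocates on the
 * single GPU and prints the pointer returned by cudaMalloc. */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* All ranks use device 0 -- there is only one GPU on this laptop. */
    cudaSetDevice(0);

    float *d_buf = NULL;
    cudaError_t err = cudaMalloc((void **)&d_buf, 1024 * sizeof(float));

    /* On my machine every rank prints the same address here. */
    printf("rank %d: cudaMalloc returned %p (%s)\n",
           rank, (void *)d_buf, cudaGetErrorString(err));

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}
```

I build it with something along the lines of `mpicc mwe.c -I/usr/local/cuda/include -L/usr/local/cuda/lib64 -lcudart -o myprogram` (paths depend on the CUDA install) and launch it with `mpirun -np <nprocs> ./myprogram`.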
My questions are:
- is the problem due to some issue with my MPI/CUDA configuration?
- is the problem due to the fact that you are not supposed to use a single GPU within an MPI program?
- is the problem due to the fact that my card is too old?
Extra details:
CUDA 7.5, Driver Version: 361.28, card: NVS 5400M (compute capability 2.1) on Linux,
under CUDA-aware OpenMPI (I also tried Intel MPI).
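Regarding the configuration question above: as far as I understand, the way to confirm that an OpenMPI build was compiled with CUDA support is the check from the OpenMPI FAQ (the exact output line below is from memory and may differ between versions):

```
$ ompi_info --parsable --all | grep mpi_built_with_cuda_support:value
mca:mpi:base:param:mpi_built_with_cuda_support:value:true
```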