CUDA-aware MPI

I’d like to run CUDA-aware MPI (OpenMPI). However, I have a simple problem when running on my development laptop:

launching “mpirun -np <N> myprogram” on my machine with a single GPU and calling cudaMalloc, each process ends up with a pointer to the same memory location (I can provide an MWE).
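For reference, a minimal sketch of what the MWE looks like (the file name mwe.cu, buffer size, and build command are just placeholders, not my exact code):

```c
// mwe.cu — each MPI rank allocates device memory once and prints the pointer.
#include <mpi.h>
#include <stdio.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    float *d_buf = NULL;
    cudaError_t err = cudaMalloc((void **)&d_buf, 1024 * sizeof(float));

    // On my machine every rank prints the same device pointer value.
    printf("rank %d: cudaMalloc -> %p (%s)\n",
           rank, (void *)d_buf, cudaGetErrorString(err));

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}
```

Built with something like `nvcc -ccbin mpicxx mwe.cu -o mwe` and run with `mpirun -np 2 ./mwe`.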

My questions are:

  • is the problem due to some issue with my MPI/CUDA configuration?
  • is the problem due to the fact that you are not supposed to use a single GPU within an MPI program?
  • is the problem due to the fact that my card is too old?

Extra details:
CUDA 7.5, driver version 361.28, card: NVS 5400M (compute capability 2.1) on Linux,
under CUDA-aware OpenMPI (also tried Intel MPI).

Maybe each host process gets its own CUDA context with a separate (virtual) address space? This might explain why the pointers are identical.

I have to test this, but I had the impression the memory was being overwritten (i.e. that it was also the same physical space).
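Something like this is what I have in mind for the test (untested sketch; the buffer size and names are placeholders): each rank fills "its" device buffer with its own rank id, waits at a barrier so the other ranks have written too, then reads the buffer back. If the allocations really shared the same physical memory, the values read back should reflect another rank's write.

```c
// check_overwrite.cu — test whether allocations with identical pointer values
// in different MPI ranks actually clobber each other.
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1024;
    int *d_buf = NULL;
    cudaMalloc((void **)&d_buf, n * sizeof(int));

    // Fill device memory with this rank's id.
    int *h_buf = (int *)malloc(n * sizeof(int));
    for (int i = 0; i < n; ++i) h_buf[i] = rank;
    cudaMemcpy(d_buf, h_buf, n * sizeof(int), cudaMemcpyHostToDevice);

    // Make sure every rank has finished writing before anyone reads back.
    MPI_Barrier(MPI_COMM_WORLD);

    // Read back and look for values written by other ranks.
    int *h_out = (int *)calloc(n, sizeof(int));
    cudaMemcpy(h_out, d_buf, n * sizeof(int), cudaMemcpyDeviceToHost);

    int clobbered = 0;
    for (int i = 0; i < n; ++i)
        if (h_out[i] != rank) clobbered = 1;

    printf("rank %d: device pointer %p, data %s\n",
           rank, (void *)d_buf, clobbered ? "OVERWRITTEN" : "intact");

    cudaFree(d_buf);
    free(h_buf);
    free(h_out);
    MPI_Finalize();
    return 0;
}
```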

Each CPU process gets its own virtual address space, so identical pointer values in different processes do not imply that they refer to the same memory. This is a general principle of virtual memory and is not specific to CUDA.