How do I use MPI-based GPU P2P on 1 node with 2 MPI processes?

Hi. I am on Linux with MVAPICH2 2.3 compiled with --enable-cuda. The machine has 2 GPUs (supporting CUDA 8+) that support P2P and RDMA.

I write in C++ and use MV2_USE_CUDA=1. I run 2 MPI processes on 1 node, and each process uses its own GPU (rank 0 uses GPU_0, rank 1 uses GPU_1). How can I make sure that the MPI exchanges go through P2P?
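The rank-to-GPU mapping described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names `local_rank_from_env` and `device_for_local_rank` are hypothetical, and it assumes the launcher exports the node-local rank in the `MV2_COMM_WORLD_LOCAL_RANK` environment variable. The returned index would then be passed to `cudaSetDevice` before any device buffers are allocated.

```cpp
#include <cassert>
#include <climits>
#include <cstdlib>

// Hypothetical helper: read the node-local rank from the environment.
// Assumption: the launcher exports MV2_COMM_WORLD_LOCAL_RANK for each process.
// Returns `fallback` if the variable is unset, empty, or not a valid
// non-negative integer.
int local_rank_from_env(const char* var = "MV2_COMM_WORLD_LOCAL_RANK",
                        int fallback = 0) {
    const char* value = std::getenv(var);
    if (value == nullptr || *value == '\0') return fallback;
    char* end = nullptr;
    long parsed = std::strtol(value, &end, 10);
    if (*end != '\0' || parsed < 0 || parsed > INT_MAX) return fallback;
    return static_cast<int>(parsed);
}

// Hypothetical helper: map a node-local rank to a GPU index.
// With 2 processes and 2 GPUs per node this gives rank 0 -> GPU 0 and
// rank 1 -> GPU 1. Returns -1 if no device is available.
int device_for_local_rank(int local_rank, int device_count) {
    assert(local_rank >= 0);
    if (device_count <= 0) return -1;
    return local_rank % device_count;
}
```

This only fixes which GPU each process owns. Whether a transfer between the two GPUs actually goes over P2P is still decided by the MPI library and the hardware topology.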

I would like something like this:

// code code

// MPI exchange that should use P2P
MPI_Status status;
// Communicator assumed to be MPI_COMM_WORLD
MPI_Sendrecv(d_data_out, count, MPI_DOUBLE, rankTo, 777, data_in_out, count, MPI_DOUBLE, rankTo, 777, MPI_COMM_WORLD, &status);
// ....................

// code code

Next step: I run N nodes with 2 MPI processes per node, and each MPI process uses GPU_(#LocalRank). How can I make sure that the MPI exchanges go through P2P inside each node and through RDMA between nodes? I use MPI_Alltoallv, in which some of the exchanges should be P2P (between GPUs in the same node) and the rest should be RDMA (between GPUs on different nodes).
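To make the intended split concrete, here is a minimal sketch that labels each MPI_Alltoallv peer by the path it would be expected to take. The names `Path`, `node_of`, and `classify_peers` are hypothetical, and the sketch assumes block rank placement (ranks 0 and 1 on node 0, ranks 2 and 3 on node 1, and so on). In a real program, node membership could instead be found with `MPI_Comm_split_type` and `MPI_COMM_TYPE_SHARED`.

```cpp
#include <cassert>
#include <vector>

// Expected transport for a peer: the process itself, a GPU in the same node
// (P2P candidate), or a GPU on another node (RDMA candidate).
enum class Path { Self, IntraNode, InterNode };

// Assumption: block placement, so ranks
// [k * ranks_per_node, (k + 1) * ranks_per_node) all live on node k.
int node_of(int rank, int ranks_per_node) {
    assert(rank >= 0 && ranks_per_node > 0);
    return rank / ranks_per_node;
}

// For every rank in the communicator, report which path an exchange
// with `my_rank` would be expected to use.
std::vector<Path> classify_peers(int my_rank, int world_size, int ranks_per_node) {
    std::vector<Path> paths(world_size);
    const int my_node = node_of(my_rank, ranks_per_node);
    for (int peer = 0; peer < world_size; ++peer) {
        if (peer == my_rank) {
            paths[peer] = Path::Self;
        } else if (node_of(peer, ranks_per_node) == my_node) {
            paths[peer] = Path::IntraNode;
        } else {
            paths[peer] = Path::InterNode;
        }
    }
    return paths;
}
```

For example, with 2 nodes and 2 processes per node, rank 1 would see rank 0 as `IntraNode` and ranks 2 and 3 as `InterNode`. This only describes the expected split. The transport actually used for each pair inside MPI_Alltoallv is chosen by the MPI library.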