Communication problem

My system has two nodes, CPU1 and CPU2, and each
node has two GPU cards.

CPU1 has GPU_A1 and GPU_B1
CPU2 has GPU_A2 and GPU_B2

I know that GPU_A1 and GPU_B1 can access each other directly
via the cudaMemcpyPeer function. Is it also possible for
GPU_A1 to access GPU_A2 or GPU_B2 directly?

I use PGI Community Edition 16.10 with its built-in OpenMPI 1.10.
I am following the slides below (page 13), which use MPI_Send and
MPI_Recv to send an array from one GPU to another GPU.
But it does not work. Can anyone tell me how to do it?

Thank you very much.

Hi Neo,

Are you using OpenACC? Jiri (one of the authors of the slide deck you mentioned) has posted examples of his code online, including this one from the book “Parallel Programming with OpenACC”.

Hopefully it gives you some clues to the problem.