Using UVA for identical code running on multiple GPUs


I hope you are doing well. I have a code that runs on multiple GPUs connected over a network (each GPU is attached to a CPU host). They communicate with each other using MPI (via the CPUs). I wish to port this code to a machine that has 8 GPUs and use UVA instead of MPI to communicate data between the GPUs. I want to see what advantage this change gives and how much communication time I can (hopefully) save.

However, after going through the simpleP2P example in the CUDA 5 SDK, I have a sinking feeling that in order to take advantage of UVA P2P-based access between GPUs on a single machine, one has to use different variable names on different GPUs.
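For context, this is roughly the setup phase simpleP2P performs before any peer copies or direct accesses. The device indices (0 and 1) and the capability check are just illustrative assumptions, not taken from my actual code:

```cuda
// Sketch of peer-access setup, modeled on the simpleP2P sample.
// Assumes two P2P-capable GPUs at indices 0 and 1 (hypothetical).
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int can01 = 0, can10 = 0;
    cudaDeviceCanAccessPeer(&can01, 0, 1);
    cudaDeviceCanAccessPeer(&can10, 1, 0);
    if (!can01 || !can10) {
        printf("P2P not supported between devices 0 and 1\n");
        return 1;
    }

    // Enable access in both directions; after this, a kernel running
    // on GPU 0 can dereference a pointer allocated on GPU 1 directly.
    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);   // second arg: flags, must be 0
    cudaSetDevice(1);
    cudaDeviceEnablePeerAccess(0, 0);
    return 0;
}
```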

If not, then how will one GPU understand which data belongs to it and which belongs to some other GPU when both variable names are the same (since I am running the same code on all GPUs, all variables will have the same names)?
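To make my worry concrete, here is a minimal sketch of the situation I mean. My understanding (which may be wrong, hence the question) is that under UVA the runtime identifies data by pointer value, not by host variable name, so reusing the same name should be fine; the buffer size and device count here are made up for illustration:

```cuda
// Minimal sketch: the same host variable name "buf" is reused for
// allocations on two GPUs. Under UVA, every allocation gets a unique
// virtual address across all devices in the process, so the runtime can
// tell which GPU owns which pointer regardless of the variable's name.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 1 << 20;       // 1 MiB, arbitrary for the sketch
    float *buf[2];                      // same name, two device pointers

    for (int dev = 0; dev < 2; ++dev) {
        cudaSetDevice(dev);
        cudaMalloc(&buf[dev], bytes);   // this allocation lives on GPU `dev`
    }

    // Copy GPU 0 -> GPU 1: identified purely by pointer + device index.
    cudaMemcpyPeer(buf[1], 1, buf[0], 0, bytes);

    // cudaPointerGetAttributes recovers the owning device from the
    // pointer alone -- this is what UVA provides.
    cudaPointerAttributes attr;
    cudaPointerGetAttributes(&attr, buf[1]);
    printf("buf[1] resides on device %d\n", attr.device);

    for (int dev = 0; dev < 2; ++dev) {
        cudaSetDevice(dev);
        cudaFree(buf[dev]);
    }
    return 0;
}
```

If that reading is correct, then the identical-variable-name issue disappears; but I would appreciate confirmation.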

Does anyone have an idea for getting out of this conundrum, or should I just wait for my cluster administrator to install the backend for GPUDirect RDMA?

Kindly suggest/advise.