Exploring GPUDirect on a Local Area Network

I want to understand the specific advantages provided by GPUDirect, and by GPUDirect RDMA in particular.

I will be using CUDA-aware MPI to distribute my jobs. I do not have a fast interconnect, so I will be doing this over the slow Ethernet LAN in our college lab. All our PCs are equipped with NVIDIA GPUs.
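As a first sanity check, I understand that the MPI library itself must be built with CUDA support. A minimal sketch of how to verify this, assuming the MPI implementation is Open MPI (other implementations such as MVAPICH2-GDR have their own checks):

```shell
# Query Open MPI's build configuration; the flag below is reported
# by Open MPI builds and is "true" when CUDA-aware support is compiled in.
ompi_info --parsable --all | grep mpi_built_with_cuda_support:value
```

If this prints a line ending in `true`, device pointers can be passed directly to MPI calls; otherwise data must be staged through host memory first.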

I want to know the specific system requirements I must meet to test some simple algorithms, such as vector addition. For example, I want to access memory on node 5 from node 3 using the DMA that GPUDirect provides.
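To make the vector-addition case concrete, here is a minimal sketch of what I have in mind, assuming a CUDA-aware MPI build; the kernel, sizes, and rank roles are illustrative, not from any particular tutorial:

```cuda
// Sketch: distributed vector addition where device pointers are passed
// straight to MPI calls (only valid with a CUDA-aware MPI build).
#include <mpi.h>
#include <cuda_runtime.h>

#define N_PER_RANK 1024  // illustrative chunk size per rank

__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, N_PER_RANK * sizeof(float));
    cudaMalloc(&d_b, N_PER_RANK * sizeof(float));
    cudaMalloc(&d_c, N_PER_RANK * sizeof(float));
    // ... fill d_a and d_b with this rank's input chunks (omitted) ...

    vecAdd<<<(N_PER_RANK + 255) / 256, 256>>>(d_a, d_b, d_c, N_PER_RANK);
    cudaDeviceSynchronize();

    // With CUDA-aware MPI the device pointer d_c goes directly into
    // MPI_Send/MPI_Recv -- no explicit cudaMemcpy to a host buffer.
    // Whether the transfer then uses GPUDirect RDMA (NIC reads GPU
    // memory directly) or is staged internally depends on the fabric.
    if (size > 1) {
        if (rank == 1)
            MPI_Send(d_c, N_PER_RANK, MPI_FLOAT, 0, 0, MPI_COMM_WORLD);
        else if (rank == 0)
            MPI_Recv(d_c, N_PER_RANK, MPI_FLOAT, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
    }

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    MPI_Finalize();
    return 0;
}
```

My understanding is that this code is portable either way: over plain Ethernet the MPI library would stage the data through the host internally, and only on RDMA-capable hardware could the copy be avoided.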

Additionally, I might want a third-party device (for example, a NIC or a storage adapter) to read or write GPU memory directly.

I am not sure, but I don't think GPUDirect RDMA will work without InfiniBand, or at least some RDMA-capable NIC (e.g., RoCE). Can someone confirm whether it can work at all over a plain Ethernet LAN?