I want to understand the advantages that GPUDirect provides, together with RDMA.
I will be using CUDA-aware MPI to offload my jobs. I do not have a fast interconnect, so I will be doing this over the slow LAN in our college lab. All our PCs are equipped with NVIDIA GPUs.
I want to know the specific system requirements I must meet to test some simple algorithms, such as vector addition. For example, I want to access memory on node 5 from node 3 using the DMA that GPUDirect provides.
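To make the goal concrete, here is roughly what I have in mind: each rank runs vector addition on its own GPU, and the result buffer is passed to MPI_Send/MPI_Recv as a device pointer, relying on a CUDA-aware MPI build (assumed: e.g. Open MPI configured with CUDA support) to move it without manual staging through host memory. This is just a sketch of the intended usage, not a working setup:

```cuda
// Sketch: vector addition on each rank's GPU, exchanging device
// pointers directly via a CUDA-aware MPI (assumed build, e.g.
// Open MPI configured with CUDA support).
#include <mpi.h>
#include <cuda_runtime.h>

__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int N = 1 << 20;
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, N * sizeof(float));
    cudaMalloc(&d_b, N * sizeof(float));
    cudaMalloc(&d_c, N * sizeof(float));
    // ... fill d_a and d_b on the device (omitted) ...

    vecAdd<<<(N + 255) / 256, 256>>>(d_a, d_b, d_c, N);
    cudaDeviceSynchronize();

    if (rank == 0) {
        // Device pointer passed straight to MPI: with a CUDA-aware
        // MPI, no explicit cudaMemcpy staging to the host is needed.
        MPI_Send(d_c, N, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(d_c, N, MPI_FLOAT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    MPI_Finalize();
    return 0;
}
```

My question is what hardware and software the cluster needs (NICs, drivers, MPI build flags) for this pattern to actually use GPUDirect RDMA rather than falling back to staging through host memory.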
Additionally, I might want to fetch some memory from a third-party device.