cudaDeviceCanAccessPeer does not need to return 1. A result of 1 would imply the existence of a second GPU in the same machine/node, and GPUDirect RDMA certainly does not depend on having a minimum of 2 GPUs in each node. You only need 1. (Two devices in separate nodes can never be in a CUDA peer relationship, as the CUDA runtime would only have 1 device in view.)
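For illustration, here is a minimal sketch (mine, not from the original question) of what cudaDeviceCanAccessPeer actually reports: it only surveys pairs of GPUs visible to the local CUDA runtime, and says nothing about the network adapter or about GPUs in other nodes.

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int ndev = 0;
    cudaGetDeviceCount(&ndev);
    // Check every ordered pair of local GPUs for peer access.
    for (int i = 0; i < ndev; ++i) {
        for (int j = 0; j < ndev; ++j) {
            if (i == j) continue;
            int canAccess = 0;
            cudaDeviceCanAccessPeer(&canAccess, i, j);
            printf("GPU %d -> GPU %d peer access: %s\n", i, j, canAccess ? "yes" : "no");
        }
    }
    return 0;
}

With a single GPU per node the loop body never runs, which is consistent with the point above: getting 0 (or nothing at all) from cudaDeviceCanAccessPeer does not rule out GPUDirect RDMA.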
However, the GPU in question as well as the network adapter must be on the same PCIe fabric. If they are both enumerated from the same PCIe root complex, that is a sufficient condition to satisfy the “same PCIe fabric” requirement, but not a necessary one. It’s generally also sufficient if they are both connected (via PCIe) to the same CPU socket, although there are some nuances here with certain recent Intel CPUs. And if you’re using an AMD CPU, I wouldn’t have anything to say about that.
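If you want to correlate GPUs with the PCIe topology yourself, a small sketch like the following can print each GPU’s PCIe bus ID, which you can then compare against the network adapter’s address reported by lspci. This is just an illustrative helper, not something GPUDirect RDMA requires.

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int ndev = 0;
    cudaGetDeviceCount(&ndev);
    for (int i = 0; i < ndev; ++i) {
        // Bus ID is returned in the form domain:bus:device.function
        char busId[32] = {0};
        cudaDeviceGetPCIBusId(busId, 32, i);
        printf("GPU %d PCIe bus ID: %s\n", i, busId);
    }
    return 0;
}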
However, if you had a fabric issue, I don’t think you would be getting to the point you are at, where it halfway works (CPU memory in node 1 to GPU memory in node 2), unless you have the fabric issue on one machine but not the other.
nvidia-smi topo -m
will spell out the relationships, including GPU to network adapter.
The output could look something like this:
$ nvidia-smi topo -m
        GPU0   GPU1   GPU2   GPU3   mlx4_0  CPU Affinity
GPU0     X     PIX    SYS    SYS    PHB     0-5,12-17
GPU1    PIX     X     SYS    SYS    PHB     0-5,12-17
GPU2    SYS    SYS     X     PHB    SYS     6-11,18-23
GPU3    SYS    SYS    PHB     X     SYS     6-11,18-23
mlx4_0  PHB    PHB    SYS    SYS     X
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe switches (without traversing the PCIe Host Bridge)
PIX = Connection traversing a single PCIe switch
NV# = Connection traversing a bonded set of # NVLinks
A SYS connection indicates that P2P and GPUDirect RDMA transactions cannot follow that path.
A PHB (or PIX) connection indicates that the devices in question are on “the same PCIe fabric”, so P2P or GPUDirect RDMA is possible.
So we see that GPUs 0 and 1 in the above output can communicate (for purposes of GPUDirect RDMA) with the mlx4_0 network adapter. This also assumes the GPUs in question are Quadro or Tesla GPUs.
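If you wanted to exercise the GPU0/GPU1 (PIX) relationship directly, a minimal P2P sketch, assuming the topology shown above, might look like this. GPUDirect RDMA to the adapter itself is set up by the network stack (e.g. a CUDA-aware MPI), not by these runtime calls; this only illustrates the GPU-to-GPU P2P side of the “same fabric” relationship.

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int canAccess = 0;
    cudaDeviceCanAccessPeer(&canAccess, 0, 1);  // can GPU 0 access GPU 1?
    if (canAccess) {
        cudaSetDevice(0);
        cudaDeviceEnablePeerAccess(1, 0);  // let device 0 map device 1's memory
        cudaSetDevice(1);
        cudaDeviceEnablePeerAccess(0, 0);  // and vice versa
        printf("P2P enabled between GPU 0 and GPU 1\n");
    } else {
        printf("P2P not available between GPU 0 and GPU 1\n");
    }
    return 0;
}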