Is there a way to perform cross-node GPU memory copying without using NCCL?

I want to implement cross-node GPU memory copying without going through NCCL’s P2P path, since I expect that to be more efficient. However, I haven’t found a suitable way to do this. Are there any approaches you would recommend?

Thank you for all the replies.

Hi 308166554,

Thank you for posting your inquiry to the NVIDIA Developer Forums.

You’ll want to look into GPUDirect as a starting point:
https://developer.nvidia.com/gpudirect

The APIs and libraries described there allow direct communication between your network adapter and GPU via GPUDirect RDMA, so cross-node transfers do not have to be staged through host memory.

More information is available at that link, or by contacting the mailing list (gpudirect@nvidia.com).
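
As a rough starting point, here is a minimal sketch of the usual first step when using GPUDirect RDMA directly: allocate a buffer on the GPU and register it with the network adapter through libibverbs, so the adapter can read and write GPU memory. It is an illustration, not a complete transfer. It assumes CUDA 11.3 or newer (for the `cudaDevAttrGPUDirectRDMASupported` attribute), an RDMA-capable adapter, and the `nvidia-peermem` kernel module loaded so that `ibv_reg_mr` accepts a device pointer. The names `kBufferBytes` and `remote_access_flags` are hypothetical, and using GPU 0 with the first adapter in the list is an arbitrary choice.

```cuda
// Sketch only: register GPU memory with an RDMA adapter (GPUDirect RDMA).
// Assumed build line: nvcc gdr_register.cu -o gdr_register -libverbs
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>
#include <infiniband/verbs.h>

// Hypothetical buffer size, chosen only for illustration (1 MiB).
static const size_t kBufferBytes = 1 << 20;

// Hypothetical helper: access rights that let a remote peer read and write the buffer.
static int remote_access_flags() {
    return IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_READ | IBV_ACCESS_REMOTE_WRITE;
}

int main() {
    // 1. Ask the CUDA runtime whether GPU 0 reports GPUDirect RDMA support.
    int rdma_ok = 0;
    if (cudaDeviceGetAttribute(&rdma_ok, cudaDevAttrGPUDirectRDMASupported, 0) != cudaSuccess ||
        !rdma_ok) {
        fprintf(stderr, "GPU 0 does not report GPUDirect RDMA support\n");
        return EXIT_FAILURE;
    }

    // 2. Allocate the buffer in GPU memory.
    void* gpu_buf = nullptr;
    if (cudaMalloc(&gpu_buf, kBufferBytes) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return EXIT_FAILURE;
    }

    // 3. Open the first RDMA device and create a protection domain.
    int num_devices = 0;
    ibv_device** dev_list = ibv_get_device_list(&num_devices);
    ibv_context* ctx = (dev_list && num_devices > 0) ? ibv_open_device(dev_list[0]) : nullptr;
    ibv_pd* pd = ctx ? ibv_alloc_pd(ctx) : nullptr;

    // 4. Register the GPU pointer as a memory region.
    //    Assumption: nvidia-peermem is loaded; otherwise this call is expected to fail.
    ibv_mr* mr = pd ? ibv_reg_mr(pd, gpu_buf, kBufferBytes, remote_access_flags()) : nullptr;
    const bool ok = (mr != nullptr);

    if (ok) {
        printf("Registered GPU buffer: lkey=%u rkey=%u\n",
               static_cast<unsigned>(mr->lkey), static_cast<unsigned>(mr->rkey));
    } else {
        fprintf(stderr, "Could not register GPU memory with the adapter\n");
    }

    // 5. Release everything in reverse order of creation.
    if (mr) ibv_dereg_mr(mr);
    if (pd) ibv_dealloc_pd(pd);
    if (ctx) ibv_close_device(ctx);
    if (dev_list) ibv_free_device_list(dev_list);
    cudaFree(gpu_buf);
    return ok ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

In a real transfer, each node would exchange the buffer address and `rkey` with its peer out of band, set up a queue pair, and then post RDMA read or write work requests against the registered region. Higher-level libraries such as UCX build on the same mechanism, so they may be worth a look before writing verbs code by hand.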

Best,
NVIDIA Enterprise Experience