My situation is that I am working on multi-node, multi-GPU deep learning, using NCCL2 to all-reduce the gradients without MPI.
Assuming I want to get rid of MPI entirely, I have three questions:
- Is broadcasting the ncclUniqueId over a UDP socket the best or most convenient way to do this?
- For multi-node NCCL, is it true that we cannot use ncclCommInitAll instead of ncclCommInitRank?
- Instead of broadcasting the ncclUniqueId, can we initialize all the communicators on one node and then send them to the different nodes?
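To make the first question concrete, here is a minimal sketch of what the bootstrap could look like without MPI. It simulates, in one process with threads, rank 0 serving an opaque 128-byte blob (standing in for the output of ncclGetUniqueId) to the other ranks over a socket. Note this sketch uses TCP rather than UDP, since a reliable stream avoids dropped or truncated datagrams; the host, port, and byte size are illustrative assumptions, not NCCL requirements.

```python
import socket
import threading

# ncclUniqueId is an opaque blob; 128 bytes here is a stand-in, not a spec value.
UNIQUE_ID_BYTES = 128

def serve_unique_id(unique_id, host, port, n_peers, ready):
    """Rank 0: accept one connection per peer and send the id blob."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(n_peers)
    ready.set()  # signal that peers may connect now
    for _ in range(n_peers):
        conn, _ = srv.accept()
        conn.sendall(unique_id)
        conn.close()
    srv.close()

def fetch_unique_id(host, port):
    """Non-root rank: connect to rank 0 and read the full id blob."""
    with socket.create_connection((host, port)) as s:
        buf = b""
        while len(buf) < UNIQUE_ID_BYTES:
            chunk = s.recv(UNIQUE_ID_BYTES - len(buf))
            if not chunk:
                raise ConnectionError("short read of unique id")
            buf += chunk
    return buf

if __name__ == "__main__":
    uid = bytes(range(128))  # stand-in for the bytes of ncclGetUniqueId()
    ready = threading.Event()
    t = threading.Thread(target=serve_unique_id,
                         args=(uid, "127.0.0.1", 50007, 1, ready))
    t.start()
    ready.wait()
    received = fetch_unique_id("127.0.0.1", 50007)
    t.join()
    assert received == uid
    print("unique id transferred:", len(received), "bytes")
```

In a real deployment, each rank would then pass the received bytes into its own ncclCommInitRank call; only the id travels over the socket, never the communicator itself.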
Thanks a lot!