NCCL allreduce in a high performance DGX A100 cluster

whatdhack · May 18, 2024, 5:51pm

Where can I find conceptual details of how allreduce, reduce-scatter, and allgather work in a cluster of DGX A100 systems? Is it done hierarchically? Is there a tool available to trace the steps?

Robert_Crovella · May 18, 2024, 8:11pm

NCCL evaluates the cluster topology and runs some tests to determine how it will communicate. Some methods are hierarchical, like tree, and some are not (or are less so), like ring. From here:

We now have up to 9 choices for Algorithm x Protocol ({Ring,Tree,CollNet}x{LL,LL128,Simple}) so we have models of each combination and for each size, we estimate how much time each would take, then take the lowest.
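
To make that selection idea concrete, here is a minimal sketch of "estimate every combination, take the lowest". The latency and bandwidth numbers are invented for illustration and are not NCCL's tuning constants; the names `ALGO_MODEL`, `PROTO_MODEL`, `estimate_us`, and `pick` are hypothetical. The real model lives in the NCCL source and depends on the detected topology.

```python
from itertools import product

ALGOS = ("Ring", "Tree", "CollNet")
PROTOS = ("LL", "LL128", "Simple")

# ASSUMPTION: made-up base latency (microseconds) and bandwidth (GB/s)
# per algorithm. These are not values taken from NCCL.
ALGO_MODEL = {"Ring": (20.0, 200.0), "Tree": (8.0, 180.0), "CollNet": (12.0, 190.0)}

# ASSUMPTION: made-up latency multiplier and bandwidth efficiency per protocol.
PROTO_MODEL = {"LL": (0.25, 0.5), "LL128": (0.5, 0.9), "Simple": (1.0, 1.0)}


def estimate_us(algo, proto, nbytes):
    """Toy cost model: time = latency + message size / effective bandwidth."""
    base_latency_us, base_bw_gbs = ALGO_MODEL[algo]
    latency_scale, bw_efficiency = PROTO_MODEL[proto]
    bytes_per_us = base_bw_gbs * bw_efficiency * 1e3  # 1 GB/s = 1e3 bytes/us
    return base_latency_us * latency_scale + nbytes / bytes_per_us


def pick(nbytes):
    """Return the (algorithm, protocol) pair with the lowest estimated time."""
    return min(product(ALGOS, PROTOS), key=lambda combo: estimate_us(*combo, nbytes))


if __name__ == "__main__":
    for size in (1024, 1 << 20, 1 << 30):
        print(size, pick(size))
```

With these toy numbers, small messages favor a low-latency combination and large messages favor a high-bandwidth one, which is the general shape of the trade-off the quote describes.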

I don’t know of “conceptual” documentation, but there is a GTC presentation.
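
For a conceptual picture of the ring case, the textbook ring allreduce is a reduce-scatter followed by an allgather. The sketch below simulates it in plain Python with one chunk per rank. It is the generic algorithm, not NCCL's implementation, and the function name `ring_allreduce` is hypothetical.

```python
def ring_allreduce(data):
    """Simulate a sum allreduce over a ring.

    data[r][c] is rank r's value for chunk c, with exactly one chunk per rank.
    Returns the per-rank buffers after the collective completes.
    """
    n = len(data)
    buf = [list(row) for row in data]

    # Phase 1, reduce-scatter: in step s, rank r sends chunk (r - s) mod n to
    # rank r + 1, which adds it to its own copy. After n - 1 steps, rank r
    # holds the fully reduced chunk (r + 1) mod n.
    for step in range(n - 1):
        # Snapshot all messages first so every send in a step is "simultaneous".
        sends = [(r, (r - step) % n, buf[r][(r - step) % n]) for r in range(n)]
        for r, chunk, value in sends:
            buf[(r + 1) % n][chunk] += value

    # Phase 2, allgather: in step s, rank r forwards chunk (r + 1 - s) mod n,
    # which is already fully reduced, and the receiver overwrites its copy.
    for step in range(n - 1):
        sends = [(r, (r + 1 - step) % n, buf[r][(r + 1 - step) % n]) for r in range(n)]
        for r, chunk, value in sends:
            buf[(r + 1) % n][chunk] = value

    return buf


if __name__ == "__main__":
    ranks = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400], [1000, 2000, 3000, 4000]]
    for row in ring_allreduce(ranks):
        print(row)
```

Each rank only ever talks to its ring neighbor, and each phase takes n - 1 steps, which is why ring is not hierarchical in the way tree is.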

Also, NCCL is open source.

If you use a profiler (for example, Nsight Systems), you can see what NCCL is doing, to some degree.
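
Besides a profiler, NCCL's own debug logging can show which topology, algorithm, and protocol it settled on. Below is a minimal sketch that builds an environment with `NCCL_DEBUG` and `NCCL_DEBUG_SUBSYS` set. Those are documented NCCL environment variables, as is `NCCL_ALGO`; the helper name `nccl_debug_env` and the command in the final comment are assumptions for illustration.

```python
import os


def nccl_debug_env(base=None, subsys="INIT,GRAPH,TUNING", algo=None):
    """Return a copy of the environment with NCCL debug logging enabled.

    base:   environment to start from (defaults to the current process env).
    subsys: comma-separated NCCL debug subsystems to log.
    algo:   optionally force an algorithm (e.g. "Ring" or "Tree") via NCCL_ALGO,
            useful for comparing against NCCL's automatic choice.
    """
    env = dict(os.environ if base is None else base)
    env["NCCL_DEBUG"] = "INFO"
    env["NCCL_DEBUG_SUBSYS"] = subsys
    if algo is not None:
        env["NCCL_ALGO"] = algo
    return env


if __name__ == "__main__":
    env = nccl_debug_env(algo="Tree")
    print({k: v for k, v in env.items() if k.startswith("NCCL_")})
    # ASSUMPTION: your launcher and benchmark binary will differ, e.g.
    # subprocess.run(["./all_reduce_perf", "-b", "8", "-e", "128M", "-f", "2"], env=env)
```

Running the same job once with the default settings and once with a forced algorithm is a simple way to see how much the automatic choice matters on a given cluster.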
