This picture is from Session 5 of Hot Chips 34 (2022), Tuesday, August 23, 2022.
I don’t know what allreduce and bidirection bandwidth actually mean.
I want to calculate these numbers myself by hand,
but I don’t know how to calculate them.
Please let me know how to calculate them.
allreduce is a collective operation that combines (reduces) data from several GPUs into one result, for example an element-wise sum, and then distributes that result back so that every GPU ends up with a copy.
It can be run and benchmarked using a variety of libraries, such as MPI or NCCL.
In the NCCL case, there is an additional explanation here. You can study the math yourself, but my takeaway is that the reported allreduce bandwidth asymptotically approaches the bandwidth you would observe for a single transfer from one GPU to another on the fabric.
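If it helps to see the arithmetic, here is a minimal Python sketch of how the NCCL tests derive their allreduce numbers, as I understand their performance notes: the algorithm bandwidth is message size divided by elapsed time, and the bus bandwidth scales that by 2(n-1)/n for n ranks. The function name and the input values are made up for illustration; they are not taken from the slide.

```python
def allreduce_bandwidths(size_bytes, elapsed_s, n_ranks):
    """Return (algbw, busbw) in GB/s for one allreduce measurement.

    algbw = size / time
    busbw = algbw * 2 * (n - 1) / n   (allreduce correction factor)
    """
    if n_ranks < 2:
        raise ValueError("allreduce needs at least 2 ranks")
    algbw = size_bytes / elapsed_s / 1e9
    busbw = algbw * 2 * (n_ranks - 1) / n_ranks
    return algbw, busbw


# Hypothetical measurement: 1 GB reduced across 8 GPUs in 10 ms.
algbw, busbw = allreduce_bandwidths(size_bytes=1_000_000_000,
                                    elapsed_s=0.01,
                                    n_ranks=8)
print(f"algbw = {algbw:.1f} GB/s, busbw = {busbw:.1f} GB/s")
```

With those made-up inputs the algorithm bandwidth comes out at about 100 GB/s and the bus bandwidth at about 175 GB/s. The 2(n-1)/n factor approaches 2 as n grows, which is why the bus bandwidth figure stays comparable across different GPU counts.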
The other number here is not “bidirection” but “bisection”. You can find writeups elsewhere of what that number means; it’s not unique or specific to CUDA or NVIDIA, but is a general networking term. Roughly, it answers this question: if you split all the processors (i.e. GPUs) into two equal halves and had one half communicate with the other half, what aggregate bandwidth would you observe?
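A by-hand estimate can be sketched as follows, assuming a non-blocking fabric where every GPU can drive its full link bandwidth across the cut at the same time. That assumption, the function name, and the example inputs are mine, not read from the slide, and conventions differ on whether only one direction or both directions are counted.

```python
def bisection_bandwidth(n_gpus, per_gpu_one_way_gbs, count_both_directions=False):
    """Estimate bisection bandwidth in GB/s for a non-blocking fabric.

    Assumption: each of the n/2 GPUs in one half can send at its full
    one-way link rate to a partner in the other half simultaneously.
    """
    if n_gpus < 2 or n_gpus % 2 != 0:
        raise ValueError("need an even number of GPUs (at least 2)")
    one_way = (n_gpus // 2) * per_gpu_one_way_gbs
    # Some sources count traffic in both directions across the cut.
    return 2 * one_way if count_both_directions else one_way


# Hypothetical system: 256 GPUs, 450 GB/s per GPU in each direction.
print(bisection_bandwidth(256, 450))                              # one direction
print(bisection_bandwidth(256, 450, count_both_directions=True))  # both directions
```

If the fabric is oversubscribed (tapered), the real figure is lower than this estimate, because the links crossing the cut become the limit rather than the per-GPU links.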