Hi! My group has 128 RTX 3090s. How can we connect them together? I know that something like a DGX A100 connects 8 GPUs together and uses NVSwitch to link them, but can we do this with 3090s? Thanks!
RTX 3090s can be connected pairwise with an NVLink bridge. You cannot use NVSwitch with the RTX 3090.
I see! So the 3090 also cannot use NVLink? That means that if I have 128 3090s, the only way to connect them is to join each pair with an NV bridge, and the 64 pairs can only talk to each other over PCIe?
NVBridge is NVLink.
Correct. The GPUs in a pair can communicate with each other using NVLink (over the NV Bridge). When a GPU in a pair wants to communicate with a GPU in another pair (or with any other GPU or entity besides its “pair partner”), it will use PCIe.
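To make the pairing concrete, here is a minimal sketch that classifies which link two GPUs would use. The numbering scheme (bridges joining GPUs 0-1, 2-3, and so on) and the function names `link_between` and `count_links` are assumptions for illustration only. It also ignores the host-to-host network that a multi-machine setup would additionally need.

```python
def link_between(gpu_a: int, gpu_b: int) -> str:
    """Return the interconnect two GPUs would use under the assumed pairing."""
    if gpu_a == gpu_b:
        return "local"
    # Assumption: bridges join GPUs (0,1), (2,3), ..., so partners share index // 2.
    if gpu_a // 2 == gpu_b // 2:
        return "NVLink"
    # Any GPU other than the bridge partner is reached over PCIe.
    return "PCIe"


def count_links(num_gpus: int) -> dict:
    """Count how many distinct GPU-to-GPU combinations use each interconnect."""
    counts = {"NVLink": 0, "PCIe": 0}
    for a in range(num_gpus):
        for b in range(a + 1, num_gpus):
            counts[link_between(a, b)] += 1
    return counts


if __name__ == "__main__":
    print(link_between(0, 1))  # bridge partners
    print(link_between(1, 2))  # neighbours in different pairs
    print(count_links(128))
```

With 128 GPUs, only 64 of the 8128 possible GPU-to-GPU combinations are bridge partners, which is why most traffic ends up on the slower path.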
Hmm... So do you know how we should train an LLM on these 128 3090s? Do you have any suggestions? I mean, it seems the communication cost would be huge...
With such a large communication cost, how can we use computation to cover it? Are there any solid solutions? Thanks!
Training an LLM is certainly an advanced topic. Nobody at NVIDIA would recommend lashing together 128 consumer GPUs for serious work in that space. However, the problem of communication hiding is present regardless of the underlying hardware platform.
NVIDIA's current primary solution in this space is NeMo Framework (and, somewhat related, NeMo Service). You can learn more about it using many of the resources already available such as here and here. To get another “view” of it, you can take a look at the foundational work done by NVIDIA research in this space, some of which is published here.
LLM training takes advantage of a number of characteristics of the underlying model training to exploit various avenues of parallelism. Several of these avenues allow computation to overlap with communication, which is a key aspect of communication hiding. You can read more about this in the papers available at the last link above.
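As a rough illustration of why overlap matters, here is a minimal timing model, not a description of how any particular framework schedules work. It assumes gradients are produced one layer at a time, that a layer's transfer can start as soon as its gradients are ready, and that transfers share a single link and run one after another. The function name `step_time` and the millisecond figures are made up for the example.

```python
def step_time(compute_ms, comm_ms, overlap):
    """Estimate backward-pass time for layers processed in order.

    compute_ms[i]: time to compute the gradients of layer i
    comm_ms[i]:    time to communicate the gradients of layer i
    overlap:       if True, transfers run concurrently with later compute
    """
    if not overlap:
        # Everything is serialized: compute all layers, then send all gradients.
        return sum(compute_ms) + sum(comm_ms)

    compute_done = 0.0  # when the current layer's gradients become available
    comm_done = 0.0     # when the shared link becomes free again
    for c, m in zip(compute_ms, comm_ms):
        compute_done += c
        # A transfer starts once its gradients exist and the link is idle.
        comm_done = max(comm_done, compute_done) + m
    return comm_done


if __name__ == "__main__":
    compute = [10, 10, 10, 10]  # hypothetical per-layer compute times (ms)
    comm = [8, 8, 8, 8]         # hypothetical per-layer transfer times (ms)
    print("no overlap:", step_time(compute, comm, overlap=False))
    print("overlap:   ", step_time(compute, comm, overlap=True))
```

In this model, when each transfer is shorter than the compute for the next layer, only the final transfer is exposed. When transfers are longer than compute, as they may be over PCIe, overlap still helps but the link becomes the bottleneck.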