Hi Guys,
I read the following threads:
https://forums.developer.nvidia.com/t/can-nvlink-combine-2x-gpus-into-1x-big-gpu/73594
https://discuss.pytorch.org/t/split-single-model-in-multiple-gpus/13239
and also watched the following video:
https://youtu.be/_d3xs1L4jeA
These led me to the following question:
In my group, we are interested in buying a server with 8 NVIDIA A40 GPUs, arranged as 4 pairs, where each pair is physically connected with an NVLink bridge.
I wonder how having 4 NVLink-bridged pairs will affect data parallelism and model sharding. How would it differ from running on the same 8 GPUs without any NVLink bridges between them?
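To make the model-sharding part concrete, here is a minimal sketch of the kind of split I have in mind, assuming `cuda:0` and `cuda:1` happen to be one of the bridged pairs (the actual index-to-bridge mapping would need to be checked with `nvidia-smi topo -m`):

```python
import torch
import torch.nn as nn

# Minimal sketch: shard one model across what is assumed to be an NVLink
# pair (cuda:0 + cuda:1), so the activation copy in forward() crosses the
# bridge rather than PCIe. Peer access alone doesn't prove NVLink (PCIe
# P2P also enables it), but it's a quick sanity check.
assert torch.cuda.can_device_access_peer(0, 1)

class PairShardedModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.stage1 = nn.Linear(1024, 1024).to("cuda:0")  # first half on GPU 0
        self.stage2 = nn.Linear(1024, 1024).to("cuda:1")  # second half on GPU 1

    def forward(self, x):
        x = torch.relu(self.stage1(x.to("cuda:0")))
        return self.stage2(x.to("cuda:1"))  # cross-GPU transfer: NVLink if bridged

model = PairShardedModel()
out = model(torch.randn(8, 1024))
```

The idea would be sharding like this within each pair, plus data parallelism (e.g. DDP) across the 4 pairs, so I'm mainly asking whether such a layout actually benefits from the bridges.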
Thanks