Model sharding, data parallelism and NVLink

xenomarz · January 11, 2022, 10:36am

Hi Guys,
I read the following threads:
https://forums.developer.nvidia.com/t/can-nvlink-combine-2x-gpus-into-1x-big-gpu/73594

https://discuss.pytorch.org/t/split-single-model-in-multiple-gpus/13239

and also watched the following video:
https://youtu.be/_d3xs1L4jeA

Which led me to the following question:
In my group, we are interested in buying a server with 8 NVIDIA A40 GPUs, arranged as 4 pairs, where the two GPUs in each pair are physically connected by an NVLink bridge.
I wonder how using 4 NVLink-bridged pairs will affect data parallelism and model sharding. How would it differ from using the same 8 GPUs without any NVLink bridges between them?

Thanks

spolisetty · January 18, 2022, 10:25am

Hi,

This doesn’t look cuDNN related. We recommend posting your question on the relevant platform to get better help.

Thank you.

yanxu · January 25, 2022, 4:28am

cuDNN today does not support actively partitioning computations across multiple GPUs.
Some DL frameworks may support those features, and you may also be able to modify your model training scripts to achieve it. If you are able to use those features, faster communication over NVLink will speed up the process (compared to the slower PCIe).
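
To get a rough feel for how much link speed matters, here is a back-of-the-envelope sketch in plain Python. It uses the standard bandwidth-only cost model for a ring all-reduce (the collective commonly used to sum gradients in data parallelism), in which each GPU sends about 2 * (N - 1) / N times the payload. The function name and the bandwidth and payload numbers are hypothetical placeholders, not A40 or PCIe specifications, and the model ignores latency, compute overlap, and whatever topology the communication library actually picks.

```python
def ring_allreduce_seconds(num_gpus, payload_bytes, slowest_link_bytes_per_s):
    """Bandwidth-only estimate for one ring all-reduce (ignores latency)."""
    if num_gpus < 2:
        return 0.0  # nothing to exchange with a single GPU
    # In a ring all-reduce each GPU sends 2 * (N - 1) / N of the payload.
    traffic_per_gpu = 2 * (num_gpus - 1) / num_gpus * payload_bytes
    # The ring moves in lockstep, so the slowest link sets the pace.
    return traffic_per_gpu / slowest_link_bytes_per_s


# Hypothetical placeholder values, chosen only to make the arithmetic easy.
GRADIENT_BYTES = 4e9   # e.g. 1e9 float32 parameters
FAST_LINK = 50e9       # assumed bytes/s over a bridged pair
SLOW_LINK = 25e9       # assumed bytes/s over the slower path

# All-reduce confined to one bridged pair.
pair_over_fast = ring_allreduce_seconds(2, GRADIENT_BYTES, FAST_LINK)
# All-reduce across all 8 GPUs, where the ring must cross unbridged links.
all_eight_over_slow = ring_allreduce_seconds(8, GRADIENT_BYTES, SLOW_LINK)

print(f"pair over fast link:   {pair_over_fast:.2f} s")
print(f"8 GPUs over slow link: {all_eight_over_slow:.2f} s")
```

Under these assumptions, traffic that stays inside a bridged pair benefits from the faster link, while a collective spanning all 8 GPUs is still paced by the slowest link in the ring. Substitute measured bandwidths from your own server before drawing conclusions.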
