I have two GPUs (24GB of memory each) connected with NVLink.
I need to train a neural network model that requires 20GB of memory.
In PyTorch, nn.parallel.DistributedDataParallel is used for training on multiple GPUs.
As I understand it, the model is copied onto each GPU. In other words,
only the remaining 4GB per GPU (8GB in total) can be used for the training data.
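For reference, this is roughly how I would set it up with DDP (a minimal sketch; the small nn.Sequential here is just a placeholder standing in for my 20GB model, and it would be launched with torchrun):

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # launched with: torchrun --nproc_per_node=2 train.py (torchrun sets LOCAL_RANK)
    local_rank = int(os.environ["LOCAL_RANK"])
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(local_rank)

    # placeholder model; imagine it occupying ~20GB of GPU memory
    model = nn.Sequential(
        nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 4096)
    ).cuda(local_rank)

    # each process/GPU holds a full replica of the model;
    # gradients are all-reduced across the two GPUs after backward()
    ddp_model = DDP(model, device_ids=[local_rank])

if __name__ == "__main__":
    main()
```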
Is it possible for multiple GPUs to work as a single GPU with more memory via NVLink?
In other words, could I write the code as if there were only one GPU, without nn.parallel.DistributedDataParallel?
I suppose that a single GPU with a large memory would only need to store the model once.
Thus, there would be 28GB of memory (2 × 24GB − 20GB) left for the training data.
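To make the question concrete, this is the kind of plain single-GPU code I would like to keep writing (again, the small model is only a stand-in for my actual 20GB network); the question is whether NVLink would let this code see the combined 48GB:

```python
import torch
import torch.nn as nn

# plain single-device code, hoping the two 24GB GPUs behave like one 48GB device
device = torch.device("cuda")

model = nn.Sequential(
    nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 4096)
).to(device)  # placeholder for my 20GB model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(64, 4096, device=device)  # dummy batch of training data
loss = model(x).sum()
loss.backward()
optimizer.step()
```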