Is it possible for multiple GPUs to work as one with more memory via NVLink?

I have two GPUs (24GB of memory per GPU) connected with NVLink.

I need to train an NN model that requires 20GB.

In PyTorch, nn.parallel.DistributedDataParallel is used for training on multiple GPUs.
As I understand it, the NN model is copied onto each GPU, so only the remaining
4GB per GPU (8GB across the two GPUs) can be used for the training data.
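
To show what I mean, here is roughly the DDP setup I have in mind. The nn.Linear layer is just a stand-in for the real 20GB model, and the script assumes two local GPUs with the default env:// rendezvous:

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def run(rank, world_size):
    # One process per GPU; each process holds a full copy of the model,
    # so a 20GB model occupies ~20GB on every device.
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)

    model = nn.Linear(1024, 1024).to(rank)   # stand-in for the real 20GB network
    ddp_model = DDP(model, device_ids=[rank])

    # one training step: gradients are all-reduced across the GPUs after backward()
    opt = torch.optim.SGD(ddp_model.parameters(), lr=0.1)
    x = torch.randn(32, 1024, device=rank)   # each rank would see a different data shard
    loss = ddp_model(x).sum()
    loss.backward()
    opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    torch.multiprocessing.spawn(run, args=(2,), nprocs=2)
```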

Is it possible for multiple GPUs to work as one GPU with more memory via NVLink?
In other words, could I write the code as if there were a single GPU, without nn.parallel.DistributedDataParallel?
I suppose that a single GPU with a large amount of memory would only store the model once,
so 28GB of memory would remain for the training data.

Hi @parvaty316,
Apologies for the delayed response.
Please allow me some time, as I am currently checking on this.

Hi @parvaty316,
No, it’s not possible to treat these two devices as one “visible” device, but you could use model sharding to split the model onto the two devices, as described here: Split single model in multiple gpus - #2 by ptrblck - PyTorch Forums
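
For example, a minimal sketch of that sharding idea (not the exact code from the linked post; the split point and layer sizes here are made up):

```python
import torch
import torch.nn as nn

class ShardedModel(nn.Module):
    """Splits the network across two GPUs so each device holds only part of the weights."""
    def __init__(self):
        super().__init__()
        self.part1 = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU()).to("cuda:0")
        self.part2 = nn.Linear(4096, 10).to("cuda:1")

    def forward(self, x):
        x = self.part1(x.to("cuda:0"))
        x = self.part2(x.to("cuda:1"))   # activations are copied between devices (over NVLink)
        return x

model = ShardedModel()
out = model(torch.randn(8, 1024))
print(out.shape)   # torch.Size([8, 10])
```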

Thanks!