Enquiries about running jobs using multiple GPUs


I think I managed to run my job using 2 GPUs. I tried to benchmark 2 jobs - the 1st one running on 1 GPU, and the 2nd on 2 GPUs. The number of steps is 10,000.

Both took around the same time, about 1.5 hrs each.
The loss for the 2nd case is lower, but not by much:

[step: 10000] loss: 2.716e-02, time/iteration: 2.154e+02 ms
[step: 10000] loss: 3.458e-02, time/iteration: 2.006e+02 ms

In the 2nd job, there's a message which seems to imply I'm using 2 GPUs:

Initialized process 0 of 2 using method "openmpi". Device set to cuda:0
Initialized process 1 of 2 using method "openmpi". Device set to cuda:1

So am I using 2 GPUs in the 2nd job? Why is it that both jobs took the same time to complete?

In https://docs.nvidia.com/deeplearning/modulus/user_guide/features/performance.html it is mentioned that:

This data parallel fashion of multi-GPU training keeps the number of points sampled per GPU constant while increasing the total effective batch size. You can use this to your advantage to increase the number of points sampled by increasing the number of GPUs allowing you to handle much larger problems.

Is this what is happening now? So in a multi-GPU run, I am actually using twice the total batch size with 2 GPUs, is this correct? Hence, the time taken will be the same.
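A quick back-of-the-envelope check of this reading (the numbers below are illustrative, not taken from my config): under data-parallel training, each GPU keeps its own per-process batch size, so adding GPUs grows the total points per step while the per-GPU work, and hence time per iteration, stays roughly flat.

```python
# Sketch: why wall clock stays ~constant when GPUs are added under
# data-parallel (weak-scaling) training. Numbers are hypothetical.

def total_points_per_step(batch_size_per_gpu: int, num_gpus: int) -> int:
    """Total points sampled per optimizer step across all GPUs."""
    return batch_size_per_gpu * num_gpus

batch = 1000  # hypothetical per-process batch size

print(total_points_per_step(batch, 1))  # 1 GPU  -> 1000 points/step
print(total_points_per_step(batch, 2))  # 2 GPUs -> 2000 points/step
# Each GPU still processes `batch` points per step, so with a fixed
# step count (10,000 here) the total run time barely changes.
```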

Please clarify. Thank you.

Hi @tsltaywb

Thanks for your interest in Modulus, and good question. You are correct: batch sizes in Modulus are defined per process. In physics-driven problems we generate the training input "data" on initialization, which is simply batch_size * batch_per_epoch. So if you don't halve this number going from one to two GPUs, you're actually increasing your dataset by a factor of two.

This is weak scaling, and maintaining the same performance (wall clock) is the ideal. There's some additional information in this thread here that can hopefully clear things up.
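To make that concrete, if you want the total dataset to stay fixed when going from one to two GPUs, you halve the per-process batch size. A minimal sketch (the helper function and the specific numbers are hypothetical, not a Modulus API; `batch_size` and `batch_per_epoch` are the config knobs mentioned above):

```python
def per_process_batch_size(base_batch_size: int, num_gpus: int) -> int:
    """Scale the per-process batch size down so the total number of
    generated training points (batch_size * batch_per_epoch * num_gpus)
    matches the single-GPU run. Assumes the batch divides evenly."""
    assert base_batch_size % num_gpus == 0
    return base_batch_size // num_gpus

base = 1024            # hypothetical single-GPU batch size
batch_per_epoch = 100  # points generated at init = batch_size * batch_per_epoch

for gpus in (1, 2, 4):
    b = per_process_batch_size(base, gpus)
    total = b * batch_per_epoch * gpus
    print(f"{gpus} GPU(s): batch/process={b}, total points={total}")
```

With the halved batch, each GPU now does half the per-step work, so you would expect the two-GPU run to finish faster (strong scaling) instead of matching the single-GPU wall clock at double the dataset size (weak scaling).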