Enquiries about running jobs using multiple GPUs

Hi,

I think I managed to run my job using 2 GPUs. I benchmarked 2 jobs: the 1st running on 1 GPU, and the 2nd on 2 GPUs. Both ran for 10,000 steps.

Both took around the same amount of time, roughly 1.5 hrs.
The loss for the 2nd case is lower, but not by much:

[step: 10000] loss: 2.716e-02, time/iteration: 2.154e+02 ms
vs
[step: 10000] loss: 3.458e-02, time/iteration: 2.006e+02 ms

In the 2nd job, there’s a message that seems to imply I’m using 2 GPUs:

Initialized process 0 of 2 using method "openmpi". Device set to cuda:0
Initialized process 1 of 2 using method "openmpi". Device set to cuda:1

So am I using 2 GPUs in the 2nd job? Why did both jobs take the same time to complete?

In https://docs.nvidia.com/deeplearning/modulus/user_guide/features/performance.html

it is mentioned that:

This data parallel fashion of multi-GPU training keeps the number of points sampled per GPU constant while increasing the total effective batch size. You can use this to your advantage to increase the number of points sampled by increasing the number of GPUs allowing you to handle much larger problems.

Is this what is happening here? So in a multi-GPU run with 2 GPUs, I am actually using twice the total batch size, is that correct? That would explain why the time taken is the same.
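
Just to check my understanding, here is the arithmetic I have in mind (the batch size below is a made-up placeholder, not my actual config value):

```python
# Illustrative arithmetic only; 1000 is a placeholder per-GPU batch size,
# not the value from my actual config.
batch_size_per_gpu = 1000

for num_gpus in (1, 2):
    points_per_step = batch_size_per_gpu * num_gpus
    print(f"{num_gpus} GPU(s): {points_per_step} points sampled per step")

# Each GPU still processes batch_size_per_gpu points per step, so the time per
# iteration (and the total time for 10,000 steps) stays roughly the same,
# even though the 2-GPU run samples twice as many points in total.
```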

Please clarify. Thank you.

Hi @tsltaywb

Thanks for your interest in Modulus, and good question. You are correct: batch sizes in Modulus are defined per process. In physics-driven problems we generate the training input “data” on initialization, which is simply batch_size * batch_per_epoch points per process. So if you don’t halve the batch size when going from one to two GPUs, you’re actually increasing your dataset by a factor of two.
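
As a rough sketch of that bookkeeping (the batch_size and batch_per_epoch values below are placeholders, not taken from your config):

```python
# Placeholder values, not the defaults from any real Modulus config.
batch_size = 1000        # per-process batch size
batch_per_epoch = 1000   # batches of points generated at initialization

def total_points(batch_size, num_gpus):
    # Each process generates its own batch_size * batch_per_epoch points.
    return batch_size * batch_per_epoch * num_gpus

print(total_points(batch_size, num_gpus=1))       # 1000000 points on 1 GPU
print(total_points(batch_size, num_gpus=2))       # 2000000 points on 2 GPUs
print(total_points(batch_size // 2, num_gpus=2))  # back to 1000000 after halving batch_size
```

Halving the per-process batch size recovers the same total problem size as the single-GPU run.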

This is weak scaling: the problem size grows with the number of GPUs, and the ideal outcome is that the wall-clock time stays the same. There’s some additional information in this thread here that can hopefully clear things up.