We are testing the performance of Modulus on our cluster with multiple GPUs. To that end, we ran the basic wave_equation tutorial with 10000 batch points on one GPU and on two GPUs. Surprisingly, the single-GPU run took 45 min while the two-GPU run took 50 min. The GPUs are V100s. Please see the attachment with the GPU activity for the two GPUs. Modulus is installed with Docker. I don't think this is supposed to happen. Can anyone give me any suggestions, please? Thanks.
In Modulus, the batch size defined in the config is by default the local (per-GPU) batch size. This means that if you keep the batch size the same between 1 and 2 GPUs, you have gone from a global batch size of 10000 to 20000.
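To make the arithmetic concrete, here is a minimal sketch (plain Python for illustration, not Modulus code; the numbers match your run):

```python
# Modulus treats the configured batch size as the local (per-GPU) batch size,
# so the effective global batch size grows with the number of GPUs.
local_batch_size = 10000  # value from the config, unchanged between runs

for num_gpus in (1, 2):
    global_batch_size = local_batch_size * num_gpus
    print(f"{num_gpus} GPU(s): local={local_batch_size}, global={global_batch_size}")

# 1 GPU(s): local=10000, global=10000
# 2 GPU(s): local=10000, global=20000
```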
This is therefore a weak scaling test, and the best you could hope for is the same wall-clock time going from 1 to 2 GPUs. Even that won't happen in practice, because the communication between the GPUs adds some overhead. The exact overhead depends on the hardware you have and how it is configured, the size of the model, etc.
For a strong scaling test, reduce your batch size to 5000 when running on 2 GPUs so the global batch size stays at 10000. Keep in mind, though, that if your GPUs aren't fully saturated at the smaller per-GPU batch size, each GPU is partially idle and you're not going to see ideal scaling.
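As a sketch of the strong-scaling setup (assuming a standard launcher that sets the `WORLD_SIZE` environment variable, e.g. torchrun or an MPI wrapper; the variable names here are illustrative, not Modulus API):

```python
# Strong scaling: keep the global batch size fixed and divide it across GPUs.
import os

target_global_batch = 10000
world_size = int(os.environ.get("WORLD_SIZE", "1"))  # set by the launcher
local_batch = target_global_batch // world_size

print(f"{world_size} GPU(s) -> per-GPU batch size {local_batch}")
# 1 GPU  -> 10000 per GPU (global 10000)
# 2 GPUs -> 5000 per GPU  (global 10000)
```

In practice you would just set the batch size value(s) in the example's config to 5000 before launching the two-GPU run.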
I was initially confused by the same sort of tests until I understood how the scaling works, as @ngeneva explained it.
Another thing I noticed is that we weren't using anywhere close to the total memory available on a single GPU yet.
So increasing the batch size(s) and decreasing the number of iterations was the first step toward training to the same accuracy in less time.
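A quick way to check the memory headroom mentioned above is to watch `nvidia-smi` during training, or to query PyTorch from inside the process; a minimal sketch (not Modulus-specific):

```python
# Rough check of GPU memory headroom from inside a running PyTorch process.
import torch

if torch.cuda.is_available():
    dev = torch.device("cuda")
    allocated = torch.cuda.memory_allocated(dev) / 1024**3  # tensors currently held
    reserved = torch.cuda.memory_reserved(dev) / 1024**3    # held by the caching allocator
    total = torch.cuda.get_device_properties(dev).total_memory / 1024**3
    print(f"allocated {allocated:.1f} GiB, reserved {reserved:.1f} GiB, total {total:.1f} GiB")
```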
After that, increasing the number of GPUs improved accuracy for the same configuration, so we could lower the number of iterations again for the same accuracy, but I haven't found the sweet spot yet.