Creating Faster Molecular Dynamics Simulations with GROMACS 2020

Originally published at: https://developer.nvidia.com/blog/creating-faster-molecular-dynamics-simulations-with-gromacs-2020/

GROMACS, one of the most widely used HPC applications, has received a major upgrade with the release of GROMACS 2020. The new version includes exciting new performance improvements resulting from a long-term collaboration between NVIDIA and the core GROMACS developers. As a simulation package for biomolecular systems, GROMACS evolves particles using the Newtonian equations…

Dear all,

I have tried to apply all the suggestions from this very interesting article, but I ran into some problems.
I run simulations with GROMACS 2020.2 on an HPC system with 4x V100 per node (Driver Version: 450.51.06; CUDA Version: 11.0).

As suggested in the article, before starting the run I set:

export GMX_GPU_DD_COMMS=true
export GMX_GPU_PME_PP_COMMS=true
export GMX_FORCE_UPDATE_DEFAULT_GPU=true

and on the mdrun command line I specified:

-nb gpu -bonded gpu -pme gpu -npme 1

Moreover, the .tpr file was built with “constraints = h-bonds” specified in the .mdp file.
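
For context, the relevant part of an .mdp file with this setting looks roughly like the following (the other values here are only illustrative placeholders, not my exact input):

integrator            = md
dt                    = 0.002      ; 2 fs time step, typical with h-bond constraints
cutoff-scheme         = Verlet     ; required for the GPU non-bonded kernels
nstlist               = 100        ; a minimum; mdrun may increase it automatically on GPUs
constraints           = h-bonds    ; constrain bonds involving hydrogen (GPU update does not support all-bonds)
constraint-algorithm  = lincs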

When the simulation starts, I get these messages:

Update task on the GPU was required, by the GMX_FORCE_UPDATE_DEFAULT_GPU environment variable, but the following condition(s) were not satisfied:
Domain decomposition without GPU halo exchange is not supported.
With separate PME rank(s), PME must use direct communication.
Will use CPU version of update.

Any idea how to solve the problem?

Thank you so much

Best regards,

Federico

Hi Federico,

I can see that the “GPU update” feature is not being activated because the “GPU communication” features are not active, even though you are setting the environment variables correctly. My best guess is therefore that you are using an external MPI library rather than the GROMACS-internal thread-MPI library: GPU communications are only supported for the latter in this release (support for the former has now been merged into the master branch and will be included in GROMACS 2022). If so, please rebuild with thread MPI (a sketch of such a build is below) and try again. If that’s not the case, please provide the full md.log file and I’ll take a look.
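
For reference, a minimal thread-MPI CUDA build of GROMACS 2020 looks roughly like this (the FFTW option and install path are just examples; adapt them to your cluster):

tar xf gromacs-2020.2.tar.gz
cd gromacs-2020.2 && mkdir build && cd build
# GMX_MPI=OFF selects the internal thread-MPI library; GMX_GPU=ON enables CUDA in the 2020 series
cmake .. -DGMX_GPU=ON -DGMX_MPI=OFF -DGMX_BUILD_OWN_FFTW=ON -DCMAKE_INSTALL_PREFIX=$HOME/gromacs-2020.2-tmpi
make -j 16 && make install
source $HOME/gromacs-2020.2-tmpi/bin/GMXRC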

Alan

Dear Alan,

thank you so much for your reply! Your guess is right: the error appears when I use the Spectrum MPI library for multi-node/multi-replica simulations. I rebuilt a thread-MPI version of GROMACS for single-node runs, and in that case everything works flawlessly.

Thank you

best regards,

Federico

Hi Alan,

Can you please give some pointers on the 3 systems you used (e.g. how to obtain/download them)? Basically, we want to reproduce the same results on our system.

  • ADH (95,561 atoms)
  • Cellulose (408,609 atoms)
  • STMV (1,066,628 atoms)

Thanks!

Maybe give the PDB ID or something like that?

Hi,
Please can you provide your email address via this temporary form, and I’ll get in touch to help you get set up with this:
GROMACS query - Google Forms

Thanks,

Alan

I followed the same steps, but it throws the following error. I increased the equilibration time, but that did not help.
Step 100: The total potential energy is nan, which is not finite. The LJ and
electrostatic contributions to the energy are 0 and -1.18497e+07,
respectively. A non-finite potential energy can be caused by overlapping
interactions in bonded interactions or very large or Nan coordinate values.
Usually this is caused by a badly- or non-equilibrated initial configuration,
incorrect interactions or parameters in the topology.

Hi,

Please can you re-try with the latest GROMACS version (2022). If you still get the error, please isolate which of the options described in the blog is triggering it, and then create an issue at GROMACS / GROMACS · GitLab (including your input files and command-line options), and the dev team will take a look.

Best regards,

Alan

Hi, do you have similar benchmarks for GROMACS 2022 for comparison with 2020? Thanks!

Hi,

Please see slide 4 of my presentation, available at Presentation – PASC Program, for some comparisons across versions 2019, 2020, 2021, and 2022, each running on the same A100 hardware. Please note that for the multi-GPU results we tune for the optimal number of MPI tasks (in particular, running 2 MPI tasks per GPU can often be faster than a single task); a sketch of such a launch is below.
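
For example, on a 64-core node with 4 GPUs, running 2 ranks per GPU can be tried with something like the following (the rank/thread counts, the environment variable for GROMACS 2022, and the GPU task mapping are illustrative starting points to tune, not fixed recommendations):

export GMX_ENABLE_DIRECT_GPU_COMM=1   # enable GPU direct communication (2022-series variable)
gmx mdrun -ntmpi 8 -ntomp 8 -nb gpu -bonded gpu -pme gpu -npme 1 -update gpu -gputasks 00112233 -deffnm md

Here -gputasks assigns the 8 GPU tasks (7 PP plus 1 PME) to GPU IDs 0-3, two tasks per GPU.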

Best regards,

Alan

thank you!

Hi,

I tried GROMACS 2021.6 built with gcc 10.3.0 and CUDA 11.1.
The system consists of 81,743 atoms (slightly smaller than ADH).
I used an AMD EPYC 7763 (64 cores) and 4x NVIDIA A100-SXM4 GPUs.

With a simple:
$ gmx mdrun -nt 64 -pin on -v -deffnm mdtest
performance: 93 ns/day

With your setup/parameters, using 64 physical cores:
$ gmx mdrun -v -pin on -ntmpi 4 -ntomp 16 -nb gpu -bonded gpu -pme gpu -npme 1 -nstlist 400 -deffnm mdtest -nsteps 100000 -resetstep 90000 -noconfout
performance: 341 ns/day

Based on your benchmark, I would expect no less than 450 ns/day.

Would you be so kind as to comment on the following:

  • whether I have false expectations
  • how to debug this to increase performance

Thanks for your help and suggestions,
Tamas

Hi,

GROMACS performance is influenced not simply by the system size, but also quite strongly by specific details of the simulated system, such as its geometry and atom arrangement. So it’s possible that you are already getting good performance for your specific system. Here are some things to try/check:

  • Use the latest release of GROMACS for best performance (currently v2022.4, with v2023 out soon)
  • Make sure you are setting the environment variables as described in the blog
  • Make sure you are using h-bonds constraints, as described in the blog
  • Check the output (in the latest version) on GPU task assignment, which should look like this:

Mapping of GPU IDs to the 4 GPU tasks in the 4 ranks on this node:
PP:0,PP:1,PP:2,PME:3
PP tasks will do (non-perturbed) short-ranged and most bonded interactions on the GPU
PP task will update and constrain coordinates on the GPU
PME tasks will do all aspects on the GPU
GPU direct communication will be used between MPI ranks.

  • Use the NVIDIA Nsight Systems profiler tool to check that the GPU is being kept busy for the majority of the simulation (a sketch of a typical profiling command is below).
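
For example, a short profiling run with Nsight Systems might look like this (the trace options, step count, and output name are just illustrative):

nsys profile -t cuda,nvtx,osrt -o gmx_profile --stats=true gmx mdrun -ntmpi 4 -ntomp 16 -nb gpu -bonded gpu -pme gpu -npme 1 -nsteps 20000 -deffnm mdtest

Opening the resulting gmx_profile report in the Nsight Systems GUI lets you check for gaps in the CUDA kernel timeline where the GPU is idle.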

Best regards,

Alan

Dear Alan,

Thanks for the suggestions.

I had not thought about the effect of the simulated system’s properties on performance.
Are your tpr files not available for benchmarking? I think this would be very helpful, not only for me but also for the community.

All other points (e.g. setting the environment variables, the output of the GPU task assignment) seem to be OK.
NVIDIA Nsight Systems profiler tool: I might try it. However, it seems complicated to use, especially in an HPC environment.

Best regards,
Tamas

Hi,

The STMV and ADH input files are available for download as the Supplementary Information archive for the paper “Heterogeneous parallelization and acceleration of molecular dynamics simulations in GROMACS”.

Best Regards,

Alan