Creating Faster Molecular Dynamics Simulations with GROMACS 2020

Originally published at: https://developer.nvidia.com/blog/creating-faster-molecular-dynamics-simulations-with-gromacs-2020/

GROMACS, one of the most widely used HPC applications, has received a major upgrade with the release of GROMACS 2020. The new version includes exciting new performance improvements resulting from a long-term collaboration between NVIDIA and the core GROMACS developers. As a simulation package for biomolecular systems, GROMACS evolves particles using the Newtonian equations…

Dear all,

I have tried to apply all the suggestions from this very interesting article, but I ran into some problems.
I run simulations with GROMACS 2020.2 on an HPC system with 4x V100 per node (Driver Version: 450.51.06; CUDA Version: 11.0).

As suggested in the article, before starting the run I set:

export GMX_GPU_DD_COMMS=true
export GMX_GPU_PME_PP_COMMS=true
export GMX_FORCE_UPDATE_DEFAULT_GPU=true

and on the mdrun command line I specified:

-nb gpu -bonded gpu -pme gpu -npme 1

Moreover, the .tpr file was built with “constraints = h-bonds” specified in the .mdp file.
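
For context, the relevant part of an .mdp file with this setting looks roughly like the following (the other values here are only illustrative placeholders, not my exact input):

integrator            = md
dt                    = 0.002      ; 2 fs time step, typical with h-bond constraints
cutoff-scheme         = Verlet     ; required for the GPU non-bonded kernels
nstlist               = 100        ; a minimum; mdrun may increase it automatically on GPUs
constraints           = h-bonds    ; constrain bonds involving hydrogen (GPU update does not support all-bonds)
constraint-algorithm  = lincs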

When the simulation starts, I get these messages:

Update task on the GPU was required, by the GMX_FORCE_UPDATE_DEFAULT_GPU environment variable, but the following condition(s) were not satisfied:
Domain decomposition without GPU halo exchange is not supported.
With separate PME rank(s), PME must use direct communication.
Will use CPU version of update.

Any idea how to solve the problem?

Thank you so much

Best regards,

Federico

Hi Federico,

I can see that the “GPU update” feature is not being activated because the “GPU communication” features are not active, even though you are setting the environment variables correctly. My best guess is therefore that you are using an external MPI library rather than the GROMACS-internal thread-MPI library: GPU communications are only supported for the latter in this release (support for the former has now been merged into the master branch and will be included in GROMACS 2022). If so, please rebuild with thread MPI (a sketch of such a build is below) and try again. If that’s not the case, please provide the full md.log file and I’ll take a look.
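
For reference, a minimal thread-MPI CUDA build of GROMACS 2020 looks roughly like this (the FFTW option and install path are just examples; adapt them to your cluster):

tar xf gromacs-2020.2.tar.gz
cd gromacs-2020.2 && mkdir build && cd build
# GMX_MPI=OFF selects the internal thread-MPI library; GMX_GPU=ON enables CUDA in the 2020 series
cmake .. -DGMX_GPU=ON -DGMX_MPI=OFF -DGMX_BUILD_OWN_FFTW=ON -DCMAKE_INSTALL_PREFIX=$HOME/gromacs-2020.2-tmpi
make -j 16 && make install
source $HOME/gromacs-2020.2-tmpi/bin/GMXRC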

Alan

Dear Alan,

thank you so much for your reply! Your guess is right: the error appears when I use the Spectrum MPI library for multi-node/multi-replica simulations. I rebuilt a thread-MPI version of GROMACS for single-node runs, and in that case everything works flawlessly.

Thank you

best regards,

Federico

Hi Alan,

Can you please give some pointers on the 3 systems you used (e.g. how to obtain/download them)? Basically, we want to reproduce the same results on our system.

  • ADH (95,561 atoms)
  • Cellulose (408,609 atoms)
  • STMV (1,066,628 atoms)

Thanks!

Maybe give the PDB ID or something like that?

Hi,
Please can you provide your email address via this temporary form, and I’ll get in touch to help you get set up with this:
GROMACS query - Google Forms

Thanks,

Alan

I followed the same steps, but it throws the following error. I increased the equilibration time, but that did not help.
Step 100: The total potential energy is nan, which is not finite. The LJ and
electrostatic contributions to the energy are 0 and -1.18497e+07,
respectively. A non-finite potential energy can be caused by overlapping
interactions in bonded interactions or very large or Nan coordinate values.
Usually this is caused by a badly- or non-equilibrated initial configuration,
incorrect interactions or parameters in the topology.

Hi,

Please can you re-try with the latest GROMACS version (2022). If you still get the error, please isolate which of the options described in the blog is triggering it, and then create an issue at GROMACS / GROMACS · GitLab (including your input files and command-line options), and the dev team will take a look.

Best regards,

Alan

Hi, do you have similar benchmarks for GROMACS 2022 for comparison with 2020? Thanks!

Hi,

Please see slide 4 of my presentation, available at Presentation – PASC Program, for some comparisons across versions 2019, 2020, 2021, and 2022, each running on the same A100 hardware. Please note that for the multi-GPU results we tune for the optimal number of MPI tasks (in particular, running 2 MPI tasks per GPU can often be faster than a single task); a sketch of such a launch is below.
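
For example, on a 64-core node with 4 GPUs, running 2 ranks per GPU can be tried with something like the following (the rank/thread counts, the environment variable for GROMACS 2022, and the GPU task mapping are illustrative starting points to tune, not fixed recommendations):

export GMX_ENABLE_DIRECT_GPU_COMM=1   # enable GPU direct communication (2022-series variable)
gmx mdrun -ntmpi 8 -ntomp 8 -nb gpu -bonded gpu -pme gpu -npme 1 -update gpu -gputasks 00112233 -deffnm md

Here -gputasks assigns the 8 GPU tasks (7 PP plus 1 PME) to GPU IDs 0-3, two tasks per GPU.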

Best regards,

Alan

thank you!

Hi,

I tried GROMACS 2021.6 built with gcc 10.3.0 and CUDA 11.1.
The system consists of 81,743 atoms (slightly smaller than ADH).
I used an AMD EPYC 7763 (64 cores) and 4x NVIDIA A100-SXM4 GPUs.

With a simple:
$ gmx mdrun -nt 64 -pin on -v -deffnm mdtest
performance: 93 ns/day

With your setup/parameters, using 64 physical cores:
$ gmx mdrun -v -pin on -ntmpi 4 -ntomp 16 -nb gpu -bonded gpu -pme gpu -npme 1 -nstlist 400 -deffnm mdtest -nsteps 100000 -resetstep 90000 -noconfout
performance: 341 ns/day

Based on your benchmark, I would expect no less than 450 ns/day.

Would you be so kind as to comment on the following:

  • whether I have false expectations
  • how to debug this to increase performance

Thanks for your help and suggestions,
Tamas

Hi,

GROMACS performance is influenced not simply by the system size, but also quite strongly by specific details of the simulated system, such as its geometry and atom arrangement. So it’s possible that you are already getting good performance for your specific system. Here are some things to try/check:

  • Use the latest release of GROMACS for best performance (currently v2022.4, with v2023 out soon)
  • Make sure you are setting the environment variables as described in the blog
  • Make sure you are using h-bonds constraints, as described in the blog
  • Check the output (in the latest version) on GPU task assignment, which should look like this:

Mapping of GPU IDs to the 4 GPU tasks in the 4 ranks on this node:
PP:0,PP:1,PP:2,PME:3
PP tasks will do (non-perturbed) short-ranged and most bonded interactions on the GPU
PP task will update and constrain coordinates on the GPU
PME tasks will do all aspects on the GPU
GPU direct communication will be used between MPI ranks.

  • Use the NVIDIA Nsight Systems profiler tool to check that the GPU is being kept busy for the majority of the simulation (a sketch of a typical profiling command is below).
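
For example, a short profiling run with Nsight Systems might look like this (the trace options, step count, and output name are just illustrative):

nsys profile -t cuda,nvtx,osrt -o gmx_profile --stats=true gmx mdrun -ntmpi 4 -ntomp 16 -nb gpu -bonded gpu -pme gpu -npme 1 -nsteps 20000 -deffnm mdtest

Opening the resulting gmx_profile report in the Nsight Systems GUI lets you check for gaps in the CUDA kernel timeline where the GPU is idle.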

Best regards,

Alan

Dear Alan,

Thanks for the suggestions.

I had not thought about the effect of the simulated system’s properties on performance.
Are your tpr files not available for benchmarking? I think this would be very helpful, not only for me but also for the community.

All other points (e.g. setting the environment variables, the output of the GPU task assignment) seem to be OK.
NVIDIA Nsight Systems profiler tool: I might try it. However, it seems complicated to use, especially in an HPC environment.

Best regards,
Tamas

Hi,

The STMV and ADH input files are available for download as the Supplementary Information archive for the paper “Heterogeneous parallelization and acceleration of molecular dynamics simulations in GROMACS”.

Best Regards,

Alan