Maximizing GROMACS Throughput with Multiple Simulations per GPU Using MPS and MIG

Hello Dr Alan,

I appreciate your response to my queries.

1)
No, I am not setting the CUDA_VISIBLE_DEVICES environment variable.
(I did, however, also try running mdrun with it set as described in your blog.)

The simulation.sh file contains only:

module load apps/gromacs/2021.4/gnu
export OMP_NUM_THREADS=1

mpirun -np 1 gmx_mpi mdrun -v -s md.tpr -o md.trr -x md.xtc -cpo md.cpt -e md.edr -g md.log -c md.gro -ntomp 1 -nstlist 150 -nb gpu -bonded gpu -pme gpu -update gpu

2)
I have tried launching jobs with multiple GPUs using the CUDA_VISIBLE_DEVICES variable. This worked as expected, without errors. The simulations ran on GPU 0 or GPU 1, according to the CUDA_VISIBLE_DEVICES value set for gmx mdrun.
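
A minimal sketch of this kind of per-GPU pinning, for illustration only. It assumes two jobs in hypothetical directories run_gpu0 and run_gpu1, each holding its own md.tpr; the helper name launch_on_gpu and the DRY_RUN switch are illustrative and not part of the actual setup. By default it only prints the commands it would run:

```shell
#!/bin/sh
# Hypothetical helper: print (or run) one mdrun launch pinned to a single GPU.
# GPU ids 0 and 1 and the mdrun offload flags come from the post above;
# the run directories and the -deffnm shorthand are assumptions.
launch_on_gpu() {
    gpu_id="$1"    # GPU index exposed to this process via CUDA_VISIBLE_DEVICES
    run_dir="$2"   # hypothetical per-job directory containing md.tpr
    cmd="env CUDA_VISIBLE_DEVICES=${gpu_id} mpirun -np 1 gmx_mpi mdrun -deffnm md -ntomp 1 -nb gpu -bonded gpu -pme gpu -update gpu"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Dry run (default): show the command instead of executing it.
        echo "cd ${run_dir} && ${cmd}"
    else
        # Real run: start the job in the background from its own directory.
        (cd "${run_dir}" && ${cmd}) &
    fi
}

launch_on_gpu 0 run_gpu0
launch_on_gpu 1 run_gpu1
wait   # returns once any background jobs have finished
```

Setting DRY_RUN=0 would actually start both jobs; each process then sees only the one GPU named in its CUDA_VISIBLE_DEVICES value.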

Some observations:

  1. Once someone has activated MPS on the first GPU, no user is able to run on the second GPU with -nb gpu -bonded gpu -pme gpu -update gpu.
  2. When MPS is already running on the first GPU and a job intended for the second GPU omits the -nb gpu -bonded gpu -pme gpu -update gpu flags, GROMACS runs on CPUs only. In that case we don’t see the “no GPU is detected” error.

I am attaching the tpr file in case you would like to test it at your end.
md.tpr (6.1 MB)

Thank you,
Akshay.