GPU memory problem when running pmemd.cuda.MPI

Dear all, I recently tried to run multi-replica MD simulations on multiple GPUs with pmemd.cuda.MPI.
My system information:
OS: openSUSE 15.4, CUDA: 12.1, MPI: openmpi-4.1.4, AMBER22, GPU: 9 x NVIDIA A6000
36 replicas were run in parallel. When I checked the GPU usage with nvidia-smi, it showed that each of the 9 GPUs was shared by 4 jobs, with 944~990 MiB of GPU memory used per PID, which is normal. But GPU0 additionally listed 32 other PIDs (322 MiB of GPU memory each), and every one of these PIDs matches a PID running on GPU1~GPU8.
As a result, GPU1~GPU8 each use 3963 MiB of GPU memory, but GPU0 uses 14229 MiB. It seems that the first GPU needs a large amount of GPU memory for data exchange. I previously ran the same kind of calculation on older GPUs (NVIDIA RTX 2080 Ti on CentOS 7), and GPU0 showed no such extra memory usage.
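As a rough consistency check on the figures above, here is a minimal Python sketch. The variable names are only illustrative, and the inputs are the values reported by nvidia-smi in this post. It assumes the extra GPU0 usage is simply the 322 MiB entry repeated for every PID that belongs to another GPU.

```python
# Values reported by nvidia-smi in this post (MiB); names are illustrative.
GPUS = 9
REPLICAS = 36
REPORTED_PER_GPU = 3963       # total reported on each of GPU1..GPU8
EXTRA_PER_REMOTE_PID = 322    # extra GPU0 entry for each PID from another GPU
REPORTED_GPU0 = 14229         # total reported on GPU0

replicas_per_gpu = REPLICAS // GPUS            # jobs sharing each GPU
remote_pids = REPLICAS - replicas_per_gpu      # PIDs whose main GPU is not GPU0

# Assumption: GPU0 total = its own 4 jobs + one extra entry per remote PID.
estimated_gpu0 = REPORTED_PER_GPU + remote_pids * EXTRA_PER_REMOTE_PID

print(f"replicas per GPU: {replicas_per_gpu}")
print(f"remote PIDs on GPU0: {remote_pids}")
print(f"estimated GPU0 usage: {estimated_gpu0} MiB")
print(f"difference from reported: {estimated_gpu0 - REPORTED_GPU0} MiB")
```

Under that assumption the estimate lands within a few tens of MiB of the reported GPU0 total, so the extra usage appears to be fully accounted for by the 32 additional per-PID entries.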
Are there any suggestions for avoiding this extra memory usage on GPU0?
Thanks …

You may get better help with this by asking on a forum for AMBER users.

Yes, I also posted on the AMBER forum. Hope either NVIDIA experts or AMBER developers can solve this issue.

Robert_Crovella via NVIDIA Developer Forums <> wrote on Fri, Jan 19, 2024 at 01:19: