GPU memory problem when running pmemd.cuda.MPI

zhqhu.sioc · January 15, 2024, 6:15am

Dear all, I recently tried to run multiple-replica MD simulations across multiple GPUs with pmemd.cuda.MPI.
My system information:
OS: openSUSE 15.4, CUDA: 12.1, MPI: openmpi-4.1.4, AMBER22, GPU: NVIDIA A6000 × 9
I ran 36 replicas in parallel. When I checked GPU usage with nvidia-smi, it showed that each of the 9 GPUs hosted 4 jobs, with 944~990 MiB of GPU memory per PID, which is normal. However, GPU0 also listed 32 additional PIDs (322 MiB of GPU memory each), and every one of these PIDs matches a PID running on GPU1 ~ GPU8.
As a result, GPU1 ~ GPU8 each use 3963 MiB of GPU memory, while GPU0 uses 14229 MiB. It seems that the first GPU needs a large amount of GPU memory for data exchange. I previously ran the same kind of calculation on older GPUs (NVIDIA RTX 2080 Ti on CentOS 7), and there was no such extra memory usage on GPU0.
Is there any way to prevent this extra memory usage on GPU0?
Thanks …
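
As a rough cross-check of the figures quoted in the post above, the extra memory on GPU0 can be compared against the 32 additional PIDs. This is only an arithmetic sketch: the variable names are made up for illustration, and the numbers are copied from the post rather than read from nvidia-smi.

```python
# Figures quoted in the post (MiB); variable names are illustrative only.
n_gpus = 9
n_replicas = 36
gpu0_total_mib = 14229       # reported total for GPU0
other_gpu_total_mib = 3963   # reported total for each of GPU1 ~ GPU8
extra_pid_mib = 322          # reported per-PID usage of the extra entries on GPU0

# Each GPU hosts its own share of the replicas.
replicas_per_gpu = n_replicas // n_gpus                # 4
# PIDs that belong to the other GPUs but also appear on GPU0.
extra_pids_on_gpu0 = n_replicas - replicas_per_gpu     # 32

# Memory on GPU0 beyond what a "normal" GPU in this run uses.
excess_mib = gpu0_total_mib - other_gpu_total_mib      # 10266
excess_per_pid_mib = excess_mib / extra_pids_on_gpu0   # 320.8125
predicted_excess_mib = extra_pids_on_gpu0 * extra_pid_mib  # 10304

print(f"excess on GPU0: {excess_mib} MiB")
print(f"excess per extra PID: {excess_per_pid_mib:.1f} MiB")
print(f"predicted from 32 x 322 MiB: {predicted_excess_mib} MiB")
```

The excess works out to roughly 321 MiB per extra PID, which agrees with the 322 MiB entries listed for GPU0, so the additional usage appears to be fully accounted for by those 32 entries.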

Robert_Crovella · January 18, 2024, 5:18pm

You may get better help with this by asking on a forum for AMBER users.

zhqhu.sioc · January 19, 2024, 1:35am

Yes, I also posted on the AMBER forum. I hope either NVIDIA experts or AMBER developers can solve this issue.
