cudaFuncGetAttributes failed: out of memory

I’m afraid I’ve run into something rather strange. I’ve been using GROMACS on a GPU server and the performance was quite good. However, a few days ago the following fatal error suddenly started appearing:


Program: gmx mdrun, version 2019.4
Source file: src/gromacs/gpu_utils/gpu_utils.cu (line 100)

Fatal error:
cudaFuncGetAttributes failed: out of memory

For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors

I can run other GPU applications, and the other GROMACS modules still work, but I can no longer run GROMACS on the GPU. Sorry for posting this problem here, but it looks more like something wrong with CUDA on the server (is GROMACS being denied access?), since I’ve reinstalled GROMACS and still get the same error.

Please help me solve this problem! (Unfortunately, I do not have permission to reboot the server.) The GPU information is below:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01      Driver Version: 440.33.01      CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  On   | 00000000:04:00.0 Off |                    0 |
| N/A   30C    P0    32W / 250W |  16008MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-PCIE...  On   | 00000000:06:00.0 Off |                    0 |
| N/A   30C    P0    27W / 250W |     10MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla P100-PCIE...  On   | 00000000:07:00.0 Off |                    0 |
| N/A   30C    P0    32W / 250W |  16063MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla P100-PCIE...  On   | 00000000:08:00.0 Off |                    0 |
| N/A   32C    P0    28W / 250W |     10MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Quadro P4000        On   | 00000000:0B:00.0 Off |                  N/A |
| 46%   23C    P8     8W / 105W |     12MiB /  8119MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     20497      C   /usr/bin/python3                            5861MiB |
|    0     24503      C   /usr/bin/python3                           10137MiB |
|    2     23162      C   /home/appuser/Miniconda3/bin/python        16049MiB |
+-----------------------------------------------------------------------------+

Given that this problem has persisted for a few days already, have you tried requesting advice on the GROMACS mailing list?

I don’t know GROMACS’s GPU usage model in a multi-GPU environment. The output of nvidia-smi shows that pretty much all of the GPU memory on GPUs 0 and 2 has been grabbed by other compute processes, and a straightforward assumption would be that this leaves insufficient memory to run GROMACS on them. If these are zombie processes, you would want to kill them. If these are legitimate active processes, I would try restricting GROMACS to GPUs 1 and 3, which appear to be unused.
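To make the memory situation concrete, here is a minimal standalone sketch (not GROMACS code, just a hypothetical diagnostic I would try) that queries the free memory on each visible device via the CUDA runtime API. On a device that is already nearly full, even the implicit context creation behind the first runtime call can fail with an out-of-memory error, which would be consistent with GROMACS failing on cudaFuncGetAttributes.

```cpp
// free_mem_check.cu -- hypothetical standalone check, not part of GROMACS.
// Prints free/total memory for every visible GPU. On a device whose memory
// is nearly exhausted, creating a context for the query can itself fail
// with an out-of-memory error, similar to what GROMACS reports.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaSetDevice(dev);
        size_t freeBytes = 0, totalBytes = 0;
        err = cudaMemGetInfo(&freeBytes, &totalBytes);  // creates a context on this device
        if (err != cudaSuccess) {
            std::printf("GPU %d: %s\n", dev, cudaGetErrorString(err));
            cudaGetLastError();  // clear the error state before trying the next device
            continue;
        }
        std::printf("GPU %d: %zu MiB free of %zu MiB\n",
                    dev, freeBytes >> 20, totalBytes >> 20);
    }
    return 0;
}
```

Building it with nvcc (e.g. nvcc -o free_mem_check free_mem_check.cu) and running it would show whether GPUs 0 and 2 even allow a new context, or whether only GPUs 1, 3, and 4 have usable headroom.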

I’ve tried restricting my tasks to the other GPUs but still get the same error. I’ve also sent an email to the GROMACS mailing list but have received no replies yet. As you mentioned, maybe it’s related to GROMACS’s GPU usage model in a multi-GPU environment. Still asking for help…

I think I’ve temporarily solved this problem. Only when I use CUDA_VISIBLE_DEVICES to hide GPUs 0 and 2 can I run GROMACS smoothly. There may be a bug in GROMACS’s GPU usage model in a multi-GPU environment, as njuffa mentioned.
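For what it’s worth, the effect of that workaround can be illustrated with another small, hypothetical enumerator (again not GROMACS code): with CUDA_VISIBLE_DEVICES=1,3 the runtime only ever sees the two idle P100s, re-indexed as devices 0 and 1, so no context is opened on the full GPUs at all.

```cpp
// list_visible.cu -- hypothetical sketch: print the devices this process can see.
// Run under CUDA_VISIBLE_DEVICES=1,3 only the two idle P100s are enumerated,
// re-indexed as devices 0 and 1.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        std::fprintf(stderr, "no usable CUDA devices are visible\n");
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) {
            continue;  // skip devices we cannot query
        }
        std::printf("visible device %d: %s (PCI bus %02x, device %02x)\n",
                    dev, prop.name, prop.pciBusID, prop.pciDeviceID);
    }
    return 0;
}
```

Launching GROMACS the same way (e.g. CUDA_VISIBLE_DEVICES=1,3 gmx mdrun …, with whatever mdrun options your run needs) keeps the full GPUs out of its view entirely.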