I’m afraid I’ve run into something rather strange. I’ve been running GROMACS on a GPU server, and the performance was quite good. However, a few days ago the following fatal error suddenly started to occur:
Program: gmx mdrun, version 2019.4
Source file: src/gromacs/gpu_utils/gpu_utils.cu (line 100)
Fatal error:
cudaFuncGetAttributes failed: out of memory
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
I can run other applications on the GPUs, and the other GROMACS modules still work, but I can no longer run GROMACS with GPU support. Sorry for posting this problem here, but it looks more like something is wrong with CUDA on the server (perhaps access from GROMACS is denied?), since I have reinstalled GROMACS and still get the same error.
Please help me solve this problem! (Unfortunately, I do not have permission to reboot the server.) The GPU information is below:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  On   | 00000000:04:00.0 Off |                    0 |
| N/A   30C    P0    32W / 250W |  16008MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-PCIE...  On   | 00000000:06:00.0 Off |                    0 |
| N/A   30C    P0    27W / 250W |     10MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla P100-PCIE...  On   | 00000000:07:00.0 Off |                    0 |
| N/A   30C    P0    32W / 250W |  16063MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla P100-PCIE...  On   | 00000000:08:00.0 Off |                    0 |
| N/A   32C    P0    28W / 250W |     10MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Quadro P4000        On   | 00000000:0B:00.0 Off |                  N/A |
| 46%   23C    P8     8W / 105W |     12MiB /  8119MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     20497      C   /usr/bin/python3                            5861MiB |
|    0     24503      C   /usr/bin/python3                           10137MiB |
|    2     23162      C   /home/appuser/Miniconda3/bin/python        16049MiB |
+-----------------------------------------------------------------------------+
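Reading the output above, GPUs 0 and 2 are almost completely filled by Python processes, while GPUs 1, 3 and 4 are essentially idle. As a minimal sketch of that tally (the memory figures are copied by hand from the output above; the 1 GiB threshold and the helper names `free_memory` and `usable_gpus` are my own assumptions, not anything from GROMACS or CUDA):

```python
# Memory figures copied from the nvidia-smi output above: (used MiB, total MiB).
GPU_MEMORY_MIB = {
    0: (16008, 16280),
    1: (10, 16280),
    2: (16063, 16280),
    3: (10, 16280),
    4: (12, 8119),
}

# Hypothetical threshold: treat a GPU as busy if less than 1 GiB is free.
MIN_FREE_MIB = 1024


def free_memory(table):
    """Return {gpu_index: free MiB} for every GPU in the table."""
    return {idx: total - used for idx, (used, total) in table.items()}


def usable_gpus(table, min_free=MIN_FREE_MIB):
    """Return the sorted indices of GPUs with at least min_free MiB free."""
    free = free_memory(table)
    return sorted(idx for idx, mib in free.items() if mib >= min_free)


if __name__ == "__main__":
    for idx, mib in sorted(free_memory(GPU_MEMORY_MIB).items()):
        print(f"GPU {idx}: {mib} MiB free")
    ids = ",".join(str(i) for i in usable_gpus(GPU_MEMORY_MIB))
    # Candidate value for the standard CUDA environment variable.
    print(f"CUDA_VISIBLE_DEVICES={ids}")
```

If the out-of-memory error comes from GROMACS touching the nearly full GPUs, which I have not confirmed, restricting the visible devices with `CUDA_VISIBLE_DEVICES` might be worth trying. Note that CUDA's device numbering is not guaranteed to match the nvidia-smi indices unless `CUDA_DEVICE_ORDER=PCI_BUS_ID` is also set.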