Dear all,
I am learning CUDA Fortran to develop a subroutine for a numerical computation that runs on an NVIDIA GPU.
I get an out-of-memory error when I call the subroutine a second time.
The first call returns successfully, but the second call fails with an out-of-memory error.
The arrays in the subroutine are allocatable device arrays, to keep memory consumption low,
and they should be deallocated automatically when they go out of scope.
However, memory consumption gradually increases during the execution of the subroutine.
Even after the first call has returned, the second call fails with an out-of-memory error at the beginning of
the subroutine, where the device arrays are allocated.
I also added explicit deallocate statements to the code, but that did not help.
I also checked the result of 'cudaMemGetInfo(free, total)' and the memory consumption shown by nvtop.
The reported free size does not decrease, but this number may differ from the amount of memory that can actually be allocated.
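
For reference, here is a minimal sketch of this kind of check. It is not the actual code: the program name meminfo_check, the array work, its size n, and the helper report_free are hypothetical, chosen only to illustrate querying cudaMemGetInfo before and after an allocate/deallocate pair on a device array.

```fortran
! Hypothetical sketch: all names (meminfo_check, work, n, report_free)
! are illustrative and do not come from the real code.
program meminfo_check
  use cudafor
  implicit none
  integer, parameter :: n = 64*1024*1024      ! 64 Mi default reals, about 256 MiB
  real, device, allocatable :: work(:)
  integer :: istat

  call report_free('before allocate')

  allocate(work(n), stat=istat)
  if (istat /= 0) error stop 'device allocate failed'
  call report_free('after allocate')

  deallocate(work)
  call report_free('after deallocate')

contains

  ! Print the free and total device memory reported by the CUDA runtime.
  subroutine report_free(label)
    character(*), intent(in) :: label
    integer(kind=cuda_count_kind) :: free, total
    integer :: ierr

    ! Wait for outstanding device work before querying memory.
    ierr = cudaDeviceSynchronize()
    ierr = cudaMemGetInfo(free, total)
    if (ierr /= cudaSuccess) then
      print *, trim(label), ': cudaMemGetInfo failed: ', &
               trim(cudaGetErrorString(ierr))
      return
    end if
    print '(a,": free = ",i0," MiB of ",i0," MiB")', &
          trim(label), free/2_8**20, total/2_8**20
  end subroutine report_free

end program meminfo_check
```

This should build with something like 'nvfortran -cuda meminfo_check.cuf'. Note that the free value counts all unused device memory, so it may not match the largest single block that can be allocated.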
The code uses cuBLAS and cutensorex.
compute-sanitizer --tool memcheck does not report any memory leaks up to the out-of-memory error.
How can I free the allocated device memory so that the subroutine can be called multiple times?
I could post the test code here, but it is rather long and the reproducer needs a sizable input data set,
so I would prefer to share the code through a different channel.
Best,
Ken-Ichi
System/environment information follows:
% cat /etc/redhat-release
Fedora release 39 (Thirty Nine)
% nvfortran --version
nvfortran 25.3-0 64-bit target on x86-64 Linux -tp skylake-avx512
NVIDIA Compilers and Tools
Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
% nvidia-smi
Thu Apr 24 16:46:32 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.135                Driver Version: 550.135        CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX A5000               Off |   00000000:65:00.0 Off |                  Off |
| 30%   42C    P8             28W /  230W |       1MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+