Nvc++ cufft memory leak

Hello everyone,

I have observed strange behaviour and a potential memory leak when using cufft together with nvc++: creating a cufftHandle appears to allocate memory that is occasionally not freed when the handle is destroyed.

I was able to reproduce this behaviour on two different test systems with nvc++ 23.1-0, under both CUDA 11.4 and CUDA 12.0.

cufftleak.cpp

#include <iostream>
#include <cufft.h>
#include <accel.h>

int main() {

    int length = 256;
    int batch = 1024;
    
    size_t mem = 0;
    int n = 0;

    while(1) {

        // Check for memory leak
        n++;
        if(mem != acc_get_free_memory()) {
            mem = acc_get_free_memory();
            std::cout << "free: " << double(mem)/(1024.*1024) << " MiB (n=" << n << ")" << std::endl;
            n = 0;
        }

        // Create and destroy a cufft plan; check the result so failures are visible
        cufftHandle plan;
        cufftResult res = cufftPlan1d(&plan, length, CUFFT_C2C, batch);
        if(res != CUFFT_SUCCESS) {
            std::cerr << "cufftPlan1d failed with error " << res << std::endl;
            return 1;
        }
        cufftDestroy(plan);
    }
            
}

Test A (mem leak):

> nvc++ -acc=gpu -cudalib=cufft -o testa cufftleak.cpp
> ./testa
free: 5238.06 MiB (n=1)
free: 5228.06 MiB (n=1)
free: 5226.06 MiB (n=142)
free: 5224.06 MiB (n=167)
free: 5222.06 MiB (n=167)
free: 5220.06 MiB (n=167)
free: 5218.06 MiB (n=167)
free: 5216.06 MiB (n=167)
free: 5214.06 MiB (n=167)
free: 5212.06 MiB (n=167)
...
free: 5032.06 MiB (n=167)
free: 5030.06 MiB (n=167)
free: 5028.06 MiB (n=167)
...

As you can see, the program loses 2 MiB of memory roughly every 167th cufftPlan call; n varies with the FFT size. I have tested different cufftPlan* and cufftMakePlan* variants, all with the same result.

Some debugging led to the conclusion that this behaviour is caused by the -cudalib=cufft flag. When it is replaced with -lcufft, everything works fine.

Test B (no mem leak):

> nvc++ -acc=gpu -lcufft -o testb cufftleak.cpp
> ./testb
free: 5230.69 MiB (n=1)
free: 5226.69 MiB (n=1)
free: 5234.06 MiB (n=268)
free: 5232.56 MiB (n=1104)
free: 5234.06 MiB (n=7)
free: 5233.69 MiB (n=1017)
free: 5234 MiB (n=1)
free: 5233.69 MiB (n=1)
free: 5233.38 MiB (n=1)
free: 5233.69 MiB (n=1)
free: 5234.06 MiB (n=4)
free: 5233.56 MiB (n=230)
free: 5232.56 MiB (n=1)
free: 5234.06 MiB (n=8)
...

Stranger still, any library linked via -cudalib seems to trigger the memory leak. The following compile command also leads to it:

Test C (mem leak):

> nvc++ -acc=gpu -lcufft -cudalib=cublas -o testc cufftleak.cpp

In Nsight Systems one can see that in test A, cuModuleLoad, cudaMalloc and cudaFree are called on every loop iteration. In test B there is an additional call to cuModuleUnload, but I don't think this is the reason for the memory leak, since the leak does not happen every iteration but only every few hundred iterations.

What is the difference between -cudalib=cufft and -lcufft? From my understanding, both should link the cufft library and behave the same.
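
If anyone wants to inspect this themselves: nvc++ has a -dryrun flag that prints the driver and link commands without executing them, which should reveal any extra -L search path that -cudalib injects ahead of the system paths. A sketch (the nvc++ lines are commented out so it can run where the HPC SDK is not installed; the grep pattern is only illustrative):

```shell
# Compare the link lines the two variants would generate.
# On a system with NVHPC installed, uncomment these:
#   nvc++ -acc=gpu -cudalib=cufft -dryrun cufftleak.cpp 2>&1 | grep -i 'cufft\|math_libs'
#   nvc++ -acc=gpu -lcufft        -dryrun cufftleak.cpp 2>&1 | grep -i 'cufft\|math_libs'
echo "look for a math_libs -L path in the -cudalib=cufft link line"
```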

Hi Josh,

For good or bad, I’m not able to recreate the issue on my H100 device. All three tests produce similar output, which is what I’d expect.

Here’s the output from my testa run, which I let go for about 5 minutes:

% nvc++ -acc=gpu -cudalib=cufft -o testa cufftleak.cpp -V23.1 -gpu=cuda12.0
% ./testa
free: 80634.8 MiB (n=1)
free: 80626.8 MiB (n=1)
free: 80626.8 MiB (n=40209)
^C -- killed after 5 minutes

I’m guessing the message at 40209 is due to a small memory difference being returned from the device.

Maybe something else is running on the device, or the free memory being returned by the driver is inconsistent?

What CUDA driver version are you using and what device?

-Mat

Hi Mat,

thank you for the reply and for testing the code.

My tests were performed on a workstation with an NVIDIA Quadro P4000.
CUDA version: 12.0
Driver version: 525.147.05
HPC SDK version: 22.11

Further tests revealed that the two variants link against different cufft library paths.

> ldd testa
...
libcufft.so.10 => /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcufft.so.10
...

which links to libcufft.so.10.9.0.58

> ldd testb
...
libcufft.so.10 => /lib/x86_64-linux-gnu/libcufft.so.10
...

which links to libcufft.so.10.6.0.107

I double-checked this by changing the symlink /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcufft.so.10 to point to libcufft.so.10.6.0.107, after which the memory leak was no longer present in testa.

The memory leak seems to occur in cufft library version v10.9.0.58 (shipped with CUDA 11.8), but not in v10.6.0.107 (somewhere between CUDA 11.5 and 11.6).
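
For reference, a symlink chain like this can be resolved without guessing by following it with readlink -f. A small self-contained demonstration with throwaway links in a temp directory (the file names mirror the real ones, but the paths are illustrative):

```shell
# Recreate a cufft-style symlink chain in a temp dir and resolve it,
# the same way one would check the real libcufft.so.10 link.
dir=$(mktemp -d)
touch "$dir/libcufft.so.10.6.0.107"                  # pretend library file
ln -s libcufft.so.10.6.0.107 "$dir/libcufft.so.10"   # version symlink
readlink -f "$dir/libcufft.so.10"                    # prints the fully resolved name
rm -r "$dir"
```

On the real system, running readlink -f on the path reported by ldd shows immediately which fully versioned libcufft file each binary actually loads.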

The release notes for CUDA 11.8 mention a known issue:

cuFFT fails to deallocate some internal structures if the active CUDA context at program finalization is not the same used to create the cuFFT plan. This memory leak is constant per context, and will be fixed in an upcoming release.

which appears neither in the fixed issues nor in the known issues of the CUDA 12 release notes. This is the only hint of a memory leak that might have been fixed after CUDA 11.8.

Could you check whether your program is linked against an older or a newer cufft version? Then we can decide whether we need to upgrade our systems (which is kinda painful, since those systems are air-gapped) or switch to an older version.

OK, I’m able to recreate the issue with the CUDA 11.8 version of cufft, but not with 12.0, so I’m guessing they did indeed fix this issue.

We’ll be releasing NVHPC 24.1 soon, so if you decide to update, you might want to wait for it so you have the latest version.

Alternatively, you might try grabbing the cuFFT library from the CUDA 12.0 SDK and setting the environment variable NVHPC_CUDA_HOME to that CUDA install directory; NVHPC will then use this CUDA version. However, the convenience flag “-cudalib” won’t work in this case, and you’ll need to manually add the include (-I) and library (-L) paths as well as “-lcufft”.
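
The manual variant Mat describes would look roughly like this; the install path is an assumption (adjust it to wherever the CUDA 12.0 SDK actually lives), and the compile command is echoed rather than executed so the sketch runs anywhere:

```shell
# Point NVHPC at an external CUDA toolkit (hypothetical install path).
export NVHPC_CUDA_HOME=/usr/local/cuda-12.0

# -cudalib is unavailable in this mode, so the include/library paths and
# -lcufft are added by hand. Drop the leading echo to actually compile.
echo nvc++ -acc=gpu \
    -I"$NVHPC_CUDA_HOME/include" \
    -L"$NVHPC_CUDA_HOME/lib64" -lcufft \
    -o testa cufftleak.cpp
```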

Hello Mat,

Thank you very much for confirming this bug and suggesting some solutions. Knowing that this is fixed in CUDA 12 helps a lot.

Best regards
Josh
