Hello everyone,
I have observed strange behaviour and a potential memory leak when using cuFFT together with nvc++. It seems that creating a cufftHandle allocates some memory which is occasionally not freed when the handle is destroyed.
I was able to reproduce this behaviour on two different test systems using nvc++ 23.1-0 with CUDA 11.4 and CUDA 12.0.
cufftleak.cpp
#include <iostream>
#include <cufft.h>
#include <accel.h>  // for acc_get_free_memory()

int main() {
  int length = 256;
  int batch = 1024;
  size_t mem = 0;
  int n = 0;
  while (1) {
    // Check for memory leak: report whenever free device memory changes
    n++;
    size_t cur = acc_get_free_memory();
    if (mem != cur) {
      mem = cur;
      std::cout << "free: " << double(mem)/(1024.*1024) << " MiB (n=" << n << ")" << std::endl;
      n = 0;
    }
    // Create and destroy a cuFFT handle
    cufftHandle plan;
    cufftResult res = cufftPlan1d(&plan, length, CUFFT_C2C, batch);
    if (res != CUFFT_SUCCESS) {
      std::cerr << "cufftPlan1d failed with error " << res << std::endl;
      return 1;
    }
    cufftDestroy(plan);
  }
}
Test A (mem leak):
> nvc++ -acc=gpu -cudalib=cufft -o testa cufftleak.cpp
> ./testa
free: 5238.06 MiB (n=1)
free: 5228.06 MiB (n=1)
free: 5226.06 MiB (n=142)
free: 5224.06 MiB (n=167)
free: 5222.06 MiB (n=167)
free: 5220.06 MiB (n=167)
free: 5218.06 MiB (n=167)
free: 5216.06 MiB (n=167)
free: 5214.06 MiB (n=167)
free: 5212.06 MiB (n=167)
...
free: 5032.06 MiB (n=167)
free: 5030.06 MiB (n=167)
free: 5028.06 MiB (n=167)
...
As you can see, roughly every 167 cufftPlan calls the free device memory drops by another 2 MiB. The value of n varies with the FFT size. I have tested different cufftPlan* and cufftMakePlan* variants, all with the same result.
Some debugging led me to conclude that this behaviour is caused by the -cudalib=cufft flag. When it is replaced with -lcufft, everything works fine.
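To put the Test A numbers into per-call terms, here is a minimal C++ sketch that averages the drop in free memory over the iterations between readings. The struct and function names are hypothetical helpers, not part of cuFFT or the OpenACC runtime, and the calculation assumes each printed n counts the create/destroy cycles since the previous reading, which is how the loop above resets it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One line of the test program's output: free device memory in MiB and the
// number of loop iterations since the previous change (the "n=" field).
// Hypothetical helper type for this analysis only.
struct FreeMemSample {
    double free_mib;
    int n;
};

// Average bytes lost per create/destroy iteration between the first and the
// last sample. Iteration counts are summed from the second sample on, because
// each sample's n counts the iterations that led up to that reading.
double bytes_lost_per_iteration(const std::vector<FreeMemSample>& samples) {
    assert(samples.size() >= 2);
    const double mib = 1024.0 * 1024.0;
    double drop_bytes = (samples.front().free_mib - samples.back().free_mib) * mib;
    long iterations = 0;
    for (std::size_t i = 1; i < samples.size(); ++i) {
        iterations += samples[i].n;
    }
    assert(iterations > 0);
    return drop_bytes / static_cast<double>(iterations);
}
```

With the Test A readings 5226.06 (n=142), 5224.06 (n=167) and 5222.06 (n=167), the 4 MiB drop over 334 iterations works out to roughly 12,558 bytes, i.e. about 12 KiB per cufftPlan1d/cufftDestroy pair. Since the drop arrives in 2 MiB steps, the memory is presumably reserved in larger chunks, so this is only an average.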
Test B (no mem leak):
> nvc++ -acc=gpu -lcufft -o testb cufftleak.cpp
> ./testb
free: 5230.69 MiB (n=1)
free: 5226.69 MiB (n=1)
free: 5234.06 MiB (n=268)
free: 5232.56 MiB (n=1104)
free: 5234.06 MiB (n=7)
free: 5233.69 MiB (n=1017)
free: 5234 MiB (n=1)
free: 5233.69 MiB (n=1)
free: 5233.38 MiB (n=1)
free: 5233.69 MiB (n=1)
free: 5234.06 MiB (n=4)
free: 5233.56 MiB (n=230)
free: 5232.56 MiB (n=1)
free: 5234.06 MiB (n=8)
...
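The difference between Test A and Test B shows up in the shape of the readings: in A, free memory only ever goes down, while in B it moves up and down between about 5232.5 and 5234 MiB. A simple heuristic for telling these apart is sketched below; the function name and the min_drop_mib threshold are illustrative assumptions, not a standard leak test.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true if the free-memory readings (in MiB) never go up and the total
// drop exceeds min_drop_mib. A steady leak (Test A) looks like this; normal
// allocator churn (Test B) moves up and down around a stable level.
// Hypothetical helper; the threshold is a tuning assumption.
bool looks_like_steady_leak(const std::vector<double>& free_mib, double min_drop_mib) {
    assert(!free_mib.empty());
    for (std::size_t i = 1; i < free_mib.size(); ++i) {
        if (free_mib[i] > free_mib[i - 1]) {
            return false;  // memory was returned at some point: not a monotonic drift
        }
    }
    return free_mib.front() - free_mib.back() > min_drop_mib;
}
```

Fed the Test A readings from 5226.06 down to 5212.06 MiB with a 4 MiB threshold, it reports a steady leak; fed the Test B readings, it does not, because free memory rises again (for example from 5232.56 back to 5234.06 MiB).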
Stranger still, linking any library via -cudalib seems to trigger the memory leak. The following compile command, which links cuFFT with -lcufft but adds -cudalib=cublas, also leaks memory:
Test C (mem leak):
> nvc++ -acc=gpu -lcufft -cudalib=cublas -o testc cufftleak.cpp
In Nsight Systems one can see that in test A, cuModuleLoad, cudaMalloc and cudaFree are called in every iteration. In test B there is an additional call to cuModuleUnload, but I don't think this is the reason for the memory leak, since it does not happen in every iteration but only once every several hundred iterations.
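Rather than eyeballing the Nsight Systems timeline, the load/unload balance can be counted directly. The sketch below assumes the CUDA API call names have already been extracted from the trace into a list of strings (how that export is done is left out here, and the function name is hypothetical); it returns how many modules remain loaded after replaying the calls.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Net number of modules still loaded after replaying a sequence of CUDA API
// call names (assumed to be transcribed from an Nsight Systems trace).
// Only cuModuleLoad and cuModuleUnload are counted; all other calls are ignored.
long net_loaded_modules(const std::vector<std::string>& calls) {
    long net = 0;
    for (const std::string& name : calls) {
        if (name == "cuModuleLoad") {
            ++net;
        } else if (name == "cuModuleUnload") {
            --net;
            assert(net >= 0 && "cuModuleUnload without a matching cuModuleLoad");
        }
    }
    return net;
}
```

For a trace following the Test A pattern (cuModuleLoad, cudaMalloc, cudaFree per iteration), the count grows by one per iteration; if each Test B iteration also contains a cuModuleUnload, it returns to zero. Comparing how fast this count grows with how often free memory drops is one way to test the hypothesis either way.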
What is the difference between -cudalib=cufft and -lcufft? From my understanding, both should link the cuFFT library and behave the same.