cufftEstimate*() memory consumption

Hello everyone,
I'm new to the cuFFT library. I'm working on a project in which I need to estimate the size of the work area needed before computing the FFT of an array. The documentation says:

“During plan execution, cuFFT requires a work area for temporary storage of intermediate results. The cufftEstimate*() calls return an estimate for the size of the work area required, given the specified parameters, and assuming default plan settings.”

But it doesn't mention anything about the memory consumption of these calls themselves. I wrote this simple piece of code:

#include <cuda_runtime.h>
#include <cufft.h>
#include <cufftXt.h>
#include <stdio.h>

int main(){
    size_t free_mem;
    size_t work_area;
    cudaMemGetInfo(&free_mem, NULL);                        // free device memory before the estimate call
    printf("free_mem: %zu\n", free_mem);
    cufftEstimate1d(98000000, CUFFT_R2C, 1, &work_area);    // estimated work area for a 98000000-point R2C transform
    printf("work_area: %zu\n", work_area);
    cudaMemGetInfo(&free_mem, NULL);                        // free device memory after the estimate call
    printf("free_mem: %zu\n", free_mem);
    return 0;
}

The output I get is:

free_mem: 1780219904
work_area: 784000512
free_mem: 1751908352

My question is: why is free memory reduced by about 30MB after calling cufftEstimate1d()? I have another program, and putting this piece of code into it and running it causes a 1GB reduction in free memory! Am I missing something? Does this really have something to do with cufftEstimate1d()? For example, when I change nx from 98000000 to 98000001, the result I get is:

free_mem: 1762394112
work_area: 8
free_mem: 167247872

which is a very strange result.

The 30MB reduction is probably due to CUFFT library initialization.

The difference in work area sizes for the two cases may be due to the fact that cuFFT uses different algorithms depending on the size of the transform, in particular on the prime factorization of the size. If the largest prime factor of the size is relatively small (say, 7 or less), then cuFFT has fast algorithms to handle those kinds of transforms, and those algorithms may require substantial extra memory to run fast.

If the largest prime factor is large (which might be the case with 98000001), then cuFFT must use a slower algorithm, and it's possible this algorithm requires little if any extra memory.

This is just a guess, however. Anyway, these don't look like problems to me.
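For what it's worth, the factorizations themselves are easy to check on the host. A minimal sketch (plain trial division, nothing cuFFT-specific, and the function name is just mine):

#include <stdio.h>

// Largest prime factor by trial division; fine for sizes in this range.
static long largest_prime_factor(long n){
    long largest = 1;
    for (long p = 2; p * p <= n; ++p){
        while (n % p == 0){ largest = p; n /= p; }
    }
    return (n > 1) ? n : largest;
}

int main(){
    printf("98000000 -> %ld\n", largest_prime_factor(98000000L));  // 7    (98000000 = 2^7 * 5^6 * 7^2)
    printf("98000001 -> %ld\n", largest_prime_factor(98000001L));  // 6073 (98000001 = 3^2 * 11 * 163 * 6073)
    return 0;
}

So 98000000 is made entirely of small primes, while 98000001 contains the factor 6073, which fits the guess above about the two sizes taking different code paths inside cuFFT.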

I'm not trying to communicate any heuristic for the memory required, and there is no point in trying to infer one. Use the estimation method provided by the API. The assumption that a pattern in a particular input parameter should give you an expectation about the memory required is not documented anywhere and may be incorrect.
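In code, "use the estimation method provided by the API" just means something like the following sketch. The feasibility check and the buffer-size arithmetic are my own illustration (standard R2C sizes: nx real inputs, nx/2+1 complex outputs), not a documented recipe:

#include <cuda_runtime.h>
#include <cufft.h>
#include <stdio.h>

int main(){
    const int nx = 98000000;             // example size, chosen only for illustration
    size_t estimate = 0, free_mem = 0, total_mem = 0;

    // Ask the API for the work-area estimate and check the status code.
    cufftResult r = cufftEstimate1d(nx, CUFFT_R2C, 1, &estimate);
    if (r != CUFFT_SUCCESS){
        fprintf(stderr, "cufftEstimate1d failed: %d\n", (int)r);
        return 1;
    }
    cudaMemGetInfo(&free_mem, &total_mem);

    // Rough feasibility check: the work area plus the input/output buffers
    // for an nx-point R2C transform must fit in what is currently free.
    size_t io_bytes = (size_t)nx * sizeof(cufftReal)
                    + ((size_t)nx / 2 + 1) * sizeof(cufftComplex);
    printf("estimate: %zu, buffers: %zu, free: %zu\n", estimate, io_bytes, free_mem);
    if (estimate + io_bytes > free_mem)
        printf("this transform probably won't fit on the device\n");
    return 0;
}

The point is simply to take whatever number the API hands back, check the cufftResult, and compare it against what the device currently has free, rather than reasoning from the transform size.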

Regarding the 1GB statement, I wouldn't try to explain the behavior of code you haven't shown. It may still be some aspect of initialization. The first use of CUDA or of library functions triggers various kinds of initialization, and that initialization carries overhead with it.
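If you want to see how much of the drop is one-time initialization rather than anything specific to cufftEstimate1d(), a sketch like this separates the two (the helper is mine, and the actual overhead is undocumented and varies with CUDA/cuFFT version):

#include <cuda_runtime.h>
#include <cufft.h>
#include <stdio.h>

// Print free device memory with a label (helper for illustration only).
static void report(const char *label){
    size_t free_mem = 0, total_mem = 0;
    cudaMemGetInfo(&free_mem, &total_mem);
    printf("%-26s free_mem: %zu\n", label, free_mem);
}

int main(){
    size_t work_area = 0;

    cudaFree(0);                                            // force CUDA context creation
    report("after context init");

    cufftEstimate1d(98000000, CUFFT_R2C, 1, &work_area);    // first call: may include one-time cuFFT init
    report("after 1st cufftEstimate1d");

    cufftEstimate1d(98000000, CUFFT_R2C, 1, &work_area);    // second call: per-call cost only
    report("after 2nd cufftEstimate1d");

    printf("work_area: %zu\n", work_area);
    return 0;
}

Whatever disappears between the first two readings but not between the last two is initialization overhead, not something the estimate call needs each time.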

Thanks for the answer.
About the 1GB statement: the second result, which shows a reduction of about 1GB in free memory, is for the same code I provided. The only difference, as mentioned, is that I incremented the input 98000000 by one and ran the code again. With your answer in mind, it is clear that the largest prime factor of 98000001 is greater than 7, so it makes sense that the work area size would be much larger than in the first case. Two problems, though: first, the result returned in work_area is obviously incorrect; second, why is that much memory consumed just by calling cufftEstimate1d()? 30MB for library initialization makes sense, but 1GB?

EDIT: Having read my post again, I understand why one wouldn't realize that the second result is for the same code. I apologize for the ambiguity.

That’s not a 1GB reduction in free memory. That’s 100MB.

And when I run your program on CUDA 10.0, I see a 30MB reduction in free memory whether I use 98000000 or 98000001.

$ cat t2.cu
#include<cuda_runtime.h>
#include<cufft.h>
#include<cufftXt.h>
#include<stdio.h>

int main(){
    size_t free_mem;
    size_t work_area;
    cudaMemGetInfo(&free_mem, NULL);
    printf("free_mem: %zu\n", free_mem);
    cufftResult r = cufftEstimate1d(980000001, CUFFT_R2C, 1, &work_area);
    printf("r = %d\n", (int)r);
    printf("work_area: %zu\n", work_area);
    cudaMemGetInfo(&free_mem, NULL);
    printf("free_mem: %zu\n", free_mem);
    return 0;
}
$ nvcc -o t2 t2.cu -lcufft
$ ./t2
free_mem: 31569018880
r = 0
work_area: 0
free_mem: 31535464448
$

And regarding the work area size, I didn't say it would be larger for the 98000001 case. Re-read my answer.

My suggestion would be that you not worry too much about trying to understand the logic of the memory allocations, since the details are entirely unpublished, and instead just do the work you’d like to do.