Internal details/limitations of cuFFT, general questions

Good morning, all.

I wrote code which uses cuFFT for 1D operations and it works as it should, but I have some questions about how it works internally. Maybe some of you know the answers?

  • The second argument of cufftPlan1d() is “int nx”, the length of the transform. Is there any reason why it is int, and not unsigned int or size_t?

  • Do you manage to get any transform bigger than 2^28 (268435456) to run? That is the biggest I can get to run successfully on a 1080 Ti. Counting bytes for an R2C transform of that length, the float input array is 1 GB and the cufftComplex output array is 2 GB. When I try a length of 2^29 (536870912), for which the arrays total 6 GB, the operation fails at the allocation step. Is there an internal limit on 1D transforms, or is it something else? I have run another operation that uses almost 10 GB of the 11 GB on the 1080 Ti without issues.

  • Section 2.2.1 of the cuFFT documentation, https://docs.nvidia.com/cuda/pdf/CUFFT_Library.pdf, suggests first creating a plan and THEN allocating the memory, which seems to be the opposite of, for example, FFTW. Do you know of any drawback to doing it the other way around? And what about cleanup: destroy the plan first and then cudaFree the arrays? My program currently allocates memory and then creates the plan, and on cleanup destroys the plan and then deallocates the memory.

  • We don’t launch cuFFT with explicit kernel launch syntax, so how does it do its parallelization? I have the same question about cuRAND, which we also call without a kernel launch configuration.

If any of you know the answers to these, I’d like to hear from you.
Thanks a lot for your time and for the assistance you’ve provided many times.

There are other planning functions that can be used for larger array sizes. Read the docs. For instance:

https://docs.nvidia.com/cuda/cufft/index.html#unique_1980839421

Ordinary CUFFT usage will involve the planning step doing an underlying allocation. I don't think the size of that allocation is published, but you can break out the memory allocation step separately and manage it yourself. In doing so you can get an idea of how much temporary memory CUFFT requires for various operations. Read the docs. If your array sizes total 6GB, I would assume you are running out of memory due to the temporary allocations that CUFFT makes/requires.
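To illustrate, here is a sketch of breaking out the allocation using the cuFFT extensible-plan API (cufftCreate / cufftSetAutoAllocation / cufftMakePlan1d / cufftSetWorkArea); error checking is omitted for brevity, and it needs a CUDA-capable GPU to run:

```c
#include <cufft.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    cufftHandle plan;
    size_t workSize = 0;

    cufftCreate(&plan);
    cufftSetAutoAllocation(plan, 0);            /* we will supply the work area */
    cufftMakePlan1d(plan, 1 << 28, CUFFT_R2C, 1, &workSize);
    printf("cuFFT temporary work area: %zu bytes\n", workSize);

    void *work = NULL;
    cudaMalloc(&work, workSize);                /* allocate it ourselves ... */
    cufftSetWorkArea(plan, work);               /* ... and hand it to the plan */

    /* ... allocate input/output, call cufftExecR2C(plan, in, out), etc. ... */

    cufftDestroy(plan);
    cudaFree(work);
    return 0;
}
```

Printing workSize this way is one direct answer to the "how much temporary memory does it need" question.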

If you are managing the memory allocation yourself, it should not matter whether you do the planning before or after the memory allocation, as long as you allocate sufficient memory.

CUFFT calls are calls to functions in a C library (the cufft library, i.e. libcufft). The library functions generally make kernel calls, and may do other CUDA runtime activity as well. You can use a profiler to get an idea of what is happening under the hood.
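For example, running your binary under nvprof (the command-line profiler of that CUDA era) will list the kernels that libcufft launches on your behalf; your executable name here is a placeholder:

```shell
# Show every kernel launch and memcpy the cuFFT/cuRAND calls trigger
nvprof --print-gpu-trace ./my_fft_app
```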

Thanks for these clarifications, txbob.

Following the document you linked, it looks like I can use the cufftMakePlanMany64() function, passing NULL for inembed and onembed so it behaves like an ordinary FFT call, except that it handles the larger 64-bit size types.
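A minimal sketch of that call for a single 1D R2C transform, assuming NULL embed arrays, unit stride, and a plan created beforehand with cufftCreate() (the helper name is my own, and error checks are omitted):

```c
#include <cufft.h>

/* Build a 64-bit 1D R2C plan of length n; writes the work-area size. */
cufftResult make_big_r2c_plan(cufftHandle plan, long long n, size_t *workSize)
{
    long long dims[1] = { n };
    /* NULL inembed/onembed => contiguous layout, like a basic plan,
       but with 64-bit sizes so lengths beyond INT_MAX are accepted. */
    return cufftMakePlanMany64(plan, 1, dims,
                               NULL, 1, n,          /* inembed, istride, idist */
                               NULL, 1, n / 2 + 1,  /* onembed, ostride, odist */
                               CUFFT_R2C, 1, workSize);
}
```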

As for the other answers, thanks again. They give me enough to dig a bit further, such as checking the extra return values and running the profiler.