Good morning, all.
I wrote code that uses cuFFT for 1D transforms and it works as it should, but a few questions about its inner workings came up along the way. Maybe you know the answers to some of these?
The second argument of cufftPlan1d() is “int nx”, the length of the transform. Is there any reason it is int rather than unsigned int or size_t?
Has anyone managed to run a transform larger than 2^28 (268435456)? That is the biggest I can get to run successfully on a 1080 Ti. As far as byte counting goes for an R2C, the float array (input) is 1 GB and the cufftComplex array is 2 GB. When I try a length of 2^29 (536870912), for which the arrays total 6 GB, the operation fails at allocation. Is there an internal limit on 1D transforms, or is it something else? Another operation of mine uses almost 10 GB of the 1080 Ti's 11 GB without issues.
Section 2.2.1 of the cuFFT documentation, https://docs.nvidia.com/cuda/pdf/CUFFT_Library.pdf, suggests first creating the plan and THEN allocating the memory, which seems to be the opposite of, for example, FFTW. Do you know of any drawback to doing the opposite? And what about cleanup: destroy the plan first and then cudaFree the arrays? My program currently allocates the memory and then creates the plan, and on shutdown destroys the plan and then deallocates the memory.
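To be concrete, this is the ordering I read the documentation as recommending. It is just my own sketch of that ordering, not code taken from the docs, and error checking is omitted:

```c
#include <cufft.h>
#include <cuda_runtime.h>

int main(void)
{
    const int nx = 1 << 20;
    cufftHandle plan;
    float *d_in;
    cufftComplex *d_out;

    /* 1. Create the plan first, as section 2.2.1 suggests... */
    cufftPlan1d(&plan, nx, CUFFT_R2C, 1);

    /* 2. ...then allocate the data arrays. */
    cudaMalloc((void **)&d_in,  sizeof(float) * nx);
    cudaMalloc((void **)&d_out, sizeof(cufftComplex) * (nx / 2 + 1));

    cufftExecR2C(plan, d_in, d_out);

    /* Cleanup: I currently destroy the plan first and then free the
       arrays, which is the part I am asking about. */
    cufftDestroy(plan);
    cudaFree(d_out);
    cudaFree(d_in);
    return 0;
}
```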
We don't write kernel functions to launch a cuFFT operation, so how does it do its parallelization? I have the same question about cuRAND, which we also call without any kernel launch configuration.
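My current mental model, which I'd like confirmed, is that cufftExec*() and the cuRAND host-API calls launch the libraries' own internally chosen kernels (with grid/block configuration picked by the library), and that the only control we get over this is which stream they are enqueued on, via cufftSetStream()/curandSetStream(). A sketch of what I mean:

```c
#include <cufft.h>
#include <curand.h>
#include <cuda_runtime.h>

int main(void)
{
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    cufftHandle plan;
    cufftPlan1d(&plan, 1 << 20, CUFFT_R2C, 1);
    cufftSetStream(plan, stream);   /* cuFFT's internal kernels go on this stream */

    curandGenerator_t gen;
    curandCreateGenerator(&gen, CURAND_RNG_PSEUDO_DEFAULT);
    curandSetStream(gen, stream);   /* same for cuRAND's internal kernels */

    /* A subsequent cufftExecR2C() or curandGenerateUniform() would now
       enqueue the libraries' own kernels on `stream`; we never write a
       <<<...>>> launch ourselves. */

    curandDestroyGenerator(gen);
    cufftDestroy(plan);
    cudaStreamDestroy(stream);
    return 0;
}
```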
If you know the answer to any of these, I'd like to hear from you.
Thanks a lot for your time and for the help you have provided many times.