Performing 2D FFTs on large images


If I am trying to do FFT convolutions on large images, what is the best way to do this on a CUDA device? I am not an FFT expert, but the code I wrote basically does what the 2D convolution example does: FFT the kernel after padding it up to the image size, FFT the original image, multiply the two spectra, and do an inverse FFT.

How do I calculate how much device memory this will take? It seems to me that I have to count 2x my image size for the R2C plan, 2x for the C2R plan, plus 1x each for the data, the kernel, the padded data, and the padded kernel.
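To make that count concrete, here is a rough sketch of the arithmetic in Python. It assumes single-precision data, padding to image + kernel - 1 in each dimension, and one half-spectrum (R2C layout, W/2 + 1 complex columns) per padded buffer. The function name and the `workspace_bytes` parameter are made up for illustration; the plan work area is not computed here, since as far as I know it depends on the transform size and should be queried from cuFFT (for example with `cufftGetSize2d`) rather than guessed.

```python
def estimate_fft_conv_bytes(img_h, img_w, ker_h, ker_w, workspace_bytes=0):
    """Rough device-memory estimate for one FFT convolution (hypothetical helper).

    Assumes float32 real data and complex64 spectra. workspace_bytes is a
    placeholder for the plan work area, which should come from cuFFT itself.
    """
    # Padded size for a full linear convolution (no wrap-around).
    pad_h = img_h + ker_h - 1
    pad_w = img_w + ker_w - 1

    real_bytes = pad_h * pad_w * 4                 # one padded float32 buffer
    spectrum_bytes = pad_h * (pad_w // 2 + 1) * 8  # one R2C half-spectrum

    buffers = {
        "image": img_h * img_w * 4,
        "kernel": ker_h * ker_w * 4,
        "padded_image": real_bytes,
        "padded_kernel": real_bytes,
        "image_spectrum": spectrum_bytes,
        "kernel_spectrum": spectrum_bytes,
        "plan_workspace": workspace_bytes,
    }
    buffers["total"] = sum(buffers.values())
    return buffers


if __name__ == "__main__":
    est = estimate_fft_conv_bytes(5000, 5000, 31, 31)
    for name, size in est.items():
        print(f"{name:16s} {size / 2**20:10.1f} MiB")
```

Under these assumptions a 5000x5000 image with a 31x31 kernel comes to roughly 0.5 GB of buffers before any plan work area is added. Rounding the padded size up to an FFT-friendly length would increase that somewhat.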

This will very quickly limit how usable a GPU is for processing large images.

So what strategies are there for processing these images? Can you run this same process on tiles of the original image, and then just recombine the outputs at the end?
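For what it's worth, the usual name for the tiling idea is overlap-add (or its sibling, overlap-save). The tiles cannot simply be pasted back together, because each tile's convolution spills over its edges by the kernel size minus one; the overlapping borders have to be summed. Below is a minimal CPU sketch using NumPy rather than cuFFT, just to check the idea. The function names and the `tile` parameter are illustrative, not taken from any CUDA sample.

```python
import numpy as np


def fft_convolve_full(a, k):
    """Full linear 2D convolution of a with k via zero-padded real FFTs."""
    h = a.shape[0] + k.shape[0] - 1
    w = a.shape[1] + k.shape[1] - 1
    # rfft2 zero-pads both inputs to (h, w), so the product has no wrap-around.
    spec = np.fft.rfft2(a, s=(h, w)) * np.fft.rfft2(k, s=(h, w))
    return np.fft.irfft2(spec, s=(h, w))


def tiled_fft_convolve(image, kernel, tile=256):
    """Overlap-add: convolve each tile separately and sum the overlapping borders."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] + kh - 1, image.shape[1] + kw - 1))
    for y in range(0, image.shape[0], tile):
        for x in range(0, image.shape[1], tile):
            block = image[y:y + tile, x:x + tile]   # edge tiles may be smaller
            part = fft_convolve_full(block, kernel)  # (block + kernel - 1) in size
            # Each partial result extends past its tile; accumulate, do not overwrite.
            out[y:y + part.shape[0], x:x + part.shape[1]] += part
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((300, 200))
    ker = rng.random((9, 9))
    same = np.allclose(tiled_fft_convolve(img, ker, tile=64),
                       fft_convolve_full(img, ker))
    print("tiled result matches full FFT convolution:", same)
```

On a GPU the same structure would mean only one tile-sized set of padded buffers and spectra has to be resident at a time, and the kernel spectrum can be computed once if all tiles are padded to the same size.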

Also, what is considered a "large FFT"? I see the term used in some papers, but they seem to be talking about something like 256x256, which is a really small image. I am trying to apply convolutions to images of at least 1000x1000, and maybe up to about 5000x5000 pixels.