CUFFT without padding?

I recently got into CUDA in order to speed up a specific image processing task. I need to read an image, transform it into frequency space, multiply it with a kernel there, and transform the result back into the spatial domain. I found an official example that does almost what I need to do (NVIDIA_CUDA-8.0_Samples/3_Imaging/convolutionFFT2D/). In this example, an input image and a convolution kernel are padded, transformed, multiplied and then transformed back.

My question is, is there a way to perform the cuFFT without padding the input image? Using the original image dimensions results in a CUDA error: code=2(CUFFT_ALLOC_FAILED) "cufftPlan2d(&fftPlanInv, fftH, fftW, CUFFT_C2R)"

I only have my kernel in frequency space, and it is exactly the same size as the input image. Padding the image before the FFT results in a spectrum that is larger than the convolution kernel. Is there another possible solution?

I’m not sure the error you get is really due to padding or the lack of it. From a quick read of the plan functions in the documentation, I don’t see padding listed as a requirement. See section 3.2.2:
[url]https://docs.nvidia.com/cuda/cufft/index.html[/url]

Since I haven’t looked at this sample myself: does it mention why it applies padding? Does that particular implementation require a square image, sizes that are powers of 2, or something else?

How big is the image, and how much free memory do you have at the time of creating the plan? See cudaMemGetInfo(). If you can load the image into device memory, a rough estimate of what you will need is: the image itself + (2 * that amount, for the complex array) + the output image. There are cuFFT specifics behind the scenes that may require some more memory, but cuFFT has functions to estimate how much the operation will need based on the plan you pass. See section 3.4
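
For example, a minimal sketch of that check (the 4096x4096 dimensions are just placeholders, since I don’t know your image size; cufftEstimate2d() only gives a rough upper bound on the plan’s work area):

#include <cuda_runtime.h>
#include <cufft.h>
#include <cstdio>

int main() {
    // Free/total device memory at this point in the program
    size_t freeBytes = 0, totalBytes = 0;
    cudaMemGetInfo(&freeBytes, &totalBytes);

    // Rough upper bound on the work area cuFFT would want for an R2C plan
    // of some placeholder dimensions (rows, columns)
    size_t workSize = 0;
    cufftEstimate2d(4096, 4096, CUFFT_R2C, &workSize);

    std::printf("free %zu MB, total %zu MB, estimated cuFFT work area %zu MB\n",
                freeBytes >> 20, totalBytes >> 20, workSize >> 20);
    return 0;
}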

Hi, thank you. The example code is barely commented, so I’m not sure whether they use padding for speed or because it is necessary. I haven’t read anywhere that padding is required, and I have successfully used OpenCV’s cuda::dft() without padding, although it might apply padding internally.
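
For what it’s worth, a minimal sketch of that kind of cuda::dft() call without manual padding (not my exact code, just the general shape):

#include <opencv2/core.hpp>
#include <opencv2/cudaarithm.hpp>

int main() {
    // Hypothetical single-channel float image; no manual padding applied
    cv::Mat img(5364, 7716, CV_32FC1, cv::Scalar(1.0f));

    cv::cuda::GpuMat src, spectrum;
    src.upload(img);

    // Forward DFT on the GPU; dft_size is simply the original image size
    cv::cuda::dft(src, spectrum, src.size());
    return 0;
}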

My image is 7716x5364 pixels, and using these original dimensions results in the above error. OpenCV has a function for finding FFT-friendly image sizes, cv::getOptimalDFTSize(); for my dimensions it gives 7776x5400, and padding my input image to that size avoids the CUFFT_ALLOC_FAILED error. I’m just not sure how to adjust my kernel to the padded size in the frequency domain.
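
For reference, a minimal sketch of roughly how those padded sizes can be computed, one dimension at a time:

#include <opencv2/core.hpp>
#include <cstdio>

int main() {
    // cv::getOptimalDFTSize() returns the smallest FFT-friendly size
    // (a product of 2s, 3s and 5s) that is >= the given dimension
    int padW = cv::getOptimalDFTSize(7716);  // 7776
    int padH = cv::getOptimalDFTSize(5364);  // 5400
    std::printf("padded size: %dx%d\n", padW, padH);
    return 0;
}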

But, confusingly, this minimal example with the original dimensions works:

#include <cufft.h>
#include <helper_cuda.h>  // checkCudaErrors() from the CUDA samples (common/inc)

int main() {
    int fftH = 5364;  // image height (rows)
    int fftW = 7716;  // image width (columns)

    cufftHandle fftPlanFwd, fftPlanInv;

    // cufftPlan2d() expects the slowest-changing dimension (rows) first
    checkCudaErrors(cufftPlan2d(&fftPlanFwd, fftH, fftW, CUFFT_R2C));
    checkCudaErrors(cufftPlan2d(&fftPlanInv, fftH, fftW, CUFFT_C2R));
    return 0;
}

Hmm. I don’t get it

As far as I know, the transform runs fastest when the dimensions are powers of 2, but cuFFT falls back to other algorithms when they are not, so at worst you should see a slight loss of performance (in my experience, cuFFT with assorted non-power-of-2 sizes runs essentially as fast), not an error.

Can you create a 1D plan with your original total size, 7716 * 5364, and just see if it runs?
Assuming you don’t manage to get the 2D FFT running without padding, what happens if you simply let it run padded and, once the R2C has finished, only read up to 7716 and 5364 in each dimension?
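
Something like this minimal sketch is what I mean by the 1D test (just checking whether plan creation itself succeeds):

#include <cufft.h>
#include <cstdio>

int main() {
    // One 1D R2C transform over the flattened image, 7716 * 5364 samples,
    // only to see whether the plan can be allocated at all
    cufftHandle plan1d;
    cufftResult res = cufftPlan1d(&plan1d, 7716 * 5364, CUFFT_R2C, 1);
    std::printf("cufftPlan1d returned %d\n", (int)res);
    if (res == CUFFT_SUCCESS) {
        cufftDestroy(plan1d);
    }
    return 0;
}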

How are your variable declarations and memory allocations being done?

It works now! So padding is not necessary for cuFFT. What I had to do was not call the padding function from the example at all. Before, I had been calling it with the original image dimensions when I didn’t want any padding, and that seems to have caused the trouble.