FFT with "implicit" padding


I’d like to perform an FFT-based 1D convolution, and I’m facing padding issues that are crushing performance.

I have a batch of arrays (my input signal) cuComplex S[N][LENGTH] on which I want to perform:

  • FFT of each of the N arrays of size LENGTH (cufftPlanMany, then cufftExecC2C)
  • FFT of my 1D filter F (quite large, about 20% of LENGTH)
  • then the term-to-term product of both FFTs
  • then the IFFT of the resulting term-to-term product.

It works fine, but my problem is that I first need to pad the input of the first FFT on my signal, which means a first series of memcpy calls. I then have to keep a temporary (padded) working area for FFT(S) and FFT(S).FFT(F). Finally, after the IFFT, I need to copy the result back to an unpadded result buffer (more memcpy calls).
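For reference, the padded pipeline described above can be sketched on the host roughly as follows. This is only an illustration, not the cuFFT code itself: a naive O(n²) DFT stands in for cuFFT so the example runs anywhere, and the names `dft`, `fft_convolve` and `direct_convolve` are hypothetical. It assumes the padded length is LENGTH + FLEN - 1, the minimum that keeps the circular convolution from wrapping around; where the zeros sit (left, right, or both sides) only shifts the output.

```cpp
#include <algorithm>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<float>;

// Naive O(n^2) DFT, standing in for cuFFT so the sketch runs on the host.
// sign = -1 is the forward transform, sign = +1 the unnormalized inverse.
std::vector<Complex> dft(const std::vector<Complex>& x, int sign) {
    const std::size_t n = x.size();
    const double two_pi = 2.0 * std::acos(-1.0);
    std::vector<Complex> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        std::complex<double> acc(0.0, 0.0);
        for (std::size_t t = 0; t < n; ++t) {
            const double angle = sign * two_pi * double((k * t) % n) / double(n);
            acc += std::complex<double>(x[t]) * std::polar(1.0, angle);
        }
        out[k] = Complex(acc);
    }
    return out;
}

// Linear convolution via zero-padded transforms: pad, FFT, multiply, IFFT.
std::vector<Complex> fft_convolve(const std::vector<Complex>& s,
                                  const std::vector<Complex>& f) {
    if (s.empty() || f.empty()) return {};
    const std::size_t padded = s.size() + f.size() - 1;
    std::vector<Complex> sp(padded), fp(padded);  // value-initialized to zero
    std::copy(s.begin(), s.end(), sp.begin());    // the "padding memcpy" step
    std::copy(f.begin(), f.end(), fp.begin());
    std::vector<Complex> S = dft(sp, -1);
    const std::vector<Complex> F = dft(fp, -1);
    for (std::size_t i = 0; i < padded; ++i) S[i] *= F[i];  // term-to-term product
    std::vector<Complex> y = dft(S, +1);
    for (Complex& v : y) v /= static_cast<float>(padded);   // normalize the inverse
    return y;
}

// Direct O(n*m) convolution, used only to check the FFT-based result.
std::vector<Complex> direct_convolve(const std::vector<Complex>& s,
                                     const std::vector<Complex>& f) {
    if (s.empty() || f.empty()) return {};
    std::vector<Complex> y(s.size() + f.size() - 1);
    for (std::size_t i = 0; i < s.size(); ++i)
        for (std::size_t j = 0; j < f.size(); ++j) y[i + j] += s[i] * f[j];
    return y;
}
```

The two `std::copy` calls and the final extraction of the useful samples from `y` correspond to the copies I would like to avoid.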

My padding is “just” zeroes. Is there a way to have the FFT or IFFT performed with “implicit” padding or a “separate” padding area? I’d just like cuFFT to “assume” there are a number of samples with value 0 on each side of my input signal.

Sorry if the question is not clear; I’d just like to avoid so many copies.

Implicit padding is not currently available.

You could simulate zero-padding with a load callback. I’m not sure if it would be any faster. See here.
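To make the load-callback idea concrete, here is a host-side sketch of the index arithmetic such a callback might perform. It is not device code and has not been benchmarked. In cuFFT the same logic would live in a device function registered through cufftXtSetCallback with the CUFFT_CB_LD_COMPLEX type. The `PadInfo` struct and `load_with_implicit_padding` are hypothetical names, and the sketch assumes the callback receives the element offset into the logical (padded, batched) input while the pointer refers to the tightly packed, unpadded data.

```cpp
#include <complex>
#include <cstddef>

using Complex = std::complex<float>;

// Hypothetical description of the layout, playing the role of callerInfo.
struct PadInfo {
    std::size_t length;      // unpadded samples per signal (LENGTH)
    std::size_t pad_left;    // implicit zeros in front of each signal
    std::size_t padded_len;  // transform size, including padding on both sides
};

// Returns the sample the FFT would read at `offset` in the padded batch,
// fetching from the unpadded buffer or synthesizing a zero in the padding.
Complex load_with_implicit_padding(const Complex* unpadded,
                                   std::size_t offset,
                                   const PadInfo& info) {
    const std::size_t batch = offset / info.padded_len;  // which signal
    const std::size_t i = offset % info.padded_len;      // index inside it
    if (i < info.pad_left || i >= info.pad_left + info.length) {
        return Complex(0.0f, 0.0f);                       // implicit padding
    }
    return unpadded[batch * info.length + (i - info.pad_left)];
}
```

With this mapping the plan would be created for the padded size, but the input buffer would only need N * LENGTH elements, which removes the first series of copies. The output still has the padded size, so the transform would have to be out-of-place.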