I’d like to perform an FFT-based 1D convolution, and I’m running into padding issues that are hurting performance.
I have a batch of arrays (my input signal), cuComplex S[N][LENGTH], on which I want to perform:
- an FFT of each of the N arrays of size LENGTH (cufftPlanMany, then cufftExecC2C)
- an FFT of my 1D filter F (quite large, about 20% of LENGTH)
- the element-wise product of the two FFTs
- an IFFT of that element-wise product.
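To make the four steps above concrete, here is a minimal host-side sketch of the same pipeline for a single row. It is not my actual cuFFT code: `dft` is a naive O(n²) transform standing in for cufftExecC2C, `std::complex<float>` stands in for cuComplex, and the names `dft` and `fft_convolve` are made up for illustration. It also assumes the zeros are appended at the end of each array and that the padded length is signal length + filter length - 1.

```cpp
#include <algorithm>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cf = std::complex<float>;  // stand-in for cuComplex

// Naive O(n^2) DFT, a stand-in for a C2C FFT.
// sign = -1: forward transform; sign = +1: inverse transform (unnormalized).
std::vector<cf> dft(const std::vector<cf>& x, int sign) {
    const std::size_t n = x.size();
    const double pi = std::acos(-1.0);
    std::vector<cf> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        std::complex<double> acc(0.0, 0.0);
        for (std::size_t t = 0; t < n; ++t) {
            const double ang = sign * 2.0 * pi * double((k * t) % n) / double(n);
            acc += std::complex<double>(x[t]) *
                   std::complex<double>(std::cos(ang), std::sin(ang));
        }
        out[k] = cf(acc);
    }
    return out;
}

// Linear convolution of one signal row s with filter f.
// Assumption: both are zero-padded at the end to s.size() + f.size() - 1.
std::vector<cf> fft_convolve(const std::vector<cf>& s, const std::vector<cf>& f) {
    const std::size_t padded = s.size() + f.size() - 1;
    std::vector<cf> sp(padded), fp(padded);          // value-initialized to zero
    std::copy(s.begin(), s.end(), sp.begin());       // explicit padding copy
    std::copy(f.begin(), f.end(), fp.begin());
    std::vector<cf> S = dft(sp, -1);                 // FFT of the signal
    const std::vector<cf> F = dft(fp, -1);           // FFT of the filter
    for (std::size_t i = 0; i < padded; ++i) S[i] *= F[i];  // element-wise product
    std::vector<cf> y = dft(S, +1);                  // inverse transform
    for (cf& v : y) v /= float(padded);              // normalize the inverse
    return y;
}
```

The two `std::copy` calls into zeroed buffers are exactly the kind of padding copies I would like to get rid of.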
It works fine, but my problem is the copying around it. First, I need to pad the input of the first FFT, so I perform a first series of memcpy calls. Then I need to keep a temporary (padded) working area for FFT(S) and for the product FFT(S)·FFT(F). Finally, after the IFFT, I need to copy the result back to an unpadded result buffer (more memcpy calls).
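For reference, the copies described above amount to something like the following host-side sketch. The function names `pad_rows` and `unpad_rows` and the `left` offset parameter are hypothetical, `std::complex<float>` again stands in for cuComplex, and the real code does the equivalent on device buffers rather than on `std::vector`.

```cpp
#include <algorithm>
#include <complex>
#include <cstddef>
#include <vector>

using cf = std::complex<float>;  // stand-in for cuComplex

// Copy n rows of `length` samples into rows of `padded` samples,
// leaving `left` zeros before each row and zeros after it.
std::vector<cf> pad_rows(const std::vector<cf>& src, std::size_t n,
                         std::size_t length, std::size_t padded,
                         std::size_t left) {
    std::vector<cf> dst(n * padded);  // value-initialized to zero
    for (std::size_t r = 0; r < n; ++r) {
        std::copy_n(src.data() + r * length, length,
                    dst.data() + r * padded + left);
    }
    return dst;
}

// Inverse operation: extract the `length` useful samples of each padded row.
std::vector<cf> unpad_rows(const std::vector<cf>& src, std::size_t n,
                           std::size_t length, std::size_t padded,
                           std::size_t left) {
    std::vector<cf> dst(n * length);
    for (std::size_t r = 0; r < n; ++r) {
        std::copy_n(src.data() + r * padded + left, length,
                    dst.data() + r * length);
    }
    return dst;
}
```

Each of the N rows is copied once on the way in and once on the way out, which is the overhead I am asking about.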
My padding is “just” zeroes. Is there a way to have the FFT or IFFT performed with “implicit” padding, or with a “separate” padding area? I’d just like cuFFT to “assume” there are a number of zero-valued samples on each side of my input signal.
Sorry if the question is not clear; I’d just like to avoid so many copies.