sliding FFT with input transform

Hello all,

I have a tricky (for me) signal processing problem. I’ve sketched it below, with a proposed solution. I’d appreciate any feedback on my assumptions, and general direction.

Say we have a long array, A = [a_0, …, a_(N-1)], and a short array, B = [b_0, …, b_127]. I need to compute FFT([a_k, …, a_(k+127)]*B) for every k from 0 to N-128, i.e. for every length-128 window of A (where “*B” means element-wise multiplication with B).
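
To make the target concrete, here is a minimal CPU reference in NumPy of what I want to compute. It is only a sketch for checking results on small inputs, not the GPU implementation. The function name sliding_windowed_fft is made up for illustration, and sliding_window_view needs NumPy 1.20 or newer.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sliding_windowed_fft(a, b):
    """Return one FFT per window of `a`, each window multiplied by `b` first.

    Row k of the result is FFT(a[k : k + len(b)] * b).
    (Hypothetical helper name, used here only for illustration.)
    """
    # View of shape (len(a) - len(b) + 1, len(b)); no data is copied yet.
    windows = sliding_window_view(a, len(b))
    # The element-wise product materializes every window, then each row is transformed.
    return np.fft.fft(windows * b, axis=-1)
```

For example, with len(a) == 10 and len(b) == 4 the result has 7 rows, one for each window that fits in a.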

I believe this is an ideal application for callbacks [1] (?).

Unfortunately, I can’t use them. From the docs: “NOTE:The callback API is available in the statically linked cuFFT library only, and only on 64 bit LINUX operating systems.”

In that case, I need to write a custom kernel, something like the “before” case in [2]. The problem is memory. If N is large (say 2^20), there are about 2^20 of these windows [a_k, …, a_(k+127)], so writing them all out takes roughly 2^27 values. The only way forward I can see is to process them in batches.
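
Here is a rough NumPy sketch of the batching idea, again as a CPU stand-in rather than the real thing. On the GPU, the element-wise product would be the custom kernel and np.fft.fft would be replaced by a batched cuFFT plan. The names expanded_bytes and batched_windowed_fft are hypothetical, and the default of 8 bytes per value assumes single-precision complex data.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def expanded_bytes(n, w, itemsize=8):
    """Bytes needed to hold every window at once.

    itemsize=8 is an assumption (single-precision complex values).
    """
    return (n - w + 1) * w * itemsize


def batched_windowed_fft(a, b, batch_size):
    """Yield (first_k, spectra) for consecutive batches of windows.

    Only `batch_size` windows are materialized at any one time.
    """
    windows = sliding_window_view(a, len(b))  # a view, no copy
    for start in range(0, windows.shape[0], batch_size):
        # Copy and weight just this batch of windows.
        chunk = windows[start:start + batch_size] * b
        yield start, np.fft.fft(chunk, axis=-1)
```

Under those assumptions, expanded_bytes(2**20, 128) comes to just under 1 GiB for the fully expanded input, while a batch of, say, 4096 windows needs only 4 MiB. The batch size is a tuning knob, not something I have measured.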

Does that make sense?

Cheers

Gary

[1] https://docs.nvidia.com/cuda/cufft/index.html#callback-routines
[2] https://devblogs.nvidia.com/cuda-pro-tip-use-cufft-callbacks-custom-data-processing/