cuFFT does not, per se, provide a way to apply a windowing function to every signal in a batch. I could of course launch a separate kernel before the FFT, but my concern is that this costs additional time for loading the data from GPU RAM and storing it back again.
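To make the question concrete, the separate-kernel variant I have in mind would look roughly like the sketch below. The names `apply_window` and `windowed_batch_fft` are made up for illustration. The sketch assumes single-precision complex data stored signal after signal, a window that is already precomputed in device memory, and a batch small enough to fit in the grid's y dimension. The extra read and write of the whole batch in `apply_window` is exactly the memory traffic I would like to avoid.

```cuda
#include <cuda_runtime.h>
#include <cufft.h>

// Multiply every signal in the batch by the same precomputed window.
// Assumed layout: signal b occupies data[b*n] .. data[b*n + n - 1].
__global__ void apply_window(cufftComplex* data, const float* window,
                             int n, int batch)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // sample index within one signal
    int b = blockIdx.y;                             // signal index within the batch
    if (i < n && b < batch) {
        float w = window[i];
        size_t idx = (size_t)b * n + i;
        data[idx].x *= w;   // real part
        data[idx].y *= w;   // imaginary part
    }
}

// Hypothetical helper: window the batch in place, then run a batched
// forward C2C FFT in place. Returns true on success.
// Assumes batch <= 65535 so it fits in gridDim.y.
bool windowed_batch_fft(cufftComplex* d_data, const float* d_window,
                        int n, int batch)
{
    dim3 block(256);
    dim3 grid((n + block.x - 1) / block.x, batch);
    apply_window<<<grid, block>>>(d_data, d_window, n, batch);
    if (cudaGetLastError() != cudaSuccess) return false;

    cufftHandle plan;
    if (cufftPlan1d(&plan, n, CUFFT_C2C, batch) != CUFFT_SUCCESS) return false;

    cufftResult r = cufftExecC2C(plan, d_data, d_data, CUFFT_FORWARD);
    cudaError_t e = cudaDeviceSynchronize();  // wait before destroying the plan
    cufftDestroy(plan);
    return r == CUFFT_SUCCESS && e == cudaSuccess;
}
```

In real code the plan would be created once and reused rather than rebuilt on every call; it is created inside the helper here only to keep the sketch short.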
Some time ago I used Vasily Volkov’s incredibly fast FFT, which allowed me to tweak the more explicit FFT kernel for FFTs of up to 1024 points. For larger FFTs, however, combining several kernels will certainly have a similar impact on performance as using cuFFT with a preceding windowing kernel.
Does anybody have experience with combining windowing and FFT?