cuFFT and windowing: what is the best way to implement windowing for FFT?


cuFFT does not, per se, provide a way to apply a windowing function to every signal in a batch. I could certainly launch a separate kernel before the FFT, but my concern is that this will cost additional time for loading the data from GPU RAM and storing it back again.
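To make the separate-kernel variant concrete, here is a minimal sketch of what I have in mind. The names `applyHannWindow` and `windowedBatchFFT` are hypothetical, and the periodic Hann window, the in-place complex-to-complex transform, and the contiguous batch layout (signal `b` starting at offset `b * n`) are assumptions for illustration only.

```cuda
#include <cuda_runtime.h>
#include <cufft.h>

// Hypothetical kernel: multiply every signal in the batch by a periodic
// Hann window, in place. Assumes signal b occupies data[b*n .. b*n + n - 1].
__global__ void applyHannWindow(cufftComplex* data, int n, int batch)
{
    size_t idx   = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    size_t total = (size_t)n * batch;
    if (idx >= total) return;

    int   k = (int)(idx % n);                         // sample index within its signal
    float w = 0.5f * (1.0f - cospif(2.0f * k / n));   // 0.5 * (1 - cos(2*pi*k/n))
    data[idx].x *= w;
    data[idx].y *= w;
}

// Hypothetical host helper: window pass followed by a batched forward FFT.
// d_data is a device pointer holding batch * n complex samples.
// Returns true on success.
bool windowedBatchFFT(cufftComplex* d_data, int n, int batch)
{
    const int    threads = 256;
    const size_t total   = (size_t)n * batch;
    const unsigned int blocks = (unsigned int)((total + threads - 1) / threads);

    // Extra pass over the data: one global-memory read and one write per sample.
    applyHannWindow<<<blocks, threads>>>(d_data, n, batch);
    if (cudaGetLastError() != cudaSuccess) return false;

    // The plan is created here only to keep the sketch short; in real code
    // it would be created once and reused across calls.
    cufftHandle plan;
    if (cufftPlan1d(&plan, n, CUFFT_C2C, batch) != CUFFT_SUCCESS) return false;

    cufftResult r = cufftExecC2C(plan, d_data, d_data, CUFFT_FORWARD);
    cufftDestroy(plan);

    return r == CUFFT_SUCCESS && cudaDeviceSynchronize() == cudaSuccess;
}
```

The window kernel touches every sample exactly once for reading and once for writing, and that additional round trip through GPU RAM is precisely the overhead I would like to avoid.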

Some time ago I used Vasily Volkov’s incredibly fast FFT, which allowed me to tweak the more explicit FFT kernel for sizes up to a 1024-point FFT. For larger FFTs, however, the required combination of several kernels will certainly have a similar impact on performance as using cuFFT with a preceding windowing kernel.

Does anybody have experience with combining windowing and FFT?
Kind regards,