Short 1D complex FFT performance [2048-32768]


I am currently working on a very FFT-intensive code. It computes many short 1D complex FFTs, with lengths typically between 2048 and 32768. I know that performance in this situation is limited by the transfer overhead, but I wonder whether I can expect a performance increase from CUDA code running on an 8800 GTS over my current configuration (FFTW on an Athlon 64 4000+ with 2 GB of RAM). Could someone give me some performance numbers before I make my purchase?

Thanks!

Can you batch them?

I don’t know the exact meaning of the question… but I can tell you that each FFT depends on the previous one.


With the CUDA FFT library, it is possible to transform multiple 1D sequences at once (we call it batch mode).

Which kind of processing do you perform between transforms? You may be able to move that to the GPU too, in order to amortize some of the I/O time.

In fact, this is a split-step Fourier algorithm in which an exponential operator is applied in alternation with the FFT to a laser pulse profile propagating in an optical fiber.

Thanks for your help!
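
For readers following along, here is a minimal sketch of the kind of split-step loop described above, in plain Python. It is only an illustration under assumptions: the thread does not say which operators are used, so the sketch assumes a dispersion phase in the frequency domain and an intensity-dependent phase in the time domain. The names `split_step`, `beta2`, `gamma`, `dt` and `dz` are hypothetical, and the small radix-2 `fft` is just a stand-in for FFTW or the CUDA FFT library.

```python
import cmath
import math


def fft(x, inverse=False):
    """Recursive radix-2 FFT (stand-in for FFTW/CUFFT).

    len(x) must be a power of two. The inverse is left unnormalized.
    """
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1.0 if inverse else -1.0
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(x):
    """Inverse FFT with 1/n normalization."""
    n = len(x)
    return [v / n for v in fft(x, inverse=True)]


def split_step(field, dt, dz, steps, beta2, gamma):
    """Propagate `field` over `steps` steps of length `dz`.

    Assumed model (not from the thread): a dispersion operator applied in
    the frequency domain, then a nonlinear phase applied in the time domain.
    """
    n = len(field)
    # Angular frequencies in FFT order (non-negative first, then negative).
    omega = [2 * math.pi * (k if k < n // 2 else k - n) / (n * dt)
             for k in range(n)]
    # Exponential operator for one step, computed once and reused.
    lin = [cmath.exp(0.5j * beta2 * w * w * dz) for w in omega]
    for _ in range(steps):
        spectrum = fft(field)                              # to frequency domain
        spectrum = [s * l for s, l in zip(spectrum, lin)]  # elementwise multiply
        field = ifft(spectrum)                             # back to time domain
        # Nonlinear phase depends on the field just computed, so the next
        # FFT cannot start before this step has finished.
        field = [a * cmath.exp(1j * gamma * abs(a) ** 2 * dz) for a in field]
    return field
```

The work between transforms is an elementwise multiplication in both domains. Under the assumptions above, that is the kind of per-sample processing that could be moved to the GPU next to the FFT, so that the pulse profile does not have to go back to the host at every step.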