I am currently working on a very FFT-intensive code. It computes many short 1D complex FFTs, with lengths typically between 2048 and 32768. I know that performance in this situation is limited by the transfer overhead, but I wonder whether I can expect any speedup from CUDA code running on an 8800 GTS over my current setup (FFTW on an Athlon 64 4000+ with 2 GB of RAM). Could someone give me some performance numbers before I make my purchase?
In fact, this is a split-step Fourier algorithm in which an exponential operator is applied alternately with the FFT to a laser pulse profile propagating in an optical fiber.
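
To make the access pattern concrete, here is a minimal sketch of a split-step loop in plain Python. It is not my production code. It assumes the usual nonlinear Schrödinger model for fiber propagation, with a dispersion phase applied in the frequency domain and a Kerr phase applied in the time domain. The names `split_step`, `beta2`, `gamma`, `dt` and `dz` are made up for the illustration, and the small radix-2 `fft` only stands in for the FFTW (or CUFFT) call. The point is that every step needs one forward and one inverse FFT, so the field would ideally stay on the GPU for the whole loop.

```python
import cmath
import math


def fft(x, inverse=False):
    """Iterative radix-2 FFT (length must be a power of two).

    Stand-in for the real FFT library call; the inverse is scaled by 1/n.
    """
    n = len(x)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    a = list(x)
    # Bit-reversal permutation.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # Butterfly stages.
    sign = 1.0 if inverse else -1.0
    size = 2
    while size <= n:
        half = size // 2
        tw = [cmath.exp(sign * 2j * math.pi * k / size) for k in range(half)]
        for start in range(0, n, size):
            for k in range(half):
                u = a[start + k]
                v = a[start + k + half] * tw[k]
                a[start + k] = u + v
                a[start + k + half] = u - v
        size *= 2
    if inverse:
        a = [v / n for v in a]
    return a


def split_step(field, dt, dz, steps, beta2, gamma):
    """Propagate a sampled complex pulse envelope over steps * dz.

    Assumed model (hypothetical parameters): dispersion phase
    exp(i * beta2/2 * w^2 * dz) in frequency, Kerr phase
    exp(i * gamma * |A|^2 * dz) in time.
    """
    n = len(field)
    # Angular frequency grid in FFT order (positive, then negative).
    omega = [2.0 * math.pi * (k if k < n // 2 else k - n) / (n * dt)
             for k in range(n)]
    linear = [cmath.exp(0.5j * beta2 * w * w * dz) for w in omega]
    a = list(field)
    for _ in range(steps):
        spec = fft(a)                                  # to frequency domain
        spec = [s * h for s, h in zip(spec, linear)]   # exponential operator
        a = fft(spec, inverse=True)                    # back to time domain
        a = [v * cmath.exp(1j * gamma * abs(v) ** 2 * dz) for v in a]
    return a


if __name__ == "__main__":
    n, dt = 1024, 0.05
    pulse = [complex(math.exp(-(((i - n / 2) * dt) ** 2) / 2.0))
             for i in range(n)]
    out = split_step(pulse, dt, dz=0.01, steps=50, beta2=-1.0, gamma=1.0)
    print(sum(abs(v) ** 2 for v in out) * dt)  # pulse energy after propagation
```

Both operators in this sketch are pure phase factors, so the pulse energy should be conserved up to rounding, which is a handy sanity check when moving the loop to another FFT library.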