FFT Performance
Discussion about CUDA FFT performance

What’s the theoretical FLOP performance for the CUDA FFT?

Using fftw.org’s MFLOPS calculation and varying the sample size and batch size, the highest figure we measured was around 45 GFLOPS, with a sample size of 1k and a batch size > 100. Is that in the right ballpark?
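For reference, here is a minimal sketch of the arithmetic behind that number, assuming complex 1D transforms and the nominal operation count that fftw.org uses for its benchmarks, 5 N log2(N) per transform. The function name `fft_gflops` and the timing in the usage line are my own placeholders, not measured values or anything from the CUDA FFT library.

```python
import math

def fft_gflops(n, batch, seconds):
    """Nominal GFLOPS for `batch` complex 1D FFTs of length `n`.

    Uses the fftw.org benchmark convention of 5 * N * log2(N) floating-point
    operations per complex transform. This is a normalized figure for
    comparing implementations, not a count of the operations actually executed.
    `seconds` is the total wall-clock time for the whole batch.
    """
    flops = 5.0 * n * math.log2(n) * batch
    return flops / seconds / 1e9

# Hypothetical timing, chosen only to illustrate the formula:
# 128 transforms of 1024 points in 145.6 microseconds is about 45 GFLOPS.
print(round(fft_gflops(1024, 128, 145.6e-6), 1))
```

Note that the result depends heavily on what the timer covers: including host-to-device copies or plan creation in `seconds` will lower the figure considerably compared with timing the transforms alone.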

Any advice on tuning the FFT?

Many thanks!