What's the theoretical peak FLOPS for the CUDA FFT?
Using the MFLOPS formula from fftw.org and varying the sample size and batch size, the best figure we measured was around 45 GFLOPS, with a sample size of 1k and batch sizes above 100. Does that seem to be in the right ballpark?
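For reference, here is a minimal sketch of that calculation. It assumes the fftw.org benchmark convention of counting 5 N log2(N) floating-point operations per complex transform (2.5 N log2(N) for real input), scaled by the batch size. The function name and the timing in the usage line are hypothetical and are not our measurements.

```python
import math


def fft_gflops(n, batch, seconds, complex_input=True):
    """Estimate FFT throughput in GFLOPS using the fftw.org op-count convention.

    n        -- transform length (sample size), e.g. 1024
    batch    -- number of transforms executed in the timed run
    seconds  -- wall-clock time for the whole batch, in seconds
    """
    if n < 2 or batch < 1 or seconds <= 0:
        raise ValueError("n must be >= 2, batch >= 1, seconds > 0")
    # Nominal operation count per transform: 5 N log2(N) for complex data,
    # half of that for real data. This is a convention, not an exact count.
    ops_per_fft = (5.0 if complex_input else 2.5) * n * math.log2(n)
    return ops_per_fft * batch / seconds / 1e9


if __name__ == "__main__":
    # Hypothetical timing: 128 transforms of length 1024 in 1 ms.
    print(f"{fft_gflops(1024, 128, 1e-3):.4f} GFLOPS")
```

Note that the result depends on what the timer covers: including host-to-device transfers in `seconds` will give a much lower figure than timing the transform alone.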
Any advice on tuning the FFT?
Mucho thanks!