Guidelines for GPU comparison and performance assessment

Hello,

I have an application that mostly runs large batched FFTs using cuFFT, in single precision. I am trying to work out which GPU would be best for this. My dataset is roughly 9-12 GB.

I am quite lost among all the parameters (number of CUDA cores, memory bandwidth, processor clock frequency, TFLOPS, etc.) and I don’t quite see which ones are the most relevant for my application. Can I just consider the TFLOPS?

Do you have any guidelines (or documentation) on which series of GPU (Tesla, Volta, Quadro) would be best in my case? For now, let’s forget about the price and consider performance only.

Thanks!

Large FFTs are typically bound by memory throughput, as the computational density is low.
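
To make the computational-density argument concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the conventional 5*N*log2(N) flop estimate for a complex FFT of length N, single-precision complex data (8 bytes per element), and an idealized single pass that reads and writes each element once; real cuFFT plans may need more than one pass over memory for large N. The function names and the GPU figures in the example (10 TFLOPS, 500 GB/s) are hypothetical placeholders, not the specs of any particular card.

```python
import math

BYTES_PER_COMPLEX64 = 8  # single-precision complex: two 4-byte floats


def fft_flops(n: int, batch: int = 1) -> float:
    """Approximate flop count, using the conventional 5*N*log2(N) estimate."""
    return 5.0 * n * math.log2(n) * batch


def fft_bytes(n: int, batch: int = 1, passes: int = 1) -> float:
    """Bytes moved through device memory: one read and one write per pass."""
    return 2.0 * BYTES_PER_COMPLEX64 * n * batch * passes


def arithmetic_intensity(n: int, passes: int = 1) -> float:
    """Flops per byte of memory traffic (the batch size cancels out)."""
    return fft_flops(n) / fft_bytes(n, passes=passes)


def machine_balance(peak_tflops: float, bandwidth_gbs: float) -> float:
    """Flops the GPU can execute per byte it can move (assumed peak figures)."""
    return (peak_tflops * 1e12) / (bandwidth_gbs * 1e9)


def is_memory_bound(n: int, peak_tflops: float, bandwidth_gbs: float,
                    passes: int = 1) -> bool:
    """True when the FFT needs fewer flops per byte than the GPU can supply."""
    return arithmetic_intensity(n, passes) < machine_balance(peak_tflops, bandwidth_gbs)


if __name__ == "__main__":
    n = 2 ** 20  # hypothetical transform length
    print(f"FFT intensity:   {arithmetic_intensity(n):.2f} flop/byte")
    print(f"machine balance: {machine_balance(10.0, 500.0):.2f} flop/byte")
    print(f"memory bound:    {is_memory_bound(n, 10.0, 500.0)}")
```

Under these assumptions the transform needs only a few flops per byte moved, while the hypothetical GPU can sustain several times that many, so the memory system is the limiting factor rather than the arithmetic units.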

Does that mean I should focus only on memory bandwidth?