CUFFT input sizes

It’s written that CUFFT library supports algorithms that higly optimized for input sizes can be written in the folowing form: 2^a X 3^b X 5^c X 7^d.

How could they managed to do that?

For as far as I know, FFT must provide best perfomance only for 2^a input size.

While sizes that are powers of two typically show the best performance, it is possible to achieve similar performance for powers of other small prime factor. Slide 8 of the following slide deck gives an overview for CUDA 6.5:

http://developer.download.nvidia.com/compute/cuda/6_5/rel/docs/CUDA_6.5_Performance_Report.pdf

Mixed-radix FFTs for sizes that are products of the powers of small prime factors have been extensively researched since the 1970s.

Got it, thank you.