CUFFT Implementation

What FFT variants are implemented in the CUFFT library? Can I assume they are compatible with and comparable to FFTW, given the same input data?

One of my purposes is to compare FFT performance on various platforms. If the CUFFT implementation is close to FFTW, then I don’t need to port the code.

Many thanks in advance!!!

CUFFT is very similar to FFTW. The documentation is included in the SDK download. From the docs:

This version of the CUFFT library supports the following features:

  • 1D, 2D, and 3D transforms of complex and real‐valued data.
  • Batch execution for doing multiple 1D transforms in parallel.
  • 2D and 3D transform sizes in the range [2, 16384] in any dimension.
  • 1D transform sizes up to 8 million elements.
  • In‐place and out‐of‐place transforms for real and complex data.

Simon, what would be the limit for a 2D complex transform on an 8800GTX?

Can you elaborate on how ‘similar’ CUFFT is to FFTW? For example, does CUFFT also perform planning as FFTW does, and apply different codelets to the subproblems? I know CUFFT provides a similar interface to FFTW, but I care more about its implementation.

Thank you!!!