Query related to CUFFT

I am using FFT in an image processing application. I am planning to replace my existing CPU-based FFT (which is based on the Cooley-Tukey algorithm) with CUFFT.

I have a few questions regarding CUFFT.

( 1 ) Which FFT algorithm does CUFFT use internally? Is it Cooley-Tukey?
I need to know this so that I can make a fair performance comparison.
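
For reference, the radix-2 Cooley-Tukey scheme mentioned above can be sketched as follows. This is only a minimal illustration in Python, not the code of the CPU implementation and not a statement about what CUFFT does internally; the names fft_radix2 and dft_naive are made up for this sketch, and the input length is assumed to be a power of two.

```python
import cmath


def fft_radix2(x):
    """Recursive radix-2 decimation-in-time Cooley-Tukey FFT (unnormalized).

    Assumes len(x) is a power of two.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft_radix2(x[0::2])   # FFT of the even-indexed samples
    odd = fft_radix2(x[1::2])    # FFT of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor applied to the odd half (butterfly step)
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def dft_naive(x):
    """Direct O(n^2) DFT, used only to cross-check the FFT."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]
```

Both functions compute the same transform; the recursive version needs O(n log n) operations instead of O(n^2), which is why the algorithm matters when comparing timings.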

( 2 ) Roughly how much speedup can I expect for an image of size 5K x 5K?

My CPU application takes around 71 seconds to complete.

Thanks in advance

You can download the CUFFT source to check which algorithm is used :)

Could you tell me where I can download the CUFFT source code, please?

Thank you!

Cuda announcements & news


On a G92, for a Complex2Complex Forward transform + Backward transform (that is, you get your original image back) of a 2048x2048 grayscale image, I measured around 100 milliseconds (0.1 seconds), including interleaving/deinterleaving (Re + Im <-> complex) and Host<->GPU data transfers.
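
To make the steps in that measurement concrete, here is a small CPU-only sketch in Python of the interleave, forward, backward, deinterleave sequence. It is 1D and uses a naive DFT purely for illustration; it is not the benchmark code, and the function names are hypothetical. One detail worth knowing: CUFFT transforms are unnormalized, so a forward transform followed by a backward one returns the input scaled by the number of elements, and the sketch divides by that count to recover the original values.

```python
import cmath


def interleave(re, im):
    """Pack separate real/imaginary planes as [re0, im0, re1, im1, ...]."""
    if len(re) != len(im):
        raise ValueError("planes must have the same length")
    buf = [0.0] * (2 * len(re))
    buf[0::2] = re
    buf[1::2] = im
    return buf


def deinterleave(buf):
    """Split an interleaved buffer back into real and imaginary planes."""
    return buf[0::2], buf[1::2]


def dft(x, sign):
    """Naive unnormalized DFT; sign=-1 is forward, sign=+1 is backward."""
    n = len(x)
    return [sum(x[j] * cmath.exp(sign * 2j * cmath.pi * j * k / n)
                for j in range(n)) for k in range(n)]


def round_trip(re, im):
    """Interleave, transform forward and backward, rescale, deinterleave."""
    buf = interleave(re, im)
    x = [complex(a, b) for a, b in zip(buf[0::2], buf[1::2])]
    y = dft(dft(x, -1), +1)            # equals len(x) * x (unnormalized)
    n = len(x)
    restored = [v / n for v in y]      # undo the scaling by element count
    out = interleave([v.real for v in restored],
                     [v.imag for v in restored])
    return deinterleave(out)
```

Calling round_trip on a row of pixel values with a zero imaginary plane returns the same row again, up to floating-point rounding.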

For comparison, FFTW takes around 0.3 seconds (3x the time) for the same task, and that’s with a 3.0 GHz quad-core using the Single Precision SSE version of FFTW, multi-threaded (NThreads = 4).

So I’d say CUFFT is pretty fast in this situation.

Please note that I can’t try 5Kx5K, since on my G92 512MB I can only go as far as 3Kx3K or something around that (I think a 1.5GB setup should do 5Kx5K).
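
As a rough sanity check on the memory side, the size of one single-precision complex image buffer can be computed as below. This is only a sketch: it assumes "5K" means 5000 and "3K" means 3000, the function name is made up, and it counts the raw data buffer only, not any FFT work area, extra copies, or memory already in use by the display.

```python
def complex_buffer_bytes(rows, cols, bytes_per_component=4):
    """Bytes for one interleaved complex image (real + imaginary parts)."""
    return rows * cols * 2 * bytes_per_component


for rows, cols in [(2048, 2048), (3000, 3000), (5000, 5000)]:
    size = complex_buffer_bytes(rows, cols)
    print(f"{rows}x{cols}: {size / 2**20:.1f} MiB per complex buffer")
```

A 5000x5000 buffer alone comes to 200,000,000 bytes (about 191 MiB), so the practical limit observed on a 512 MB card presumably comes from the additional allocations that this estimate leaves out.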

Your mileage may vary if the number of rows and/or columns in your image is not a multiple of the number of Streaming Multiprocessors on your GPU (because of uncoalesced memory accesses and/or bank conflicts, if I recall correctly).
