cufft algorithm

kubush · February 3, 2010, 9:31pm

Hi,

I am interested in the CUFFT implementation itself rather than its usage or manual. Which algorithm is CUFFT based on? Is there any documentation (or code) that describes it?

Thanks in advance,

You7878 · February 7, 2010, 6:57pm

It was ported from the FFTW library: http://www.fftw.org/

mfatica · February 7, 2010, 7:14pm

That is not true.
CUFFT has an interface similar to FFTW (create a plan, execute the plan, destroy the plan), but the algorithms are different.

Uncle_Joe · February 8, 2010, 6:57pm

mfatica,

Based on my experience, the 2D and real-valued versions of the FFT are quite slow, most likely because they’re wrappers around the complex versions. I was going to try to build a specialized 2D, real-valued FFT, but do you know whether NVIDIA plans to release such a version any time soon?

plegresley · February 8, 2010, 7:18pm

This is a really good paper for implementing a faster version:

http://research.microsoft.com/apps/pubs/default.aspx?id=70576

Uncle_Joe · February 8, 2010, 8:01pm

Ah, great. But after skimming it, it looks like their 2D real performance is only 2x better than CUFFT, which isn’t what I was looking for. They didn’t use any specialized 2D approach - they still transform the rows, then the columns. I’m not convinced by their argument that a 2D texture cache gives the column transforms better locality - it works for small blocks, but I don’t know how a cache could be organized so that a large row and a large column both have spatial locality.
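For anyone following along, the row-then-column approach can be sketched in a few lines. This is only an illustration in plain Python, not what CUFFT or the paper actually does: `dft` is a naive O(N^2) transform standing in for a real FFT, and the names `dft2_row_column` and `dft2_direct` are made up for this example. The second pass walks down the columns, which in a row-major array is the strided access pattern that causes the locality problem.

```python
import cmath

def dft(x):
    """Naive O(N^2) 1D DFT; a stand-in for a fast complex FFT."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def dft2_row_column(a):
    """2D DFT by transforming every row, then every column."""
    # Pass 1: each row is contiguous in a row-major layout.
    rows = [dft(r) for r in a]
    # Pass 2: each column is a strided walk through the row-major data.
    cols = [dft(list(c)) for c in zip(*rows)]
    # Transpose back so the result is indexed as out[u][v].
    return [list(r) for r in zip(*cols)]

def dft2_direct(a):
    """Reference 2D DFT evaluated straight from the definition."""
    m, n = len(a), len(a[0])
    return [[sum(a[r][c] * cmath.exp(-2j * cmath.pi * (u * r / m + v * c / n))
                 for r in range(m) for c in range(n))
             for v in range(n)]
            for u in range(m)]

if __name__ == "__main__":
    a = [[1.0, 2.0, 3.0], [0.0, -1.0, 4.0]]
    print(dft2_row_column(a))
```

Both functions should agree up to rounding, which is all the separability of the 2D transform says; the sketch does not address how to make the column pass cache-friendly.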

Also, for real-valued FFTs they used the textbook two-for-the-price-of-one approach: packing two real FFTs into a single complex FFT.
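As a rough sketch of that trick (plain Python, with a naive `dft` standing in for the fast complex FFT, and `two_real_dfts` being a name invented for this example): pack the two real signals x and y as z = x + iy, transform once, then separate the two spectra using the conjugate symmetry of real-input transforms, X[k] = (Z[k] + conj(Z[N-k])) / 2 and Y[k] = (Z[k] - conj(Z[N-k])) / (2i).

```python
import cmath

def dft(x):
    """Naive O(N^2) 1D DFT; a stand-in for a fast complex FFT."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def two_real_dfts(x, y):
    """DFTs of two equal-length real sequences from one complex DFT."""
    n = len(x)
    # Pack x into the real part and y into the imaginary part.
    z = [complex(a, b) for a, b in zip(x, y)]
    Z = dft(z)
    X, Y = [], []
    for k in range(n):
        # Index N-k wraps to 0 when k == 0.
        zc = Z[(n - k) % n].conjugate()
        X.append((Z[k] + zc) / 2)   # spectrum of x
        Y.append((Z[k] - zc) / 2j)  # spectrum of y
    return X, Y

if __name__ == "__main__":
    X, Y = two_real_dfts([1.0, 2.0, 0.5, -1.0], [0.0, 3.0, -2.0, 4.0])
    print(X)
    print(Y)
```

The separation step costs only O(N) extra work, which is why this is the standard way to get real transforms out of a complex-only FFT, but it does nothing to exploit 2D structure.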

There was another report from Microsoft (I know Burton Smith from a talk he gave at Georgia Tech) that got better results, about 300 GFLOPS on a GTX 280, but it was only for 1D complex FFTs.

What I’m really looking for is a 2D transform that uses a 2D decomposition for better memory locality and avoids complex math, at least at the finest grid levels.
The best references I can find on specialized 2D real FFTs are Pierre Duhamel’s reports from the 1980s.
