CUDA FFT: a bit short?

Well, I managed to get CUDA up and running after installing a 32-bit Linux distribution, and almost all of the SDK samples worked just fine.

However, one problem is that the FFT sample seems to support only arrays of length 512. If CUDA is to be useful at all for the FFT stuff I want to use it for, I'm going to need to run FFTs on 1-D arrays that are millions of elements long. Is it at all feasible to do this efficiently on the G8x?

The current beta release supports 1D/2D/3D transforms up to 16384 in each dimension.

The limit for 1D transforms will be increased in future releases.

Hmm. Thanks. I'll have to look into it a bit more closely, then, to find out what is going wrong. 16384 is the bare minimum for these FFTs to be of any use to me, so perhaps I'll be able to get something done after all.

Will that limit eventually be a function of available system memory only, or will it be capped at some lower number?

The 1D transform will be able to handle up to 1M elements.

Is that 1M limit bound by card memory or by implementation decisions? Are there any plans to go beyond that, or should we begin developing user code to go higher?

It is limited by implementation decisions.

If you want to do very long transforms that are powers of 2, take a look at the presentation that Cleve Moler gave at SC06 on the HPC Challenge benchmark
(http://www.hpcchallenge.org/presentations/sc2006/moler-slides.pdf).

You can easily implement a very long transform yourself: run a bunch of 1D FFTs in batch mode, then a transpose (for which there is now an efficient implementation in the new examples), then a multiplication by the proper twiddle factors, followed by another transpose and more 1D FFTs.
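
Here is a minimal sketch of that idea (often called the four-step or six-step FFT) in CUDA, assuming a transform of length N = N1 * N2 built from batched CUFFT calls. The cufftPlan1d/cufftExecC2C calls are the standard CUFFT API; the naive transpose and twiddle kernels, the N1 = N2 = 1024 sizes, and the layout choices are my own illustrative assumptions, not code from CUFFT or the SDK samples, and error checking is omitted.

/* Sketch of a long 1-D FFT assembled from batched CUFFT transforms,
   transposes, and a twiddle-factor multiplication, as described above.
   Illustrative only: kernels, sizes, and layout are assumptions. */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <cuda_runtime.h>
#include <cufft.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* Naive out-of-place transpose of a rows x cols row-major matrix.
   The shared-memory transpose from the SDK examples is the efficient
   drop-in replacement. */
__global__ void transpose(const cufftComplex *in, cufftComplex *out,
                          int rows, int cols)
{
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    int r = blockIdx.y * blockDim.y + threadIdx.y;
    if (r < rows && c < cols)
        out[c * rows + r] = in[r * cols + c];
}

/* Multiply element (n1, k2) of an N1 x N2 row-major matrix by the
   twiddle factor W_N^(n1*k2) = exp(-2*pi*i*n1*k2/N), where N = N1*N2. */
__global__ void twiddle(cufftComplex *data, int N1, int N2)
{
    int k2 = blockIdx.x * blockDim.x + threadIdx.x;
    int n1 = blockIdx.y * blockDim.y + threadIdx.y;
    if (n1 >= N1 || k2 >= N2) return;
    float ang = -2.0f * (float)M_PI * n1 * k2 / (float)(N1 * N2);
    float s, c;
    sincosf(ang, &s, &c);
    cufftComplex v = data[n1 * N2 + k2], r;
    r.x = v.x * c - v.y * s;
    r.y = v.x * s + v.y * c;
    data[n1 * N2 + k2] = r;
}

int main(void)
{
    const int N1 = 1024, N2 = 1024, N = N1 * N2;   /* 1M-point transform */

    /* Arbitrary test signal on the host. */
    cufftComplex *h = (cufftComplex *)malloc(N * sizeof(cufftComplex));
    for (int i = 0; i < N; ++i) { h[i].x = sinf(0.001f * i); h[i].y = 0.0f; }

    cufftComplex *a, *b;
    cudaMalloc((void **)&a, N * sizeof(cufftComplex));
    cudaMalloc((void **)&b, N * sizeof(cufftComplex));
    cudaMemcpy(a, h, N * sizeof(cufftComplex), cudaMemcpyHostToDevice);

    dim3 blk(16, 16);
    dim3 grdA((N1 + 15) / 16, (N2 + 15) / 16);  /* covers an N2 x N1 matrix */
    dim3 grdB((N2 + 15) / 16, (N1 + 15) / 16);  /* covers an N1 x N2 matrix */

    /* The input x[n1 + N1*n2] is viewed as an N2 x N1 row-major matrix.     */
    /* 1. Transpose so the first pass runs over contiguous rows of length N2. */
    transpose<<<grdA, blk>>>(a, b, N2, N1);

    /* 2. N1 batched FFTs of length N2. */
    cufftHandle planN1, planN2;
    cufftPlan1d(&planN2, N2, CUFFT_C2C, N1);
    cufftExecC2C(planN2, b, b, CUFFT_FORWARD);

    /* 3. Twiddle multiplication by W_N^(n1*k2). */
    twiddle<<<grdB, blk>>>(b, N1, N2);

    /* 4. Transpose back to an N2 x N1 matrix. */
    transpose<<<grdB, blk>>>(b, a, N1, N2);

    /* 5. N2 batched FFTs of length N1. */
    cufftPlan1d(&planN1, N1, CUFFT_C2C, N2);
    cufftExecC2C(planN1, a, a, CUFFT_FORWARD);

    /* Element a[k2*N1 + k1] now holds X[k2 + N2*k1]; one more transpose
       would put the spectrum in natural order. */
    cudaMemcpy(h, a, N * sizeof(cufftComplex), cudaMemcpyDeviceToHost);
    printf("X[0] = %f + %fi\n", h[0].x, h[0].y);

    cufftDestroy(planN1); cufftDestroy(planN2);
    cudaFree(a); cudaFree(b); free(h);
    return 0;
}

With this arrangement the result comes out in transposed block order, so add a third transpose if you need the spectrum in natural order. Also keep in mind that single-precision twiddle factors limit the accuracy of very long transforms.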