CUDA FFT: a bit short?

Well, I managed to get CUDA up and running after installing a 32-bit Linux distribution, and almost all of the SDK samples worked just fine.

However, one problem is that the FFT sample seems to support only arrays of length 512. If CUDA is to be useful at all for the FFT stuff I want to use it for, I'm going to need to run FFTs on 1-D arrays that are millions of elements long. Is it at all feasible to do this efficiently on the G8x?

The current beta release supports 1D/2D/3D transforms up to 16384 in each dimension.

The limit for 1D transforms will be increased in future releases.

Hmm. Thanks. I'll have to look into it a bit more closely, then, to find out what is going wrong. 16384 is the bare minimum for these FFTs to be of any use to me, so perhaps I'll be able to get something done after all.

Will that limit eventually be a function of available system memory only, or will it be capped at some lower number?

The 1D transform will be able to handle up to 1M elements.

Is that 1M limit bound by card memory or by implementation decisions? Are there any plans to go beyond that, or should we begin developing user code to go higher?

It is limited by implementation decisions.

If you want to do very long transforms that are powers of 2, take a look at the presentation that Cleve Moler gave at SC06 on the HPC Challenge benchmark
(http://www.hpcchallenge.org/presentations/sc2006/moler-slides.pdf).

You can easily implement a very long transform yourself: run a bunch of 1D FFTs in batch mode, then a transpose (for which there is now an efficient implementation in the new examples), then a multiplication by the proper twiddle factors, followed by another transpose and more 1D FFTs.
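
Here is a minimal sketch of that idea (often called the four-step or six-step FFT) in CUDA, assuming a transform of length N = N1 * N2 built from batched CUFFT calls. The cufftPlan1d/cufftExecC2C calls are the standard CUFFT API; the naive transpose and twiddle kernels, the N1 = N2 = 1024 sizes, and the layout choices are my own illustrative assumptions, not code from CUFFT or the SDK samples, and error checking is omitted.

/* Sketch of a long 1-D FFT assembled from batched CUFFT transforms,
   transposes, and a twiddle-factor multiplication, as described above.
   Illustrative only: kernels, sizes, and layout are assumptions. */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <cuda_runtime.h>
#include <cufft.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* Naive out-of-place transpose of a rows x cols row-major matrix.
   The shared-memory transpose from the SDK examples is the efficient
   drop-in replacement. */
__global__ void transpose(const cufftComplex *in, cufftComplex *out,
                          int rows, int cols)
{
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    int r = blockIdx.y * blockDim.y + threadIdx.y;
    if (r < rows && c < cols)
        out[c * rows + r] = in[r * cols + c];
}

/* Multiply element (n1, k2) of an N1 x N2 row-major matrix by the
   twiddle factor W_N^(n1*k2) = exp(-2*pi*i*n1*k2/N), where N = N1*N2. */
__global__ void twiddle(cufftComplex *data, int N1, int N2)
{
    int k2 = blockIdx.x * blockDim.x + threadIdx.x;
    int n1 = blockIdx.y * blockDim.y + threadIdx.y;
    if (n1 >= N1 || k2 >= N2) return;
    float ang = -2.0f * (float)M_PI * n1 * k2 / (float)(N1 * N2);
    float s, c;
    sincosf(ang, &s, &c);
    cufftComplex v = data[n1 * N2 + k2], r;
    r.x = v.x * c - v.y * s;
    r.y = v.x * s + v.y * c;
    data[n1 * N2 + k2] = r;
}

int main(void)
{
    const int N1 = 1024, N2 = 1024, N = N1 * N2;   /* 1M-point transform */

    /* Arbitrary test signal on the host. */
    cufftComplex *h = (cufftComplex *)malloc(N * sizeof(cufftComplex));
    for (int i = 0; i < N; ++i) { h[i].x = sinf(0.001f * i); h[i].y = 0.0f; }

    cufftComplex *a, *b;
    cudaMalloc((void **)&a, N * sizeof(cufftComplex));
    cudaMalloc((void **)&b, N * sizeof(cufftComplex));
    cudaMemcpy(a, h, N * sizeof(cufftComplex), cudaMemcpyHostToDevice);

    dim3 blk(16, 16);
    dim3 grdA((N1 + 15) / 16, (N2 + 15) / 16);  /* covers an N2 x N1 matrix */
    dim3 grdB((N2 + 15) / 16, (N1 + 15) / 16);  /* covers an N1 x N2 matrix */

    /* The input x[n1 + N1*n2] is viewed as an N2 x N1 row-major matrix.     */
    /* 1. Transpose so the first pass runs over contiguous rows of length N2. */
    transpose<<<grdA, blk>>>(a, b, N2, N1);

    /* 2. N1 batched FFTs of length N2. */
    cufftHandle planN1, planN2;
    cufftPlan1d(&planN2, N2, CUFFT_C2C, N1);
    cufftExecC2C(planN2, b, b, CUFFT_FORWARD);

    /* 3. Twiddle multiplication by W_N^(n1*k2). */
    twiddle<<<grdB, blk>>>(b, N1, N2);

    /* 4. Transpose back to an N2 x N1 matrix. */
    transpose<<<grdB, blk>>>(b, a, N1, N2);

    /* 5. N2 batched FFTs of length N1. */
    cufftPlan1d(&planN1, N1, CUFFT_C2C, N2);
    cufftExecC2C(planN1, a, a, CUFFT_FORWARD);

    /* Element a[k2*N1 + k1] now holds X[k2 + N2*k1]; one more transpose
       would put the spectrum in natural order. */
    cudaMemcpy(h, a, N * sizeof(cufftComplex), cudaMemcpyDeviceToHost);
    printf("X[0] = %f + %fi\n", h[0].x, h[0].y);

    cufftDestroy(planN1); cufftDestroy(planN2);
    cudaFree(a); cudaFree(b); free(h);
    return 0;
}

With this arrangement the result comes out in transposed block order, so add a third transpose if you need the spectrum in natural order. Also keep in mind that single-precision twiddle factors limit the accuracy of very long transforms.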