Well, I managed to get CUDA up and running, after installing a 32-bit Linux distribution, and almost all of the SDK samples worked just fine.
However, one problem is that the FFT sample seems to support only length-512 arrays. If CUDA is to be useful at all for the FFT work I want to use it for, I’m going to need to run FFTs on 1-D arrays that are millions of elements long. Is it at all feasible to do this efficiently on the G8x?
Hmm. Thanks. I’ll have to look into it a bit more closely, then, to find out what is going wrong. 16384 is the bare minimum for these FFTs to be of any use to me, so perhaps I’ll be able to get something done after all.
You can easily implement a very long one by doing a batch of 1D FFTs, a transpose (for which there is now an efficient implementation in the new examples), a multiplication by the proper twiddle factors, another transpose, and then more 1D FFTs.
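To make the decomposition concrete, here is a minimal CPU-side sketch in plain Python, not GPU code. It assumes the length factors as N = n1 * n2, and it uses a naive O(n^2) DFT as a stand-in for each batched 1D FFT. The names `dft`, `transpose`, and `long_fft` are made up for illustration. This variant also adds a transpose at the start and at the end so that every short transform runs over a contiguous row and the result comes out in natural order; other orderings of the same steps are possible.

```python
import cmath


def dft(row):
    """Naive O(n^2) DFT, standing in for one 1D FFT of a batch."""
    n = len(row)
    return [
        sum(row[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def long_fft(x, n1, n2):
    """Length n1*n2 DFT built from short transforms, twiddles and transposes."""
    n = n1 * n2
    assert len(x) == n

    # View the input as an n1 x n2 row-major matrix: a[r][c] = x[r*n2 + c].
    a = [list(x[r * n2:(r + 1) * n2]) for r in range(n1)]

    # Transpose so that each length-n1 transform works on a contiguous row.
    a = transpose(a)                      # now n2 rows of length n1

    # Batch 1: n2 transforms of length n1.
    a = [dft(row) for row in a]

    # Twiddle factors: element (r, k) is multiplied by exp(-2*pi*i*r*k/n).
    a = [
        [a[r][k] * cmath.exp(-2j * cmath.pi * r * k / n) for k in range(n1)]
        for r in range(n2)
    ]

    # Transpose again, then batch 2: n1 transforms of length n2.
    a = transpose(a)                      # now n1 rows of length n2
    a = [dft(row) for row in a]

    # Final transpose puts the output in natural order.
    a = transpose(a)
    return [v for row in a for v in row]
```

On the GPU, each `[dft(row) for row in a]` line would presumably become a single batched 1D FFT call, and the transposes and the twiddle multiplication would each be a simple kernel. For a 16384-point transform one could pick n1 = n2 = 128, for example.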