I have been using CUDA programming for my graduate research, and I have a question about 3D Fast Fourier Transforms (FFTs). Does anyone know the most efficient way to compute large 3D FFTs (with sizes ranging from 7 million to 450 million points)?

Thanks for your help,
Nihshanka Debroy

Well, the largest non-Tesla card has 1 GB of memory, which (if you could utilize 100% of it, which you can't in most cases) would give you 128M elements for a single-precision, in-place transform. The largest Tesla card has 4 GB of memory, which bumps you up to 512M elements (max). So I'd say you at least need to go with a Tesla, and even then you may run into memory problems with your very largest datasets.

Also, remember that you'll need at least the same amount of memory on the host side to hold the results when you transfer them back, and I'd recommend at least twice that (or more if you can afford it).
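For the transform itself, cuFFT supports 3D sizes directly via `cufftPlan3d`. A minimal host-side sketch of a single-precision, in-place, complex-to-complex 3D FFT (the 256^3 size is an assumption for illustration, and error checking is abbreviated; running it requires a CUDA-capable GPU):

```c
#include <cufft.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    int nx = 256, ny = 256, nz = 256;  /* ~16.8M points, assumed size */
    size_t bytes = sizeof(cufftComplex) * (size_t)nx * ny * nz;

    /* Allocate device memory; 8 bytes per complex element. */
    cufftComplex *data;
    if (cudaMalloc((void **)&data, bytes) != cudaSuccess) {
        fprintf(stderr, "out of device memory\n");
        return 1;
    }
    /* ... copy the input volume to `data` with cudaMemcpy ... */

    /* Plan and execute an in-place forward C2C 3D transform. */
    cufftHandle plan;
    cufftPlan3d(&plan, nx, ny, nz, CUFFT_C2C);
    cufftExecC2C(plan, data, data, CUFFT_FORWARD);

    /* ... copy the result back to the host with cudaMemcpy ... */
    cufftDestroy(plan);
    cudaFree(data);
    return 0;
}
```

Note that plan creation itself allocates workspace on the device, so the usable memory for data is somewhat less than the card's total.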

Actually, you can get cards with 1.5 GB and 4 GB in the Quadro line, but they don't really have a price advantage over Tesla. ;)