CUFFT - Limits and decomposition of large data sets

Hi,
I’m working on a CUDA project, and since other people at my lab have found out about it, I’ve been asked to help with some other things too. One of the more interesting ones is doing 2D giga-element FFTs.

Now, I’ll be the first to say that I know absolutely nothing about how the FFT algorithm works. I do know what a Fourier transform is and all that (so I’m not totally in the dark).

What I want to know is: can you keep pushing the CUFFT functions to run with huge data sets? I suspect the answer is yes, but that it’ll be painfully slow, or perhaps there is simply a flat-out cut-off point.
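Just to be concrete about what I mean by “pushing it”, this is roughly what I’d naively try first. I haven’t actually run this, and the size is picked out of thin air just to get past a giga-element; presumably either the allocation or the plan creation gives up somewhere:

```c
// Naive attempt: ask CUFFT for one enormous 2D plan and see where it gives up.
#include <stdio.h>
#include <cuda_runtime.h>
#include <cufft.h>

int main(void)
{
    const int nx = 32768, ny = 32768;                         /* ~1 giga-element */
    size_t bytes = (size_t)nx * (size_t)ny * sizeof(cufftComplex);

    cufftComplex *data = NULL;
    cudaError_t merr = cudaMalloc((void **)&data, bytes);
    if (merr != cudaSuccess) {
        printf("cudaMalloc of %llu MiB failed: %s\n",
               (unsigned long long)(bytes >> 20), cudaGetErrorString(merr));
        return 1;
    }

    cufftHandle plan;
    cufftResult perr = cufftPlan2d(&plan, nx, ny, CUFFT_C2C);
    if (perr != CUFFT_SUCCESS) {
        printf("cufftPlan2d failed with error code %d\n", (int)perr);
        cudaFree(data);
        return 1;
    }

    /* In-place forward transform; real input data would need to be copied in first. */
    cufftExecC2C(plan, data, data, CUFFT_FORWARD);

    cufftDestroy(plan);
    cudaFree(data);
    return 0;
}
```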

This raises the second question: has anyone tried butchering a distributed FFT algorithm to decompose the problem into smaller, “GPU-efficient” blocks that can be shuffled on and off the card (across the PCI bus, and also around in the card’s RAM)? Instead of running on many machines (as in a cluster), you would just run all of the blocks in series (or on as many cards as you have in your box).

It seems like this would probably work quite nicely, and it should at least be faster than doing the FFT on the CPU, simply because the FPUs on the card are so fast and you still have far more compute parallelism than on a CPU.

I did a quick Google search, but only came up with decompositions over clusters of CPUs.

Thanks for the help

The FFT algorithm naturally scales to larger and larger sizes without any problems, because it keeps breaking the problem into halves recursively. What you will run into when trying such large transforms on the GPU is the memory limitation (only 1.5 GiB on Tesla, less on other cards): a giga-element single-precision complex array is already 2^30 × 8 bytes = 8 GiB for the data alone, before any workspace.
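If you want to see how much room you actually have to work with, you can just ask the runtime at the start of your program. A minimal check (nothing CUFFT-specific about it; the 32768 × 32768 size is only an example of a giga-element transform):

```c
// Compare what a giga-element single-precision complex transform would need
// against what the device currently has free.
#include <stdio.h>
#include <cuda_runtime.h>
#include <cufft.h>

int main(void)
{
    const unsigned long long n = 32768ULL;                      /* elements per side */
    unsigned long long needed = n * n * sizeof(cufftComplex);   /* 2^30 * 8 B = 8 GiB */

    size_t free_bytes = 0, total_bytes = 0;
    cudaMemGetInfo(&free_bytes, &total_bytes);

    printf("transform data alone: %llu MiB (in-place, not counting CUFFT workspace)\n",
           needed >> 20);
    printf("device memory free  : %llu MiB of %llu MiB\n",
           (unsigned long long)(free_bytes >> 20),
           (unsigned long long)(total_bytes >> 20));
    return 0;
}
```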

I see. The CUFFT documentation states that the 2D FFT stops at 16384 elements on a side (off the top of my head), so it seems like it might be efficient to break the problem down into the best size for occupancy/coalescing on the card, along the lines of how large-scale cluster-based FFT algorithms work (I’m still trying to figure this out).
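From the little reading I’ve done so far, a 2D FFT can apparently be computed as 1D FFTs along every row and then 1D FFTs along every column, which sounds like exactly the kind of decomposition I was asking about. So maybe something like the sketch below is the idea: stream chunks of rows to the card, run batched 1D transforms on them, and do the column pass after a transpose on the host. Completely untested, and the sizes, names and the host-side transpose are just placeholders I made up:

```c
// Untested sketch of an out-of-core 2D FFT: batched 1D row transforms done a
// chunk at a time on the GPU, with a host-side transpose between the two passes.
// N, ROWS_PER_CHUNK and the function names are made up; error checking omitted.
#include <cuda_runtime.h>
#include <cufft.h>

#define N              32768   /* elements per side of the square array (example)  */
#define ROWS_PER_CHUNK 128     /* rows shipped to the card per batch (tuning knob) */

/* Slow but simple in-place transpose of the N x N array on the host. */
static void transpose_in_place(cufftComplex *a)
{
    for (size_t i = 0; i < N; i++)
        for (size_t j = i + 1; j < N; j++) {
            cufftComplex tmp = a[i * N + j];
            a[i * N + j] = a[j * N + i];
            a[j * N + i] = tmp;
        }
}

/* Forward-FFT every row of the N x N host array, ROWS_PER_CHUNK rows at a time. */
static void fft_rows_in_chunks(cufftComplex *host_data)
{
    size_t chunk_bytes = (size_t)ROWS_PER_CHUNK * N * sizeof(cufftComplex);

    cufftComplex *dev_chunk = NULL;
    cudaMalloc((void **)&dev_chunk, chunk_bytes);

    cufftHandle plan;
    cufftPlan1d(&plan, N, CUFFT_C2C, ROWS_PER_CHUNK);  /* one 1D FFT per row in the batch */

    for (size_t row = 0; row < N; row += ROWS_PER_CHUNK) {
        cudaMemcpy(dev_chunk, host_data + row * N, chunk_bytes, cudaMemcpyHostToDevice);
        cufftExecC2C(plan, dev_chunk, dev_chunk, CUFFT_FORWARD);
        cudaMemcpy(host_data + row * N, dev_chunk, chunk_bytes, cudaMemcpyDeviceToHost);
    }

    cufftDestroy(plan);
    cudaFree(dev_chunk);
}

/* 2D FFT = row FFTs, transpose, row FFTs again (the original columns), transpose back. */
void fft2d_out_of_core(cufftComplex *host_data)
{
    fft_rows_in_chunks(host_data);
    transpose_in_place(host_data);
    fft_rows_in_chunks(host_data);
    transpose_in_place(host_data);
}
```

The two host-side transposes (and the fact that the PCIe copies aren’t overlapped with compute) would presumably be the slow parts; streams and pinned memory could probably hide some of that, but that’s further than I’ve gotten.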

I’ll keep reading.