FFT Slower with CUDA

bobmiller2 · September 23, 2009, 10:07pm

I am trying to get some of the speedup others have mentioned with the CUDA FFT calls. My application requires a 1D FFT of a matrix of data that is usually either 8192x4096 or 1024x1024, depending on the application. I used the FFT code from http://developer.nvidia.com/object/matlab_cuda.html, an example that many have tried, and simply replaced the 2D FFT call with a 1D FFT call, with no other changes. Timing with tic and toc in MATLAB, I have found that the CUDA FFT is slower for my application. I am using a GTX 285 with 2 GB of memory, but the host CPU is an Intel Core i7-975, which is tough to beat. I am using MATLAB R2009a, and its FFT runs across all 8 hardware threads of the i7 (4 cores with Hyper-Threading).
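To make the 1D/2D distinction concrete: MATLAB's fft(X) on a matrix computes an independent 1D FFT down each column (in effect a batch of 1D transforms), while fft2(X) applies 1D transforms along both dimensions. For a square matrix, the 2D transform therefore does about twice the arithmetic of the column-wise 1D transform on the same data, which is one plausible reason the 2D case hides the transfer cost better. The sketch below is pure Python with a naive O(n^2) DFT, used only to show the structure. The names dft, fft_columns, fft_rows and fft2 are hypothetical helpers, not MATLAB or cuFFT functions.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a list of numbers."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]


def fft_columns(m):
    """Batched 1D transform: one DFT per column, like MATLAB's fft(X)."""
    rows, cols = len(m), len(m[0])
    out_cols = [dft([m[r][c] for r in range(rows)]) for c in range(cols)]
    # Transpose back so the result has the same rows x cols shape as m.
    return [[out_cols[c][r] for c in range(cols)] for r in range(rows)]


def fft_rows(m):
    """One DFT per row."""
    return [dft(row) for row in m]


def fft2(m):
    """2D transform = column transforms followed by row transforms."""
    return fft_rows(fft_columns(m))


if __name__ == "__main__":
    x = [[1, 2, 3, 4],
         [0, 1, 0, 1],
         [2, 2, 2, 2]]
    # Row 0 of the column-wise transform is the DC term: each column's sum.
    print(fft_columns(x)[0])
```

Because MATLAB stores matrices in column-major order, each column is contiguous in memory, so (assuming the MEX file copies the data to the GPU in that order) the column-wise case maps onto a single batched cuFFT plan such as cufftPlan1d(&plan, rows, CUFFT_C2C, cols) rather than a loop of single transforms.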

I have seen some cases, such as the 2D FFT, where CUDA can be faster, but can a 1D FFT be faster too? I don't know whether this is a problem with my 1D FFT implementation or whether the GPU's speed advantage is simply not enough to overcome the extra memory transfers. I am open to any ideas or things I should check. I would also like to know whether others have run into the same problem with CUDA.

Here are the results I am getting:
8192x4096 FFT
CUDA: 0.28 s
MATLAB: 0.15 s

1024x1024 FFT
CUDA: 0.010 s
MATLAB: 0.005 s

Thanks for any help with this.

Miraut · September 24, 2009, 9:00am

We have similar results.
I suppose the MATLAB routines are built on Intel's MKL libraries. Some routines, such as FFT and convolution (1D and 2D), are optimized for multiple cores and, as far as we have been able to test, they are much faster than the CUDA routines for medium-size matrices. :-(
I'm also very interested in any insight into this issue.

Thanks!

_Big_Mac · September 24, 2009, 9:32am

This is expected: CUDA is usually faster only for large datasets. Using CUDA adds overhead: copying the data to the GPU and back, and setting up and launching the kernels. Small datasets also limit the number of threads running (less data parallelism). There is usually a size threshold above which CUDA becomes beneficial, and finding it requires experimentation on the given hardware. For 1D FFTs, I hear it is usually fairly high.
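A rough way to see how large the copy overhead can be is to estimate the host-to-device and device-to-host transfer time from the array size. The sketch below assumes single-precision complex data (8 bytes per element) and an effective bus bandwidth of 5 GB/s; both numbers are illustrative assumptions, not measurements from this thread, and the function names are hypothetical.

```python
# Rough estimate of PCIe copy overhead for a GPU FFT.
# ASSUMPTIONS (illustrative, not measured): single-precision complex data
# (8 bytes per element; use 16 for double complex) and an effective
# host<->device bandwidth of 5 GB/s.
BYTES_C64 = 8
BANDWIDTH = 5e9  # bytes per second, assumed


def transfer_seconds(rows, cols, bytes_per_element, bandwidth):
    """Time to copy a rows x cols array across the bus in one direction."""
    return rows * cols * bytes_per_element / bandwidth


def gpu_total_seconds(rows, cols, bytes_per_element, bandwidth, kernel_seconds):
    """Upload + FFT kernel + download, i.e. roughly what tic/toc around the
    call sees (ignoring plan creation and MATLAB-side conversions)."""
    one_way = transfer_seconds(rows, cols, bytes_per_element, bandwidth)
    return 2 * one_way + kernel_seconds


def gpu_wins(cpu_seconds, rows, cols, bytes_per_element, bandwidth, kernel_seconds):
    """True if the GPU path, transfers included, beats the CPU time."""
    return gpu_total_seconds(rows, cols, bytes_per_element, bandwidth,
                             kernel_seconds) < cpu_seconds


if __name__ == "__main__":
    for rows, cols in [(8192, 4096), (1024, 1024)]:
        mib = rows * cols * BYTES_C64 / 2**20
        rt = 2 * transfer_seconds(rows, cols, BYTES_C64, BANDWIDTH)
        print(f"{rows}x{cols}: {mib:.0f} MiB each way, round trip ~{rt:.4f} s")
```

Under those assumptions, the round trip alone comes to about 0.11 s for 8192x4096 and about 0.003 s for 1024x1024, a large share of the 0.28 s and 0.010 s CUDA timings reported above. Measuring the actual bandwidth on your own machine (for example with the bandwidthTest sample in the CUDA SDK) and plugging it in gives a better estimate of the break-even size.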

Mark_Schlegel · January 17, 2010, 11:10pm

Does your timing of the CUDA FFT include the significant time it spends creating the plan? It would be a good idea to create the plan once for your particular FFT size, then save it and reuse it for every transform.
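To illustrate what "plan once, execute many times" means, here is a toy radix-2 FFT in pure Python whose plan object precomputes the bit-reversal permutation and twiddle factors, which depend only on the transform size. FFTPlan and execute are hypothetical names, and this is an analogy, not a description of cuFFT's internals. In cuFFT, the equivalent pattern is to call cufftPlan1d (or cufftPlanMany) once, call cufftExecC2C for each transform, and call cufftDestroy at the end, starting the timer only after the plan exists.

```python
import cmath
import time


class FFTPlan:
    """Toy radix-2 FFT 'plan': precomputes everything that depends only on n."""

    def __init__(self, n):
        if n <= 0 or n & (n - 1):
            raise ValueError("n must be a power of two")
        self.n = n
        bits = n.bit_length() - 1
        # Bit-reversal permutation used to reorder the input.
        self.rev = []
        for i in range(n):
            r, v = 0, i
            for _ in range(bits):
                r = (r << 1) | (v & 1)
                v >>= 1
            self.rev.append(r)
        # Twiddle factors exp(-2*pi*i*k/n) for k < n/2.
        self.twiddles = [cmath.exp(-2j * cmath.pi * k / n) for k in range(n // 2)]

    def execute(self, x):
        """Iterative Cooley-Tukey FFT that reuses the precomputed tables."""
        n = self.n
        if len(x) != n:
            raise ValueError("input length does not match plan size")
        a = [complex(x[i]) for i in self.rev]
        size = 2
        while size <= n:
            half = size // 2
            step = n // size  # stride into the size-n twiddle table
            for start in range(0, n, size):
                for k in range(half):
                    w = self.twiddles[k * step]
                    u = a[start + k]
                    v = a[start + k + half] * w
                    a[start + k] = u + v
                    a[start + k + half] = u - v
            size *= 2
        return a


if __name__ == "__main__":
    n = 1024
    signal = [float(i % 7) for i in range(n)]
    t0 = time.perf_counter()
    plan = FFTPlan(n)                    # one-time setup for this size
    t1 = time.perf_counter()
    for _ in range(10):
        spectrum = plan.execute(signal)  # reuse the same plan every call
    t2 = time.perf_counter()
    print(f"plan: {t1 - t0:.6f} s, 10 executions: {t2 - t1:.6f} s")
```

In this toy the plan is cheap compared with execution; the point is the timing structure, where plan creation is measured separately and kept out of the per-transform cost.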

Related topics

FFT Speed vs. x86 · CUDA Programming and Performance · 14 replies · 24760 views · July 27, 2008
Comparing cuda fft and matlab fft · CUDA Programming and Performance · 5 replies · 6160 views · February 10, 2008
CUDA slower than MATLAB... again I can't get the simplest examples to show any speed-up using GP · CUDA Programming and Performance · 5 replies · 2518 views · February 18, 2011
Batched 1D FFTs (using CUFFT and MEX) · CUDA Programming and Performance · 7 replies · 3630 views · March 4, 2009
Batched 1D FFT not faster than a loop for big images (1024x1024) · GPU-Accelerated Libraries (tag: cuda) · 0 replies · 481 views · September 25, 2020
CUDA is slower than expected. Is something missing? · CUDA Programming and Performance (tags: cuda, gpu, gpu-computing, parallel-computing) · 4 replies · 242 views · July 7, 2024
CUDA FFT vs Matlab FFT CUDA FFT Library · CUDA Programming and Performance · 3 replies · 1251 views · November 6, 2009
FFT Performance · CUDA Programming and Performance · 4 replies · 4688 views · March 3, 2010
Exploring power of CUDA · CUDA Programming and Performance · 6 replies · 989 views · February 23, 2012
CUFFT: calculation time · CUDA Programming and Performance · 6 replies · 2676 views · April 21, 2012