I am trying to get some of the speedup others have mentioned with the CUDA FFT calls. My application requires a 1D FFT of a data matrix that is usually either 8192x4096 or 1024x1024, depending on the use case. I started from the FFT code at http://developer.nvidia.com/object/matlab_cuda.html, an example that many people have tried, and changed only one thing: I replaced the 2D FFT call with a 1D FFT call. Timing with tic and toc in Matlab, the CUDA FFT turns out to be slower than Matlab's own fft for my application. I am using a GTX 285 with 2 GB of memory, but the host CPU is an Intel Core i7 975, which is tough to beat. I am running Matlab 2009a, and its FFT runs across all 8 hardware threads of the i7 (4 cores with Hyper-Threading).
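
For reference, Matlab's fft applied to a matrix transforms each column independently, and Matlab stores matrices in column-major order, so column j of an M-by-N matrix is a contiguous run of M elements. The whole matrix can therefore be handed to a single batched 1D transform of length M with N batches, rather than N separate transforms. The sketch below is a pure-Python illustration of that layout only, assuming complex data already flattened in Matlab's column-major order; the function names are hypothetical, and the naive O(M^2) DFT stands in for a real FFT and says nothing about how cuFFT works internally.

```python
import cmath

def dft(x):
    """Naive O(n^2) forward DFT, same sign convention as Matlab fft (exponent -2*pi*i*k*t/n)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def batched_column_fft(buf, rows, cols):
    """Transform each column of a column-major rows x cols matrix stored flat in buf.

    Column j occupies buf[j*rows:(j+1)*rows], so the matrix is a batch of `cols`
    contiguous transforms of length `rows` (element stride 1, batch distance `rows`).
    """
    assert len(buf) == rows * cols
    out = []
    for j in range(cols):
        out.extend(dft(buf[j * rows:(j + 1) * rows]))
    return out

# The 2x2 matrix [1 2; 3 4] in Matlab's column-major order is [1, 3, 2, 4].
print([round(v.real, 6) for v in batched_column_fft([1, 3, 2, 4], 2, 2)])
# [4.0, -2.0, 6.0, -2.0]  (Matlab: fft([1 2; 3 4]) gives [4 6; -2 -2])
```

In cuFFT terms, the equivalent would be roughly one plan from cufftPlan1d(&plan, M, CUFFT_C2C, N) and one cufftExecC2C call on the whole buffer. A 1D plan created with a batch of 1 would transform only the first column, and creating a plan or launching a transform per column adds overhead, so both are worth ruling out in a modified MEX file.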

I have noticed that CUDA can be faster in some cases, such as the 2D FFT, but can it be faster for a 1D FFT? I don't know whether this is a problem with my 1D FFT implementation or whether the GPU's speed advantage is simply not large enough to overcome the extra host-to-device and device-to-host memory transfers. I am open to any ideas or things I should check, and I would also like to know whether others have seen the same problem with CUDA.
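
One way to check the transfer question is a back-of-the-envelope estimate of how much data has to cross the PCIe bus. The sketch below assumes single-precision complex data (8 bytes per element, the size of cuFFT's cufftComplex), one copy to the GPU and one copy back, and an effective bandwidth of 5 GB/s. That bandwidth is an assumption, not a measured figure for this GTX 285; substitute the number reported on your own system, for example by the bandwidthTest sample in the CUDA SDK.

```python
def transfer_estimate(rows, cols, bytes_per_elem=8, gb_per_s=5.0):
    """Return (MiB per direction, estimated round-trip seconds).

    bytes_per_elem=8 assumes single-precision complex data.
    gb_per_s is an ASSUMED effective host<->device bandwidth, not a measurement.
    """
    nbytes = rows * cols * bytes_per_elem
    round_trip_s = 2 * nbytes / (gb_per_s * 1e9)  # one copy to the GPU, one back
    return nbytes / 2**20, round_trip_s

for shape in [(8192, 4096), (1024, 1024)]:
    mib, secs = transfer_estimate(*shape)
    print(f"{shape[0]}x{shape[1]}: {mib:.0f} MiB each way, ~{secs * 1e3:.1f} ms round trip")
# 8192x4096: 256 MiB each way, ~107.4 ms round trip
# 1024x1024: 8 MiB each way, ~3.4 ms round trip
```

At that assumed rate, the 1024x1024 case spends roughly 3.4 ms just moving data, most of the 5 ms Matlab needs for the complete FFT, and the 8192x4096 case spends roughly 0.11 s of its 0.28 s. Any host-side conversion from Matlab's double-precision arrays to single-precision complex adds further time before the GPU does any work.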

Here are the results I am getting:

8192x4096 FFT
CUDA: 0.28 s
Matlab: 0.15 s

1024x1024 FFT
CUDA: 0.010 s
Matlab: 0.005 s
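
Converting the timings above into effective throughput makes the two sizes easier to compare. This sketch uses the conventional 5*n*log2(n) flop count for a complex 1D FFT of length n and assumes complex data transformed along the first dimension (length 8192 and 1024 respectively); for real input the true flop count is roughly half. The CUDA figures are wall-clock times that include transfers, so they are not a measure of the GPU FFT kernel itself.

```python
import math

def fft_gflops(n, batch, seconds):
    """Effective GFLOP/s using the conventional 5*n*log2(n) flop count per complex 1D FFT."""
    flops = 5 * n * math.log2(n) * batch
    return flops / seconds / 1e9

# Timings from the post; assumes complex data transformed along the first (column) dimension.
results = {
    "8192x4096": (8192, 4096, {"CUDA": 0.28, "Matlab": 0.15}),
    "1024x1024": (1024, 1024, {"CUDA": 0.010, "Matlab": 0.005}),
}
for label, (n, batch, timings) in results.items():
    for impl, t in timings.items():
        print(f"{label} {impl}: {fft_gflops(n, batch, t):.1f} GFLOP/s")
# 8192x4096 CUDA: 7.8 GFLOP/s
# 8192x4096 Matlab: 14.5 GFLOP/s
# 1024x1024 CUDA: 5.2 GFLOP/s
# 1024x1024 Matlab: 10.5 GFLOP/s
```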

Thanks for any help with this.