Batched 1D FFTs (using CUFFT and MEX)

yoavmor · April 8, 2008, 7:16am

Hello,

I’m trying to compute batched 1D FFTs, where the input is a matrix and each row undergoes its own 1D transform. The fft2_cuda example supplied with the Matlab CUDA plugin was a tremendous help in understanding what needs to be done. This should be relatively simple, because the built-in 1D FFT already supports batching and fft2_cuda does all the rest.
The 2D FFT code in fft2_cuda contains this line:

cufftPlan2d(&plan, N, M, CUFFT_C2C) ;

Naively, I thought it would really be enough to change it into:

cufftPlan1d(&plan, N, CUFFT_C2C, M) ;

to get M 1D FFTs (batch size = M) of N elements each.
This, alas, does not work. (Well, it runs, it just gives the wrong results :( ).

Why does this not work? Where did I go wrong…?
I have attached the full code for reference.
I feel like there’s something very basic I’m missing here to complete this…

Thank you,
Y.
fft2_cuda.zip (1.83 KB)

yoavmor · April 8, 2008, 11:18am

I made some progress… :)
I realized that cufftExecC2C was performing the FFTs column-wise instead of row-wise. So all I had to do was change
cufftPlan1d(&plan, N, CUFFT_C2C, M) ;
into:
cufftPlan1d(&plan, M, CUFFT_C2C, N) ;

and to call this function using:
fft2_cuda(transpose(myMatrix));

This is good but not perfect, because the overhead of transposing large matrices is quite significant. So is there any way to tell cufftExecC2C to go row-wise? Do I need to make a change to one of the pack_c2c functions?

Thanks!
Y.
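
A sketch of the layout issue behind this, assuming the usual MATLAB column-major storage and CUFFT's documented batched layout, in which cufftPlan1d(&plan, nx, type, batch) treats the buffer as batch back-to-back signals of nx elements each. The helper names below are made up for illustration and are not part of fft2_cuda.

```c
#include <assert.h>
#include <stddef.h>

/* Linear offset of element (r, c) in a column-major matrix with `rows` rows.
   Illustrative helper, not taken from the fft2_cuda source. */
size_t colmajor_index(size_t r, size_t c, size_t rows)
{
    assert(r < rows);
    return c * rows + r;
}

/* Copy signal number `b` out of a batched 1D layout: `len` contiguous
   elements starting at offset b * len. When `data` is a column-major
   matrix and len equals its row count, signal b is exactly column b. */
void copy_batch_signal(const float *data, size_t len, size_t b, float *out)
{
    assert(len > 0);
    for (size_t i = 0; i < len; ++i)
        out[i] = data[b * len + i];
}
```

For the 2x3 matrix [1 2 3; 4 5 6], stored as {1, 4, 2, 5, 3, 6}, copy_batch_signal(data, 2, 1, out) yields {2, 5}, the second column. That is why an untransposed input ends up transformed column-wise.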

yoavmor · April 8, 2008, 12:13pm

I made a change to the pack_c2c and to the unpack_c2c functions… everything is working, thanks!

:D

Moderators: This whole thread can be removed, with my apology. Question was asked and answered (by me!), all in the same afternoon… :)

Y.

_Big_Mac · April 8, 2008, 1:57pm

Why delete? Someone might have this issue in the future and find this via the Search.

tom_TUD · April 8, 2008, 8:02pm

Cool, maybe you can publish your code for all users.
What speedups have you achieved?

yoavmor · April 9, 2008, 7:28am

I have attached my code, in case anyone is ever interested in something like this.
It takes a 2-D matrix and performs a 1D FFT on each row separately, using CUFFT’s batch mode. The speed-up is about 4x-5x on my system here (8800 GTX).

Note: I only changed pack_c2c and unpack_c2c, so the input right now has to be complex. I didn’t change pack_r2c, so using a matrix with real values instead of complex values will perform the transform column-wise and not row-wise. I didn’t need it so I didn’t change it for now.

Enjoy…
Y.
fft1DBatch_cuda.zip (1.93 KB)
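
The attachment itself is not reproduced here. The following is only a guess at the kind of change described, assuming pack_c2c converts MATLAB's separate real and imaginary double arrays into interleaved single-precision complex values. The function names, the complex_f type (a stand-in with the same x/y fields as cufftComplex), and the rows/cols parameters are all hypothetical. The idea is to transpose while packing, so that each row becomes contiguous and no separate transpose() call is needed.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for cufftComplex: interleaved real (x) and imaginary (y) parts. */
typedef struct { float x, y; } complex_f;

/* Pack a column-major, split-complex rows x cols matrix into an interleaved
   buffer in which every row is contiguous (row-major order). */
void pack_rows_c2c(complex_f *out, const double *re, const double *im,
                   size_t rows, size_t cols)
{
    assert(rows > 0 && cols > 0);
    for (size_t r = 0; r < rows; ++r) {
        for (size_t c = 0; c < cols; ++c) {
            size_t src = c * rows + r;   /* column-major source offset */
            size_t dst = r * cols + c;   /* row-major destination offset */
            out[dst].x = (float)re[src];
            out[dst].y = (float)im[src];
        }
    }
}

/* Inverse of pack_rows_c2c: scatter row-contiguous results back into
   column-major, split-complex output arrays. */
void unpack_rows_c2c(const complex_f *in, double *re, double *im,
                     size_t rows, size_t cols)
{
    assert(rows > 0 && cols > 0);
    for (size_t r = 0; r < rows; ++r) {
        for (size_t c = 0; c < cols; ++c) {
            size_t dst = c * rows + r;
            size_t src = r * cols + c;
            re[dst] = in[src].x;
            im[dst] = in[src].y;
        }
    }
}
```

With the data packed this way, a plan of the form cufftPlan1d(&plan, cols, CUFFT_C2C, rows) would transform each row as one batch entry.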

XFer · July 10, 2008, 10:54pm

Hello,

Is it really possible to decompose an ordinary Complex2Complex FFT2D into a batch of Complex2Complex FFT1D (rows)?
Does it give the same result? Sounds strange to me.
That would be very interesting, since a 4x speedup would allow CudaFFT to be faster than FFTW (at the moment, FFTW is noticeably faster for my C2C 2D transforms, with array sizes up to 1024x1024).

Thanks for any advice.
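
On the decomposition question: the 2D DFT is separable, so a row-wise batch of 1D transforms is the first of two passes rather than the whole 2D transform. As a sketch, with x an M-by-N array (the symbols here are illustrative and not tied to the M and N in the code earlier in the thread):

```latex
X[k,l] = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} x[m,n] \, e^{-2\pi i \,(km/M + ln/N)}
       = \sum_{m=0}^{M-1} e^{-2\pi i \, km/M}
         \left( \sum_{n=0}^{N-1} x[m,n] \, e^{-2\pi i \, ln/N} \right)
```

The inner sum is a 1D DFT of row m, and the outer sum is a 1D DFT along each column of those row results. A batched row transform therefore matches the 2D FFT only after a second batched pass along the columns. The code attached in this thread performs the row pass only.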

janakasoft · March 4, 2009, 5:20pm

Some common CUDA MATLAB errors and solutions are available here:
http://cs.ucf.edu/~janaka/gpu/
