CUDA FFT vs Matlab FFT


I hope someone can help me with a problem I am having.

I am trying to do a 1D FFT on a 1024*1000 array (one column at a time). I am trying to move my code from Matlab to CUDA. The Matlab fft() function does a 1D FFT on each column, and it gives me a different answer than CUDA FFT. I am not sure why. I have tried everything I can think of, but it still does the same.

Is the CUDA FFT library different? Is this result expected?

My code is here:


#define ROWS 1024
#define COLUMNS 1000

// CUFFT plan
cufftHandle plan;

cufftSafeCall(cufftExecR2C(plan, (cufftReal *)d_image_buff, (cufftComplex *)d_result_buff));


where d_image_buff contains the 1024*1000-element array. Is this the way I should be using the library?
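A rough sketch of how the buffers would be laid out for a single batched, column-wise transform. It assumes the image was copied from Matlab in its native column-major order (so each 1024-sample column is contiguous) and that the plan is an out-of-place one created with something like `cufftPlan1d(&plan, ROWS, CUFFT_R2C, COLUMNS)` using the default data layout; that plan call and the helper names below are assumptions, not taken from the post. If the data were stored row by row instead, a plain batched plan would transform the wrong dimension and `cufftPlanMany` with explicit strides would be needed.

```c
#include <stddef.h>

#define ROWS 1024     /* FFT length: one Matlab column */
#define COLUMNS 1000  /* number of columns, i.e. the batch count */

/* Number of complex outputs an R2C transform of n real inputs produces. */
static size_t r2c_output_len(size_t n) { return n / 2 + 1; }

/* Offset of element (row, col) in a column-major real input buffer
   (Matlab order), so every column is one contiguous run of ROWS floats. */
static size_t input_index(size_t row, size_t col) { return col * ROWS + row; }

/* Offset of bin k of column col in the complex output buffer.
   Hypothetical helper, assuming the default batched layout of an
   out-of-place plan such as cufftPlan1d(&plan, ROWS, CUFFT_R2C, COLUMNS):
   transforms are packed back to back, ROWS/2 + 1 complex values apart. */
static size_t output_index(size_t k, size_t col)
{
    return col * r2c_output_len(ROWS) + k;
}

/* Total number of cufftComplex elements d_result_buff must hold. */
static size_t output_buffer_len(void)
{
    return COLUMNS * r2c_output_len(ROWS);
}
```

Under these assumptions the output buffer needs 1000 * 513 complex values, not 1024 * 1000 / 2.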

Any help is greatly appreciated!


Matlab and CUFFT use two different formats for complex arrays.
In Matlab, you have all the real components, followed by the imaginary components.
In CUFFT they are interleaved (real, imaginary, real, imaginary, …). You will need to shuffle them.
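A host-side sketch of that shuffle, assuming single-precision data that has already been copied out of Matlab into two separate arrays. `complex_f` is a local stand-in with the same layout as `cufftComplex` (two floats, `x` then `y`) so the snippet compiles without the CUDA headers, and the function names are hypothetical. For purely real input, such as the image in this thread, no shuffle is needed: an R2C transform takes the real array as is.

```c
#include <stddef.h>

/* Same memory layout as cufftComplex (a float2 with members x and y). */
typedef struct { float x; float y; } complex_f;

/* Interleave split storage (all real parts in re[], all imaginary parts
   in im[]) into CUFFT-style {real, imaginary} pairs. */
static void split_to_interleaved(const float *re, const float *im,
                                 complex_f *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        out[i].x = re[i];
        out[i].y = im[i];
    }
}

/* The reverse shuffle, for handing CUFFT results back in split form. */
static void interleaved_to_split(const complex_f *in, float *re,
                                 float *im, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        re[i] = in[i].x;
        im[i] = in[i].y;
    }
}
```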

Thanks a lot for your help.

I found a small wrong assumption I was making. I was already using the cufftComplex data type to take care of the interleaved data.

The problem was actually that Matlab's fft() returns 1024 values for a 1024-point FFT, which is rather interesting… As far as I understand, we should get only half that (meaning 512 points out of a 1024-point FFT). CUDA was indeed doing this correctly, but I was expecting 1024 points, so the data wouldn't match. CUDA returns 512 points out of a 1024-point FFT, as it should. I still get slightly different results (on the order of e-4), but I guess this is related to the single precision of CUDA vs. the double precision of Matlab. I will try the double-precision libraries to see what I get.

Thanks for your help again!

The transform of 1024 real elements gives 513 complex elements (N/2 + 1).
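The other 511 values Matlab returns carry no extra information: for real input the DFT is conjugate-symmetric, X[N-k] = conj(X[k]), which is why CUFFT only stores bins 0 through N/2. Here is a host-side sketch that rebuilds the full Matlab-style spectrum from one column of R2C output. `complex_f` is a local stand-in with the same layout as `cufftComplex`, and the function name is hypothetical.

```c
#include <stddef.h>

/* Same memory layout as cufftComplex (a float2 with members x and y). */
typedef struct { float x; float y; } complex_f;

/* Expand the n/2 + 1 bins of a real-to-complex transform into the full
   n-point spectrum that Matlab's fft() returns, using X[n-k] = conj(X[k]). */
static void expand_half_spectrum(const complex_f *half, complex_f *full,
                                 size_t n)
{
    size_t m = n / 2 + 1;              /* number of bins CUFFT produced */
    for (size_t k = 0; k < m; ++k)     /* bins 0 .. n/2 are copied as is */
        full[k] = half[k];
    for (size_t k = m; k < n; ++k) {   /* bins n/2+1 .. n-1 are mirrored */
        full[k].x =  half[n - k].x;
        full[k].y = -half[n - k].y;
    }
}
```

For example, fft([1 2 3 4]) in Matlab gives 10, -2+2i, -2, -2-2i; an R2C transform of the same input gives only the first three, and the fourth is the conjugate of the second.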