 # Can somebody explain this? (cufft strange result!)

Hi,

I have developed from a MATLAB code a GPU CUDA code, I checked intermediate products of the algorithm and they practically the same at each stage of the processing. But after a cufft2d call the output diverge from the MATLAB one. The differences are not compatible with differences of input matrices in CUDA code and MATLAB code respectively.

Let’s say for sake of simplicity, U_cuda and U_matlab the 2 matrices N1xN2 obtained by CUDA code and MATLAB respectively. What happens is that:

U_cuda and U_matlab are equal unless a relative tollerance of 1e-6.
cufft(U_cuda) and ff2(U_matlab) are different and maximum differece is about 30% in some points of the output, that is unacceptable. Indded, the final result of CUDA code is drastically affected: for istance MATLAB code produce 35.45 while CUDA 45.17 in points where FFT and cuFFT do not agree.

In order to understand if there was a problem above the FFT call in the code I made some experimental trial.

I) I evaluated by means of FFT_matlab the fft of U_cuda in standard precision (- fft2(U_cuda) command)
II) I evaluated by means of cuFFT the FFT of U_matlab in single precision
III) I evaluated by means of cuFFT the FFT of U_matlab in double precision
IV) I evaluated by means of FFTW the FFT of U_cuda in single precision
V) I evaluated by means of FFTW the FFT of U_cuda in double precision
VI) I evaluated by means of FFTW the FFT of U_matlab in double precision

The result is that: FFT_matlab(U_matlab) = FFT_matlab(U_cuda) != FFT_cufft (U_matlab)=FFT_cufft(U_cuda)=FFTW(U_cuda) (both single and double prec.) =FFTW(U_matlab) (both single and double prec.).

How it is possible?

Why FFT MATLAB is different and also it seems to be the correct one as the matrix has some symmetry properties that only with FFT_matlab appear?

I’m getting crazy. Is it possible that FFTW and CUFFT are less accurate than MATLAB FFT?

Thanks.

p.s. the size of the input matrix U is just 16x16!!

Are you taking in account the different normalization factor?

In matlab: IFFT(FFT(A))=A

In cuFFT: IFFT(FFT(A))=length(A)*A

Also, cuFFT is expecting data in row-major, Matlab in column major ( but this should not be a problem in your case, since N=M=16).
If the array is complex, remember also that cuFFT is storing the arrays in interleaved format (re.1,im,1,re.2,im.2,…) , Matlab in split format (re.1,re.2,…,im.1,im.2,…)

In the CUDA mex examples, there is a working FFT2 wrapper I wrote, you may want to take a look.

Yes, normalisation is considered as maximum value of ouputs of FFT_matlab and cuFFT coincides.

Concerning the other issues you pointed out I can say that i’m using cuFloatComplex type for cufft input and output, so i met the interleaving criteria and I’m quite sure that the data is stored in row-major.

By the way, as I’m using a 2D FFT with the same number of output points in conjugate domain (NxN output data) so the storage order should be irrelevant.

Could you test basic vector, for example, dimension N1 x N2,

X(i,j) = cos( 2piki/N1) * cos( 2pi *m *j / N2 ) for frequency k = 0,1,2,…, N1-1, m = 0, 1, 2, …, N2-1

because cufff(X) is very simple.

consider one-dimensional case, X(i) = cos(2pik*i/N), then

value of non-redudant frequency is

cufft(X) = N for k = 0 or N/2 (N is even)

cufft(X) = N/2 otherwise

Also you can test other basis, say

X(i,j) = cos( 2piki/N1) * sin( 2pi *m *j / N2 )

X(i,j) = sin( 2piki/N1) * cos( 2pi *m *j / N2 )

X(i,j) = sin( 2piki/N1) * sin( 2pi *m *j / N2 )

I test basis vector of 1-D, 2-D, 3-D for cuda 2.3, cuFFT works fine.