I have developed a GPU CUDA code from a MATLAB code. I checked the intermediate products of the algorithm and they are practically the same at each stage of the processing. But after a cufft2d call the output diverges from the MATLAB one. The differences are far too large to be explained by the differences between the input matrices in the CUDA code and the MATLAB code.
Let’s say, for the sake of simplicity, that U_cuda and U_matlab are the two N1xN2 matrices obtained by the CUDA code and by MATLAB respectively. What happens is that:
U_cuda and U_matlab are equal to within a relative tolerance of 1e-6.
cufft(U_cuda) and fft2(U_matlab) are different, and the maximum difference is about 30% at some points of the output, which is unacceptable. Indeed, the final result of the CUDA code is drastically affected: for instance, the MATLAB code produces 35.45 while the CUDA code produces 45.17 at points where FFT and cuFFT do not agree.
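For readers who want to reproduce this kind of comparison, here is a minimal sketch of one way to measure the largest difference between two arrays relative to the largest magnitude in the reference. It is only an illustration and is not taken from the actual code: the helper name max_rel_diff and the choice of normalizing by the peak magnitude of the reference are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the original code): largest element-wise
// difference between two arrays, divided by the largest magnitude found
// in the reference array.
double max_rel_diff(const std::vector<std::complex<double>>& ref,
                    const std::vector<std::complex<double>>& test)
{
    assert(ref.size() == test.size());

    // Peak magnitude of the reference, used as the normalization scale.
    double scale = 0.0;
    for (const auto& v : ref) scale = std::max(scale, std::abs(v));
    if (scale == 0.0) scale = 1.0;  // avoid dividing by zero for an all-zero reference

    // Worst absolute deviation over all elements.
    double worst = 0.0;
    for (std::size_t i = 0; i < ref.size(); ++i)
        worst = std::max(worst, std::abs(ref[i] - test[i]));

    return worst / scale;
}
```

With this metric, the inputs would be expected to give a value around 1e-6 and the FFT outputs a value around 0.3, if the figures quoted above are measured the same way.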
In order to understand whether there was a problem upstream of the FFT call in the code, I ran some experiments.
I) I evaluated by means of FFT_matlab the FFT of U_cuda in standard precision (the fft2(U_cuda) command)
II) I evaluated by means of cuFFT the FFT of U_matlab in single precision
III) I evaluated by means of cuFFT the FFT of U_matlab in double precision
IV) I evaluated by means of FFTW the FFT of U_cuda in single precision
V) I evaluated by means of FFTW the FFT of U_cuda in double precision
VI) I evaluated by means of FFTW the FFT of U_matlab in double precision
The result is that: FFT_matlab(U_matlab) = FFT_matlab(U_cuda) != FFT_cufft(U_matlab) = FFT_cufft(U_cuda) = FFTW(U_cuda) (both single and double precision) = FFTW(U_matlab) (both single and double precision).
How is this possible?
Why is the MATLAB FFT different, and why does it also seem to be the correct one, given that the matrix has some symmetry properties that appear only with FFT_matlab?
I’m going crazy. Is it possible that FFTW and cuFFT are less accurate than the MATLAB FFT?
Please, help me.
P.S. The size of the input matrix U is just 16x16!
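Because the matrix is only 16x16, a brute-force DFT is cheap enough to serve as an independent reference against which the MATLAB, cuFFT and FFTW outputs can each be checked. The following is a minimal sketch under stated assumptions, not code from the project: the name naive_dft2, the row-major storage and the double-precision complex type are all chosen for illustration. It uses the unnormalized forward transform with a negative exponent, which is the convention of fft2, of FFTW_FORWARD and of CUFFT_FORWARD.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Hypothetical reference: brute-force forward 2D DFT (unnormalized,
// negative exponent). ASSUMPTION: `u` holds an n1 x n2 matrix in
// row-major order, i.e. element (r, c) is u[r * n2 + c].
// MATLAB stores matrices column-major, so a buffer exported from MATLAB
// has to be transposed (or indexed accordingly) before it is passed here.
std::vector<cplx> naive_dft2(const std::vector<cplx>& u,
                             std::size_t n1, std::size_t n2)
{
    assert(u.size() == n1 * n2);
    const double two_pi = 2.0 * std::acos(-1.0);
    std::vector<cplx> out(n1 * n2);

    for (std::size_t k1 = 0; k1 < n1; ++k1) {
        for (std::size_t k2 = 0; k2 < n2; ++k2) {
            cplx sum(0.0, 0.0);
            for (std::size_t r = 0; r < n1; ++r) {
                for (std::size_t c = 0; c < n2; ++c) {
                    // Phase of the DFT kernel for output bin (k1, k2).
                    const double phase = -two_pi *
                        (static_cast<double>(k1 * r) / static_cast<double>(n1) +
                         static_cast<double>(k2 * c) / static_cast<double>(n2));
                    sum += u[r * n2 + c] * std::polar(1.0, phase);
                }
            }
            out[k1 * n2 + k2] = sum;
        }
    }
    return out;
}
```

Comparing each library's output against this reference, with the memory layout and the output size of each call taken into account, would show which group of results actually matches the DFT definition.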