Precision loss in cuFFT when using double precision

Hello,

The discrete Fourier transform (DFT) is defined as a triple sum, one along each coordinate. So a 3D transform is

sum_{x,y,z} psi(x,y,z) * exp{i*[kx*x + ky*y + kz*z]}.
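
In code, evaluating a single coefficient this way is one long running sum over all n^3 points, which is exactly where the round-off builds up. A rough sketch (assuming psi is stored as a flat array):

```cpp
#include <complex>
#include <vector>
#include <cmath>
#include <cstddef>

// Naive evaluation of one 3D DFT coefficient: a single running sum
// over all n^3 grid points.
std::complex<double> dft_coeff(const std::vector<double>& psi, int n,
                               double kx, double ky, double kz)
{
    std::complex<double> acc(0.0, 0.0);
    for (int z = 0; z < n; ++z)
        for (int y = 0; y < n; ++y)
            for (int x = 0; x < n; ++x) {
                double phase = kx * x + ky * y + kz * z;
                acc += psi[((std::size_t)z * n + y) * n + x]
                     * std::complex<double>(std::cos(phase), std::sin(phase));
            }
    return acc;
}
```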

I have very large matrices, like 512x512x512. If I do this in a sequential way on a CPU, the transform will start to accumulate errors from adding very large numbers to very small numbers for some values of kx, especially if I have a periodic function.

On the CPU, just doing a simple sum (the average of the matrix) gave an error of about 1.0e-12. If this accumulates, it can become a problem. For the serial code (CPU) I solved this problem by first doing the averages line by line, then plane by plane, this way avoiding operations between numbers too different from each other.
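
Roughly, what I do on the CPU looks like this (a sketch, assuming psi is a flat n*n*n array in row-major order):

```cpp
#include <vector>
#include <cstddef>

// Average of an n*n*n grid, computed line by line and then plane by plane,
// so that each partial sum only combines numbers of similar magnitude.
double hierarchical_average(const std::vector<double>& psi, std::size_t n)
{
    double plane_acc = 0.0;
    for (std::size_t z = 0; z < n; ++z) {
        double line_acc = 0.0;
        for (std::size_t y = 0; y < n; ++y) {
            double sum = 0.0;
            for (std::size_t x = 0; x < n; ++x)
                sum += psi[(z * n + y) * n + x];
            line_acc += sum / n;   // average of one line
        }
        plane_acc += line_acc / n; // average of one plane
    }
    return plane_acc / n;          // average of the whole cube
}
```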

I have an iterative algorithm which involves tens of thousands of FFTs and iFFTs. How does the cuFFT library handle this? How much precision is lost in one transform?

I am not aware of documentation in that regard. Is it not possible to simply measure the error of one cuFFT-based iteration against a (possibly naive) higher-precision reference? I assume that is what one would need to do when using a CPU-based FFT library (such as FFTW) as well. Maybe there is another way to check the quality of the final result, similar to how residuals can be used to check the results of solvers?
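
As a cheap first check (not a substitute for a true higher-precision reference), one could measure the forward + inverse round-trip error of a single double-precision cuFFT transform. A minimal sketch (n reduced to 128 here to keep the test cheap; error handling omitted):

```cpp
#include <cufft.h>
#include <cuda_runtime.h>
#include <complex>
#include <vector>
#include <random>
#include <cmath>
#include <cstdio>
#include <cstddef>

int main()
{
    const int n = 128;  // reduce from 512 to keep the test cheap
    const std::size_t count = (std::size_t)n * n * n;

    // Random complex input on the host.
    std::vector<std::complex<double>> h(count), out(count);
    std::mt19937_64 rng(42);
    std::uniform_real_distribution<double> dist(-1.0, 1.0);
    for (auto& v : h) v = {dist(rng), dist(rng)};

    cufftDoubleComplex* d = nullptr;
    cudaMalloc((void**)&d, count * sizeof(cufftDoubleComplex));
    cudaMemcpy(d, h.data(), count * sizeof(cufftDoubleComplex),
               cudaMemcpyHostToDevice);

    // In-place forward + inverse Z2Z; cuFFT leaves the inverse unscaled,
    // so the result should equal the input times n^3.
    cufftHandle plan;
    cufftPlan3d(&plan, n, n, n, CUFFT_Z2Z);
    cufftExecZ2Z(plan, d, d, CUFFT_FORWARD);
    cufftExecZ2Z(plan, d, d, CUFFT_INVERSE);

    cudaMemcpy(out.data(), d, count * sizeof(cufftDoubleComplex),
               cudaMemcpyDeviceToHost);

    // Relative l2 error of the scaled round trip against the input.
    double err = 0.0, norm = 0.0;
    for (std::size_t i = 0; i < count; ++i) {
        err  += std::norm(out[i] / (double)count - h[i]);
        norm += std::norm(h[i]);
    }
    std::printf("round-trip relative error: %e\n", std::sqrt(err / norm));

    cufftDestroy(plan);
    cudaFree(d);
    return 0;
}
```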

Hello,

Thanks for your reply. A CPU double-precision FFT could have the same problem, but I just noticed a long-double-precision FFTW library. It will take me some time to dig out my old Fortran codes and convert them.

The “long double” version of FFTW would probably make for a convenient higher-precision reference that allows you to assess the accumulated error from the double-precision (I)FFTs. Is the “long double” data in that library IEEE-754 quadruple precision, double-double, or x87 80-bit extended precision?
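
Either way, the comparison itself should be straightforward: run the same data through both the double-precision and long-double builds of FFTW and diff the outputs. A rough 1D sketch (assuming both libfftw3 and libfftw3l are installed; link with -lfftw3 -lfftw3l):

```cpp
#include <fftw3.h>
#include <cmath>
#include <cstdio>
#include <random>

int main()
{
    const int n = 4096;
    fftw_complex*  a = fftw_alloc_complex(n);   // double transform
    fftwl_complex* b = fftwl_alloc_complex(n);  // long double reference

    // Plan first: with flags other than FFTW_ESTIMATE, planning may
    // overwrite the arrays.
    fftw_plan  pa = fftw_plan_dft_1d(n, a, a, FFTW_FORWARD, FFTW_ESTIMATE);
    fftwl_plan pb = fftwl_plan_dft_1d(n, b, b, FFTW_FORWARD, FFTW_ESTIMATE);

    std::mt19937_64 rng(1);
    std::uniform_real_distribution<double> dist(-1.0, 1.0);
    for (int i = 0; i < n; ++i) {
        a[i][0] = dist(rng);  a[i][1] = dist(rng);
        b[i][0] = a[i][0];    b[i][1] = a[i][1];   // identical input data
    }

    fftw_execute(pa);
    fftwl_execute(pb);

    // Relative l2 error of the double result against the long double one.
    long double err = 0.0L, norm = 0.0L;
    for (int i = 0; i < n; ++i) {
        long double dr = (long double)a[i][0] - b[i][0];
        long double di = (long double)a[i][1] - b[i][1];
        err  += dr * dr + di * di;
        norm += b[i][0] * b[i][0] + b[i][1] * b[i][1];
    }
    std::printf("relative error vs long double FFT: %Le\n",
                std::sqrt(err / norm));

    fftw_destroy_plan(pa);   fftwl_destroy_plan(pb);
    fftw_free(a);            fftwl_free(b);
    return 0;
}
```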

I think the long double refers to the 80-bit format, because there is a separate note regarding quadruple precision on the same page of the manual. Both cases are good for testing.

Before using CUDA, I never had systems so large that I had to worry about any errors. :)

I just remembered that on some platforms “long double” simply maps back to “double” (this is allowed under the C/C++ standards). A caveat worth looking into. The support for __float128 that is mentioned on that page seems like the best bet in terms of a higher-precision reference used to assess the quality of the double-precision results.
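
A quick way to see what you actually get on a given platform is to print the mantissa width:

```cpp
#include <cfloat>
#include <cstdio>

int main()
{
    // 53 -> long double is just double; 64 -> x87 80-bit extended;
    // 113 -> IEEE-754 quadruple precision.
    std::printf("long double mantissa bits: %d\n", LDBL_MANT_DIG);
    return 0;
}
```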