cuFFT low accuracy compared with FFTW using single-precision float

I have an accuracy problem when comparing cuFFT with FFTW3F (single-precision FFTW).
The GPU is an RTX 3080, with CUDA and NVCC version 11.1.

I create an Eigen::Matrix with 2048 rows and 2048 columns. The difference between the cuFFT and FFTW3F results is larger than 1e-3,
whereas I expected it to be less than 1e-5.

real max coeff: 0.00195312
real min coeff: -0.00183105
imag max coeff: 0.00195312
imag min coeff: -0.00195312

The test code is attached: fft_cmp.tar.gz (3.3 MB)

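  1. cuFFT doesn’t promise results identical to FFTW, so a double-precision (FP64) FFT should be the reference for both libraries.
  2. You should be looking at relative error, not absolute error. The outputs here reach about 6000, so an absolute difference of 2e-3 is a relative error of roughly 4e-7.
  3. See the comparison against double-precision FFTW3 below: the absolute errors of FFTW3F and cuFFT have the same order of magnitude.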

####################################
FFTW3F results vs FFTW3
reference max real value: 5927.72
reference max imag value: 6195.68
computed max real value: 5927.71
computed max imag value: 6195.68
real max diff: 0.00146484
real min diff: -0.00146484
imag max diff: 0.00146484
imag min diff: -0.00146484

####################################
cuFFT results vs FFTW3
reference max real value: 5927.72
reference max imag value: 6195.68
computed max real value: 5927.72
computed max imag value: 6195.68
real max diff: 0.00244141
real min diff: -0.00268555
imag max diff: 0.00244141
imag min diff: -0.00234985
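
To put the numbers above in relative terms: 0.00146484 / 5927.72 is about 2.5e-7 for FFTW3F and 0.00268555 / 5927.72 is about 4.5e-7 for cuFFT, both a few times the single-precision machine epsilon (about 1.19e-7). Below is a minimal C++ sketch of that kind of check. It is not taken from the attached test code: `ErrorSummary` and `compare_to_reference` are hypothetical names, and normalizing by the largest reference component is only one of several reasonable choices.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper (not part of cuFFT, FFTW or the attached archive):
// summarizes the error of a single-precision FFT output against a
// double-precision reference of the same transform.
struct ErrorSummary {
    double max_abs_diff;      // largest |computed - reference| over real and imag parts
    double max_ref_mag;       // largest |reference| real or imag component (the scale)
    double rel_error;         // max_abs_diff / max_ref_mag
    double rel_in_float_eps;  // rel_error expressed in units of float epsilon
};

inline ErrorSummary compare_to_reference(
    const std::vector<std::complex<float>>& computed,
    const std::vector<std::complex<double>>& reference)
{
    // Assumption: both buffers hold the same transform in the same layout.
    assert(computed.size() == reference.size());

    ErrorSummary s{0.0, 0.0, 0.0, 0.0};
    for (std::size_t i = 0; i < reference.size(); ++i) {
        // Promote the float result to double before subtracting.
        const double dr = static_cast<double>(computed[i].real()) - reference[i].real();
        const double di = static_cast<double>(computed[i].imag()) - reference[i].imag();
        s.max_abs_diff = std::max({s.max_abs_diff, std::abs(dr), std::abs(di)});
        s.max_ref_mag  = std::max({s.max_ref_mag,
                                   std::abs(reference[i].real()),
                                   std::abs(reference[i].imag())});
    }
    if (s.max_ref_mag > 0.0) {
        s.rel_error = s.max_abs_diff / s.max_ref_mag;
        s.rel_in_float_eps =
            s.rel_error / static_cast<double>(std::numeric_limits<float>::epsilon());
    }
    return s;
}
```

To use it, one would copy the cuFFT (or FFTW3F) output into a `std::vector<std::complex<float>>`, compute the same transform with double-precision FFTW3 into a `std::vector<std::complex<double>>`, and inspect `rel_error`. A value within a small multiple of float epsilon suggests the single-precision result is about as accurate as the format allows for a transform of this size.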