Different results between FFTW and CUFFT

Hello.

I’m replacing FFTW3 with CUFFT, and I get different results with floats.

Plans:

[codebox]
// Note: FFTW_MEASURE overwrites the arrays while planning, so fill them after creating the plans.

// p = fftwf_plan_dft_r2c_3d(global_grid_size, global_grid_size, global_grid_size,
//                           static_grid, (fftwf_complex *)static_grid, FFTW_MEASURE);
cufftPlan3d(&p_cufft, global_grid_size, global_grid_size, global_grid_size, CUFFT_R2C);

// pinv = fftwf_plan_dft_c2r_3d(global_grid_size, global_grid_size, global_grid_size,
//                              multiple_fsg, (float *)multiple_fsg, FFTW_MEASURE);
cufftPlan3d(&pinv_cufft, global_grid_size, global_grid_size, global_grid_size, CUFFT_C2R);
[/codebox]

and the FT:

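[codebox]
// fftwf_execute_dft_r2c(p, static_grid, (fftwf_complex *)static_grid);

// Copy the real input to the device, transform in place, copy the spectrum back.
// sizeof_grid must cover the whole R2C output: n*n*(n/2+1) cufftComplex values.
CHECK_CUDA(cudaMemcpy(static_grid_d, static_grid, sizeof_grid, cudaMemcpyHostToDevice));
if (cufftExecR2C(p_cufft, static_grid_d, (cufftComplex *)static_grid_d) != CUFFT_SUCCESS)
    fprintf(stderr, "cufftExecR2C failed\n");
CHECK_CUDA(cudaMemcpy(static_grid, static_grid_d, sizeof_grid, cudaMemcpyDeviceToHost));
[/codebox]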

The results start out only slightly different, but the error grows with each successive iteration.

Any help?

Thank you

I haven’t used CUFFT since 2.3, so I don’t know anything about 3.0, but back then CUFFT had no proper FFT routines for data sizes with large prime factors; it used direct DFTs instead, whose error is much worse. If at all possible, use power-of-two sizes or sizes with only small prime factors (2, 3, 5); for those, CUFFT results should be reasonably accurate.

Hello,

global_grid_size is 128, so I suppose CUFFT is using an FFT routine, isn’t it?

Thank you

Yes, it should. How large is the error, exactly (L1 norm or relative)?

Well, here are some values from “fftwf_execute_dft_r2c” and “cufftExecR2C”, respectively, where the input is a 3D array initialized to 0.0f:

[codebox]
CPU                   GPU
-168608.0             -168608.0
0.0                   0.0
129608.3828125        129608.359375
4217.92529296875      4217.91015625
-47863.76171875       -47863.7578125
-5714.296875          -5714.2822265625
-10428.8974609375     -10428.90234375
2505.267333984375     2505.25830078125
17181.33984375        17181.34375
4267.9931640625       4267.98291015625
1140.9383544921875    1140.9486083984375
[/codebox]

What do you think?
Thanks

Sorry, I edited my last post instead of writing a new one.