In the simple program I have attached, the CUFFT and FFTW results do not appear to match. Is this a byproduct of the way I am calculating the abs() function for the std::complex vs. the cuComplex value?
I plotted this and it seems like the CUFFT bars are scaled somewhat more than the FFTWF bars across the 50 rows.
Any idea what's going on there? driver.cpp (1.89 KB)
So what do people do to fix this? I am working on some existing code that is “correct”, and I am trying to write a GPU implementation of it. Do I have to fix the normalization manually?
Does this have anything to do with the compatibilityMode? I tried a couple of those, but it didn't seem to make a difference.
Also, I modified the output to print the real and imaginary parts of each result, and they differ. In the FFTW case, the real component of entries 2-49 is -25. The values are different in the CUFFT version. The DC component is (1225,0) for FFTW and (1225,1225) for CUFFT.
And from the FFTW site:
The DFT results are stored in-order in the array out, with the zero-frequency (DC) component in out[0]. The array in is not modified. Users should note that FFTW computes an unnormalized DFT, the sign of whose exponent is given by the dir parameter of fftw_create_plan. Thus, computing a forward followed by a backward transform (or vice versa) results in the original array scaled by n. See Section What FFTW Really Computes, for the definition of DFT.
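To make the quoted statement concrete, here is a minimal sketch of an unnormalized DFT in plain C++. It does not call FFTW or CUFFT; `naive_dft` and `make_ramp` are hypothetical helpers written for illustration, and the ramp input 0..49 is an assumption inferred from the DC value of 1225, not something taken from the attached driver.cpp.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Naive O(n^2) unnormalized DFT (illustration only, not the FFTW API).
// sign = -1 gives the forward transform, sign = +1 the backward one.
// No 1/n factor is applied in either direction.
std::vector<cplx> naive_dft(const std::vector<cplx>& in, int sign) {
    const std::size_t n = in.size();
    const double pi = std::acos(-1.0);
    std::vector<cplx> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        cplx sum(0.0, 0.0);
        for (std::size_t j = 0; j < n; ++j) {
            // Reduce j*k modulo n to keep the angle small and accurate.
            const double angle = sign * 2.0 * pi *
                static_cast<double>((j * k) % n) / static_cast<double>(n);
            sum += in[j] * std::polar(1.0, angle);
        }
        out[k] = sum;
    }
    return out;
}

// Assumed test input: real ramp 0, 1, ..., n-1 with zero imaginary part.
std::vector<cplx> make_ramp(std::size_t n) {
    std::vector<cplx> a(n);
    for (std::size_t j = 0; j < n; ++j) {
        a[j] = cplx(static_cast<double>(j), 0.0);
    }
    return a;
}
```

With `make_ramp(50)` as input, `naive_dft(a, -1)` gives a DC term of (1225,0) and a real part of -25 in every other bin, and `naive_dft(naive_dft(a, -1), +1)` returns the input multiplied by 50. Dividing by n after the backward transform recovers the original array.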
So maybe it is CUFFT doing the normalization, or FFTW doing none?
It appears that when I run IFFT(CUFFT(A)) I get a value that is scaled by sqrt(2)*50. This is contrary to the documentation, which says this will give me something scaled by the number of elements (which is 50 for my test case).
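One thing that might be worth checking, and this is only a guess since it depends on how the attached program fills the cuComplex array: a DC term of (1225,1225) and an extra factor of sqrt(2) in the magnitudes are both what an unnormalized DFT produces when the imaginary part of each input element is set to the same value as the real part. The sketch below shows this in plain C++ without calling CUFFT; `dft_unnormalized` is a hypothetical helper, and the ramp input 0..49 is an assumption.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Naive unnormalized DFT (illustration only, not the CUFFT API).
// sign = -1 is the forward transform, sign = +1 the backward one.
std::vector<cplx> dft_unnormalized(const std::vector<cplx>& in, int sign) {
    const std::size_t n = in.size();
    const double pi = std::acos(-1.0);
    std::vector<cplx> out(n, cplx(0.0, 0.0));
    for (std::size_t k = 0; k < n; ++k) {
        for (std::size_t j = 0; j < n; ++j) {
            const double angle = sign * 2.0 * pi *
                static_cast<double>((j * k) % n) / static_cast<double>(n);
            out[k] += in[j] * std::polar(1.0, angle);
        }
    }
    return out;
}

// Assumed input: ramp 0..n-1. If fill_imag is true, the imaginary part is
// set equal to the real part (the suspected mistake); otherwise it is zero.
std::vector<cplx> make_input(std::size_t n, bool fill_imag) {
    std::vector<cplx> a(n);
    for (std::size_t j = 0; j < n; ++j) {
        const double v = static_cast<double>(j);
        a[j] = cplx(v, fill_imag ? v : 0.0);
    }
    return a;
}
```

Because the DFT is linear, an input of x + i*x transforms to (1 + i) times the transform of x, so every magnitude grows by sqrt(2). For n = 50, the forward transform of `make_input(50, true)` has a DC term of (1225,1225), and taking abs() after a forward and backward pass gives sqrt(2)*50 times the original real values.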