I ran a 400-point FFT on my input data using two methods:
1. A C2C forward transform with length nx*ny
2. An R2C transform with length nx*(nyh+1)
Observations when profiling the code:
Method 1 calls SP_c2c_mradix_sp_kernel twice, for a total of 24 usec.
Method 2 calls SP_c2c_mradix_sp_kernel, taking 12.32 usec, and SP_r2c_mradix_sp_kernel, also taking 12.32 usec.
So in the end there is no improvement from using the real-to-complex transform over the complex-to-complex transform. In theory there should be one, since Method 2 works on only about half the size of the second dimension. Am I missing something? This is also mentioned on page 21 of the CUFFT_Library_3.1 manual.
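To put a number on the expected saving, here is a minimal sketch of the output sizes for the two methods. The helper names are hypothetical, and the 20 x 20 split of the 400 points in the usage note is only an assumption for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: number of complex output elements of a 2D C2C
   transform of size nx x ny (every element is stored). */
size_t c2c_output_elems(size_t nx, size_t ny)
{
    return nx * ny;
}

/* Hypothetical helper: number of complex output elements of a 2D R2C
   transform of size nx x ny. Only the non-redundant half of the last
   dimension is stored, i.e. ny/2 + 1 elements per row. */
size_t r2c_output_elems(size_t nx, size_t ny)
{
    return nx * (ny / 2 + 1);
}
```

For an assumed 20 x 20 input, c2c_output_elems(20, 20) gives 400 and r2c_output_elems(20, 20) gives 220, so the R2C output is a bit more than half the size of the C2C output.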
Secondly, my R2C results from CUFFT and FFTW do not match. I don't know what the issue is here.
CUFFT version:
double* ffcorr1;
cufftComplex *f1_d;
cudaMalloc((void**) &ffcorr1, sizeof(double) * pix3);
cudaMalloc((void**) &f1_d, sizeof(cufftComplex) * pix1 * (pix2/2 + 1));
// create plan for CUDA FFT
cufftHandle plan_forward1;
CUFFT_SAFE_CALL(cufftPlan2d(&plan_forward1, pix1, pix2, CUFFT_R2C));
// cast double* ffcorr1 as cufftReal*
CUFFT_SAFE_CALL(cufftExecR2C(plan_forward1, (cufftReal*) ffcorr1, f1_d));
// Destroy CUFFT context
CUFFT_SAFE_CALL(cufftDestroy(plan_forward1));
FFTW version:
double* ffcorr1;
fftw_complex *f1;
ffcorr1 = (double*) malloc(sizeof(double) * pix3);
f1 = fftw_malloc ( sizeof ( fftw_complex ) * pix1 * (pix2/2+1) * n);
plan_forward1 = fftw_plan_dft_r2c_2d ( pix1, pix2, ffcorr1, f1, FFTW_ESTIMATE );
fftw_execute ( plan_forward1 );
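One possible cause worth ruling out is the pointer cast in the CUFFT version. cufftReal is a single-precision float, while the FFTW plan above works on double data, and casting a double* to cufftReal* does not convert the values. It only reinterprets the same bytes. The sketch below is plain host-side C rather than CUFFT code, and both helper names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the raw bytes of n doubles into a float
   buffer of 2*n elements. This mimics what reading a double buffer
   through a float pointer sees: same bytes, different values. */
void reinterpret_as_float(const double *src, size_t n, float *dst)
{
    memcpy(dst, src, n * sizeof(double));
}

/* Hypothetical helper: converts n doubles to n floats value by value,
   which is what a single-precision transform needs as input. */
void convert_to_float(const double *src, size_t n, float *dst)
{
    for (size_t i = 0; i < n; ++i) {
        dst[i] = (float) src[i];
    }
}
```

With the input value 1.0, convert_to_float produces 1.0f, whereas reinterpret_as_float produces two floats, neither of which is 1.0f. If this is what is happening, converting the data to float before copying it to the device (or using the double-precision CUFFT types, where the installed version provides them) would be the thing to try.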