Problem with CUFFT R2C+C2R returning NaNs

I’m trying to perform convolution using FFTs. However, when applying a CUFFT R2C and then a C2R transform to an image (without any processing in between), any part of the original image that had zeros is now littered with NaNs. I cannot perform convolution like this because the convolution kernel will have a ton of NaNs in it. What might be causing this issue? Might the result be any different if I use a C2C transform?

Here is the code I’m using that’s relevant to performing the FFT. Although not shown, I’m checking the error codes and all seems to be well.

cufftHandle forwardPlan = NULL, inversePlan = NULL;

cufftPlan2d(&forwardPlan, height, width, CUFFT_R2C);
cufftSetCompatibilityMode(forwardPlan, CUFFT_COMPATIBILITY_NATIVE);

cufftPlan2d(&inversePlan, height, width, CUFFT_C2R);
cufftSetCompatibilityMode(inversePlan, CUFFT_COMPATIBILITY_NATIVE);

cufftExecR2C(forwardPlan, (cufftReal*) d_image, (cufftComplex*) d_image);
cufftExecC2R(inversePlan, (cufftComplex*) d_image, (cufftReal*) d_image);

The size of d_image is width*(height / 2 + 1)*sizeof(cufftComplex). The data is not padded.

Hello,

Everything looks OK. The error must be somewhere else, in the allocations or when you copy the data back. I am not using the compatibility mode and everything works without problems using an array with padding.

Here is my code without compatibility mode using double precision:

const int lx = ..., ly = ...;
const int totsize = lx*ly, totsize_pad = lx*2*(ly/2+1), totsize_invspa = lx*(ly/2+1);

static cufftDoubleReal hpsi[totsize_pad];   // data on host, padded
cufftDoubleReal *dpsi;
cufftHandle prc, pcr;

cudaMalloc((void**)&dpsi, sizeof(cufftDoubleReal)*totsize_pad);

cufftPlan2d(&prc, lx, ly, CUFFT_D2Z);
cufftPlan2d(&pcr, lx, ly, CUFFT_Z2D);

// initializing the data on the host
int count = 0;
for (int i = 0; i < lx; i++)
{
    for (int j = 0; j < 2*(ly/2+1); j++)
    {
        if (j < ly)
        {
            hpsi[count] = ...;   // your data
        }
        else
        {
            hpsi[count] = 0;     // padding
        }
        count = count + 1;
    }
}

// transfer from host to device
cudaMemcpy(dpsi, hpsi, sizeof(double)*totsize_pad, cudaMemcpyHostToDevice);

cufftExecD2Z(prc, dpsi, (cufftDoubleComplex*)dpsi);  // forward transform
cufftExecZ2D(pcr, (cufftDoubleComplex*)dpsi, dpsi);  // backward transform

// transfer back the results
cudaMemcpy(hpsi, dpsi, sizeof(double)*totsize_pad, cudaMemcpyDeviceToHost);

// retrieve the data from the host array
count = 0;
for (int i = 0; i < lx; i++)
{
    for (int j = 0; j < 2*(ly/2+1); j++)
    {
        if (j < ly)
        {
            printf("%d %d %lf\n", i, j, hpsi[count]);
        }
        else
        {
            hpsi[count] = 0;   // padding, ignore
        }
        count = count + 1;
    }
}

Use the flag -arch=sm_13 (or sm_20/sm_21) to enable double precision.

This doesn’t seem to be the issue as the data gets copied back perfectly if I skip doing any FFTs.

Using double precision is something I would prefer not to do, as I’d like to keep memory requirements to a minimum. It also seems odd that single precision would cause an issue: why should a lower-precision calculation return NaN when it should return an actual number?

If I add 0.1 to all the pixels in the image before the FFT and subtract 0.1 from all the pixels after the FFT, it seems to work right. This doesn’t seem like a valid solution to me though as it will change the result of convolution and requires more calculations than necessary.

I may do some tests to see if I get better results with double precision, with padding, or with a C2C transform. In the meantime, what else may be causing this?

Forget about double precision. Try my code with single precision; it should work without problems. It was the first test I did when I started using the FFT. There is always the possibility of bugs in libraries, but in CUFFT, at least, this forward-then-backward transform test works without problems. I do not think the problem is in the CUFFT calls. If you want more debugging help, try posting a little more code.