Problem with CUFFT R2C+C2R returning NaNs

I’m trying to perform convolution using FFTs. However, when applying a CUFFT R2C and then a C2R transform to an image (without any processing in between), any part of the original image that had zeros is now littered with NaNs. I cannot perform convolution like this because the convolution kernel will have a ton of NaNs in it. What might be causing this issue? Might the result be any different if I use a C2C transform?

Here is the code I’m using that’s relevant to performing the FFT. Although not shown, I’m checking the error codes and all seems to be well.

cufftHandle forwardPlan = NULL, inversePlan = NULL;

cufftPlan2d(&forwardPlan, height, width, CUFFT_R2C);
cufftSetCompatibilityMode(forwardPlan, CUFFT_COMPATIBILITY_NATIVE);

cufftPlan2d(&inversePlan, height, width, CUFFT_C2R);
cufftSetCompatibilityMode(inversePlan, CUFFT_COMPATIBILITY_NATIVE);

cufftExecR2C(forwardPlan, (cufftReal*) d_image, (cufftComplex*) d_image);
cufftExecC2R(inversePlan, (cufftComplex*) d_image, (cufftReal*) d_image);

The size of d_image is width*(height / 2 + 1)*sizeof(cufftComplex). The data is not padded.

Hello,

Everything looks OK. The error must be somewhere else, in the allocations or when you copy the data back. I am not using the compatibility mode and everything works without problems using an array with padding.

Here is my code without compatibility mode using double precision:

const int lx = ..., ly = ...;
const int totsize = lx*ly, totsize_pad = lx*2*(ly/2+1), totsize_invspa = lx*(ly/2+1);

static cufftDoubleReal hpsi[totsize_pad];   // data on host, padded
cufftDoubleReal *dpsi;
cufftHandle prc, pcr;

cudaMalloc((void**)&dpsi, sizeof(cufftDoubleReal)*totsize_pad);

cufftPlan2d(&prc, lx, ly, CUFFT_D2Z);
cufftPlan2d(&pcr, lx, ly, CUFFT_Z2D);

// initializing the data on the host
int count = 0;
for (int i = 0; i < lx; i++)
{
    for (int j = 0; j < 2*(ly/2+1); j++)
    {
        if (j < ly)
        {
            hpsi[count] = ...;   // your data
        }
        else
        {
            hpsi[count] = 0;     // padding
        }
        count = count + 1;
    }
}

// transfer from host to device
cudaMemcpy(dpsi, hpsi, sizeof(double)*totsize_pad, cudaMemcpyHostToDevice);

cufftExecD2Z(prc, dpsi, (cufftDoubleComplex*)dpsi);  // forward transform
cufftExecZ2D(pcr, (cufftDoubleComplex*)dpsi, dpsi);  // backward transform

// transfer back the results
cudaMemcpy(hpsi, dpsi, sizeof(double)*totsize_pad, cudaMemcpyDeviceToHost);

// retrieve the data from the host array
count = 0;
for (int i = 0; i < lx; i++)
{
    for (int j = 0; j < 2*(ly/2+1); j++)
    {
        if (j < ly)
        {
            printf("%d %d %lf\n", i, j, hpsi[count]);
        }
        else
        {
            hpsi[count] = 0;   // padding, ignore
        }
        count = count + 1;
    }
}

Use the flag -arch=sm_13 (or sm_20/sm_21) to enable double precision.

This doesn’t seem to be the issue as the data gets copied back perfectly if I skip doing any FFTs.

Using double precision is something I would prefer not to do, as I’d like to keep memory requirements to a minimum. It also seems odd that single precision would cause an issue: why should a lower-precision calculation return NaN when it should return an actual number?

If I add 0.1 to all the pixels in the image before the FFT and subtract 0.1 from all the pixels after the FFT, it seems to work right. This doesn’t seem like a valid solution to me though as it will change the result of convolution and requires more calculations than necessary.

I may do some tests to see if I get better results with double precision, with padding, or with a C2C transform. In the meantime, what else may be causing this?

Forget about double precision. Try my code with single precision; it should work without problems. It was the first test I did when I started using the FFT. There is always the possibility of bugs in libraries, but in CUFFT, at least, this forward-then-backward transform test works without problems. I do not think the problem is in the CUFFT calls. If you want more debugging help, try posting a little more code.