I am performing an FFT (Z2Z) on an NxN image. As far as I understand, for an in-place C2C or Z2Z transform I do not need to pad the last dimension. However, when I apply the IFFT to the forward FFT of the real image data, I do not get the same image back, even though I divide by the number of elements (N*N) after the inverse transform. Below are the steps I follow and my CUFFT Z2Z function. Please help me find the bug and clear up any misconceptions. Maybe the bug is related to data allocation.

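//Read the image into an NxN double array: img
//Allocate N*N cufftDoubleComplex elements for the host_data and device_data arrays
//Transfer the image data to the host structure
for (i = 0; i < N*N; i++) {
    host_data[i].x = img[i];   //real part = pixel value
    host_data[i].y = 0.0;      //imaginary part = 0
}
//cudaMemcpy host_data to device_data, then call Z2Z_gpu (forward, then inverse)
//cudaMemcpy device_data back to host_data
//Transfer host_data back to img, dividing by the number of elements
for (i = 0; i < N*N; i++)
    img[i] = host_data[i].x / (N*N);
//Write an image from the contents of the img array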
/*************CUFFT Z2Z*****************/
void Z2Z_gpu(cufftDoubleComplex *data, unsigned int nx, unsigned int ny, int dir)
{
    cufftHandle plan;
    /* Create a 2D double-complex FFT plan (nx rows, ny columns) */
    cudasafe(cufftPlan2d(&plan, nx, ny, CUFFT_Z2Z));
    cudasafe(cufftSetCompatibilityMode(plan, CUFFT_COMPATIBILITY_NATIVE));
    if (dir) {
        /* Forward transform the signal in place */
        cudasafe(cufftExecZ2Z(plan, data, data, CUFFT_FORWARD));
    } else {
        /* Inverse transform the signal in place; cuFFT does not normalize,
           so the caller must divide the result by nx*ny */
        cudasafe(cufftExecZ2Z(plan, data, data, CUFFT_INVERSE));
    }
    /* Destroy the CUFFT plan */
    cufftDestroy(plan);
}

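This is expected behavior. cuFFT computes unnormalized transforms, so a forward FFT followed by an inverse FFT returns the input scaled by the number of elements (N*N for an NxN image). The results must be divided by that number at some point to recover the original data. Most people normalize the IFFT output; some instead divide both the FFT and the IFFT by sqrt(number of elements).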

Thank you, Pavan. I was normalizing the IFFT results by N*N, and after more testing I found that the error grows with the variance of the pixel values. Besides dividing by N*N, I also tried dividing both the FFT and the IFFT by sqrt(N*N) as you suggested, but that also fails for some images. Perhaps I need to understand the normalization step better to come up with a general solution.

The normalization factor is the total number of points (NxN). You can apply it at any time, because it is just a multiplication or division by the same constant. What does the
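Why the placement does not matter can be written out in a few lines. Writing $\mathcal{F}$ and $\mathcal{F}^{-1}$ for cuFFT's unnormalized forward and inverse 2D transforms of an NxN array $x$, and $c_1$, $c_2$ for constants applied to the input and to the final output (symbols introduced here for illustration), linearity gives:

```latex
\begin{aligned}
X[k_1,k_2] = \mathcal{F}\{x\}[k_1,k_2] &= \sum_{n_1=0}^{N-1}\sum_{n_2=0}^{N-1} x[n_1,n_2]\,e^{-2\pi i (k_1 n_1 + k_2 n_2)/N} \\
\mathcal{F}^{-1}\{X\}[n_1,n_2] &= \sum_{k_1=0}^{N-1}\sum_{k_2=0}^{N-1} X[k_1,k_2]\,e^{+2\pi i (k_1 n_1 + k_2 n_2)/N} = N^2\,x[n_1,n_2] \\
c_2\,\mathcal{F}^{-1}\{\mathcal{F}\{c_1\,x\}\} &= c_1\,c_2\,N^2\,x
\end{aligned}
```

Any choice with $c_1 c_2 = 1/N^2$ recovers $x$: $(c_1, c_2) = (1, 1/N^2)$ divides after the inverse, $(1/N^2, 1)$ divides before the forward transform, and $(1/N, 1/N)$ is the sqrt(N*N) split. If one of these works and another does not, the difference is almost certainly elsewhere in the pipeline, not in the normalization.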

cufftSetCompatibilityMode ( plan , CUFFT_COMPATIBILITY_NATIVE )

command do?

I think the problem is not in the Z2Z_gpu function but in the parts where you copy the data and write the image back. You should check these lines:

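//Transfer the image data to the host structure
for (i = 0; i < N*N; i++) {
    host_data[i].x = img[i];   //real part = pixel value
    host_data[i].y = 0.0;      //imaginary part = 0
}
//cudaMemcpy host_data to device_data, then call Z2Z_gpu (forward, then inverse)
//cudaMemcpy device_data back to host_data
//Transfer host_data back to img, dividing by the number of elements
for (i = 0; i < N*N; i++)
    img[i] = host_data[i].x / (N*N);
//Write an image from the contents of the img array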

Just do a simple test: in a separate program, run only a forward and an inverse transform on a complex matrix. Try first with a matrix of all ones.