Wrong results in cufft!

Hi. I need help with cuFFT: my results are wrong and I have no idea why.

Here is my code:

#include <stdio.h>
#include <stdlib.h>
#include <cufft.h>

__global__ void print(cufftDoubleComplex *c, int h, int w){
	for(int i=0; i<1; i++){
		for (int j=0; j<w; j++){
			printf("(%d,%d): %f + %fi\n", i+1, j+1, c[i*w+j].x, c[i*w+j].y);
		}
	}
}

int main(int argc, char *argv[]){
	cudaSetDevice(0);	

	int img_w=5;
	int img_h=5;

	double fx[img_w*img_h], *d_fx;	
	
	cudaMalloc((void**)&d_fx, img_w*img_h*sizeof(double));
	cufftDoubleComplex *otfFx;
	cudaMalloc((void**)&otfFx, img_w*img_h*sizeof(cufftDoubleComplex));	

	for(int i=0; i<img_w*img_h; i++){
		fx[i]=0;
	}

	fx[0]=1;
	fx[img_w-1]=-1;
	cudaMemcpy(d_fx, fx, img_w*img_h*sizeof(double), cudaMemcpyHostToDevice);

	cufftHandle plan_fx;
	cufftPlan2d(&plan_fx, img_h, img_w, CUFFT_D2Z);
	cufftExecD2Z(plan_fx, d_fx, otfFx);

	print<<<1,1>>>(otfFx, img_h, img_w);
	cudaDeviceSynchronize();
	
	cufftDestroy(plan_fx);
	cudaFree(d_fx);
	cudaFree(otfFx);
	return 0;
}

This is what I get in the first row of the result:

0.00000 + 0.00000i 0.69098 - 0.95106i 1.80902 - 0.58779i 0.00000 + 0.00000i 0.69098 - 0.95105i

It should be:

0.00000 + 0.00000i 0.69098 - 0.95106i 1.80902 - 0.58779i 1.80902 + 0.58779i 0.69098 + 0.95106i

Everything after otfFx[14] is garbage; it is as if the result were 5x3 when it should be 5x5.

This is the MATLAB code that gives me the “right” results:

A=[1 0 0 0 -1; 0 0 0 0 0; 0 0 0 0 0; 0 0 0 0 0; 0 0 0 0 0];
fft2(A)

Real-to-complex transforms in cuFFT return only part of the result (“the non-redundant elements”). Read the data layout section of the documentation carefully, for example to find the expected size of the returned data:

http://docs.nvidia.com/cuda/cufft/index.html#data-layout

This can be particularly confusing in the 2D case (read that section too!)
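
To make the size concrete, here is a minimal sketch in plain C. The helper name r2c_output_elems is made up for illustration and is not part of cuFFT. It assumes the default layout described in that doc section, where a real-to-complex transform of an h x w input stores only w/2+1 complex values per row:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of complex elements written by a 2D
   real-to-complex transform of an h x w real input. Only the first
   w/2+1 columns of each row are stored (integer division). */
size_t r2c_output_elems(int h, int w) {
    assert(h > 0 && w > 0);
    return (size_t)h * (size_t)(w / 2 + 1);
}
```

For the 5x5 case in the question this gives 5 * 3 = 15 elements, so the valid indices run from 0 to 14.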

My suggestion would be to start with the C2C (or Z2Z) transform instead of R2C or D2Z, and check whether the data is generated correctly. If so, the conversion to D2Z depends on you understanding the data layout and the exact nature of the returned data. For example, if you read the documentation carefully, you will find that the forward D2Z transform on a 5x5 array does not produce a 5x5 array as its result. It produces a 5x3 array (5 rows of 5/2+1 = 3 complex values), which is why everything after otfFx[14] is garbage.
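
If you do want a full 5x5 result to compare against MATLAB's fft2, one option is to rebuild the missing columns from Hermitian symmetry: X(k1,k2) = conj(X((N1-k1) mod N1, (N2-k2) mod N2)). Below is a rough host-side sketch in plain C, assuming the half spectrum has already been copied back from the device. cplx and expand_half_spectrum are hypothetical names; cplx stands in for cufftDoubleComplex (same two-double x/y layout).

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for cufftDoubleComplex: real part in x, imaginary part in y. */
typedef struct { double x, y; } cplx;

/* Expand an h x (w/2+1) half spectrum (row-major) into the full h x w
   spectrum of a real input, using Hermitian symmetry for the columns
   that the real-to-complex transform does not store. */
void expand_half_spectrum(const cplx *half, cplx *full, int h, int w) {
    assert(half != NULL && full != NULL && h > 0 && w > 0);
    int hw = w / 2 + 1;                 /* stored columns per row */
    for (int r = 0; r < h; r++) {
        for (int c = 0; c < w; c++) {
            if (c < hw) {
                /* Stored element: copy as-is. */
                full[r * w + c] = half[r * hw + c];
            } else {
                /* Missing element: conjugate of the mirrored one. */
                int mr = (h - r) % h;   /* mirrored row */
                int mc = w - c;         /* mirrored column, always < hw */
                full[r * w + c].x =  half[mr * hw + mc].x;
                full[r * w + c].y = -half[mr * hw + mc].y;
            }
        }
    }
}
```

With the first-row values from the question (0, 0.69098 - 0.95106i, 1.80902 - 0.58779i), columns 4 and 5 come out as 1.80902 + 0.58779i and 0.69098 + 0.95106i, matching the MATLAB row.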

If you still need help, review this:

https://devtalk.nvidia.com/default/topic/826819/2d-cufft-wrong-result/

Thanks for the help. I had almost lost hope of getting an answer.
I will read the documentation when I have time; I have more urgent things to do right now.