Problem using cuFFT

Hello,

I think I’m doing something wrong, but I can’t figure out what.

Here is a very basic use of cufft.

I read a raw image, run a forward FFT followed by an inverse FFT, and write the result to another raw file. The output is supposed to be the same as the input, except it’s not!

int main( void )
{
    #define NX 400
    #define NY 300

    cufftHandle plan;
    cufftComplex *devPtr, *devPtrOut;
    float *data, *h_ResultGPU;

    data = (float*)malloc(NX*NY * sizeof(float));
    h_ResultGPU = (float *)malloc(NX*NY * sizeof(float));

    FILE* Fichier_image;
    Fichier_image = check_filename("bridge_petit.raw","rb");
    int ni = 0;
    for (ni = 0; ni< NX*NY; ni++)
    {
        fread( &data[ni], 1, 1, Fichier_image);
    }

    /* GPU memory allocation */
    cudaMalloc((void**)&devPtr, sizeof(cufftComplex)*NX*NY);
    cudaMalloc((void**)&devPtrOut, sizeof(cufftComplex)*NX*NY);

    /* transfer to GPU memory */
    cudaMemcpy(devPtr, data, sizeof(cufftComplex)*NX*NY, cudaMemcpyHostToDevice);

    /* creates 2D FFT plan */
    cufftPlan2d(&plan, NX, NY, CUFFT_C2C);

    /* executes FFT processes */
    cufftExecC2C(plan, devPtr, devPtrOut, CUFFT_FORWARD);

    /* executes FFT processes (inverse transformation) */
    cufftExecC2C(plan, devPtrOut, devPtrOut, CUFFT_INVERSE);

    /* transfer results from GPU memory */
    cudaMemcpy(data, devPtrOut, sizeof(cufftComplex)*NX*NY, cudaMemcpyDeviceToHost);

    /* deletes CUFFT plan */
    cufftDestroy(plan);

    /* frees GPU memory */
    cudaFree(devPtr);
    cudaFree(devPtrOut);

    FILE* Fichier_image2;
    Fichier_image2 = check_filename("bridge5.raw","wb");
    ni = 0;
    for (ni = 0; ni< NX*NY; ni++)
    {
        fwrite( &h_ResultGPU[ni], 1, 1, Fichier_image2);
    }

    return 0;
}

(And if I try NX = NY = 512 with another image, I get “First-chance exception: access violation reading location 0X…” at the line “cufftPlan2d(&plan, NX, NY, CUFFT_C2C);”. I tried wrapping the call in cufftSafeCall, but I still got an error.)

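There are at least two problems:

  1. You are copying float (CPU) to complex (GPU). The cudaMemcpy reads sizeof(cufftComplex)*NX*NY bytes from “data”, which was allocated with only sizeof(float)*NX*NY bytes, so you are probably sending garbage.

  2. You need to normalize your results. cuFFT transforms are unnormalized, so IFFT(FFT(A)) = len(A) * A.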
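Both points can be handled on the host before and after the transfers. The following is a minimal sketch in plain C, not the poster's eventual fix: `complex_f` is a hypothetical stand-in for `cufftComplex` (an interleaved pair of floats with members `x` and `y`), and the helper names `real_to_complex` and `complex_to_real_scaled` are invented for illustration. For the 400 x 300 image above, len(A) would be NX*NY = 120000.

```c
#include <stddef.h>

/* Hypothetical stand-in for cufftComplex: x is the real part,
   y is the imaginary part, stored interleaved in memory. */
typedef struct { float x, y; } complex_f;

/* Widen n real samples into n complex samples with zero imaginary part.
   The destination needs room for n complex values, i.e. twice the bytes
   of the real source buffer. */
static void real_to_complex(const float *in, complex_f *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        out[i].x = in[i];
        out[i].y = 0.0f;
    }
}

/* After an unnormalized forward + inverse transform every sample is
   scaled by n. Divide by n and keep only the real part. */
static void complex_to_real_scaled(const complex_f *in, float *out, size_t n)
{
    const float scale = 1.0f / (float)n;
    for (size_t i = 0; i < n; ++i) {
        out[i] = in[i].x * scale;
    }
}
```

With helpers like these, the host-side staging buffer passed to cudaMemcpy would be an array of NX*NY complex values rather than the float array itself, so the copy size and the buffer size agree.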

Well, for the first point, I thought about that, but that is what they are doing in “convolution fft 2D”, so… what should I do instead?

Thank you for the second point; I had indeed forgotten that.

Here I normalize and use the float-to-complex conversion they give us:
http://forums.nvidia.com/index.php?showtopic=199259&view=findpost&p=1244213
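Separately from the two points raised in the replies, the original listing has an I/O detail worth a look: `fread(&data[ni], 1, 1, ...)` stores one byte into the first byte of a 4-byte float, which does not produce the pixel value as a float, and the output loop writes from `h_ResultGPU`, which never receives the result (the copy back goes into `data`). A minimal sketch of an explicit conversion follows, assuming the .raw files hold one unsigned 8-bit sample per pixel. The one-byte reads suggest this, but the thread does not confirm it. The helper names are hypothetical.

```c
#include <stddef.h>

/* Convert n unsigned 8-bit pixels to floats in the range 0..255. */
static void bytes_to_float(const unsigned char *in, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        out[i] = (float)in[i];
    }
}

/* Convert n floats back to 8-bit pixels: clamp to 0..255, then round
   to nearest by adding 0.5 before truncation. */
static void float_to_bytes(const float *in, unsigned char *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        float v = in[i];
        if (v < 0.0f)   v = 0.0f;
        if (v > 255.0f) v = 255.0f;
        out[i] = (unsigned char)(v + 0.5f);
    }
}
```

Under that assumption, the file would be read into an unsigned char buffer with a single fread call, converted with `bytes_to_float`, and the normalized result converted back with `float_to_bytes` before a single fwrite.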