cuFFT v/s Fourn(numerical recipes)

Hi I’m working in a astronomical projects, specifically in the image processing area.

I’m using the CUDA FFT, but I’m having bad results, I think that this could be the cuFFT.

In cuFFT the exponential is powered by a negative expression, and in the numerical recipes FFT the exponential is powered by a positive expression.

Someone knows how to get the same results with cuFFT and numerical recipes FFT?

Or how to make them theorically equal?

Thanks in advanced.

How exactly are the results “bad”? Are there numerical differences or completely different results? What kind of FFT are you performing? 1D, 2D, C2C, R2C, C2R? Note that CUFFT provides a non-normalized FFT. For R2C, C2R you may be affected by symmetry. Your code may be setting up the FFT incorrectly. Can you post minimal, buildable code operating on a minimal data set that demonstrates the issue you are encountering?

I don’t know if is that. But there is a multicore parallel code that uses fourn and works well. The only thing that is different is the FFT.

So I’m performing a 2D FFT C2C. Before performing the FFT I store the Image value in the real part of the image, and the complex are zero.

I do a fftshift before and after the FFT (there is a post around this forum where explains that).

Without code to inspect any third party will be limited to more or less intelligent guesses as to what could be the issue in your code. By the way, it still is not clear how the CUFFT results you get differ from what you expect.

The code is huge, but here is a little part

cufftHandle plan;
if ((cufftPlan2d(&plan, M, N, CUFFT_C2C))!= CUFFT_SUCCESS) {
		printf("cufft plan error\n");
		return -1;
	}

cudaEventCreate(&start);
  	cudaEventCreate(&stop);
  	cudaEventRecord(start, 0);
  	fftshift<<<numBlocksNN, threadsPerBlockNN>>>(I, device_aux, N);
  	gpuErrchk(cudaDeviceSynchronize());
  	cudaEventRecord(stop, 0);
  	cudaEventSynchronize(stop);
  	cudaEventElapsedTime(&time, start, stop);
  	//printf("CUDA first fftshift execution time = %f ms\n",time);
    global_time = global_time + time;

  	//FFT

  	cudaEventCreate(&start);
  	cudaEventCreate(&stop);
  	cudaEventRecord(start, 0);
  	if ((cufftExecC2C(plan, (cufftComplex*)device_aux, (cufftComplex*)device_V, CUFFT_FORWARD)) != CUFFT_SUCCESS) {
  		printf("CUFFT exec error\n");
  		//return -1 ;
  		exit(0);
  	}
  	gpuErrchk(cudaDeviceSynchronize());
  	cudaEventRecord(stop, 0);
  	cudaEventSynchronize(stop);
  	cudaEventElapsedTime(&time, start, stop);
  	//printf("CUDA FFT execution time = %f ms\n",time);
    global_time = global_time + time;

  	////FFTSHIFT VISIBILITIES
  	cudaEventCreate(&start);
  	cudaEventCreate(&stop);
  	cudaEventRecord(start, 0);
  	fftshift<<<numBlocksNN, threadsPerBlockNN>>>(device_V, device_V, N);
  	gpuErrchk(cudaDeviceSynchronize());
  	cudaEventRecord(stop, 0);
  	cudaEventSynchronize(stop);
  	cudaEventElapsedTime(&time, start, stop);
  	//printf("CUDA second fftshift time = %f ms\n",time);
    global_time = global_time + time;

__global__ void fftshift(cufftComplex *IorV, cufftComplex *aux, long N)
{

		int i = threadIdx.y + blockDim.y * blockIdx.y;
		int j = threadIdx.x + blockDim.x * blockIdx.x;

		float a = 1-2*((i+j)&1);
		aux[N*i+j].x = IorV[N*i+j].x * a;
		aux[N*i+j].y = IorV[N*i+j].y * a;

}

It’s difficult to explain because there is a minimization algorithm behind. FFT is to get de objective function and the gradient of that function.

Maybe bad results wasn’t the best word. But the results will be clearly different due to the difference of how the FFT is taken.

Here the part of NR of how fourn is coded.

http://www.aip.de/groups/soe/local/numres/bookfpdf/f12-4.pdf