 # Cufft double precision

Hi,
I have a problem using Cufft with double precision numbers. When doing a forward transform followed by inverse transform, the output does not equal the original data…
However, when I use the following code with smaller numbers, it seems to be correct.
Any hint to where the error is would be great. Thank you in advance.

``````...
....

int N = number_of_points_in_directions_XYZ ;

cufftResult e;
cufftDoubleComplex *rho, *d_rho, *rho2;
cufftDoubleComplex *d_rho_complex_out;
cufftHandle fft_rho_forward_z2z;
cufftHandle fft_rho_inverse_Z2Z;

//malloc rho on host
rho = (cufftDoubleComplex*) malloc(sizeof(cufftDoubleComplex) *N*N*N );
rho2 = (cufftDoubleComplex*) malloc(sizeof(cufftDoubleComplex) *N*N*N );

//malloc rho device
if (cudaSuccess != cudaMalloc((void**)&d_rho,sizeof(cufftDoubleComplex)*N*N*N))
fprintf(stderr, "allocation of d_rho on device failed\n");
CudaTest("allocation of d_rho on device failed");

if (cudaSuccess != cudaMalloc((void**)&d_rho_complex_out,sizeof(cufftDoubleComplex)*N*N*N))
fprintf(stderr, "malloc of d_rho_complex_out on device failed\n");
CudaTest("malloc d_rho_complex_out on device failed");

//create forward plan
e = cufftPlan3d(&fft_rho_forward_z2z,N,N,N,CUFFT_Z2Z);
fprintf(stderr, "cufftPlan3d creation of fft_rho_forward_z2z failed with %s\n",_cudaGetErrorEnum(e));

//create inverse plan
e = cufftPlan3d(&fft_rho_inverse_Z2Z,N,N,N,CUFFT_Z2Z);
fprintf(stderr, "cufftPlan3d creation of fft_rho_inverse_Z2Z failed with %s\n",_cudaGetErrorEnum(e));

//fill rho on host
//rho and rho2 will hold the same data.
for(int i = 0; i < N*N*N ; i++)
{

//mass is an array of doubles
rho[i].x = mass[i];//using e.g. 1.0 here will produce correct numbers (0.00) in the output.
rho[i].y = 0.0;
rho2[i]= rho[i];

}

//copy rho to gpu;
if (cudaSuccess != cudaMemcpy(d_rho,rho, sizeof(cufftDoubleComplex)*N*N*N, cudaMemcpyHostToDevice))
fprintf(stderr, "copying of rho to d_rho on device failed\n");
CudaTest("copying rho to device failed");

//execute forward fft.
//result will be in rho_complex_out
e = cufftExecZ2Z(fft_rho_forward_z2z,d_rho,d_rho_complex_out,CUFFT_FORWARD);
fprintf(stderr, "cufftExecZ2Z of fft_rho_forward_z2z failed with %s\n",_cudaGetErrorEnum(e));

//destroy the forward plan.
cufftDestroy(fft_rho_forward_z2z);

//execute inverse fft
//result will be in d_rho
e = cufftExecZ2Z(fft_rho_inverse_Z2Z,d_rho_complex_out,d_rho,CUFFT_INVERSE);
fprintf(stderr, "cufftExecZ2Z of fft_rho_inverse_Z2Z failed with %s\n",_cudaGetErrorEnum(e));

//destroy the inverse plan
cufftDestroy(fft_rho_inverse_Z2Z);

//copy reult back to host
if (cudaSuccess != cudaMemcpy(rho,d_rho, sizeof(cufftDoubleComplex) * N*N*N, cudaMemcpyDeviceToHost))
fprintf(stderr, "copying of d_rho on device to rho on host failed\n");
CudaTest("copying of d_rho on device to rho on host failed");

//cleanup memory on decive
cudaFree(d_rho);
cudaFree(d_rho_complex_out);

//print difference betwee the first 32 numbers in source and output
for (int i = 0; i < 32; i++)
{
double ff2 = rho[i].x/(double)(N*N*N);
printf("%e\n",(ff2)-rho2[i].x);
}
...
....
``````

An example of the output…

``````0.000000e+00
7.430939e+16
5.404320e+16
1.125900e+17
-4.728780e+16
9.007199e+15
-2.251800e+16
1.351080e+16
8.106479e+16
1.576260e+16
3.602880e+16
``````

After testing the same procedure with floats (using CUFFTC2C…), the same output appears.
What am i doing wrong?

WIth CUFFT:
IFFT(FFT(A))=len(A)*A
so, you need to normalize.