Wrong results produced by Cufft

simon-clavel · September 14, 2018, 3:31pm

Hi everyone,

I work on a 2000x2000 real matrix from a csv file and I want to perform a fft on it, like on matlab, scilab “fft2(Matrix)”, but I’am not able to get the same results…

int Nx, Ny;

cufftDoubleReal* champ = mCSVReader->getRealData(Nx, Ny);

if (!champ && Nx == -1 && Ny == -1)
{
	printf("[ERROR] Invalid input matrix !\n");
        return EXIT_FAILURE;
}

cufftDoubleComplex* champFFT = (cufftDoubleComplex*) malloc(Nx * Ny * sizeof(cufftDoubleComplex));

if (!launchFFT_R2C(champ, champFFT, Nx, Ny))
{
	printf("[ERROR] Unable to perform fft() !\n");
	return EXIT_FAILURE;
}

printComplexMatrix(champFFT, Nx, Ny, 5, true);

I have tried with the functions below :

__host__ bool launchFFT_R2C(const cufftDoubleReal * dataIn, cufftDoubleComplex * dataOut, int col, int row)
{
	cufftResult_t result;
	cufftHandle plan;

	cufftDoubleReal *d_idata;
	cufftDoubleComplex *d_odata;

	cudaMalloc(&d_idata, row * col * sizeof(cufftDoubleReal));
	cudaMalloc(&d_odata, row * col * sizeof(cufftDoubleComplex));

	cudaMemcpy(d_idata, dataIn, row * col * sizeof(cufftDoubleReal), cudaMemcpyHostToDevice);

	result = cufftPlan2d(&plan, row, col, CUFFT_D2Z); ERROR_HANDLER_CUFFT
	result = cufftExecD2Z(plan, d_idata, d_odata); ERROR_HANDLER_CUFFT

	cudaMemcpy(dataOut, d_odata, sizeof(cufftDoubleComplex) * row * col, cudaMemcpyDeviceToHost);

	cudaFree(d_idata);
	cudaFree(d_odata);
	cufftDestroy(plan);

	return true;
}

__host__ bool launchFFT_C2C(const cufftDoubleComplex * dataIn, cufftDoubleComplex * dataOut, int NX, int NY, int direction)
{
	cudaError_t err;
	cufftDoubleComplex *gpu_initial_array;
	cufftDoubleComplex *gpu_transformed_array;

	err = cudaMalloc(&gpu_initial_array, NX * NY * sizeof(cufftDoubleComplex)); ERROR_HANDLER_CUDA
	err = cudaMalloc(&gpu_transformed_array, NX * NY * sizeof(cufftDoubleComplex)); ERROR_HANDLER_CUDA
	err = cudaMemcpy(gpu_initial_array, dataIn, NX * NY * sizeof(cufftDoubleComplex), cudaMemcpyHostToDevice); ERROR_HANDLER_CUDA

	err = cudaMemcpy2D(gpu_initial_array, sizeof(cufftDoubleComplex), dataIn, sizeof(double), sizeof(double), NX * NY, cudaMemcpyHostToDevice);

	cufftHandle plan;

	cufftResult_t result;
	result = cufftPlan2d(&plan, NX, NY, CUFFT_Z2Z); ERROR_HANDLER_CUFFT
	result = cufftExecZ2Z(plan, gpu_initial_array, gpu_transformed_array, direction); ERROR_HANDLER_CUFFT

	err = cudaMemcpy(dataOut, gpu_transformed_array, NX * NY * sizeof(cufftDoubleComplex), cudaMemcpyDeviceToHost); ERROR_HANDLER_CUDA

	cudaFree(gpu_initial_array);
	cudaFree(gpu_transformed_array);
	cufftDestroy(plan);

	return true;
}

And I use this one to print out my results :

void Application::printComplexMatrix(const cufftDoubleComplex* data, int cols, int rows, bool cufft_symmetry)
{
	int sym_cols = cols;

	if (cufft_symmetry) sym_cols = cols / 2 + 1;

	printf("Real Part: \n");
	for (int i = 0; i < rows; i++) 
	{
		for (int j = 0; j < cols; j++)
			if (j >= sym_cols)
				printf("%f ", data[i * sym_cols + (cols - j)].x);
			else
				printf("%f ", data[i * sym_cols + j].x);
		printf("\n");
	}

	printf("Imag Part: \n");
	for (int i = 0; i < rows; i++) 
	{
		for (int j = 0; j < cols; j++)
			if (j >= sym_cols)
				printf("%f ", -data[i * sym_cols + (cols - j)].y); // complex (hermitian) symmetry
			else
				printf("%f ", data[i * sym_cols + j].y);
		printf("\n");
	}
}

Can someone help me with that ?

My wrong results on the five first elements with launchFFT_R2C() function :

Real Part:
7832466.053380 336112.003120 -241287.237986 156417.233556 -64705.152967
-1297.465045 36688.280115 -39401.822953 22499.600954 114.471259
-15039.533396 17345.256324 -9181.924529 -2246.558350 9686.908870
-10097.060700 4820.556155 2492.306439 -6415.139433 5468.776117
-1167.533065 -3182.003654 5014.081975 -2978.421501 -416.719322
Imag Part:
0.000000 1583.099541 -2273.031640 2210.305203 -1219.242600
-30.616769 1037.027652 -1299.636249 848.103546 4.755247
-709.002702 899.346736 -519.653047 -137.874700 639.530842
-714.718918 363.790663 199.798430 -545.387079 490.502953
-110.532166 -315.999213 521.184674 -324.179986 -47.576409
Appuyez sur une touche pour continuer...

The good results on the five first elements (from matlab) :

7.8284e+06+0i | 3.3606e+05+1055.8i | -2.4121e+05-1515.6i | 1.5629e+05+1473i | -64547-811.17i

3.3606e+05+1055.8i | -2.9203e+05-1834.9i | 2.2444e+05+2115.3i | -1.3986e+05-1757.6i | 55427+870.72i

-2.4121e+05-1515.6i | 2.2444e+05+2115.3i | -1.7112e+05-2150.4i | 98701+1550.5i | -30268-570.61i

1.5629e+05+1473i | -1.3986e+05-1757.6i | 98701+1550.5i | -46815-882.54i | -2167.2-47.667i

-64547-811.17i | 55427+870.72i | -30268-570.61i | -2167.2-47.667i | 28277+710.83i

simon-clavel · September 17, 2018, 9:26am

Anyone ?

Robert_Crovella · September 17, 2018, 1:58pm

This example may be of interest:

[url]https://devtalk.nvidia.com/default/topic/826819/2d-cufft-wrong-result/[/url]

If you don’t provide a complete code that makes it easy for others to test your code, in my experience you’re less likely to get help.

simon-clavel · September 17, 2018, 2:34pm

Thank you for your help but I have already tried this. I will try again maybe I missed something.

Robert_Crovella · September 18, 2018, 1:46am

In the real-to-complex case, when you are printing the matrix with your function, the FFT data has symmetry so you should be passing true here instead of false:

printComplexMatrix(champFFT, Nx, Ny, 5, false);
                                        ^^^^^

Other than that, when I build a small test case around your code, it seems to produce the same results as in the thread I linked:

$ cat t296.cu
#include <cufft.h>
#include <stdio.h>
#include <assert.h>

__host__ bool launchFFT_R2C(const cufftDoubleReal * dataIn, cufftDoubleComplex * dataOut, int col, int row)
{
        cufftResult_t result;
        cufftHandle plan;

        cufftDoubleReal *d_idata;
        cufftDoubleComplex *d_odata;

        cudaMalloc(&d_idata, row * col * sizeof(cufftDoubleReal));
        cudaMalloc(&d_odata, row * col * sizeof(cufftDoubleComplex));

        cudaMemcpy(d_idata, dataIn, row * col * sizeof(cufftDoubleReal), cudaMemcpyHostToDevice);

        result = cufftPlan2d(&plan, row, col, CUFFT_D2Z);
        assert(result == CUFFT_SUCCESS);
        result = cufftExecD2Z(plan, d_idata, d_odata);
        assert(result == CUFFT_SUCCESS);

        cudaMemcpy(dataOut, d_odata, sizeof(cufftDoubleComplex) * row * col, cudaMemcpyDeviceToHost);

        cudaFree(d_idata);
        cudaFree(d_odata);
        cufftDestroy(plan);

        return true;
}

void printComplexMatrix(const cufftDoubleComplex* data, int cols, int rows, int limit, bool cufft_symmetry)
{
        if (limit > 0)
        {
                if (cols != rows && rows == 1)
                {
                        cols = limit;
                }
                else
                {
                        cols = limit;
                        rows = limit;
                }
        }

        int sym_cols = cols;
        if (cufft_symmetry) sym_cols = cols / 2 + 1;

        printf("Real Part: \n");

        for (int i = 0; i < rows; i++)
        {
                for (int j = 0; j < cols; j++)
                        if (j >= sym_cols)
                                printf("%f ", data[i * sym_cols + (cols - j)].x);
                        else
                                printf("%f ", data[i * sym_cols + j].x);
                printf("\n");
        }

        printf("Imag Part: \n");
        for (int i = 0; i < rows; i++)
        {
                for (int j = 0; j < cols; j++)
                        if (j >= sym_cols)
                                printf("%f ", -data[i * sym_cols + (cols - j)].y); // complex (hermitian) symmetry
                        else
                                printf("%f ", data[i * sym_cols + j].y);
                printf("\n");
        }
}
void Print2DComplex(int rows, int cols, cufftDoubleComplex *data, bool cufft_symmetry = false){

        int sym_cols = cols;
        if (cufft_symmetry) sym_cols = cols/2 + 1;
        printf("Real Part: \n");
        for (int i = 0; i < rows; i++){
          for (int j = 0; j < cols; j++)
            if (j>=sym_cols)
              printf("%f ", data[i*sym_cols+(cols-j)].x);
            else
              printf("%f ", data[i*sym_cols+j].x);
          printf("\n");}
        printf("Imag Part: \n");
        for (int i = 0; i < rows; i++){
          for (int j = 0; j < cols; j++)
            if (j>=sym_cols)
              printf("%f ", -data[i*sym_cols+(cols-j)].y); // complex (hermitian) symmetry
            else
              printf("%f ", data[i*sym_cols+j].y);
          printf("\n");}
}



int main(){
        const int row = 4;
        const int col = 4;
        int Nx = col;
        int Ny = row;
        cufftDoubleReal A[row][col] =
        {{ 1, 2, 3, 4},
         { 5, 6, 7, 8},
         { 9,10,11,12},
         {13,14,15,16}};
        cufftDoubleReal *d_idata;
        cufftDoubleComplex *d_odata;
        cudaMalloc((void**)&d_idata, sizeof(cufftDoubleComplex)*row*col);
        cudaMalloc((void**)&d_odata, sizeof(cufftDoubleComplex)*row*col);
        cudaMemset(d_idata, 0, sizeof(cufftDoubleComplex)*row*col);
        cudaMemset(d_odata, 0, sizeof(cufftDoubleComplex)*row*col);

cufftDoubleReal* champ = &(A[0][0]);

cufftDoubleComplex* champFFT = (cufftDoubleComplex*) malloc(Nx * Ny * sizeof(cufftDoubleComplex));

if (!launchFFT_R2C(champ, champFFT, Nx, Ny))
{
        printf("[ERROR] Unable to perform fft() !\n");
        return EXIT_FAILURE;
}

printComplexMatrix(champFFT, Nx, Ny, 4, true);
Print2DComplex(row,col,champFFT, true);
}
$ nvcc -o t296 t296.cu -lcufft
$ cuda-memcheck ./t296
========= CUDA-MEMCHECK
Real Part:
136.000000 -8.000000 -8.000000 -8.000000
-32.000000 -0.000000 0.000000 -0.000000
-32.000000 -0.000000 0.000000 -0.000000
-32.000000 -0.000000 -0.000000 -0.000000
Imag Part:
-0.000000 8.000000 0.000000 -8.000000
32.000000 0.000000 0.000000 -0.000000
-0.000000 0.000000 0.000000 -0.000000
-32.000000 -0.000000 0.000000 0.000000
Real Part:
136.000000 -8.000000 -8.000000 -8.000000
-32.000000 -0.000000 0.000000 -0.000000
-32.000000 -0.000000 0.000000 -0.000000
-32.000000 -0.000000 -0.000000 -0.000000
Imag Part:
-0.000000 8.000000 0.000000 -8.000000
32.000000 0.000000 0.000000 -0.000000
-0.000000 0.000000 0.000000 -0.000000
-32.000000 -0.000000 0.000000 0.000000
========= ERROR SUMMARY: 0 errors
$

If you still need help, I suggest providing a complete test case. I have no idea how to interpret a 5x5 section of your output matrix for a 2000x2000 transform.

simon-clavel · September 18, 2018, 7:44am

Thank you for your help I get the good results now !

You are right I just missed to put cufft_symmetry at true :/

simon-clavel · September 28, 2018, 2:22pm

I tried in the same way on a litle matrix 4x4 :

1.5 2.0 9.9 5.0
3.0 1.0 9.0 7.3
6.9 1.1 4.0 3.0
4.0 2.9 3.0 1.0

And I got the good results below with fft2() on Scilab :

64.6 | -10.5 + 9.3i | 18. | -10.5 - 9.3i
3.4 - 9.4i | -3.1 + 8.1i | -2.4 - 0.6i | -19.5 + 5.9i
2.2 | -0.5 + 0.5i | 4.4 | -0.5 - 0.5i
3.4 + 9.4i | -19.5 - 5.9i | -2.4 + 0.6i | -3.1 - 8.1i

But with my cufft function the results are different on the last columns :

Real Part:
64.600000 -10.500000 18.000000 -10.500000
3.400000 -3.100000 -2.400000 -3.100000
2.200000 -0.500000 4.400000 -0.500000
3.400000 -19.500000 -2.400000 -19.500000
Imag Part:
-0.000000 9.300000 0.000000 -9.300000
-9.400000 8.100000 -0.600000 -8.100000
-0.000000 0.500000 0.000000 -0.500000
9.400000 -5.900000 0.600000 5.900000

It seems that the results are just not ordered correctly but I don’t know why because the exemple with your matrix works very well:

{{ 1, 2, 3, 4},
{ 5, 6, 7, 8},
{ 9,10,11,12},
{13,14,15,16}};

I used launchFFT_R2C() and printComplexMatrix() with boolean to true like above.

My function launchFFT_C2C() doesn’t work too and the output is completly wrong.

I use CUDA 9.1

simon-clavel · October 1, 2018, 12:35pm

Anyone ?

simon-clavel · October 4, 2018, 11:44am

The solution : https://devtalk.nvidia.com/default/topic/826819/2d-cufft-wrong-result/ it’s the same issue

Topic		Replies	Views
2D CUFFT wrong result GPU-Accelerated Libraries cufft	8	3110	November 7, 2023
Wrong results in cufft! GPU-Accelerated Libraries	2	1226	September 30, 2015
CUDA FFT different from Matlab FFT CUDA Programming and Performance	32	9370	March 29, 2011
CUFFT run wrong CUDA Programming and Performance	16	2820	May 23, 2013
2D FFT- Real to Complex and Complex to Real CUDA Programming and Performance	0	2177	February 3, 2009
[SOLVED] cuFFT data storage, maybe I'm operating on the wrong elements CUDA Programming and Performance	16	2162	June 11, 2022
result of cufft is different with fftw CUDA Programming and Performance	0	513	June 1, 2014
cuFFT and fftw CUDA Programming and Performance	10	4191	August 25, 2010
cufftExecR2C only gives half the answer..?! CUDA Programming and Performance	2	4228	July 24, 2009
Wrong results in CUFFT! CUDA Programming and Performance	4	5471	March 22, 2011

Wrong results produced by Cufft

Related topics