cuFFT error!

yellownavy · May 24, 2018, 12:34am

I wrote the cufft sample code and tested it.
Test results using cos () seem to work well, but using sin () results in incorrect results.
Below is my code. And attachment is result.

#include <iostream>
#include <fstream>
#include <sstream>
#include <stdio.h>
#include <chrono>
#include "cufft.h"
#include "cuda_runtime.h"
#include <stdlib.h>  
#include <vector>

using namespace std;

/*
 * Create N fake samplings along the function cos(x). These samplings will be
 * stored as single-precision floating-point values.
 */
void generate_fake_samples(int N, float **out)
{
    int i;
    float *result = (float *)malloc(sizeof(float) * N);
    double delta = 2 * M_PI * 5;           // 5 = frequency
    double delta1 = 2 * M_PI * 10;          // 10 = frequency
    printf("delta = %f\n", delta);

    std::ofstream fout;
    fout.open("../signal_data.txt");
    for (i = 0; i < N; i++)
    {
        result[i] = sin(i * delta * 1/N); // + cos(i * delta1 * 1/N) + sin(i * delta1*2 * 1/N);
        fout << result[i];
        fout << "\n";
    }
    fout.close();

    *out = result;
}

/*
 * Convert a real-valued vector r of length Nto a complex-valued vector.
 */
void real_to_complex(float *r, cufftComplex **complx, int N)
{
    (*complx) = (cufftComplex *)malloc(sizeof(cufftComplex) * N);

    for (int i = 0; i < N; i++)
    {
        (*complx)[i].x = r[i];
        (*complx)[i].y = 0;
    }
}

int main(int argc, char *argv[])
{
    int N = 1024;                    // Input Array Size
    int BATCH = 1;
//    int resultSize = (N/2 + 1) * BATCH;       // FFT Result Size at 1-D FFT, Real-to-Complex
    int resultSize = N * BATCH;                 // FFT Result Size at 1-D FFT, Complex-to-Complex

    float * samples;

    cufftHandle plan;
    cufftComplex *inPtr;
    cufftComplex *complexSamples, *devPtr, *results;

    /* source data creation & Host memory allocation */
    generate_fake_samples(N, &samples);
    real_to_complex(samples, &complexSamples, N);
    results = (cufftComplex *) malloc(sizeof(cufftComplex) * resultSize);

    /* GPU memory allocation */
    cudaMalloc((void **)&inPtr, sizeof(cufftComplex) * N * BATCH);
    cudaMalloc((void **)&devPtr, sizeof(cufftComplex) * resultSize);

    /* transfer to GPU memory */
    cudaMemcpy(inPtr, complexSamples, sizeof(cufftComplex) * N * BATCH, cudaMemcpyHostToDevice);

    /* creates 1D FFT plan */
    cufftPlan1d(&plan, N, CUFFT_C2C, BATCH);

    /* executes FFT processes */
    cufftExecC2C(plan, inPtr, devPtr, CUFFT_FORWARD);

    /* executes FFT processes (inverse transformation) */
    // cufftExecC2R(plan, devPtr, inPtr); 

    /* transfer results from GPU memory */
    cudaMemcpy(results, devPtr, sizeof(cufftComplex) * resultSize, cudaMemcpyDeviceToHost);

    /* deletes CUFFT plan */
    cufftDestroy(plan);

    /* frees GPU memory */
    cudaFree(devPtr);

    printf("Fourier Coefficients:\n");

    float maximum = 0.0f;
    for (int i = 0; i < resultSize; i++)
    {
        if (fabs(results[i].x) > maximum)
        {
            maximum = fabs(results[i].x);
        }
    }

    std::ofstream fout;
    fout.open("../fft_data.txt");
    for (int i = 0; i < resultSize; i++)
    {
        printf("  %d: (%2.4f, %2.4f)\n", i + 1, results[i].x / maximum,
               results[i].y / maximum);
        float temp = results[i].x / maximum;
        fout << temp;
        fout << "\n";
    }
    fout.close();

    printf("  ...\n");

    free(samples);
    free(results);

    return 0;
}

Robert_Crovella · May 24, 2018, 2:11am

The error is not in CUFFT but in your function that writes data to the output file, which is presumably the data you are using to generate your graphs.

Theory says that a FFT of a pure sine wave should produce a single spike in the imaginary plane, and FFT of a pure cosine wave should produce a single spike in the real plane. If we study the console printout of your program instead of the data written to the file, we see this borne out:

sine output:

$ ./t1000
delta = 31.415927
Fourier Coefficients:
  1: (0.0000, 0.0000)
  2: (0.2327, -0.1359)
  3: (0.0000, 0.0000)
  4: (0.4405, 0.2573)
  5: (0.0000, 0.0000)
  6: (0.0775, -57668960.0000)  //imaginary spike
  7: (0.0000, 0.0000)
  8: (0.0733, 0.2051)
  9: (0.0000, 0.0000)
  10: (-0.0759, -0.0754)
  11: (0.0000, 0.0000)
...
  1017: (0.0000, 0.0000)
  1018: (0.1378, -0.1404)
  1019: (0.0000, 0.0000)
  1020: (0.5789, 57668956.0000)  // output (complex) symmetry
  1021: (0.0000, 0.0000)
  1022: (0.3185, -0.2549)
  1023: (0.0000, 0.0000)
  1024: (0.2558, 0.1916)

cosine output:

$ ./t1000
delta = 31.415927
Fourier Coefficients:
  1: (-0.0000, 0.0000)
  2: (0.0000, 0.0000)
  3: (0.0000, 0.0000)
  4: (0.0000, -0.0000)
  5: (-0.0000, 0.0000)
  6: (1.0000, 0.0000)   // real spike
  7: (0.0000, 0.0000)
  8: (0.0000, -0.0000)
  9: (-0.0000, 0.0000)
  10: (0.0000, -0.0000)
  11: (0.0000, 0.0000)
...
  1017: (-0.0000, 0.0000)
  1018: (0.0000, 0.0000)
  1019: (0.0000, 0.0000)
  1020: (1.0000, -0.0000)  // output symmetry
  1021: (-0.0000, 0.0000)
  1022: (0.0000, 0.0000)
  1023: (0.0000, 0.0000)
  1024: (0.0000, -0.0000)
  ...

However here is the code that is writing the data to the output file:

fout.open("../fft_data.txt");
    for (int i = 0; i < resultSize; i++)
    {
        printf("  %d: (%2.4f, %2.4f)\n", i + 1, results[i].x / maximum,
               results[i].y / maximum);
        float temp = results[i].x / maximum;
        //           ^^^^^^^^^^^^
        //           ^^only real component!
        fout << temp;
        fout << "\n";
    }

It is only writing the real component! Likewise, your maximum-finding function (for output scaling) is only considering the real component:

float maximum = 0.0f;
    for (int i = 0; i < resultSize; i++)
    {
        if (fabs(results[i].x) > maximum)
        {
            maximum = fabs(results[i].x);
        }
    }

To display a proper scaled magnitude, you would need to do something like take the square root of the sum of the squares of the real and imaginary components, and then divide that by a normalizing coefficient which is computed from the peak magnitude, including both real and imaginary components.

As it is your output file misses the imaginary plane entirely. This is a defect in your code, not CUFFT.

yellownavy · May 24, 2018, 6:40am

txbob,
Thanks for your answer. Correct result after editing code.

float maximum = 0.0f;
    float * amplitude, * phase;
    for (int i = 0; i < resultSize; i++)
    {
        amplitude[i] = (float) sqrt(pow(results[i].x, 2) + pow(results[i].y, 2));
        if ((amplitude[i]) > maximum)
        {
            maximum = amplitude[i];
        }
    }

Topic		Replies	Views
Cufftmp 2d fft R2C GPU-Accelerated Libraries cufft	0	25	March 10, 2025
Wrong results in CUFFT! CUDA Programming and Performance	4	5468	March 22, 2011
[SOLVED] cuFFT data storage, maybe I'm operating on the wrong elements CUDA Programming and Performance	16	2160	June 11, 2022
Hi~All generalists, I want to ask about cufft problems CUDA Programming and Performance	0	6921	April 5, 2011
2D CUFFT wrong result GPU-Accelerated Libraries cufft	8	3109	November 7, 2023
CUFFT Library GPU-Accelerated Libraries	1	552	October 4, 2017
cufftExecZ2Z result not match MATLAB result GPU-Accelerated Libraries cufft	7	613	August 21, 2022
simple CUFFT problem / time to frequency domain CUDA Programming and Performance	6	3361	September 22, 2009
Problem using cuFFT CUDA Programming and Performance	3	3565	May 31, 2011
Wrong results in cufft! GPU-Accelerated Libraries	2	1226	September 30, 2015

cuFFT error!

Related topics