CUFFT and 2D array of complex numbers


I am a complete beginner in CUDA (I’d never heard of it until a few weeks ago). I was given a project that requires using the CUFFT library to perform transforms in one and two dimensions. To test whether I had implemented CUFFT properly, I used a 1D array of 1’s, which should return 0’s after being transformed (in every bin except the first, which holds the sum of the inputs). The data transformed with the cufftPlan1d plan is a 1D array of complex numbers, as shown in the following code:

#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <cufft.h>
#include <cutil_inline.h>   // CUDA SDK utility header providing the cut*/cutil* helpers

typedef cufftComplex Complex;

void runTest(int argc, char** argv);

#define SIGNAL_SIZE 4096
#define REPEAT 5000

int main(int argc, char** argv)
{
    runTest(argc, argv);

    cutilExit(argc, argv);
}

void runTest(int argc, char** argv)
{
    // Use the device named on the command line, otherwise the fastest one
    if( cutCheckCmdLineFlag(argc, (const char**)argv, "device") )
        cutilDeviceInit(argc, argv);
    else
        cudaSetDevice( cutGetMaxGflopsDeviceId() );

    // Allocate host memory for the signal
    cufftComplex* h_signal = (cufftComplex*)malloc(SIGNAL_SIZE * REPEAT * sizeof(cufftComplex));

    // Initialize the memory for the signal (every batch, not only the first)
    for (unsigned int i = 0; i < SIGNAL_SIZE * REPEAT; i++) {
        h_signal[i].x = 1.0f; //real
        h_signal[i].y = 0.0f; //imag
    }

    // Display the first batch of the signal
    for (unsigned int i = 0; i < SIGNAL_SIZE; i++) {
        printf("%g %g\n", h_signal[i].x, h_signal[i].y);
    }

    printf("End of signal\n");

    // Allocate device memory for signal
    Complex* d_signal;
    cudaMalloc((void**)&d_signal, SIGNAL_SIZE * REPEAT * sizeof(Complex));

    // Copy host memory to device
    cudaMemcpy(d_signal, h_signal, SIGNAL_SIZE * REPEAT * sizeof(Complex),
            cudaMemcpyHostToDevice);

    // Create a 1D FFT plan (REPEAT batched transforms of SIGNAL_SIZE points each)
    cufftHandle plan;
    cufftResult plan_status = cufftPlan1d(&plan, SIGNAL_SIZE, CUFFT_C2C, REPEAT);

    // Use the CUFFT plan to transform the signal in place
    cufftResult exec_status = cufftExecC2C(plan, (cufftComplex *)d_signal,
            (cufftComplex *)d_signal, CUFFT_FORWARD);

    // Check if the CUFFT plan was created successfully
    // (comparing the CUFFT_SETUP_FAILED constant with 0 is always true)
    if (plan_status == CUFFT_SUCCESS)
        printf("CUFFT Library initialized\n");
    // Check if CUFFT executed the transform on the GPU
    if (exec_status == CUFFT_SUCCESS)
        printf( "FFT successfully executed on the GPU\n" );
    // Copy result from device to host
    cufftComplex* h_transformed_signal = h_signal;
    cutilSafeCall(cudaMemcpy(h_transformed_signal, d_signal,
            SIGNAL_SIZE * REPEAT * sizeof(Complex), cudaMemcpyDeviceToHost));

    // Display results (first batch only)
    for (unsigned int i = 0; i < SIGNAL_SIZE; i++) {
        printf("%g %g\n", h_transformed_signal[i].x, h_transformed_signal[i].y);
    }

    printf("End of result\n");

    // Destroy the CUFFT plan
    cufftDestroy(plan);

    // Free host and device memories
    free(h_signal);
    cudaFree(d_signal);
}
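
As a sanity check on what the all-ones test should print, here is a small host-only sketch of a naive 4-point forward DFT. It is not CUFFT code: the names complex_f, cmul and dft4 are made up for illustration, and the size is fixed at 4 so the twiddle factors are exact. The forward transform is unnormalized (as CUFFT's is), so a constant input of 1's puts the sum of the inputs in bin 0 and zeros in every other bin.

```c
#include <stddef.h>

/* Stand-in for cufftComplex (same .x/.y float layout); hypothetical name. */
typedef struct { float x; float y; } complex_f;

/* Multiply two complex numbers. */
static complex_f cmul(complex_f a, complex_f b) {
    complex_f r = { a.x * b.x - a.y * b.y, a.x * b.y + a.y * b.x };
    return r;
}

/* Naive, unnormalized 4-point forward DFT:
   out[k] = sum over n of in[n] * w^(k*n), with w = exp(-2*pi*i/4) = -i. */
static void dft4(const complex_f in[4], complex_f out[4]) {
    const complex_f w = { 0.0f, -1.0f };
    for (int k = 0; k < 4; k++) {
        /* step = w^k, built by repeated multiplication */
        complex_f step = { 1.0f, 0.0f };
        for (int p = 0; p < k; p++) {
            step = cmul(step, w);
        }

        complex_f acc = { 0.0f, 0.0f };
        complex_f tw = { 1.0f, 0.0f };  /* w^(k*n), starting at n = 0 */
        for (int n = 0; n < 4; n++) {
            complex_f term = cmul(in[n], tw);
            acc.x += term.x;
            acc.y += term.y;
            tw = cmul(tw, step);        /* advance to w^(k*(n+1)) */
        }
        out[k] = acc;
    }
}
```

Calling dft4 on four elements equal to 1 + 0i gives 4 + 0i in out[0] and 0 + 0i in out[1] through out[3]. By the same reasoning, the 4096-point test above should show 4096 in the first bin of each batch and 0 everywhere else.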



I’ve been struggling to figure out how to initialize a 2D array of complex numbers and pass it to a 2D C2C CUFFT plan. I’ve read everything on the forums that I could, but it’s still not clear to me. I know most people say it is better to flatten multidimensional arrays, but even getting to that point is proving very frustrating. I’ve tried the following with no success:

// Allocate memory for host signal
cufftComplex *h_idata = (cufftComplex *)malloc(size);

for (unsigned int col = 0; col < NX; col++) {
    for (unsigned int row = 0; row < NY; row++) {
        h_idata[row][col].x = 1.0f; //real
        h_idata[row][col].y = 0.0f; //imag
    }
}

But, I do believe that CUDA flattens multidimensional arrays(?).

I sincerely appreciate any help.


In CUDA, CUFFT takes complex numbers as input in the form of

cufftComplex *a_h = (cufftComplex *)malloc(N * sizeof(cufftComplex));
for (int i = 0; i < N; i++) {
    a_h[i].x = 1.0f; // real part (your choice)
    a_h[i].y = 0.0f; // imaginary part (your choice)
}

Then it can be easily transferred to the GPU with cudaMalloc and cudaMemcpy.

cufftComplex *h_idata = (cufftComplex *)malloc(size);

// Treat the 2D data as one flat array; assumes size == NX * NY * sizeof(cufftComplex)
for (int idx = 0; idx < NX * NY; idx++) {
    h_idata[idx].x = 1.0f; //real
    h_idata[idx].y = 0.0f; //imag
}

Hopefully this will work.

I am using something like this:

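int count = 0;

for (int i = 0; i < nx; i++) {
    for (int j = 0; j < ny; j++) {
        // assumed loop body: store element (i, j) at the running flat index
        h_idata[count].x = 1.0f; // real
        h_idata[count].y = 0.0f; // imag
        count++;
    }
}

You transfer the data as a 1D array of size nx*ny.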
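
To make the flattening explicit, here is a minimal host-only sketch of the row-major layout that the running counter above produces. The names complex_f, flat_index and make_constant_2d are hypothetical helpers, not part of CUFFT, and complex_f only mimics the .x/.y layout of cufftComplex. The assumption is that the ny dimension is the contiguous one, so element (i, j) lives at offset i * ny + j.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for cufftComplex (same .x/.y float layout); hypothetical name. */
typedef struct { float x; float y; } complex_f;

/* Row-major offset of element (i, j) in an nx-by-ny array, where j (the ny
   dimension) is the contiguous, fastest-changing index. */
static size_t flat_index(size_t i, size_t j, size_t ny) {
    return i * ny + j;
}

/* Allocate a flat nx*ny buffer and set every element to value + 0i.
   Returns NULL if allocation fails; the caller frees the result. */
static complex_f *make_constant_2d(size_t nx, size_t ny, float value) {
    complex_f *data = malloc(nx * ny * sizeof *data);
    if (data == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < nx; i++) {
        for (size_t j = 0; j < ny; j++) {
            size_t idx = flat_index(i, j, ny);
            data[idx].x = value;  /* real */
            data[idx].y = 0.0f;   /* imag */
        }
    }
    return data;
}
```

With a buffer laid out this way, a single cudaMemcpy of nx * ny * sizeof(cufftComplex) bytes moves the whole array to the device. As far as I understand the CUFFT documentation, cufftPlan2d(&plan, nx, ny, CUFFT_C2C) expects this same ordering, with ny as the fastest-changing dimension, so no two-subscript indexing is needed on either side.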