Some conceptual questions on multidimensional cuFFT

saulocpp · August 8, 2018, 8:45pm

Good evening, all.

I am extending some 1D FFT work to 2D and 3D, but ran into some doubts about the API. The documentation at https://docs.nvidia.com/cuda/cufft/index.html left me unsure how to implement it.
My basic declarations and definitions are (error checking omitted for easier reading):

// Declaring the cuFFT inputs, outputs and plans
cufftHandle fw_plan_2D, inv_plan_2D;
cufftComplex *complex_array;
cufftReal *input_array, *output_array;

// Explicitly saying in the variable names the directions
unsigned int x_rows, y_cols;

// Creating the forward and inverse plans
// Consider the x and y quantities already provided by user
cufftPlan2d(&fw_plan_2D, x_rows, y_cols, CUFFT_R2C);
cufftPlan2d(&inv_plan_2D, x_rows, y_cols, CUFFT_C2R);

// Allocate the 2D float and complex arrays as flattened to 1D
cudaMallocManaged(&input_array, sizeof(cufftReal) * x_rows * y_cols);
cudaMallocManaged(&output_array, sizeof(cufftReal) * x_rows * y_cols);
cudaMallocManaged(&complex_array, sizeof(cufftComplex) * x_rows * y_cols);

// Run the forward and inverse transforms with some work in between
// Consider the input array was already filled by another method
cufftExecR2C(fw_plan_2D, input_array, complex_array);
cudaDeviceSynchronize();

// Do some work with the complex data

cufftExecC2R(inv_plan_2D, complex_array, input_array);
cudaDeviceSynchronize();

// Do some more work with the complex data before deallocating everything

// Free the arrays
cudaFree(input_array);
cudaFree(output_array);
cudaFree(complex_array);

The code declares the arrays and plans, allocates memory, “runs” the transforms (in quotation marks because my questions are about this step) and frees memory. So my questions are:

1 - My data is stored row-major, as “y_cols” (outer dimension) by “x_rows” (inner dimension). When we pass the arguments to cufftPlan2d, does the API take the row-major nature of C/C++ into account? Likewise, when we write the same in Fortran, are the arguments in the same order but treated as column-major, that is, “nx” is the outer dimension and “ny” the inner one?
2 - I must run “y_cols” transforms of length “x_rows”. However, because of question #1, I am not sure that the way I call the transforms (the cufftExecR2C and cufftExecC2R calls) respects the “nx” and “ny” parameters passed at plan creation.
3 - Are the declaration and allocation of the complex array adequate for a 2D or 3D transform?
4 - Given the Hermitian symmetry, what should the stride be, since the complex array is flattened to 1D and the guidance in section 2.6 doesn’t cover this situation?

If you can assist with any of these questions or make any consideration, I will be grateful.

Robert_Crovella · August 8, 2018, 9:21pm

Yes, CUFFT assumes row-major data storage.
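
To make the row-major convention concrete, here is a minimal host-side sketch. It assumes the documented meaning of cufftPlan2d(&plan, nx, ny, type), where nx is the slowest-varying dimension and ny is the contiguous one. The helper name row_major_index is hypothetical and not part of cuFFT.

```cpp
#include <cstddef>

// Hypothetical helper (not a cuFFT function): flat offset of element (ix, iy)
// in an nx-by-ny array stored row-major, so iy is the contiguous index.
// Under this convention ny is the second size argument of cufftPlan2d.
inline std::size_t row_major_index(std::size_t ix, std::size_t iy, std::size_t ny)
{
    return ix * ny + iy;
}
```

With this layout, stepping iy by one moves to the neighbouring element in memory, while stepping ix by one skips a whole row of ny elements.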

For your questions about R2C complex transforms, there are several questions on this forum that discuss this.

here is one:

https://devtalk.nvidia.com/default/topic/826819/2d-cufft-wrong-result/
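
As a rough sketch of the sizes involved in a 2D R2C transform: the real input holds nx * ny elements, while the complex output holds only nx * (ny / 2 + 1) elements, because the contiguous dimension is the one that gets halved. The struct and function names below are hypothetical, for illustration only.

```cpp
#include <cstddef>

// Hypothetical helper type (not part of cuFFT): element counts for a 2D R2C
// transform created with sizes (nx, ny), where ny is the contiguous dimension.
struct R2CLayout {
    std::size_t real_elems;     // nx * ny real input values
    std::size_t complex_cols;   // ny / 2 + 1 complex values stored per row
    std::size_t complex_elems;  // nx * (ny / 2 + 1) complex output values
};

inline R2CLayout r2c_layout_2d(std::size_t nx, std::size_t ny)
{
    const std::size_t complex_cols = ny / 2 + 1;  // integer division
    return R2CLayout{nx * ny, complex_cols, nx * complex_cols};
}
```

Allocating nx * ny complex elements, as in the original post, is therefore more than enough. What matters is that the complex data is indexed with a row length of ny / 2 + 1 rather than ny.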

saulocpp · August 8, 2018, 9:50pm

Thanks, txbob. This topic escaped my searches.
Your code probably answers these questions so I’m going to have a good look at it and adjust my work.

saulocpp · August 9, 2018, 5:31pm

txbob, just a few questions on the code of the referred topic:

The “fors” in lines 22 and 30, despite the indentation, are not inside the “if” in line 20, correct?
The logic in line 5 below (when j >= sym_cols, the index used is cols - j): is that there because of Hermitian symmetry? I thought it would be enough to just limit the iterations in j to (cols / 2 + 1).

for (int i = 0; i < rows; i++)
    {
    for (int j = 0; j < cols; j++)
        if (j>=sym_cols)
            printf("%f ", data[i * sym_cols + (cols - j)].x);
        else
            printf("%f ", data[i * sym_cols + j].x);
    printf("\n");
    }

Robert_Crovella · August 9, 2018, 5:46pm

Correct, that was sloppy of me. The indentation should be less confusing now.

The concept here is to give you a printout that looks the same whether you specified R2C (in which case the actual output data is compressed, taking advantage of Hermitian symmetry) or C2C (in which case the data can be printed directly).

The underlying data is not 100% identical, but the point is to show that the equivalent of the C2C output can be generated from the (smaller) R2C output.

You could limit the iterations in j up to that limit. The printout would look different then. There is no right or wrong answer here, but I was trying to demonstrate that hermitian symmetry should not get in the way of using a R2C transform: you could create “exactly” the same output as you get from C2C, if you wished to.
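
For readers who want the full C2C-equivalent spectrum rather than a matching printout, here is a minimal host-side sketch. It is not the code from the linked topic. It assumes the half spectrum has already been copied into a std::vector of std::complex<float> with rows * (cols / 2 + 1) elements, and it applies the 2D Hermitian relation X[i][j] = conj(X[(rows - i) % rows][(cols - j) % cols]). The function name expand_hermitian is hypothetical.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of cuFFT): rebuild the full rows x cols
// spectrum from the rows x (cols / 2 + 1) half spectrum of a real-input
// transform, using Hermitian symmetry.
std::vector<std::complex<float>> expand_hermitian(
    const std::vector<std::complex<float>>& half,
    std::size_t rows, std::size_t cols)
{
    const std::size_t sym_cols = cols / 2 + 1;
    std::vector<std::complex<float>> full(rows * cols);
    for (std::size_t i = 0; i < rows; ++i) {
        for (std::size_t j = 0; j < cols; ++j) {
            if (j < sym_cols) {
                // Stored directly in the half spectrum.
                full[i * cols + j] = half[i * sym_cols + j];
            } else {
                // Mirror element: negate both frequency indices, then conjugate.
                const std::size_t mi = (rows - i) % rows;
                const std::size_t mj = cols - j;  // 1 <= mj < sym_cols here
                full[i * cols + j] = std::conj(half[mi * sym_cols + mj]);
            }
        }
    }
    return full;
}
```

Unlike a printout that mirrors only the column index within the same row, this version also mirrors the row index and conjugates the value, which is what the symmetry requires for the imaginary parts to come out right.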

saulocpp · August 9, 2018, 6:58pm

Thank you, sir.
Now I have enough information to resume work.
Really appreciate your patience and guidance in these details.
