Good evening, all.
I am extending some work I’m doing in FFT 1D to 2D and 3D, but came across some doubts in the API. The documentation at https://docs.nvidia.com/cuda/cufft/index.html left me unsure as to how to implement.
My basic declarations and definitions are (error checking omitted for easier reading):
// Declaring the cuFFT inputs, outputs and plans
cufftHandle fw_plan_2D, inv_plan_2D;
cufftComplex *complex_array;
cufftReal *input_array, *output_array;
// Explicitly saying in the variable names the directions
unsigned int x_rows, y_cols;
// Creating the forward and inverse plans
// Consider the x and y quantities already provided by user
cufftPlan2d(&fw_plan_2D, x_rows, y_cols, CUFFT_R2C);
cufftPlan2d(&inv_plan_2D, x_rows, y_cols, CUFFT_C2R);
// Allocate the 2D float and complex arrays as flattened to 1D
cudaMallocManaged(&input_array, sizeof(cufftReal) * x_rows * y_cols));
cudaMallocManaged(&output_array, sizeof(cufftReal) * x_rows * y_cols));
cudaMallocManaged(&complex_array, sizeof(cufftComplex) * x_rows * y_cols));
// Run the forward and inverse transforms with some work in between
// Consider the input array was already filled by another method
cufftExecR2C(fw_plan_2D, input_array, complex_array);
cudaThreadSynchronize();
// Do some work with the complex data
cufftExecC2R(inv_plan_2D, complex_array, input_array);
cudaThreadSynchronize();
// Do some more work with the complex data before deallocating everything
// Free the arrays
cudaFree(input_array);
cudaFree(output_array);
cudaFree(complex_array);
The code declares the arrays, plans, allocates memory, “runs” (in quotations marks because my questions are around here) the transforms and frees memory. So my questions are:
1 - My data is organized as “y_cols” (outter dimension) by “x_rows” (inner dimension). My data is organized as row-major. When we pass the arguments to cufftPlan2d, is the API considering the row-major nature of C/C++? Likewise, when we write the same in FORTRAN, the order of the arguments is the same but now treated as column-major, that is, “nx” is outter dimension and “ny” is inner?
2 - I must run “y_cols” transforms of length “x_rows”. However, I am not sure the way I call the transforms (lines 21 and 26) takes into consideration the parameters passed to plan creations in regards to “nx” and “ny”, due to question #1.
3 - Are the declaration and allocation of the complex array adequate for a 2D or 3D transform?
4 - Due to Hermitian symmetry, how should the stride be, as the complex array is flattened to 1D and the guides of the section 2.6 doesn’t mention this situation?
If you can assist with any of these questions or make any consideration, I will be grateful.