I am using the “cufft_c2c_radix2” function with the following parameters, and it generates an incorrect result.

Let's say I would like to compute the FFT of the following signal:

BLOCK_SIZE = 8;

signalSize = 16; // in cufftComplex elements

theta = 2 * PI / signalSize;

base = 3;

strd.ibStride = BLOCK_SIZE; // Is this correct?

strd.ieStride = 1; // Is this correct?

strd.obStride = BLOCK_SIZE;

strd.oeStride = 1;

dim3 dimBlock(BLOCK_SIZE, 1);

dim3 dimGrid(signalSize / dimBlock.x, 1);

smemSize = sizeof(cufftComplex) * signalSize;

cufft_c2c_radix2<<<dimGrid, dimBlock, smemSize>>>(smemSize, // Signal size in complex elements

theta, // 2 * Pi / N

base, // log base 2 of N

d_inImg, // Pointer to input signal in global memory

d_outImg, // Pointer to output array in global memory

CUFFT_FORWARD, // FFT direction: -1 is forward, 1 is inverse

strd); // Input and output block and elements strides
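As a sanity check on the values above, here is a minimal host-side sketch of how I read the parameters from the comments in the call. The names `LaunchParams`, `isPowerOfTwo`, `log2Int` and `makeParams` are hypothetical helpers of my own, not part of CUFFT, and the 8-byte element size assumes cufftComplex is a pair of floats.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helper struct (not part of CUFFT): the values that the
// parameter comments describe for one transform of N complex elements.
struct LaunchParams {
    int         signalSize;  // N, counted in complex elements
    float       theta;       // 2 * pi / N
    int         base;        // log2(N)
    std::size_t smemBytes;   // N * sizeof(complex element), counted in bytes
};

// A radix-2 transform needs N to be a power of two.
bool isPowerOfTwo(int n) { return n > 0 && (n & (n - 1)) == 0; }

// Integer log2, valid when n is a power of two.
int log2Int(int n) {
    int bits = 0;
    while (n > 1) {
        n >>= 1;
        ++bits;
    }
    return bits;
}

// Derive every value from N so that they cannot disagree with each other.
// complexBytes is assumed to be sizeof(cufftComplex), i.e. 8 for two floats.
LaunchParams makeParams(int n, std::size_t complexBytes) {
    LaunchParams p;
    p.signalSize = n;
    p.theta      = 2.0f * 3.14159265358979f / static_cast<float>(n);
    p.base       = log2Int(n);
    p.smemBytes  = static_cast<std::size_t>(n) * complexBytes;
    return p;
}
```

If that reading of the comments is right, `makeParams(16, 8)` gives base = 4 and 128 bytes of shared memory, while my setup uses base = 3 and passes smemSize (a byte count) where the comment asks for the signal size in complex elements, so those two values are worth double-checking.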

Can someone explain to me why this setup generates an incorrect result?
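To make "incorrect" concrete, the kernel output can be compared against a naive O(N^2) DFT computed on the host. The sketch below is only a reference check, not the CUFFT algorithm; `referenceDft` and `maxAbsError` are hypothetical names, the sign convention (-1 forward, +1 inverse) follows the comment in the call above, and the result is assumed to be unnormalized.

```cpp
#include <algorithm>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

typedef std::complex<float> Cf;

// Naive DFT: X[k] = sum_j x[j] * exp(direction * 2*pi*i * j*k / N).
// direction = -1 for forward, +1 for inverse; no 1/N scaling is applied.
std::vector<Cf> referenceDft(const std::vector<Cf>& in, int direction) {
    const std::size_t n = in.size();
    const double pi = 3.14159265358979323846;
    std::vector<Cf> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        std::complex<double> sum(0.0, 0.0);  // accumulate in double
        for (std::size_t j = 0; j < n; ++j) {
            const double angle = direction * 2.0 * pi *
                                 static_cast<double>(j * k) /
                                 static_cast<double>(n);
            const std::complex<double> w(std::cos(angle), std::sin(angle));
            sum += std::complex<double>(in[j].real(), in[j].imag()) * w;
        }
        out[k] = Cf(static_cast<float>(sum.real()),
                    static_cast<float>(sum.imag()));
    }
    return out;
}

// Largest element-wise magnitude difference over the common length.
float maxAbsError(const std::vector<Cf>& a, const std::vector<Cf>& b) {
    float worst = 0.0f;
    const std::size_t n = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < n; ++i) {
        worst = std::max(worst, std::abs(a[i] - b[i]));
    }
    return worst;
}
```

After copying d_outImg back to the host and converting each element's x and y fields to a `Cf`, a `maxAbsError` far above float rounding error would confirm the mismatch. A unit impulse is a convenient test input, since its forward transform is 1 in every bin.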

In addition, the “cufft_kernels.h” file says that in order to “perform M transforms of size N, set grid.y = N, and thread.x = N / R.”

I thought the grid.y parameter should contain the number of signals (M) in a batch! Am I missing something?

Thank you!