cuFFT R2C input alignment in CUDA 7.5

I’d like to FFT data from two interleaved real-valued signals that are to be cross-correlated by the FFT method. The input data look like d_in = [x0 y0 x1 y1 … xn-1 yn-1]. The output should be d_out = [X0Re X0Im Y0Re Y0Im … ] for sequential memory access in later processing.
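The sketch below is a plain C++ CPU reference for this target layout. It computes a naive O(n²) DFT with the forward sign convention exp(-2πi·kt/n) and no scaling, which is what cuFFT's forward transform computes. It is only meant for checking GPU output on small sizes and does not replace cuFFT. The function name reference_interleaved_r2c and the per-signal length n are introduced here for illustration. Signal x is read from the even indices and y from the odd indices. Bin k of each signal is written to complex slot 2*k (X) or 2*k + 1 (Y), with n/2 + 1 bins per signal, as an R2C transform produces.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference (naive O(n^2) DFT, forward sign, unnormalized).
// in:  interleaved real samples [x0 y0 x1 y1 ... x(n-1) y(n-1)], 2*n floats
// out: interleaved spectra [X0Re X0Im Y0Re Y0Im ...], n/2+1 bins per signal,
//      i.e. 2 * 2 * (n/2 + 1) floats in total
std::vector<float> reference_interleaved_r2c(const std::vector<float>& in, std::size_t n) {
    assert(n > 0 && in.size() == 2 * n);
    const std::size_t bins = n / 2 + 1;
    std::vector<float> out(2 * 2 * bins, 0.0f);
    const double two_pi = 2.0 * std::acos(-1.0);
    for (std::size_t sig = 0; sig < 2; ++sig) {          // sig 0 = x, sig 1 = y
        for (std::size_t k = 0; k < bins; ++k) {
            double re = 0.0, im = 0.0;
            for (std::size_t t = 0; t < n; ++t) {
                const double v = in[2 * t + sig];        // stride 2, offset sig
                const double angle = -two_pi * double(k) * double(t) / double(n);
                re += v * std::cos(angle);
                im += v * std::sin(angle);
            }
            // bin k of signal sig goes to complex slot 2*k + sig (two floats each)
            out[2 * (2 * k + sig) + 0] = float(re);
            out[2 * (2 * k + sig) + 1] = float(im);
        }
    }
    return out;
}
```

For example, with n = 4, x = [1 1 1 1] and y = [1 0 0 0], the output starts [4 0 1 0 …]. X0 = 4, Y0 = 1, and every Y bin equals 1.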

I tried cufftPlanMany() with input and output strides of 2, an input distance of 2*(2*Lfft) and an output distance of 2*(Lfft+1), then called cufftExecR2C() twice. The first call, starting at d_in, transformed the “x” data, and this worked. The second call, starting one element later at d_in+1, returns a CUFFT_INVALID_VALUE error.
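For reference, cuFFT's advanced data layout addresses element x of batch b at base[b * dist + x * stride]. The unit is cufftReal for R2C input and cufftComplex for R2C output. The host-only sketch below evaluates that formula and makes no cuFFT call (batch_indices is a hypothetical helper). It shows that a single plan with batch = 2, istride = 2 and idist = 1 would read x from the even elements and y from the odd elements of d_in. With ostride = 2 and odist = 1, it would write X and Y interleaved as [X0 Y0 X1 Y1 …]. In both cases the pointer passed to cufftExecR2C() is not offset. This sketch assumes cuFFT accepts a distance smaller than the stride for such a plan. That assumption is not established here, so check it against the returned cufftResult.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-only illustration of the cuFFT advanced data layout for a rank-1 plan:
// element x of batch b lives at base[b * dist + x * stride]
// (units: cufftReal for R2C input, cufftComplex for R2C output).
std::vector<std::size_t> batch_indices(std::size_t b, std::size_t count,
                                       std::size_t stride, std::size_t dist) {
    assert(stride > 0);
    std::vector<std::size_t> idx;
    idx.reserve(count);
    for (std::size_t x = 0; x < count; ++x) {
        idx.push_back(b * dist + x * stride);   // offset of element x in batch b
    }
    return idx;
}
```

Under that assumption, the plan could be set up along these lines, where N is the transform length (a name introduced here): `int n[1] = {N}, inembed[1] = {N}, onembed[1] = {N/2 + 1}; cufftPlanMany(&plan, 1, n, inembed, 2, 1, onembed, 2, 1, CUFFT_R2C, 2);`, followed by a single `cufftExecR2C(plan, d_in, d_out)`. The embed arrays must be non-NULL. When inembed or onembed is NULL, cuFFT uses the basic layout and ignores the stride and distance arguments.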

The same error occurs with the following addition to the CUDA 7.5 example “”:

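```cpp
// Added to the CUDA 7.5 sample; new_size and mem_size come from that sample,
// and Complex is assumed to be its float2 typedef.
Complex *d_signal;
checkCudaErrors(cudaMalloc((void **)&d_signal, mem_size+32));     // extra 32 bytes presumably leave room for the offset pointers below
Complex *d_signal_o;
checkCudaErrors(cudaMalloc((void **)&d_signal_o, mem_size+32));
cufftHandle plan_r2c;
checkCudaErrors(cufftPlan1d(&plan_r2c, new_size, CUFFT_R2C, 1));  // one 1-D R2C transform of new_size reals

// 1st out-of-place FFT, works
checkCudaErrors(cufftExecR2C(plan_r2c, ((cufftReal *)d_signal)+0, ((cufftComplex *)d_signal_o)+0));

// 2nd out-of-place FFT, error 4 (CUFFT_INVALID_VALUE)
// input is offset by one cufftReal (4 bytes), output by one cufftComplex (8 bytes)
checkCudaErrors(cufftExecR2C(plan_r2c, ((cufftReal *)d_signal)+1, ((cufftComplex *)d_signal_o)+1));
```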

The CUFFT_INVALID_VALUE error also occurs when a ‘load’ callback function is used to fetch the cufftReal-aligned input data.

There is no mention of requirements for FFT input data alignment in

Should something like the above actually work…?

Or is all cuFFT processing natively aligned to ‘float2’?

The description of cufftExecR2C states:

"Pointers to idata and odata are both required to be aligned to cufftComplex data type in single-precision transforms and cufftDoubleComplex data type in double-precision transforms. "
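That requirement appears to account for the difference between the two calls. cudaMalloc() returns memory aligned to at least 256 bytes. ((cufftComplex *)d_signal_o)+1 therefore sits 8 bytes into its allocation and still meets the 8-byte alignment of cufftComplex (a float2). ((cufftReal *)d_signal)+1 sits only 4 bytes in and does not. The small host-side check below makes the arithmetic explicit. is_aligned is a hypothetical helper, not part of cuFFT, and an 8-byte-aligned float array stands in for device memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if address p is a multiple of `alignment` bytes.
// Single-precision cuFFT expects idata/odata to satisfy is_aligned(p, 8),
// since cufftComplex is a float2 (8 bytes, 8-byte aligned).
bool is_aligned(const void* p, std::size_t alignment) {
    assert(alignment != 0);
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

The check only inspects the address, so it could also be applied to a device pointer before calling cufftExecR2C().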