cuFFT R2C input alignment in CUDA 7.5

I’d like to FFT data from two interleaved real-valued signals that are to be cross-correlated by the FFT method. The input data look like d_in = [x0 y0 x1 y1 … xn-1 yn-1]. The output should be d_out = [X0Re X0Im Y0Re Y0Im … ] for sequential memory access in later processing.

I tried cufftPlanMany() with input and output strides of 2, an input dist of 2*(2*Lfft), and an output dist of 2*(Lfft+1), then called cufftExecR2C() twice. The first call, starting at d_in, transformed the “x” data and worked. The second call, starting one element later at d_in+1, produces a CUFFT_INVALID_VALUE error.
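To make the intended layout concrete, here is a small host-only sketch of the indexing rule the cuFFT documentation gives for 1D advanced layouts, input[b * idist + x * istride]. The helper names layout_index and gather are made up for illustration and are not cuFFT functions; the sketch only shows which floats each of the two calls is meant to read, not how cuFFT does it internally.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Index rule from the cuFFT advanced data layout section (1D case):
// element x of batch b lives at b * dist + x * stride.
// layout_index is a hypothetical helper, not part of cuFFT.
inline std::size_t layout_index(std::size_t b, std::size_t x,
                                std::size_t dist, std::size_t stride) {
    return b * dist + x * stride;
}

// Collect the n samples that one strided transform (batch 0) would read from
// an interleaved buffer [x0 y0 x1 y1 ...], starting at `start`
// (0 selects the x signal, 1 selects the y signal).
inline std::vector<float> gather(const std::vector<float>& interleaved,
                                 std::size_t start, std::size_t n,
                                 std::size_t stride) {
    std::vector<float> out;
    out.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t idx = start + layout_index(0, i, 0, stride);
        assert(idx < interleaved.size());  // stay inside the buffer
        out.push_back(interleaved[idx]);
    }
    return out;
}
```

With a stride of 2, a start of 0 picks out x0, x1, … and a start of 1 picks out y0, y1, …, which is what the two cufftExecR2C() calls are meant to do.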

The same error happens with the following addition to CUDA 7.5 example “simpleCUFFT.cu”:

Complex *d_signal;
checkCudaErrors(cudaMalloc((void **)&d_signal, mem_size+32));
Complex *d_signal_o;
checkCudaErrors(cudaMalloc((void **)&d_signal_o, mem_size+32));
cufftHandle plan_r2c;
checkCudaErrors(cufftPlan1d(&plan_r2c, new_size, CUFFT_R2C, 1));

// 1st out-of-place FFT, works
checkCudaErrors(cufftExecR2C(plan_r2c, ((cufftReal *)d_signal)+0, ((cufftComplex *)d_signal_o)+0));

// 2nd out-of-place FFT, error 4(CUFFT_INVALID_VALUE)
checkCudaErrors(cufftExecR2C(plan_r2c, ((cufftReal *)d_signal)+1, ((cufftComplex *)d_signal_o)+1));

The CUFFT_INVALID_VALUE error also occurs when a ‘load’ callback function is used to fetch the cufftReal-aligned input data.

There is no mention of requirements for FFT input data alignment in http://docs.nvidia.com/cuda/cufft/index.html#data-layout.

Should something like the above actually work…?

Or is all cuFFT processing natively aligned to ‘float2’?

In the description of cufftExecR2C:

http://docs.nvidia.com/cuda/cufft/index.html#function-cufftexecr2c-cufftexecd2z

it states:

"Pointers to idata and odata are both required to be aligned to cufftComplex data type in single-precision transforms and cufftDoubleComplex data type in double-precision transforms."
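As a rough illustration of that requirement, the host-only sketch below checks whether a pointer is aligned to an 8-byte complex type before it would be handed to cufftExecR2C. Complex2f is an assumed stand-in for cufftComplex (two packed floats, 8-byte aligned) so that the example builds without the CUDA toolkit, and is_complex_aligned and require_complex_aligned are hypothetical helper names, not cuFFT calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed stand-in for cufftComplex: two floats with 8-byte alignment.
struct alignas(8) Complex2f {
    float x;
    float y;
};

// True if p meets the alignment the documentation asks for in
// single-precision transforms.
inline bool is_complex_aligned(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(Complex2f) == 0;
}

// Debug-time guard one could place in front of an R2C call.
inline const float* require_complex_aligned(const float* p) {
    assert(is_complex_aligned(p) && "R2C idata must be complex-aligned");
    return p;
}
```

Applied to the snippet in the question: assuming the base pointer from cudaMalloc is itself at least 8-byte aligned (as far as I know it is aligned much more coarsely than that), ((cufftReal *)d_signal)+1 sits 4 bytes past an 8-byte boundary and fails this check, while ((cufftComplex *)d_signal_o)+1 advances by a full 8 bytes and passes. So it is the offset-by-one input pointer that appears to trigger CUFFT_INVALID_VALUE.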