I'm getting strange results when batching C2C FFTs:
A complex array in GPU memory holds 1024 x 360 cufftComplex elements (already initialized).
I run 360 in-place 1D C2C FFTs of length 1024 (cufftExecC2C), process the array (without any shifts), and then run 360 inverse 1D C2C FFTs, also in place.
Everything is fine if the plan is cufftPlan1d(&plan, 1024, CUFFT_C2C, 1).
But if I use batching, e.g. cufftPlan1d(&plan, 1024, CUFFT_C2C, 2) with 180 FFT/IFFT calls, the resulting image has defects, and it gets worse as the batch size increases.
The card is an FX 1700 with CUDA 2.0, and no errors are reported during execution. Is there some undocumented limitation in cuFFT, or should the data for batched transforms be prepared in some other way?