Starting with CUDA 3.0, cuFFT appears to produce wrong results for one particular complex FFT size.
The offending size is 59200 = 64 * 25 * 37.
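For anyone hunting for other suspect sizes, it may help to print the prime factorization of each candidate length and look for a pattern. The helper below is only a sketch in plain host-side C; the name `factorize` and its signature are made up for illustration and are not part of cuFFT.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a cuFFT call): writes the prime factors of n,
   with multiplicity and in ascending order, into out[] and returns how
   many were written. At most max_out factors are stored. */
static int factorize(unsigned int n, unsigned int *out, int max_out)
{
    int count = 0;
    unsigned int p;
    assert(out != NULL);
    /* p <= n / p is the overflow-safe form of p * p <= n */
    for (p = 2; p <= n / p; ++p) {
        while (n % p == 0 && count < max_out) {
            out[count++] = p;
            n /= p;
        }
    }
    /* whatever is left above 1 is a single prime factor */
    if (n > 1 && count < max_out)
        out[count++] = n;
    return count;
}
```

Called with 59200 it yields six 2s, two 5s and one 37, matching 64 * 25 * 37; a 32-entry output array is enough for any 32-bit input.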
For any randomly generated complex input, the cuFFT output does not match the output of any other FFT software I have compared against, including Matlab and numpy (which I assume use FFTW internally). Sizes of 59199, 59201, multiples of 59200, and every other size I’ve tried so far work fine, although there may be other bad sizes that I haven’t found.
All sizes from 1 to 10000 appear to work, more or less, although an FFT of size 6449 (a prime) is fairly noisy, with normalized errors of around 0.01 relative to Matlab’s fft.
A second call to the FFT on the same data starts to produce NaNs in the output, and in fact the output changes on every call even when the input data is identical. This happens whether the FFT is run in place or with an output buffer that differs from the input buffer. I see this behavior on both the Linux and Windows versions of CUDA 3.0 or greater.
I have not seen it with CUDA 2.2.
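To make the run-to-run instability easy to report, the results of two runs can be copied back to the host and checked with something like the sketch below. Both helpers are hypothetical names and assume each result is stored as 2*n interleaved floats (the cufftComplex layout).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts NaN components in a buffer of n_complex complex values stored
   as 2 * n_complex interleaved floats. */
static size_t count_nans(const float *buf, size_t n_complex)
{
    size_t i, count = 0;
    assert(buf != NULL);
    for (i = 0; i < 2 * n_complex; ++i) {
        if (buf[i] != buf[i])   /* NaN is the only value not equal to itself */
            ++count;
    }
    return count;
}

/* Returns 1 if two runs produced bit-identical output, 0 otherwise. */
static int runs_identical(const float *a, const float *b, size_t n_complex)
{
    assert(a != NULL && b != NULL);
    return memcmp(a, b, 2 * n_complex * sizeof(float)) == 0;
}
```

For a deterministic transform on identical input, `runs_identical` would be expected to return 1 and `count_nans` to return 0.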
A code snippet of how I run the FFT follows:
[codebox]
cufftHandle plan;
cufftResult cfr;
/* nrows is the FFT length (59200 in the failing case), ncols is the batch count */
cfr = cufftPlan1d(&plan, nrows, CUFFT_C2C, ncols);
/* make sure plan creation succeeded */
assert(cfr == CUFFT_SUCCESS);
/* in-place forward transform on the device buffer */
cfr = cufftExecC2C(plan, (cufftComplex *)val, (cufftComplex *)val, CUFFT_FORWARD);
assert(cfr == CUFFT_SUCCESS);
cufftDestroy(plan);
[/codebox]
val is a device pointer to the complex random data. My colleague intends to provide some raw test data. However, this FFT size fails for every input I’ve tried, on every machine I’ve tried running CUDA 3.0 or greater, so suitable test data could easily be generated with numpy, Matlab, or other mathematical software.
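If it is useful for everyone to test against exactly the same input, the data can also be generated on the host from a fixed seed. This is just a sketch: `fill_random_complex` is a hypothetical helper, and it uses a simple linear congruential generator rather than rand() so that the sequence is the same on every platform. The statistical quality is poor, but since the failure shows up for any input that should not matter here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fills n_complex complex values (2 * n_complex interleaved floats)
   with reproducible pseudo-random values in [-1, 1). */
static void fill_random_complex(float *buf, size_t n_complex, uint32_t seed)
{
    uint32_t state = seed;
    size_t i;
    assert(buf != NULL);
    for (i = 0; i < 2 * n_complex; ++i) {
        /* 32-bit LCG step; arithmetic wraps modulo 2^32 */
        state = state * 1664525u + 1013904223u;
        /* top 24 bits scaled by 2^-23 give [0, 2), then shift to [-1, 1) */
        buf[i] = (float)(state >> 8) / 8388608.0f - 1.0f;
    }
}
```

The filled host buffer would then be copied to the device with cudaMemcpy before running the transform.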
I’m wondering if anyone else has seen this behavior or if there is a planned fix?