Complex cuFFT fails for length 59200 (cuFFT bug in CUDA 3.0 or greater)

Starting with CUDA 3.0, cuFFT appears to fail for a certain complex FFT size.

The offending size is 59200 = 64 * 25 * 37.

The failure behavior is that, given any randomly generated complex input, the output FFT does not match that produced by any other FFT software, including Matlab and numpy (they probably use FFTW, I suppose). FFT sizes of 59199, 59201, multiples of 59200, and any other size I've tried so far work fine. There may be other bad sizes that I haven't found, however.

All sizes from 1 to 10000 appear to more or less work, although an FFT of size 6449, which is prime, seems to be fairly noisy, giving normalized errors of around 0.01 relative to Matlab's fft.
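For concreteness, the "normalized error" figure above can be read as the relative L2 difference between two transform outputs. A small numpy sketch of that metric, checked against a naive O(n^2) DFT (this is my own illustration, not the poster's actual test harness):

```python
import numpy as np

def normalized_error(a, b):
    # relative L2 difference between two transform outputs
    return np.linalg.norm(a - b) / np.linalg.norm(b)

def naive_dft(x):
    # O(n^2) reference DFT, same sign/scaling convention as np.fft.fft
    n = len(x)
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n) @ x

rng = np.random.default_rng(0)
x = rng.standard_normal(64) + 1j * rng.standard_normal(64)
err = normalized_error(np.fft.fft(x), naive_dft(x))
# in double precision err should be near machine epsilon; a figure like the
# 0.01 reported above would indicate a badly inaccurate transform
```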

A second call to the FFT code on the same data will start to generate NaNs in the output, and in fact the output of this FFT changes on each call, even when run on the same input data. It happens whether the FFT is run in place or with an output buffer that differs from the input buffer. This behavior is seen on both the Linux and Windows versions of CUDA 3.0 or greater.

I have not seen it happen with CUDA 2.2.

A code snippet showing how I run the FFT follows:

[codebox]

cufftHandle plan;
cufftResult cfr;

/* plan a batch of ncols one-dimensional C2C FFTs, each of length nrows */
cfr = cufftPlan1d(&plan, nrows, CUFFT_C2C, ncols);
assert(cfr == CUFFT_SUCCESS); /* make sure planning succeeded */

/* in-place forward transform */
cfr = cufftExecC2C(plan, (cufftComplex *)val, (cufftComplex *)val, CUFFT_FORWARD);
assert(cfr == CUFFT_SUCCESS);

cufftDestroy(plan);

[/codebox]

val is a device pointer to the complex random data. My colleague intends to provide some raw test data; however, this FFT size fails for every input I've tried, on every machine I've tried running CUDA 3.0 or greater, so equivalent data could easily be generated via numpy, Matlab, or other mathematical software.
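For anyone wanting to reproduce this without waiting for the raw dump, a sketch of generating equivalent test data and a reference result with numpy (the seed and the single-precision choice are my assumptions, the latter chosen to match cufftComplex):

```python
import numpy as np

# generate random complex input of the failing size, plus a reference
# forward FFT to compare cuFFT's output against
rng = np.random.default_rng(0)
n = 59200  # = 64 * 25 * 37, the offending size
x = (rng.standard_normal(n) + 1j * rng.standard_normal(n)).astype(np.complex64)
ref = np.fft.fft(x)  # unnormalized forward transform, same convention as CUFFT_FORWARD
# x.tofile("input.bin")  # would write interleaved float32 re/im pairs, loadable from C
```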

I'm wondering if anyone else has seen this behavior, or if there is a planned fix?

This was a known issue in certain large transform sizes in versions of CUFFT up to and including CUFFT 3.1. For CUFFT 3.2, there should be far fewer of these kinds of problems.

For example, a comparison of CUFFT 3.2 against FFTW shows that the difference is now on the order of 10^-7 for single precision and 10^-11 for double precision for all three of the transform sizes you cited (59199, 59200, 59201).

Thanks,
Cliff

It's only the 59200 case that's giving us real trouble, as far as I can tell. However, the 6449 (large prime) case is somewhat noisy. I presume it's forced to fall back to a plain DFT there, unless you are using Rader's algorithm to handle the large prime cases. At any rate, I look forward to the CUDA 3.2 fix.
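For context on that remark: Rader's algorithm turns a prime-length DFT into a cyclic convolution of length p - 1, which can then be done with composite-size FFTs instead of an O(p^2) DFT. A rough numpy sketch of the idea (an illustration only, not a claim about what cuFFT actually does internally):

```python
import numpy as np

def primitive_root(p):
    # brute-force search for a generator of the multiplicative group mod p
    for g in range(2, p):
        if len({pow(g, k, p) for k in range(p - 1)}) == p - 1:
            return g
    raise ValueError("no primitive root found")

def rader_fft(x):
    # DFT of prime length p, computed via a cyclic convolution of length p - 1
    x = np.asarray(x, dtype=complex)
    p = len(x)
    g = primitive_root(p)
    ginv = pow(g, p - 2, p)  # modular inverse of g
    # permute the nonzero-index inputs by powers of g
    a = np.array([x[pow(g, q, p)] for q in range(p - 1)])
    # twiddle sequence indexed by powers of g^-1
    b = np.exp(-2j * np.pi * np.array([pow(ginv, q, p) for q in range(p - 1)]) / p)
    # cyclic convolution via FFTs of the composite length p - 1
    conv = np.fft.ifft(np.fft.fft(a) * np.fft.fft(b))
    X = np.empty(p, dtype=complex)
    X[0] = x.sum()  # DC term is just the sum of the inputs
    for m in range(p - 1):
        X[pow(ginv, m, p)] = x[0] + conv[m]
    return X
```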

The large prime cases are greatly improved in CUFFT 3.2 as well.

–Cliff