cufftExecC2C incorrect for certain FFT sizes

themanatuf · January 24, 2012, 2:13pm

First off, I apologize that my first post has to be a question. I visit the forums frequently, but I have come across an issue that has me scratching my head.

I have a large CUDA application and at one point it calculates the inverse FFT for a set of data. Comparing this output to FFTW (for example) produces drastically different results, but ONLY for an FFT size of 32k. Unfortunately I cannot post the entire code, but here’s the gist:

#include <stdlib.h>
#include <cuda_runtime.h>
#include <cufft.h>

#define FFT_SIZE 32768

cufftComplex *input = (cufftComplex *)malloc(sizeof(cufftComplex) * FFT_SIZE);
cufftComplex *input_d;
cufftHandle plan;

// One 1D complex-to-complex plan of length FFT_SIZE, batch of 1
cufftPlan1d(&plan, FFT_SIZE, CUFFT_C2C, 1);
cudaMalloc((void **)&input_d, sizeof(cufftComplex) * FFT_SIZE);

// Generate the input data
cudaMemcpy(input_d, input, sizeof(cufftComplex) * FFT_SIZE, cudaMemcpyHostToDevice);

// In-place forward transform
cufftExecC2C(plan, input_d, input_d, CUFFT_FORWARD);
// Verified data looks good here.

// Enough 512-thread blocks to cover FFT_SIZE / 2 elements (rounded up)
int nThreads = 512;
int nBlocks = (FFT_SIZE / 2) / nThreads + (((FFT_SIZE / 2) % nThreads) == 0 ? 0 : 1);
kernelFunc<<< nBlocks, nThreads >>>(FFT_SIZE, input_d);
// Verified data looks good here.

// In-place inverse transform (cuFFT does not normalize: output is scaled by FFT_SIZE)
cufftExecC2C(plan, input_d, input_d, CUFFT_INVERSE);
// Data is completely wrong here.

In my main application, if I change FFT_SIZE to 1k, 2k, 4k, 8k, or 16k, the output after the inverse FFT is perfect. It’s only for an FFT_SIZE of 32k that the data is wrong.

I should also add that I tried this sample code outside of my environment and it worked for all FFT sizes. So something within my main application environment seems to be causing the error. I’ve been looking at this for a few days and am out of ideas.

If anyone has any thoughts I would greatly appreciate it!

mfatica · January 24, 2012, 3:33pm

The error is probably in the nBlocks assignment or in the kernelFunc.
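The grid-size half of that suspicion can be checked on the host. A minimal C sketch, assuming kernelFunc is meant to handle FFT_SIZE / 2 elements with one thread each (the post does not say what it does); blocks_for and idle_threads are hypothetical helper names:

```c
/* Ceiling division: how many blocks of `threads` threads are needed to
   cover `n` elements. Same result as the nBlocks expression in the post. */
static int blocks_for(int n, int threads)
{
    return (n + threads - 1) / threads;
}

/* Threads launched beyond the last element; these must be masked inside
   the kernel (e.g. `if (idx >= n) return;`) or they index past the data. */
static int idle_threads(int n, int threads)
{
    return blocks_for(n, threads) * threads - n;
}
```

For every power-of-two size from 1k to 32k, FFT_SIZE / 2 is a multiple of 512, so the launch is exact (32 blocks and no idle threads at 32k). Under that assumption the arithmetic itself is the same at every size, and attention shifts to what kernelFunc does with its indices.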

themanatuf · January 24, 2012, 5:55pm

Thanks mfatica, I have a feeling you’re right and I was really hoping that wasn’t the issue :)

I quickly commented out the first cufftExecC2C and the kernelFunc, and that didn’t seem to help. I have 4 other kernel functions above that first cufftExecC2C call, so it looks like I’ve got a bit more debugging to do.

Thanks for the direction.

short · January 25, 2012, 4:27am

Is there a reason for using FFT_SIZE/2 in the kernel launch configuration? You could comment out the kernelFunc launch and see whether an FFT followed by an IFFT regenerates input_d (scaling may be needed).

You could also use ArrayFire (which is free) to do FFTs in one line, as shown below:

#include <arrayfire.h>
using namespace af;

array A = randu(3, 4);   // random data generation on GPU
print(A);
array B = fft(A);        // FFT
print(B);

Output:

A =
        0.7402     0.9690     0.6673     0.5132
        0.9210     0.9251     0.1099     0.7762
        0.0390     0.4464     0.4702     0.2948

B =
        1.7002 +    0.0000i    2.3405 +    0.0000i    1.2475 +    0.0000i    1.5841 +    0.0000i
        0.2602 -    0.7638i    0.2832 -    0.4146i    0.3772 +    0.3120i   -0.0223 -    0.4169i
        0.2602 +    0.7638i    0.2832 +    0.4146i    0.3772 -    0.3120i   -0.0223 +    0.4169i
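The printed B can be checked by hand. Assuming fft transforms each column independently with the usual forward kernel $e^{-2\pi i k n / N}$ (which the numbers shown are consistent with), the first column of A, with N = 3, gives:

```latex
\begin{aligned}
X_k &= \sum_{n=0}^{2} x_n\, e^{-2\pi i k n / 3}, \qquad x = (0.7402,\ 0.9210,\ 0.0390) \\
X_0 &= 0.7402 + 0.9210 + 0.0390 = 1.7002 \\
X_1 &= x_0 - \tfrac{1}{2}(x_1 + x_2) - i\,\tfrac{\sqrt{3}}{2}(x_1 - x_2)
     = 0.7402 - 0.4800 - i\,(0.8660)(0.8820) \approx 0.2602 - 0.7638\,i \\
X_2 &= \overline{X_1} \approx 0.2602 + 0.7638\,i \quad \text{(real input)}
\end{aligned}
```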

themanatuf · January 30, 2012, 5:06pm

Thanks short. I’ve stepped away from this for a few days to help clear my head. I’ll give ArrayFire a shot.

Commenting out the kernelFunc did not seem to help. However, commenting out the first cufftExecC2C DID in fact produce the correct results (this contradicts what I said in my second post; I had made some other changes in the meantime). I’m not sure why that is the case. I’ve tried copying the data to different buffers, without any success.

I’m starting to go back into the code above all of this. Unfortunately, each function depends on the output from the previous one! So trying to get meaningful data down to this point has been challenging.
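One way around a chain of dependent stages is to capture a known-good intermediate buffer once, then replay it straight into the suspect stage (here, the first cufftExecC2C) without running everything above it. A minimal host-side sketch, assuming the buffer has already been copied back with cudaMemcpy(..., cudaMemcpyDeviceToHost); dump_buffer, load_buffer and cpx are hypothetical names, and cpx only mirrors cufftComplex's two-float layout:

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in with the same two-float layout as cufftComplex. */
typedef struct { float x, y; } cpx;

/* Write n complex samples to a raw binary file. Returns 0 on success. */
static int dump_buffer(const char *path, const cpx *buf, size_t n)
{
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    size_t written = fwrite(buf, sizeof(cpx), n, f);
    if (fclose(f) != 0) return -1;
    return written == n ? 0 : -1;
}

/* Read exactly n complex samples back. Returns 0 on success, -1 if the
   file is missing or holds fewer than n samples. */
static int load_buffer(const char *path, cpx *buf, size_t n)
{
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    size_t got = fread(buf, sizeof(cpx), n, f);
    fclose(f);
    return got == n ? 0 : -1;
}
```

The loaded buffer would then go back to the device with cudaMemcpyHostToDevice just before the stage under test. Feeding the same file to FFTW gives a like-for-like comparison for that one stage.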

Thanks again for the help.

apostglen46 · February 4, 2012, 4:39pm

Do you need to set the FFTW compatibility mode to CUFFT_COMPATIBILITY_FFTW_ALL (via cufftSetCompatibilityMode), since you are checking correctness against FFTW?
