CUFFT_INTERNAL_ERROR Error while executing cufftPlan2d()


first post from a longtime lurker. :biggrin:

After a couple of very basic tests with CUDA, I stepped up to working with CUFFT (which is my real target).

I’m running Win XP SP2 with CUDA 1.1 final; I use VisualStudio 2005.

Drivers are 169.09.

Card is a 8800 GTS (G92) with 512MB of RAM.

CPU is an Intel Core2 Quad Q6600, 4GB of RAM.

I’m having problems when trying to execute cufftPlan2d().

My code snippet (core function only to keep it short):

int mycufft (int mode, float *in_re, float *in_im, float *out_re, float *out_im, unsigned int size)
{
	cufftComplex	*deviceFC1 = NULL, *deviceFC2 = NULL,
			*hostFC1   = NULL, *hostFC2   = NULL;
	cufftHandle	FPPlan;
	int		i, samples;

	if ( (in_re == NULL)||(in_im == NULL)||(out_re == NULL)||(out_im == NULL) )
		return 1;

	samples = size * size;

	cudaMalloc ((void **) &deviceFC1, samples*sizeof(cufftComplex));
	cudaMalloc ((void **) &deviceFC2, samples*sizeof(cufftComplex));
	if ( (deviceFC1 == NULL)||(deviceFC2 == NULL) )
		return 2;

	CUDA_SAFE_CALL (FPPlan = cufftPlan2d (&FPPlan, size, size, CUFFT_C2C));
	if (FPPlan == NULL)
	{
		cudaFree (deviceFC1);
		cudaFree (deviceFC2);
		return 3;
	}

	/* ... rest of the function omitted for brevity ... */
}



When execution reaches cufftPlan2d(), I get this error:

cufft: ERROR: C:/cygwin/home/cuda0/cuda/sw/gpgpu_rel1.1/cufft/src/, line 41


CUDA initialization (before calling the function) goes well.

DeviceQuery returns:


There is 1 device supporting CUDA

Device 0: "GeForce 8800 GTS 512"

  Major revision number:                         1

  Minor revision number:                         1

  Total amount of global memory:                 536543232 bytes

  Total amount of constant memory:               65536 bytes

  Total amount of shared memory per block:       16384 bytes

  Total number of registers available per block: 8192

  Warp size:                                     32

  Maximum number of threads per block:           512

  Maximum sizes of each dimension of a block:    512 x 512 x 64

  Maximum sizes of each dimension of a grid:     65535 x 65535 x 1

  Maximum memory pitch:                          262144 bytes

  Texture alignment:                             256 bytes

  Clock rate:                                    1620000 kilohertz


Press ENTER to exit...

Any help would be much appreciated!

** Please note I can build and run all of the CUDA SDK examples without problems. **

Thanks for your patience,


Does this problem persist if you upgrade everything to the 2.0-beta2 release?

Sorry, I can’t upgrade the framework in the middle of a project.

But since everything runs fine (even the ConvolutionFFT2D sample in the SDK and the FFTBench posted somewhere here in the forum), I don’t think it’s a platform problem. It looks more like I’m doing something stupid, but I just can’t figure out what.

For instance, even cufftPlan1d() fails when I set the problem size (number of elements) above 1024, which can’t be right!

I mean: this works

CUDA_SAFE_CALL (cufftPlan1d (&Plan, 1024, CUFFT_C2C, 1));

while this throws the CUFFT_INTERNAL_ERROR:

CUDA_SAFE_CALL (cufftPlan1d (&Plan, 4096, CUFFT_C2C, 1));
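For what it’s worth, cufftPlan1d() returns a cufftResult status code, so a minimal standalone test (just my sketch for isolating the problem, with sizes picked from the cases above) would look something like this:

```
#include <stdio.h>
#include <cufft.h>

int main (void)
{
	cufftHandle plan;

	/* Try the size that fails for me: 4096 one-dimensional C2C elements */
	cufftResult res = cufftPlan1d (&plan, 4096, CUFFT_C2C, 1);

	if (res != CUFFT_SUCCESS)
	{
		printf ("cufftPlan1d failed, cufftResult = %d\n", (int) res);
		return 1;
	}

	printf ("plan created OK\n");
	cufftDestroy (plan);
	return 0;
}
```

If this minimal version succeeds, the problem is in my surrounding code rather than in CUFFT itself.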


In the meantime, I’ve caught the bug that was causing the first of my problems:

CUDA_SAFE_CALL (FPPlan = cufftPlan2d (&FPPlan, size, size, CUFFT_C2C));

That’s it.

I was wrongly assigning FPPlan = …, while cufftPlan2d() does not use that syntax (I was porting some code from another FFT library).

So I was overwriting FPPlan, and the subsequent execution failed.

It seems that CUFFT_INTERNAL_ERROR is a catch-all generic error that is thrown any time there’s something wrong in the code.

In this case, I would have expected a more specific error, like “CUFFT executed with invalid PLAN” or something like that… it would have been much more useful.
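For anyone hitting the same thing, here is a sketch of how the plan creation in my snippet should have looked: cufftPlan2d() fills the handle through the pointer argument and returns a status code, so you check the cufftResult instead of assigning the return value to the handle (variable names match my snippet above):

```
	cufftHandle	FPPlan;
	cufftResult	res;

	/* cufftPlan2d writes the plan into FPPlan via the pointer and
	   returns a cufftResult -- do NOT assign its result to the handle */
	res = cufftPlan2d (&FPPlan, size, size, CUFFT_C2C);
	if (res != CUFFT_SUCCESS)
	{
		cudaFree (deviceFC1);
		cudaFree (deviceFC2);
		return 3;
	}

	/* ... use the plan, e.g. with cufftExecC2C() ... */

	cufftDestroy (FPPlan);
```

With this fix the 2D plan is created correctly, and the bogus 1D failures above 1024 elements disappear as well, since they were caused by the clobbered handle.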