FFT problem on an 8800GT 1G card

Hi,

I have an 8800GT 512M card on which I can run CUFFT for sizes up to 3136 x 3136 on a WinXP PC. With a new 8800GT 1G card, which I just got today, I expected to be able to run CUFFT on at least a 4096 x 4096 matrix. But it failed with the “CUFFT_ALLOC_FAILED” error message. Your assistance is greatly appreciated!

The following is the code:

int
main( int argc, char** argv)
{

struct cudaDeviceProp prop;
int dev = 0;
int s_gpuCount = 0; /* number of CUDA devices found */

CUT_DEVICE_INIT();

CUDA_SAFE_CALL(cudaGetDeviceCount(&s_gpuCount));

if(s_gpuCount == 0)
{
	printf("\nNo Device Found\n");
	exit(0);
}
else
{
	printf("\n%d Device Found\n",s_gpuCount);

	for(dev = 0; dev < s_gpuCount; dev++)
	{
		CUDA_SAFE_CALL(cudaSetDevice(dev));
		cudaGetDevice(&dev);
		printf("\nThe Device ID is %d\n",dev);

		cudaGetDeviceProperties(&prop,dev);
		printf("\nThe Properties of the Device with ID %d are\n",dev);
		printf("\tDevice Name : %s",prop.name);
		printf("\n\tDevice Memory Size (in bytes) : %lu",(unsigned long)prop.totalGlobalMem);
		printf("\n\tConstant Memory Size (in bytes) : %lu",(unsigned long)prop.totalConstMem);
		printf("\n\tDevice Major Revision Numbers : %d",prop.major);
		printf("\n\tDevice Minor Revision Numbers : %d",prop.minor);
		printf("\n\n");
	}
}
printf("\n");

if(s_gpuCount > 1)
{
	CUDA_SAFE_CALL(cudaSetDevice(1));
	CUDA_SAFE_CALL(cudaGetDevice(&dev));
	printf("Doing it on Device %d\n", dev);
}

int nX = 4096;
int nY = 4096;
size_t fftInputSize = (size_t)nX * nY * sizeof(cufftComplex);
cufftComplex *idata;
CUDA_SAFE_CALL(cudaMalloc((void**)&idata, fftInputSize));
cufftHandle plan;
cufftResult rst = cufftPlan2d(&plan, nX, nY, CUFFT_C2C);
cufftExecC2C(plan, idata, idata, CUFFT_FORWARD);

cufftDestroy(plan);
cudaFree(idata);

}

The following is the result:

2 Device Found

The Device ID is 0

The Properties of the Device with ID 0 are
Device Name : GeForce 8800 GT
Device Memory Size (in bytes) : 536543232
Constant Memory Size (in bytes) : 65536
Device Major Revision Numbers : 1
Device Minor Revision Numbers : 1

The Device ID is 1

The Properties of the Device with ID 1 are
Device Name : GeForce 8800 GT
Device Memory Size (in bytes) : 1073479680
Constant Memory Size (in bytes) : 65536
Device Major Revision Numbers : 1
Device Minor Revision Numbers : 1

Doing it on Device 1
cufft: ERROR: C:/cygwin/home/cuda0/cuda/sw/gpgpu_rel1.1/cufft/src/config.cu, line 299
cufft: ERROR: CUFFT_ALLOC_FAILED
cufft: ERROR: C:/cygwin/home/cuda0/cuda/sw/gpgpu_rel1.1/cufft/src/cufft.cu, line 115
cufft: ERROR: CUFFT_INVALID_PLAN
cufft: ERROR: C:/cygwin/home/cuda0/cuda/sw/gpgpu_rel1.1/cufft/src/cufft.cu, line 94
cufft: ERROR: CUFFT_INVALID_PLAN
Press any key to continue . . .

Just a quick guess: it seems that you malloc idata on the card but never actually put anything into it, so you’re calling an FFT on whatever happens to be in those memory locations. You probably need to do a cudaMemcpy from some test data on your host to get a sensible transform.

This may not be an issue with CUFFT itself, but uninitialized input is usually a problem.

The original code has the “cudaMemcpy” function to fill the idata variable. I omitted it to simplify the code. Actually the failure occurs at the “cufftPlan2d” statement, before the call to “cufftExecC2C”.

I read the following post

The Official NVIDIA Forums | NVIDIA

which suggests that the maximum 1D FFT is an 8M-point FFT. Is this an official maximum? Can Nvidia personnel comment on the maximum capacity in the 2D case?

I rebooted the PC this morning and was able to get the 4096 x 4096 case working, but I would still like to know the maximum matrix size that the 2D FFT can handle. Thanks in advance!

The library limit is 16K x 16K.

On a Tesla (with 1.5GB of memory), you can do a C2C transform up to 7000 x 7000.

This is very useful information.