cufftPlan3d device memory usage: large memory usage when creating an FFT plan

I am new to CUDA, so please forgive me if I am missing something basic in my understanding of cufftPlan3d. In the following code, where Nx=512, Ny=512 and Nz=128, over 500 MB of device memory is in use after the two calls to cufftPlan3d (the output follows). Is cufftPlan3d supposed to take this much memory?

I have two data sets of 256 MB each to transform. I run out of memory when I try to cudaMalloc the second data set on my GTX 465, which reports a total global memory of 1073414144 bytes.

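/* free and total are assumed to be declared as size_t, which is what cudaMemGetInfo expects */
printf("\n");
cudaMemGetInfo(&free, &total);
printf("%zu KB free of total %zu KB\n", free/1024, total/1024);

/* create the inverse (Z2D) and forward (D2Z) double-precision plans */
cufftPlan3d(&planr, Nz, Ny, Nx, CUFFT_Z2D);
cufftPlan3d(&planf, Nz, Ny, Nx, CUFFT_D2Z);

printf("\n");
cudaMemGetInfo(&free, &total);
printf("%zu KB free of total %zu KB\n", free/1024, total/1024);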

Output from the code segment:

991736 KB free of total 1048256 KB

463736 KB free of total 1048256 KB

Thanks for your help.

Hello,

In my experience, cuFFT allocates up to 2.5 times as much memory as the matrix to be transformed occupies.