A question about memory allocation when using cuFFT

I’m using a 9800GT card with 512MB of memory, CUDA 2.2, driver 185.xx, on Windows XP.

I am using cuFFT to make my 2D FFT faster.

It really does make my program much faster, but when I increase the transform size, I run into a problem.

int mem_size = sizeof(float2) * 4096*4096;

re =  cuMemGetInfo (&free,&total);//---------------------------------------- 465MB,512MB results here


re =  cuMemGetInfo (&free,&total);//--------------------------------------- 81MB results here.

cutilSafeCall(cudaMalloc((void**)&d_signal, mem_size));//----------------- failed!

re =  cuMemGetInfo (&free,&total);

Here is another program, with a smaller transform size:

int mem_size = sizeof(float2) * 2048*2048;

re =  cuMemGetInfo (&free,&total);//---------------------------------------- 465MB,512MB results here


re =  cuMemGetInfo (&free,&total);//--------------------------------------- 401MB results here

cutilSafeCall(cudaMalloc((void**)&d_signal, mem_size));

re =  cuMemGetInfo (&free,&total);//----------------------------------------369MB results here

It seems very strange to me that the 4096 fft2dplan used 50% more memory than expected.

465 - 81 = 384MB = 128*3 (it is supposed to be 128*2... why *3?)

465 - 401 = 64MB = 32*2
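
The arithmetic above can be double-checked with a small host-side sketch in plain C. It assumes float2 is two 32-bit floats (8 bytes); the `complex32` struct and both helper names are hypothetical stand-ins for illustration, not part of CUDA or cuFFT.

```c
#include <stddef.h>

/* Hypothetical stand-in for CUDA's float2: assumed to be two 32-bit floats. */
typedef struct { float x, y; } complex32;

/* Size in MB (2^20 bytes) of an n x n complex single-precision array. */
static size_t signal_mb(size_t n) {
    return n * n * sizeof(complex32) / (1024u * 1024u);
}

/* Number of signal-sized buffers that a measured drop in free memory
   corresponds to (integer division, inputs in MB). */
static size_t buffers_in_drop(size_t free_before_mb, size_t free_after_mb, size_t n) {
    return (free_before_mb - free_after_mb) / signal_mb(n);
}
```

With the numbers from this post, signal_mb(4096) is 128 and buffers_in_drop(465, 81, 4096) is 3, while signal_mb(2048) is 32 and buffers_in_drop(465, 401, 2048) is 2.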