Used memory on device: how much memory does the device allocate after calling cudaMalloc?

Hi,
I have a problem with memory allocation.
My program allocates memory on the device several times. After each allocation I check how much memory is in use by calling “cuMemGetInfo”.
The used memory I calculate is more than what my program allocated.
For the first allocation, I think that is because of the memory for the context and …
But for the second and third allocations, the memory used is also more than I asked for. So I don’t understand how much memory an allocation will actually take, and I want to know.

For example, if I allocate an array of 1000 floats, I want to know how much memory is used after the allocation.
How can I work that out?
Or, in other words, how does the CUDA allocation API allocate memory on the device such that it takes more than I requested?

To make this clearer, please see this sample code:

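#include <stdio.h>
#include <cuda.h>          // cuMemGetInfo (driver API)
#include <cutil_inline.h>  // cutilSafeCall, cutGetMaxGflopsDeviceId (CUDA SDK helper)

int
main( int argc, char** argv)
{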
int size = 1000*1000*10;
int mem_size; // number of bytes the program wants to allocate
cudaSetDevice( cutGetMaxGflopsDeviceId() );

unsigned int free_mem,total_mem, used_mem;

const int counter = 3;
   float* d_idata[counter];

//first allocation
int i = 0;
mem_size = (1000)*sizeof(float);
cutilSafeCall( cudaMalloc( (void**) &d_idata[i], mem_size));
cuMemGetInfo (&free_mem,&total_mem);
used_mem = total_mem-free_mem;
printf("%d_ free: %u , mem_size: %d , used : %u\n",i,free_mem, mem_size,used_mem);

//second allocation
i++;
mem_size = (4500)*sizeof(float);
cutilSafeCall( cudaMalloc( (void**) &d_idata[i], mem_size));
used_mem = free_mem;             // free memory before this allocation
cuMemGetInfo (&free_mem,&total_mem);
used_mem = used_mem - free_mem;  // bytes consumed by this allocation
printf("%d_ free: %u , mem_size: %d , used : %u\n",i,free_mem, mem_size,used_mem);

//third allocation
i++;
mem_size = (4500)*sizeof(float);
cutilSafeCall( cudaMalloc( (void**) &d_idata[i], mem_size));
used_mem = free_mem;             // free memory before this allocation
cuMemGetInfo (&free_mem,&total_mem);
used_mem = used_mem - free_mem;  // bytes consumed by this allocation
printf("%d_ free: %u , mem_size: %d , used : %u\n",i,free_mem, mem_size,used_mem);

 for(i = 0; i<counter ; i++)
  cutilSafeCall(cudaFree(d_idata[i]));

 cudaThreadExit();

}

(Sorry, there was a mistake in the output; it is correct now.)
The output is:
0_ free: 1025612544 , mem_size: 4000 , used : 47801600
1_ free: 1025592064 , mem_size: 18000 , used : 20480
2_ free: 1025526528 , mem_size: 18000 , used : 65536

(free -> free memory on the device,
mem_size -> number of bytes the program wants to allocate,
used -> memory used on the device after allocating mem_size bytes)

Thanks,
Marjan

See this thread for the answer:
http://forums.nvidia.com/index.php?showtop…mp;#entry586254