Cannot allocate the total amount of global memory benchmark

Hi !

I’m currently trying to figure out why I cannot allocate the total amount of global memory available on my Tesla C1060. I wouldn’t mind if it were just a couple of KB, or even a few MB, but in my case more than a hundred MB seem to be unavailable. Does that sound normal to you?

Here is the code I’m using: maximumAllocatableMemory.cu

[codebox]#include <stdio.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    cudaError_t L_cuda_return;
    int count = 0;
    int *p_dev;
    int nb = 1000;

    /* Allocate nb ints over and over until cudaMalloc fails. */
    do {
        L_cuda_return = cudaMalloc((void **)&p_dev, nb * sizeof(int));
        count++;
    } while (L_cuda_return == cudaSuccess);

    /* The last call failed, so count - 1 allocations actually succeeded. */
    size_t total_allocated = (size_t)(count - 1) * nb * sizeof(int);

    printf("\n %d allocations really done so that a total amount of %lu B has been allocated with %lu B in a row",
           count - 1, (unsigned long)total_allocated,
           (unsigned long)(nb * sizeof(int)));

    cudaDeviceProp deviceProp;
    cudaGetDeviceProperties(&deviceProp, 0);

    printf("\n The total amount of global memory available is %lu B in theory, then it appears that %lu B were not actually used\n",
           (unsigned long)deviceProp.totalGlobalMem,
           (unsigned long)(deviceProp.totalGlobalMem - total_allocated));

    return 0;
}[/codebox]

The result is:

1038592 allocations really done so that a total amount of 4154368000 B has been allocated with 4000 B in a row

The total amount of global memory available is 4294770688 B in theory, then it appears that 140402688 B were not actually used

Am I missing something obvious here?

Please try this simple code (which I compile with nvcc 3.0), since I’d really like to know how it turns out for you.

Just for the record, I’m not using this card for graphics purposes.

Thanks for your time

How far do you get with nb=1024? How far with larger sizes like nb=1048576?
As cudaMalloc() returns an address aligned to 256 bytes, with nb=1000 you lose 96 bytes per allocation, or about 2.3% of the total memory size.
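
To make that concrete, here is a back-of-the-envelope sketch of the rounding (the round-up-to-256 formula is my assumption about the allocator’s granularity, not documented behaviour):

[codebox]#include <stdio.h>

int main(void)
{
    /* Assumption: every request is rounded up to the next 256 B boundary. */
    size_t request = 1000 * sizeof(int);              /* 4000 B asked for */
    size_t rounded = (request + 255) & ~(size_t)255;  /* 4096 B consumed  */

    printf("lost per allocation: %lu B (%.1f%% of each block)\n",
           (unsigned long)(rounded - request),
           100.0 * (rounded - request) / rounded);    /* 96 B, about 2.3% */
    return 0;
}[/codebox]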

Well, first of all, thank you for your fast reply!

I tried with nb=1024 as you suggested, and the results are:

[i]1029120 allocations really done so that a total amount of 4215275520 B has been allocated with 4096 B in a row

The total amount of global memory available is 4294770688 B in theory, then it appears that 79495168 B were not actually used[/i]

Now, and I suppose it won’t really surprise you, with nb=1048576 I get:

[i]1005 allocations really done so that a total amount of 4215275520 B has been allocated with 4194304 B in a row

The total amount of global memory available is 4294770688 B in theory, then it appears that 79495168 B were not actually used[/i]

(starting to find it interesting…)

So you’re saying that I was not actually allocating 4000 bytes every time I called cudaMalloc, but 4096. That would mean that in my first example I actually consumed 99704832 B more than I thought. Still, the question remains for the 40697856 B that are left unused.
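
Just to double-check myself, here is a tiny sanity check of that arithmetic (plain host code, constants copied from the runs above):

[codebox]#include <stdio.h>

int main(void)
{
    /* Constants copied from the nb = 1000 run above. */
    unsigned long allocs    = 1038592UL;      /* successful cudaMalloc calls  */
    unsigned long requested = 4000UL;         /* bytes requested per call     */
    unsigned long padded    = 4096UL;         /* bytes consumed (256-aligned) */
    unsigned long total_mem = 4294770688UL;   /* deviceProp.totalGlobalMem    */

    printf("alignment overhead: %lu B\n", allocs * (padded - requested)); /* 99704832 */
    printf("still unaccounted : %lu B\n", total_mem - allocs * padded);   /* 40697856 */
    return 0;
}[/codebox]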

Now, back to the two examples where we know exactly how much memory we allocated: is there a chance you have an explanation for the 79495168 B that I could not allocate? It still sounds like a lot to me… (Anyway, thank you for your reply; I was not aware of how cudaMalloc aligns its allocations.)

I’ve just tried cudaMemGetInfo() and the result kind of surprises me. It reports that exactly 40628480 B are simply not allocatable, even though I called cudaMemGetInfo right at the beginning, before any allocation.
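
For reference, a minimal version of how I query it looks something like this (the cudaFree(0) is only there to make sure the context exists before measuring):

[codebox]#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    size_t free_b = 0, total_b = 0;

    cudaFree(0);                        /* force CUDA context creation first */
    cudaMemGetInfo(&free_b, &total_b);  /* bytes free / total on the device  */

    printf("free: %lu B, total: %lu B, unavailable: %lu B\n",
           (unsigned long)free_b, (unsigned long)total_b,
           (unsigned long)(total_b - free_b));
    return 0;
}[/codebox]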

Using this method I can certainly avoid an “out of memory” error. However, I wonder why the total amount of global memory advertised by NVIDIA for the Tesla C1060 is about 40 MB more than what is really available on the card.

I’ve tested it on 3 different Tesla C1060 cards. The result remains unchanged; it seems like a constant loss of memory.

I know it only represents about 1% of the advertised global memory, but I suppose I’m curious…!

Some of the memory is reserved for the kernel driver and the CUDA context.

Well thank you for answering me. I should have thought about it…

However, what I truly would like to know is whether this reserved amount of memory is somehow quantifiable.

Not really. Beyond that reserved area (which can change from driver to driver for various reasons), you can also lose some due to memory manager fragmentation and other basic things like that.
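
If you need a concrete number anyway, the only reliable way I know of is to probe empirically. Here is a sketch (not an official API call) that binary-searches for the largest single cudaMalloc that succeeds:

[codebox]#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    size_t free_b = 0, total_b = 0;

    cudaFree(0);                          /* force context creation        */
    cudaMemGetInfo(&free_b, &total_b);    /* free memory is an upper bound */

    /* Binary search for the largest single allocation that succeeds;
       fragmentation can make it smaller than the reported free memory. */
    size_t lo = 0, hi = free_b;
    while (lo < hi) {
        size_t mid = lo + (hi - lo + 1) / 2;
        void *p = NULL;
        if (cudaMalloc(&p, mid) == cudaSuccess) {
            cudaFree(p);
            lo = mid;                     /* mid bytes fit, try larger     */
        } else {
            cudaGetLastError();           /* clear the allocation error    */
            hi = mid - 1;                 /* mid bytes failed, try smaller */
        }
    }
    printf("largest single allocation: %lu B\n", (unsigned long)lo);
    return 0;
}[/codebox]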

Interesting problem. Is there any documented API function to estimate the maximal amount of linear memory available?

Regards
