350MB of GPU memory disappear right after context initialization

Hello All,
I’ve just noticed that 350MB of GPU memory mysteriously disappear right after context initialization. Moreover, GPU0 looses twice as much, i.e. 700MB. I have 3 cards in a box –
Device 0: GeForce GTX 480, 1535MB, CC 2.0
Device 1: Tesla C2050, 3071MB, CC 2.0
Device 2: GeForce GTX 480, 1535MB, CC 2.0

But when I call cuMemGetInfo(), I get
0: available GPU memory: 835MB
1: available GPU memory: 2720MB
2: available GPU memory: 1183MB

I’m not running X. When X is launched, it grabs an additional ~30MB from GPU 0 –
0: available GPU memory: 801MB
1: available GPU memory: 2720MB
2: available GPU memory: 1183MB
which makes a lot of sense… But where are those 350MB?
I use CUDA 3.2 on Fedora.

Any ideas what’s going on?
Thanks a lot!

Apparently, this problem doesn’t exist on another machine running CentOS. I would greatly appreciate it if you could compile/run this code and post your results here!

nvcc memory.cu -o memory -L/usr/local/cuda/lib64 -lcuda -lpthread

#include <stdio.h>
#include <pthread.h>
#include <cuda.h>
#include <cuda_runtime.h>

__global__ void Null(){}

void* PrintMemory(void* index)
{
   int id = *((int*)index);

   cudaSetDevice(id);      // bind this thread to its device
   Null <<< 1, 1 >>> ();   // dummy launch to force context creation

   size_t mem_tot, mem_free;
   cuMemGetInfo(&mem_free, &mem_tot);

   printf("%d: %dMB free, %dMB total\n", id, (int)(mem_free >> 20), (int)(mem_tot >> 20));

   return NULL;
}

int main(int argc, char* argv[])
{
   int tot_gpus = 0;
   if(cudaGetDeviceCount(&tot_gpus) != cudaSuccess || tot_gpus == 0){
      printf("GPU not found\n");
      return 1;
   }
   printf("found %d CUDA devices\n", tot_gpus);

   for(int i = 0; i < tot_gpus; i++){
      cudaDeviceProp dprop;
      cudaGetDeviceProperties(&dprop, i);
      printf("   Device %d: %16s, %4dMB, CC %d.%d\n", i, dprop.name,
             (int)(dprop.totalGlobalMem >> 20), dprop.major, dprop.minor);
   }

   int data[8] = {0,1,2,3,4,5,6,7};
   pthread_t thread_handle[8];

   for(int i = 0; i < tot_gpus; i++)
      pthread_create(thread_handle+i, NULL, PrintMemory, (void*)(data+i));
   for(int i = 0; i < tot_gpus; i++)
      pthread_join(thread_handle[i], NULL);

   return 0;
}
A 200-300MB loss seems normal. On my machine:

% ./memory
found 2 CUDA devices
   Device 0: GeForce GTX 480, 1535MB, CC 2.0
   Device 1: GeForce GTX 480, 1535MB, CC 2.0
0: 1336MB free, 1535MB total
1: 1336MB free, 1535MB total

It’s probably static context buffer allocations, which are configurable. The defaults were increased in toolkit 3.2 for printf() buffers and the like, but you can set them yourself with something like:

cudaThreadSetLimit(cudaLimitStackSize, 32768);
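If you want to see where the reservation goes, the same CUDA 3.x runtime API lets you query the current limits before changing them. A sketch (using the 3.x-era names cudaThreadGetLimit/cudaThreadSetLimit, which were later renamed cudaDeviceGetLimit/cudaDeviceSetLimit; the printed sizes are device- and toolkit-dependent):

#include <stdio.h>
#include <cuda_runtime.h>

int main()
{
   size_t stack, fifo, heap;

   // Query the per-thread stack, printf FIFO, and in-kernel malloc heap
   // sizes for the current device's context.
   cudaThreadGetLimit(&stack, cudaLimitStackSize);
   cudaThreadGetLimit(&fifo,  cudaLimitPrintfFifoSize);
   cudaThreadGetLimit(&heap,  cudaLimitMallocHeapSize);

   printf("stack: %uKB, printf FIFO: %uKB, malloc heap: %uKB\n",
          (unsigned)(stack >> 10), (unsigned)(fifo >> 10), (unsigned)(heap >> 10));

   // Shrinking a limit before heavy allocation reduces what the context
   // reserves, e.g. a 1MB in-kernel malloc heap:
   cudaThreadSetLimit(cudaLimitMallocHeapSize, 1 << 20);

   return 0;
}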

Why one GPU could use more than another, I don’t know, but perhaps you haven’t initialized the same number of contexts on all of them.
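One way to check whether a second context is the culprit: create two driver-API contexts on the same device and watch the free-memory counter drop twice. A hypothetical test program (not the code above), which should show the per-context overhead being paid once per cuCtxCreate:

#include <stdio.h>
#include <cuda.h>

int main()
{
   CUdevice dev;
   CUcontext ctx1, ctx2;
   size_t mem_free, mem_tot;

   cuInit(0);
   cuDeviceGet(&dev, 0);

   cuCtxCreate(&ctx1, 0, dev);            // first context on device 0
   cuMemGetInfo(&mem_free, &mem_tot);
   printf("after 1 context:  %dMB free\n", (int)(mem_free >> 20));

   cuCtxCreate(&ctx2, 0, dev);            // second context, same device
   cuMemGetInfo(&mem_free, &mem_tot);
   printf("after 2 contexts: %dMB free\n", (int)(mem_free >> 20));

   cuCtxDestroy(ctx2);
   cuCtxDestroy(ctx1);
   return 0;
}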

This looks like a bug in the CUDA 3.2 driver. We are actively investigating this. Thanks for the report!


Cliff, Steve was right regarding context initialization – I did initialize two contexts on card 0 for some stupid reason.
Thank you guys!