350MB of GPU memory disappear right after context initialization

Hello All,
I’ve just noticed that 350MB of GPU memory mysteriously disappear right after context initialization. Moreover, GPU0 looses twice as much, i.e. 700MB. I have 3 cards in a box –
Device 0: GeForce GTX 480, 1535MB, CC 2.0
Device 1: Tesla C2050, 3071MB, CC 2.0
Device 2: GeForce GTX 480, 1535MB, CC 2.0

But when I call cuMemGetInfo(), I get
0: available GPU memory: 835MB
1: available GPU memory: 2720MB
2: available GPU memory: 1183MB

I’m not running X. When X is launched, it grabs an additional ~30MB from GPU 0 –
0: available GPU memory: 801MB
1: available GPU memory: 2720MB
2: available GPU memory: 1183MB
which makes a lot of sense… But where are those 350MB?
I use CUDA 3.2 on Fedora.

Any ideas what’s going on?
Thanks a lot!

Apparently, this problem doesn’t exist on another machine running CentOS. I would greatly appreciate it if you could compile/run this code and post your results here!

nvcc memory.cu -o memory -L/usr/local/cuda/lib64 -lcuda -lpthread

#include <stdio.h>
#include <pthread.h>
#include <cuda.h>
#include <cuda_runtime.h>

__global__ void Null(){}

void* PrintMemory(void* index)
{
   int id = *((int*)index);

   cudaSetDevice(id);      // bind this thread to its device
   Null <<< 1, 1 >>> ();   // dummy launch to force context creation

   size_t mem_tot, mem_free;
   cuMemGetInfo(&mem_free, &mem_tot);

   printf("%d: %dMB free, %dMB total\n", id, (int)(mem_free >> 20), (int)(mem_tot >> 20));

   return NULL;
}

int main(int argc, char* argv[])
{
   int tot_gpus = 0;
   if(cudaGetDeviceCount(&tot_gpus) != cudaSuccess || tot_gpus == 0){
      printf("GPU not found\n");
      return 1;
   }
   printf("found %d CUDA devices\n", tot_gpus);

   for(int i = 0; i < tot_gpus; i++){
      cudaDeviceProp dprop;
      cudaGetDeviceProperties(&dprop, i);
      printf("   Device %d: %16s, %4dMB, CC %d.%d\n", i, dprop.name,
             (int)(dprop.totalGlobalMem >> 20), dprop.major, dprop.minor);
   }

   int data[8] = {0,1,2,3,4,5,6,7};
   pthread_t thread_handle[8];

   for(int i = 0; i < tot_gpus; i++)
      pthread_create(thread_handle+i, NULL, PrintMemory, (void*)(data+i));
   for(int i = 0; i < tot_gpus; i++)
      pthread_join(thread_handle[i], NULL);

   return 0;
}
A 200-300MB loss seems normal. On my machine:

% ./memory
found 2 CUDA devices
   Device 0: GeForce GTX 480, 1535MB, CC 2.0
   Device 1: GeForce GTX 480, 1535MB, CC 2.0
0: 1336MB free, 1535MB total
1: 1336MB free, 1535MB total

It’s probably static context buffer allocations, which are configurable. The defaults were increased in toolkit 3.2 for printf() buffers and the like, but you can set them yourself with something like:

cudaThreadSetLimit(cudaLimitStackSize, 32768);
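If you want to see where the reservation goes, the same CUDA 3.x runtime API lets you query the current limits before changing them. A sketch (using the 3.x-era names cudaThreadGetLimit/cudaThreadSetLimit, which were later renamed cudaDeviceGetLimit/cudaDeviceSetLimit; the printed sizes are device- and toolkit-dependent):

#include <stdio.h>
#include <cuda_runtime.h>

int main()
{
   size_t stack, fifo, heap;

   // Query the per-thread stack, printf FIFO, and in-kernel malloc heap
   // sizes for the current device's context.
   cudaThreadGetLimit(&stack, cudaLimitStackSize);
   cudaThreadGetLimit(&fifo,  cudaLimitPrintfFifoSize);
   cudaThreadGetLimit(&heap,  cudaLimitMallocHeapSize);

   printf("stack: %uKB, printf FIFO: %uKB, malloc heap: %uKB\n",
          (unsigned)(stack >> 10), (unsigned)(fifo >> 10), (unsigned)(heap >> 10));

   // Shrinking a limit before heavy allocation reduces what the context
   // reserves, e.g. a 1MB in-kernel malloc heap:
   cudaThreadSetLimit(cudaLimitMallocHeapSize, 1 << 20);

   return 0;
}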

Why one GPU could use more than another, I don’t know, but perhaps you haven’t initialized the same number of contexts on all of them.
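One way to check whether a second context is the culprit: create two driver-API contexts on the same device and watch the free-memory counter drop twice. A hypothetical test program (not the code above), which should show the per-context overhead being paid once per cuCtxCreate:

#include <stdio.h>
#include <cuda.h>

int main()
{
   CUdevice dev;
   CUcontext ctx1, ctx2;
   size_t mem_free, mem_tot;

   cuInit(0);
   cuDeviceGet(&dev, 0);

   cuCtxCreate(&ctx1, 0, dev);            // first context on device 0
   cuMemGetInfo(&mem_free, &mem_tot);
   printf("after 1 context:  %dMB free\n", (int)(mem_free >> 20));

   cuCtxCreate(&ctx2, 0, dev);            // second context, same device
   cuMemGetInfo(&mem_free, &mem_tot);
   printf("after 2 contexts: %dMB free\n", (int)(mem_free >> 20));

   cuCtxDestroy(ctx2);
   cuCtxDestroy(ctx1);
   return 0;
}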

This looks like a bug in the CUDA 3.2 driver. We are actively investigating this. Thanks for the report!


Cliff, Steve was right regarding context initialization – I did initialize two contexts on card 0 for some stupid reason.
Thank you guys!