Can someone give me a hint on how to use cuMemGetInfo()?

I cannot find anything about it in the SDK samples …


g++ -o gpumeminfo gpumeminfo.c -I /usr/local/cuda/include/  -L /usr/local/cuda/lib/ -lcuda


#include <cuda.h>
#include <stdio.h>

static unsigned long inKB(unsigned long bytes)
{ return bytes/1024; }

static unsigned long inMB(unsigned long bytes)
{ return bytes/(1024*1024); }

static void printStats(unsigned long free, unsigned long total)
{
   printf("^^^^ Free : %lu bytes (%lu KB) (%lu MB)\n", free, inKB(free), inMB(free));
   printf("^^^^ Total: %lu bytes (%lu KB) (%lu MB)\n", total, inKB(total), inMB(total));
   printf("^^^^ %f%% free, %f%% used\n", 100.0*free/(double)total, 100.0*(total - free)/(double)total);
}

int main(int argc, char **argv)
{
   unsigned int free, total;
   int gpuCount, i;
   CUresult res;
   CUdevice dev;
   CUcontext ctx;

   cuInit(0);                      /* initialise the driver API */
   cuDeviceGetCount(&gpuCount);
   printf("Detected %d GPU\n", gpuCount);

   for (i = 0; i < gpuCount; i++)
   {
      cuDeviceGet(&dev, i);        /* get a handle to device i */
      cuCtxCreate(&ctx, 0, dev);   /* a context is needed before cuMemGetInfo() */

      res = cuMemGetInfo(&free, &total);
      if (res != CUDA_SUCCESS)
         printf("!!!! cuMemGetInfo failed! (status = %x)", res);
      printf("^^^^ Device: %d\n", i);
      printStats(free, total);

      cuCtxDetach(ctx);            /* release the context again */
   }

   return 0;
}


thank you very much

One year later and your code was useful to me, also. Thanks.

But for others who, like me, begin with the template project and wonder why building the code gives various errors like

“error LNK2019: unresolved external symbol _cuDeviceGetCount@4 referenced in function _main”

The solution is to go to Project / Properties / Linker / Input.

The Additional Dependencies in the template project are listed as: cudart.lib cutil32D.lib

You need to add cuda.lib

Then the linking will work. It puzzles me why it is not in the template, but that is how it is.

Thanks for the post again. It works for me. And thanks for the reminder about cuda.lib.

But how can I get it to work in the Linux SDK examples, like the template? I saw a flag called “USEDRVAPI”: if it is set, lib += -lcuda, else lib += -lcudart. I added USEDRVAPI = 1 to the template Makefile, but linking failed again. Then I ignored it and directly added lib += -lcuda; this works, but I suspect there must be some problem with it. What do you think?

Maybe it’s time to move on and try to write a Makefile by myself.
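For a standalone build outside the SDK, a minimal Makefile is enough. This is only a sketch: it assumes the toolkit lives in /usr/local/cuda (matching the g++ line at the top of the thread) and links only against the driver library, since the sample uses the driver API exclusively:

```makefile
# Minimal standalone Makefile for the driver-API sample.
# CUDA_ROOT is an assumption; point it at your actual toolkit install.
CUDA_ROOT ?= /usr/local/cuda

gpumeminfo: gpumeminfo.c
	g++ -o $@ $< -I$(CUDA_ROOT)/include -L$(CUDA_ROOT)/lib -lcuda

clean:
	rm -f gpumeminfo
```

Because only driver-API calls (cuInit, cuMemGetInfo, …) are used, -lcuda alone should be sufficient; -lcudart is only needed for runtime-API calls.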

It is hard to figure out how to combine them, though.

hm… strange, nvcc seemed not to like the line

res = cuMemGetInfo(&free, &total);

as I get the error: error: argument of type “unsigned int *” is incompatible with parameter of type “size_t *”

I just edited the declarations of “free” and “total” to size_t and it works fine.

thanks for the post, very useful!

You will notice that the code you were trying was posted almost 3 years ago. In older versions of CUDA, the prototype used unsigned ints, since CUDA 3.0 it has been using size_t. On a 32 bit platform, it should probably still compile. On a 64 bit platform, it shouldn’t. I am guessing your machine falls into the latter category.

oh, it seems it did compile on my 64-bit machine:
[ cuda_matlab]$ gpumeminfo
Detected 4 GPU
!!! cuMemGetInfo failed! (status = c9)^^^^ Device: 0
^^^^ Free : 0 bytes (0 KB) (0 MB)
^^^^ Total: 0 bytes (0 KB) (0 MB)
^^^^ nan% free, nan% used
^^^^ Device: 1
^^^^ Free : 109314048 bytes (106752 KB) (104 MB)
^^^^ Total: 131399680 bytes (128320 KB) (125 MB)
^^^^ 83.192020% free, 16.807980% used
^^^^ Device: 2
^^^^ Free : 4254142208 bytes (4154435 KB) (4057 MB)
^^^^ Total: 4294770688 bytes (4194112 KB) (4095 MB)
^^^^ 99.054001% free, 0.945999% used
^^^^ Device: 3
^^^^ Free : 4254142208 bytes (4154435 KB) (4057 MB)
^^^^ Total: 4294770688 bytes (4194112 KB) (4095 MB)
^^^^ 99.054001% free, 0.945999% used

Now I am just trying to figure out if it is possible to free the memory on device 0 without actually rebooting the machine.

thanks for your help,

It seems like I can’t free all the memory; even after I restart or shut down the machine, it says that about 120 MB is used?

The CUDA driver reserves some device memory for itself, as does any fancy GUI desktop you might be running.

But 120 MB? I have three Nvidia GTX 480, so at least two of them should not use any memory for the GUI desktop.

The CUDA context uses memory - there is space reserved for printf and C++ support, as well as the required per thread local memory scratch of any kernels loaded into the context. The runtime API has a function cudaThreadSetLimit() which gives some degree of control over the memory footprint of a context on Fermi cards.
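A short sketch of what that control looks like, assuming a Fermi card and a CUDA 3.x-era runtime (this needs nvcc and a GPU to actually run, and the specific byte values are arbitrary examples, not recommendations):

```c
#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    size_t value;

    /* Shrink the per-context device printf buffer and device malloc
       heap; both default to several MB on Fermi and contribute to the
       fixed context overhead. */
    cudaThreadSetLimit(cudaLimitPrintfFifoSize, 64 * 1024);
    cudaThreadSetLimit(cudaLimitMallocHeapSize, 256 * 1024);

    /* Read the limits back to confirm what the context will reserve. */
    cudaThreadGetLimit(&value, cudaLimitPrintfFifoSize);
    printf("printf FIFO size : %lu bytes\n", (unsigned long)value);
    cudaThreadGetLimit(&value, cudaLimitMallocHeapSize);
    printf("malloc heap size : %lu bytes\n", (unsigned long)value);

    return 0;
}
```

This only trims the configurable parts of the context footprint; the driver's own fixed reservation cannot be reclaimed this way.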