Problem: What is going on with the memory on the card? Why is so much of it wasted?


I’m working under 32-bit Windows XP with CUDA 2.0. The GPU I use for experiments is an 8500 GT with 256 MB of RAM on board; it is a secondary graphics card (no monitor is connected to it, and the desktop is not extended onto it).

At application startup, cuMemGetInfo reports that 200531200 bytes are free (about 60 !!! megabytes are gone for an unknown reason, but OK, let’s assume it is a frame buffer that can’t be disabled because of the drivers, etc.).

Then, right after I cudaMalloc an array of 5000 floats (20000 bytes) and bind a simple 1D texture to it, cuMemGetInfo reports only 168553216 free bytes! More than 30 megabytes are simply wasted for a completely unknown reason.

After that, all memory allocations are precise and correct.

This looks like a real drawback: instead of 256 MB, only about 160 MB are available. Where are the remaining 100 megabytes? Is it a “feature” or a bug?
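
The arithmetic behind these figures can be sketched in a few lines of host-side C++. This is only an illustration: it assumes that "256 MB" means 256 MiB (268435456 bytes), which the post does not state, and the function names are made up for this sketch. The two free-byte values are the ones cuMemGetInfo reported above.

```cpp
// Byte counts taken from the post. kTotalBytes is an ASSUMPTION:
// it treats the card's "256 MB" as 256 MiB.
constexpr unsigned long long kTotalBytes      = 256ULL * 1024 * 1024;  // 268435456
constexpr unsigned long long kFreeAtStartup   = 200531200ULL;  // first cuMemGetInfo
constexpr unsigned long long kFreeAfterMalloc = 168553216ULL;  // after first cudaMalloc
constexpr unsigned long long kRequestedBytes  = 5000ULL * 4;   // 5000 floats = 20000 bytes

// Converts a byte count to MiB for display.
constexpr double to_mib(unsigned long long bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0);
}

// Memory already gone before the application allocates anything.
constexpr unsigned long long missing_at_startup() {
    return kTotalBytes - kFreeAtStartup;
}

// Drop in free memory across the first cudaMalloc.
constexpr unsigned long long first_alloc_cost() {
    return kFreeAtStartup - kFreeAfterMalloc;
}

// Part of that drop that the 20000-byte request does not explain.
constexpr unsigned long long first_alloc_overhead() {
    return first_alloc_cost() - kRequestedBytes;
}

// Everything on the card that is not reported as free afterwards.
constexpr unsigned long long total_unavailable() {
    return kTotalBytes - kFreeAfterMalloc;
}
```

Under that assumption, about 64.8 MiB are missing at startup, the first allocation costs about 30.5 MiB for a 20000-byte request, and roughly 95.3 MiB of the card are unavailable in total.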

Thanks in advance.

This is quite interesting! Only NVIDIA guys can answer this…

But before that: did you do a “set device” for your device, just to be sure that you are working on the correct device?

Sure, I did … and that 8500 GT is the only CUDA-compatible card in my system.

When you freed the memory, was the 30 MB freed as well? Or does it become a fixed allocation for life?


What about subsequent invocations of your application? What behaviour do you see?

At startup 200 MB are available, then 30 MB are eaten by the first call to cudaMalloc, and after that no more than about 170 MB are available (no matter how many cudaMalloc/cudaFree pairs are executed).
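
One way to check this pattern is to read the free byte count around repeated allocate/free pairs and compare the first pair with the later ones. The following is a rough C++ sketch of that measurement, not the code actually used here. The names AllocCost, measure_alloc_cost, FakeDevice and run_on_fake_device are hypothetical, and FakeDevice is a stand-in that only imitates the behaviour described above with rounded figures, so that the sketch runs without a GPU.

```cpp
#include <cassert>
#include <cstddef>

// Memory that did not come back after an allocate/free pair.
struct AllocCost {
    std::size_t first;  // bytes lost across the first pair
    std::size_t later;  // largest loss across any later pair
};

// query_free() returns the current free byte count, alloc(bytes) returns a
// pointer, release(ptr) frees it. All three are supplied by the caller.
template <typename QueryFree, typename Alloc, typename Release>
AllocCost measure_alloc_cost(QueryFree query_free, Alloc alloc, Release release,
                             std::size_t bytes, int rounds) {
    assert(rounds > 0);
    AllocCost cost = {0, 0};
    for (int i = 0; i < rounds; ++i) {
        const std::size_t before = query_free();
        void* p = alloc(bytes);
        release(p);
        const std::size_t after = query_free();
        const std::size_t lost = before > after ? before - after : 0;
        if (i == 0) {
            cost.first = lost;
        } else if (lost > cost.later) {
            cost.later = lost;
        }
    }
    return cost;
}

// Stand-in for the card, NOT real CUDA: it reserves a fixed amount on the
// first allocation and never returns it. The numbers are rounded figures
// from this thread, used purely for illustration.
struct FakeDevice {
    std::size_t free_bytes;
    std::size_t one_time_reserve;
    bool touched;
    std::size_t live_bytes;

    std::size_t query_free() const { return free_bytes; }
    void* alloc(std::size_t bytes) {
        if (!touched) { free_bytes -= one_time_reserve; touched = true; }
        free_bytes -= bytes;
        live_bytes = bytes;
        return this;  // any non-null token will do for the sketch
    }
    void release(void*) { free_bytes += live_bytes; live_bytes = 0; }
};

// Runs five 100-byte allocate/free pairs against the stand-in device.
AllocCost run_on_fake_device() {
    FakeDevice dev = {200u * 1000 * 1000, 30u * 1000 * 1000, false, 0};
    return measure_alloc_cost(
        [&dev] { return dev.query_free(); },
        [&dev](std::size_t n) { return dev.alloc(n); },
        [&dev](void* p) { dev.release(p); },
        100, 5);
}
```

To run it against the real card, the three callables would wrap the actual API calls, for example cuMemGetInfo for the query and a cudaMalloc/cudaFree pair for the allocation. A large `first` together with a zero `later` would match the behaviour described above.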

I’ve dug out one cause of this issue: cudaSetDevice was being called in a different thread from the one I actually use for calculations. I’ve moved all the initialization code into the worker thread, and now I see this:

Startup: 230 MB available.

cudaMalloc(100 bytes): 200 MB are available, and from then on no more than 200 MB are reported as free.

Consequently, about 26 MB are always unavailable and about 30 MB are wasted after the first malloc, no matter how big it is.

The threading issue was my fault; however, even after I fixed it, the memory is still wasted.