I’m working under Windows XP 32-bit with CUDA 2.0. The GPU used for experiments is an 8500 GT with 256 MB of onboard RAM; it is a secondary graphics card (no monitor is connected to it, and the desktop is not extended onto it).
At application startup, cuMemGetInfo reports 200531200 bytes free — about 60 (!!!) megabytes are already gone for no obvious reason. But OK, let’s assume that is a frame buffer that can’t be disabled because of the drivers, etc.
Then, right after I cudaMalloc an array of 5000 floats (20000 bytes) and bind a simple 1D texture to it, cuMemGetInfo reports only 168553216 bytes free! More than 30 megabytes simply vanish for a completely unknown reason.
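For reference, here is a minimal sketch of the measurement described above. It mixes the driver-API cuMemGetInfo with the runtime API the way my app does; error checking is trimmed for brevity, and the signatures assume the CUDA 2.x headers (where cuMemGetInfo takes unsigned int pointers).

```cuda
#include <cstdio>
#include <cuda.h>
#include <cuda_runtime.h>

texture<float, 1, cudaReadModeElementType> tex;  // simple 1D texture

int main()
{
    unsigned int freeMem, totalMem;

    cuInit(0);
    cuMemGetInfo(&freeMem, &totalMem);            // before any runtime call
    printf("startup: %u bytes free of %u\n", freeMem, totalMem);

    float* devPtr = 0;
    cudaMalloc((void**)&devPtr, 5000 * sizeof(float));    // 20000 bytes
    cudaBindTexture(0, tex, devPtr, 5000 * sizeof(float));

    cuMemGetInfo(&freeMem, &totalMem);            // after the first allocation
    printf("after first cudaMalloc: %u bytes free\n", freeMem);

    cudaUnbindTexture(tex);
    cudaFree(devPtr);
    return 0;
}
```

On my setup, the second printout is roughly 30 MB lower than the first, far more than the 20000 bytes actually requested.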
After that, all memory allocations are precise and correct.
This looks like a real drawback: instead of 256 MB, only about 160 are usable. Where did the remaining ~100 megabytes go? Is it a “feature” or a bug?
To summarize: at startup 200 MB are available, then 30 MB are eaten by the first invocation of cudaMalloc, and from then on no more than about 170 MB are ever available (no matter how many cudaMalloc/cudaFree pairs are executed).
I’ve dug out one cause of this issue: cudaSetDevice was being called in a different thread, not the thread I actually use for calculations. After moving all the initialization code into the worker thread, I now see this:
Startup: 230 MB available.
cudaMalloc(100 bytes): 200 MB available; from then on, no more than 200 MB are ever reported as free.
Consequently, about 26 MB are permanently unavailable, and about 30 MB more are wasted after the first malloc, no matter how big that allocation is.
The thread issue was my fault; however, even after fixing it, the memory is still wasted.
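In case it helps anyone hitting the same thing: in CUDA 2.x a context is bound to the thread that creates it, so initialization really must happen in the worker thread. A hedged sketch of the workaround I use now — pay the one-time context cost up front in the worker thread, so later cuMemGetInfo readings reflect only real allocations (cudaFree(0) is a common idiom to force the lazily created context into existence; the function name workerThreadInit is just my own):

```cuda
#include <cstdio>
#include <cuda.h>
#include <cuda_runtime.h>

// Call this first, from the same thread that will do all CUDA work.
void workerThreadInit(int device)
{
    cudaSetDevice(device);   // binds the device to THIS thread's context
    cudaFree(0);             // forces the context to be created right now

    unsigned int freeMem, totalMem;
    cuMemGetInfo(&freeMem, &totalMem);
    printf("baseline after context creation: %u bytes free of %u\n",
           freeMem, totalMem);
}
```

With this baseline taken after context creation, subsequent cudaMalloc/cudaFree pairs track exactly; only the initial ~30 MB context overhead remains unexplained.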