cudaMalloc fails on huge allocation

Hello everyone.
I have run into allocation problems with my program. I have a Geforce GT 8800. NVIDIA X Server settings window tells me I have 512mb onboard memory, which is correct. I installed 260.19.26 development drivers and I use Cuda toolkit 3.2. My application requires me to perform least squares on huge matrices of floats. For that I use the CULA library.

The current example I’m working with produces a matrix with 44 columns. As soon as I exceed roughly 1000000 rows, the program fails. I inserted asserts, which report the following:
mem::Linear::Linear(const T*, size_t) [with T = float, size_t = long unsigned int]: Assertion `cudaMalloc(&_dev,_size*sizeof(T)) == cudaSuccess' failed.

Linear is a template class I made as a wrapper for linear memory on the GPU. 44 × 1000000 × 4 = 176000000 bytes, which is only 176 MB (assuming a float is 4 bytes).
What am I doing wrong? I have the feeling not all my memory is available…

CUDA needs some device memory for its own purposes, although not so much that it would explain the problem you are seeing. At the beginning of your program, call cudaMemGetInfo() to determine how much free memory there is. I’m curious whether the amount of device memory used by the driver is abnormally high for some reason.
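A minimal sketch of that check using the CUDA runtime API (note the function name is cudaMemGetInfo; this needs a CUDA-capable device to run):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    size_t free_bytes = 0, total_bytes = 0;
    // Query how much device memory is currently free and the total on board.
    cudaError_t err = cudaMemGetInfo(&free_bytes, &total_bytes);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemGetInfo failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("free: %zu bytes (%.1f MB), total: %zu bytes (%.1f MB), used: %.1f MB\n",
           free_bytes, free_bytes / 1e6,
           total_bytes, total_bytes / 1e6,
           (total_bytes - free_bytes) / 1e6);
    return 0;
}
```

Calling this before any other CUDA work also establishes the primary context, so the "free" figure already reflects the driver's own overhead.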

Hey seibert,
Thank you for the information. I did this and got the following (free in bytes, free in MB, total in bytes, total in MB, usage in MB):
294019072 280.398 536150016 511.312 230.914

I then removed the initialisation of CUBLAS and CULA, and got the following:
297054208 283.293 536150016 511.312 228.02
So about half of my GPU memory is filled with something… Could it be because I am running KDE? I must be doing something wrong for sure.
(By the way, I rebooted just before the test, so these numbers are from right after startup.)
I rebooted again to make sure the libraries weren’t being cached, and it was even worse:
265269248 252.98 536150016 511.312 258.332

So I did another test, and after reading the matrix with 1000000 rows:
108274944 103.259 536150016 511.312 408.053
I then tried 1100000 rows. Peak usage was:
85361920 81.4075 536150016 511.312 429.905

So the allocation definitely fails because memory is full. Any ideas why the memory is already 50% filled at startup?

Thanks for the help.

Hmm, can you try stopping X and running your memory test from the console? I have no idea how much device memory a GUI desktop requires.

I had in mind that CUDA took ~150 MB of memory from the device, but I haven’t checked if that has changed significantly with new releases. (It’s been a while since I’ve run code seriously on our 8800 GT. I should figure out which compute node it has wandered off to and check this…)

OK, I ran the application in runlevel 2 and it used only 29 MB at startup. I guess I have my answer.