clReleaseMemObject not working? The clReleaseMemObject function doesn't release the memory

Hey all, I’m only a student at my college. This is my LAST semester! Anyways, I’m writing simple adding and sorting programs where the kernel just takes in A and B and adds their element pairs to get C. The only trouble is, C isn’t being released at all; in fact A, B, and C are all staying in GPU memory. I use clReleaseMemObject, and I don’t know what other function to use to tell the device to free its memory.

So my question is: what do I need to know about clReleaseMemObject? I’ve read the man pages on OCL, I’m just lost here.

Don’t you call clRetainMemObject() on your memory objects somewhere? Otherwise I have no idea :-( Paste your host code.

I also have problems with clReleaseMemObject. If I create a few big buffers, release them, and then try to create and enqueue some new ones, I get a MEM_OBJECT_ALLOCATION_FAILURE. The same happens if I allocate/deallocate many small ones. It looks like the Retain/Release pairs are correct (I have this functionality wrapped in a shared pointer class in C++), while the allocation failure seems to happen when the accumulated buffers’ sizes reach the GPU memory size.

I further researched the issue and I got to the following.

When clCreateBuffer creates a buffer object, the cl_mem object returned already has a reference count of 1 (see: Apple). If a wrapper then calls Retain on it, the count goes to 2, so the wrapper's final Release only brings it back to 1. The object is then deallocated only when the API (or whatever structure holds it) is destructed, which probably happens when the program finishes and not when one thinks the last reference to it has been released. To get rid of this, one should probably explicitly call clReleaseMemObject on the handle returned by clCreateBuffer, i.e. something like:

cl_mem tmp = clCreateBuffer(session_->get_context().id(), CL_MEM_READ_WRITE,
							   sizeof(cl_value_type) * global_work_size(), NULL, &error);

memory_object device_data_(tmp); // memory_object is a wrapper around cl_mem with automatic Retain and Release clauses. Works like a smart pointer.

clReleaseMemObject(tmp);

// Now when device_data_ is destroyed the memory held by the cl_mem object will be deallocated.

As long as tmp is not used again, this should be safe. The above probably holds for all objects created by OpenCL (contexts, command queues etc.). Can anybody think of a reason this would cause trouble with OpenCL implementations?
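To make the counting concrete, here is a minimal sketch that models only the reference count. It does not call OpenCL: mock_mem, mock_create, mock_retain, mock_release, retaining_wrapper and refs_after_scope are hypothetical stand-ins for cl_mem, clCreateBuffer, clRetainMemObject, clReleaseMemObject and a memory_object-style wrapper that Retains on construction, as described above.

```cpp
#include <cassert>

// Stand-in for an OpenCL buffer handle; only the reference count is modelled.
struct mock_mem { int refs; bool freed; };

// Mimics clCreateBuffer: a new object starts with a reference count of 1.
inline mock_mem* mock_create(mock_mem& storage) { storage = {1, false}; return &storage; }
// Mimics clRetainMemObject.
inline void mock_retain(mock_mem* m) { ++m->refs; }
// Mimics clReleaseMemObject: the memory is freed only when the count reaches 0.
inline void mock_release(mock_mem* m) { if (--m->refs == 0) m->freed = true; }

// Wrapper that Retains on construction and Releases on destruction.
class retaining_wrapper {
public:
    explicit retaining_wrapper(mock_mem* m) : m_(m) { mock_retain(m_); }
    ~retaining_wrapper() { mock_release(m_); }
    retaining_wrapper(const retaining_wrapper&) = delete;
    retaining_wrapper& operator=(const retaining_wrapper&) = delete;
private:
    mock_mem* m_;
};

// Returns the reference count left over after the wrapper has been destroyed.
inline int refs_after_scope(bool release_tmp) {
    mock_mem storage;
    mock_mem* tmp = mock_create(storage);    // count: 1 (creation reference)
    {
        retaining_wrapper device_data(tmp);  // count: 2
        if (release_tmp) mock_release(tmp);  // count: 1, creation reference given up
    }                                        // destructor releases once more
    return storage.refs;
}
```

In this model refs_after_scope(false) leaves one reference behind, so the buffer is never freed, while refs_after_scope(true) reaches zero. Whether a given driver frees device memory at that exact moment is a separate question that this sketch cannot answer.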

Safely wrapping this functionality (i.e. automatic release of tmp in the memory_object class) would require a way to detect rvalueness, which is only available in the next C++ standard. This would allow something like:

device_data_ = clCreateBuffer(session_->get_context().id(), CL_MEM_READ_WRITE,
							   sizeof(cl_value_type) * global_work_size(), NULL, &error);

There may be a workaround, though, with move semantics.
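One possible shape for such a wrapper, assuming C++11 is available, is to adopt the creation reference instead of Retaining on construction, and to Retain only when the wrapper is copied. This is a sketch, not the memory_object class from the post: owned_mem and the mock_* functions are hypothetical names, and the mock functions stand in for clRetainMemObject and clReleaseMemObject so the example runs without an OpenCL installation.

```cpp
#include <cassert>
#include <utility>

// Stand-in for an OpenCL buffer handle; only the reference count is modelled.
struct mock_mem { int refs; };
inline void mock_retain(mock_mem* m) { ++m->refs; }   // stands in for clRetainMemObject
inline void mock_release(mock_mem* m) { --m->refs; }  // stands in for clReleaseMemObject

class owned_mem {
public:
    owned_mem() : m_(nullptr) {}
    // Adopts the reference that creation already handed out: no Retain here.
    explicit owned_mem(mock_mem* adopted) : m_(adopted) {}
    // Copying shares the object, so it takes an extra reference.
    owned_mem(const owned_mem& o) : m_(o.m_) { if (m_) mock_retain(m_); }
    // Moving transfers the reference and leaves the source empty.
    owned_mem(owned_mem&& o) noexcept : m_(o.m_) { o.m_ = nullptr; }
    // By-value parameter covers both copy and move assignment.
    owned_mem& operator=(owned_mem o) noexcept { std::swap(m_, o.m_); return *this; }
    ~owned_mem() { if (m_) mock_release(m_); }
    mock_mem* get() const { return m_; }
private:
    mock_mem* m_;
};

// Returns the reference count once every wrapper has gone out of scope.
inline int refs_after_adopt_copy_move() {
    mock_mem storage = {1};          // as if just returned by a create call
    {
        owned_mem a(&storage);       // adopt: count stays 1
        owned_mem b = a;             // copy: count 2
        owned_mem c = std::move(a);  // move: count stays 2, a is now empty
        assert(storage.refs == 2);
        assert(a.get() == nullptr);
    }                                // b and c each release once
    return storage.refs;
}
```

With the constructor marked explicit, the assignment would read device_data_ = owned_mem(clCreateBuffer(...)) in real host code; the temporary is moved into device_data_, so no separate Release of a tmp handle is needed.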

Best

Nasos

+1, this is very frustrating.

Problem well described here (including code for reproducing):

Bloerg – 404
“However on NVIDIA systems, a final clReleaseMemObject() will not free the memory segment in GPU memory, if not every other OpenCL object has been freed too.”

I fixed my memory leak by adding a single missing clReleaseProgram. All other objects were already released by the appropriate clRelease call, including the context and the buffers, but without this one clReleaseProgram my app was still leaking VRAM.

For me the problem exists on Windows (378 driver) and Linux (375 driver).

Has this bug been solved?

There appear to be multiple inquiries in this thread, and I am not sure they are all discussing the same thing, as essentially none of them include a complete example. I have looked at the linked article and built a test case around it. The claim there seems to be that a clReleaseContext should free all other context-oriented resources, whether they have been explicitly released or not. According to my testing, it is still the case that a clReleaseContext may not free other related resources (I used the cl_event implicitly created by the kernel enqueue for my test). I don’t actually know what correct behavior is or should be.

I have filed nvbug 3754877 to take a look at this. I’m not sure when any updates will be available.

In the meantime, my suggestion is to carefully free every resource you use. According to my testing, that works. Do not depend on clReleaseContext to free resources (other than the context itself) for you.
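One way to make "free every resource" hard to forget is to register a release action at the point where each object is created and run them all at scope exit. The sketch below is only an illustration under assumptions: release_stack and demo_release_order are hypothetical names, the lambdas just log a label instead of calling OpenCL, and releasing in reverse creation order is a convention chosen here, not something this thread establishes as required.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Collects release actions and runs them in reverse order of registration.
class release_stack {
public:
    release_stack() = default;
    release_stack(const release_stack&) = delete;
    release_stack& operator=(const release_stack&) = delete;
    ~release_stack() { release_all(); }

    void push(std::function<void()> release) { actions_.push_back(std::move(release)); }

    void release_all() {
        while (!actions_.empty()) {
            std::function<void()> f = std::move(actions_.back());
            actions_.pop_back();
            f();
        }
    }
private:
    std::vector<std::function<void()>> actions_;
};

// Registers one action per object kind in creation order and returns the
// order in which the actions actually ran.
inline std::vector<std::string> demo_release_order() {
    std::vector<std::string> log;
    const char* names[] = {"context", "queue", "program", "kernel", "buffer", "event"};
    {
        release_stack cleanup;
        for (const char* name : names) {
            // Real host code would call the matching clRelease* function here.
            cleanup.push([&log, name] { log.push_back(name); });
        }
    }   // cleanup's destructor runs every registered action
    return log;
}
```

In real host code each lambda would wrap the matching call (clReleaseEvent, clReleaseMemObject, clReleaseKernel, clReleaseProgram, clReleaseCommandQueue, clReleaseContext), so the event from the kernel enqueue and the program object mentioned earlier in the thread are not overlooked.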

I won’t be able to engage in discussions about correct behavior. I have filed a bug to have our development team look at it. They are much more knowledgeable about OpenCL than I am.

If there are other inquiries that for example suggest a clReleaseMemObject is not working (by itself) I have not investigated those, and I personally don’t find a clear description of that issue here in this thread, such that I can concoct a test case to look at.