clReleaseMemObject not working? The clReleaseMemObject function doesn't release the memory

Hey all, I’m just a student at my college. This is my LAST semester! Anyway, I’m writing simple adding and sorting programs where the kernel takes in A and B and adds their element pairs to get C. The only trouble is that C isn’t being released at all; in fact, A, B, and C are all staying in GPU memory. I call clReleaseMemObject, and I don’t know what other function to use to tell the device to free its memory.

So my question is: what do I need to know about clReleaseMemObject? I’ve read the OpenCL man pages, and I’m just lost here.

Don’t you call clRetainMemObject() on your memory objects somewhere? Otherwise I have no idea :-( Paste your host code.

I also have problems with clReleaseMemObject. If I create a few big buffers, release them, and then try to create and enqueue some new ones, I get CL_MEM_OBJECT_ALLOCATION_FAILURE. The same happens if I allocate and deallocate many small ones. My Retain/Release pairs look correct (I have this functionality wrapped in a shared-pointer class in C++), yet the allocation failure seems to happen once the accumulated size of all the buffers allocated so far reaches the GPU memory size.

I researched the issue further and found the following.

When clCreateBuffer is called to create a buffer object, the returned cl_mem object already has a reference count of 1 (see: Apple). So if a wrapper retains it again, the object will only be deallocated when the API (or whatever structure holds it) is torn down, which probably happens when the program finishes, not when one thinks the last reference has been released. To avoid this, one should probably explicitly call clReleaseMemObject on the pointer returned by clCreateBuffer once it has been wrapped, i.e. something like:

cl_mem tmp = clCreateBuffer(session_->get_context().id(), CL_MEM_READ_WRITE,
                            sizeof(cl_value_type) * global_work_size(), NULL, &error);

// memory_object is a wrapper around cl_mem with automatic Retain and Release calls. Works like a smart pointer.
memory_object device_data_(tmp); // reference count is now 2
clReleaseMemObject(tmp);         // drop the creation reference; device_data_ now holds the only one

// Now when device_data_ is destroyed, the memory held by the cl_mem object will be deallocated.

As long as tmp is not used again, this should be safe. The same probably holds for every object created by OpenCL (contexts, command queues, etc.). Can anybody think of a reason this would cause problems with some OpenCL implementations?
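The reference-count bookkeeping behind this can be checked without a GPU. The sketch below is a minimal C++ simulation, assuming a wrapper that retains in its constructor and releases in its destructor, as memory_object is described above. MockMem, mock_create_buffer, mock_retain, mock_release, RetainingHandle and wrap_and_destroy are hypothetical names that stand in for clCreateBuffer, clRetainMemObject and clReleaseMemObject; no OpenCL calls are made.

```cpp
// Hypothetical stand-in for a cl_mem: only a reference count and a "freed" flag.
struct MockMem {
    int refcount = 0;
    bool freed = false;
};

// Mirrors clCreateBuffer: a new object starts with a reference count of 1.
inline MockMem* mock_create_buffer(MockMem& storage) {
    storage.refcount = 1;
    storage.freed = false;
    return &storage;
}

// Mirrors clRetainMemObject.
inline void mock_retain(MockMem* m) { ++m->refcount; }

// Mirrors clReleaseMemObject: the memory is freed when the count reaches 0.
inline void mock_release(MockMem* m) {
    if (--m->refcount == 0) m->freed = true;
}

// Hypothetical memory_object-like wrapper: retains on construction,
// releases on destruction.
class RetainingHandle {
public:
    explicit RetainingHandle(MockMem* m) : m_(m) { mock_retain(m_); }
    ~RetainingHandle() { mock_release(m_); }
    RetainingHandle(const RetainingHandle&) = delete;
    RetainingHandle& operator=(const RetainingHandle&) = delete;

private:
    MockMem* m_;
};

// Creates a buffer, wraps it, optionally drops the creation reference,
// then lets the wrapper go out of scope.
inline void wrap_and_destroy(MockMem& storage, bool release_creation_reference) {
    MockMem* tmp = mock_create_buffer(storage);  // count: 1
    {
        RetainingHandle device_data(tmp);        // count: 2
        if (release_creation_reference) {
            mock_release(tmp);                   // count: 1; tmp must not be used again
        }
    }                                            // wrapper destructor: count drops by 1
}
```

Without the extra release the count is stuck at 1 after the wrapper is gone, which matches the leak described in this thread; with it, the count reaches 0 and the mock buffer is marked freed.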

Safely wrapping this functionality (i.e. having the memory_object class release tmp automatically) would require a way to detect rvalues, which is only available in the next C++ standard. This would allow something like:

device_data_ = clCreateBuffer(session_->get_context().id(), CL_MEM_READ_WRITE,
                              sizeof(cl_value_type) * global_work_size(), NULL, &error);

There may be a workaround using move semantics, though.
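One way to get that behaviour with C++11 move semantics is to make the wrapper adopt the creation reference instead of retaining it. The sketch below is a hypothetical illustration, not the memory_object class from this thread: AdoptingHandle, MockMem, mock_retain and mock_release are assumed names, and the mock functions stand in for clRetainMemObject and clReleaseMemObject so the code runs without OpenCL.

```cpp
#include <utility>

// Hypothetical stand-in for a cl_mem: a reference count and a "freed" flag.
struct MockMem {
    int refcount = 1;  // like clCreateBuffer, a new object starts at 1
    bool freed = false;
};

inline void mock_retain(MockMem* m) { ++m->refcount; }  // stands in for clRetainMemObject
inline void mock_release(MockMem* m) {                  // stands in for clReleaseMemObject
    if (--m->refcount == 0) m->freed = true;
}

// Owns exactly one reference. Constructing from a raw handle adopts the
// caller's reference (no retain), so a freshly created object can be
// passed in directly without a separate tmp variable.
class AdoptingHandle {
public:
    AdoptingHandle() = default;
    explicit AdoptingHandle(MockMem* m) : m_(m) {}

    // Copying shares the object, so the copy takes its own reference.
    AdoptingHandle(const AdoptingHandle& other) : m_(other.m_) {
        if (m_) mock_retain(m_);
    }

    // Moving transfers the reference; the source is left empty.
    AdoptingHandle(AdoptingHandle&& other) noexcept : m_(other.m_) {
        other.m_ = nullptr;
    }

    // Copy-and-swap: covers both copy and move assignment. The old
    // object ends up in `other` and is released when `other` dies.
    AdoptingHandle& operator=(AdoptingHandle other) noexcept {
        std::swap(m_, other.m_);
        return *this;
    }

    ~AdoptingHandle() {
        if (m_) mock_release(m_);
    }

    MockMem* get() const { return m_; }

private:
    MockMem* m_ = nullptr;
};
```

In real code the assignment would read along the lines of device_data_ = AdoptingHandle(clCreateBuffer(...)). The constructor is explicit on purpose: assigning a raw handle directly does not compile, so every transfer of ownership stays visible in the source.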



+1, this is very frustrating.

The problem is well described here (including code to reproduce it):
“However on NVIDIA systems, a final clReleaseMemObject() will not free the memory segment in GPU memory, if not every other OpenCL object has been freed too.”

I fixed my memory leak by adding a single missing clReleaseProgram call. All other objects, including the context and the buffers, were released with the appropriate clRelease* function, but without that one clReleaseProgram my app was still leaking VRAM.
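One pattern that makes a single forgotten release like this harder to miss is to register each OpenCL object together with its release call as soon as it is created, and run all of them in reverse creation order at teardown. Reverse order is a common convention here rather than something this thread establishes as required. The sketch below assumes a hypothetical ReleaseStack helper; in real host code the registered callbacks would call clReleaseKernel, clReleaseProgram, clReleaseMemObject, clReleaseCommandQueue and clReleaseContext, while the demo only records names so it runs without OpenCL.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical helper: collects release callbacks and runs them in reverse
// order of registration, so objects created later (kernels, programs) are
// released before the objects they were created from (queues, context).
class ReleaseStack {
public:
    ReleaseStack() = default;
    ReleaseStack(const ReleaseStack&) = delete;
    ReleaseStack& operator=(const ReleaseStack&) = delete;

    // Anything still registered is released when the stack goes out of scope.
    ~ReleaseStack() { release_all(); }

    // Register a release action right after the object is created.
    void push(std::function<void()> release) {
        actions_.push_back(std::move(release));
    }

    // Run every pending release action, newest first. Safe to call twice.
    void release_all() {
        while (!actions_.empty()) {
            std::function<void()> action = std::move(actions_.back());
            actions_.pop_back();
            action();
        }
    }

    std::size_t pending() const { return actions_.size(); }

private:
    std::vector<std::function<void()>> actions_;
};
```

For example, right after clCreateProgramWithSource succeeds one would push a lambda that calls clReleaseProgram on that program, so the program cannot be left out of the cleanup even if the rest of the teardown code changes.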

For me, the problem exists on Windows (378 driver) and Linux (375 driver).
