CL_OUT_OF_RESOURCES when calling clEnqueueReadBuffer

Hello guys,

I’m writing a program with two OpenCL kernels. Both of them run fine on their own, but eventually I want to run them in a loop many times. The output of kernel 1 is the input of kernel 2, so my program looks like this:

main()
{
	initialize kernel 1;
	initialize kernel 2;

	for (int i = 0; i < n; ++i)
	{
		initialize the input for kernel 1;
		execute kernel 1;
		read the result of kernel 1;

		initialize the input for kernel 2;
		execute kernel 2;
		read the result of kernel 2;
	}
}

However, my code only gets through the loop twice. On the third iteration I get a CL_OUT_OF_RESOURCES error when I read the result of kernel 2:

ciErr1 = clEnqueueReadBuffer(cqCommandQueue, cmDevNeighbors, CL_TRUE, 0, sizeof(cl_float) * iNumElements*60, neighbors, 0, NULL, NULL);

Here are the things I don’t quite understand:

First, according to the online OpenCL 1.0 specification (http://www.khronos.org/opencl/sdk/1.0/docs…tml/errors.html), clEnqueueReadBuffer should not return CL_OUT_OF_RESOURCES at all.

Second, my understanding of CL_OUT_OF_RESOURCES is that my kernel uses up all the registers. But why would a read-back function need registers?

Third, the kernels run perfectly twice and only fail on the third iteration. Both the kernel code and the size of the input arrays are fixed, so if the kernels can run once, the resources should be enough for every following execution. Why does it stop at the third iteration?

One thing I’m not sure about: I don’t release my cl_mem objects after each execution. Instead I reuse them by writing the new data into them. The size of the data is fixed, so the program looks like this:

cmDevBuffer=clCreateBuffer(cxGPUContext, CL_MEM_READ_ONLY, sizeof(cl_int) *hostBuffer.size(), NULL, &ciErr1);

ciErr1 = clSetKernelArg(Kernel1, 2, sizeof(cl_mem), (void*)&cmDevBuffer);

ciErr1 |= clEnqueueWriteBuffer(cqCommandQueue, cmDevBuffer, CL_TRUE, 0, sizeof(cl_int) * hostBuffer.size(), &(hostBuffer[0]), 0, NULL, NULL);

for (int i = 0; i < n; ++i)
{
	run kernel1;
	ciErr1 = clEnqueueWriteBuffer(cqCommandQueue, cmDevCellPointsBeginAndEnd, CL_TRUE, 0, sizeof(cl_int) * cellPointsBeginAndEnd.size(), &(cellPointsBeginAndEnd[0]), 0, NULL, NULL);
}

Should I release cmDevBuffer at the end of each loop iteration and create a new one at the beginning of the next? Like this:

for (int i = 0; i < n; ++i) {
	cmDevBuffer = clCreateBuffer(...);
	clSetKernelArg();
	clEnqueueWriteBuffer();
	call Kernel1;
	call Kernel2;
	clEnqueueReadBuffer();
	clReleaseMemObject(cmDevBuffer);
}

I didn’t do this because I thought it was unnecessary: the size of the buffer doesn’t change. But now I do have the out-of-resources problem.

I will try this option, but if it doesn’t work, I have no idea how to fix this.

Has anybody had the same problem before, or does anyone have an idea what the problem might be?

Thank you very much.

An interesting thing happened.

The code was running on my MacBook with an NVIDIA 9400M card.

Now I’ve switched to my desktop with an 8800 GTX, and the code runs perfectly. I looped it 50 times without a problem.

Could this be a driver bug? As far as I know, the notebook driver is still a demo version.

You see, coding on a graphics card is always a frustrating experience, because even when everything is correct, things don’t work.

You do not say what you mean by "run" or "call" kernel. You are queuing blocking reads, so perhaps the out-of-resources error is really occurring in the kernel call. Try inserting a clFinish() on the command queue before the reads to rule this out.
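To illustrate the reasoning: a kernel launch only enqueues work and returns right away, so a failure during execution can only be reported by the next call that waits for the queue. The block below is a toy model of that behaviour, not the OpenCL API; every name in it (ToyQueue, toy_enqueue_kernel, toy_finish, toy_blocking_read) is made up for the illustration. In real code the corresponding step would be calling clFinish(cqCommandQueue) and checking its return value before clEnqueueReadBuffer.

```c
/* Toy model only: these names are hypothetical and are NOT the OpenCL API. */
#define TOY_SUCCESS            0
#define TOY_OUT_OF_RESOURCES (-5)

typedef struct {
    int pending_error;  /* failure of an already enqueued command, not yet reported */
} ToyQueue;

/* Enqueuing a kernel returns immediately; a runtime failure is only recorded. */
int toy_enqueue_kernel(ToyQueue *q, int kernel_fails) {
    if (kernel_fails) {
        q->pending_error = TOY_OUT_OF_RESOURCES;
    }
    return TOY_SUCCESS;
}

/* Any call that waits for the queue reports whatever failed earlier,
   then clears the recorded failure. */
static int toy_drain(ToyQueue *q) {
    int err = q->pending_error;
    q->pending_error = TOY_SUCCESS;
    return err;
}

/* Explicit synchronization point. */
int toy_finish(ToyQueue *q) {
    return toy_drain(q);
}

/* A blocking read also has to wait for everything queued before it. */
int toy_blocking_read(ToyQueue *q) {
    return toy_drain(q);
}
```

In this model, a failing kernel followed directly by toy_blocking_read makes the read return the error. With toy_finish in between, the finish returns the error and the read succeeds, which is how the real clFinish would tell you that the kernel, not the read, is the culprit.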

I am queuing a set of 3 kernels followed by a blocking read in a loop of 7600 iterations. I never release my cl_mem until the end. I successfully run this on my MacBook 9400M & 9600M. It worked with both the July & Oct OpenCL frameworks.

I have the same problem: I get CL_OUT_OF_RESOURCES from clEnqueueReadBuffer on one specific buffer, even though the OpenCL specification does not define this error for that call.
The code runs smoothly on AMD OpenCL implementation.
Driver bug?

CL_OUT_OF_RESOURCES occurs in clEnqueueReadBuffer or in kernel calls when the program takes a long time to execute. This may result either in the graphics card being reset or in the error described above. I think the time allowed before a reset is shorter for NVIDIA GPUs than for AMD or Mac, and that is why you don’t see the problem when running on those platforms.
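If the cause really is a long-running launch (an assumption, not something confirmed in this thread), one common workaround is to split one big launch into several shorter ones. Below is a minimal sketch of the index arithmetic only; Chunk, chunk_count and chunk_at are hypothetical helper names. Since OpenCL 1.0 requires the global work offset of clEnqueueNDRangeKernel to be NULL, the offset would have to be passed to the kernel as an ordinary argument.

```c
#include <stddef.h>

/* Hypothetical helper type: one slice of a larger 1-D index range. */
typedef struct {
    size_t offset;  /* first global index covered by this launch */
    size_t count;   /* number of work-items in this launch */
} Chunk;

/* Number of launches needed to cover `total` items
   with at most `max_per_launch` items each. */
size_t chunk_count(size_t total, size_t max_per_launch) {
    if (max_per_launch == 0) {
        return 0;
    }
    return total / max_per_launch + (total % max_per_launch != 0);
}

/* The i-th slice; the last slice may be shorter than the others.
   An out-of-range i yields an empty slice. */
Chunk chunk_at(size_t total, size_t max_per_launch, size_t i) {
    Chunk c = { 0, 0 };
    if (i >= chunk_count(total, max_per_launch)) {
        return c;
    }
    c.offset = i * max_per_launch;
    size_t remaining = total - c.offset;
    c.count = remaining < max_per_launch ? remaining : max_per_launch;
    return c;
}
```

The host loop would then launch the kernel once per chunk, with count as the global size (possibly rounded up to a multiple of the work-group size, with a bounds check in the kernel) and offset added to the global id inside the kernel.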

see /usr/local/cuda/doc/OpenCL_Implementation_Notes.txt

I had similar problems with out-of-range accesses that, at least I assume, somehow damaged important contents of the graphics memory.
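One cheap way to rule this out is to validate index data on the host before uploading it. The sketch below assumes a layout that is not stated anywhere in this thread: an array such as cellPointsBeginAndEnd holding one (begin, end) pair per cell, with end being one past the last point. The function name first_bad_cell is hypothetical.

```c
#include <stddef.h>

/* Returns the index of the first cell whose range is invalid,
   or -1 if every pair is in range.
   Assumed layout (hypothetical): begin_end[2*c] is the first point of cell c,
   begin_end[2*c + 1] is one past its last point. */
long first_bad_cell(const int *begin_end, size_t num_cells, int num_points) {
    for (size_t c = 0; c < num_cells; ++c) {
        int begin = begin_end[2 * c];
        int end   = begin_end[2 * c + 1];
        /* A valid range satisfies 0 <= begin <= end <= num_points. */
        if (begin < 0 || end < begin || end > num_points) {
            return (long)c;
        }
    }
    return -1;
}
```

Calling this right before each clEnqueueWriteBuffer in the loop would show whether some iteration produces indices that make the kernel read or write outside its buffers.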

Returning CL_OUT_OF_RESOURCES somehow violates the specs in most cases, but it is at least documented.

Regards,

Markus