Hello guys,
I'm writing a program with two OpenCL kernels. Both of them run fine, but eventually I want to run them in a loop many times. The output of kernel 1 is the input of kernel 2, so my program looks like this:
main()
{
    initialize kernel 1;
    initialize kernel 2;
    for (int i = 0; i < n; ++i)
    {
        initialize the input for kernel 1;
        execute kernel 1;
        read the result of kernel 1;
        initialize the input for kernel 2;
        execute kernel 2;
        read the result of kernel 2;
    }
}
However, the loop only completes two iterations. On the third one I get a CL_OUT_OF_RESOURCES error when I read back the result of kernel 2:
ciErr1 = clEnqueueReadBuffer(cqCommandQueue, cmDevNeighbors, CL_TRUE, 0, sizeof(cl_float) * iNumElements*60, neighbors, 0, NULL, NULL);
Here are the things I don't quite understand:
First of all, according to this online specification of OpenCL 1.0, http://www.khronos.org/opencl/sdk/1.0/docs…tml/errors.html, CL_OUT_OF_RESOURCES should not be returned by the clEnqueueReadBuffer function.
Second, my understanding of the CL_OUT_OF_RESOURCES error is that my kernel program uses up all the registers. But why would a read-back function need registers?
Third, the kernels run perfectly twice and only fail on the third iteration. However, both the kernel program and the size of the input arrays are fixed. If the kernels can run once, the resources should be enough for all the following executions. Why did it stop on the third iteration?
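On the second point, one possible explanation (an assumption, nothing in this post confirms it) is that the error does not come from the read itself. Kernel launches are queued asynchronously, so a kernel that fails while executing can only be reported at a later call that waits on the queue, such as a blocking read. The sketch below is a toy model of that behaviour in plain C. It makes no OpenCL calls, and every name starting with toy_ or Toy is hypothetical.

```c
#include <stdio.h>

#define TOY_SUCCESS          0
#define TOY_OUT_OF_RESOURCES (-5)  /* same numeric value as CL_OUT_OF_RESOURCES */

/* Toy in-order queue: remembers the first failure of a queued command
   until the host next blocks on the queue. */
typedef struct {
    int pending_error;
} ToyQueue;

/* Non-blocking launch: always reports success right away. If the kernel
   is going to fail while running, the failure is only recorded. */
int toy_enqueue_kernel(ToyQueue *q, int kernel_will_fail)
{
    if (kernel_will_fail && q->pending_error == TOY_SUCCESS)
        q->pending_error = TOY_OUT_OF_RESOURCES;
    return TOY_SUCCESS;
}

/* Blocking wait: reports and clears whatever failure was recorded. */
int toy_finish(ToyQueue *q)
{
    int err = q->pending_error;
    q->pending_error = TOY_SUCCESS;
    return err;
}

/* Blocking read: has to wait for all earlier commands, so it is the
   first place where an earlier failure can become visible. */
int toy_blocking_read(ToyQueue *q)
{
    return toy_finish(q);
}
```

In the toy model, a failing launch followed directly by toy_blocking_read makes the read return the error, while calling toy_finish in between moves the error to the wait. In the real program, the corresponding experiment would be to call clFinish(cqCommandQueue) right after each kernel launch and check its return value, to see whether the error really belongs to the read or to a kernel that ran before it.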
One thing I'm not sure about, though, is that I don't release my cl_mem objects after each execution. Instead, I reuse them by writing the new data into them. The size of the data is fixed. So the program looks like this:
cmDevBuffer=clCreateBuffer(cxGPUContext, CL_MEM_READ_ONLY, sizeof(cl_int) *hostBuffer.size(), NULL, &ciErr1);
ciErr1 = clSetKernelArg(Kernel1, 2, sizeof(cl_mem), (void*)&cmDevBuffer);
ciErr1 |= clEnqueueWriteBuffer(cqCommandQueue, cmDevBuffer, CL_TRUE, 0, sizeof(cl_int) * hostBuffer.size(), &(hostBuffer[0]), 0, NULL, NULL);
for (int i = 0; i < n; ++i)
{
    run kernel1;
    ciErr1 = clEnqueueWriteBuffer(cqCommandQueue, cmDevCellPointsBeginAndEnd, CL_TRUE, 0, sizeof(cl_int) * cellPointsBeginAndEnd.size(), &(cellPointsBeginAndEnd[0]), 0, NULL, NULL);
}
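A side note on the error handling in the snippet above: combining return codes with `ciErr1 |= ...` can hide which call failed, because OpenCL error codes are negative integers and OR-ing two of them can produce a third, unrelated code. The sketch below illustrates this and shows a per-call check instead. It is only a sketch: the numeric values are copied from CL/cl.h so that it compiles without an OpenCL SDK, the SKETCH_ and sketch_ names are hypothetical, and the OR result assumes the usual two's complement integers.

```c
#include <stdio.h>

/* Numeric values copied from CL/cl.h; the SKETCH_ names are hypothetical
   and exist only so this compiles without an OpenCL SDK. */
enum {
    SKETCH_CL_SUCCESS               =   0,
    SKETCH_CL_OUT_OF_RESOURCES      =  -5,
    SKETCH_CL_INVALID_CONTEXT       = -34,
    SKETCH_CL_INVALID_COMMAND_QUEUE = -36,
    SKETCH_CL_INVALID_MEM_OBJECT    = -38
};

/* Translate the handful of codes above into readable names. */
const char *sketch_error_name(int err)
{
    switch (err) {
    case SKETCH_CL_SUCCESS:               return "CL_SUCCESS";
    case SKETCH_CL_OUT_OF_RESOURCES:      return "CL_OUT_OF_RESOURCES";
    case SKETCH_CL_INVALID_CONTEXT:       return "CL_INVALID_CONTEXT";
    case SKETCH_CL_INVALID_COMMAND_QUEUE: return "CL_INVALID_COMMAND_QUEUE";
    case SKETCH_CL_INVALID_MEM_OBJECT:    return "CL_INVALID_MEM_OBJECT";
    default:                              return "other error";
    }
}

/* What `err |= call()` does when two calls fail with different codes. */
int sketch_combine_with_or(int first, int second)
{
    return first | second;
}

/* Check one call at a time: returns 1 on success, 0 on failure, and
   prints which step failed together with the unmodified code. */
int sketch_check(int err, const char *step)
{
    if (err == SKETCH_CL_SUCCESS)
        return 1;
    fprintf(stderr, "%s failed: %s (%d)\n", step, sketch_error_name(err), err);
    return 0;
}
```

For example, sketch_combine_with_or(SKETCH_CL_INVALID_COMMAND_QUEUE, SKETCH_CL_INVALID_MEM_OBJECT) evaluates to -34, which sketch_error_name reports as CL_INVALID_CONTEXT, a code that neither call returned. Passing each return value to something like sketch_check right after the call keeps the original code and the failing step together.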
Should I release cmDevBuffer at the end of each loop iteration and create a new one at the beginning of the next? Like this:
for (int i = 0; i < n; ++i) {
    cmDevBuffer = clCreateBuffer(...);
    clSetKernelArg();
    clEnqueueWriteBuffer();
    call Kernel1;
    call Kernel2;
    clEnqueueReadBuffer();
    clReleaseMemObject(cmDevBuffer);
}
I didn't do this because I thought it was unnecessary, as the size of the buffer doesn't change. But now I do have the out-of-resources problem.
I will try this option, but if it doesn't work, I have no idea how to fix this.
Has anybody had the same problem before, or does anyone have an idea of what the problem might be?
Thank you very much.