CL_OUT_OF_RESOURCES on clEnqueueReadBuffer

Hey folks,

I’m currently developing a simple blur for a greyscale image.

Currently I’m getting CL_OUT_OF_RESOURCES on my first clEnqueueReadBuffer call.
After that call, every clEnqueueWriteBuffer and clEnqueueReadBuffer fails with the same error.

The kernel execution and the subsequent wait for completion both succeed.

At first I queried the maximum allocation size and used buffers that fit within it, but then the computer crashed. Now I’m using a much smaller size; the computer no longer crashes, but the program still fails.

Can anybody tell me how to track down this failure?



This is my kernel code:

__kernel void blur(
    __global ushort* input,
    __global ushort* output,
    __global float* output2,
    int myRadius,
    uint rowLength,
    uint dataLength)
{
    int i = get_global_id(0);
    int max_ushort = 65535; // largest value a ushort can hold (2^16 - 1)

    int x = i % rowLength;
    int y = i / rowLength;
    int height = dataLength / rowLength;
    int xtmp = 0;
    int ytmp = 0;
    float tmp = 0.0f; // sum (later average) of the surrounding pixels
    float divisor = (float)((myRadius*2 + 1) * (myRadius*2 + 1));

    // build the average over the (2*myRadius+1)^2 neighbourhood,
    // clamping coordinates at the image borders;
    // note the <= bounds so the window really is 2*myRadius+1 wide,
    // matching the divisor
    for (int j = -myRadius; j <= myRadius; j++)
    {
        ytmp = y + j;
        if (ytmp < 0)
            ytmp = 0;
        if (ytmp > height - 1)
            ytmp = height - 1;

        for (int k = -myRadius; k <= myRadius; k++)
        {
            xtmp = x + k;
            if (xtmp < 0)
                xtmp = 0;
            if (xtmp > (int)rowLength - 1)
                xtmp = rowLength - 1;

            tmp = tmp + (float)(input[xtmp + rowLength*ytmp]);
        }
    }

    tmp = tmp / divisor;
    if ((int)tmp > max_ushort)
        tmp = (float)max_ushort;
    if ((int)tmp < 0)
        tmp = 0.0f;

    ushort tmpResult = (ushort)tmp;
    //output[i] = tmpResult;
    output2[i] = (float)tmpResult;
    output[i] = input[i];
}

The interesting thing is: the failure only occurs when I write tmp back in any way.



CL_OUT_OF_RESOURCES with kernels is often caused by memory accesses outside of allocated buffers. Make sure that your buffers are as large as you believe they are, and that you read and write exactly where you intend.

I already checked it many times.

And everything works fine when I write input to output.

But when I use tmp instead of input, the CL_OUT_OF_RESOURCES occurs.

Indeed, whenever I write tmp back to global memory in any way, the program breaks.



Not writing tmp means that the compiler will optimize away everything whose only purpose is to compute tmp, which in your case appears to be the nested for-loops. The only memory access in there that I can find is

tmp = tmp + (float)(input[xtmp + rowLength*ytmp]);

If the CL_OUT_OF_RESOURCES is due to a bad memory access, then changing the above line to

tmp = tmp + (float)(0);

should remove the error. If not, then I’m wrong and we will have to look for something else.

I changed the line to

tmp = tmp + (float)(1);

but the same error occurs…

Could you zip up both the host code and the kernel code? I’ll have a look over here.

I’ve found the problem:

It seems that “tmp” is already used in the OpenCL internals.

I changed the name to “result” and everything works fine now.

EDIT: On the computer at my company (MacOS) everything works fine now.

  At home: still the same problem.

Here is my project.

You’ll need QtOpenCL for this. (14.4 KB)

Hey folks,

I’ve now tried a version with uint instead of ushort, and it runs perfectly.

The big question is: WHY??


Hey all,

can anybody please give feedback?
1st: Is nobody able to reproduce the failure?
2nd: Does anybody know whether the failure might be caused by the work sizes?


Hey all,

I found the cause in the DevDriver release notes:
when a kernel takes too long to execute, the graphics card crashes or the familiar CL_OUT_OF_RESOURCES error occurs.


Hi Henrik,

I think I suffer from the same problem. My question is: assuming the timeout is the cause of the problem, how did you resolve it in your case?


Hi i+d,

I got rid of that bug by reducing the amount of data processed in one kernel call.

Other possibilities would be:

  • simplify the kernel as much as possible

  • split your kernel into several smaller kernels to reduce the work done per kernel call

I hope that helps.