clEnqueueNDRangeKernel crashes with CL_OUT_OF_RESOURCES

Hello all,
I am writing a simple OpenCL application.
It is a brute-force ray-triangle intersection code: I pass all rays and all triangles to the GPU, and it returns the result.
If I use __global variables for the kernel parameters, the kernel works fine.

Now I am trying to optimise the kernel. I changed the triangle vertex parameter from __global to __constant, and now clEnqueueNDRangeKernel returns CL_OUT_OF_RESOURCES.
I know that constant memory is 64 KB. I am passing only 64 triangles, i.e. 64*3*3*4 = 2304 bytes (64 triangles x 3 vertices x 3 floats x 4 bytes) of constant data.

Another thing: if I don't write the result back to the __global result buffer, it works fine.

I hope I have described the problem clearly.

Any and all help is appreciated.

Thanks