CL_OUT_OF_RESOURCES: in what situations can it occur on a ReadBuffer call?

Sometimes my app prints errors like this:
ERROR: ReadBuffer(gpu_result_flag,pulse):-5
where -5 means CL_OUT_OF_RESOURCES.
The gpu_result_flag buffer is sizeof(float4) bytes.
The app runs without such errors on an ATI HD4870 GPU, but fails on a GSO9600.

I get no -4 (CL_MEM_OBJECT_ALLOCATION_FAILURE) when creating the buffers, so at least at allocation time everything is OK. Why does it fail later?
Which resource is unavailable?
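
For readers decoding such logs, here is a minimal sketch of a lookup from the numeric status to its symbolic name. The numeric values are the ones defined in the Khronos cl.h header; the function names cl_error_name and report_cl_error are hypothetical helpers made up for this illustration, not part of OpenCL, and the table covers only the first few codes.

```c
#include <stdio.h>

/* Map a few common OpenCL status codes to their symbolic names.
   Only a subset is listed; anything else falls through to "unknown". */
static const char *cl_error_name(int code)
{
    switch (code) {
    case 0:  return "CL_SUCCESS";
    case -1: return "CL_DEVICE_NOT_FOUND";
    case -2: return "CL_DEVICE_NOT_AVAILABLE";
    case -3: return "CL_COMPILER_NOT_AVAILABLE";
    case -4: return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
    case -5: return "CL_OUT_OF_RESOURCES";
    case -6: return "CL_OUT_OF_HOST_MEMORY";
    default: return "unknown OpenCL error";
    }
}

/* Hypothetical helper: log a failed call with both number and name.
   Returns 1 if the status was an error, 0 on CL_SUCCESS. */
static int report_cl_error(const char *what, int code)
{
    if (code != 0) {
        fprintf(stderr, "ERROR: %s:%d (%s)\n", what, code, cl_error_name(code));
        return 1;
    }
    return 0;
}
```

Passing the status returned by the read call to report_cl_error would then log the name next to the number, which makes it easier to tell -4 and -5 apart at a glance.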

It seems that the NVIDIA OpenCL drivers don't actually allocate memory on the card when you call clCreateBuffer(); the memory is allocated the first time you attempt to use the memory object. I observed this behavior with a program that only called clCreateBuffer repeatedly, written to test my code in low GPU RAM conditions: NVIDIA's OpenCL drivers let me create over 6GB worth of buffers on a 1.5GB card. Other than that, I have found that any out-of-bounds memory access on the GPU can lead to CL_OUT_OF_RESOURCES errors in seemingly unrelated places.
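
One cheap way to rule out the out-of-bounds case is a host-side sanity check before each launch. The sketch below assumes the simplest access pattern, where each work-item touches exactly one element at the index given by its global ID; real kernels may index differently. The function name launch_fits_buffer is hypothetical and not part of OpenCL.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if work_items elements of bytes_per_item bytes each fit
   inside a buffer of buffer_bytes bytes, 0 otherwise.
   Assumes one element per work-item, indexed by global ID. */
static int launch_fits_buffer(size_t work_items, size_t bytes_per_item,
                              size_t buffer_bytes)
{
    if (bytes_per_item == 0)
        return 1; /* nothing is accessed */
    if (work_items > SIZE_MAX / bytes_per_item)
        return 0; /* the product would overflow size_t */
    return work_items * bytes_per_item <= buffer_bytes;
}
```

With a buffer of sizeof(float4) bytes, for example, this check would accept a single work-item writing one float4 and reject a launch of two.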

Thanks.

About my issue: I found the reason. It is not connected to memory allocation; the kernel simply exceeded the maximum kernel launch time. I think under CUDA it would be reported as -1, but under OpenCL an overly long kernel execution seems to result in -5 errors. When I run the app with the longest kernel call split into several shorter ones (with the same memory consumption), I no longer see these errors.

I use Windows Server 2003 x64 for testing. It has no driver restart feature like Vista/7, so the kernel call fails in this obscure way.
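
For anyone who wants to try the same workaround, here is a minimal sketch of the host-side arithmetic for splitting one long launch into shorter ones. It is not the code from my app: the chunk_t type and the split_range function are hypothetical names for illustration. The assumption is that each (offset, count) pair is then used for one clEnqueueNDRangeKernel call followed by clFinish, with the offset passed either as global_work_offset (which must be NULL on OpenCL 1.0) or as an extra kernel argument.

```c
#include <stddef.h>

/* One sub-launch: work-items [offset, offset + count). */
typedef struct {
    size_t offset;
    size_t count;
} chunk_t;

/* Split the range [0, total) into pieces of at most chunk_size items.
   Writes the pieces to out[] and returns how many were written.
   Returns 0 if chunk_size is 0, if total is 0, or if more than
   max_chunks pieces would be needed. */
static size_t split_range(size_t total, size_t chunk_size,
                          chunk_t *out, size_t max_chunks)
{
    size_t n = 0;
    size_t offset = 0;
    size_t remaining;

    if (chunk_size == 0)
        return 0;
    while (offset < total) {
        if (n == max_chunks)
            return 0; /* out[] is too small */
        remaining = total - offset;
        out[n].offset = offset;
        out[n].count = remaining < chunk_size ? remaining : chunk_size;
        offset += out[n].count;
        n++;
    }
    return n;
}
```

If an explicit local work size is used, chunk_size should probably be a multiple of it, and the last (shorter) chunk may need padding or a guard in the kernel. Memory consumption is unchanged, since only the number of work-items per launch shrinks.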
