I'm writing a bunch of tools using OpenCL, and I'm having the same problem with all of my kernels: seemingly random out-of-resources errors!
My card has 128 MB of allocatable memory. The buffers I use are usually pretty small, but the datasets are pretty large, so I run a kernel that uses a little buffer data, some constants and some private variables a lot of times.
There seems to be a critical combination of buffer size (not even close to 128 MB, though) and the number of iterations done in a single kernel. It's not so much that it doesn't work as I expect it to; it's more that I don't understand it.
Take a look at this test kernel:
__kernel void test() {
    float count = 10000000.0f;                   /* the iteration depth */
    for (float i = 0.0f; i < count; i += 1.0f);  /* no buffers, just spin */
}
A count of 10M works; a count of 100M crashes. I would have expected that even count = MAXFLOAT would be possible.
It doesn't even use any buffers, but increasing the iteration depth will at some point result in "out of resources". If I enable the unrolling extension, the behaviour changes somewhat, but at some point it is still out of resources. The more buffers I use and the deeper the iteration, the sooner the error occurs.
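For what it's worth, this is roughly how I observe the failure from the host side; a minimal sketch, assuming a working context, a command queue and the compiled test kernel already exist (the names queue and kernel here are just placeholders):

#include <stdio.h>
#include <CL/cl.h>

/* enqueue the buffer-less test kernel once and report the error code;
   queue and kernel are assumed to be created elsewhere */
cl_int run_test(cl_command_queue queue, cl_kernel kernel)
{
    size_t global_work_size = 1;
    cl_int err = clEnqueueNDRangeKernel(queue, kernel, 1, NULL,
                                        &global_work_size, NULL, 0, NULL, NULL);
    if (err == CL_SUCCESS)
        err = clFinish(queue);   /* the error often only shows up here */
    if (err == CL_OUT_OF_RESOURCES)
        printf("CL_OUT_OF_RESOURCES (%d)\n", err);
    return err;
}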
I want to be able to make sure this doesn't happen, preferably by knowing why it happens and by setting my iteration depths and buffer sizes according to limits I can calculate from my GPU's stats.
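For reference, the kind of device stats I mean can be queried like this; a sketch, assuming a cl_device_id already obtained via clGetDeviceIDs:

#include <stdio.h>
#include <CL/cl.h>

/* print the memory limits relevant for sizing buffers */
void print_mem_limits(cl_device_id device)
{
    cl_ulong global_mem, max_alloc, local_mem;
    clGetDeviceInfo(device, CL_DEVICE_GLOBAL_MEM_SIZE,
                    sizeof(global_mem), &global_mem, NULL);
    clGetDeviceInfo(device, CL_DEVICE_MAX_MEM_ALLOC_SIZE,
                    sizeof(max_alloc), &max_alloc, NULL);
    clGetDeviceInfo(device, CL_DEVICE_LOCAL_MEM_SIZE,
                    sizeof(local_mem), &local_mem, NULL);
    printf("global mem: %llu bytes\n", (unsigned long long)global_mem);
    printf("max single allocation: %llu bytes\n", (unsigned long long)max_alloc);
    printf("local mem per work-group: %llu bytes\n", (unsigned long long)local_mem);
}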
What I imagine the problem to be right now is something like this: iterations get unrolled at some point, and if an iteration is too deep, the memory it needs grows too large.
Could somebody please enlighten me as to what is going on here? How can I predict this behaviour?
Just to round this one off: my problem was the runtime limit that is active on my NVIDIA GPU when a graphical UI is running on it alongside my OpenCL code. Shut down X and all is well! You can "predict" this by checking your device for CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV.
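A rough sketch of that check: CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV comes from NVIDIA's cl_nv_device_attribute_query extension, so if your headers don't define it you need the 0x4005 value from that extension, and device is assumed to be your NVIDIA device id:

#include <stdio.h>
#include <CL/cl.h>

#ifndef CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV
#define CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV 0x4005  /* cl_nv_device_attribute_query */
#endif

/* report whether the driver's watchdog timeout applies to this device */
void check_watchdog(cl_device_id device)
{
    cl_bool timeout = CL_FALSE;
    cl_int err = clGetDeviceInfo(device, CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV,
                                 sizeof(timeout), &timeout, NULL);
    if (err == CL_SUCCESS && timeout)
        printf("watchdog active: long-running kernels will be killed\n");
}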