Kernel fails, no errors or explanation. Smaller kernel runs fine

I hate coming to the forums just to ask a question, but I’m stumped.

When I run a small kernel, it compiles and executes fine, returning the right answer. When I increase the kernel size to 50 lines, all the result buffers are full of zeros.

The PTX code looks fine. It’s about 760 lines (instructions).

Is there a limit on kernel size? Is there some way to check if the kernel was actually executed or if it crashed?


You may have forgotten to register a context error callback. Pass a callback function (the pfn_notify argument) to clCreateContext and see if it gets called.
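In case it helps, here is a minimal sketch of such a callback. The `error_log` struct and the name `on_context_error` are made up for illustration; only the four-argument signature comes from the OpenCL API. The fallback `CL_CALLBACK` define is there only so the snippet compiles without the OpenCL headers; in real code, include `<CL/cl.h>` and drop it.

```c
#include <stdio.h>
#include <string.h>

/* CL_CALLBACK normally comes from <CL/cl.h>. This empty fallback is an
   assumption that lets the sketch compile on its own. */
#ifndef CL_CALLBACK
#define CL_CALLBACK
#endif

/* Hypothetical holder for the most recent message from the runtime. */
typedef struct {
    int  count;      /* how many times the callback fired */
    char last[256];  /* copy of the latest error string   */
} error_log;

/* Matches the pfn_notify signature that clCreateContext expects. */
void CL_CALLBACK on_context_error(const char *errinfo,
                                  const void *private_info,
                                  size_t cb,
                                  void *user_data)
{
    error_log *log = (error_log *)user_data;
    (void)private_info;  /* implementation-specific binary data, unused here */
    (void)cb;            /* size of private_info in bytes, unused here */

    fprintf(stderr, "OpenCL context error: %s\n", errinfo);
    if (log != NULL) {
        log->count++;
        strncpy(log->last, errinfo, sizeof log->last - 1);
        log->last[sizeof log->last - 1] = '\0';
    }
}

/* With the real header included, registration would look roughly like:
 *
 *   error_log log = {0, ""};
 *   cl_int err;
 *   cl_context ctx = clCreateContext(NULL, 1, &device,
 *                                    on_context_error, &log, &err);
 */
```

The runtime may call this from another thread, so anything beyond printing should be thread-safe. It is also worth checking the return value of every clEnqueue* call and of clFinish, since some failures only show up there.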

You are fantastic! That was exactly it. I’m getting a CL_OUT_OF_RESOURCES error executing CL_COMMAND_NDRANGE_KERNEL when I increase the size of my kernel past a certain point.

Is there a limit on the size of a kernel? My card should have 65k of constant memory, and instructions are loaded in constant memory. The assembly in TEXT format takes up only 9k.

Thanks so much for helping me find that error. I’ve been stuck on it for a while and I was completely perplexed as to what was going wrong.

Problem solved!

The registers on the device are divided up per thread, allowing immediate and free context switching without loading anything onto or off of a stack. I forgot about this and launched work-groups with more threads (work-items) than CL_KERNEL_WORK_GROUP_SIZE allows for this kernel, meaning that the number of threads per work-group * the number of registers per thread was greater than the number of registers available.

If you run into the error "CL_OUT_OF_RESOURCES error executing CL_COMMAND_NDRANGE_KERNEL", query the maximum work-group size for your kernel using clGetKernelWorkGroupInfo() with CL_KERNEL_WORK_GROUP_SIZE, and make sure that the local work size going into clEnqueueNDRangeKernel() is less than or equal to it!
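For anyone who lands here later, a rough sketch of that check for a 1-D launch. The helper names `clamp_local_size` and `round_up_global_size` are my own, not part of OpenCL, and `kernel_max` is assumed to be the value returned by the query shown in the comment.

```c
#include <stddef.h>

/* Pick a local size no larger than the kernel's limit.
 * kernel_max is assumed to come from:
 *   clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_WORK_GROUP_SIZE,
 *                            sizeof(size_t), &kernel_max, NULL);
 */
size_t clamp_local_size(size_t requested, size_t kernel_max)
{
    return requested < kernel_max ? requested : kernel_max;
}

/* Round the global size up to a multiple of the local size, because
 * OpenCL 1.x requires the global size to be evenly divisible by it.
 * A local size of 0 is treated as "no rounding". */
size_t round_up_global_size(size_t work_items, size_t local)
{
    size_t rem;
    if (local == 0) {
        return work_items;
    }
    rem = work_items % local;
    return rem == 0 ? work_items : work_items + (local - rem);
}

/* Intended use on the host side (1-D launch):
 *
 *   size_t local  = clamp_local_size(512, kernel_max);
 *   size_t global = round_up_global_size(n, local);
 *   clEnqueueNDRangeKernel(queue, kernel, 1, NULL,
 *                          &global, &local, 0, NULL, NULL);
 */
```

Because the global size may be padded, the kernel should return early for work-items whose get_global_id(0) is past the real element count. Note that CL_KERNEL_WORK_GROUP_SIZE is per kernel, so a larger kernel that uses more registers can report a smaller limit than a small one on the same card.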