How to debug OpenCL


Is there a way to debug OpenCL? I have a large OpenCL kernel, a complex algorithm that we have translated from C++, but I can find no way to debug it.


Use the AMD SDK to build your code for the CPU device, and then use printf() calls in your kernel to debug. Alternatively, define some auxiliary values/arrays to pass to your kernel, then start by executing only the first few statements of the kernel (comment out the rest of the code), write the results into these values/arrays, copy them back to the host, print them out, and check that they are OK. If so, uncomment another bit of the kernel code and do the same. Proceed this way until you find the point where the results differ from what you would expect, then look into the corresponding piece of kernel code and try to find the error.
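A minimal sketch of the debug-buffer approach described above. The kernel name scale_add, the dbg argument, and the two "stages" are made up for illustration; the kernel is shown as the source string a host program would hand to clCreateProgramWithSource. printf() is built into OpenCL C 1.2; on older AMD runtimes it may need the cl_amd_printf extension enabled, so treat that line as an assumption about your platform.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical kernel source. Only stage 1 is active; stage 2 stays
 * commented out until the values in dbg look right. */
const char *const kernel_src =
    "__kernel void scale_add(__global const float *in,\n"
    "                        __global float *out,\n"
    "                        __global float *dbg)\n"
    "{\n"
    "    size_t gid = get_global_id(0);\n"
    "    float a = in[gid] * 2.0f;            // stage 1\n"
    "    dbg[gid] = a;                        // record the intermediate value\n"
    "    if (gid == 0) printf(\"stage 1: a = %f\\n\", a);\n"
    "    out[gid] = a;                        // placeholder result for now\n"
    "    // out[gid] = a + in[gid] * in[gid]; // stage 2: enable once stage 1 is OK\n"
    "}\n";

/* Print up to `limit` entries of the debug buffer after it has been read
 * back to the host (e.g. with clEnqueueReadBuffer). Returns the number of
 * entries printed. */
size_t dump_debug(const float *dbg, size_t n, size_t limit)
{
    size_t count = n < limit ? n : limit;
    for (size_t i = 0; i < count; ++i)
        printf("dbg[%zu] = %g\n", i, dbg[i]);
    return count;
}
```

Once the printed values match what you expect for stage 1, enable the next stage in the kernel and repeat.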

I agree it’s sort of a return to the days before the interactive debugger, which lets you set a breakpoint on a line and examine everything with ever more sophisticated IDEs. In the parallel world, stopping on a line does not even really make sense. What line of which work-item? With things like reductions, the picture does not always come together until you look at the final memory buffer.

If possible, write a version of your process in the host language you are using (in my case Java), then compare outputs. This has worked well for me. I debug the Java version. I compare the OpenCL output to the Java output for exactness, and use the differences as hints about what might be wrong. It is a lot of infrastructure to put in place, but in my case I have almost 0% confidence in the output without this. Producing reasonable output is not the same thing as producing correct output.
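The comparison step might look roughly like the sketch below, written in C rather than Java. The function name compare_buffers and the tolerance parameter are assumptions for illustration; pass a tolerance of 0.0f to demand exact agreement between the host reference and the buffer read back from the device.

```c
#include <stddef.h>

/* Count the elements where the device result differs from the host
 * reference by more than tol. A NaN difference (including inf minus inf)
 * always counts as a mismatch. If first_bad is non-NULL and at least one
 * mismatch is found, it receives the index of the first one. */
size_t compare_buffers(const float *ref, const float *dev, size_t n,
                       float tol, size_t *first_bad)
{
    size_t mismatches = 0;
    for (size_t i = 0; i < n; ++i) {
        float d = ref[i] - dev[i];
        if (d < 0.0f)
            d = -d;                    /* absolute difference */
        if (!(d <= tol)) {             /* also true when d is NaN */
            if (mismatches == 0 && first_bad != NULL)
                *first_bad = i;
            ++mismatches;
        }
    }
    return mismatches;
}
```

The index of the first mismatch is often the most useful hint, since it tells you which work-item to inspect first.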

This also produces an artifact that is of some use. The execution time of the Java version helps define my expectations of what OpenCL should do.

I take this approach a step further. Since you’re allowed to have any number of __kernel functions in your program, I write unit-test kernels. That way every component can be tested individually. You could even come up with a framework that loads every kernel (clCreateKernelsInProgram) and matches each kernel name with a launcher and a set of assertions.
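A rough sketch of the name-matching part of such a framework, kept free of OpenCL headers so it compiles on its own. The kernel names test_add and test_reduce, the launcher functions, and the use of void * in place of cl_kernel are all assumptions for illustration. In real host code the names would come from clGetKernelInfo with CL_KERNEL_FUNCTION_NAME for each kernel returned by clCreateKernelsInProgram.

```c
#include <stddef.h>
#include <string.h>

/* A launcher sets the arguments, enqueues the kernel, reads back the
 * result and runs its assertions. Returns 0 on pass, nonzero on failure.
 * void * stands in for cl_kernel in this sketch. */
typedef int (*launcher_fn)(void *kernel);

/* Hypothetical launchers, stubbed out here. */
int launch_test_add(void *kernel)    { (void)kernel; return 0; }
int launch_test_reduce(void *kernel) { (void)kernel; return 0; }

struct kernel_test {
    const char *kernel_name;   /* must match the __kernel function name */
    launcher_fn launch;
};

const struct kernel_test kernel_tests[] = {
    { "test_add",    launch_test_add    },
    { "test_reduce", launch_test_reduce },
};

/* Return the launcher registered for kernel_name, or NULL if none is. */
launcher_fn find_launcher(const char *kernel_name)
{
    size_t n = sizeof kernel_tests / sizeof kernel_tests[0];
    for (size_t i = 0; i < n; ++i)
        if (strcmp(kernel_tests[i].kernel_name, kernel_name) == 0)
            return kernel_tests[i].launch;
    return NULL;
}
```

A kernel with no registered launcher returns NULL, which the test runner can report as an untested kernel.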

“Debugging sucks, testing rocks!”