How to get kernel failure instruction

Hi, I am currently developing a quicksort implementation in OpenCL. In order to do the sorting quickly on the GPU architecture, the kernel is rather complex and, surprisingly, I have a bug (or rather multiple bugs) there. It’s probably an out-of-bounds access to some array, but the only thing I get from the failing kernel is this, when I call clFinish after enqueuing the kernel:

CL_INVALID_COMMAND_QUEUE error executing clFinish on GeForce GTX 580 (Device 0).

Well, that’s not much information. I’d prefer at least the PTX instruction address where this error happens (I understand that requesting the line in the C code is rather too much). Is it somehow possible to obtain this on Linux with some version of the drivers? Or do you know any other method to obtain more info?

Usually, when the kernel just behaves incorrectly (but does not fail), I can tunnel some info out through global memory, but after this failure the queue is damaged and I cannot do anything more on it.

So far, the best approach I have developed is to guard all array accesses with C-macro runtime checks in the “debug” version. A failing check skips the access and pushes debug info (the __LINE__ macro works :-) ) to global memory, which can be read back as long as the kernel does not fail (it just gives incorrect results). It’s a pity these checks must be written manually; it would be cool to hack the compiler to generate them automatically.
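For readers who want a starting point, here is a minimal sketch of such guarded accesses, written as plain C so it can be tried on the host. It is not the macro used in my kernel: the names debug_info, check_index, GUARDED_LOAD and GUARDED_STORE are hypothetical, and in a real OpenCL kernel the debug record would have to live in a __global buffer passed as an extra kernel argument.

```c
/* Hypothetical debug record. In an OpenCL kernel this would be a
   __global buffer that the host reads back after clFinish. */
typedef struct {
    int  failed;  /* nonzero once any guarded access was out of bounds */
    int  line;    /* __LINE__ of the first failing access */
    long index;   /* offending index */
    long size;    /* size of the array that was violated */
} debug_info;

/* Returns 1 if idx is in [0, size); otherwise records the first
   failure in *dbg and returns 0 so the caller can skip the access. */
static int check_index(debug_info *dbg, long idx, long size, int line)
{
    if (idx >= 0 && idx < size)
        return 1;
    if (!dbg->failed) {  /* keep only the first failure */
        dbg->failed = 1;
        dbg->line   = line;
        dbg->index  = idx;
        dbg->size   = size;
    }
    return 0;
}

/* Guarded read: yields `fallback` instead of touching memory out of
   bounds. Note that idx is evaluated twice, so it should have no
   side effects. */
#define GUARDED_LOAD(dbg, arr, size, idx, fallback) \
    (check_index((dbg), (long)(idx), (long)(size), __LINE__) \
        ? (arr)[(idx)] : (fallback))

/* Guarded write: the store is skipped when the index is out of bounds. */
#define GUARDED_STORE(dbg, arr, size, idx, value) \
    do { \
        if (check_index((dbg), (long)(idx), (long)(size), __LINE__)) \
            (arr)[(idx)] = (value); \
    } while (0)
```

A release build could redefine both macros as plain array accesses. On the GPU, many work-items may hit a failing check at once, so the "first failure" update above would race. I assume an atomic operation on the failed flag (for example atomic_cmpxchg) would be needed there, but the sketch does not show that.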

Awesome solution!
Can you please post the macro?

http://choorucode.com/2011/03/02/cuda-error-checking/