Hi, I am currently developing a quicksort implementation in OpenCL. To sort quickly on the GPU architecture, the kernel is rather complex and, surprisingly, I have a bug (or rather multiple bugs) in it. It’s probably an out-of-bounds access to some array, but the only thing I get from the failing kernel is this error when I call clFinish after enqueuing the kernel:
CL_INVALID_COMMAND_QUEUE error executing clFinish on GeForce GTX 580 (Device 0).
Well, that’s not much information. I’d prefer at least the PTX instruction address where the error happens (I understand that asking for the line in the C code is probably too much). Is it somehow possible to obtain this on Linux with some version of the drivers? Or do you know of any other method to obtain more information?
Usually, when a kernel just behaves incorrectly (but does not fail), I can tunnel some information out through global memory, but after this failure the queue is damaged and I cannot do anything more with it.
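To illustrate the tunnelling idea, here is a minimal sketch of how it could be extended so that a bad access is skipped and recorded instead of being performed, which should leave the kernel running to completion and the queue usable for reading the record back. It is written as plain C so it can be tried on the CPU. The names `debug_record_t`, `checked_load`, `checked_store` and the `site` IDs are hypothetical, not part of any API. In a real OpenCL kernel I assume the record would live in a small `__global` buffer and the first-writer check would need `atomic_cmpxchg`, since many work-items may hit the guard at once.

```c
#include <stddef.h>

/* Hypothetical debug record; in a kernel this would be a small __global buffer
   that the host zeroes before the launch and reads back afterwards. */
typedef struct {
    int    tripped;  /* 0 = no bad access seen, 1 = first bad access recorded */
    int    site;     /* caller-chosen ID of the access site in the source */
    size_t index;    /* offending index */
    size_t length;   /* length of the array that was being accessed */
} debug_record_t;

/* Record only the first violation. This plain check is not thread-safe;
   in OpenCL C it is assumed to be replaced by atomic_cmpxchg on the flag. */
static void record_violation(debug_record_t *dbg, int site,
                             size_t index, size_t length)
{
    if (!dbg->tripped) {
        dbg->tripped = 1;
        dbg->site    = site;
        dbg->index   = index;
        dbg->length  = length;
    }
}

/* Guarded load: returns data[index] when in range; otherwise records the
   violation and returns `fallback` without touching memory. */
static int checked_load(const int *data, size_t length, size_t index,
                        int site, int fallback, debug_record_t *dbg)
{
    if (index < length)
        return data[index];
    record_violation(dbg, site, index, length);
    return fallback;
}

/* Guarded store: writes data[index] when in range; otherwise records the
   violation and drops the write. */
static void checked_store(int *data, size_t length, size_t index, int value,
                          int site, debug_record_t *dbg)
{
    if (index < length)
        data[index] = value;
    else
        record_violation(dbg, site, index, length);
}
```

Each suspicious array access in the kernel would go through one of these helpers with a distinct `site` number, so the record read back after the run points at the first access that went out of range. The guards cost extra branches, so this would only be a debug build of the kernel.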