Raising an exception in Kernel


I have built a custom error object that I populate on the device with an error message, filename, line number, etc. when an error is detected in a kernel routine. The idea is that the host can inspect it later to provide clear error reporting to the user.

However, I find that if I try to abort the kernel via something like:


or something less graceful like:

// deliberately write through an invalid pointer to kill the kernel
int *hats = (int *)0xffffffff;
*hats = 12;

then any memory I allocated on the device is destroyed, and my device pointers are invalid once control returns to the host. Is there a way to abort the kernel without this happening, or should I avoid aborting and instead have a global flag that tells the other threads to finish up and return as soon as possible?

Many thanks

You’ll need a global flag and a graceful exit. asm("trap") and the more polished assert(0) will both result in a corrupted CUDA context.

There is currently no abort available that does not result in a corrupted CUDA context.
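For illustration, here is a minimal sketch of the global-flag approach, under some assumptions: the KernelError struct, report_error, safe_sqrt and run_demo are hypothetical names made up for this example, and the kernel contains no __syncthreads(), so returning early is safe. The first thread to detect a problem claims the record with atomicCAS; every other thread checks the flag and returns instead of aborting.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical error record; field names are illustrative only.
struct KernelError {
    int code;    // 0 means "no error"
    int line;    // source line where the error was detected
    int block;   // block index of the reporting thread
    int thread;  // thread index of the reporting thread
};

// First writer wins: only the thread that changes code from 0 fills in the rest.
__device__ void report_error(KernelError *err, int code, int line)
{
    if (atomicCAS(&err->code, 0, code) == 0) {
        err->line = line;
        err->block = blockIdx.x;
        err->thread = threadIdx.x;
    }
}

__global__ void safe_sqrt(const float *in, float *out, int n, KernelError *err)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    // Graceful exit: skip the work if some thread has already reported an error.
    if (*(volatile int *)&err->code != 0) return;
    if (in[i] < 0.0f) {
        report_error(err, 1, __LINE__);
        return;  // plain return instead of trap/assert, so the context survives
    }
    out[i] = sqrtf(in[i]);
}

// Runs the kernel on 256 values with one bad input and returns the error record.
KernelError run_demo(cudaError_t *status)
{
    const int n = 256;
    float h_in[n];
    for (int i = 0; i < n; ++i) h_in[i] = (float)i;
    h_in[5] = -1.0f;  // the only invalid input: block 0, thread 5

    float *d_in, *d_out;
    KernelError *d_err;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMalloc(&d_err, sizeof(KernelError));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(d_err, 0, sizeof(KernelError));

    safe_sqrt<<<n / 64, 64>>>(d_in, d_out, n, d_err);
    *status = cudaDeviceSynchronize();

    // Device allocations are still usable here because nothing trapped.
    KernelError h_err;
    cudaMemcpy(&h_err, d_err, sizeof(h_err), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    cudaFree(d_err);
    return h_err;
}

int main()
{
    cudaError_t status;
    KernelError e = run_demo(&status);
    printf("sync: %s, code %d at line %d (block %d, thread %d)\n",
           cudaGetErrorString(status), e.code, e.line, e.block, e.thread);
    return 0;
}
```

Because the record is claimed once for the whole grid, this variant needs a single error slot rather than one per block. The flag check is only a best-effort early exit: threads that are already past it will run to completion, and a kernel that uses barriers would need every thread of a block to take the same exit path.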

Thanks, that is very helpful. For my global flag I used a critical section, so that within each block a single thread can set the error message. However, I am not completely satisfied with the solution, as it means a string gets allocated in every block.
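To make that concrete, a per-block critical section of this kind might look roughly like the sketch below. This is not my actual code: kMsgLen, copy_msg, check and blocks_with_errors are hypothetical names, and the fixed-size message slot per block is an assumption made for the example. A shared-memory lock lets only the first failing thread in each block write its message.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kMsgLen = 64;  // hypothetical fixed message size per block

// Copy a NUL-terminated string into a fixed-size device buffer.
__device__ void copy_msg(char *dst, const char *src)
{
    int i = 0;
    for (; i < kMsgLen - 1 && src[i] != '\0'; ++i) dst[i] = src[i];
    dst[i] = '\0';
}

__global__ void check(const int *data, int n, char *msgs)
{
    __shared__ int lock;  // 0 = free, 1 = taken; one lock per block
    if (threadIdx.x == 0) lock = 0;
    __syncthreads();

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && data[i] < 0) {
        // Only the first failing thread in this block gets past the CAS.
        if (atomicCAS(&lock, 0, 1) == 0)
            copy_msg(msgs + blockIdx.x * kMsgLen, "negative input");
    }
}

// Launches 4 blocks and returns how many of them recorded a message.
int blocks_with_errors()
{
    const int blocks = 4, threads = 32, n = blocks * threads;
    int h_data[n];
    for (int i = 0; i < n; ++i) h_data[i] = i;
    h_data[40] = -1;   // two failures in block 1 ...
    h_data[41] = -1;
    h_data[100] = -1;  // ... and one in block 3

    int *d_data;
    char *d_msgs;
    cudaMalloc(&d_data, n * sizeof(int));
    cudaMalloc(&d_msgs, blocks * kMsgLen);  // one message slot per block
    cudaMemcpy(d_data, h_data, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(d_msgs, 0, blocks * kMsgLen);

    check<<<blocks, threads>>>(d_data, n, d_msgs);
    cudaDeviceSynchronize();

    char h_msgs[blocks * kMsgLen];
    cudaMemcpy(h_msgs, d_msgs, blocks * kMsgLen, cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    cudaFree(d_msgs);

    int count = 0;
    for (int b = 0; b < blocks; ++b) {
        if (h_msgs[b * kMsgLen] != '\0') {
            printf("block %d: %s\n", b, &h_msgs[b * kMsgLen]);
            ++count;
        }
    }
    return count;
}

int main()
{
    printf("%d block(s) reported an error\n", blocks_with_errors());
    return 0;
}
```

The message storage here grows with the number of blocks, which is the part I would like to avoid.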

Is it worth looking at cooperative groups to synchronize all threads in the grid to provide error handling messages to solve this, or is this overkill?