I have a cooperative grid of kernels performing a parallel search. Once goal is found by any of the kernels the problem is solved and all need to terminate. I sort of done it with a global flag variable, but it is cumbersome. Is there an API call, or perhaps “trap” instruction can be used for this?
There isn’t any convenient system that I am aware of. There is a “trap” like instruction (and you can actually just use assert(0); in kernel code) but this results in a corrupted CUDA context, which is not what most people are thinking about when discussing early kernel termination.