I have an algorithm in which a set of kernels needs to be invoked repeatedly until a termination criterion is satisfied. The question is, how can I efficiently detect when that has happened? What I really want is to return a value from a kernel, but that isn't allowed. I could write the value to global memory and then do a cudaMemcpy() after every iteration, but that adds a lot of overhead just to return a single boolean flag. Is there a more lightweight way to do it? In OpenGL programming, people generally use an occlusion query for this purpose.
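For reference, here is a minimal sketch of the global-memory approach I would like to avoid. Everything named here is a placeholder for illustration: the kernel `step`, the flag `d_notDone`, and the halve-until-below-`eps` convergence test are assumptions, not the actual algorithm.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical iteration kernel: halves every element and raises the flag
// if any element is still above the threshold.
__global__ void step(float *data, int n, float eps, int *notDone)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= 0.5f;
        if (data[i] > eps)
            *notDone = 1;  // every writer stores the same value, so the race is harmless
    }
}

int main()
{
    const int n = 1 << 16;
    const float eps = 1e-3f;

    std::vector<float> h_data(n, 1.0f);
    float *d_data = nullptr;
    int *d_notDone = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMalloc(&d_notDone, sizeof(int));
    cudaMemcpy(d_data, h_data.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    int h_notDone = 1;
    int iterations = 0;
    while (h_notDone) {
        // Clear the flag on the device, run one iteration, then read the flag back.
        cudaMemset(d_notDone, 0, sizeof(int));
        step<<<(n + 255) / 256, 256>>>(d_data, n, eps, d_notDone);
        // This blocking copy is the per-iteration overhead in question:
        // it waits for the kernel to finish before returning.
        cudaMemcpy(&h_notDone, d_notDone, sizeof(int), cudaMemcpyDeviceToHost);
        ++iterations;
    }

    printf("stopped after %d iterations\n", iterations);
    cudaFree(d_data);
    cudaFree(d_notDone);
    return 0;
}
```

The loop stalls the host once per iteration on a 4-byte transfer, which is the cost I am hoping to reduce.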