Hey, I apologize if this has been answered. I am not entirely sure how to formulate the question for searching (or more specifically, all my previous searches point in very wrong directions).
I have a series of kernels, each doing substantial work (potentially millions of threads each). At a high level, the code does some initialization (A, B), iterates a high-level loop (C, 10x max), makes a decision (D), and, if the decision is successful, performs another high-level loop (E, 10x max) and collects the results (F). So the worst-case serialization looks like this:
A, B, C, C, C, C, C, C, C, C, C, C, D, E, E, E, E, E, E, E, E, E, E, F
The high-level algorithm may need up to 10 iterations each of C and E, but it frequently needs far fewer (typically 3-4).
With hardware that supports dynamic parallelism, each iteration of C and E could decide whether a subsequent iteration is required and issue the next kernel invocation itself, correct? Also, if D fails and decides that E and F are unnecessary, it can simply not issue them to the stream, correct?
The problem is that this has to run on hardware that predates dynamic parallelism. So the real question is: can I add a trivial check in C and E that decides whether any work is necessary and, if not, simply early-outs (i.e. tells the GPU NOT to schedule any further thread blocks) for the current kernel invocation? (Or, better yet for D, can I tell the currently active stream to cancel and remove any work items still queued behind it?)
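To make the idea concrete, here is a rough sketch of the kind of check I have in mind for C. All the names (`g_continue`, `stageC`, `decide`, `runStageC`) are made up for illustration, the "work" is a placeholder, and error checking is left out. As far as I understand it, this does not stop the GPU from scheduling the thread blocks; every block still launches and just returns right away, which is exactly the overhead I would like to avoid if there is a better way.

```cuda
#include <cuda_runtime.h>

// Hypothetical device-side flag: nonzero means "another iteration is needed".
__device__ int g_continue = 1;

// Stand-in for one iteration of stage C (placeholder work, hypothetical name).
__global__ void stageC(float *data, int n)
{
    // Trivial check: if an earlier step cleared the flag, every thread
    // returns immediately. The blocks are still launched, they just do nothing.
    if (!g_continue) return;

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Single-thread kernel standing in for the "is another iteration needed?" test.
// Running it as its own kernel in the same stream means it executes after the
// preceding stageC launch has finished and before the next one starts.
__global__ void decide(const float *data, float threshold)
{
    if (data[0] >= threshold) g_continue = 0;
}

// Host side: enqueue all 10 iterations up front, with no host/device sync
// in between, and let the device-side flag turn the later ones into no-ops.
// Returns data[0] so the effect of the early-out is observable.
float runStageC(int n, float threshold)
{
    float *d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));   // all-zero bytes == 0.0f

    int one = 1;
    cudaMemcpyToSymbol(g_continue, &one, sizeof(int));  // reset the flag

    const int threads = 256;
    const int blocks  = (n + threads - 1) / threads;
    for (int it = 0; it < 10; ++it) {
        stageC<<<blocks, threads>>>(d, n);
        decide<<<1, 1>>>(d, threshold);
    }

    float result = 0.0f;
    cudaMemcpy(&result, d, sizeof(float), cudaMemcpyDeviceToHost);  // waits for the kernels
    cudaFree(d);
    return result;
}
```

If I have reasoned about this correctly, `runStageC(1 << 20, 3.0f)` returns 3.0f, because the last seven `stageC` launches early-out, while a threshold that is never reached (say 100.0f) lets all ten iterations run and returns 10.0f.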
I found some posts describing asm("trap;"), but that seems to be meant for exceptional behaviour. I am simply looking for a way to insert conditional behaviour into the high-level sequence of kernels.