Skipping CUDA kernels based on a condition


Here’s the problem I am trying to solve. Let’s say I have this pseudo-code:

__global__ void MyKernel( *some_data*, bool* _continueFlag )
{
     // A previous iteration cleared the flag: skip this launch
     if ( *_continueFlag == false )
         return;

     // ... do some work

     if ( condition_not_met )
         *_continueFlag = false;
}


bool* continueFlag;
cudaMalloc( &continueFlag, sizeof( bool ) );
// Set continue flag to true

for ( int i = 0 ; i < 100 ; ++i )
    MyKernel<<< 256, 256, 0, stream >>>( *some_data*, continueFlag );
cudaStreamSynchronize( stream );

As you can see, I am doing some work in MyKernel iteratively. Once some condition is met, I can skip the computation of all subsequent iterations.

Right now I handle this with a continue flag that is tested at the start of the kernel. This lets me “skip” kernels based on a condition without the overhead of calling cudaStreamSynchronize at every iteration (which is costly).
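For reference, here is a minimal, self-contained sketch of the flag approach described above. The names `launchesThatWorked`, `stopAfter` and `runIterations` are hypothetical and exist only for illustration: the counter stands in for the real work, and `stopAfter` stands in for the real stopping condition. Error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: exits immediately once the flag has been cleared.
__global__ void MyKernel( int* launchesThatWorked, int stopAfter, bool* continueFlag )
{
    // Early exit: a previous launch cleared the flag.
    if ( *continueFlag == false )
        return;

    // Stand-in for the real work: a single thread records that this launch ran.
    if ( blockIdx.x == 0 && threadIdx.x == 0 )
    {
        int done = ++( *launchesThatWorked );
        if ( done >= stopAfter )      // hypothetical stopping condition
            *continueFlag = false;    // later launches will skip their work
    }
}

// Queues maxIterations launches and returns how many of them did any work.
int runIterations( int maxIterations, int stopAfter )
{
    cudaStream_t stream;
    cudaStreamCreate( &stream );

    bool* continueFlag = nullptr;
    int* launchesThatWorked = nullptr;
    cudaMalloc( &continueFlag, sizeof( bool ) );
    cudaMalloc( &launchesThatWorked, sizeof( int ) );

    // Set the continue flag to true and the counter to zero, in stream order.
    const bool initialFlag = true;
    cudaMemcpyAsync( continueFlag, &initialFlag, sizeof( bool ),
                     cudaMemcpyHostToDevice, stream );
    cudaMemsetAsync( launchesThatWorked, 0, sizeof( int ), stream );

    // Every launch is queued; none of them is cancelled on the host side.
    for ( int i = 0 ; i < maxIterations ; ++i )
        MyKernel<<< 256, 256, 0, stream >>>( launchesThatWorked, stopAfter, continueFlag );

    // Single synchronization after all launches have been queued.
    cudaStreamSynchronize( stream );

    int result = 0;
    cudaMemcpy( &result, launchesThatWorked, sizeof( int ), cudaMemcpyDeviceToHost );

    cudaFree( continueFlag );
    cudaFree( launchesThatWorked );
    cudaStreamDestroy( stream );
    return result;
}

int main()
{
    int worked = runIterations( 100, 5 );
    printf( "launches that did work: %d of 100\n", worked );
    return 0;
}
```

In this sketch only the first `stopAfter` launches do any work, but all 100 are still submitted to the stream and scheduled on the device. Note also that blocks within the launch that clears the flag may or may not observe the change, so a real kernel would need to tolerate that.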

However, if there are many iterations to skip, it is still fairly costly to launch all these kernels only to have them exit immediately. So my question is: is it possible to skip kernel launches through the CUDA API based on some condition?