How can one thread immediately stop the kernel's work?

Hello,

I have a case where all the threads are looking for some data. As soon as one thread finds it, there is no point in letting the other threads keep running. Can the thread that found the data stop the kernel at the moment it finds it?

If not, is there a workaround to shorten the kernel work in this case?

Thanks, Lila

Once a kernel is launched, you cannot physically “kill” its other running threads, if that is what you mean. A workaround to keep the threads that are no longer useful from computing is to define a flag and have each thread check it before doing any further work. Of course, in this case you could incur branch divergence.

The global flag approach is also the only option I’m aware of.

While not really recommended, you can use the trap instruction in PTX to abort a kernel.
I haven’t used this myself, and I’m not sure if it is inefficient or has side effects like aborting the whole stream queue.

asm("trap;");   // PTX opcodes are lowercase

I think this is the first GIF meme in the history of the CUDA forums. :)

One way to try it:

volatile int *flag;   // the value pointed to by flag must be 0 at the start of execution
for (conds) {
    if (flag[0] == 0) {
        lookfordata();
        // the atomic functions take a non-volatile pointer, hence the cast
        if (founddata) atomicAdd((int *)flag, 1);
    }
    else return;   // another thread already found the data
}
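To make the flag idea concrete, here is a minimal sketch of a complete search, not a definitive implementation. The names searchKernel and findIndex, the launch configuration and the idea of storing the found index in the flag itself (-1 means "not found yet") are all my assumptions, not something from the posts above. Error checking is left out for brevity.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread scans a strided slice of `data`.
// `result` doubles as the stop flag: -1 means nobody has found the target yet.
__global__ void searchKernel(const int *data, int n, int target,
                             volatile int *result)
{
    int stride = blockDim.x * gridDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        // volatile read: the flag is re-loaded from memory on every iteration
        if (*result != -1) return;          // someone else found it, stop working
        if (data[i] == target) {
            // first finder wins; atomics take a non-volatile pointer, hence the cast
            atomicCAS((int *)result, -1, i);
            return;
        }
    }
}

// Hypothetical host helper: returns the index of a match, or -1 if there is none.
int findIndex(const std::vector<int> &host, int target)
{
    int *dData = nullptr, *dResult = nullptr;
    int result = -1;                        // flag starts in the "not found" state
    size_t bytes = host.size() * sizeof(int);

    cudaMalloc(&dData, bytes);
    cudaMalloc(&dResult, sizeof(int));
    cudaMemcpy(dData, host.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dResult, &result, sizeof(int), cudaMemcpyHostToDevice);

    searchKernel<<<64, 256>>>(dData, (int)host.size(), target, dResult);

    // this blocking copy also waits for the kernel to finish
    cudaMemcpy(&result, dResult, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dData);
    cudaFree(dResult);
    return result;
}
```

Note that this does not stop anything instantly: a thread only notices the flag at its next loop iteration, and blocks that have not been scheduled yet still start, read the flag once and return. If the data contains several matches, whichever thread gets its atomicCAS in first decides which index is reported.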

sBc, probably you’d need a volatile keyword in the flag definition.

thanks :)

:D

shares with colleagues

Perhaps

asm("trap;");

is not really what the user meant. My understanding is that he/she does not want to abort an entire kernel, but to “kill” only the threads that have found the datum. On the other hand, the code suggested by sBc-Random translates the flag idea into practice very well.

My interpretation of the text is that the poster wants to abort the kernel. Perhaps the original poster can clarify? :)

Yes, seibert. “Stopping the kernel run by a thread” is somewhat ambiguous. But perhaps the poster does not need to clarify, since solutions have been found for both possible interpretations of his/her post :-)

I wonder if there is a “higher level” alternative to the PTX instruction to abort an entire kernel.
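For what it is worth, CUDA C does have a __trap() device function that can be called from any thread and aborts the kernel without inline PTX. Below is a rough sketch of how it behaves; abortingKernel and runAbortingKernel are hypothetical names, and the data and launch configuration are made up for illustration. As far as I know the resulting error leaves the CUDA context unusable afterwards, so the found value cannot be copied back and this is closer to a crash than to an early exit.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: the thread that sees `target` aborts the whole kernel.
__global__ void abortingKernel(const int *data, int n, int target)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && data[i] == target) {
        __trap();   // abort the kernel; the host sees an error, not a result
    }
}

// Hypothetical host helper: returns the error reported once the kernel has run.
cudaError_t runAbortingKernel()
{
    const int n = 1024;
    int host[n] = {0};
    host[n / 2] = 42;                       // exactly one element will match

    int *dData = nullptr;
    cudaMalloc(&dData, sizeof(host));
    cudaMemcpy(dData, host, sizeof(host), cudaMemcpyHostToDevice);

    abortingKernel<<<4, 256>>>(dData, n, 42);
    cudaError_t err = cudaDeviceSynchronize();   // the abort is reported here

    cudaFree(dData);                        // may itself fail after the abort
    return err;
}
```

The value returned by runAbortingKernel() is something other than cudaSuccess, which is the only "message" the host gets. That is why the flag approach is the better fit when the kernel is supposed to return what it found.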

I just found this thread

cuda - how can a __global__ function RETURN a value or BREAK out like C/C++ does - Stack Overflow

Perhaps it could be of interest.