Aborting kernel execution from kernel code

Hi everyone,

Is there a way to stop the execution of an entire grid from within a kernel? Something like:

__global__ void mykernel()
{
	if(... some condition ...)
	{
		__abort();
	}
}

where __abort() would immediately stop the execution of the entire currently executing grid.

There are two scenarios where I’d find this extremely helpful: 1) to immediately stop when an error condition is detected, and 2) in cases where the grid does searches over large spaces and should exit as soon as a match is found. There are ways to emulate this feature (e.g., via periodic testing of an “abort flag” in global memory), but they’re all unsatisfactory (cumbersome, slow).

Regards,

Mario

CUDA supports most C control-flow syntax, so what about the “goto -> label” construct? That structure may be suitable for this situation.

I haven’t tried it before.

__global__ void mykernel()
{
	if(... some condition ...)
	{
		goto End;
	}
	// ... rest of the kernel ...
End:
	return;	// a label must be followed by a statement
}

I usually use the structure below.

__global__ void mykernel()
{
	if(!(... some condition ...)) // note the negated "some condition"
	{
		// do some things
	}
}

That short-circuits the execution of that particular thread, but does not abort the entire grid to which the thread belongs (which is what I’m after).

Yes, you are correct.

You mean that if the condition is true, all threads of the grid should stop immediately?

If so, that really is a problem. I have a small idea, but I don’t know whether it will be useful to you.

You can stop all threads within a block by setting a flag located in shared memory. You have to implement this yourself.

The flag is set by one or more threads, and every thread within the block can read its status.

That way you can stop all threads within the block once the condition is true. Keep shared-memory bank conflicts in mind, although when every thread reads the same word the hardware can normally serve it as a broadcast.
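A minimal sketch of the shared-memory flag idea, under some assumptions: the kernel name `blockSearch`, the host helper `searchPerBlock`, and the launch configuration are all made up for illustration, and the "condition" is simply finding `target` in an integer array. Each block scans its own slice and leaves the loop once any of its threads has set the flag. Other blocks are not affected.

```cuda
#include <climits>
#include <cuda_runtime.h>

// Hypothetical kernel: each block scans its own slice of `data` for `target`
// and leaves its loop as soon as one of its threads reports a match.
__global__ void blockSearch(const int *data, int n, int target, int *blockResult)
{
    __shared__ int found;                  // block-wide flag; INT_MAX means "no match yet"
    if (threadIdx.x == 0) found = INT_MAX;
    __syncthreads();

    int perBlock = (n + gridDim.x - 1) / gridDim.x;
    int begin = blockIdx.x * perBlock;
    int end = min(begin + perBlock, n);

    // The loop bounds depend only on blockIdx, so every thread of a block
    // runs the same number of passes and reaches the same barriers.
    for (int base = begin; base < end; base += blockDim.x) {
        int i = base + threadIdx.x;
        if (i < end && data[i] == target)
            atomicMin(&found, i);          // several threads may match in one pass
        __syncthreads();                   // make the flag visible to the whole block
        bool stop = (found != INT_MAX);
        __syncthreads();                   // all reads finish before the next pass can write
        if (stop) break;                   // uniform: every thread read the same value
    }

    if (threadIdx.x == 0)
        blockResult[blockIdx.x] = (found == INT_MAX) ? -1 : found;
}

// Hypothetical host helper: returns the smallest matching index, or -1.
int searchPerBlock(const int *hostData, int n, int target)
{
    const int blocks = 4, threads = 64;    // arbitrary launch configuration
    int *dData, *dResult, hResult[blocks];
    cudaMalloc(&dData, n * sizeof(int));
    cudaMalloc(&dResult, blocks * sizeof(int));
    cudaMemcpy(dData, hostData, n * sizeof(int), cudaMemcpyHostToDevice);

    blockSearch<<<blocks, threads>>>(dData, n, target, dResult);

    cudaMemcpy(hResult, dResult, sizeof(hResult), cudaMemcpyDeviceToHost);
    cudaFree(dData);
    cudaFree(dResult);

    int best = -1;
    for (int b = 0; b < blocks; ++b)
        if (hResult[b] >= 0 && (best < 0 || hResult[b] < best))
            best = hResult[b];
    return best;
}
```

The two barriers per pass are the cost of this approach: the first publishes the flag, the second keeps a fast thread from overwriting it while a slow thread is still reading the previous pass.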

Naturally, you can do the same with global memory: all threads within the grid can read a flag allocated in global memory,

but nothing guarantees that the flag is up to date at the time it is read, since we have no way to synchronize all threads within a grid.

Atomic functions are a good fit for this problem.

:)
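For reference, a rough sketch of the global-memory variant with an atomic update. The names `g_match`, `gridSearch`, and `searchGrid` are hypothetical, as is the launch configuration, and the "condition" is again a simple array search. This does not abort the grid; each thread merely polls the flag once per loop iteration and returns when it sees that another thread has already succeeded.

```cuda
#include <cuda_runtime.h>

// Hypothetical grid-wide flag in global memory: -1 means "keep going",
// any other value is the index of a match.
__device__ int g_match;

__global__ void gridSearch(const int *data, int n, int target)
{
    volatile int *flag = &g_match;          // volatile: re-read memory on every poll
    int stride = gridDim.x * blockDim.x;

    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        if (*flag >= 0) return;             // another thread already found a match
        if (data[i] == target) {
            atomicCAS(&g_match, -1, i);     // only the first writer changes the flag
            return;
        }
    }
}

// Hypothetical host helper: returns the index of a match, or -1 if none.
int searchGrid(const int *hostData, int n, int target)
{
    int *dData;
    int init = -1, result = -1;
    cudaMalloc(&dData, n * sizeof(int));
    cudaMemcpy(dData, hostData, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpyToSymbol(g_match, &init, sizeof(int));    // reset the flag before launch

    gridSearch<<<8, 128>>>(dData, n, target);           // arbitrary launch configuration

    // This copy waits for the kernel to finish before reading the flag.
    cudaMemcpyFromSymbol(&result, g_match, sizeof(int));
    cudaFree(dData);
    return result;
}
```

A thread may still do one more iteration of work after the flag has been set, and if several elements match, which index ends up in the flag depends on scheduling. The kernel contains no `__syncthreads()`, so threads can return independently without risking a hang at a barrier.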

Yes, a variant of that approach is what I was referring to at the end of my original post. But it’s a clumsy, error-prone (and slow!) workaround for a problem that is likely common enough to have a simpler solution. (Does it?)

Not at the moment, no. (Go add it to the feature request thread… that is a thing I keep track of!)

Much needed feature…