How can I halt an entire kernel?

thejfasi · June 29, 2009, 4:58pm

I have a nice application that requires that I be able to halt a kernel. It is essentially a search operation that looks for an element equal to 1 in a matrix, the better to perform row and column reduction. The details are not very important; what matters is that I’m wondering whether there is a way to have one thread stop the entire kernel execution.

tmurray · June 29, 2009, 4:58pm

Not really, no.

Nico · June 29, 2009, 5:28pm

Maybe you could poll a shared boolean after a fixed number of operations/iterations in order to determine whether the threads should continue or exit using a return statement.
See section B.5 of the programming guide.

N.

tmurray · June 29, 2009, 5:33pm

If it makes you feel better, the reason I could answer this so quickly is because I have spent a long time thinking about this problem… :)

Nico · June 29, 2009, 5:41pm

Maybe you could add it to the CUDA wish list, Tim :D

N.

SPWorley · June 29, 2009, 6:32pm

Yeah, you can do it by using a periodic poll of some device memory flag… done by every block. It’s messy but does work. But the hassle is really annoying… I’ve found it easier to just make small fast kernels and do the check between kernels.
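
A minimal sketch of the "small fast kernels" approach, assuming the search runs over a flat `int` array already on the device. The names `searchChunk` and `findOne`, the chunk size, and the block size of 256 are illustrative assumptions, not from this thread, and error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>

// Each launch scans one chunk [begin, end) for an element equal to 1.
// A match records its index; atomicMin keeps the smallest one.
__global__ void searchChunk(const int *data, int begin, int end, int *foundIdx)
{
    int i = begin + blockIdx.x * blockDim.x + threadIdx.x;
    if (i < end && data[i] == 1)
        atomicMin(foundIdx, i);
}

// Hypothetical host driver: launches one short kernel per chunk and checks
// the result between launches. Returns the index of the first element equal
// to 1, or -1 if there is none.
int findOne(const int *d_data, int n, int chunk)
{
    const int threads = 256;
    int h_found = n;              // n acts as the "not found" sentinel
    int *d_found;
    cudaMalloc(&d_found, sizeof(int));
    cudaMemcpy(d_found, &h_found, sizeof(int), cudaMemcpyHostToDevice);

    for (int begin = 0; begin < n && h_found == n; begin += chunk) {
        int end = (begin + chunk < n) ? begin + chunk : n;
        int blocks = (end - begin + threads - 1) / threads;
        searchChunk<<<blocks, threads>>>(d_data, begin, end, d_found);
        // The blocking copy waits for the kernel, so this is the
        // "check between kernels": the loop stops once a match is seen.
        cudaMemcpy(&h_found, d_found, sizeof(int), cudaMemcpyDeviceToHost);
    }

    cudaFree(d_found);
    return (h_found == n) ? -1 : h_found;
}
```

The chunk size trades launch overhead against how much work is wasted after a match; no kernel ever has to be interrupted.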

_Tom · June 29, 2009, 8:14pm

I don’t think it can work, what if the “terminator” thread takes longer than some threads, and consequently writes the boolean later than the check?

You would have some threads that continue and some that don’t.

And I think that all the threads accessing the same shared bool should cause some heavy bank conflicts…

SPWorley · June 29, 2009, 8:35pm

It’s much worse than that. You need interblock communication since you want to shut down an entire kernel early, so you need to do device memory polling, not (just) shared memory polling.

So you end up putting a (high latency!) test into your block’s work loop where one thread checks the global memory for the “stop it!” flag. If it’s found, that thread writes to SHARED memory so all the threads in the block can shut down. I even experimented with using one warp only for polling and the remaining warps for work, but that’s a mess too since you need to remerge for __syncthreads() barriers.

Efficiency wise, it’s not really TOO bad, but it’s really just easier to call a thousand short 3ms kernels rather than 3s kernels with completion polling.
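
A rough sketch of the block-level polling described in this post, under some assumptions: the work is a search for an element equal to 1 in a flat `int` array, and the names `searchWithStop` and `findAnyOne`, the launch configuration, and the two-int state layout are hypothetical. One thread per block reads the global flag and publishes it through shared memory, so the whole block exits together.

```cuda
#include <cuda_runtime.h>

__global__ void searchWithStop(const int *data, int n,
                               volatile int *stopFlag, int *foundIdx)
{
    __shared__ int blockStop;

    // base depends only on blockIdx, so every thread in a block runs the
    // same number of iterations and the barriers below are safe.
    for (int base = blockIdx.x * blockDim.x; base < n;
         base += gridDim.x * blockDim.x) {
        // Only thread 0 pays the global-memory latency for the poll.
        if (threadIdx.x == 0)
            blockStop = *stopFlag;
        __syncthreads();
        if (blockStop)            // same value for the whole block
            return;

        int i = base + threadIdx.x;
        if (i < n && data[i] == 1) {
            atomicMin(foundIdx, i);
            __threadfence();      // order the result write before the flag write
            *stopFlag = 1;        // ask every other block to stop
        }
        // Keep thread 0 from overwriting blockStop while others still read it.
        __syncthreads();
    }
}

// Hypothetical host wrapper: returns the index of some element equal to 1,
// or -1 if none exists. Blocks that stop early may skip a smaller index.
int findAnyOne(const int *d_data, int n)
{
    int h_state[2] = {0, n};      // [0] = stop flag, [1] = found index
    int *d_state;
    cudaMalloc(&d_state, sizeof(h_state));
    cudaMemcpy(d_state, h_state, sizeof(h_state), cudaMemcpyHostToDevice);
    searchWithStop<<<32, 256>>>(d_data, n, d_state, d_state + 1);
    cudaMemcpy(h_state, d_state, sizeof(h_state), cudaMemcpyDeviceToHost);
    cudaFree(d_state);
    return (h_state[1] == n) ? -1 : h_state[1];
}
```

Blocks that are already past the poll finish their current iteration before noticing the flag, so the shutdown is cooperative rather than immediate.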

Sarnath · June 30, 2009, 8:41am

One probable way is to write to a host-memory location (using zero-copy method) – which the host-application/driver would poll for and then request the GPU to stop executing the kernel.

That’s a decent way to do it. But right now, the driver does NOT have an API to halt a kernel (that I am aware of).

You could call “cudaThreadExit()” from the host side… but that will destroy the context, so you won’t get any result from your kernel.

CUDAkk · June 30, 2009, 11:13am

I have used “cudaThreadExit()” many times, but exactly what it does is not clear to me.

Could you explain a bit more?

Sarnath · June 30, 2009, 11:33am

cudaThreadExit destroys the context. I think all CUDA pointers in that context would become invalid after a call to cudaThreadExit().

Sarnath · July 2, 2009, 5:05am

Extending this (weird?) idea, one could make the zero-copy physical address the interrupt-trigger address of the GPU.

But I’m not sure whether PCI-E allows the Master and Target to be the same device…

Mu-Chi_Sung · July 2, 2009, 5:51am

I just came up with one idea: how about triggering a GPU exception to terminate the kernel (such as an invalid read or write to global/shared memory)?
However, I am not sure whether the partial result that has already been written to global memory or host memory would still be valid after the exception is thrown.

Sarnath · July 2, 2009, 7:10am

Segfaulted kernels don’t return partial results. I have encountered this behaviour before.

That’s why we need to gently request the GPU to stop, and that has to come from the driver…

What is needed is a back-channel for the kernel to communicate to the driver.

If the GPU had a service processor, it could probably read a special memory location and signal an interrupt to the driver. But not so sure about the architecture of GPUs.

Keldor314 · July 2, 2009, 11:39pm

It’s also possible that the only way to halt a kernel in progress is to send a reset signal to the hardware.

Anyway, right now, the best way to stop a kernel would likely be to just have a flag in memory. The signaling thread sets the flag, and all threads periodically poll it and return if they see it set. By setting the polling rate moderately low, say, once per 10 iterations, the polling won’t affect performance very much, at the cost of the response time being a bit longer.
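
A minimal sketch of this per-thread variant, assuming a search for an element equal to 1 in a flat `int` array. `POLL_INTERVAL`, `searchPolling`, `findWithPolling`, and the launch configuration are illustrative assumptions. Each thread checks a `volatile` global flag once every `POLL_INTERVAL` iterations and simply returns when it is set.

```cuda
#include <cuda_runtime.h>

#define POLL_INTERVAL 10   // iterations between flag checks (assumed value)

__global__ void searchPolling(const int *data, int n,
                              volatile int *stopFlag, int *foundIdx)
{
    int stride = gridDim.x * blockDim.x;
    int iter = 0;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += stride, ++iter) {
        // volatile forces a real memory read instead of a cached register.
        // Returning per thread is safe only because this loop contains
        // no __syncthreads().
        if (iter % POLL_INTERVAL == 0 && *stopFlag)
            return;
        if (data[i] == 1) {
            atomicMin(foundIdx, i);
            __threadfence();   // order the result write before the flag write
            *stopFlag = 1;     // signal all other threads to stop
            return;
        }
    }
}

// Hypothetical host wrapper: returns the index of some element equal to 1,
// or -1 if none exists.
int findWithPolling(const int *d_data, int n)
{
    int h_state[2] = {0, n};   // [0] = stop flag, [1] = found index
    int *d_state;
    cudaMalloc(&d_state, sizeof(h_state));
    cudaMemcpy(d_state, h_state, sizeof(h_state), cudaMemcpyHostToDevice);
    searchPolling<<<4, 64>>>(d_data, n, d_state, d_state + 1);
    cudaMemcpy(h_state, d_state, sizeof(h_state), cudaMemcpyDeviceToHost);
    cudaFree(d_state);
    return (h_state[1] == n) ? -1 : h_state[1];
}
```

A larger `POLL_INTERVAL` lowers the polling cost but lets threads run longer after the flag is set, which is the trade-off described in this post.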

SPWorley · July 3, 2009, 12:34am

What an ugly hack!

You should be proud of yourself, that’s really clever. I bet it’d work.

If you call __threadfence() first, it will likely allow all pending writes to complete safely before your kill. Of course, this is clearly not guaranteed in any way.

tmurray · July 3, 2009, 1:15am

Except that it invalidates your context.

Sarnath · July 3, 2009, 5:38am

Meaning: the partial results won’t be available. Segfaulting is not an option.
