Really confusing topic: how to STOP a kernel?

I asked a similar question yesterday, but there is still no answer. Really, it is hard to believe that nobody has faced the problem of stopping a kernel that runs for too long.

I have two graphics cards (one for primary use, with the monitor connected, and one for CUDA experiments), so no watchdog is active on the second card, and my kernels may run for as long as there is power in the wall outlet.

Kernel launches are asynchronous, so it is easy to check whether a run has finished (using cudaEventQuery). That's just perfect, but how do I stop the kernel if it has been running for, say, one minute and I don't want to keep waiting?
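A minimal sketch of the polling part, assuming the CUDA runtime API and C++11 `<chrono>`/`<thread>`. The names `busyKernel` and `waitForEvent` and the 10 ms poll interval are hypothetical choices for illustration. It shows that detecting a deadline is easy; nothing in it cancels the kernel.

```cuda
#include <chrono>
#include <thread>
#include <cuda_runtime.h>

// Hypothetical compute-heavy kernel, used only to have something to wait on.
__global__ void busyKernel(float *data, int n, int iterations)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float v = data[i];
    for (int k = 0; k < iterations; ++k)
        v = v * 0.999f + 1.0f;
    data[i] = v;
}

// Polls `done` until the work recorded before it completes or `timeout`
// elapses. Returns cudaSuccess when finished, cudaErrorNotReady on timeout,
// or whatever other error cudaEventQuery reports.
cudaError_t waitForEvent(cudaEvent_t done, std::chrono::milliseconds timeout)
{
    const auto start = std::chrono::steady_clock::now();
    for (;;) {
        cudaError_t status = cudaEventQuery(done);
        if (status != cudaErrorNotReady)
            return status;                  // finished (or failed)
        if (std::chrono::steady_clock::now() - start >= timeout)
            return cudaErrorNotReady;       // still running: the caller decides
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
}
```

Typical use: launch `busyKernel`, call `cudaEventRecord(done, 0)`, then `waitForEvent(done, std::chrono::seconds(60))`. On timeout you are back at the original question: any later call that synchronizes with the device (for example `cudaDeviceSynchronize`) still blocks until the kernel finishes on its own.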

I can't even kill the process from Task Manager: the system simply hangs until the kernel finishes.

This is not about the efficiency or elegance of the solution: is it fundamentally possible to terminate a kernel? I believe it is, since the watchdog manages it somehow.

Maybe the kernel could check a boolean variable (one that resides on the device but can be set from the host) on each iteration and stop calculating as soon as it becomes true?

Really, ANY suggestion is welcome.

Thanks in advance,
Roman.

The common use case for most people really is many short kernel calls (a few seconds or less each). That said, your question is perfectly sensible, and hopefully someone can help you.

Well, I think that should be feasible: write a boolean value to a memory location, and have your kernel read that location regularly and break out of its loop once the value is set. It would indeed be nice to know how the watchdog does it, although I think the watchdog needs root access to reset a card.
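A minimal sketch of that flag approach, assuming a device that supports mapped (zero-copy) pinned host memory. `abortableKernel`, `createAbortFlag` and the check interval of 1024 iterations are hypothetical. The flag lives in host memory mapped into the device address space, so the host can flip it while the kernel runs; the kernel reads it through a `volatile` pointer so the compiler cannot hoist the load out of the loop. On older setups without unified addressing, `cudaSetDeviceFlags(cudaDeviceMapHost)` has to be called before the CUDA context is created.

```cuda
#include <cuda_runtime.h>

// Hypothetical iterative kernel that cooperates with a host-controlled flag.
// `abortFlag` points to mapped pinned host memory and is read through a
// volatile pointer, so every check performs a real memory load.
__global__ void abortableKernel(float *data, int n, int iterations,
                                const volatile int *abortFlag)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float v = data[i];
    for (int k = 0; k < iterations; ++k) {
        // Check only every 1024 iterations (assumed interval) to limit
        // the number of reads that go over the bus to host memory.
        if ((k & 1023) == 0 && *abortFlag)
            break;
        v = v * 0.999f + 1.0f;
    }
    data[i] = v;
}

// Allocates a flag the kernel can read while it runs. Returns the host
// pointer (or nullptr on failure) and stores the device alias in *devFlag.
volatile int *createAbortFlag(int **devFlag)
{
    int *host = nullptr;
    if (cudaHostAlloc((void **)&host, sizeof(int), cudaHostAllocMapped) != cudaSuccess)
        return nullptr;
    *host = 0;
    if (cudaHostGetDevicePointer((void **)devFlag, host, 0) != cudaSuccess) {
        cudaFreeHost(host);
        return nullptr;
    }
    return host;
}
```

On the host you would launch `abortableKernel` with the device pointer, poll with cudaEventQuery as usual, and, when you want to give up, write `*hostFlag = 1` and then wait: running threads exit at their next check, and blocks that have not started yet exit at their first one. This only helps for kernels written to cooperate; it does not stop arbitrary code.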

The watchdog will reset the GPU, not just stop the kernel.

Right now, it is not possible to stop a kernel from the CPU.

Is it a hardware or a software limitation?

Is there any chance that this will be possible in the future?