Terminate CUDA kernel which got stuck in an endless loop? Is that possible under linux?

euk · December 16, 2008, 2:28pm

Sometimes a kernel goes into an endless loop (because of a bug in the kernel, of course), and I am unable to terminate it.
kill -9 on the host process fails: the process remains (it is not a zombie, it is still marked as running).
I have to reboot. Definitely an unusable approach.

Driver: 180.60
Linux: Debian/lenny
Monitor is attached to another video card, so there is no CUDA timeout.

Is there any way to terminate such a process/kernel?

MisterAnderson42 · December 16, 2008, 3:21pm

Odd, I usually have luck just pressing Ctrl-C. Sometimes it takes ~10 seconds to take effect, but it usually works. The only times a reboot has been necessary for me were with horribly buggy kernels that wrote all over device memory, probably messing up the driver.
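
The escalation discussed in this thread (Ctrl-C first, then kill -9) can be automated from a wrapper script. The following is only a minimal sketch, assuming a POSIX system and Python 3; the names `run_with_watchdog`, `timeout_s`, and `grace_s` are hypothetical and not part of any CUDA tool. It does not add a GPU-side timeout: if the driver keeps the process from exiting, as described in the first post, the function can only report that the process is stuck.

```python
import signal
import subprocess
import sys


def run_with_watchdog(cmd, timeout_s, grace_s=10.0):
    """Run cmd; on timeout send SIGINT, then SIGKILL.

    Returns "finished", "interrupted", "killed", or "stuck".
    """
    if sys.platform == "win32":
        raise RuntimeError("this sketch relies on POSIX signals")

    proc = subprocess.Popen(cmd)
    try:
        proc.wait(timeout=timeout_s)
        return "finished"
    except subprocess.TimeoutExpired:
        pass

    # Same signal that Ctrl-C delivers; give it grace_s seconds to act.
    proc.send_signal(signal.SIGINT)
    try:
        proc.wait(timeout=grace_s)
        return "interrupted"
    except subprocess.TimeoutExpired:
        pass

    # Same signal as kill -9.
    proc.kill()
    try:
        proc.wait(timeout=grace_s)
        return "killed"
    except subprocess.TimeoutExpired:
        # The process is still there even after SIGKILL, which is the
        # situation reported at the top of this thread.
        return "stuck"
```

For example, `run_with_watchdog(["./my_cuda_app"], timeout_s=60)` (the binary name is hypothetical) would wait a minute before escalating. The default `grace_s` of 10 seconds simply mirrors the delay mentioned above and may need to be much longer.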

euk · December 16, 2008, 4:25pm

I have just tested with while(true) {}. No memory use…
I just found that Ctrl-C does help, but only after about 30 minutes (!).

Maybe unloading some module (the nvidia driver?) would help? Any ideas?

tmurray · December 16, 2008, 4:41pm

A fix for that is coming, but not until after 2.1 is out.

euk · December 16, 2008, 4:55pm

Thank you. But are there any workarounds for now? Such as unloading the driver or something like that?

alex_dubinsky · December 17, 2008, 2:05am

A fix in what form? Quicker return after Ctrl+C, or some larger-scale solution? Will it work on Windows?

Sarnath · December 17, 2008, 4:58am

Just a guess:

Temporarily extend (or switch) your desktop onto this graphics card, so that the display timeout kills the kernel.

Beware: if that does not work, you won't have a display to work with :-) Extending would be the better idea… but I'm not sure whether Linux supports it.

euk · December 18, 2008, 5:03pm

Cool idea, however I’ve never heard that this is possible under X on Linux… I mean extending the desktop.

Sarnath · December 19, 2008, 5:48am

Thanks. If possible, write a script to switch the display and then get it back to the original display.

Not sure how to write it OR if it would even work. Good Luck!

wumpus · December 20, 2008, 2:30pm

Be warned: trying to unload the driver or doing other things in such a case can hang the entire PC, or at least hang the driver unload, until the kernel terminates. At least, that’s my experience.
