Now that you mention it, I remember launching huge kernels twice (for testing): the first time I got a nice failure message on the command line (something about a timeout, I believe), and on the second try I got a bluescreen. It seems to me that the watchdog is rather hit and miss. I’ve also seen reports that a kernel failure can corrupt GPU memory to the point that you have to reboot the machine, even though in theory the driver should clean up in such cases.
As far as I can tell, handling screwups on the device is not implemented very gracefully.
I can’t think of any elegant way to stop infinite loops. Perhaps you could estimate the number of iterations you expect and add a condition that stops the loop once it has run 10x as many? Something like
while (myConditions == true && iterations++ < 10000)
It’s not pretty, and it’s not always possible to make this estimate. I can’t come up with anything better: our interactions with a running kernel are very limited, since I/O is handled by the CPU.
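To make the iteration cap above a bit more concrete, here is a minimal sketch of how I’d write it. Everything named here (`LoopResult`, `boundedCollatz`, `maxIterations`) is made up for illustration, and the Collatz step is only a stand-in for whatever your real loop body does. I picked it because an input of 0 never reaches 1, so without the cap the loop would spin forever. The macro block is just so the same function also compiles as plain C++ when nvcc isn’t around.

```cpp
// When this is not compiled by nvcc, turn the CUDA qualifiers into no-ops
// so the function can also be built and checked on the host.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical result type: how many iterations ran, and whether the
// loop was cut off by the cap instead of finishing on its own.
struct LoopResult {
    int  iterations;
    bool hitCap;
};

// Stand-in loop body (Collatz steps until x reaches 1), guarded by an
// iteration cap. maxIterations is an assumed estimate, e.g. roughly 10x
// the number of iterations you expect in the normal case.
__host__ __device__ inline LoopResult boundedCollatz(unsigned long long x,
                                                     int maxIterations)
{
    LoopResult r = {0, false};
    while (x != 1ULL) {
        if (r.iterations >= maxIterations) {
            r.hitCap = true;   // give up instead of looping forever
            break;
        }
        x = (x % 2ULL == 0ULL) ? x / 2ULL : 3ULL * x + 1ULL;
        ++r.iterations;
    }
    return r;
}
```

Inside a kernel, each thread could write its `hitCap` value to an output array, so that after the launch the host can at least tell which threads were cut off rather than finishing normally. That doesn’t fix the underlying problem, but it should keep you clear of the watchdog.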
The programmer is completely unable to stop a running kernel (an operation like “stop the kernel if it runs for too long” is not possible). Integrate some triggers into your code (count iterations, for example) to prevent the kernel from running forever.