CUDA becomes unusable until reboot after kernel with infinite loop

Roughly one time in three, when I run a kernel with an (accidental) infinite loop, the card locks up and isn’t usable again until I reboot the machine.

Every CUDA program I start after that hangs, probably waiting for the card to become ready, but this never happens.

This is with an 8800 GTX on Linux with the latest drivers and CUDA version (64-bit).

It would be nice if a future release contained a utility to kill any running tasks on the GPU.
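Until something like that exists, a host-side watchdog can at least keep a hung CUDA program from blocking the shell, which matters on a machine that is only reachable remotely. The sketch below is only an illustration: the function name `run_with_watchdog`, its timeout values, and the `./my_cuda_app` binary in the usage note are all made up for this example. It does not reset or unlock the card; it only reports whether the host process finished, had to be killed, or could not be killed at all (which I would guess is what happens when the process is stuck inside the driver).

```python
import subprocess
import sys


def run_with_watchdog(cmd, timeout_s=30.0, grace_s=5.0):
    """Run cmd and classify the outcome.

    Returns one of:
      "ok"     - process exited with status 0 within timeout_s
      "failed" - process exited with a non-zero status within timeout_s
      "killed" - process exceeded timeout_s and went away after being killed
      "stuck"  - process was still around grace_s seconds after the kill
                 (assumption: it is blocked in the kernel/driver)
    """
    proc = subprocess.Popen(cmd)
    try:
        rc = proc.wait(timeout=timeout_s)
        return "ok" if rc == 0 else "failed"
    except subprocess.TimeoutExpired:
        # Hard kill (SIGKILL on POSIX); this stops the host process only,
        # it does nothing to whatever is still running on the GPU.
        proc.kill()

    try:
        proc.wait(timeout=grace_s)
        return "killed"
    except subprocess.TimeoutExpired:
        # Do not wait forever: a process blocked in uninterruptible sleep
        # would hang this script too.
        return "stuck"


if __name__ == "__main__":
    # Hypothetical usage: python watchdog.py ./my_cuda_app arg1 arg2
    print(run_with_watchdog(sys.argv[1:]))
```

A "stuck" result would be a hint that the card is in the locked-up state described above, so there is little point in starting further CUDA programs before the driver has been reloaded or the machine rebooted.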

Bad stuff. As a workaround, did you try killing X11 and/or unloading and reloading the NVIDIA driver instead of doing the whole rebooting dance?

I’ll try that next time. But I am afraid doing one of those might lock up the entire machine instead of just the card, which would be even more trouble as I have only ssh access to it :)

The CUDA bug that plagues my code still causes an “effective” infinite loop because it triggers the 5 s timeout. Once that has happened, CUDA programs still run fine, but any program that makes an OpenGL call (even glxgears) locks up the machine.