Workaround for kernel error: avoid reboot on FC8

Hi, I’m writing my B.Sc. thesis on haptic textile simulation using CUDA devices.
I’m currently running a conjugate gradient method and comparing it against a Cell processor and a multicore CPU on Fedora Core 8 (64-bit).
My devices are GeForce 8800 GTX and 8600 GT cards (both with 1024 MB memory).

My question:
When I experiment with kernel parameters or loops, the system sometimes crashes (a short flicker)
and all windows stay scrambled until I reboot the system manually. Is there a way to avoid the reboot (maybe by using a remote shell)?

Freezing or crashing the whole system is one big disadvantage compared to Cell solutions.
Maybe somebody has had the same problem and can help me with this topic.

Best regards.

Is this happening with or without X running?

Which driver are you using?

Thank you for your fast reply.

My driver version is 180.06 and X is running.

cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 180.06 Sat Nov 8 17:50:38 PST 2008
GCC version: gcc version 4.1.2 20070925 (Red Hat 4.1.2-33)

rpm -qa | grep nvidia
kmod-nvidia-2.6.26.6-49.fc8-173.14.12-4.lvn8.1
kmod-nvidia-2.6.26.5-28.fc8-173.14.12-4.lvn8
kmod-nvidia-173.14.12-4.lvn8.1
xorg-x11-drv-nvidia-libs-173.14.12-1.lvn8
xorg-x11-drv-nvidia-173.14.12-1.lvn8

You stated 180.06 at the top, yet at the bottom you list 173.14.12. Both are old, and no longer supported.
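
To make the mismatch easier to spot, here is a rough shell sketch that compares the loaded kernel module version with the installed X driver package. It assumes the package name `xorg-x11-drv-nvidia` from your rpm listing; the helper names `nvrm_version` and `compare_versions` are made up for this example.

```shell
#!/bin/sh
# Pull the kernel module version out of text shaped like
# /proc/driver/nvidia/version (the "NVRM version:" line).
nvrm_version() {
    sed -n 's/^NVRM version:.*Kernel Module  *\([0-9][0-9.]*\).*/\1/p'
}

# Compare two version strings and print a one-line verdict.
compare_versions() {
    if [ "$1" = "$2" ]; then
        echo "match: $1"
    else
        echo "mismatch: loaded module $1, installed package $2"
    fi
}

# Only run the live check where the NVIDIA module is loaded and rpm exists.
if [ -r /proc/driver/nvidia/version ] && command -v rpm >/dev/null 2>&1; then
    loaded=$(nvrm_version < /proc/driver/nvidia/version)
    # Package name assumed from the rpm listing in this thread.
    if packaged=$(rpm -q --qf '%{VERSION}\n' xorg-x11-drv-nvidia 2>/dev/null); then
        compare_versions "$loaded" "$packaged"
    fi
fi
```

If the two versions differ, the rpm packages and a separately installed driver are probably both present on the system.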

Regardless, it’s quite possible that you’re hitting the 5-second watchdog timeout if you’re running CUDA apps while X is running. Do these problems persist with the latest released driver (from nvidia.com) and without X running?
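
For the "without X" test, a small sketch like the following may help check which mode the machine is in. It assumes the usual Fedora SysV convention (runlevel 3 is text mode, runlevel 5 starts the X display manager); `describe_runlevel` is a name invented for this example. An X server started by hand with `startx` would not show up this way.

```shell
#!/bin/sh
# Map the output of `runlevel` (for example "N 5") to a short description.
# Assumes the Fedora/Red Hat convention: 3 = text mode, 5 = graphical login.
describe_runlevel() {
    current=${1##* }    # keep only the last field, the current runlevel
    case "$current" in
        5) echo "graphical" ;;
        3) echo "text" ;;
        *) echo "other" ;;
    esac
}

# Report the state of this machine if the runlevel command is available.
if command -v runlevel >/dev/null 2>&1; then
    echo "runlevel: $(describe_runlevel "$(runlevel 2>/dev/null)")"
fi

# To stop X for a test run (as root):  telinit 3
# To return to the graphical login:    telinit 5
```

After switching to runlevel 3, you can log in on the text console or over ssh and launch the CUDA program from there.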