Workaround for kernel error: Avoid reboot on FC8

ogoellner · March 22, 2009, 8:18pm

Hi, I’m writing my B.Sc. thesis on haptic textile simulation using CUDA devices.
I’m currently benchmarking a conjugate gradient method against a Cell processor and a multicore CPU on Fedora Core 8 (64-bit).
My devices are GeForce 8800 GTX and 8600 GT cards (both with 1024 MB of memory).

My question:
When I experiment with kernel parameters or loops, the system sometimes crashes (a short flicker),
and all windows stay scrambled until I reboot the system manually. Is there a way to avoid the reboot (maybe by using a remote shell)?

The freezing and crashing of the system is one big disadvantage compared to Cell solutions.
Maybe somebody has had the same problem and can help me with this topic.

Best regards.

netllama · March 22, 2009, 8:21pm

Is this happening with or without X running?

Which driver are you using?

ogoellner · March 22, 2009, 9:25pm

Thank you for your fast reply.

My driver version is 180.06 and X is running.

cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 180.06 Sat Nov 8 17:50:38 PST 2008
GCC version: gcc version 4.1.2 20070925 (Red Hat 4.1.2-33)

rpm -qa | grep nvidia
kmod-nvidia-2.6.26.6-49.fc8-173.14.12-4.lvn8.1
kmod-nvidia-2.6.26.5-28.fc8-173.14.12-4.lvn8
kmod-nvidia-173.14.12-4.lvn8.1
xorg-x11-drv-nvidia-libs-173.14.12-1.lvn8
xorg-x11-drv-nvidia-173.14.12-1.lvn8

netllama · March 22, 2009, 9:32pm

You stated 180.06 at the top, yet at the bottom you list 173.14.12. Both are old, and no longer supported.

Regardless, it’s quite possible that you’re hitting the 5-second watchdog timeout if you’re running CUDA apps while X is running. Do these problems persist with the latest released driver (from nvidia.com) and without X running?
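
If the watchdog is the cause, one common way to stay under it is to split a long-running solver loop into several short kernel launches instead of one long one. Below is a minimal host-side sketch of that idea in plain C++, assuming the conjugate gradient loop can be cut at iteration boundaries. The name plan_launches and its parameters are hypothetical and not part of CUDA or of the original poster's code; a safe batch size has to be measured on the actual card.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not a CUDA API): split `total_iters` solver iterations
// into batches of at most `iters_per_launch`, so that each batch can be issued
// as its own, shorter kernel launch.
// Returns (first_iteration, iteration_count) pairs in launch order.
std::vector<std::pair<std::size_t, std::size_t>>
plan_launches(std::size_t total_iters, std::size_t iters_per_launch) {
    std::vector<std::pair<std::size_t, std::size_t>> batches;
    if (iters_per_launch == 0) {
        return batches;  // no sensible plan for a zero batch size
    }
    for (std::size_t first = 0; first < total_iters; first += iters_per_launch) {
        // The last batch may be shorter than the others.
        std::size_t count = std::min(iters_per_launch, total_iters - first);
        assert(count > 0);  // invariant: every batch does some work
        batches.emplace_back(first, count);
    }
    return batches;
}
```

In a CUDA program, each returned pair would correspond to one kernel launch covering only that range of iterations, so that no single launch runs long enough to trip the timeout. This only addresses the watchdog case; it does not help if the crash comes from the driver mismatch mentioned above.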
