Nvidia driver causes kernel to panic randomly: no driver, no CUDA

I was getting annoying system freezes on Fedora 12 with kernel 2.6.31, and the nvidia driver simply refused to start with 2.6.32. By system freeze, I mean the CapsLock button stops working and I have to do a hard reset to get things working again. This happened several times a day.
At first, I thought it might be the VirtualBox kernel modules, but they weren't the cause. I booted into 2.6.32, unblacklisted nouveau, and the system has been rock solid for almost a week.

Of course, two variables changed here: the kernel and the presence of the nvidia kernel module. Either way, this still makes me wonder, WTF!!!

Has anyone had such problems? Is there a way to get the nvidia driver (and thus CUDA) working without having digital sex with the reset button?

nouveau is an open-source, reverse-engineered nvidia driver. nvidia has its own driver, which comes with 195… and is called “nvidia”.

I’m very well aware of what “nouveau” and “nvidia” are. The nvidia driver causes the kernel panics I mentioned; nouveau is rock-solid, but then I lose 3D, OpenCL and CUDA functionality. Or I use the nvidia driver and take painkillers from pressing the reset button. Not fun.

Did you mean that you blacklisted nvidia instead?

I compile the nvidia kernel module manually during installation, and I have the full kernel build source. There's a new nvidia driver version out: 195.36.24.

Also, for me, multiple kernels aren't an option: every time the kernel changes, I have to reinstall the nvidia driver.

I mean I removed “rdblacklist=nouveau” from the kernel command line and deleted xorg.conf. I don't have to blacklist nvidia, as it simply won't load if nouveau is present.
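To double-check which configuration a given boot actually used, you can look at the kernel command line in /proc/cmdline. Below is a minimal Python sketch, assuming the Fedora 12-era rdblacklist=<module> syntax mentioned above; accepting a comma-separated list of modules is an extra assumption, and the function names are hypothetical.

```python
def rdblacklisted_modules(cmdline: str) -> set:
    """Return the module names given to rdblacklist= on a kernel command line.

    Assumes the rdblacklist=<module> syntax; splitting the value on commas
    is an assumption so that rdblacklist=a,b is also handled.
    """
    modules = set()
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key == "rdblacklist":
            modules.update(name for name in value.split(",") if name)
    return modules


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read()
```

If "nouveau" is in rdblacklisted_modules(read_cmdline()), nouveau was kept out and the nvidia module had a chance to load; otherwise nouveau claims the card first.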

# lsmod | grep nvidia yields nothing in this configuration.
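lsmod reads /proc/modules, where the first field of each line is the module name. Here is a minimal Python sketch of the same check; the helper names are hypothetical. Unlike grep nvidia, it matches names exactly, so a helper module whose name merely contains "nvidia" is not counted.

```python
def loaded_modules(proc_modules_text: str) -> set:
    """Return the names of loaded modules from /proc/modules-style text."""
    # Each non-empty line starts with the module name, followed by size, refcount, etc.
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def active_gpu_driver(proc_modules_text: str):
    """Return "nvidia", "nouveau", or None depending on which module is loaded."""
    mods = loaded_modules(proc_modules_text)
    if "nvidia" in mods:
        return "nvidia"
    if "nouveau" in mods:
        return "nouveau"
    return None


def read_proc_modules(path: str = "/proc/modules") -> str:
    with open(path) as f:
        return f.read()
```

In the nouveau setup described above, active_gpu_driver(read_proc_modules()) would be expected to return "nouveau".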

I have akmod-nvidia-195.36.15 from rpmfusion installed. It automatically builds the kernel module for each new kernel. I'll try the new driver once it hits rpmfusion. The last time I tried the nvidia installer, it didn't end too well for me.
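Since akmod builds the module per kernel, one way to spot a kernel that was left without a build is to look for nvidia.ko under each /lib/modules/<version> directory. This is a rough Python sketch; the assumption that the built module is a file named nvidia.ko somewhere under that tree (for example in an extra/ subdirectory) is mine, and the function name is hypothetical.

```python
import os


def kernels_missing_module(modules_root: str = "/lib/modules",
                           module_file: str = "nvidia.ko") -> list:
    """Return sorted kernel version directories that contain no module_file.

    Assumes one subdirectory per installed kernel under modules_root and that
    the built module keeps the exact file name module_file.
    """
    missing = []
    for version in sorted(os.listdir(modules_root)):
        kernel_dir = os.path.join(modules_root, version)
        if not os.path.isdir(kernel_dir):
            continue
        # Walk the whole tree, since the exact install subdirectory may vary.
        found = any(module_file in files for _, _, files in os.walk(kernel_dir))
        if not found:
            missing.append(version)
    return missing
```

An empty list means every installed kernel has an nvidia.ko; any version listed would need the module rebuilt before booting it with the nvidia driver.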

I understand the situation now, thanks for clarifying.

OK, and you probably already know this, but check dmesg and syslog for errors, especially NVRM messages. Maybe it's a hardware issue. If Windows installs and works, you could stress-test the GPU.
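The nvidia kernel module prefixes its kernel log messages with "NVRM:", so filtering for that string is a quick first pass. Below is a minimal Python sketch with hypothetical helper names. After a hard reset, dmesg only shows the current boot, so the messages from the crash itself are more likely to have survived in the syslog file; /var/log/messages is assumed here as the Fedora default location.

```python
import subprocess


def nvrm_messages(log_text: str) -> list:
    """Return the lines of a kernel log or syslog that mention NVRM."""
    return [line for line in log_text.splitlines() if "NVRM" in line]


def current_dmesg() -> str:
    """Capture the kernel ring buffer for the current boot only."""
    return subprocess.run(["dmesg"], capture_output=True, text=True).stdout


def read_syslog(path: str = "/var/log/messages") -> str:
    """Read a syslog file; the default path is an assumption (Fedora default)."""
    with open(path, errors="replace") as f:
        return f.read()
```

Running nvrm_messages(read_syslog()) after a freeze and comparing it with nvrm_messages(current_dmesg()) shows whether the driver logged anything before the system locked up.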

Also, I assume you haven't overclocked anything and that the memory timings are correct. You can try memtest86 as well. Running CUDA and graphics workloads may stress other components, such as memory.