Every time my Centos won't boot, it's due to Nvidia Linux drivers causing the problem

The last time my Centos 6.4 Lenovo W510 wouldn’t boot, it was due to the Nvidia drivers:

It took weeks to figure out the ungodly complexities of the Centos Nvidia configuration.

Then, the next time my Centos wouldn’t boot, it was again due to the Nvidia drivers:

Now, for a third time, Centos won’t boot, most likely due to the complexity of Nvidia drivers:

My reason for being here is that, for almost a week, I’ve been unable to boot my Linux laptop due, it seems, to the ungodly complexity of the Nvidia driver setup.

Can anyone here help me with a diagnostic sequence that will tell me what the problem is, all of a sudden, with the Nvidia drivers not working and not allowing a boot to the graphical user interface?

Well, I’ve been running NVIDIA drivers without any serious issues since 2000. Sometimes they had bugs, like KDE 4.x was damn slow for almost a year due to missing acceleration of some new painting operations that KDE heavily used.

Recently NVIDIA has become somewhat slow in supporting new Linux kernels - it’s never happened before though, and as a user of CentOS this issue cannot bother you since CentOS contains an awfully old kernel release.

There’s no special diagnostic routine to help you. Linux, unfortunately, is all about manual tinkering, and if you cannot solve your problem you may want to find people around you who are better educated in Linux.

If your system doesn’t physically hang on boot (e.g. there’s no kernel panic), the best way to resolve your issues is to let it boot, SSH into your box, and then run nvidia-bug-report.sh from there. This way we’ll get the most realistic picture of the status of your computer.

As shown in my initial extremely detailed references, the setup for Nvidia drivers is ungodly complex, e.g., Smurf effect workarounds, hardware acceleration workarounds, display resolution workarounds, mode-validation process workarounds, etc.

Well, I’m on Centos.org, but I think they’re not as Linux qualified as someone who writes Nvidia drivers might be. We will eventually solve the problem though - but so far it has been a week down and counting, all because of a bug somewhere (but where?) in the Nvidia driver setup.

I’ve already run that nvidia-bug-report.sh script.
Should I post the results here, or email it to where it says to email it?

Here are all the lines with EE in them in the Centos Xorg.0.log file:
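For anyone following along, here is a minimal sketch of how such lines can be pulled out of the log. It assumes the default log location /var/log/Xorg.0.log, and the helper name xorg_errors is made up for illustration:

```shell
# Print every line of an Xorg log that carries the (EE) error marker.
# The log path defaults to /var/log/Xorg.0.log (an assumption; pass
# another path as the first argument if your log lives elsewhere).
xorg_errors() {
    grep -F '(EE)' "${1:-/var/log/Xorg.0.log}"
}

# Usage (run on the affected machine):
#   xorg_errors
#   xorg_errors /var/log/Xorg.0.log.old
```

Note that the marker legend near the top of the log also contains (EE), so that line shows up in the output as well.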

Here is the startx log file (notice the FATAL line) from the command:

startx >& /tmp/startx.log

Notice the line:
FATAL: Error inserting nvidia (/lib/modules/2.6.32-358.18.1.el6.x86_64/weak-updates/nvidia/nvidia.ko): No such device

Yet, the file appears to exist:
Working on the error message:
FATAL: Error inserting nvidia (/lib/modules/2.6.32-358.18.1.el6.x86_64/weak-updates/nvidia/nvidia.ko): No such device

Since I was booted in Knoppix, I had to mount the Centos root partition to see if that file exists.

bootcommand: knoppix32 xmodule=nv acpi=off
root@Microknoppix:~# modprobe dm-mod
root@Microknoppix:~# vgchange -ay
==> 3 logical volume(s) in volume group "vg_burns" now active
root@Microknoppix:~# lvscan
==> ACTIVE '/dev/vg_burns/lv_root' [19.53 GiB] inherit
==> ACTIVE '/dev/vg_burns/lv_swap' [17.64 GiB] inherit
==> ACTIVE '/dev/vg_burns/lv_home' [81.58 GiB] inherit
root@Microknoppix:~# mkdir /mnt/root
root@Microknoppix:~# mount /dev/vg_burns/lv_root /mnt/root

And, now that Centos root is mounted on Knoppix, I can look for that file:
root@Microknoppix:~# file /mnt/root/lib/modules/2.6.32-358.18.1.el6.x86_64/weak-updates/nvidia/nvidia.ko
==> /mnt/root/lib/modules/2.6.32-358.18.1.el6.x86_64/weak-updates/nvidia/nvidia.ko: broken symbolic link to `/lib/modules/2.6.32-358.el6.x86_64/extra/nvidia/nvidia.ko'

Hmm… so the file exists, but it’s a link to a file that doesn’t exist (though that link may only be broken on Knoppix because I’m mounting the Centos file system on Knoppix in order to get a web browser to post here on Centos.org).
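One way to test that suspicion from the live CD is to resolve the link the way the installed system would, by prefixing the link target with the mount point. This is only a sketch; the helper name resolve_under_root is hypothetical, and /mnt/root is the mount point used above:

```shell
# Resolve a symlink as the installed system would see it, by prefixing
# an absolute link target with the mount point of that system's root.
# Prints the resolved path and returns 0 only if the target exists there.
resolve_under_root() {
    root=$1     # mount point of the CentOS root, e.g. /mnt/root
    link=$2     # path of the symlink as seen from inside that root
    target=$(readlink "$root$link") || return 1
    case $target in
        /*) resolved="$root$target" ;;                     # absolute target
        *)  resolved="$(dirname "$root$link")/$target" ;;  # relative target
    esac
    [ -e "$resolved" ] && printf '%s\n' "$resolved"
}

# Usage (from Knoppix, with the CentOS root mounted on /mnt/root):
#   resolve_under_root /mnt/root \
#     /lib/modules/2.6.32-358.18.1.el6.x86_64/weak-updates/nvidia/nvidia.ko
```

If this prints a path, the link is only "broken" from the live CD's point of view, because its absolute target is being looked up in the Knoppix root instead of under /mnt/root.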

Looking for that file, it seems to exist:
root@Microknoppix:~# updatedb

$ locate nvidia.ko

$ file /mnt/root/lib/modules/2.6.32-358.el6.x86_64/extra/nvidia/nvidia.ko
=> /mnt/root/lib/modules/2.6.32-358.el6.x86_64/extra/nvidia/nvidia.ko: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), BuildID[sha1]=0x122333060b52d17eb43c90b400de7c7cb6efacf5, not stripped

Somewhere in all this data must be what is going on with the Nvidia drivers.

Here is more diagnostic information, if it helps:

# nvidia-smi
Thu Oct 10 21:48:38 2013
| NVIDIA-SMI 5.325.15 Driver Version: 325.15 |
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| 0 Quadro FX 880M Off | 0000:01:00.0 N/A | N/A |
| N/A 44C N/A N/A / N/A | 2MB / 1023MB | N/A Default |

| Compute processes: GPU Memory |
| GPU PID Process name Usage |
| 0 Not Supported |

yum --enablerepo elrepo install nvidia-detect

which nvidia-detect

=> /usr/bin/nvidia-detect


=> Probing for supported NVIDIA devices…
=> [10de:0a3c] NVIDIA Corporation GT216 [Quadro FX 880M]
=> This device requires the current 325.15 NVIDIA driver (kmod-nvidia).

In an attempt to re-install the Nvidia drivers, I ran at init 3:

yum --enablerepo elrepo install kmod-nvidia

which also installed the dependency “nvidia-x11-drv.x86_64 0:325.15-1”

But, the laptop refuses to boot with these Nvidia drivers.
Note: It boots fine to the command line; and it boots just fine to Knoppix using:
knoppix64 xmodule=nv acpi=off

I just booted to init 3 in Centos, and the symbolic link exists as does the file it points to:

file /lib/modules/2.6.32-358.18.1.el6.x86_64/weak-updates/nvidia/nvidia.ko

symbolic link to `/lib/modules/2.6.32-358.el6.x86_64/extra/nvidia/nvidia.ko'

ls -l /lib/modules/2.6.32-358.el6.x86_64/extra/nvidia/nvidia.ko

-rw-r--r--. 1 root root 16518370 Aug 6 08:17 /lib/modules/2.6.32-358.el6.x86_64/extra/nvidia/nvidia.ko

file !$

/lib/modules/2.6.32-358.el6.x86_64/extra/nvidia/nvidia.ko: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped

The problem with youtube videos is actually a bug in Adobe Flashplayer so your criticism is off the mark.

Secondly, this forum has a locked thread which asks you to submit an nvidia-bug-report file if you want to get any help from NVIDIA developers and forum users here.

Thirdly, it really helps when your bug reports are concise and not spread over several messages. There’s a button which allows you to edit your posts.

Thanks for that advice. I’ll simply remove the comment as it’s not really germane to the current issue anyway.

I did submit the file to the email address suggested in the output of the file.

OK. I have to run to a meeting at work, but after work tonight (or maybe tomorrow morning), I’ll edit out all but the last message, which is effectively a question of how to put the correct nvidia driver on a Centos 6.4 laptop from the command line with networking.

I tried what I thought was the correct approach - but it failed to boot to the graphical environment.

“No such device” doesn’t mean the file was missing. It means the module didn’t detect an NVIDIA graphics card. Can you post the output of lspci -nn and anything in dmesg that’s NVIDIA related?
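Something along these lines should collect both, as a rough sketch. The helper name nvidia_only is made up here; 10de is the NVIDIA PCI vendor ID (as in the nvidia-detect output earlier in the thread), and NVRM is the prefix the NVIDIA kernel module uses for its kernel log messages:

```shell
# Keep only NVIDIA-related lines from whatever is piped in:
# the vendor name, the NVRM kernel-message prefix, or PCI vendor ID 10de.
nvidia_only() {
    grep -i -E 'nvidia|nvrm|10de'
}

# PCI devices with numeric [vendor:device] IDs, filtered to NVIDIA ones.
if command -v lspci >/dev/null 2>&1; then
    lspci -nn | nvidia_only || echo "lspci lists no NVIDIA device"
fi

# Kernel messages mentioning the driver (reading dmesg may need root).
if command -v dmesg >/dev/null 2>&1; then
    dmesg 2>/dev/null | nvidia_only || echo "no NVIDIA lines in dmesg"
fi
```

If lspci shows no NVIDIA device at all, the kernel cannot see the card, which would be consistent with the “No such device” error from the module.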