The computer had been idle for a few hours. When I returned, I noticed unusually loud fan noise and turned on the monitor, but it received no signal. Assuming the computer had crashed, I restarted it by holding down the power button, but on startup the monitor still showed no picture, even in the BIOS, and the fan noise remained. After I connected the monitor to the onboard (motherboard) VGA port, everything except the GPU worked.
kern.log and syslog contained the following messages, logged about an hour before I returned to the computer:
kern.log:Aug 17 12:39:04 jg-desktop kernel: [433566.176567] NVRM: GPU at 0000:01:00.0 has fallen off the bus.
kern.log:Aug 17 12:39:04 jg-desktop kernel: [433566.176578] NVRM: GPU at 0000:01:00.0 has fallen off the bus.
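For anyone who wants to check their own logs for the same event, here is a minimal sketch that scans log lines for this NVRM message and pulls out the kernel uptime stamp and the PCI address. The function name `find_bus_drops` is made up for illustration, and the pattern only assumes the message format shown in the two lines above.

```python
import re

# Matches the kernel uptime stamp and the PCI address in the NVRM message.
# The format is assumed from the two log lines quoted in this thread.
BUS_DROP = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+NVRM: GPU at "
    r"(?P<pci>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]) "
    r"has fallen off the bus"
)


def find_bus_drops(lines):
    """Return (uptime_seconds, pci_address) for each matching log line."""
    hits = []
    for line in lines:
        match = BUS_DROP.search(line)
        if match:
            hits.append((float(match.group("uptime")), match.group("pci")))
    return hits


# The two lines from kern.log quoted above.
sample = [
    "kern.log:Aug 17 12:39:04 jg-desktop kernel: [433566.176567] "
    "NVRM: GPU at 0000:01:00.0 has fallen off the bus.",
    "kern.log:Aug 17 12:39:04 jg-desktop kernel: [433566.176578] "
    "NVRM: GPU at 0000:01:00.0 has fallen off the bus.",
]

for uptime, pci in find_bus_drops(sample):
    print(f"GPU {pci} dropped off the bus at uptime {uptime:.6f} s")
```

To run it against a real log, pass an open file object instead of `sample`; the function only iterates over lines.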
I really need a solution here. I didn’t buy a GPU so I could use the integrated graphics. Is this a GPU hardware issue? A faulty PSU? A BIOS issue? I doubt this is a driver problem, since the GPU doesn’t work even during boot, before the OS is loaded. Is there any extra information I could provide? The GPU worked fine for several months prior to this.
Given that it was working fine and now doesn’t (and assuming the software configuration did not change), it sounds like a faulty GPU, especially since you mentioned that it doesn’t work before the OS is loaded. The motherboard’s PCI-E slot could also have become defective, but you’ll need a known-good card to test that… I’d go with a bad GPU.
Before replacing the card, try opening up the case, cleaning it out thoroughly, and carefully reseating the GPU in the PCI-E slot. Nearby cables (SATA, ATX power, etc.), if routed poorly or ad hoc to fit a small, crowded case, can push on the GPU and disrupt its connection to the slot. You could very well be dealing with a card that has overheated or otherwise reached a point of no return, but it is worth opening the case and making sure everything is clean and seated properly.
It’s not a HW bug; it’s a bug in the nvidia driver. Since one of the recent nvidia driver updates, my openSUSE 12.3 system also shows this error, and I’ve already seen several similar bug reports in other forums.
My OS is still alive after the driver crash. The crash message is also “NVRM: GPU at 0000:01:00.0 has fallen off the bus”. Before that, this message occurs:
[ 22.003751] NVRM: Your system is not currently configured to drive a VGA console
[ 22.003762] NVRM: on the primary VGA device. The NVIDIA Linux graphics driver
[ 22.003769] NVRM: requires the use of a text-mode VGA console. Use of other console
[ 22.003775] NVRM: drivers including, but not limited to, vesafb, may result in
[ 22.003780] NVRM: corruption and stability problems, and is not supported.
Could this have something to do with the driver crash? How can I configure the system to use a VGA console?
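One way to see which console framebuffer driver, if any, is registered is to read /proc/fb, which lists one device per line as an index followed by the driver's id string. The sketch below only parses that text. The helper name `parse_proc_fb` and the sample content are made up for illustration, and reading a non-empty list as "a framebuffer console driver such as vesafb is active" is my assumption, not something the warning itself confirms.

```python
def parse_proc_fb(text):
    """Parse the content of /proc/fb into (index, driver_id) pairs."""
    devices = []
    for line in text.splitlines():
        # Each line looks like "<index> <driver id string>".
        parts = line.strip().split(None, 1)
        if len(parts) == 2 and parts[0].isdigit():
            devices.append((int(parts[0]), parts[1]))
    return devices


# Hypothetical content; on a real system use open("/proc/fb").read().
sample = "0 VESA VGA\n"

registered = parse_proc_fb(sample)
if registered:
    for index, driver_id in registered:
        print(f"fb{index}: {driver_id}")
else:
    print("No framebuffer devices registered")
```

If a framebuffer driver does show up, the usual next step is to look at the kernel command line (/proc/cmdline) for a video mode option, but which option applies depends on the distribution and bootloader.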
GTX 650 Ti 2GB (GV-N65TOC-2GI)
The following drivers were installed at the time; I think the experimental one was in use, but I’m not completely sure:
nvidia-experimental-310 310.14-0ubuntu0.3
nvidia-current-updates 304.88-0ubuntu0.0.1
Both have since been updated to:
nvidia-experimental-310 319.32-0ubuntu0.0.1
nvidia-current-updates 304.88-0ubuntu0.0.3
I have had many problems with newer drivers. In my experience, the 670 cards run stably with the 304.88 driver but not with later drivers. Can you install that one and test it while using VGA output from the integrated graphics? What PCIe gen are you using? The mods have made some murky statements about PCIe gen 3 not working properly, but I have seen no documentation of this. Try setting gen 2 in the BIOS after you install the 304.88 driver.
Finally, does BIOS recognize the card? It should show up in your PCI devices.
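Once the system is booted from the integrated graphics, the same question can be asked of the kernel. A rough sketch, assuming a Linux system with sysfs mounted: walk /sys/bus/pci/devices and report every device whose vendor ID is 0x10de (NVIDIA). The `current_link_speed` attribute is only exposed by newer kernels, so it is treated as optional here and will likely be missing on the kernel versions mentioned in this thread. The function names are made up for illustration.

```python
import os

NVIDIA_VENDOR_ID = "0x10de"  # PCI vendor ID assigned to NVIDIA


def read_attr(dev_dir, name):
    """Return a stripped sysfs attribute, or None if it cannot be read."""
    try:
        with open(os.path.join(dev_dir, name)) as handle:
            return handle.read().strip()
    except OSError:
        return None


def find_nvidia_devices(sysfs_root="/sys/bus/pci/devices"):
    """List (pci_address, link_speed_or_None) for NVIDIA PCI devices."""
    found = []
    if not os.path.isdir(sysfs_root):
        return found
    for addr in sorted(os.listdir(sysfs_root)):
        dev_dir = os.path.join(sysfs_root, addr)
        vendor = read_attr(dev_dir, "vendor")
        if vendor is not None and vendor.lower() == NVIDIA_VENDOR_ID:
            # Optional attribute: absent on older kernels (assumption).
            speed = read_attr(dev_dir, "current_link_speed")
            found.append((addr, speed))
    return found


devices = find_nvidia_devices()
if devices:
    for addr, speed in devices:
        print(f"{addr}: NVIDIA device, link speed: {speed or 'unknown'}")
else:
    print("No NVIDIA PCI device visible to the kernel")
```

If nothing is listed even though the card is installed, the card is not enumerating on the bus at all, which points away from the driver.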
I’m also not using kernel 3.10 but 3.7.10; my GPU is a GeForce 8400 GS (G98), and the nvidia driver is version 319.32. I don’t want to post it to a public forum, but if anybody is interested, I can send the nvidia-bug-report.log.gz file created after the driver crash.
In the meantime, I’ve found a simple but ugly workaround. I’ve noticed that the driver crash doesn’t occur again after the system is rebooted. So now, after turning on the PC, I reboot the system (without cycling the power) immediately after the OS has booted and while the X server is still running. Today the system has already been running for more than 3 hours without problems. It seems that something must be initialized twice to run stably.
Correction: one reboot isn’t always sufficient. Today the system ran stably only after the second reboot. So, at least in my case, it seems to be a random problem, maybe some uninitialized variable or register.