Reproducible: NVRM: GPU at 0000:01:00.0 has fallen off the bus. -- Both screens black, Xorg at 100%

My laptop has an external monitor attached: ViewSonic VX2439 as DFP-1

The screens both go black simultaneously. Often this seems to be triggered
by the use of the scroll button on the mouse. I am not able to reproduce the
issue at will. Unlike other reports, I do not see the mouse cursor after the
problem occurs. The system does not respond to Ctrl-Alt-Fnum, so I am unable
to debug from a console vTTY. I am able to SSH into the system afterward.

When I upgraded to Ubuntu 12.04, I was using the “current” drivers from Canonical,
i.e. 295.40. However, those did not support my external monitor well, so I switched
to the 310.32 Linux drivers from NVIDIA. That is when I first encountered this
issue with both screens going blank. At that point, I decided to drop back to the
“experimental” nvidia drivers from Canonical, 310.14, where it stands today. I am still
seeing the problem on 310.14.

The Xorg log file shows:

[ 60.812] (EE) NVIDIA(GPU-0): Failed detecting connected display devices
[ 68.880] [mi] EQ overflowing. Additional events will be discarded until existing events are processed.
[ 68.880]
Backtrace:
[ 68.968] 0: /usr/bin/X (xorg_backtrace+0x26) [0x7ff5441a59e6]
[ 68.968] 1: /usr/bin/X (mieqEnqueue+0x263) [0x7ff5441860c3]
[ 68.968] 2: /usr/bin/X (0x7ff54401d000+0x62a34) [0x7ff54407fa34]
[ 68.968] 3: /usr/lib/xorg/modules/input/evdev_drv.so (0x7ff53c576000+0x5d88) [0x7ff53c57bd88]
[ 68.968] 4: /usr/bin/X (0x7ff54401d000+0x8af37) [0x7ff5440a7f37]
[ 68.968] 5: /usr/bin/X (0x7ff54401d000+0xb0d3a) [0x7ff5440cdd3a]
[ 68.968] 6: /lib/x86_64-linux-gnu/libpthread.so.0 (0x7ff543343000+0xfcb0) [0x7ff543352cb0]
[ 68.968] 7: (vdso) (0x7fff4af8f000+0x7dc) [0x7fff4af8f7dc]
[ 68.968] 8: (vdso) (__vdso_gettimeofday+0x2b) [0x7fff4af8fa1b]
[ 68.968] 9: /usr/lib/x86_64-linux-gnu/xorg/extra-modules/nvidia_drv.so (0x7ff53ce3b000+0xed1fe) [0x7ff53cf281fe]
[ 68.968] 10: /usr/lib/x86_64-linux-gnu/xorg/extra-modules/nvidia_drv.so (0x7ff53ce3b000+0x7c1ae) [0x7ff53ceb71ae]
[ 68.968] 11: /usr/lib/x86_64-linux-gnu/xorg/extra-modules/nvidia_drv.so (0x7ff53ce3b000+0xf3ce6) [0x7ff53cf2ece6]
[ 68.968] 12: /usr/lib/x86_64-linux-gnu/xorg/extra-modules/nvidia_drv.so (0x7ff53ce3b000+0x4a0952) [0x7ff53d2db952]
[ 68.968] 13: /usr/lib/x86_64-linux-gnu/xorg/extra-modules/nvidia_drv.so (0x7ff53ce3b000+0x4a0bcb) [0x7ff53d2dbbcb]
[ 68.968] 14: /usr/lib/x86_64-linux-gnu/xorg/extra-modules/nvidia_drv.so (0x7ff53ce3b000+0x49d03a) [0x7ff53d2d803a]
[ 68.968] 15: /usr/lib/x86_64-linux-gnu/xorg/extra-modules/nvidia_drv.so (0x7ff53ce3b000+0x49c584) [0x7ff53d2d7584]
[ 68.968] 16: /usr/bin/X (0x7ff54401d000+0xcd011) [0x7ff5440ea011]
[ 68.968] 17: /usr/lib/x86_64-linux-gnu/xorg/extra-modules/nvidia_drv.so (0x7ff53ce3b000+0x476b07) [0x7ff53d2b1b07]
[ 68.968] 18: /usr/lib/x86_64-linux-gnu/xorg/extra-modules/nvidia_drv.so (0x7ff53ce3b000+0xb6800) [0x7ff53cef1800]
[ 68.968] 19: /usr/lib/x86_64-linux-gnu/xorg/extra-modules/nvidia_drv.so (0x7ff53ce3b000+0xb6d95) [0x7ff53cef1d95]
[ 68.968] 20: /usr/bin/X (xf86Wakeup+0x192) [0x7ff5440a86f2]
[ 68.968] 21: /usr/bin/X (WakeupHandler+0x6b) [0x7ff54406f7eb]
[ 68.968] 22: /usr/bin/X (WaitForSomething+0x1b6) [0x7ff5441a2de6]
[ 68.968] 23: /usr/bin/X (0x7ff54401d000+0x4e5f2) [0x7ff54406b5f2]
[ 68.968] 24: /usr/bin/X (0x7ff54401d000+0x3d7ba) [0x7ff54405a7ba]
[ 68.968] 25: /lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main+0xed) [0x7ff5421d476d]
[ 68.968] 26: /usr/bin/X (0x7ff54401d000+0x3daad) [0x7ff54405aaad]
[ 68.968] [mi] These backtraces from mieqEnqueue may point to a culprit higher up the stack.
[ 68.968] [mi] mieq is NOT the cause. It is a victim.
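
As the last two lines say, the frames above mieqEnqueue are where to look. To see at a glance which modules dominate such a backtrace, here is a minimal shell sketch. The function name xorg_backtrace_modules is hypothetical (not an X.Org or NVIDIA tool), and it assumes frame lines keep the "[timestamp] N: /path/module (...)" shape shown above.

```shell
# Hypothetical helper: count backtrace frames per module in Xorg log text
# read from stdin, most frequent first.
# Assumes frame lines look like:
#   [ 68.968] 9: /usr/lib/.../nvidia_drv.so (0x7ff53ce3b000+0xed1fe) [0x...]
xorg_backtrace_modules() {
    # Keep only the module path from each numbered frame line.
    sed -n 's/^\[ *[0-9.]*\] *[0-9][0-9]*: \([^ ]*\) .*/\1/p' |
        # Strip the directory, leaving e.g. "nvidia_drv.so" or "X".
        sed 's|.*/||' |
        sort | uniq -c | sort -rn |
        # Turn uniq's padded "     12 X" into "12 X".
        awk '{ print $1, $2 }'
}
```

Usage: xorg_backtrace_modules < /var/log/Xorg.0.log. For the single backtrace quoted above, /usr/bin/X accounts for 12 of the 27 frames and nvidia_drv.so for 10.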

nvidia-installer.log (2.46 KB)

More information…

My GPU is a Quadro FX 2800M with VBIOS 62.92.a2.00.09.

My problem seems most like this one.

https://devtalk.nvidia.com/default/topic/535519/linux/x-hangs-using-100-cpu-wait-and-mieq-overflowing-errors-in-logs/

nvidia-bug-report.log.gz and nvidia-installer.log attached to first post…

This “topic” was originally posted on 2013-May-28, making it right at 14 days old now.
No reply… This is in spite of having done my level best to follow Aaron’s instructions
for constructively reporting an issue.

Apparently there are three other topics and four other forum users with a very
similar behavior pattern and/or Xorg.0.log entries.

Ahktenzero
https://devtalk.nvidia.com/default/topic/535519/linux/x-hangs-using-100-cpu-wait-and-mieq-overflowing-errors-in-logs/

Khertan and Franster
https://devtalk.nvidia.com/default/topic/524502/linux/frequent-freeze-crash-of-xorg-with-drivers-310-19-with-gts-250-on-3-2-0-4-amd64/

KenPDX
https://devtalk.nvidia.com/default/topic/534892/linux/x-freeze-with-eq-overflows/

So, if five of us on varying hardware bothered to log in and report the issue, how many others
are searching for answers (without the help of the forum search widget) and not finding any?

Regards,

Cryptor

Xorg hang/freeze yesterday with the same mieq overflow reported in Xorg.0.log.

I have dropped back to nvidia 304.88, which is now the recommended nvidia driver
for Ubuntu 12.04. Not very hopeful this will be a fix because ahktenzero reported
going back to 304.64 and still having the problem.

Post #5 here:
https://devtalk.nvidia.com/default/topic/535519/linux/x-hangs-using-100-cpu-wait-and-mieq-overflowing-errors-in-logs/

This is in spite of the 304.51 release notes:

“Fixed a bug that caused the X server to sometimes hang in response to input events.”
http://www.nvidia.com/object/linux-display-amd64-304.51-driver

Sounds a lot like the mieq overflow issue, but apparently not the same.

Cryptor

Approx. 2 weeks later. So far no hang (no black screens) on 304.88…

I should also mention that I switched to a much simpler “xorg.conf” two weeks ago.
The previous xorg.conf referred to DFP-0 and DFP-1 with layout settings. I renamed that
one, booted into failsafe graphics and then ran the nvidia settings to generate a new,
default xorg.conf. This was after switching to the recommended 304.88 driver (via
System Settings – Additional Drivers).

Here is the new xorg.conf that has been stable with 304.88 for two weeks on my box.

Section "Screen"
    Identifier "Default Screen"
    DefaultDepth 24
EndSection

Section "Module"
    Load "glx"
EndSection

Section "Device"
    Identifier "Default Device"
    Driver "nvidia"
    Option "NoLogo" "True"
EndSection

From the looks of it, the Xorg server now has to find the screens and determine
the layout on its own or through the nvidia driver. That does not seem to be related to the
problem at hand…

Cryptor

As expected, the Xorg hang with both screens black is still present with NVIDIA drivers 304.88.
Latest nvidia-bug-report.log.gz attached…

Cryptor


@Cryptor - I see you’re on Ubuntu, I originally posted a bug for this on Launchpad back in November 2012 but no joy there either…

https://bugs.launchpad.net/ubuntu/+source/nvidia-graphics-drivers/+bug/1077616

After switching to 304.88, which is currently [Recommended] for Ubuntu 12.04, I have
not seen many “EQ overflowing” type freezes. My impression is that either not running
VMWare Workstation or keeping it maximized on a separate virtual desktop improves the
odds. Of course, that is not a scientific observation in any way, shape or form.

However, a recent kernel update rendered the system fairly unstable. Generally, I
would get both screens black either just after login or within an hour or so. I was
still able to SSH into the system and run terminal commands, but there was nothing
on the displays and no access to the console pseudo-ttys. In this situation, Xorg.0.log
did not show the “EQ overflowing” error. It seems to show no error until after the
problem has occurred.

[ 76.074] (**) NVIDIA(0): device ViewSonic VX2439 Series (DFP-1) (Using EDID
[ 76.074] (**) NVIDIA(0): frequencies has been enabled on all display devices.)
[ 8654.964] (WW) NVIDIA(0): WAIT (2, 6, 0x8000, 0xdfff2fff, 0x0000f518)
[ 8661.964] (WW) NVIDIA(0): WAIT (1, 6, 0x8000, 0xdfff2fff, 0x0000f518)
[ 8661.964] (EE) NVIDIA(GPU-0): Failed detecting connected display devices
[ 8672.967] (WW) NVIDIA(0): WAIT (2, 7, 0x8000, 0xdfff2fff, 0x0000f570)

I’ll try to attach the latest “nvidia-bug-report.log.gz”…

A little searching led me to this thread.

http://ubuntuforums.org/showthread.php?t=2165400

It turns out that the scripts to relink the NVIDIA drivers with the new kernel are not
quite bulletproof.

So, sometimes the following can help after a kernel upgrade.

$ sudo dpkg-reconfigure nvidia-current-updates

or

$ sudo dpkg-reconfigure nvidia-current
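
Before reconfiguring, it may be worth confirming whether the module on disk really was built for the running kernel. A minimal sketch, assuming modinfo can find the module by name; on Ubuntu 12.04 the packaged module may be called nvidia_current or nvidia_current_updates rather than nvidia, so the name is a parameter. Both function names are hypothetical.

```shell
# Hypothetical helper: succeed if a vermagic string was built for the given
# kernel release.
#   $1 = running kernel release (e.g. output of "uname -r")
#   $2 = vermagic string (e.g. output of "modinfo -F vermagic <module>")
vermagic_matches() {
    # The first space-separated field of vermagic is the kernel release.
    built_for=${2%% *}
    [ "$built_for" = "$1" ]
}

# Hypothetical wrapper: compare an installed module against "uname -r".
# The module name is an assumption; pass the one your package installs.
check_nvidia_module() {
    module=${1:-nvidia}
    vermagic=$(modinfo -F vermagic "$module" 2>/dev/null) || {
        echo "modinfo found no module named $module" >&2
        return 2
    }
    running=$(uname -r)
    if vermagic_matches "$running" "$vermagic"; then
        echo "$module was built for the running kernel ($running)"
    else
        echo "$module was built for ${vermagic%% *}, but $running is running;" \
             "a dpkg-reconfigure as above may rebuild it" >&2
        return 1
    fi
}
```

Usage: check_nvidia_module nvidia_current_updates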

I hope that helps someone with a system that worked (better) before they ran software
updates…

Cryptor

nvidia-bug-report.log.gz (67.8 KB)

The saga continues…

After “sudo dpkg-reconfigure nvidia-current-updates” brought some semblance
of stability back yesterday (see previous), I now seem to be getting “[mi] EQ overflowing”
again.

Today I found out that the error:

[mi] EQ overflowing

can be caused by many, many issues in the Xorg server. That being the case,
the root cause for today’s crash could conceivably be different from the earlier
crashes.

So it seems that my latest flavor of the problem shows up in “dmesg” or /var/log/syslog as:

[99101.294734] NVRM: GPU at 0000:01:00.0 has fallen off the bus.
[99101.294742] NVRM: GPU at 0000:01:00.0 has fallen off the bus.
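
When digging through a long dmesg or /var/log/syslog, a small filter can list which PCI address fell off the bus and how often it was reported. A sketch only: gpus_off_bus is a hypothetical name, and it assumes the NVRM message text stays exactly as quoted above.

```shell
# Hypothetical filter: for kernel log text on stdin (dmesg or syslog), print
# each PCI address NVRM reported as "fallen off the bus", followed by how
# many times that message appears.
gpus_off_bus() {
    sed -n 's/.*NVRM: GPU at \([0-9a-fA-F:.]*\) has fallen off the bus.*/\1/p' |
        sort | uniq -c |
        # uniq -c prints "count address"; swap to "address count".
        awk '{ print $2, $1 }'
}
```

Usage: dmesg | gpus_off_bus, or gpus_off_bus < /var/log/syslog. For the two lines above it prints "0000:01:00.0 2".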

There is discussion of a somewhat similar issue here:

https://devtalk.nvidia.com/default/topic/567297/linux/linux-3-10-driver-crash/1

However, I’m on Ubuntu rather than Arch, and my kernel is different:

Linux cryptor-m6500 3.2.0-51-generic #77-Ubuntu SMP Wed Jul 24 20:18:19 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

Here is another possibly related thread.

http://www.nvnews.net/vbulletin/showthread.php?p=2571522

However, they seem to be focused on games running under Wine, which is a different
workload from mine.

Frankly, given the various topics that match a search on this new error message,
it appears that this message may not map to a single root cause either.

As far as mitigation goes, I am going to enable persistence mode on the GPU and switch to
“Ubuntu 2D” rather than the 3D session. I believe that Ubuntu 2D will disable
“compiz”, which may provide a short-term workaround. My unscientific guess is that
it will mean less work for the GPU and possibly avoid the scenario that triggers the
root cause. [Waves hands in the air…]

Here is one of the links suggesting that persistence be enabled. I will caution
that several posts indicate that persistence did not help. YMMV

http://www.cyberciti.biz/faq/debian-ubuntu-rhel-fedora-linux-nvidia-nvrm-gpu-fallen-off-bus/
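
For reference, persistence mode is normally switched on as root with nvidia-smi -pm 1. Whether a given driver and GPU combination honors it is not guaranteed, so it is worth reading the state back. A minimal sketch: persistence_state is a hypothetical helper, and it assumes nvidia-smi -q prints a "Persistence Mode : Enabled" style line for each GPU.

```shell
# Hypothetical helper: print the persistence-mode state for each GPU found in
# "nvidia-smi -q" output on stdin, e.g. "Enabled" or "Disabled".
# Assumes lines of the form "    Persistence Mode                : Enabled".
persistence_state() {
    awk -F': *' '/Persistence Mode *:/ { print $2 }'
}
```

Usage: sudo nvidia-smi -pm 1, then nvidia-smi -q | persistence_state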

In summary:

  • still getting random Xorg crashes with both screens black
  • Xorg goes to 100% CPU
  • can still login via SSH afterward
  • still using ViewSonic VX2439 as a second, external monitor along with the LCD on the laptop
  • not running any games, mostly just VMWare Workstation, Chrome, Firefox and Evolution

I understand that there are later drivers published for the “Quadro FX 2800M”. I did try some
of those initially. However, I am still without any comment or reply or suggestion from
“sandipt” or “aplattner” or anyone else at NVIDIA.

Cryptor

Both screens black again.
This time I was scrolling an Xterm up and down with the scrollbar widget.
Seems like scrolling makes it more likely on my box.

At any rate, note this from dmesg and Xorg.0.log…

dmesg
[71950.826197] NVRM: GPU at 0000:01:00.0 has fallen off the bus.
[71950.826208] NVRM: GPU at 0000:01:00.0 has fallen off the bus.

/var/log/Xorg.0.log
[ 72072.630] (WW) NVIDIA(0): WAIT (2, 6, 0x8000, 0xdfff2fff, 0x0000e41c)
[ 72076.202] [mi] EQ overflowing. Additional events will be discarded until existing events are processed.
[ 72076.202]
Backtrace:
[ 72076.257] 0: /usr/bin/X (xorg_backtrace+0x26) [0x7f4df0d939e6]
[ 72076.257] 1: /usr/bin/X (mieqEnqueue+0x263) [0x7f4df0d740c3]

Cryptor

I seem to have stumbled on a fairly quick way to generate or reproduce the

“NVRM: GPU at 0000:01:00.0 has fallen off the bus.”

error on my system.

I login to Ubuntu (either 3D/compiz or Ubuntu 2D) and then open a “Gnome Terminal”.
On my system, this terminal has 50 lines.

$ echo $LINES
50

This terminal can be located on either my X screen primary display (external DFP-1) or on my
non-primary display (internal LGD, DFP-0).

Now I generate some listings, usually about 1000 lines or so.

$ ls -alt ~
$ ls -alt /
$ ls -alt /usr/lib

At this point, I have a scrollbar widget that pans back and forth through the listings.
If I scroll rapidly back and forth through these listings (by dragging the scroller widget
vigorously up and down) in the gnome terminal window for 3 or 4 minutes, I will always get
the black out and frozen X session. Sometimes it happens in as little as 30 seconds, but
usually it takes a couple minutes.
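
For a repeatable, hands-off version of this test, the scrolling can be scripted. A rough sketch, assuming xdotool is installed: it generates mouse-wheel events (buttons 4 and 5) rather than dragging the scrollbar, so it only approximates the manual procedure. scroll_stress and SCROLL_CMD are hypothetical names; setting SCROLL_CMD=echo gives a dry run.

```shell
# Hypothetical stress loop: alternate bursts of wheel-up (button 4) and
# wheel-down (button 5) clicks at the current pointer position.
#   $1 = number of up/down rounds (default 100)
#   $2 = wheel clicks per burst (default 40)
# SCROLL_CMD defaults to xdotool; override it (e.g. with echo) for a dry run.
scroll_stress() {
    rounds=${1:-100}
    burst=${2:-40}
    cmd=${SCROLL_CMD:-xdotool}
    i=0
    while [ "$i" -lt "$rounds" ]; do
        $cmd click --repeat "$burst" --delay 10 4   # wheel up
        $cmd click --repeat "$burst" --delay 10 5   # wheel down
        i=$((i + 1))
    done
}
```

Usage: generate the listings, leave the mouse pointer resting over the gnome-terminal window, then run scroll_stress 200 40 (from an SSH session, set DISPLAY=:0 first).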

BTW, I have enabled persistence as recommended elsewhere and I have been using Ubuntu 2D. Still
the problem persists on my M6500 laptop with Quadro FX 2800M GPU.

Ubuntu 12.04
Linux box 3.2.0-51-generic #77-Ubuntu SMP Wed Jul 24 20:18:19 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
NVIDIA Driver Version: 304.88 [Additional Drivers: version current-updates]
Quadro FX 2800M (GPU 0)
Two displays: ViewSonic VX2439 Series (DFP-1), LGD (DFP-0)

nvidia-bug-report-2012Aug12-r01.log.gz (76.7 KB)

GPU fell off the bus again today right after a cold boot. Temp does not seem to be the
problem because I checked and it was 37 C just before I ran the test. Both screens went
black within 5 seconds that time.

Typically, the GPU temp is around 51 C, which is one bar into the yellow region on the
NVIDIA settings widget.
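
A spot reading before the test only tells part of the story, so it may help to log the temperature continuously to disk and read the last value before a blackout afterward over SSH. A sketch: log_gpu_temp and TEMP_CMD are hypothetical names, and the default reader assumes nvidia-settings -q GPUCoreTemp -t prints the temperature as a bare number (it needs a DISPLAY, e.g. DISPLAY=:0 when started over SSH).

```shell
# Hypothetical logger: append "epoch-seconds temperature" lines to a file.
#   $1 = log file
#   $2 = seconds between samples (default 5)
#   $3 = number of samples, 0 = run until interrupted (default 0)
# TEMP_CMD overrides the command that prints the temperature.
log_gpu_temp() {
    logfile=$1
    interval=${2:-5}
    samples=${3:-0}
    n=0
    while :; do
        t=$(${TEMP_CMD:-nvidia-settings -q GPUCoreTemp -t} 2>/dev/null | head -n 1)
        # Record "unavailable" if the query fails, e.g. once the GPU is gone.
        echo "$(date +%s) ${t:-unavailable}" >> "$logfile"
        n=$((n + 1))
        if [ "$samples" -gt 0 ] && [ "$n" -ge "$samples" ]; then
            break
        fi
        sleep "$interval"
    done
}
```

Usage: DISPLAY=:0 log_gpu_temp ~/gputemp.log 5 & and, after a blackout, tail -n 3 ~/gputemp.log over SSH.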

I also have “UseEvents” set to “false” now, and that is not preventing the issue either.

Section "Device"
    Identifier "Device0"
    Option "UseEvents" "false"
EndSection

So, does anyone have any idea what does cause the GPU to fall off the bus?

Or does anyone have any suggestions for what logs to capture that might illuminate
the root cause?
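
One low-tech option is to grab everything in one go over SSH right after the blackout, before rebooting. A sketch: collect_blackout_logs and XORG_LOG are hypothetical names, and the default Xorg log path is the one quoted earlier in this thread.

```shell
# Hypothetical collector, meant to be run over SSH right after a blackout and
# before rebooting. Copies the kernel log, the Xorg log and a snapshot of the
# busiest processes into one directory, then prints that directory's path.
#   $1 = destination directory (default: timestamped dir under $HOME)
# XORG_LOG overrides the Xorg log path.
collect_blackout_logs() {
    dest=${1:-$HOME/blackout-$(date +%Y%m%d-%H%M%S)}
    xorg_log=${XORG_LOG:-/var/log/Xorg.0.log}
    mkdir -p "$dest" || return 1
    # Kernel messages, including any "NVRM: ... fallen off the bus" lines.
    dmesg > "$dest/dmesg.txt" 2>&1
    # The Xorg log with the WAIT / EQ overflowing entries, if readable.
    cp "$xorg_log" "$dest/" 2>/dev/null || echo "could not copy $xorg_log" >&2
    # Xorg at 100% CPU is one of the symptoms; record the top consumers.
    ps -eo pid,pcpu,etime,comm --sort=-pcpu 2>/dev/null | head -n 15 > "$dest/top-cpu.txt"
    echo "$dest"
}
```

After that, sudo nvidia-bug-report.sh can be run from the same SSH session, as in the earlier posts.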

Switched to the 310.14 NVIDIA drivers. Same behavior.

Interestingly enough, it only happens with gnome-terminal. Running xterm does not
seem to produce the issue. Of course, scrolling is more difficult in xterm and I have
not spent much time testing there yet, but it is easily reproducible with gnome-terminal
on Quadro FX 2800M.

This looks very similar to this issue, which is labeled as NVIDIA bug “973068”:

http://www.nvnews.net/vbulletin/showthread.php?t=174759

According to “sandipt”:

We are reproduced this issue in house and investigating. Bug is :973068 Fedora 17: X freeze with flash HD video on firefox/chrome with NVRM: GPU at 0000:01:00.0 has fallen off the bus

However, I can find no further mention of bug “973068” in the release notes.

Still getting exactly the same bug on 64-bit Ubuntu 13.10 with GNOME Shell (stable packages only, nothing from “testing” or “unstable”) and the latest 319.60 driver. The problem can occur whilst using Firefox, Opera, gedit, LibreOffice Writer or Totem (“Videos”) - so it isn’t caused by a particular application.

I get this error as well, only my computer also locks up completely after the screens go black one at a time… This happens to me whenever I play a game requiring 3D acceleration. It only seems to happen if I turn on SLI=“mosaic”. I can run this computer for months without any problem using TwinView and play as many games as I want to, but as soon as I enable SLI=“mosaic” and open a game… lock up.
I can’t ssh to this box to run any kind of script or get it to respond in any sort of way aside from pushing and holding the power button.

hardware details

lspci
00:00.0 Host bridge: NVIDIA Corporation C55 Host Bridge (rev a2)
00:00.1 RAM memory: NVIDIA Corporation C55 Memory Controller (rev a1)
00:00.2 RAM memory: NVIDIA Corporation C55 Memory Controller (rev a1)
00:00.3 RAM memory: NVIDIA Corporation C55 Memory Controller (rev a1)
00:00.4 RAM memory: NVIDIA Corporation C55 Memory Controller (rev a1)
00:00.5 RAM memory: NVIDIA Corporation C55 Memory Controller (rev a2)
00:00.6 RAM memory: NVIDIA Corporation C55 Memory Controller (rev a1)
00:00.7 RAM memory: NVIDIA Corporation C55 Memory Controller (rev a1)
00:01.0 RAM memory: NVIDIA Corporation C55 Memory Controller (rev a1)
00:01.1 RAM memory: NVIDIA Corporation C55 Memory Controller (rev a1)
00:01.2 RAM memory: NVIDIA Corporation C55 Memory Controller (rev a1)
00:01.3 RAM memory: NVIDIA Corporation C55 Memory Controller (rev a1)
00:01.4 RAM memory: NVIDIA Corporation C55 Memory Controller (rev a1)
00:01.5 RAM memory: NVIDIA Corporation C55 Memory Controller (rev a1)
00:01.6 RAM memory: NVIDIA Corporation C55 Memory Controller (rev a1)
00:02.0 RAM memory: NVIDIA Corporation C55 Memory Controller (rev a1)
00:02.1 RAM memory: NVIDIA Corporation C55 Memory Controller (rev a1)
00:02.2 RAM memory: NVIDIA Corporation C55 Memory Controller (rev a1)
00:03.0 PCI bridge: NVIDIA Corporation C55 PCI Express bridge (rev a1)
00:09.0 RAM memory: NVIDIA Corporation MCP55 Memory Controller (rev a2)
00:0a.0 ISA bridge: NVIDIA Corporation MCP55 LPC Bridge (rev a3)
00:0a.1 SMBus: NVIDIA Corporation MCP55 SMBus (rev a3)
00:0b.0 USB controller: NVIDIA Corporation MCP55 USB Controller (rev a1)
00:0b.1 USB controller: NVIDIA Corporation MCP55 USB Controller (rev a2)
00:0d.0 IDE interface: NVIDIA Corporation MCP55 IDE (rev a1)
00:0e.0 IDE interface: NVIDIA Corporation MCP55 SATA Controller (rev a3)
00:0e.1 IDE interface: NVIDIA Corporation MCP55 SATA Controller (rev a3)
00:0e.2 IDE interface: NVIDIA Corporation MCP55 SATA Controller (rev a3)
00:0f.0 PCI bridge: NVIDIA Corporation MCP55 PCI bridge (rev a2)
00:0f.1 Audio device: NVIDIA Corporation MCP55 High Definition Audio (rev a2)
00:11.0 Bridge: NVIDIA Corporation MCP55 Ethernet (rev a3)
00:12.0 Bridge: NVIDIA Corporation MCP55 Ethernet (rev a3)
00:13.0 PCI bridge: NVIDIA Corporation MCP55 PCI Express bridge (rev a3)
00:18.0 PCI bridge: NVIDIA Corporation MCP55 PCI Express bridge (rev a3)
01:00.0 PCI bridge: NVIDIA Corporation NF200 PCIe 2.0 switch (rev a2)
02:00.0 PCI bridge: NVIDIA Corporation NF200 PCIe 2.0 switch (rev a2)
02:02.0 PCI bridge: NVIDIA Corporation NF200 PCIe 2.0 switch (rev a2)
03:00.0 VGA compatible controller: NVIDIA Corporation G92 [GeForce GTS 250] (rev a2)
04:00.0 VGA compatible controller: NVIDIA Corporation G92 [GeForce GTS 250] (rev a2)
07:00.0 VGA compatible controller: NVIDIA Corporation G92 [GeForce GTS 250] (rev a2)

This is a Zotac 780i SLI Extreme motherboard
running nvidia-drivers 319.60 on Gentoo Linux
(32-bit PAE).

There are no overheating issues or problems with my hardware; everything works fine until I enable SLI.
This same setup works just fine in Windows 7 (dual boot), with no lockups.

The only errors I can find in the logs are:

Oct 24 04:31:22 maynard kernel: NVRM: GPU at 0000:03:00.0 has fallen off the bus.
Oct 24 04:31:26 maynard kernel: NVRM: GPU at 0000:04:00.0 has fallen off the bus.

Sometimes it’s two out of 3 cards, most times just one card.
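
With three identical cards, it helps to know which physical card each message refers to. NVRM prints the address with the PCI domain (0000:03:00.0), while plain lspci omits it (03:00.0). A sketch that bridges the two, assuming lspci output has been saved to a file beforehand; off_bus_devices is a hypothetical name.

```shell
# Hypothetical helper: read kernel log text on stdin and print the matching
# line from a saved lspci listing ($1) for every GPU that fell off the bus.
off_bus_devices() {
    lspci_file=$1
    sed -n 's/.*NVRM: GPU at \([0-9a-fA-F:.]*\) has fallen off the bus.*/\1/p' |
        sort -u |
        while read -r addr; do
            # Drop the 4-digit PCI domain ("0000:") to match lspci's format.
            short=${addr#????:}
            grep "^$short " "$lspci_file"
        done
}
```

Usage: lspci > /tmp/lspci.txt while the system is healthy, then feed the kernel log (its path depends on the syslog daemon in use) through off_bus_devices /tmp/lspci.txt.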

Kernel information

linux 3.10.7-gentoo-r1 #4 SMP PREEMPT Thu Oct 31 01:01:31 CDT 2013 i686 Intel® Core™2 Quad CPU Q6600 @ 2.40GHz GenuineIntel GNU/Linux

Alt+SysRq with r, s, c, i, u, b does nothing, and the keyboard is unresponsive (Num Lock and Caps Lock do nothing as well).
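
One thing worth ruling out is that the magic SysRq functions are simply disabled or restricted. The kernel.sysrq setting (/proc/sys/kernel/sysrq) is 0 for disabled, 1 for everything, or otherwise a bitmask; the bit values below follow the kernel's SysRq documentation. A sketch with a hypothetical function name.

```shell
# Hypothetical helper: list which SysRq functions from the r/s/i/u/b keys
# a kernel.sysrq value permits ('c', crash, is not decoded here).
# Bits: 4 keyboard control, 16 sync, 32 remount read-only,
#       64 signal processes, 128 reboot/poweroff.
sysrq_allows() {
    v=$1
    if [ "$v" -eq 1 ]; then
        echo "all"
        return 0
    fi
    [ $((v & 4)) -ne 0 ] && echo "keyboard control (r: unraw)"
    [ $((v & 16)) -ne 0 ] && echo "sync (s)"
    [ $((v & 32)) -ne 0 ] && echo "remount read-only (u)"
    [ $((v & 64)) -ne 0 ] && echo "signal processes (i)"
    [ $((v & 128)) -ne 0 ] && echo "reboot/poweroff (b)"
    return 0
}
```

Usage: sysrq_allows "$(cat /proc/sys/kernel/sysrq)". As root, sysctl -w kernel.sysrq=1 enables all functions for the current boot, provided the kernel was built with CONFIG_MAGIC_SYSRQ.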

I will provide any information needed to solve this problem, including hardware details, compiler flags, kernel configuration, build options, etc.