Arch Linux on nvidia 375.20-3 driver: mouse disappears, virtual consoles blank.

Hey All,

Running an up-to-date Arch Linux install on a System76 Oryx Pro [NVIDIA Corporation GP106M [GeForce GTX 1060] (rev a1)]. Seemingly at random, my mouse cursor disappears and my virtual consoles go blank. Everything still works: I can use the mouse if I am very careful and I can type in the virtual console, but the only fix I have found is restarting lightdm (or rebooting the machine). After the error, I see MANY entries in dmesg which look like:

NVRM: Xid (PCI:0000:01:00): 56, CMDre 00000001 000000c0 0001007c 00000007 00000000

with variations in the last 5 columns. And,

nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.

sprinkled in between. There are various posts floating around with similar problems. I tried:

  • Switching between virtual console and GUI
  • This section of ArchWiki (https://wiki.archlinux.org/index.php/NVIDIA#DRM_kernel_mode_setting)
  • Opening xterm and running some commands

Anyone have any suggestions?

Additional details: this error only occurs with an external monitor attached. The mouse is still visible on the main laptop screen, but moves extremely slowly. I think this is because every time it moves, more NVRM messages pile up in dmesg.
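
A minimal sketch, assuming the dmesg output has been saved as text, of how the NVRM messages could be tallied to see which Xid numbers dominate. The function name count_xids is hypothetical, and the second Xid line in the sample is a made-up variation of the one quoted above, included only for illustration.

```python
import re
from collections import Counter

# Matches lines such as:
#   NVRM: Xid (PCI:0000:01:00): 56, CMDre 00000001 000000c0 ...
XID_RE = re.compile(r"NVRM: Xid \((?P<bus>[^)]+)\): (?P<xid>\d+),")


def count_xids(log_text):
    """Return a Counter mapping (PCI bus id, Xid number) to occurrences."""
    counts = Counter()
    for line in log_text.splitlines():
        match = XID_RE.search(line)
        if match:
            counts[(match.group("bus"), int(match.group("xid")))] += 1
    return counts


# First and second lines are quoted from the post; the third is an
# invented variation in the trailing columns (assumption, for illustration).
sample = """\
NVRM: Xid (PCI:0000:01:00): 56, CMDre 00000001 000000c0 0001007c 00000007 00000000
nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.
NVRM: Xid (PCI:0000:01:00): 56, CMDre 00000001 000000c0 0001007c 00000005 00000000
"""

for (bus, xid), n in count_xids(sample).items():
    print(f"{bus}: Xid {xid} seen {n} time(s)")
```

On a live system the same function could be fed the text of the dmesg output instead of the sample string.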

Thanks for the report. I haven’t seen this, though I’m setting up a comparable configuration to see if I can trigger it.

Just to be clear, by “my virtual consoles go blank” do you mean a terminal emulator (such as xterm or gnome-terminal) run within the desktop is just a black window? Or, do you mean that after the mouse is no longer visible on the external monitor, you press, e.g., ctrl-alt-f1, and your monitor goes blank, instead of displaying a console? If the latter, does the console, when pressing ctrl-alt-f1, display correctly if you haven’t already lost your mouse?

For when this happens, is the “Seemingly randomly” while the system is idle and not receiving input (e.g., could this be correlated to DPMS or another power management event happening), or does this happen randomly while you are actively interacting with the system? If the former, does turning off dpms and friends (xset s off; xset s noblank; xset -dpms) avoid the problem? That isn’t a good workaround, but it will help to know if that is what is triggering the problem.

Lastly, do you know if this is a new problem with 375.20, or can you reproduce this with older drivers?

Could you attach an nvidia-bug-report.log?

Thanks.

Andy,

I will try to add as much information as I can.

Just to be clear, by “my virtual consoles go blank” do you mean a terminal emulator (such as xterm
or gnome-terminal) run within the desktop is just a black window? Or, do you mean that after the
mouse is no longer visible on the external monitor, you press, e.g., ctrl-alt-f1, and your monitor
goes blank, instead of displaying a console? If the latter, does the console, when pressing
ctrl-alt-f1, display correctly if you haven’t already lost your mouse?

If I am on the main laptop display only, I don’t lose my mouse (by “lose” I mean it still works but is invisible); that only happens on external displays. The issue happens without the external monitor attached too, but is much less obvious (which is why I mentioned that above). For example, as I was typing this I tested ctrl-alt-f1 and the screen was blank. I went to ctrl-alt-f2 and, although it was blank, I typed carefully to restart lightdm. And yes, after I restarted lightdm I could switch with ctrl-alt-f{1,2,7} etc.

For when this happens, is the “Seemingly randomly” while the system is idle and not receiving
input (e.g., could this be correlated to DPMS or another power management event happening), or
does this happen randomly while you are actively interacting with the system? If the former, does
turning off dpms and friends (xset s off; xset s noblank; xset -dpms) avoid the problem? That
isn’t a good workaround, but it will help to know if that is what is triggering the problem.

I think it is while I am actively using it. The best example I have is when I was giving a presentation on an external projector and it happened. At one point I did try xset dpms force off to restore my cursor, as suggested in https://forum.xfce.org/viewtopic.php?id=10695. I can try turning DPMS off more permanently for debugging.
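
A small sketch of how the three xset commands quoted above could be applied in one step, for example from a session startup script. The helper name disable_blanking and its dry_run parameter are hypothetical; it assumes xset is installed and that it runs inside the X session, and the settings only last until the X server restarts.

```python
import subprocess

# The commands quoted above:
#   xset s off      turn the screen saver off
#   xset s noblank  do not blank the video when the screen saver activates
#   xset -dpms      disable DPMS (Energy Star) features
XSET_COMMANDS = [
    ["xset", "s", "off"],
    ["xset", "s", "noblank"],
    ["xset", "-dpms"],
]


def disable_blanking(dry_run=True):
    """Run the xset commands, or with dry_run only list them as strings."""
    issued = []
    for cmd in XSET_COMMANDS:
        if not dry_run:
            # check=True raises CalledProcessError if xset fails,
            # e.g. when no X display is reachable.
            subprocess.run(cmd, check=True)
        issued.append(" ".join(cmd))
    return issued


print(disable_blanking(dry_run=True))
```

Calling disable_blanking(dry_run=False) actually runs the commands; the default only reports what would be run.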

Lastly, do you know if this is a new problem with 375.20, or can you reproduce this with older
drivers?

I can try to install an older driver. Any suggestions? Does 367.57 seem reasonable?

Could you attach an nvidia-bug-report.log?

I will let it fail again and attach the bug report, then try the next suggestions.

Thanks!

Barry

Thanks for the clarifications.

I sort of wonder if losing the mouse and the blank console symptoms are two separate bugs.

Our console restoration code had some big changes between 370.xx and 375.xx, so I’d be curious if you see the blank console problem with any 370.xx driver.

For losing the mouse, yes, I think testing something 367.xx vintage would be far enough back.

Thanks!

More information will be on the way. Thanks Andy!

As I prepared to install the 367 driver, I replaced nvidia-libgl with mesa-libgl and I haven’t run into the issue again. I am going to give it 2 days to be safe, then reinstall nvidia-libgl to get the bug report. Just want to keep this updated.

Thank you chiroptical for the tip,
your solution worked for me too. Since the last 375.20-3 update, SDDM became pretty unresponsive when locking the Plasma session (plus other weird behaviors) and Gwenview was crashing each time I tried to display a picture fullscreen (the call stack ended somewhere in the nvidia libraries). Replacing nvidia-libgl with the mesa one seems to solve these two problems. I’m on an up-to-date Arch Linux too, with an NVIDIA Corporation GK107 [GeForce GTX 650] (rev a1).

That is useful isolation; thanks.

However, it is curious that replacing NVIDIA’s OpenGL with Mesa’s OpenGL solved the problem. If NVIDIA’s X driver and server-side GLX are being used but Mesa’s client-side OpenGL is being used, that suggests the system is falling back to GLX indirect rendering, but most modern configurations disable GLX indirect rendering. In both the failing and working configurations, could you capture the output of:

glxinfo | grep "vendor"
and
ldd /usr/bin/glxinfo

?

Working Configuration:

[chiroptical ~]$ glxinfo | grep "vendor"
server glx vendor string: SGI
client glx vendor string: Mesa Project and SGI
OpenGL vendor string: VMware, Inc.

and

[chiroptical ~]$ ldd /usr/bin/glxinfo 
	linux-vdso.so.1 (0x00007fffbcf9b000)
	libGL.so.1 => /usr/lib/libGL.so.1 (0x00007fca54b2e000)
	libX11.so.6 => /usr/lib/libX11.so.6 (0x00007fca547ef000)
	libc.so.6 => /usr/lib/libc.so.6 (0x00007fca54451000)
	libexpat.so.1 => /usr/lib/libexpat.so.1 (0x00007fca54227000)
	libxcb-dri3.so.0 => /usr/lib/libxcb-dri3.so.0 (0x00007fca54024000)
	libxcb-present.so.0 => /usr/lib/libxcb-present.so.0 (0x00007fca53e21000)
	libxcb-sync.so.1 => /usr/lib/libxcb-sync.so.1 (0x00007fca53c1a000)
	libxshmfence.so.1 => /usr/lib/libxshmfence.so.1 (0x00007fca53a17000)
	libglapi.so.0 => /usr/lib/libglapi.so.0 (0x00007fca537e8000)
	libXext.so.6 => /usr/lib/libXext.so.6 (0x00007fca535d6000)
	libXdamage.so.1 => /usr/lib/libXdamage.so.1 (0x00007fca533d3000)
	libXfixes.so.3 => /usr/lib/libXfixes.so.3 (0x00007fca531cd000)
	libX11-xcb.so.1 => /usr/lib/libX11-xcb.so.1 (0x00007fca52fcb000)
	libxcb.so.1 => /usr/lib/libxcb.so.1 (0x00007fca52da2000)
	libxcb-glx.so.0 => /usr/lib/libxcb-glx.so.0 (0x00007fca52b86000)
	libxcb-dri2.so.0 => /usr/lib/libxcb-dri2.so.0 (0x00007fca52981000)
	libXxf86vm.so.1 => /usr/lib/libXxf86vm.so.1 (0x00007fca5277b000)
	libdrm.so.2 => /usr/lib/libdrm.so.2 (0x00007fca5256b000)
	libm.so.6 => /usr/lib/libm.so.6 (0x00007fca52267000)
	libpthread.so.0 => /usr/lib/libpthread.so.0 (0x00007fca5204a000)
	libdl.so.2 => /usr/lib/libdl.so.2 (0x00007fca51e46000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fca54da0000)
	libXau.so.6 => /usr/lib/libXau.so.6 (0x00007fca51c42000)
	libXdmcp.so.6 => /usr/lib/libXdmcp.so.6 (0x00007fca51a3c000)

I will be on vacation next week so I can try to recreate the issue and send you the report.

Thanks!

Non-working configuration:

[chiroptical ~]$ glxinfo | grep "vendor"
server glx vendor string: NVIDIA Corporation
client glx vendor string: NVIDIA Corporation
OpenGL vendor string: NVIDIA Corporation
[chiroptical ~]$ ldd /usr/bin/glxinfo 
	linux-vdso.so.1 (0x00007ffdec3d4000)
	libGL.so.1 => /usr/lib/libGL.so.1 (0x00007f31a3879000)
	libX11.so.6 => /usr/lib/libX11.so.6 (0x00007f31a353a000)
	libc.so.6 => /usr/lib/libc.so.6 (0x00007f31a319c000)
	libGLX.so.0 => /usr/lib/libGLX.so.0 (0x00007f31a2f6b000)
	libXext.so.6 => /usr/lib/libXext.so.6 (0x00007f31a2d59000)
	libGLdispatch.so.0 => /usr/lib/libGLdispatch.so.0 (0x00007f31a2a83000)
	libdl.so.2 => /usr/lib/libdl.so.2 (0x00007f31a287f000)
	libpthread.so.0 => /usr/lib/libpthread.so.0 (0x00007f31a2662000)
	libxcb.so.1 => /usr/lib/libxcb.so.1 (0x00007f31a2439000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f31a3af1000)
	libXau.so.6 => /usr/lib/libXau.so.6 (0x00007f31a2235000)
	libXdmcp.so.6 => /usr/lib/libXdmcp.so.6 (0x00007f31a202f000)

As soon as I can get the bug report I will attach it. - Barry
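
A minimal sketch of how the two glxinfo captures above could be compared programmatically, to tell at a glance whether the client-side GL libraries in use are NVIDIA's. The function names are hypothetical; the sample strings are copied from the outputs posted above.

```python
def parse_vendor_strings(glxinfo_output):
    """Map each '... vendor string' label from glxinfo to its value."""
    vendors = {}
    for line in glxinfo_output.splitlines():
        label, sep, value = line.partition(":")
        if sep and "vendor string" in label:
            vendors[label.strip()] = value.strip()
    return vendors


def uses_nvidia_client_gl(vendors):
    """True when both client-side GLX and OpenGL report NVIDIA."""
    return all(
        vendors.get(key, "").startswith("NVIDIA")
        for key in ("client glx vendor string", "OpenGL vendor string")
    )


working = """\
server glx vendor string: SGI
client glx vendor string: Mesa Project and SGI
OpenGL vendor string: VMware, Inc.
"""
failing = """\
server glx vendor string: NVIDIA Corporation
client glx vendor string: NVIDIA Corporation
OpenGL vendor string: NVIDIA Corporation
"""

for name, text in (("working", working), ("failing", failing)):
    vendors = parse_vendor_strings(text)
    print(name, uses_nvidia_client_gl(vendors), vendors["server glx vendor string"])
```

The server-side string is printed as well because, in the working capture, it is not NVIDIA either.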

Captured. Let me know when you have it. I would like to remove the link.

Thanks, I have your log; you can remove it from google drive.

The particular Xids in the log indicate that the display engine of the GPU cannot translate the buffer handles provided by the driver to the video memory buffers described by those handles. I suspect an earlier error caused the display engine to lose its handle table. But, I can’t find anything earlier in the log to suggest that sort of error.

The one other interesting thing I see is your use of the “composition pipeline”:

Option  "metamodes" "nvidia-auto-select +0+0 { ForceFullCompositionPipeline = On }"

Would you mind testing without ForceFullCompositionPipeline, to see if that makes a difference in reproducing the problem?

Thanks.

This should already be on. I believe I found that in the Arch Wiki.

[chiroptical ~]$ cat /etc/X11/xorg.conf.d/20-nvidia.conf 
Section "Device"
	Identifier "NVIDIA GPU"
	Driver "nvidia"
	Option  "metamodes" "nvidia-auto-select +0+0 { ForceFullCompositionPipeline = On }"
EndSection

[chiroptical ~]$ nvidia-settings -q CurrentMetaMode

  Attribute 'CurrentMetaMode' (wilder:0.0): id=50, switchable=yes, source=RandR :: DPY-1: nvidia-auto-select @1920x1080 +0+0 {ViewPortIn=1920x1080, ViewPortOut=1920x1080+0+0,
  ForceCompositionPipeline=On, ForceFullCompositionPipeline=On}, DPY-3: nvidia-auto-select @1680x1050 +1920+0 {ViewPortIn=1680x1050, ViewPortOut=1680x1050+0+0}
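
A rough sketch of how the CurrentMetaMode string above could be split per display, to check which outputs actually have ForceFullCompositionPipeline enabled. The function name parse_metamode is hypothetical, and the regular expression assumes every display entry carries a {...} attribute block, as both entries do here.

```python
import re

# One "DPY-n: mode {attr=value, ...}" entry per display.
DISPLAY_RE = re.compile(r"(DPY-\d+):[^{]*\{([^}]*)\}")


def parse_metamode(metamode):
    """Return {display: {attribute: value}} for a CurrentMetaMode string."""
    displays = {}
    for name, attr_text in DISPLAY_RE.findall(metamode):
        attrs = {}
        for item in attr_text.split(","):
            key, sep, value = item.partition("=")
            if sep:
                attrs[key.strip()] = value.strip()
        displays[name] = attrs
    return displays


# Copied from the nvidia-settings output above, joined onto one line.
metamode = (
    "DPY-1: nvidia-auto-select @1920x1080 +0+0 {ViewPortIn=1920x1080, "
    "ViewPortOut=1920x1080+0+0, ForceCompositionPipeline=On, "
    "ForceFullCompositionPipeline=On}, DPY-3: nvidia-auto-select @1680x1050 "
    "+1920+0 {ViewPortIn=1680x1050, ViewPortOut=1680x1050+0+0}"
)

for name, attrs in parse_metamode(metamode).items():
    print(name, attrs.get("ForceFullCompositionPipeline", "not set"))
```

For the string above, this reports the option as On for DPY-1 and not set for DPY-3, the display positioned at +1920+0.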

OP, have you posted a thread here as well?

Arch Forum
https://www.linuxquestions.org/questions/arch-29/

I did not. I felt it was more appropriate here, especially after narrowing the issue down to nvidia-libgl. The computer is perfectly fine with mesa-libgl.

chiroptical: Can you please retest without ForceFullCompositionPipeline. I.e.,

Option "metamodes" "nvidia-auto-select +0+0"

Thanks.

Sorry Andy, changes made. I’ll keep an eye on it.

New bug report, seemingly same issue: https://drive.google.com/file/d/0B3k9IViw0wE_a25LN0diVFRsT0U/view?usp=sharing

Thanks. I’ve downloaded the log, so you can delete it from google drive, if you like.

The Xid signature is slightly different in that log, but the display engine is having the same problem: for some reason, it cannot translate the buffer handles provided by the driver.

I’ll try to continue investigating.