X Server frequently crashes when using Firefox in Manjaro/GnomeX11 – Dell G5 GTX1060M [Prime, nvidia 415.27]

Hello everyone!

I have a bit of an instability problem, which I will try to describe below. As I understand it, a lot of people have similar difficulties, so maybe this could help them as well.

I have been trying to solve this for a while now, and I have no valuable data on my computer yet, so I’m open to suggestions.

Abstract:
Experiencing frequent crashes, mainly when using Firefox and other standard Gnome/Manjaro applications.
I have set up Prime with the proprietary nvidia 415.27 driver on a Dell G5 (GTX 1060 Mobile) running Manjaro.

Background:
I recently purchased a laptop, a Dell G5 5587 with a GeForce GTX 1060 Mobile, for casual 3D modelling and programming.

Out of the box it had Ubuntu installed with the nvidia graphics card not set up (only visible as “unknown”). There was a possibility to activate the proprietary nvidia drivers from a menu, but I never tried that setup, since I wanted to use a different distro anyway and in my naivety thought that this should not be much of a challenge. :D

Long story short, I first tried setting up nvidia on OpenSuse, then Arch Linux, and now I’m running Manjaro (kernel 4.19.20). OpenSuse felt broken from the start, so I switched to Arch, where I tried setting up and running Bumblebee and later bbswitch. Neither worked well for me (which could be due to my own mistakes).
I decided to re-format everything and try to run nvidia proprietary drivers with Prime. By accident I came across a promising tutorial for Manjaro and since it [Manjaro] shares similarities with Arch I gave it a go.

The tutorial I used for graphics after installing Manjaro: https://forum.manjaro.org/t/howto-set-up-prime-with-nvidia-proprietary-driver/40225

Manjaro installs nouveau by default, which constantly freezes and sometimes crashes. Thus I jumped directly to step #2 of the tutorial (Install NVIDIA drivers) and followed everything up to the SDDM part (since I don’t have SDDM).

Now the system “works” as long as I avoid using Firefox and many of the standard gnome applications.

However, LibreOffice and Blender 2.8 have never crashed so far. LibreOffice is not heavy on the GPU, but in Blender I have done renderings using Cycles with GPU compute (CUDA, not OpenCL, since it’s not available as an option for some reason).

What I have installed:

  1. Manjaro Gnome Edition 4.19.20 (runs X11 - not wayland)
  2. The setup from the tutorial mentioned above. That’s it.

When it crashes:
Sometimes it crashes even when using standard Gnome/Manjaro applications, like “Settings”.
However, when using Firefox it crashes as if it were scheduled (I tried unchecking hardware acceleration in Firefox, but it didn’t help).
It especially crashes when playing youtube videos (I have not installed flash).

All of these crashes follow the same pattern.
First the screen and sound freeze, and after a short while caps lock starts blinking; then the machine restarts. When it starts up, it does so with a buggy keyboard, unless I do a hard shutdown.

When it usually doesn’t crash:
When using LibreOffice, Blender 2.8

Many Thanks in advance!!!

inxi_out.txt (2.36 KB)
nvidia-bug-report.log.gz (1.04 MB)

Unfortunately, there are no crash traces visible in the logs. Did you create the report right after the forced reboot?
Please delete the file /etc/X11/xorg.conf.d/optimus.conf; it collides with the system-provided config in /usr/share/X11/xorg.conf.d/10-nvidia-drm-outputclass.conf.
Due to the nature of those crashes (flashing keyboard, self-rebooting) this rather sounds like overheating. Did you previously monitor temperatures using sensors/nvidia-settings/nvidia-smi?

Thank you generix for looking into it!

I deleted the optimus.sh-file

I made a new attempt to obtain a logfile. Maybe I am doing something wrong, so here is my procedure:

  • According to the guidelines I need to start X with the logverbose flag. Thus, as I start the computer, I go into GRUB and add the flag $vt_handoff 3, so that it boots to the console only.
  • In the console I log in as root and run: startx -- -logverbose 6 (perhaps there is a better way to do it?)
  • Now in X I run nvidia-smi dmon -o T >> smi_log.txt to log temperatures.
  • I proceed to “make it crash”, which in this case is to play something on youtube.
  • When it freezes and caps lock starts blinking (at 19:56:58 in this run), I shut it down by holding the power button.
  • Then I hold the power button to power it up again.
  • For consistency I do the same startup procedure with $vt_handoff.. and startx..
  • When X starts up I run nvidia-bug-report.sh
  • The temperatures seem stable according to nvidia-smi.
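
To double-check the temperature reading in smi_log.txt, the log can be scanned for its hottest sample. The following is only a sketch: it assumes the log was written with nvidia-smi dmon -o T as in the procedure above, and that the "#"-prefixed header names the temperature column "temp" or "gtemp" (the name may differ between driver versions). The function name max_gpu_temp is made up for this example.

```shell
#!/bin/sh
# Sketch: print the highest GPU temperature found in a dmon log,
# followed by the first field of that sample (the time when -o T was used).
# Usage: max_gpu_temp smi_log.txt
max_gpu_temp() {
    awk '
        BEGIN { max = -1 }
        /^#/ {
            # Header line: find the temperature column by name.
            # In "# gpu ..." (no -o T) the "#" is its own field, which
            # shifts the header one column relative to the data rows.
            off = ($1 == "#") ? 1 : 0
            for (i = 1; i <= NF; i++)
                if ($i == "temp" || $i == "gtemp") col = i - off
            next
        }
        # Data row: remember the largest numeric value in that column.
        col && $col ~ /^[0-9]+$/ && $col + 0 > max { max = $col + 0; when = $1 }
        END { if (max >= 0) print max, when }
    ' "$1"
}
```

If the printed maximum is well below the GPU's throttling range and the time is not close to the freeze, overheating of the GPU itself becomes less likely, although this says nothing about the CPU.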

    I found some lines in the bug-report that might be part of the problem:

    line 4550: (/var/log/Xorg.2.log)
    [ 96.041] (EE) /dev/dri/card1: failed to set DRM interface version 1.4: Permission denied

    line 4601:
    [ 96.079] (EE) NVIDIA: Failed to load module "glxserver_nvidia" (module does not exist, 0)
    [ 96.079] (EE) NVIDIA(0): Failed to initialize the GLX module; please check in your X
    [ 96.079] (EE) NVIDIA(0): log file that the GLX module has been loaded in your X
    [ 96.079] (EE) NVIDIA(0): server, and that the module is the NVIDIA GLX module. If
    [ 96.079] (EE) NVIDIA(0): you continue to encounter problems, Please try
    [ 96.079] (EE) NVIDIA(0): reinstalling the NVIDIA driver.

    Leading to this later on:

    [ 96.674] (EE)
    [ 96.674] (EE) Backtrace:
    [ 96.675] (EE) 0: /usr/lib/Xorg (xorg_backtrace+0x4d) [0x55fcad5531fd]
    ….
    [ 96.713] (EE) Server terminated with error (1). Closing log file.
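
Instead of searching the bundled bug report, the (EE) lines can also be pulled straight from the individual Xorg log files, which makes it easier to see which file each error belongs to. This is a minimal sketch; the function name list_xorg_errors is made up, and the /var/log/Xorg.*.log location is assumed from the path quoted above.

```shell
#!/bin/sh
# Sketch: print every error line of the given Xorg logs as
# FILE:LINE:TEXT, so that it is clear which log an error came from.
# Usage: list_xorg_errors /var/log/Xorg.*.log
list_xorg_errors() {
    # -F: match "(EE)" literally, -n: line numbers, -H: always show the file name
    grep -n -H -F '(EE)' "$@"
}
```

The timestamps inside an Xorg log count seconds since that server started, so comparing the file's modification time (for example with ls -l) against the time of the crash helps to tell whether a log is from the current session or a leftover.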

    nvidia-bug-report.log.gz (1.01 MB)
    smi_log.txt (24.8 KB)

    The errors are from some old log, probably written during a system update or the like.
    Since you have to forcibly turn off the notebook, this seems to be a “regular” kernel panic, which points to some hardware problem. According to the smi_log, the GPU is actually doing nothing at the time of the crash, just idling at minimum clocks.
    Unfortunately, the kernel panic would be visible only on the console which is hidden by the Xserver, of course.
    Please run (after crash reboot)
    sudo journalctl -b -1 --no-pager | grep kernel > kernel.txt
    and attach that.
    Probably nothing gets logged, because it’s a kernel panic. Tough luck.
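
Once kernel.txt exists, a quick scan for the usual suspects can save reading the whole file. This is a sketch only: the keyword list is an assumption about what a panic, machine check, or NVIDIA driver fault typically looks like in the kernel log, not an exhaustive set, and the function name scan_kernel_log is made up.

```shell
#!/bin/sh
# Sketch: show lines of a saved kernel log that hint at a panic,
# a machine check, or an NVIDIA driver error (NVRM / Xid messages).
# Usage: scan_kernel_log kernel.txt
# Note: "journalctl -k -b -1 --no-pager" selects the kernel messages of the
# previous boot directly and can be used in place of the grep on "kernel".
scan_kernel_log() {
    grep -n -E 'Kernel panic|Oops|BUG:|mce:|Machine check|Hardware Error|NVRM|Xid' "$1"
}
```

An empty result does not rule out a panic; as noted above, the last messages before a hard lockup often never reach the journal.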

    Just to rule out everything, please modify the file /usr/share/X11/xorg.conf.d/10-nvidia-drm-outputclass.conf
    and comment out both lines with “ModulePath”.
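
Commenting out those lines can be done in an editor, or with a small sed call along these lines. This is a sketch that assumes each ModulePath entry sits on its own line; the function name comment_modulepath is made up. It keeps a .bak copy next to the file so the change is easy to undo.

```shell
#!/bin/sh
# Sketch: prefix every active ModulePath line in an xorg.conf snippet with "#"
# (the comment character in Xorg config files), keeping FILE.bak as a backup.
# Usage (as root):
#   comment_modulepath /usr/share/X11/xorg.conf.d/10-nvidia-drm-outputclass.conf
comment_modulepath() {
    sed -i.bak -e '/^[[:space:]]*ModulePath/ s/^/#/' "$1"
}
```

To revert, move the .bak file back over the edited one and restart X.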

    Hello again!

    Thank you for looking into it!

    To be more specific, the computer will automatically reboot itself after the caps-lock key has blinked for a while. An error code shows up during start-up (see attachment).
    If I let it reboot by itself, the keyboard misbehaves and repeats keystrokes, e.g. say you want to type “root” and you get “rrrrrrrrrroot”.
    To avoid this I typically hold down the power-button for a forced reboot.

    I tried commenting and uncommenting the “ModulePath” lines. There are some differences in the selected OpenGL renderer, but it keeps crashing, if anything more often.

    Commented
    Display: x11 server: X.org 1.20.3 driver: modesetting FAILED: nvidia
    unloaded: intel,nouveau alternate: fbdev,nv,vesa compositor: gnome-shell
    resolution:
    OpenGL: renderer: llvmpipe (LLVM 7.0 256 bits) v: 3.3 Mesa 18.3.2
    compat-v: 3.1 direct render: Yes

    Uncommented
    Display: x11 server: X.org 1.20.3 driver: modesetting FAILED: nvidia
    unloaded: intel,nouveau alternate: fbdev,nv,vesa compositor: gnome-shell
    resolution:
    OpenGL: renderer: GeForce GTX 1060 with Max-Q Design/PCIe/SSE2

    The part where it says modesetting FAILED has recently disappeared and now it only says:
    Display: server: X.org 1.20.3 driver: modesetting,nvidia
    However, there is no difference in crash frequency.

    I attached the kernel.txt file from a recent crash. Unfortunately, the crash occurred at 16:52 and the log stops at 16:49.
    I do have some older logs which show more interesting stuff at the end, but I don’t remember the exact circumstances, nor how to reproduce such a log.
    Maybe they are also of no use, since the key errors don’t seem to be kernel related.

    I also tried to set (as I saw in one tutorial):
    xrandr --setprovideroutputsource Intel NVIDIA-0
    instead of:
    xrandr --setprovideroutputsource modesetting NVIDIA-0
    but it made no difference.
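
The first argument to --setprovideroutputsource has to match a provider name that the running X server actually reports, and xrandr --listproviders prints those names. As far as I know, "Intel" only appears when the xf86-video-intel driver is in use, while the generic driver shows up as "modesetting". The helper below is a sketch that assumes each provider line ends in "name:<provider>"; the function name provider_names is made up.

```shell
#!/bin/sh
# Sketch: extract just the provider names from "xrandr --listproviders" output.
# Usage (inside the X session): xrandr --listproviders | provider_names
provider_names() {
    # Keep only "Provider N: ... name:XYZ" lines and strip everything up to the name.
    sed -n 's/^Provider [0-9][0-9]*:.*name:[[:space:]]*//p'
}
```

If a name passed to xrandr is not in that list, the command cannot find the provider, so it is worth checking this before concluding that the setting has no effect.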
    journal_1.txt (310 KB)
    kernel.txt (81.1 KB)
    journal_2.txt (150 KB)

    The MCE error on boot is kind of normal, just due to some sloppy BIOS devs.
    Unfortunately, nothing is caught in the logs; catching a kernel panic is not really easy. Since you said you just recently bought the notebook, I presume it’s still under warranty. Due to the nature of those crashes I suspect a general HW fault, so you should rather concentrate on RMA’ing it. So either reinstall the original Ubuntu or even Windows, including the nvidia driver, to be able to reproduce the crash, and then call Dell support.