X.org crashes on Ubuntu 17.10 with driver nvidia-384 after upgrade

Upgraded from 17.04 to 17.10 and now X comes up and eventually stops. This was working on 17.04 with the nvidia-384 driver but on 17.10 it no longer does.

This is an AMD x399 system with a Titan Xp card.

I’m grabbing the drivers from:

500 http://ppa.launchpad.net/graphics-drivers/ppa/ubuntu artful/main i386 Packages
release v=17.10,o=LP-PPA-graphics-drivers,a=artful,n=artful,l=Proprietary GPU Drivers,c=main,b=i386
origin ppa.launchpad.net
500 http://ppa.launchpad.net/graphics-drivers/ppa/ubuntu artful/main amd64 Packages
release v=17.10,o=LP-PPA-graphics-drivers,a=artful,n=artful,l=Proprietary GPU Drivers,c=main,b=amd64
origin ppa.launchpad.net

I tried:

apt-get purge nvidia-*
apt-get install -y nvidia-*
shutdown -r now

$ dpkg -l | grep nvidia
ii nvidia-384 384.90-0ubuntu3 amd64 NVIDIA binary driver - version 384.90
ii nvidia-opencl-icd-384 384.90-0ubuntu3 amd64 NVIDIA OpenCL ICD
ii nvidia-prime 0.8.5 amd64 Tools to enable NVIDIA’s Prime
ii nvidia-settings 384.90-0ubuntu0~gpu17.10.1 amd64 Tool for configuring the NVIDIA graphics driver

$ sudo modinfo nvidia-384
filename: /lib/modules/4.13.0-16-lowlatency/updates/dkms/nvidia_384.ko
alias: char-major-195-*
version: 384.90
supported: external
license: NVIDIA
srcversion: 9D546A76FA9D9523F03995D
alias: pci:v000010DEd00000E00sv*sd*bc04sc80i00*
alias: pci:v000010DEd*sv*sd*bc03sc02i00*
alias: pci:v000010DEd*sv*sd*bc03sc00i00*
depends:
name: nvidia
vermagic: 4.13.0-16-lowlatency SMP preempt mod_unload
parm: NVreg_Mobile:int
parm: NVreg_ResmanDebugLevel:int
parm: NVreg_RmLogonRC:int
parm: NVreg_ModifyDeviceFiles:int
parm: NVreg_DeviceFileUID:int
parm: NVreg_DeviceFileGID:int

X will come up (I see it in nvidia-smi) and then come down, then up, then down. I see a lot of these:

[ 888.017] (--) NVIDIA(GPU-0):
[ 888.017] (--) NVIDIA(GPU-0): DFP-3: disconnected
[ 888.017] (--) NVIDIA(GPU-0): DFP-3: Internal DisplayPort
[ 888.017] (--) NVIDIA(GPU-0): DFP-3: 1440.0 MHz maximum pixel clock
[ 888.017] (--) NVIDIA(GPU-0):
[ 888.017] (--) NVIDIA(GPU-0): DFP-4: disconnected
[ 888.017] (--) NVIDIA(GPU-0): DFP-4: Internal TMDS
[ 888.017] (--) NVIDIA(GPU-0): DFP-4: 165.0 MHz maximum pixel clock
[ 888.017] (--) NVIDIA(GPU-0):
[ 888.017] (--) NVIDIA(GPU-0): DFP-5: disconnected
[ 888.017] (--) NVIDIA(GPU-0): DFP-5: Internal DisplayPort
[ 888.018] (--) NVIDIA(GPU-0): DFP-5: 1440.0 MHz maximum pixel clock
[ 888.018] (--) NVIDIA(GPU-0):
[ 888.018] (--) NVIDIA(GPU-0): DFP-6: disconnected
[ 888.018] (--) NVIDIA(GPU-0): DFP-6: Internal TMDS
[ 888.018] (--) NVIDIA(GPU-0): DFP-6: 165.0 MHz maximum pixel clock
[ 888.018] (--) NVIDIA(GPU-0):
[ 888.355] (**) Option "fd" "38"
[ 888.355] (II) event1 - (II) Power Button: (II) device removed
[ 888.356] (**) Option "fd" "41"
[ 888.356] (II) event0 - (II) Power Button: (II) device removed
[ 888.356] (**) Option "fd" "42"
[ 888.356] (II) event9 - (II) Eee PC WMI hotkeys: (II) device removed
[ 888.357] (II) UnloadModule: "libinput"
[ 888.357] (II) systemd-logind: releasing fd for 13:73
[ 888.370] (II) UnloadModule: "libinput"
[ 888.370] (II) systemd-logind: releasing fd for 13:64
[ 888.387] (II) UnloadModule: "libinput"
[ 888.387] (II) systemd-logind: releasing fd for 13:65
[ 888.439] (II) NVIDIA(GPU-0): Deleting GPU-0
[ 888.443] (II) Server terminated successfully (0). Closing log file.

And nvidia-settings -l gives me:

$ nvidia-settings -V -l

WARNING: NV-CONTROL extension not found on this Display.

ERROR: Error querying enabled displays on GPU 0 (Missing Extension).

ERROR: Error querying connected displays on GPU 0 (Missing Extension).

WARNING: NV-CONTROL extension not found on this Display.

WARNING: Unable to determine number of NVIDIA GPUs on ‘kitt:10.0’.

WARNING: Unable to determine number of NVIDIA Frame Lock Devices on ‘kitt:10.0’.

WARNING: Unable to determine number of NVIDIA VCSs on ‘kitt:10.0’.

WARNING: Unable to determine number of NVIDIA SDI Input Devices on ‘kitt:10.0’.

WARNING: Unable to determine number of NVIDIA Fans on ‘kitt:10.0’.

WARNING: Unable to determine number of NVIDIA Thermal Sensors on ‘kitt:10.0’.

WARNING: Unable to determine number of NVIDIA 3D Vision Pro Transceivers on ‘kitt:10.0’.

WARNING: Unable to determine number of NVIDIA Display Devices on ‘kitt:10.0’.

WARNING: Unable to determine number of NVIDIA X Screens on ‘kitt:10.0’.

My xorg.conf file:

$ cat /etc/X11/xorg.conf

# nvidia-xconfig: X configuration file generated by nvidia-xconfig
# nvidia-xconfig: version 384.90 (buildmeister@swio-display-x86-rhel47-05) Tue Sep 19 18:13:03 PDT 2017

Section "ServerLayout"
Identifier "Layout0"
Screen 0 "Screen0" 0 0
InputDevice "Keyboard0" "CoreKeyboard"
InputDevice "Mouse0" "CorePointer"
EndSection

Section "Files"
EndSection

Section "InputDevice"

# generated from default
Identifier     "Mouse0"
Driver         "mouse"
Option         "Protocol" "auto"
Option         "Device" "/dev/psaux"
Option         "Emulate3Buttons" "no"
Option         "ZAxisMapping" "4 5"

EndSection

Section "InputDevice"

# generated from default
Identifier     "Keyboard0"
Driver         "kbd"

EndSection

Section "Monitor"
Identifier "Monitor0"
VendorName "Unknown"
ModelName "Unknown"
HorizSync 28.0 - 33.0
VertRefresh 43.0 - 72.0
Option "DPMS"
EndSection

Section "Device"
Identifier "Device0"
Driver "nvidia"
VendorName "NVIDIA Corporation"
EndSection

Section "Screen"
Identifier "Screen0"
Device "Device0"
Monitor "Monitor0"
DefaultDepth 24
Option "AllowEmptyInitialConfiguration" "True"
SubSection "Display"
Depth 24
EndSubSection
EndSection

What am I doing wrong?
nvidia-bug-report.log.gz (167 KB)

Hello, sorry you didn’t get an answer, I just want to add a “me too” post.

I’ve cleanly installed Ubuntu 17.10, and installed the driver 384.90 from the “Software&Updates” “Additional Drivers” tab.

It appears to work fine for a while and then it just crashes, be it one minute after boot, or 5, or 10.

I have a GTX 980 and no overclocking enabled. I can make it recover if I switch between TTY back and forth a few times.

I don’t seem to find how to attach the log. The button showed up after posting. Very intuitive.
nvidia-bug-report.log.gz (248 KB)

Hi,
Same here, updated from 17.04 to 17.10.

The screen freezes every second. In the log I have:

Oct 23 13:13:29 htpc /usr/lib/gdm3/gdm-x-session[1837]: (--) NVIDIA(GPU-0):
Oct 23 13:13:29 htpc /usr/lib/gdm3/gdm-x-session[1837]: (--) NVIDIA(GPU-0): CTV (DFP-1): connected
Oct 23 13:13:29 htpc /usr/lib/gdm3/gdm-x-session[1837]: (--) NVIDIA(GPU-0): CTV (DFP-1): Internal TMDS
Oct 23 13:13:29 htpc /usr/lib/gdm3/gdm-x-session[1837]: (--) NVIDIA(GPU-0): CTV (DFP-1): 340.0 MHz maximum pixel clock
Oct 23 13:13:29 htpc /usr/lib/gdm3/gdm-x-session[1837]: (--) NVIDIA(GPU-0):
Oct 23 13:13:30 htpc /usr/lib/gdm3/gdm-x-session[1837]: (--) NVIDIA(GPU-0): CTV (DFP-1): connected
Oct 23 13:13:30 htpc /usr/lib/gdm3/gdm-x-session[1837]: (--) NVIDIA(GPU-0): CTV (DFP-1): Internal TMDS
Oct 23 13:13:30 htpc /usr/lib/gdm3/gdm-x-session[1837]: (--) NVIDIA(GPU-0): CTV (DFP-1): 340.0 MHz maximum pixel clock
Oct 23 13:13:30 htpc /usr/lib/gdm3/gdm-x-session[1837]: (--) NVIDIA(GPU-0):
Oct 23 13:13:30 htpc /usr/lib/gdm3/gdm-x-session[1837]: (--) NVIDIA(GPU-0): CTV (DFP-1): connected
Oct 23 13:13:30 htpc /usr/lib/gdm3/gdm-x-session[1837]: (--) NVIDIA(GPU-0): CTV (DFP-1): Internal TMDS
Oct 23 13:13:30 htpc /usr/lib/gdm3/gdm-x-session[1837]: (--) NVIDIA(GPU-0): CTV (DFP-1): 340.0 MHz maximum pixel clock
Oct 23 13:13:30 htpc /usr/lib/gdm3/gdm-x-session[1837]: (--) NVIDIA(GPU-0):
Oct 23 13:13:30 htpc /usr/lib/gdm3/gdm-x-session[1837]: (--) NVIDIA(GPU-0): CTV (DFP-1): connected
Oct 23 13:13:30 htpc /usr/lib/gdm3/gdm-x-session[1837]: (--) NVIDIA(GPU-0): CTV (DFP-1): Internal TMDS
Oct 23 13:13:30 htpc /usr/lib/gdm3/gdm-x-session[1837]: (--) NVIDIA(GPU-0): CTV (DFP-1): 340.0 MHz maximum pixel clock
Oct 23 13:13:30 htpc /usr/lib/gdm3/gdm-x-session[1837]: (--) NVIDIA(GPU-0):
Oct 23 13:13:31 htpc /usr/lib/gdm3/gdm-x-session[1837]: (--) NVIDIA(GPU-0): CTV (DFP-1): connected
Oct 23 13:13:31 htpc /usr/lib/gdm3/gdm-x-session[1837]: (--) NVIDIA(GPU-0): CTV (DFP-1): Internal TMDS
Oct 23 13:13:31 htpc /usr/lib/gdm3/gdm-x-session[1837]: (--) NVIDIA(GPU-0): CTV (DFP-1): 340.0 MHz maximum pixel clock
Oct 23 13:13:31 htpc /usr/lib/gdm3/gdm-x-session[1837]: (--) NVIDIA(GPU-0):
Oct 23 13:13:31 htpc /usr/lib/gdm3/gdm-x-session[1837]: (--) NVIDIA(GPU-0): CTV (DFP-1): connected
Oct 23 13:13:31 htpc /usr/lib/gdm3/gdm-x-session[1837]: (--) NVIDIA(GPU-0): CTV (DFP-1): Internal TMDS
Oct 23 13:13:31 htpc /usr/lib/gdm3/gdm-x-session[1837]: (--) NVIDIA(GPU-0): CTV (DFP-1): 340.0 MHz maximum pixel clock

System is not usable =(

Well, at least I know I’m not the only one.

Does NV want to chime in here?

Does anyone have a workaround? (I don’t see a dedicated nvidia-375 to roll back to either, only a transitional one to 384.)

Did you try disabling Wayland in /etc/gdm3/custom.conf?

No change. (I uncommented the “WaylandEnable=false” line, rebooted, same thing.)
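For anyone following along, the edit being discussed is a one-line change. A minimal sketch, assuming the stock Ubuntu file where the key ships commented out as `#WaylandEnable=false` (that is the spelling GDM reads); the function name `disable_wayland` is only for illustration:

```shell
# Uncomment the WaylandEnable=false line in a GDM custom.conf.
# $1 is the path to the file. Assumes the line is present but commented
# out, as in the stock Ubuntu file; uses GNU sed's in-place editing.
disable_wayland() {
    sed -i 's/^[[:space:]]*#[[:space:]]*WaylandEnable=false/WaylandEnable=false/' "$1"
}

# Usage (as root, then reboot or restart GDM):
#   disable_wayland /etc/gdm3/custom.conf
```

This only forces GDM onto Xorg; as the reply above shows, it does not help when the real problem is elsewhere.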

This looks more like gnome crashing and restarting. What’s the output of
sudo journalctl -b0 --no-pager |grep gnome

Oct 23 16:09:57 kitt gnome-shell[50409]: Execution of main.js threw exception: JS_EvaluateScript() failed

Oct 23 16:09:57 kitt gnome-session[50386]: gnome-session-binary[50386]: WARNING: App 'org.gnome.Shell.desktop' exited with code 1
Oct 23 16:09:57 kitt gnome-session-binary[50386]: WARNING: App 'org.gnome.Shell.desktop' exited with code 1
Oct 23 16:09:58 kitt gnome-shell[50456]: JS WARNING: [resource:///org/gnome/shell/ui/main.js 315]: reference to undefined property "MetaStage"
Oct 23 16:09:58 kitt gnome-shell[50456]: JS WARNING: [resource:///org/gnome/shell/ui/layout.js 217]: reference to undefined property "MetaWindowGroup"
Oct 23 16:09:58 kitt gnome-shell[50456]: JS ERROR: TypeError: this.primaryMonitor is undefined
                                         LayoutManager<._updateBoxes@resource:///org/gnome/shell/ui/layout.js:469:9
                                         wrapper@resource:///org/gnome/gjs/modules/_legacy.js:82:22
                                         LayoutManager<._monitorsChanged@resource:///org/gnome/shell/ui/layout.js:503:9
                                         wrapper@resource:///org/gnome/gjs/modules/_legacy.js:82:22
                                         LayoutManager<._init@resource:///org/gnome/shell/ui/layout.js:280:9
                                         wrapper@resource:///org/gnome/gjs/modules/_legacy.js:82:22
                                         _Base.prototype._construct@resource:///org/gnome/gjs/modules/_legacy.js:18:5
                                         Class.prototype._construct/newClass@resource:///org/gnome/gjs/modules/_legacy.js:117:20
                                         _initializeUI@resource:///org/gnome/shell/ui/main.js:152:21
                                         start@resource:///org/gnome/shell/ui/main.js:126:5
Oct 23 16:09:58 kitt gnome-shell[50456]: Execution of main.js threw exception: JS_EvaluateScript() failed
Oct 23 16:09:58 kitt gnome-session[50386]: gnome-session-binary[50386]: WARNING: App 'org.gnome.Shell.desktop' exited with code 1
Oct 23 16:09:58 kitt gnome-session[50386]: gnome-session-binary[50386]: WARNING: App 'org.gnome.Shell.desktop' respawning too quickly
Oct 23 16:09:58 kitt gnome-session[50386]: gnome-session-binary[50386]: CRITICAL: We failed, but the fail whale is dead. Sorry....
Oct 23 16:09:58 kitt gnome-session-binary[50386]: Unrecoverable failure in required component org.gnome.Shell.desktop
Oct 23 16:09:58 kitt gnome-session-binary[50386]: WARNING: App 'org.gnome.Shell.desktop' exited with code 1
Oct 23 16:09:58 kitt gnome-session-binary[50386]: WARNING: App 'org.gnome.Shell.desktop' respawning too quickly
Oct 23 16:09:58 kitt gnome-session-binary[50386]: CRITICAL: We failed, but the fail whale is dead. Sorry....
Oct 23 16:09:58 kitt gnome-screensav[50404]: gnome-screensaver: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
Oct 23 16:09:58 kitt dbus-daemon[50526]: Activating service name='org.gnome.ScreenSaver'
Oct 23 16:09:58 kitt org.gnome.ScreenSaver[50526]: Unable to init server: Could not connect: Connection refused
Oct 23 16:09:58 kitt gnome-screensav[50534]: Cannot open display:
Oct 23 16:09:58 kitt dbus-daemon[50526]: Activated service 'org.gnome.ScreenSaver' failed: Process org.gnome.ScreenSaver exited with status 1
Oct 23 16:09:58 kitt gnome-session[50528]: gnome-session-binary[50528]: CRITICAL: Unable to create a DBus proxy for GnomeScreensaver: Error calling StartServiceByName for org.gnome.ScreenSaver: GDBus.Error:org.freedesktop.DBus.Error.Spawn.ChildExited: Process org.gnome.ScreenSaver exited with status 1
Oct 23 16:09:58 kitt gnome-session-binary[50528]: CRITICAL: Unable to create a DBus proxy for GnomeScreensaver: Error calling StartServiceByName for org.gnome.ScreenSaver: GDBus.Error:org.freedesktop.DBus.Error.Spawn.ChildExited: Process org.gnome.ScreenSaver exited with status 1
Oct 23 16:09:58 kitt gnome-shell[50536]: Can't initialize KMS backend: could not find drm kms device
Oct 23 16:09:58 kitt gnome-session[50528]: gnome-session-binary[50528]: WARNING: App 'org.gnome.Shell.desktop' exited with code 1
Oct 23 16:09:58 kitt gnome-session-binary[50528]: WARNING: App 'org.gnome.Shell.desktop' exited with code 1
Oct 23 16:09:58 kitt gnome-session-binary[50528]: Unrecoverable failure in required component org.gnome.Shell.desktop

But why is X spewing out all the disconnect messages as well from the Nvidia driver?

Pisymbol, thanks for pointing that out. Indeed, your problem is a different one than that of the others. On your system, the nvidia driver is failing to detect any display and since gnome can’t run headless, it’s restarting over and over.
Can you test if the 387.12 driver is working on your system?

All other people: on your system, the display is detected, X is starting fine, check if disabling wayland works. Otherwise, open a new thread.

387.12 does the exact same thing (same error messages).

Btw, there is no display connected when the machine boots up (it is primarily a ML box that occasionally acts as a desktop). But this wasn’t a problem before with 375 on 17.04 (i.e. I could connect the HDMI connection later and things would just work).

Call me crazy, but this definitely seems like a bug? Is there any work around other than turning X completely off?

Meh. You should have mentioned that earlier. Not a bug then. Gnome can’t start when no display is connected. 17.04 had Unity, that could start headless.
If you want to start headless with Gnome, you will have to fake a monitor in xorg.conf. Or change to a different DE.
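Faking a monitor could look roughly like the sketch below. It prints a Device section using the NVIDIA driver's `ConnectedMonitor` and `CustomEDID` X config options. The connector name `DFP-0` and the EDID path are placeholders, not values taken from this thread, and the helper name `print_fake_monitor_device` is made up for illustration. This has not been tested on the system discussed here.

```shell
# Print a Device section that tells the NVIDIA driver to treat a display as
# connected even when nothing is plugged in.
# ASSUMPTIONS: "DFP-0" and the EDID path are placeholders; pick a connector
# name from your own Xorg log and supply an EDID file saved from a real monitor.
print_fake_monitor_device() {
    connector=${1:-DFP-0}
    edid=${2:-/etc/X11/edid.bin}
    cat <<EOF
Section "Device"
    Identifier "Device0"
    Driver     "nvidia"
    VendorName "NVIDIA Corporation"
    Option     "ConnectedMonitor" "$connector"
    Option     "CustomEDID" "$connector:$edid"
EndSection
EOF
}

# Usage: review the output, then merge it into /etc/X11/xorg.conf by hand.
print_fake_monitor_device DFP-0 /etc/X11/edid.bin
```

Printing the fragment rather than writing the file directly keeps the existing xorg.conf untouched until you have checked the connector name.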

According to this:
https://www.phoronix.com/scan.php?page=news_item&px=GNOME-Shell-3.26.1
the Gnome people finally implemented a headless mode. So it seems that either Ubuntu doesn’t ship the update yet or there’s a bug in it. Check your package version and report a bug with Gnome if you have the right one.
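Checking the package version can be scripted. A small sketch, assuming GNU `sort` with `-V` is available (it is on Ubuntu); the helper name `version_ge` is hypothetical, and 3.26.1 is simply the release named in the linked article:

```shell
# Succeed if version $1 is at least version $2, using GNU sort's
# version ordering: the smaller of the two sorts first.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage on Ubuntu (dpkg-query prints the installed package version; the
# "%%-*" expansion strips the Debian revision suffix before comparing):
#   v=$(dpkg-query -W -f='${Version}' gnome-shell)
#   version_ge "${v%%-*}" 3.26.1 && echo "gnome-shell is 3.26.1 or newer"
```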

What about the rest of us, though? I don’t mess with the displays at all, and still experience problems. For now I had to uninstall the driver and am using nouveau, but the performance suffers with nouveau.

Andybdanny, some things I noticed in your logs

  • You have the onboard Intel GPU active, and it’s configured even though it has no monitor attached. Can you disable it in the BIOS?
  • There’s some flaky USB card reader in your system. Is there an SD card in it?

Besides, did you disable wayland?

What is the output of “ldd /usr/bin/gnome-shell | grep GL”? If the term “mesa” appears in there, your mutter is incorrectly linked against it.

If it says something like:
libEGL.so.1 => /usr/lib/libEGL.so.1 (0x00007fbe653da000)
libGLdispatch.so.0 => /usr/lib/libGLdispatch.so.0 (0x00007fbe5fc7a000)
then the issue is elsewhere.
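The same check can be wrapped in a small helper so it gives a yes/no answer. This is only a sketch of the test described above; the function name `links_mesa` is made up for illustration:

```shell
# Read `ldd` output on stdin and succeed if any line naming a GL library
# also mentions mesa (in either order on the line).
links_mesa() {
    grep -Eq 'GL.*[Mm]esa|[Mm]esa.*GL'
}

# Usage:
#   if ldd /usr/bin/gnome-shell | links_mesa; then
#       echo "gnome-shell resolves GL libraries from mesa"
#   else
#       echo "no mesa GL libraries found; the issue is elsewhere"
#   fi
```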

This is a bit drastic but if you can, try compiling gdm with “--disable-wayland-support”, mutter with “--disable-wayland --disable-native-backend --enable-kms-egl-platform=no --disable-wayland-egl-server” and then recompile gnome-shell as normal.
It is sad but that’s the only way I managed to get a stable X11 session under gnome.

The problem might be the lowlatency kernel. I have already opened a bug report with Ubuntu (Bug #1725169).
Uninstall this kernel and it will work.
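To see whether this applies to you, check the flavour of the running kernel. A minimal sketch (the helper name `is_lowlatency` is hypothetical); the original poster's modinfo output shows 4.13.0-16-lowlatency, which is the kind of string it matches:

```shell
# Succeed if the given kernel release string is a lowlatency flavour.
is_lowlatency() {
    case "$1" in
        *-lowlatency) return 0 ;;
        *)            return 1 ;;
    esac
}

# Usage:
#   is_lowlatency "$(uname -r)" && echo "running a lowlatency kernel"
```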

The headless explanation is just horrible if that is the root cause of this. I have never had this problem in the past (including with gdm).

I bought a headless HDMI dongle from Amazon (fit 4k), it arrives tomorrow. I will plug it in and presumably then the problem should just go away.

And yeah…holy cow…I plugged in my dummy HDMI dongle and no more crashes. I reverted back to 384 (from 387.12) and it works. Eee gad this is silly.

pisymbol, did you plug this into your NVIDIA card or into the onboard output?

Nvidia card.