Ubuntu 18.04 on ASUS ZenBook UX550VE crashes soon after boot to desktop

The problem occurs on a fresh install of Ubuntu with the Nvidia 396 drivers installed from the graphics-drivers PPA and an X session (no Wayland). A couple of seconds after the desktop shows up (autologin enabled) everything freezes: mouse, touchpad and keyboard are all unresponsive. Once (and that’s captured in the attached logs) I managed to get to a state where the keyboard actually responded to SysRq events, and for some reason remounting read-only unfroze X for another couple of seconds. Every other boot ended with a completely unresponsive system and I had to hold the power button to turn the laptop off.

nvidia-bug-report.log.gz (112 KB)

Now that’s a sh…load of problems. A minor one first: see if you can disable the SD card reader in the BIOS. It doesn’t work in Linux anyway and just produces PCI bus errors, flooding the logs. Otherwise you may try setting the kernel parameter pci=noaer to at least suppress those.
Then the whole Thunderbolt subsystem crashes and the TB devices go AWOL, not showing up in lspci anymore, even though it’s a standard, well-supported Intel JHL6540 Thunderbolt 3 Bridge. Maybe try a 4.17 kernel and see if that improves the situation.
Finally, the NVIDIA GPU hangs with an Xid 31. What kind of DE are you using, standard Ubuntu GNOME?
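
For the pci=noaer suggestion, here is a minimal sketch of adding a kernel parameter persistently on Ubuntu. It assumes the stock /etc/default/grub layout with a double-quoted GRUB_CMDLINE_LINUX_DEFAULT line; add_kernel_param is a made-up helper name, and it only handles simple parameters without spaces or quotes.

```shell
#!/bin/sh
# Sketch only: append one kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT.
# add_kernel_param is a hypothetical helper, not a standard tool.
add_kernel_param() {
    file=$1    # e.g. /etc/default/grub
    param=$2   # e.g. pci=noaer (no spaces or quotes in the value)

    # Skip the edit when the parameter is already on the line.
    if grep -Eq "^GRUB_CMDLINE_LINUX_DEFAULT=\"([^\"]* )?${param}( [^\"]*)?\"" "$file"; then
        return 0
    fi

    # Rewrite via a temporary file so a failed sed leaves the original intact.
    tmp=$(mktemp) || return 1
    sed "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 ${param}\"|" "$file" > "$tmp" &&
        cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}
```

Run it as root against /etc/default/grub, then run sudo update-grub and reboot. After the reboot, cat /proc/cmdline shows whether the parameter actually reached the kernel.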

I’ve tried 4.17 with no success.

Standard Ubuntu Gnome Shell.

Made another try with 4.17, 396 drivers, with disabled sdcard reader (I think so at least). Same results. Logs in nvidia-bug-report-nosdcard.log.gz

I’ve tried with “noapic noacpi nosplash irqpoll”, which booted properly and doesn’t hang the OS. There are two issues though. First, multiple pixels flicker randomly on the screen. Second, after a couple of minutes the input devices stop working, then the DE restarts, the flickering stops for a few seconds, and then it goes on like this for another couple of minutes. The laptop fans are blowing like crazy. This time inxi shows that it is using the nvidia driver:

~ inxi -Fx
System:    Host: nasus Kernel: 4.17.3-041703-generic x86_64 bits: 64 gcc: 7.3.0
           Desktop: Gnome 3.28.2 (Gtk 3.22.30-1ubuntu1) Distro: Ubuntu 18.04 LTS
Machine:   Device: laptop System: ASUSTeK product: UX550VE v: 1.0 serial: N/A
           Mobo: ASUSTeK model: UX550VE v: 1.0 serial: N/A
           UEFI: American Megatrends v: UX550VE.304 date: 11/08/2017
Battery    BAT0: charge: 62.6 Wh 97.8% condition: 64.0/73.8 Wh (87%) model: ASUSTeK ASUS status: Discharging
CPU:       Quad core Intel Core i5-7300HQ (-MCP-) arch: Skylake rev.9 cache: 6144 KB
           flags: (lm nx sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx) bmips: 19968
           clock speeds: max: 3500 MHz 1: 3247 MHz 2: 3248 MHz 3: 3209 MHz 4: 3271 MHz
Graphics:  Card-1: Intel Device 591b bus-ID: 00:02.0
           Card-2: NVIDIA GP107M [GeForce GTX 1050 Ti Mobile] bus-ID: 01:00.0
           Display Server: x11 (X.Org 1.19.6 ) drivers: modesetting,nvidia (unloaded: fbdev,vesa,nouveau)
           Resolution: 1920x1080@60.01hz, 1920x1080@60.00hz
           OpenGL: renderer: GeForce GTX 1050 Ti/PCIe/SSE2 version: 4.6.0 NVIDIA 396.24.02 Direct Render: Yes
Network:   Card: Intel Wireless 8265 / 8275 driver: iwlwifi bus-ID: 03:00.0
           IF: wlp3s0 state: up mac: f4:96:34:e7:55:b8
Drives:    HDD Total Size: 512.1GB (3.2% used)
           ID-1: /dev/sda model: Micron_1100_MTFD size: 512.1GB temp: 37C
Partition: ID-1: / size: 234G used: 16G (7%) fs: ext4 dev: /dev/sda5
RAID:      No RAID devices: /proc/mdstat, md_mod kernel module present
Sensors:   System Temperatures: cpu: 69.0C mobo: N/A gpu: 0.0:63C
           Fan Speeds (in rpm): cpu: N/A
Info:      Processes: 282 Uptime: 47 min Memory: 2280.7/15926.0MB Init: systemd runlevel: 5 Gcc sys: 7.3.0
           Client: Shell (zsh 5.4.2) inxi: 2.3.56

All the previous attempts had fbdev as the graphics driver (I’ve disabled nouveau to make it work). The logs are in nvidia-bug-report-noapic.log.gz

nvidia-bug-report-nosdcard.log.gz (75.3 KB)
nvidia-bug-report-noapic.log.gz (124 KB)

Disabling the sdcard reader worked, one log spam less.
Regarding the thunderbolt oopses, try to disable fwupd, run
sudo systemctl disable fwupd
Of course, that doesn’t help with the main problem. Remove the noacpi… kernel parameters, instead try
acpi_osi=! acpi_osi="Windows 2009"
Revert back to kernel 4.15 and try driver version 384.130

Thunderbolt errors seem to disappear after disabling fwupd.

I couldn’t avoid errors while manually (PPAs didn’t work) installing 384.130 drivers so I’m not entirely sure if they got installed properly but it booted just fine on Wayland. No glitches, no crashes, inxi reported that the driver is being used, but I can’t run nvidia-settings due to an error:

ERROR: Unable to load info from any available system

Logs in nvidia-bug-report-384.130.log.gz
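
One way to double-check which kernel module version is actually loaded after a manual install is to read /proc/driver/nvidia/version. The sketch below assumes the "NVRM version: ... Kernel Module  <version>  <date>" line layout used by the proprietary driver of this era; nvidia_loaded_version is a made-up helper name.

```shell
#!/bin/sh
# Sketch only: print the version of the loaded NVIDIA kernel module.
# Assumes a line such as "NVRM version: NVIDIA UNIX x86_64 Kernel Module  <ver>  <date>".
nvidia_loaded_version() {
    file=${1:-/proc/driver/nvidia/version}

    # The file only exists while the nvidia kernel module is loaded.
    [ -r "$file" ] || return 1

    # Keep just the dotted version number that follows "Kernel Module".
    sed -n 's/^NVRM version:.*Kernel Module[[:space:]]*\([0-9][0-9.]*\).*/\1/p' "$file"
}
```

If this prints nothing or fails, the module is probably not loaded at all, whatever the installer reported.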

Then I disabled Wayland, which resulted in an X “bootloop”: when X started, it alternated between a black screen and the boot splash screen (it didn’t reach the desktop at all).

Logs in nvidia-bug-report-384.130-broke-after-disabling-wayland.log.gz
nvidia-bug-report-384.130.log.gz (104 KB)
nvidia-bug-report-384.130-broke-after-disabling-wayland.log.gz (109 KB)

The .run installer doesn’t really fit on Ubuntu, especially on Optimus, where it might break your system, so using it is not advised. I suspect the GL/GLX libs/links are now broken.
OTOH, the available drivers aren’t working anyway, so maybe do some tests with the driver that’s installed now.
Disable gdm first so it doesn’t try to start on boot, otherwise you’d have to wait 90 sec for systemd to give up:
sudo systemctl disable display-manager
Then create a file ~/.xinitrc in your home dir containing:

xrandr --setprovideroutputsource modesetting NVIDIA-0
xrandr --auto
exec xterm

after reboot, login on console, then use
startx
to start an X server and post the error message or the resulting Xorg log.
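
When posting the Xorg log, the interesting part is usually the error and warning lines. A small sketch for pulling those out follows; xorg_problems is a made-up helper name, and the log location is an assumption (commonly /var/log/Xorg.0.log, or ~/.local/share/xorg/Xorg.0.log when X is started as a regular user).

```shell
#!/bin/sh
# Sketch only: print the (EE) and (WW) lines of an Xorg log.
# The log path is passed in because its location varies between setups.
xorg_problems() {
    log=$1

    # Match the marker at the start of a line, optionally after a
    # "[   12.345] " timestamp, so the marker legend in the log header
    # (which is indented) is not reported as an error.
    grep -E '^(\[[ 0-9.]+\] )?\((EE|WW)\)' "$log"
}
```

For example: xorg_problems /var/log/Xorg.0.log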

I’m not very attached to Ubuntu; I just thought it would have the best hardware detection and would work ootb, but oh well. I may as well go with Arch if it would be less painful.

xrandr --setprovideroutputsource modesetting NVIDIA-0 results in

Could not find provider with name NVIDIA-0

Xorg.0.log attached.
Xorg.0.log.gz (8.04 KB)

GLX loads fine but it doesn’t find the DDX, should be easy to fix. Run
find /usr -name "nvidia_drv.so"
to find it, note down the path and create an xorg.conf containing

Section "Files"
  ModulePath "/usr/lib/xorg/modules, foundPath"
EndSection

replace foundPath with the found path, of course.
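
ModulePath takes directories rather than file names, so a rough sketch of the lookup that prints only the directory holding the driver might look like this. nvidia_ddx_dirs is a made-up helper name, and /usr as the default search root is taken from the find command above.

```shell
#!/bin/sh
# Sketch only: list the directories that contain nvidia_drv.so.
nvidia_ddx_dirs() {
    root=${1:-/usr}

    # dirname strips the file name; sort -u drops duplicate directories.
    find "$root" -name nvidia_drv.so -exec dirname {} \; 2>/dev/null | sort -u
}
```

Each printed line is a candidate for the foundPath placeholder.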

The only path was /usr/lib/xorg/modules/drivers/nvidia_drv.so and Xorg picked it up but I can’t see that anything changed. :/
Xorg-2.0.log.gz (7.74 KB)

Of course, use just the path, without the driver file name.
Try using
sudo modprobe nvidia-drm
first. Alternatively, use this as xorg.conf:

Section "ServerLayout"
    Identifier     "layout"
    Screen      0  "nvidia" 0 0
    Inactive       "intel"
EndSection

Section "Files"
    ModulePath "/usr/lib/xorg/modules, /usr/lib/xorg/modules/drivers"
EndSection

Section "Device"
    Identifier     "intel"
    Driver         "modesetting"
    Option         "AccelMethod" "none"
    BusID          "PCI:0:2:0"
EndSection

Section "Device"
    Identifier     "nvidia"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BusID          "PCI:1:0:0"
    Option         "AllowEmptyInitialConfiguration"
EndSection

Section "Screen"
    Identifier     "nvidia"
    Device         "nvidia"
EndSection
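
The BusID values in this xorg.conf are the decimal form of the hexadecimal bus IDs that lspci and inxi print (00:02.0 and 01:00.0 on this laptop). A small sketch of that conversion, in case the IDs differ on another machine; to_xorg_busid is a made-up helper name.

```shell
#!/bin/sh
# Sketch only: convert an lspci-style bus ID (hex "bus:device.function",
# optionally prefixed with a PCI domain) into Xorg's decimal "PCI:b:d:f".
to_xorg_busid() {
    id=$1

    # Drop a leading domain such as "0000:" if present.
    case $id in
        *:*:*) id=${id#*:} ;;
    esac

    bus=${id%%:*}
    rest=${id#*:}
    dev=${rest%%.*}
    fn=${rest#*.}

    # printf interprets the 0x-prefixed arguments as hexadecimal numbers.
    printf 'PCI:%d:%d:%d\n' "0x$bus" "0x$dev" "0x$fn"
}
```

With the IDs from the inxi output, to_xorg_busid 01:00.0 prints PCI:1:0:0 and to_xorg_busid 00:02.0 prints PCI:0:2:0. The conversion only starts to matter for bus numbers of 0a and above, where hex and decimal diverge.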

Loading nvidia-drm gives

modprobe: ERROR: ../libkmod/libkmod-module.c:832 kmod_module_insert_module() could not find module by name='off'
modprobe: ERROR: could not insert 'off': Unknown symbol in module, or unknown parameter (see dmesg)

and using the xorg.conf yields “No screens found” error. Dmesg is silent.

Probably the modules compilation failed during manual install? I’ve managed to install 384.130 on Mint LiveUSB (unlike on Ubuntu) but I didn’t want to wipe Ubuntu to try it out before you give up here.

No, Ubuntu just seems to have an alias set to turn off the nvidia driver.
run
grep nvidia /etc/modprobe.d
and look for something like
alias nvidia-drm off
and
alias nvidia-modprobe off
and remove that file.

should have read
grep nvidia /etc/modprobe.d/*
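
A plain grep for nvidia also returns harmless options lines, so a narrower sketch that lists only the entries that would keep the modules from loading may help. nvidia_module_blocks is a made-up helper name; the patterns cover only "blacklist nvidia..." and "alias nvidia... off" lines, which is an assumption about how the modules were disabled.

```shell
#!/bin/sh
# Sketch only: show modprobe.d lines that blacklist nvidia modules
# or alias them to "off". Commented-out lines are ignored.
nvidia_module_blocks() {
    dir=${1:-/etc/modprobe.d}

    # -H prints the file name, -n the line number, so the offending
    # file is easy to locate.
    grep -EHn '^[[:space:]]*(blacklist[[:space:]]+nvidia[^[:space:]]*|alias[[:space:]]+nvidia[^[:space:]]*[[:space:]]+off)[[:space:]]*$' "$dir"/* 2>/dev/null
}
```

No output (and a non-zero exit status) means no such lines were found in that directory.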

But if you have 384.130 on mint, try that. It’s just for finding a driver that actually works and if not, look for a hardware issue.

That’s embarrassing. Those modules were indeed blacklisted and aliased.

So the driver is running fine. I’m having those random pixels flickering here and there. They don’t end up on a screenshot so here’s a vid: https://youtu.be/lDxrZFRjtT4 . Can this be the most I’ll get from the setup or do you have anything else up your sleeves?

TBH, this looks more like a hardware issue, maybe defective vmem which would also explain the driver crashing before. You could use this to check:
https://github.com/ihaque/memtestG80
requires installing cuda, though.

Even if it works flawlessly on Windows? The laptop recently had its motherboard replaced.

Final error count after 100 iterations over 2048 MiB of GPU memory: 0 errors

I have a similar issue but with my desktop using a GTX 970.

Could you try booting into 18.04 using the “nomodeset” parameter?

Also, try installing 16.04. I know it’s an older version, but it works fine for me. You’ll need to use the “nomodeset” parameter again for 16.04 until you can get to the desktop to install the graphics drivers.

I don’t know what it is in 18.04 but it may be just a case of waiting out until whatever is wrong is fixed, if it’s fixed.