Complete freeze with nvidia-prime

frei · February 16, 2017, 8:26pm

I run an XPS 15 9560 (i7-7700HQ) with Ubuntu 16.10 and installed the driver with “apt-get install nvidia-378”.
After an update of nvidia-378 on 2017-02-14, X stopped working. After debugging for a while without success, I reverted with “apt-get purge nvidia-378; apt-get install nvidia-375; reboot” and it works again.

rabinnh · February 16, 2017, 9:29pm

Some dumb questions:

Do you have DKMS installed?
Did you reboot before starting X again?

I have a 9560 and it’s running 378 on 16.04 just fine.

htrex · February 16, 2017, 10:31pm

Here are my logs (→ http://htx.webfactional.com/nvidia-logs.zip), containing the output of:
cat /var/log/gpu-manager.log
cat /var/log/Xorg.0.log
lspci -v
cat /proc/acpi/bbswitch
cat /usr/lib/nvidia-378-prime/ld.so.conf
lsmod
dmesg

I’ve put the zip file on a personal space, as I don’t seem to have an option to attach files here. (Edit: it is now also attached here.)

These are snapshots taken after a normal boot on the Intel GPU, with the file /lib/systemd/system/nvidia-persistenced.service moved to my home directory.

The problem I’m seeing is with shutdowns and reboots. I apparently get logged out of X, but the laptop doesn’t turn off by itself. I get a black screen with this filesystem status:

/dev/nvme0n1p7: recovering journal
/dev/nvme0n1p7: clean 500287/21094400 files, 21493115/84362496 blocks

That seems to be a message left over from the last boot: I always have to force the laptop off with the power button, so the filesystem is not unmounted cleanly and its journal gets recovered at the next boot.
At that point I can’t get a terminal with CTRL+ALT+F1; the system is frozen.

nvidia-logs.zip (31.9 KB)

htrex · February 17, 2017, 9:27am

I downgraded to nvidia-375: same problem. While on the Intel GPU I always have to force the laptop off with the power button, because it freezes.

generix · February 17, 2017, 9:46am

Thanks for the logs.
You got an additional problem when you upgraded to 378.13 (from dmesg):

[    5.590376] NVRM: API mismatch: the client has the version 378.13, but
               NVRM: this kernel module has the version 378.09.  Please
               NVRM: make sure that this kernel module and all NVIDIA driver
               NVRM: components have the same version.

The kernel modules didn’t get updated, but that’s a separate issue. The bigger question is why a client is connecting at that point. It must be the X server, but then it’s starting too early. The nvidia modules get unloaded after that, and bbswitch fails to turn off the nvidia GPU.
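A quick way to spot this kind of mismatch is to search the kernel log for the NVRM message quoted above. The function below is a minimal sketch; its name is hypothetical, and it assumes the message wording of the 378.x excerpt, which other driver releases may phrase differently.

```bash
#!/usr/bin/env bash
# Hypothetical helper: scan kernel log text on stdin for an NVRM
# "API mismatch" message and report both versions. Assumes the wording
# of the 378.x message quoted above; other releases may differ.
nvrm_mismatch() {
    local log client module
    # The message is wrapped over several lines; join them first.
    log=$(tr '\n' ' ')
    client=$(grep -oE 'client has the version [0-9]+(\.[0-9]+)*' <<<"$log" \
        | head -n 1 | awk '{print $NF}')
    module=$(grep -oE 'kernel module has the version [0-9]+(\.[0-9]+)*' <<<"$log" \
        | head -n 1 | awk '{print $NF}')
    if [[ -n $client && -n $module ]]; then
        echo "mismatch: userspace $client, kernel module $module"
        return 1
    fi
    echo "no API mismatch found"
}
```

Usage: `sudo dmesg | nvrm_mismatch` (reading dmesg may need root on some systems). A mismatch usually means the DKMS-built module was not rebuilt or reloaded after the userspace packages changed.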
cat /proc/acpi/bbswitch
gave you
0000:01:00.0 ON
while it should be OFF.
Can you try to turn it off with
echo OFF > /proc/acpi/bbswitch
and then check again with
cat /proc/acpi/bbswitch
If that works, try to turn it on again.
If turning it off/on doesn’t work, this might be a problem either with bbswitch or with your computer’s ACPI.
Have you ever tried suspend/resume while on nvidia?
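The off/on check above can be wrapped in a small helper that requests a state and reads it back. A sketch under assumptions: the function names are hypothetical, and `BBSWITCH` is overridable only so the parsing can be tried against a plain file; the real interface is /proc/acpi/bbswitch.

```bash
#!/usr/bin/env bash
# Sketch of the bbswitch check: request a power state and read it back.
# Run from a root shell. BBSWITCH may be overridden for testing only.
BBSWITCH=${BBSWITCH:-/proc/acpi/bbswitch}

# Print the state field of a line such as "0000:01:00.0 ON".
bbswitch_state() {
    awk '{print $2}' "$BBSWITCH"
}

# Request ON or OFF, then report whether bbswitch shows the new state.
bbswitch_request() {
    local want=$1 got
    case $want in
        ON|OFF) ;;
        *) echo "usage: bbswitch_request ON|OFF" >&2; return 2 ;;
    esac
    echo "$want" > "$BBSWITCH" || return 2
    got=$(bbswitch_state)
    if [[ $got == "$want" ]]; then
        echo "ok: GPU is $got"
    else
        echo "failed: requested $want, GPU is still $got"
        return 1
    fi
}
```

Usage: `bbswitch_request OFF; bbswitch_request ON`. The redirect inside the function is performed by the shell running it, so that shell must be root; from a normal shell, `echo OFF | sudo tee /proc/acpi/bbswitch` is the usual equivalent.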

generix · February 17, 2017, 9:58am

Forgot: after every nvidia driver install/upgrade/downgrade you will have to remove the file /lib/systemd/system/nvidia-persistenced.service again. Having the persistence daemon started on module load will make debugging problems harder.
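Since the unit file comes back with every package operation, a small helper makes this step repeatable. A sketch under assumptions: the function name is hypothetical, and the backup location is an arbitrary choice (htrex simply moved the file to his home directory).

```bash
#!/usr/bin/env bash
# Sketch: move the persistence daemon's unit file out of systemd's search
# path, to be repeated after each nvidia package install/upgrade/downgrade.
# Run as root. BACKUP_DIR is an arbitrary, hypothetical location.
UNIT=${UNIT:-/lib/systemd/system/nvidia-persistenced.service}
BACKUP_DIR=${BACKUP_DIR:-/root/disabled-units}

park_persistenced() {
    if [[ ! -e $UNIT ]]; then
        echo "nothing to do: $UNIT not present"
        return 0
    fi
    mkdir -p "$BACKUP_DIR" || return 1
    # -f: overwrite a copy parked after an earlier driver install.
    mv -f "$UNIT" "$BACKUP_DIR/" || return 1
    echo "moved $UNIT to $BACKUP_DIR"
}
```

Afterwards, `systemctl daemon-reload` makes systemd forget the unit it had already loaded.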

htrex · February 17, 2017, 10:44am

I’m trying nvidia-375 and have removed /lib/systemd/system/nvidia-persistenced.service again.

htrex@OrionXPS:~$ cat /proc/acpi/bbswitch
0000:01:00.0 ON
htrex@OrionXPS:~$ sudo echo OFF > /proc/acpi/bbswitch
bash: /proc/acpi/bbswitch: Permission denied

Edit: that’s on the nvidia profile.

generix · February 17, 2017, 11:12am

That fails because of the ‘>’: the redirection is done by your own unprivileged shell, not by the command run through sudo. Open a root shell first:
sudo -s
then try turning it off and on again. All of this while on Intel, of course.

htrex · February 18, 2017, 9:18am

While on Intel, with the nvidia-375 driver and /lib/systemd/system/nvidia-persistenced.service removed:

htrex@OrionXPS:~$ sudo -s
[sudo] password for htrex:
root@OrionXPS:~# echo OFF > /proc/acpi/bbswitch
root@OrionXPS:~# cat /proc/acpi/bbswitch
0000:01:00.0 ON
root@OrionXPS:~# echo ON > /proc/acpi/bbswitch
root@OrionXPS:~# cat /proc/acpi/bbswitch
0000:01:00.0 ON

generix · February 18, 2017, 1:23pm

That seems to be the problem. It looks like the GPU enters some undefined state, either when bbswitch turns it off or when the nvidia modules are unloaded. So when you shut down, the kernel hangs because it can’t power off the GPU.
Three (likely) possible bugs:

bug in bbswitch
bug in acpi
bug in nvidia driver on unload
To rule out the third possibility, please
switch to nvidia using prime-select nvidia
disable the display manager using systemctl disable display-manager
(I don’t know whether 16.04/16.10 uses display-manager or lightdm as the service name)
reboot
After the reboot you should be on a text console.
There, unload the nvidia drivers:
rmmod nvidia-uvm
rmmod nvidia-drm
rmmod nvidia-modeset
rmmod nvidia
(Edit: make sure the driver is unloaded: lsmod | grep nvidia)
Then reboot using systemctl reboot.
If it hangs, this is a bug in the driver.
If it reboots cleanly, the bug is in bbswitch or ACPI.
You can get your desktop back using systemctl enable display-manager
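The unload step above can be scripted so that a failed rmmod or a leftover module is obvious before rebooting. A sketch under assumptions: the function names are hypothetical, and the module list follows the rmmod commands above (lsmod prints them with underscores); some distribution packages use versioned module names, so check `lsmod | grep nvidia` first.

```bash
#!/usr/bin/env bash
# Sketch of the unload test. Run as root on the text console after booting
# with nvidia selected and the display manager disabled.
# Module names are an assumption; adjust them to what lsmod shows.
NVIDIA_MODULES=(nvidia_uvm nvidia_drm nvidia_modeset nvidia)

# Print the names of loaded nvidia* modules from `lsmod` output on stdin.
nvidia_loaded() {
    awk 'NR > 1 && $1 ~ /^nvidia/ {print $1}'
}

unload_nvidia() {
    local m left
    # Dependent modules first, the core nvidia module last.
    for m in "${NVIDIA_MODULES[@]}"; do
        if lsmod | nvidia_loaded | grep -qx "$m"; then
            rmmod "$m" || { echo "rmmod $m failed" >&2; return 1; }
        fi
    done
    left=$(lsmod | nvidia_loaded)
    if [[ -n $left ]]; then
        echo "still loaded: $left" >&2
        return 1
    fi
    echo "all nvidia modules unloaded; next step: systemctl reboot"
}
```

If `unload_nvidia` succeeds and the following `systemctl reboot` still hangs, that points at the driver's unload path rather than at bbswitch.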

zibri1 · December 13, 2017, 6:45pm

I have the same problem.

Notebook ASUS ROG GL703VD

This notebook has an Intel/NVIDIA combination (both active).
Without the nouveau and nvidia drivers, everything works.
If I use nouveau (the default) or any nvidia driver I’ve tried so far, the notebook freezes on shutdown.
I agree that there might be a timing problem, because this notebook is very fast: it takes only a few SECONDS (fewer than 10) to boot (from SSD, with a default Ubuntu 17.10 installation).

No solutions since February??

zibri1 · December 13, 2017, 7:38pm

The problem seems related to unloading the NVIDIA driver.
If I run prime-select intel
and then reboot, it hangs (because nvidia was still selected).
After forcing a power-off with the power button and booting again, the system works and shuts down correctly.
If nvidia (or nouveau) is selected, Linux hangs at shutdown with no error.
My notebook has the latest BIOS, by the way.
I didn’t try other BIOS versions because there is only one on the ASUS website.

generix · December 13, 2017, 11:49pm

Try the kernel parameters
acpi_osi=! acpi_osi="Windows 2009"
Report back with the output of nvidia-bug-report.sh attached.
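On Ubuntu, kernel parameters are usually made permanent through GRUB. A sketch, assuming the stock /etc/default/grub where GRUB_CMDLINE_LINUX_DEFAULT starts as "quiet splash"; keep whatever options you already have. For a one-off test, the parameters can instead be appended to the linux line after pressing e in the GRUB menu.

```bash
# /etc/default/grub (fragment). The inner double quotes are escaped so the
# kernel receives acpi_osi="Windows 2009" as a single parameter.
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash acpi_osi=! acpi_osi=\"Windows 2009\""

# Afterwards, regenerate the GRUB configuration and reboot:
#   sudo update-grub
# Then create the report as root; it is written as
# nvidia-bug-report.log.gz in the current directory:
#   nvidia-bug-report.sh
```

`cat /proc/cmdline` after the reboot shows whether the parameters actually reached the kernel.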

zibri1 · December 14, 2017, 9:53am

The setting doesn’t fix the shutdown lockup, but I found that the problem appears only with GDM3 (installed by default).
Everything works fine with lightdm.
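Since the result depends on the display manager, it helps to confirm which one the system actually starts. A sketch, assuming the Debian/Ubuntu convention that /etc/X11/default-display-manager holds the path of the chosen display manager; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: report which display manager Debian/Ubuntu will start.
# Assumes /etc/X11/default-display-manager holds the path of its binary.
DM_FILE=${DM_FILE:-/etc/X11/default-display-manager}

current_dm() {
    if [[ ! -r $DM_FILE ]]; then
        echo "unknown"
        return 1
    fi
    basename "$(head -n 1 "$DM_FILE")"
}
```

Usage: `current_dm` prints e.g. `gdm3` or `lightdm`. With both installed, `sudo dpkg-reconfigure lightdm` offers a choice of default display manager, and `systemctl status display-manager` shows which unit the display-manager alias points to.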

thelambeers · May 18, 2018, 8:02am

I’m curious whether anyone here runs Arch Linux, or better yet, whether the file /usr/lib/xorg/modules/input/mouse_drv.so exists on their system.

I came across something I never would’ve found otherwise by running bumblebee in the foreground with debugging enabled:

[ 3450.473647] [DEBUG][XORG] (II) Using input driver 'mouse' for '<default pointer>'
[ 3450.473652] [DEBUG][XORG] (**) Option "CorePointer" "on"
[ 3450.473657] [DEBUG][XORG] (**) <default pointer>: always reports core events
[ 3450.473663] [DEBUG][XORG] /usr/bin/X: symbol lookup error: /usr/lib/xorg/modules/input/mouse_drv.so: undefined symbol: xf86GetOS
[ 3450.484047] [DEBUG]Process with PID 1753 returned code 127
[ 3450.484084] [ERROR]X did not start properly
[ 3450.484202] [DEBUG]Socket closed.
^C[ 3454.082978] [WARN]Received Interrupt signal.
[ 3454.083027] [DEBUG]Socket closed.
[ 3454.083439] [DEBUG]Killing all remaining processes.

In my case, I had the impression that my laptop locked up whenever I tried to run optirun, but it was actually the bumblebee service spamming my system logs through its default --use-syslog flag:

May 18 03:12:36 c1-linuxdev bumblebeed[18682]: X did not start properly
May 18 03:12:36 c1-linuxdev bumblebeed[18682]: [XORG] (WW) NVIDIA(0): Unable to get display device for DPI computation.
May 18 03:12:36 c1-linuxdev bumblebeed[18682]: X did not start properly
May 18 03:12:37 c1-linuxdev bumblebeed[18682]: [XORG] (WW) NVIDIA(0): Unable to get display device for DPI computation.
May 18 03:12:37 c1-linuxdev bumblebeed[18682]: X did not start properly
May 18 03:12:37 c1-linuxdev bumblebeed[18682]: [XORG] (WW) NVIDIA(0): Unable to get display device for DPI computation.
May 18 03:12:37 c1-linuxdev bumblebeed[18682]: X did not start properly
May 18 03:12:37 c1-linuxdev bumblebeed[18682]: [XORG] (WW) NVIDIA(0): Unable to get display device for DPI computation.
May 18 03:12:37 c1-linuxdev bumblebeed[18682]: X did not start properly
May 18 03:12:38 c1-linuxdev bumblebeed[18682]: [XORG] (WW) NVIDIA(0): Unable to get display device for DPI computation.
May 18 03:12:38 c1-linuxdev bumblebeed[18682]: X did not start properly
May 18 03:12:38 c1-linuxdev bumblebeed[18682]: [XORG] (WW) NVIDIA(0): Unable to get display device for DPI computation.
May 18 03:12:38 c1-linuxdev bumblebeed[18682]: X did not start properly
May 18 03:12:38 c1-linuxdev bumblebeed[18682]: [XORG] (WW) NVIDIA(0): Unable to get display device for DPI computation.
May 18 03:12:38 c1-linuxdev bumblebeed[18682]: X did not start properly
May 18 03:12:39 c1-linuxdev bumblebeed[18682]: [XORG] (WW) NVIDIA(0): Unable to get display device for DPI computation.
May 18 03:12:39 c1-linuxdev bumblebeed[18682]: X did not start properly
May 18 03:12:39 c1-linuxdev bumblebeed[18682]: [XORG] (WW) NVIDIA(0): Unable to get display device for DPI computation.
May 18 03:12:39 c1-linuxdev bumblebeed[18682]: X did not start properly
May 18 03:12:39 c1-linuxdev bumblebeed[18682]: [XORG] (WW) NVIDIA(0): Unable to get display device for DPI computation.
May 18 03:12:39 c1-linuxdev bumblebeed[18682]: X did not start properly
May 18 03:12:40 c1-linuxdev bumblebeed[18682]: [XORG] (WW) NVIDIA(0): Unable to get display device for DPI computation.
May 18 03:12:40 c1-linuxdev bumblebeed[18682]: X did not start properly
May 18 03:12:40 c1-linuxdev bumblebeed[18682]: [XORG] (WW) NVIDIA(0): Unable to get display device for DPI computation.
May 18 03:12:40 c1-linuxdev bumblebeed[18682]: X did not start properly
May 18 03:12:40 c1-linuxdev bumblebeed[18682]: [XORG] (WW) NVIDIA(0): Unable to get display device for DPI computation.
May 18 03:12:40 c1-linuxdev bumblebeed[18682]: X did not start properly
May 18 03:12:41 c1-linuxdev bumblebeed[18682]: [XORG] (WW) NVIDIA(0): Unable to get display device for DPI computation.
May 18 03:12:41 c1-linuxdev bumblebeed[18682]: X did not start properly
May 18 03:12:41 c1-linuxdev bumblebeed[18682]: [XORG] (WW) NVIDIA(0): Unable to get display device for DPI computation.
May 18 03:12:41 c1-linuxdev bumblebeed[18682]: X did not start properly
May 18 03:12:41 c1-linuxdev bumblebeed[18682]: [XORG] (WW) NVIDIA(0): Unable to get display device for DPI computation.
May 18 03:12:41 c1-linuxdev bumblebeed[18682]: X did not start properly
May 18 03:12:42 c1-linuxdev bumblebeed[18682]: [XORG] (WW) NVIDIA(0): Unable to get display device for DPI computation.
May 18 03:12:42 c1-linuxdev bumblebeed[18682]: X did not start properly
May 18 03:12:42 c1-linuxdev bumblebeed[18682]: [XORG] (WW) NVIDIA(0): Unable to get display device for DPI computation.
May 18 03:12:42 c1-linuxdev bumblebeed[18682]: X did not start properly
May 18 03:12:42 c1-linuxdev bumblebeed[18682]: [XORG] (WW) NVIDIA(0): Unable to get display device for DPI computation.

It was tricky to catch, since the service file gives the impression that it backs off for 60 seconds on failures, but that only applies when the bumblebee daemon itself dies, not when the Xorg binary it forks fails.
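To tell log flooding like this apart from a real lockup, you can count how often the failure line appears per second. A sketch: the function name is hypothetical, and it assumes syslog-style lines as shown above ("May 18 03:12:36 host bumblebeed[pid]: ...").

```bash
#!/usr/bin/env bash
# Hypothetical helper: count bumblebeed "X did not start properly" lines
# per second in syslog-style text read on stdin.
x_failures_per_second() {
    grep 'bumblebeed.*X did not start properly' \
        | awk '{print $1, $2, $3}' \
        | uniq -c
}
```

Usage (assuming the service unit is named bumblebeed): `journalctl -b -u bumblebeed --no-pager | x_failures_per_second`. Several failures every second, as in the excerpt above, means Xorg is being restarted in a tight loop.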

Anyway, removing the package that owned /usr/lib/xorg/modules/input/mouse_drv.so solved my issues. As a test, perhaps you could temporarily move it out of the way to debug:

sudo mv -fv /usr/lib/xorg/modules/input/mouse_drv.so /usr/lib/xorg/modules/input/mouse_drv.so.bak
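Before removing anything, it is worth checking which package owns the file. A sketch using the native package manager's ownership query (pacman on Arch, dpkg on Debian/Ubuntu); the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: show which installed package owns a file, using pacman -Qo on
# Arch or dpkg -S on Debian/Ubuntu, whichever is available.
owner_of() {
    local file=$1
    if command -v pacman >/dev/null 2>&1; then
        pacman -Qo "$file"
    elif command -v dpkg >/dev/null 2>&1; then
        dpkg -S "$file"
    else
        echo "no supported package manager found" >&2
        return 2
    fi
}
```

Usage: `owner_of /usr/lib/xorg/modules/input/mouse_drv.so`. To undo the test move, rename the .bak file back with `sudo mv -fv /usr/lib/xorg/modules/input/mouse_drv.so.bak /usr/lib/xorg/modules/input/mouse_drv.so`.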

aplattner · May 18, 2018, 2:28pm

Hi zibri_,

Is there anything interesting from the failed shutdown in your system log after a reboot? Alternatively, if you can SSH into the system from another machine and watch the output of “dmesg -w” while it tries to shut down, that might catch something interesting.
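Reading the failed shutdown's log after the forced power-off only works if journald keeps logs on disk. A sketch, assuming systemd's default `Storage=auto`, under which logs are persisted only when /var/log/journal exists; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: after a forced power-off, print the end of the previous boot's
# kernel log. Requires persistent journald storage (/var/log/journal).
JOURNAL_DIR=${JOURNAL_DIR:-/var/log/journal}

previous_boot_kernel_log() {
    local lines=${1:-100}
    if [[ ! -d $JOURNAL_DIR ]]; then
        echo "$JOURNAL_DIR missing: previous boots are not kept." >&2
        echo "Create it as root, restart systemd-journald, then reproduce the hang." >&2
        return 1
    fi
    # -k: kernel messages only; -b -1: previous boot; -n: last N lines.
    journalctl --no-pager -k -b -1 -n "$lines"
}
```

Messages from the very last moments of shutdown may never reach the disk, so watching live over SSH is complementary, e.g. `ssh -t user@laptop sudo dmesg -w` (user@laptop is a placeholder).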

I think generix is on the right track: if the problem reproduces with both nouveau and nvidia, then it’s pretty likely to be a platform problem rather than a driver problem.

@thelambeers: The xf86GetOS function was removed in xserver 1.20, which Arch Linux upgraded to recently. Whoever maintains the mouse_drv package needs to rebuild it against the new X server. That’s unlikely to be related to zibri_'s problem, since xserver 1.20 only just came out.
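To check whether an installed driver still expects the removed function, you can list the undefined dynamic symbols it needs the X server to provide. A minimal sketch, assuming binutils' nm is installed; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: succeed if a shared object has an undefined dynamic reference to
# the given symbol (i.e. expects the X server to provide it). Needs nm.
needs_symbol() {
    local lib=$1 sym=$2
    [[ -r $lib ]] || { echo "cannot read $lib" >&2; return 2; }
    # Strip any "@VERSION" suffix that newer nm versions print.
    nm -D --undefined-only "$lib" \
        | awk '{sub(/@.*/, "", $NF); print $NF}' \
        | grep -qx "$sym"
}
```

Usage: `needs_symbol /usr/lib/xorg/modules/input/mouse_drv.so xf86GetOS && echo "built against an older X server"`; `Xorg -version` shows which server version is installed.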
