After upgrade to 381.09, GPU fan STOPS WORKING on resume from suspend/hibernate!!

From post:
https://devtalk.nvidia.com/default/topic/990898/linux/suspend-corrupts-window-manager-after-upgrade-to-378-09/

I decided to create a new topic. It is related to the one above, but this is a new driver version and a new bug.

Installed 381.09. The graphics/window corruption after resume from suspend/hibernate seems fixed, but now the GPU fan STOPS WORKING!!
The GPU only throttles, with NO FAN KICKING IN.

Restarting X won’t fix it, only a reboot does.
Now it’s getting dangerous…

HW: Clevo N150RD i7-6700HQ 16Gb Nvidia GTX960M
Distributor ID: LinuxMint
Description: Linux Mint 18.1 Serena
Release: 18.1
Codename: serena
4.4.0-70-generic #91-Ubuntu SMP Wed Mar 22 12:47:43 UTC 2017 GNU/Linux

UPDATE:
Rolled back to 375.39 (current long-lived branch release: 375.39). The graphics/window corruption is back after resume from suspend/hibernate AND the fan also does not kick in after resume.
Not a HW issue, as I tested with Windows 10 and everything works fine both before and after suspend/hibernate.
nvidia-bug-report.log.gz (166 KB)

Hi KcM,
I think you can try upgrading your kernel too. Does your system have the latest SBIOS? Is it possible for you to upgrade the SBIOS to the latest one available? Does the issue hit for both sleep and hibernate? Before sleep/hibernate, when you start the system, does the fan work fine?

“Only GPU throttling but NO FAN KICKING IN”
How did you confirm that? Are you using any command?

Hi Sandip,

Kernel upgrade: preferably not as I want to remain on LTS branch.

SBIOS: yes latest available.

Temps/throttling: I see temps rising dramatically, either through the Optimus app (Thermal settings) and/or the nvidia sensors applet. Throttling is shown only in the Optimus app.

FAN: By physical inspection, as the driver/nvidia apps do not provide this info on Linux (?!)

“Does the issue hit for both sleep and hibernate?”

  • Yes, regardless of which.

“Before sleep/hibernate, when you start the system, does the fan work fine?”

  • Yes, this only happens after resume from either state.

For the record, plenty of people who always used the latest stable Linux kernel have opted to switch to the 4.9 LTS branch and just stick to LTS kernels from now on.
This is especially convenient when using out-of-tree kernel modules.

That means certain machines may run the 4.9 kernel branch till the end of 2018 but still use the latest NVIDIA driver.

Hi HussamT,

Unfortunately, on Mint 18/Ubuntu 16.04, only 4.4.x is available as LTS in the distro repos.

Other options are 4.8.x and 4.10.x, with or without HWE (the hardware enablement stack), and these are not LTS.

As I understand it, HWE is quite a commitment, since xserver-xorg-core is also a different version for these HWE kernels.

So are you saying that it’s a kernel issue?
How can I verify this? What logging could back this up?

KcM, I’m not suggesting it is a kernel issue (4.9 is pretty solid).
However, you can try installing a secondary 4.10 kernel if you can obtain one and check if booting that kernel helps. It is always possible NVIDIA accidentally changed something that fits new kernels better and broke older kernels, in which case this would be an NVIDIA bug.

As for logging, you can try adding the loglevel=7 kernel parameter. Run ‘dmesg’ after hitting the issue and note any odd output that NVIDIA developers may find interesting.
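
If the dmesg output is long, a small filter can help pick out the lines worth posting. This is only a rough sketch: the keyword list is my own guess at what might matter for a fan/resume problem, and the function names are made up for illustration, so adjust as needed.

```python
import re
import subprocess

# Assumption: these keywords are a guess at what is relevant to a
# fan/resume problem; they are not an official list of any kind.
KEYWORDS = ("nvidia", "nvrm", "acpi", "thermal", "fan", "suspend", "resume")


def filter_kernel_log(text, keywords=KEYWORDS):
    """Return the log lines that mention any keyword (case-insensitive)."""
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    return [line for line in text.splitlines() if pattern.search(line)]


def read_dmesg():
    """Run dmesg and return its output as text (may need root on some systems)."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return result.stdout


# Example use after resuming and hitting the issue:
#     for line in filter_kernel_log(read_dmesg()):
#         print(line)
```

Running it once before suspend and once after resume, then comparing the two outputs, should make it easier to spot anything new.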

I’ll try hibernating my PC and checking if the fans continue working.

Edit: I just tried this myself. I am also running linux kernel 4.9 and beta nvidia driver. Fans continue to work after resuming so it looks as though this is not a kernel bug.
Perhaps nvidia issue then?
Also sorry for posting in your thread. I am running the same kernel so I was worried.

The reason Sandip asked about the SBIOS and kernel is that laptops typically have whole-system integrated coolers, which are outside of the control of the nvidia driver. If nvidia-settings doesn’t show any fan info in the thermal settings page, it’s because there aren’t any driver-controlled coolers in the system.
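
One way to check whether the kernel exposes any fan sensor at all, independent of the nvidia driver, is to look at the hwmon sysfs tree, where fan speeds appear as fan*_input files (in RPM). The sketch below assumes the standard /sys/class/hwmon location; the function name is made up for illustration. On many laptops the fan is handled by the embedded controller and nothing shows up here, so an empty result does not mean the fan is broken.

```python
from pathlib import Path


def read_fan_speeds(hwmon_root="/sys/class/hwmon"):
    """Return {(chip_name, sensor_file): rpm} for each fan*_input found.

    An empty dict means no fan sensors are exposed through hwmon.
    """
    speeds = {}
    root = Path(hwmon_root)
    if not root.is_dir():
        return speeds
    for chip in sorted(root.iterdir()):
        # Each hwmon device usually has a "name" file identifying the chip.
        name_file = chip / "name"
        chip_name = name_file.read_text().strip() if name_file.is_file() else chip.name
        for sensor in sorted(chip.glob("fan*_input")):
            try:
                speeds[(chip_name, sensor.name)] = int(sensor.read_text().strip())
            except (OSError, ValueError):
                # Skip sensors that cannot be read or return non-numeric data.
                continue
    return speeds


# Example use, before suspend and again after resume:
#     print(read_fan_speeds())
```

If a fan sensor does show up, comparing its reading before suspend and after resume would back up the physical inspection with numbers.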

I tried a different kernel (4.9 instead of 4.4 LTS) and the fan now operates after resume from sleep/hibernate.

This was due to fan speed control being done by the motherboard and not the nvidia driver. I’m on a laptop.

GFX corruption on my end is solved with 370 or 381 nvidia drivers.

Thanks to all for your input!