Xorg fails to start, and system panics/shuts down with critical temperature reached error message

So this is somewhat related to my earlier post, except this time it’s pure NVIDIA: no Optimus, and no Intel GPU in the mix. The BIOS is set to NVIDIA only.

When I boot, X tries to start and fails, and my laptop gets really warm, so warm in fact that I get the following message and then the laptop shuts off.

Sep 20 23:16:19 balsa kernel: [   51.853610] thermal thermal_zone0: critical temperature reached(128 C),shutting down
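
For anyone who wants to watch the temperature before the shutdown hits, here is a rough Python sketch. It assumes the standard Linux sysfs thermal interface, where each /sys/class/thermal/thermal_zone*/temp file holds the reading in millidegrees Celsius. The function names are made up for illustration, and which zones exist on a W530 is not something I have checked.

```python
from pathlib import Path


def millidegrees_to_celsius(raw: str) -> float:
    """Convert a sysfs thermal reading (millidegrees Celsius) to degrees Celsius."""
    return int(raw.strip()) / 1000.0


def read_thermal_zones(base: str = "/sys/class/thermal") -> dict:
    """Return {zone_name: temperature_in_C} for every readable thermal zone."""
    temps = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            temps[zone.name] = millidegrees_to_celsius((zone / "temp").read_text())
        except (OSError, ValueError):
            # Some zones cannot be read or return nothing useful; skip them.
            continue
    return temps


if __name__ == "__main__":
    for name, temp in read_thermal_zones().items():
        print(f"{name}: {temp:.1f} C")
```

Running it in a loop from a text console while X starts would show whether the reading climbs gradually or jumps straight to the critical value.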

When X tries to start, but fails, dmesg contains the following:

Sep 20 23:15:33 balsa kernel: [    5.054737] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  325.15  Wed Jul 31 18:50:56 PDT 2013
Sep 20 23:15:51 balsa kernel: [   23.425151] NVRM: GPU at 0000:01:00.0 has fallen off the bus.
Sep 20 23:15:51 balsa kernel: [   23.425187] NVRM: os_pci_init_handle: invalid context!
Sep 20 23:15:51 balsa kernel: [   23.425191] NVRM: os_pci_init_handle: invalid context!
Sep 20 23:15:51 balsa kernel: [   23.425198] NVRM: GPU at 0000:01:00.0 has fallen off the bus.
Sep 20 23:15:51 balsa kernel: [   23.425205] NVRM: os_pci_init_handle: invalid context!
Sep 20 23:15:51 balsa kernel: [   23.425207] NVRM: os_pci_init_handle: invalid context!
Sep 20 23:15:51 balsa kernel: [   23.462018] NVRM: RmInitAdapter failed! (0x25:0x28:1157)
Sep 20 23:15:51 balsa kernel: [   23.462028] NVRM: rm_init_adapter(0) failed
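
To compare kernels quickly, the NVRM lines can be pulled out of a saved log with a few lines of Python. This is only a sketch: it assumes the log format shown above (an optional syslog prefix followed by the bracketed seconds-since-boot timestamp), and the function names are hypothetical.

```python
import re

# Matches "[  seconds] NVRM: message", with or without a syslog prefix in front.
NVRM_LINE = re.compile(r"\[\s*(?P<ts>\d+\.\d+)\]\s+NVRM:\s+(?P<msg>.*)$")


def nvrm_events(lines):
    """Yield (seconds_since_boot, message) for each NVRM kernel log line."""
    for line in lines:
        match = NVRM_LINE.search(line)
        if match:
            yield float(match.group("ts")), match.group("msg").strip()


def gpu_fell_off_bus(lines):
    """Return True if any NVRM line reports the GPU falling off the bus."""
    return any("fallen off the bus" in msg for _, msg in nvrm_events(lines))
```

Feeding it the output of dmesg or the contents of /var/log/kern.log, one boot per kernel version, gives a simple yes/no per kernel for the "fallen off the bus" failure.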

So yeah, something is seriously broken. It was working fine with an older kernel and driver. I’m not sure which ones at the moment; there are a lot of combinations to try.

Sys details:
Lenovo Thinkpad W530
Distro: Debian jessie/sid
GPU: Quadro K2000M
Kernel: linux 3.10.11-1
Xorg: 1.12.4-6.2+b3
Nvidia Drv: 325.15-2

In my other thread, with Optimus enabled, my laptop also gets quite warm and doesn’t cool down, but it doesn’t quite get to the point of shutting down.
nvidia-bug-report.log.gz (44.2 KB)

OK, I tried booting with kernel 3.9.8-1 from Debian, and things are working decently. The fan might be spun up a bit more than normal, but at least my laptop doesn’t overheat and die.

It seems the newer 325 drivers have fixed the virtual terminals not being accessible after X starts, which is nice.

I just tried kernel 3.11 from experimental and it’s similar to 3.10, except it doesn’t overheat and die. I get the same error messages from the NVIDIA driver, though there are some new ACPI warnings:

[   22.212599] ACPI Warning: \_SB_.PCI0.PEG_.VID_._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20130517/nsarguments-95)

nvidia bug report attached.
nvidia-bug-report.log.gz (44.8 KB)

Just wondering if anyone knows what might be up.

Does this also happen with kernel 3.11.1?
Kernel references:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.11.1-saucy/
https://www.kernel.org/pub/linux/kernel/v3.x/linux-3.11.1.tar.xz

It happens with 3.11.0; I can try 3.11.1 if you think it’ll help. Actually, 3.11 was worse than 3.10, IIRC.

Some interesting changes:

I’ve updated to Xorg 1.14.

Kernel 3.9.8: things seem to work, and temps are decently low here.
Kernel 3.10.7: “GPU has fallen off the bus” error in discrete mode or Optimus mode.
Kernel 3.11.0: temperature error. No chance to rebuild the NVIDIA DKMS drivers.

I’ve just tried a vanilla 3.11.1, without trying Optimus at all, and I get similar issues: no video, no text consoles, and the same errors.

[   23.872998] ACPI Warning: \_SB_.PCI0.PEG_.VID_._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20130517/nsarguments-95)
[   23.899318] NVRM: GPU at 0000:01:00.0 has fallen off the bus.
[   23.899327] NVRM: os_pci_init_handle: invalid context!
[   23.899329] NVRM: os_pci_init_handle: invalid context!
[   23.899334] NVRM: GPU at 0000:01:00.0 has fallen off the bus.
[   23.899339] NVRM: os_pci_init_handle: invalid context!
[   23.899340] NVRM: os_pci_init_handle: invalid context!
[   23.923610] ACPI Warning: \_SB_.PCI0.PEG_.VID_._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20130517/nsarguments-95)
[   23.923944] ACPI Warning: \_SB_.PCI0.PEG_.VID_._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20130517/nsarguments-95)
[   23.924255] ACPI Warning: \_SB_.PCI0.PEG_.VID_._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20130517/nsarguments-95)
[   23.924561] ACPI Warning: \_SB_.PCI0.PEG_.VID_._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20130517/nsarguments-95)
[   23.924866] ACPI Warning: \_SB_.PCI0.PEG_.VID_._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20130517/nsarguments-95)
[   23.925170] ACPI Warning: \_SB_.PCI0.PEG_.VID_._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20130517/nsarguments-95)
[   23.925474] ACPI Warning: \_SB_.PCI0.PEG_.VID_._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20130517/nsarguments-95)
[   23.925778] ACPI Warning: \_SB_.PCI0.PEG_.VID_._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20130517/nsarguments-95)
[   23.926082] ACPI Warning: \_SB_.PCI0.PEG_.VID_._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20130517/nsarguments-95)
[   23.926386] ACPI Warning: \_SB_.PCI0.PEG_.VID_._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20130517/nsarguments-95)
[   23.926689] ACPI Warning: \_SB_.PCI0.PEG_.VID_._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20130517/nsarguments-95)
[   23.926993] ACPI Warning: \_SB_.PCI0.PEG_.VID_._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20130517/nsarguments-95)
[   23.927297] ACPI Warning: \_SB_.PCI0.PEG_.VID_._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20130517/nsarguments-95)
[   23.927688] ACPI Warning: \_SB_.PCI0.PEG_.VID_._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20130517/nsarguments-95)
[   23.927994] ACPI Warning: \_SB_.PCI0.PEG_.VID_._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20130517/nsarguments-95)
[   23.928298] ACPI Warning: \_SB_.PCI0.PEG_.VID_._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20130517/nsarguments-95)
[   23.928602] ACPI Warning: \_SB_.PCI0.PEG_.VID_._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20130517/nsarguments-95)
[   23.928907] ACPI Warning: \_SB_.PCI0.PEG_.VID_._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20130517/nsarguments-95)
[   23.929211] ACPI Warning: \_SB_.PCI0.PEG_.VID_._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20130517/nsarguments-95)
[   23.929514] ACPI Warning: \_SB_.PCI0.PEG_.VID_._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20130517/nsarguments-95)
[   23.929828] ACPI Warning: \_SB_.PCI0.PEG_.VID_._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20130517/nsarguments-95)
[   23.930132] ACPI Warning: \_SB_.PCI0.PEG_.VID_._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20130517/nsarguments-95)
[   23.930436] ACPI Warning: \_SB_.PCI0.PEG_.VID_._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20130517/nsarguments-95)
[   23.935488] NVRM: RmInitAdapter failed! (0x25:0x28:1157)
[   23.935498] NVRM: rm_init_adapter(0) failed
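
Since the ACPI warning repeats so many times, a small Python sketch like the following can collapse a log to one line per distinct message with a count. It assumes every line carries the bracketed kernel timestamp shown above; the names are purely illustrative.

```python
import re
from collections import Counter

# Strips an optional syslog prefix plus the "[  seconds]" kernel timestamp.
TIMESTAMP = re.compile(r"^.*?\[\s*\d+\.\d+\]\s*")


def summarize(lines):
    """Count kernel messages with timestamps removed, in first-seen order."""
    counts = Counter()
    for line in lines:
        message = TIMESTAMP.sub("", line.strip(), count=1)
        if message:
            counts[message] += 1
    return list(counts.items())


def print_summary(lines):
    """Print each distinct message once, prefixed by how often it occurred."""
    for message, count in summarize(lines):
        print(f"{count:4d}x {message}")
```

Lines without a timestamp are counted as-is, so nothing is silently dropped.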

I’m attaching the bug report log.

Anything I can try? I’d really like to solve this issue and then start using Optimus. I’m rather excited about that if it can extend battery life and reduce temperatures.

Filed bug 1380016 to track this issue.

Thanks.

I eagerly await a fix :)