Dell Precision 7510: 384.59 + Fedora 26 v. 4.12.5-300 + Optimus + bumblebee = fallen off the bus

The system is a Dell Precision 7510:

Base Board Information
        Manufacturer: Dell Inc.
        Product Name: 0M91XC
        Version: A00
System Information
        Manufacturer: Dell Inc.
        Product Name: Precision 7510
BIOS Information
        Vendor: Dell Inc.
        Version: 1.13.6
        Release Date: 06/21/2017

Kernel

BOOT_IMAGE=/vmlinuz-4.11.11-300.fc26.x86_64 root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/root rd.lvm.lv=fedora/swap rd.driver.blacklist=nouveau rd.driver.blacklist=psmouse i915.enable_guc_loading=1 i915.enable_guc_submission=1 intel_pstate=skylake_hwp i915.enable_psr=1 i915.disable_power_well=0 nouveau.modeset=0 rd.driver.blacklist=nouveau rhgb quiet LANG=en_US.UTF-8

Linux <snip> 4.11.11-300.fc26.x86_64 #1 SMP Mon Jul 17 16:32:11 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

LSPCI

00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Host Bridge/DRAM Registers (rev 07)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) (rev 07)
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 530 (rev 06)
00:04.0 Signal processing controller: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem (rev 07)
00:14.0 USB controller: Intel Corporation Sunrise Point-H USB 3.0 xHCI Controller (rev 31)
00:14.2 Signal processing controller: Intel Corporation Sunrise Point-H Thermal subsystem (rev 31)
00:16.0 Communication controller: Intel Corporation Sunrise Point-H CSME HECI #1 (rev 31)
00:17.0 SATA controller: Intel Corporation Sunrise Point-H SATA controller [AHCI mode] (rev 31)
00:1c.0 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #2 (rev f1)
00:1c.2 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #3 (rev f1)
00:1c.4 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #5 (rev f1)
00:1d.0 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #9 (rev f1)
00:1f.0 ISA bridge: Intel Corporation Sunrise Point-H LPC Controller (rev 31)
00:1f.2 Memory controller: Intel Corporation Sunrise Point-H PMC (rev 31)
00:1f.3 Audio device: Intel Corporation Sunrise Point-H HD Audio (rev 31)
00:1f.4 SMBus: Intel Corporation Sunrise Point-H SMBus (rev 31)
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (2) I219-LM (rev 31)
01:00.0 VGA compatible controller: NVIDIA Corporation GM107GLM [Quadro M2000M] (rev a2)
02:00.0 Network controller: Intel Corporation Wireless 8260 (rev 3a)
03:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS525A PCI Express Card Reader (rev 01)
3d:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM951/PM951 (rev 01)

The driver compiles normally but fails to start:

[621897.233278] nvidia-nvlink: Nvlink Core is being initialized, major device number 237
[621897.233838] nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=none,decodes=none:owns=none
[621897.233901] NVRM: The NVIDIA GPU 0000:01:00.0
                NVRM: (PCI ID: 10de:13b0) installed in this system has
                NVRM: fallen off the bus and is not responding to commands.
[621897.233907] nvidia: probe of 0000:01:00.0 failed with error -1
[621897.233924] NVRM: The NVIDIA probe routine failed for 1 device(s).
[621897.233924] NVRM: None of the NVIDIA graphics adapters were initialized!
[621897.234065] nvidia-nvlink: Unregistered the Nvlink Core, major device number 237

Though the symptoms are different, this may be an echo of this bug:
https://devtalk.nvidia.com/default/topic/971733/?comment=5190452
i.e. does the kernel parameter pcie_port_pm=off help?
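
If you do try it, it is worth confirming after the reboot that the parameter actually reached the running kernel. A minimal sketch in Python, assuming the usual /proc/cmdline interface; the helper names has_kernel_param and read_cmdline are made up for illustration:

```python
from typing import Optional


def has_kernel_param(cmdline: str, name: str, value: Optional[str] = None) -> bool:
    """Return True if `name` (or `name=value` when value is given) is on the command line."""
    for token in cmdline.split():
        key, _, val = token.partition("=")
        if key == name and (value is None or val == value):
            return True
    return False


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line of the running kernel (Linux only)."""
    with open(path) as f:
        return f.read().strip()
```

Calling has_kernel_param(read_cmdline(), "pcie_port_pm", "off") should return True once the parameter is active. On Fedora the parameter can usually be added with grubby --update-kernel=ALL --args="pcie_port_pm=off", though that depends on how the boot loader is set up on your machine.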

The symptoms are different, which is why I haven’t tried it.

Cross-referencing here: https://github.com/Bumblebee-Project/Bumblebee/issues/905

This workaround helps, but the underlying bug was supposed to have been fixed already.

[    5.427561] bbswitch: loading out-of-tree module taints kernel.
[    5.427581] bbswitch: module verification failed: signature and/or required key missing - tainting kernel
[    5.427769] bbswitch: version 0.8
[    5.427773] bbswitch: Found integrated VGA device 0000:00:02.0: \_SB_.PCI0.GFX0
[    5.427778] bbswitch: Found discrete VGA device 0000:01:00.0: \_SB_.PCI0.PEG0.PEGP
[    5.427787] ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20170303/nsarguments-95)
[    5.427896] bbswitch: detected an Optimus _DSM function
[    5.427904] pci 0000:01:00.0: enabling device (0000 -> 0003)
[    5.427981] bbswitch: Succesfully loaded. Discrete card 0000:01:00.0 is on

[   41.770230] VFIO - User Level meta-driver version: 0.3
[   41.783195] nvidia: module license 'NVIDIA' taints kernel.
[   41.783197] Disabling lock debugging due to kernel taint
[   41.790717] nvidia-nvlink: Nvlink Core is being initialized, major device number 236
[   41.790942] nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[   41.791022] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  384.59  Wed Jul 19 23:53:34 PDT 2017 (using threaded interrupts)
[   41.801975] nvidia-uvm: Loaded the UVM driver in 8 mode, major device number 235
[   41.804071] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  384.59  Wed Jul 19 23:46:42 PDT 2017
[   41.807648] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[   41.807650] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 1
[   41.810505] [drm] [nvidia-drm] [GPU ID 0x00000100] Unloading driver
[   41.846363] nvidia-modeset: Unloading
[   41.863872] nvidia-uvm: Unloaded the UVM driver in 8 mode
[   41.881347] nvidia-nvlink: Unregistered the Nvlink Core, major device number 236
[   48.526932] nvidia-nvlink: Nvlink Core is being initialized, major device number 236
[   48.527193] nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=none,decodes=none:owns=none
[   48.527303] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  384.59  Wed Jul 19 23:53:34 PDT 2017 (using threaded interrupts)
[   48.528679] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  384.59  Wed Jul 19 23:46:42 PDT 2017
[   48.531404] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[   48.531406] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 1
[   48.667997] [drm] [nvidia-drm] [GPU ID 0x00000100] Unloading driver
[   48.707118] nvidia-modeset: Unloading
[   48.731003] nvidia-nvlink: Unregistered the Nvlink Core, major device number 236
[   48.754811] bbswitch: disabling discrete graphics
[   48.754821] ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20170303/nsarguments-95)
[   48.766844] pci_raw_set_power_state: 2 callbacks suppressed
[   48.766846] pci 0000:01:00.0: Refused to change power state, currently in D0
[  110.068087] bbswitch: enabling discrete graphics
[  110.420641] nvidia-nvlink: Nvlink Core is being initialized, major device number 236
[  110.420847] nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=none,decodes=none:owns=none
[  110.420926] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  384.59  Wed Jul 19 23:53:34 PDT 2017 (using threaded interrupts)
[  110.444476] ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20170303/nsarguments-95)
[  110.444537] ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20170303/nsarguments-95)
[  110.444577] ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20170303/nsarguments-95)
[  110.444702] ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20170303/nsarguments-95)
[  110.444738] ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20170303/nsarguments-95)
[  110.444784] ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20170303/nsarguments-95)
[  110.444821] ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20170303/nsarguments-95)
[  111.329735] ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20170303/nsarguments-95)
[  111.329978] ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20170303/nsarguments-95)
[  111.345069] ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20170303/nsarguments-95)
[  111.745662] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  384.59  Wed Jul 19 23:46:42 PDT 2017
[  131.914080] ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20170303/nsarguments-95)
[  131.920250] nvidia-modeset: Unloading
[  131.938087] nvidia-nvlink: Unregistered the Nvlink Core, major device number 236
[  131.958784] bbswitch: disabling discrete graphics
[  131.958792] ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20170303/nsarguments-95)
[  131.970801] pci 0000:01:00.0: Refused to change power state, currently in D0
[  131.970801] pci 0000:01:00.0: Refused to change power state, currently in D0

Doesn’t look good. Instead of failing to power on the GPU, bbswitch now fails to turn it off, which results in higher power consumption. Judging by the link you provided, Red Hat seems to have messed with PCIe power management in that kernel version. Did anyone report a bug to them?

I presume this is the result of pcie_port_pm=off.
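
For anyone comparing boots with and without the parameter, the two failure modes in this thread leave different messages in the kernel log, so they are easy to tell apart. A rough sketch in Python that counts both; the message fragments are copied from the dmesg excerpts above, while the function name classify_dmesg and the signature labels are made up for illustration:

```python
import re

# Message fragments taken from the dmesg output posted in this thread.
SIGNATURES = {
    # The nvidia module cannot talk to the GPU at probe time.
    "gpu_fallen_off_bus": re.compile(r"fallen off the bus"),
    # The PCI core could not move the device out of D0 when powering it off.
    "power_off_refused": re.compile(r"Refused to change power state, currently in D0"),
}


def classify_dmesg(text: str) -> dict:
    """Count how many times each known failure signature appears in dmesg text."""
    return {name: len(pattern.findall(text)) for name, pattern in SIGNATURES.items()}
```

Feed it the output of dmesg saved to a file or captured with subprocess. A non-zero gpu_fallen_off_bus count matches the original report, and a non-zero power_off_refused count matches the behaviour seen with the workaround. It only counts messages and says nothing about the cause.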