KMS not working with GT940M on Fedora 26

I am running Fedora 26 with the NVIDIA driver (version 384.59) from negativo17’s repository on a ThinkPad T560 equipped with Intel HD Graphics 520 + GeForce GT940M. To make PRIME synchronization work, I tried adding nvidia-drm.modeset=1 to the kernel command line. The system boots successfully and I get to the login screen; however, X crashes after I log in.
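For anyone reproducing this, here is a minimal sketch for confirming that the parameter actually reached the kernel. It only inspects a command-line string such as the contents of /proc/cmdline; the function name is made up for illustration, and the set of accepted "true" values is an assumption based on how boolean module parameters are usually written.

```python
from pathlib import Path


def modeset_requested(cmdline: str) -> bool:
    """Return True if the command line enables nvidia-drm.modeset.

    The kernel treats '-' and '_' in parameter names as equivalent,
    so both spellings are accepted. If the parameter is given more
    than once, the last occurrence is used.
    """
    enabled = False
    for token in cmdline.split():
        key, _, value = token.partition("=")
        if key.replace("-", "_") == "nvidia_drm.modeset":
            # Assumption: "1", "y" and "Y" are treated as "enabled".
            enabled = value in ("1", "y", "Y")
    return enabled


if __name__ == "__main__":
    proc_cmdline = Path("/proc/cmdline")
    if proc_cmdline.exists():
        print(modeset_requested(proc_cmdline.read_text()))
```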

dmesg shows the following errors:

[   98.670736] divide error: 0000 [#4] SMP
[   98.670789] Modules linked in: ccm xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_$
[   98.671536]  kvm_intel mei_wdt snd_hda_ext_core snd_soc_sst_match kvm snd_soc_core arc4 uvcvideo irqbypass intel_cstate intel_uncore btusb btrtl btbcm intel_rapl_perf btintel bluetooth videobuf2_vmal$
[   98.672250]  i2c_algo_bit drm_kms_helper ghash_clmulni_intel drm ptp serio_raw pps_core rtsx_pci wmi video
[   98.672349] CPU: 0 PID: 2408 Comm: Xorg Tainted: P     UD    OE   4.12.8-300.fc26.x86_64 #1
[   98.672423] Hardware name: LENOVO 20FH002RGE/20FH002RGE, BIOS N1KET21W (1.08 ) 04/20/2016
[   98.672495] task: ffff914462332640 task.stack: ffff9f2dc814c000
[   98.672556] RIP: 0010:nvidia_drm_dumb_create+0x70/0x1e0 [nvidia_drm]
[   98.672614] RSP: 0018:ffff9f2dc814fd18 EFLAGS: 00010246
[   98.672665] RAX: 0000000000000003 RBX: ffff914476ac8800 RCX: 0000000000001000
[   98.672745] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff91445ece9a00
[   98.672808] RBP: ffff9f2dc814fd70 R08: 0000000000000027 R09: 0000000000000001
[   98.672872] R10: ffff914476ac8800 R11: ffff9f2dc814fdd0 R12: ffff91445ece9a00
[   98.672935] R13: ffff9144734e0c00 R14: ffffffffc087a000 R15: ffff9f2dc814fdd0
[   98.673000] FS:  00007efe14c812c0(0000) GS:ffff914481400000(0000) knlGS:0000000000000000
[   98.673077] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   98.673150] CR2: 00007efe0b72e1f0 CR3: 000000021db1b000 CR4: 00000000003406f0
[   98.673214] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   98.673277] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   98.673342] Call Trace:
[   98.673396]  drm_mode_create_dumb_ioctl+0x82/0x90 [drm]
[   98.673463]  drm_ioctl+0x213/0x4d0 [drm]
[   98.673520]  ? __drm_printfn_debug+0x30/0x30 [drm]
[   98.673600]  do_vfs_ioctl+0xa5/0x600
[   98.673638]  SyS_ioctl+0x79/0x90
[   98.673676]  entry_SYSCALL_64_fastpath+0x1a/0xa5
[   98.673720] RIP: 0033:0x7efe124fd5e7
[   98.673754] RSP: 002b:00007fff22c1bf38 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[   98.673823] RAX: ffffffffffffffda RBX: 00007efe0c18f030 RCX: 00007efe124fd5e7
[   98.673886] RDX: 00007fff22c1bf70 RSI: 00000000c02064b2 RDI: 000000000000000c
[   98.673950] RBP: 00007fff22c1beb0 R08: 0000000000e364b0 R09: 0000000000000001
[   98.674038] R10: 000000000000002f R11: 0000000000000246 R12: 0000000000e48890
[   98.674115] R13: 0000000000e487a0 R14: 00007fff22c1bed8 R15: 000000000083f698
[   98.674179] Code: 8b 75 48 48 c7 45 c0 00 00 00 00 48 c7 45 c8 00 00 00 00 48 c7 45 b0 00 00 00 00 83 c0 07 c1 e8 03 0f af 42 04 31 d2 8d 44 06 ff <f7> f6 0f af c6 41 89 47 14 41 0f af 07 48 8d b0 ff$
[   98.674423] RIP: nvidia_drm_dumb_create+0x70/0x1e0 [nvidia_drm] RSP: ffff9f2dc814fd18
[   98.708843] ---[ end trace c07d674181ee5c52 ]---
[  120.629963] [drm:nvidia_drm_gem_import_nvkms_memory [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000600] Failed to import NVKMS memory to GEM object

I’ve uploaded nvidia-bug-report.log here: https://drive.google.com/file/d/0B5nM6Ug4YVnKNWpRaUtncFZwTTA/view?usp=sharing

Try installing nvidia-modprobe or creating a udev rule (the udev method has an SELinux context issue) to create the device nodes that are missing when modeset is enabled.
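A rough way to see which device nodes are absent is sketched below. The node names are the ones the NVIDIA driver conventionally uses and are an assumption here; the thread does not say which node was actually missing. The helper name is hypothetical.

```python
import os

# Assumed node names; not taken from this thread.
ASSUMED_NODES = ("nvidiactl", "nvidia0", "nvidia-modeset")


def missing_nodes(dev_dir="/dev", names=ASSUMED_NODES):
    """Return the names from `names` that do not exist under `dev_dir`."""
    return [n for n in names if not os.path.exists(os.path.join(dev_dir, n))]


if __name__ == "__main__":
    print(missing_nodes())
```

An empty list means all assumed nodes are present; anything else is a hint that nvidia-modprobe (or an equivalent udev rule) has not created them.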

see

Indeed, installing nvidia-modprobe fixed the problem and made KMS and PRIME Synchronization work. Somehow I missed the GitHub issue while googling. However, the NVIDIA driver now seems to ignore the PowerMizer settings I am using to throttle the GPU when running on battery.

I have set the following options in /etc/X11/xorg.conf.d/10-nvidia.conf:

Section "Device"
    Identifier "Device0"
    Option "RegistryDwords" "PowerMizerEnable=0x1; PerfLevelSrc=0x2222; PowerMizerLevel=0x3; PowerMizerDefault=0x3; PowerMizerDefaultAC=0x0"
EndSection

Section "OutputClass"
    Identifier "nvidia"
    MatchDriver "nvidia-drm"
    Driver "nvidia"
    Option "AllowEmptyInitialConfiguration"
    Option "PrimaryGPU" "yes"
    Option "SLI" "Auto"
    Option "BaseMosaic" "on"
    ModulePath "/usr/lib64/nvidia/xorg"
EndSection

Section "OutputClass"
    Identifier "intel"
    MatchDriver "i915"
    Driver "modesetting"
EndSection

The NVIDIA driver doesn’t seem to care and keeps switching to lower-numbered (higher-performance) P-states when on battery, although the above settings should force the GPU to P8. It works fine when nvidia-drm.modeset=1 is removed from the command line, though.
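To rule out a typo in the option string itself, the RegistryDwords value can be split into its key/value pairs. This is only a sketch for sanity-checking the text of the option; it does not talk to the driver, and the function name is invented for this example.

```python
def parse_registry_dwords(value: str) -> dict:
    """Split a RegistryDwords string into {name: integer value}."""
    settings = {}
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing semicolon
        name, sep, raw = entry.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in entry: {entry!r}")
        # Base 0 lets int() accept both 0x-prefixed and decimal values.
        settings[name.strip()] = int(raw.strip(), 0)
    return settings


if __name__ == "__main__":
    option = ("PowerMizerEnable=0x1; PerfLevelSrc=0x2222; PowerMizerLevel=0x3; "
              "PowerMizerDefault=0x3; PowerMizerDefaultAC=0x0")
    for name, number in parse_registry_dwords(option).items():
        print(f"{name} = {number:#x}")
```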

I am also having another weird issue where the NVIDIA driver sometimes fails to initialize correctly during boot.

From dmesg:

Aug 26 22:58:14 pp3345-Laptop kernel: irq 16: nobody cared (try booting with the "irqpoll" option)
Aug 26 22:58:14 pp3345-Laptop kernel: CPU: 2 PID: 0 Comm: swapper/2 Tainted: P     U     OE   4.12.8-300.fc26.x86_64 #1
Aug 26 22:58:14 pp3345-Laptop kernel: Hardware name: LENOVO 20FH002RGE/20FH002RGE, BIOS N1KET21W (1.08 ) 04/20/2016
Aug 26 22:58:14 pp3345-Laptop kernel: Call Trace:
Aug 26 22:58:14 pp3345-Laptop kernel:  <IRQ>
Aug 26 22:58:14 pp3345-Laptop kernel:  dump_stack+0x63/0x90
Aug 26 22:58:14 pp3345-Laptop kernel:  __report_bad_irq+0x35/0xc0
Aug 26 22:58:14 pp3345-Laptop kernel:  note_interrupt+0x24b/0x290
Aug 26 22:58:14 pp3345-Laptop kernel:  handle_irq_event_percpu+0x54/0x80
Aug 26 22:58:14 pp3345-Laptop kernel:  handle_irq_event+0x2c/0x50
Aug 26 22:58:14 pp3345-Laptop kernel:  handle_fasteoi_irq+0x86/0x140
Aug 26 22:58:14 pp3345-Laptop kernel:  handle_irq+0xa9/0x110
Aug 26 22:58:14 pp3345-Laptop kernel:  do_IRQ+0x46/0xd0
Aug 26 22:58:14 pp3345-Laptop kernel:  common_interrupt+0x93/0x93
Aug 26 22:58:14 pp3345-Laptop kernel: RIP: 0010:cpuidle_enter_state+0x12b/0x2d0
Aug 26 22:58:14 pp3345-Laptop kernel: RSP: 0018:ffffac21c0cebe58 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffae
Aug 26 22:58:14 pp3345-Laptop kernel: RAX: ffff9d5f8151a280 RBX: 000000034bbf419e RCX: 000000000000001f
Aug 26 22:58:14 pp3345-Laptop kernel: RDX: 000000034bbf419e RSI: ffff9d5f81517a98 RDI: 0000000000000000
Aug 26 22:58:14 pp3345-Laptop kernel: RBP: ffffac21c0cebe98 R08: cccccccccccccccd R09: 0000000000000008
Aug 26 22:58:14 pp3345-Laptop kernel: R10: ffffac21c0cebe28 R11: 0000000000000002 R12: ffff9d5f81522920
Aug 26 22:58:14 pp3345-Laptop kernel: R13: 0000000000000000 R14: 0000000000000001 R15: ffffffffa7f81938
Aug 26 22:58:14 pp3345-Laptop kernel:  </IRQ>
Aug 26 22:58:14 pp3345-Laptop kernel:  ? cpuidle_enter_state+0x11b/0x2d0
Aug 26 22:58:14 pp3345-Laptop kernel:  cpuidle_enter+0x17/0x20
Aug 26 22:58:14 pp3345-Laptop kernel:  call_cpuidle+0x23/0x40
Aug 26 22:58:14 pp3345-Laptop kernel:  do_idle+0x18a/0x1e0
Aug 26 22:58:14 pp3345-Laptop kernel:  cpu_startup_entry+0x71/0x80
Aug 26 22:58:14 pp3345-Laptop kernel:  start_secondary+0x154/0x190
Aug 26 22:58:14 pp3345-Laptop kernel:  secondary_startup_64+0x9f/0x9f
Aug 26 22:58:14 pp3345-Laptop kernel: handlers:
Aug 26 22:58:14 pp3345-Laptop kernel: [<ffffffffc05700b0>] i801_isr [i2c_i801]
Aug 26 22:58:14 pp3345-Laptop kernel: Disabling IRQ #16
Aug 26 22:58:14 pp3345-Laptop kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 239
Aug 26 22:58:14 pp3345-Laptop kernel: nvidia 0000:06:00.0: enabling device (0006 -> 0007)
Aug 26 22:58:14 pp3345-Laptop kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module  384.59  Wed Jul 19 23:53:34 PDT 2017 (using threaded interrupts)
Aug 26 22:58:14 pp3345-Laptop kernel: nvidia-uvm: Loaded the UVM driver in 8 mode, major device number 238
Aug 26 22:58:14 pp3345-Laptop kernel: nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  384.59  Wed Jul 19 23:46:42 PDT 2017
Aug 26 22:58:14 pp3345-Laptop kernel: [drm] [nvidia-drm] [GPU ID 0x00000600] Loading driver
Aug 26 22:58:14 pp3345-Laptop kernel: ACPI Warning: \_SB.PCI0.RP09.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20170303/nsarguments-95)
Aug 26 22:58:14 pp3345-Laptop kernel: ACPI Warning: \_SB.PCI0.RP09.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20170303/nsarguments-95)
Aug 26 22:58:14 pp3345-Laptop kernel: ACPI Warning: \_SB.PCI0.RP09.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20170303/nsarguments-95)
Aug 26 22:58:14 pp3345-Laptop kernel: ACPI Warning: \_SB.PCI0.RP09.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20170303/nsarguments-95)
Aug 26 22:58:14 pp3345-Laptop kernel: ACPI Warning: \_SB.PCI0.RP09.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20170303/nsarguments-95)
Aug 26 22:58:14 pp3345-Laptop kernel: ACPI Warning: \_SB.PCI0.RP09.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20170303/nsarguments-95)
Aug 26 22:58:14 pp3345-Laptop kernel: ACPI Warning: \_SB.PCI0.RP09.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20170303/nsarguments-95)
Aug 26 22:58:14 pp3345-Laptop kernel: ACPI Warning: \_SB.PCI0.RP09.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20170303/nsarguments-95)
Aug 26 22:58:19 pp3345-Laptop kernel: ACPI Warning: \_SB.PCI0.RP09.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20170303/nsarguments-95)
Aug 26 22:58:19 pp3345-Laptop kernel: NVRM: RmInitAdapter failed! (0x12:0x45:1825)
Aug 26 22:58:19 pp3345-Laptop kernel: NVRM: rm_init_adapter failed for device bearing minor number 0
Aug 26 22:58:19 pp3345-Laptop kernel: [drm:nvidia_drm_load [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000600] Failed to allocate NvKmsKapiDevice
Aug 26 22:58:19 pp3345-Laptop kernel: [drm:nvidia_drm_probe_devices [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000600] Failed to register device

It looks like a race condition to me: the unhandled IRQ seems to originate from the GPU but can’t be handled because the driver isn’t loaded yet. A reboot usually fixes the issue, but it happens on roughly every third or fourth boot, which is a bit annoying.

Another issue I am experiencing with KMS enabled is that gnome-shell starts eating up my CPU whenever the display is off, once the laptop has been in standby at least once.

  1. Boot system
  2. Lock session (=> screen off)
  3. Everything is fine, every time
  4. Unlock
  5. Put device to standby
  6. Wake up, unlock
  7. Lock session
  8. gnome-shell hogs CPU at 100% usage - from now on this happens every time I lock the screen as long as the display is off (CPU consumption is reduced to normal as soon as screen is turned on, even when still on lockscreen)

This is 100% reproducible for me, but only happens when using the NVIDIA driver and only with nvidia-drm.modeset=1 set.
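To put a number on the CPU usage during the steps above, the process's CPU time can be sampled from /proc/<pid>/stat. This is a minimal sketch, assuming a Linux /proc filesystem; the function names are made up for illustration, and the PID has to be looked up separately (for example with pidof gnome-shell).

```python
import os
import time


def cpu_ticks(stat_line: str) -> int:
    """Return utime + stime (in clock ticks) from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may contain spaces,
    # so split after the last ')'. The remaining fields start at field 3
    # (state), which puts utime (field 14) at index 11 and stime at 12.
    fields = stat_line.rsplit(")", 1)[1].split()
    return int(fields[11]) + int(fields[12])


def cpu_percent(ticks_before, ticks_after, seconds, ticks_per_second):
    """CPU usage in percent of one core over the sampling interval."""
    return 100.0 * (ticks_after - ticks_before) / (ticks_per_second * seconds)


def sample(pid: int, interval: float = 1.0) -> float:
    """Measure the CPU usage of `pid` over `interval` seconds."""
    path = f"/proc/{pid}/stat"
    hz = os.sysconf("SC_CLK_TCK")
    with open(path) as f:
        before = cpu_ticks(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = cpu_ticks(f.read())
    return cpu_percent(before, after, interval, hz)
```

Running sample() over SSH while the screen is locked and off, once before and once after the first standby, would show whether the jump in step 8 is visible in the numbers.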

Anybody at NVIDIA who cares to look into these issues? It’s been nearly six weeks now without any answer from NVIDIA.

nvidia-drm.modeset=1 enables NVIDIA’s egl-wayland support, and GDM/mutter tries to use it even though it doesn’t support egl-wayland.
The high CPU load is caused by GDM/mutter using the Mesa fallback driver.

Why blame NVIDIA for a GNOME issue? Have you tried disabling Wayland in GDM’s custom.conf?
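For reference, GDM disables Wayland when WaylandEnable=false is set in the [daemon] section of custom.conf. The sketch below checks the text of such a file for that setting; the function name is hypothetical and the parsing relies on Python's configparser, which is only an approximation of how GDM reads the file.

```python
import configparser


def wayland_disabled(conf_text: str) -> bool:
    """Return True if [daemon] WaylandEnable is set to false."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep key names case-sensitive
    parser.read_string(conf_text)
    value = parser.get("daemon", "WaylandEnable", fallback="true")
    return value.strip().lower() == "false"


if __name__ == "__main__":
    example = "[daemon]\nWaylandEnable=false\n"
    print(wayland_disabled(example))
```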

I am exclusively running Xorg (as the Xorg.conf snippet I posted above shows), hence disabling Wayland via /etc/gdm/custom.conf doesn’t make any difference. Even if it did, the other issues, the non-working PowerMizer and the race condition during boot, would still persist. Those obviously can’t be GNOME issues and must be caused by the NVIDIA driver.

Quick note: you don’t have acpid running, so the driver simply doesn’t know whether you’re on battery or not. That’s mentioned in the log.
The Performance State of nvidia-smi is different from the Performance Level of nvidia-settings:
nvidia-smi: lowest = P12 (I think) -> highest = P0
nvidia-settings: lowest = 0 -> highest = 3, 4 or 5, depending on the GPU.
Edit: the PowerMizer levels are: lowest = 0x3 -> highest = 0x1

I do know that acpid is required for PowerMizer to work correctly; however, I already fixed that some time ago (after the log file from my first post was created). PowerMizer works perfectly fine with the settings from my Xorg.conf when KMS is disabled.

Next note: the IRQ failure is connected to i801_smbus, not NVIDIA. The NVIDIA driver is just failing as a consequence.
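The handler registered for the disabled IRQ can be read off the "handlers:" section of the log posted earlier. The sketch below pulls it out of journal-style lines; the function name and regular expressions are written for this example and assume the line format shown in this thread.

```python
import re

NOBODY_CARED = re.compile(r"irq (\d+): nobody cared")
# Matches e.g. "[<ffffffffc05700b0>] i801_isr [i2c_i801]"
HANDLER = re.compile(r"\[<[^>]+>\]\s+(\S+)(?:\s+\[([^\]]+)\])?")


def irq_handlers(lines):
    """Map each 'nobody cared' IRQ number to its (handler, module) pairs."""
    result = {}
    irq = None
    in_handlers = False
    for line in lines:
        match = NOBODY_CARED.search(line)
        if match:
            irq = int(match.group(1))
            result.setdefault(irq, [])
            in_handlers = False
            continue
        if irq is None:
            continue
        if line.rstrip().endswith("handlers:"):
            in_handlers = True
            continue
        if in_handlers:
            handler = HANDLER.search(line)
            if handler:
                result[irq].append((handler.group(1), handler.group(2)))
            else:
                in_handlers = False  # handler list has ended
    return result
```

Fed the log from this thread, the only handler it reports for IRQ 16 is i801_isr from the i2c_i801 module, which matches the note above.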