367/370.xx + 980m w/4k screen = lock up at boot (Ubuntu 16.10)

To be thorough, I also tested the new 367.44 drivers, but they produce the same lockup.

I installed these drivers using the official Nvidia .run package, just to rule out any issues with the build process or the Ubuntu graphics-drivers PPA.

Aug 24 13:22:53 sager kernel: [   18.954515] nvidia: module license 'NVIDIA' taints kernel.
Aug 24 13:22:53 sager kernel: [   18.957924] nvidia: module verification failed: signature and/or required key missing - tainting kernel
Aug 24 13:22:53 sager kernel: [   18.963393] nvidia-nvlink: Nvlink Core is being initialized, major device number 246
Aug 24 13:22:53 sager kernel: [   18.963406] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  367.44  Wed Aug 17 22:24:07 PDT 2016
Aug 24 13:22:53 sager kernel: [   18.972016] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  367.44  Wed Aug 17 21:54:40 PDT 2016
Aug 24 13:22:53 sager kernel: [   18.973164] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
Aug 24 13:22:53 sager kernel: [   19.423917] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input19
Aug 24 13:22:53 sager kernel: [   19.423977] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input20
Aug 24 13:22:53 sager kernel: [   19.424040] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input21
Aug 24 13:22:53 sager kernel: [   19.424088] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input22

Aug 24 13:22:54 sager kernel: [   34.110289] nvidia-modeset: Allocated GPU:0 (GPU-e2f980da-ea7e-4335-6ae3-41ae731aed6d) @ PCI:0000:01:00.0
Aug 24 13:22:54 sager kernel: [   34.158758] NVRM: GPU at PCI:0000:01:00: GPU-e2f980da-ea7e-4335-6ae3-41ae731aed6d
Aug 24 13:22:54 sager kernel: [   34.158761] NVRM: Xid (PCI:0000:01:00): 61, 1362(3368) 00000000 00000000

Aug 24 13:23:01 sager kernel: [   41.182295] nvidia-modeset: WARNING: GPU:0: Lost display notification; continuing.

Aug 24 13:23:04 sager kernel: [   43.933741] nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000957d:0:0:0x00000040

...

Aug 24 13:23:28 sager kernel: [   68.112255] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [Xorg:5117]
Aug 24 13:23:28 sager kernel: [   68.112258] Modules linked in: pci_stub vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) cmac bnep vboxdrv(OE) binfmt_misc nls_iso8859_1 snd_hda_codec_hdmi arc4 snd_usb_audio snd_usbmidi_lib btusb btrtl snd_hda_codec_realtek snd_hda_codec_generic nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) i915_bpo iwlmvm mac80211 intel_ips i2c_algo_bit intel_rapl snd_hda_intel x86_pkg_temp_thermal snd_hda_codec intel_powerclamp kvm_intel snd_hda_core kvm snd_hwdep irqbypass snd_pcm iwlwifi joydev snd_seq_midi snd_seq_midi_event cfg80211 snd_rawmidi input_leds serio_raw snd_seq rtsx_pci_ms snd_seq_device memstick snd_timer drm_kms_helper snd drm soundcore fb_sys_fops syscopyarea mei_me sysfillrect sysimgblt mei hci_uart btbcm btqca btintel bluetooth shpchp intel_lpss_acpi intel_lpss mac_hid acpi_pad coretemp parport_pc ppdev lp parport autofs4 btrfs xor raid6_pq drbg ansi_cprng algif_skcipher af_alg dm_crypt hid_generic usbhid mmc_block rtsx_pci_sdmmc mxm_wmi crct10dif_pclmul crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd psmouse r8169 ahci rtsx_pci mii libahci wmi i2c_hid pinctrl_sunrisepoint hid pinctrl_intel video fjes
Aug 24 13:23:28 sager kernel: [   68.112355] CPU: 3 PID: 5117 Comm: Xorg Tainted: P           OE   4.4.0-9134-generic #53-Ubuntu
Aug 24 13:23:28 sager kernel: [   68.112356] Hardware name: Notebook                         P65_P67RGRERA/P65_P67RGRERA, BIOS 1.05.13RLS1 02/02/2016
Aug 24 13:23:28 sager kernel: [   68.112357] task: ffff88084a682c40 ti: ffff88084bfc0000 task.ti: ffff88084bfc0000
Aug 24 13:23:28 sager kernel: [   68.112358] RIP: 0010:[<ffffffffc15b39c0>]  [<ffffffffc15b39c0>] _nv001865kms+0xd0/0x120 [nvidia_modeset]
Aug 24 13:23:28 sager kernel: [   68.112367] RSP: 0018:ffff88084bfc3718  EFLAGS: 00000283
Aug 24 13:23:28 sager kernel: [   68.112368] RAX: 00000000000000dc RBX: ffff880828156008 RCX: 0000000000000000
Aug 24 13:23:28 sager kernel: [   68.112368] RDX: 0000000000000000 RSI: 000000000062c411 RDI: ffffffff81e27b80
Aug 24 13:23:28 sager kernel: [   68.112369] RBP: ffff880087549000 R08: 0000000000000001 R09: 0000000000000004
Aug 24 13:23:28 sager kernel: [   68.112370] R10: 0000000000000020 R11: 0000000000000000 R12: 00053ad55f6f1a7d
Aug 24 13:23:28 sager kernel: [   68.112370] R13: 0000000000000001 R14: ffff880849d4af08 R15: 0000000000000001
Aug 24 13:23:28 sager kernel: [   68.112371] FS:  00007fecd71efa40(0000) GS:ffff8808764c0000(0000) knlGS:0000000000000000
Aug 24 13:23:28 sager kernel: [   68.112372] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 24 13:23:28 sager kernel: [   68.112373] CR2: 000055be9b737474 CR3: 000000082143e000 CR4: 00000000003406e0
Aug 24 13:23:28 sager kernel: [   68.112374] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug 24 13:23:28 sager kernel: [   68.112374] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Aug 24 13:23:28 sager kernel: [   68.112375] Stack:
Aug 24 13:23:28 sager kernel: [   68.112375]  ffff880828156008 0000000000000000 ffff880828155808 00000000c15a6d35
Aug 24 13:23:28 sager kernel: [   68.112377]  0000000000000000 0000000000000001 ffff880828155808 ffff880828156008
Aug 24 13:23:28 sager kernel: [   68.112378]  ffff880828155808 0000000000000001 00000000ffffffff ffffffffc15aad93
Aug 24 13:23:28 sager kernel: [   68.112379] Call Trace:
Aug 24 13:23:28 sager kernel: [   68.112385]  [<ffffffffc15aad93>] ? _nv001762kms+0xa3/0x110 [nvidia_modeset]
Aug 24 13:23:28 sager kernel: [   68.112394]  [<ffffffffc15c5e08>] ? _nv001994kms+0x1ca8/0x2300 [nvidia_modeset]
Aug 24 13:23:28 sager kernel: [   68.112399]  [<ffffffffc15923e0>] ? nvkms_alloc+0x50/0x60 [nvidia_modeset]
Aug 24 13:23:28 sager kernel: [   68.112403]  [<ffffffffc1593530>] ? _nv000325kms+0x30/0x30 [nvidia_modeset]
Aug 24 13:23:28 sager kernel: [   68.112407]  [<ffffffffc159355e>] ? _nv000334kms+0x2e/0x40 [nvidia_modeset]
Aug 24 13:23:28 sager kernel: [   68.112411]  [<ffffffffc15940f1>] ? nvKmsIoctl+0x161/0x1e0 [nvidia_modeset]
Aug 24 13:23:28 sager kernel: [   68.112415]  [<ffffffffc1592d95>] ? nvkms_ioctl_common+0x45/0x80 [nvidia_modeset]
Aug 24 13:23:28 sager kernel: [   68.112420]  [<ffffffffc1592e41>] ? nvkms_ioctl+0x71/0xa0 [nvidia_modeset]
Aug 24 13:23:28 sager kernel: [   68.112468]  [<ffffffffc09cf080>] ? nvidia_frontend_compat_ioctl+0x40/0x50 [nvidia]
Aug 24 13:23:28 sager kernel: [   68.112506]  [<ffffffffc09cf09e>] ? nvidia_frontend_unlocked_ioctl+0xe/0x10 [nvidia]
Aug 24 13:23:28 sager kernel: [   68.112508]  [<ffffffff81220c3f>] ? do_vfs_ioctl+0x29f/0x490
Aug 24 13:23:28 sager kernel: [   68.112510]  [<ffffffff8120f811>] ? __sb_end_write+0x21/0x30
Aug 24 13:23:28 sager kernel: [   68.112512]  [<ffffffff8120d41d>] ? vfs_write+0x15d/0x1a0
Aug 24 13:23:28 sager kernel: [   68.112513]  [<ffffffff81220ea9>] ? SyS_ioctl+0x79/0x90
Aug 24 13:23:28 sager kernel: [   68.112515]  [<ffffffff8182df32>] ? entry_SYSCALL_64_fastpath+0x16/0x71
Aug 24 13:23:28 sager kernel: [   68.112516] Code: 44 21 f8 41 39 c5 74 45 e8 ce eb fd ff 44 8b 9b 88 04 00 00 45 85 db 75 e4 49 39 c4 73 df 48 8b 4c 24 08 49 8b 44 ce 60 8b 40 04 <41> 39 86 b4 00 00 00 75 c9 48 8b 7c 24 10 48 c7 c2 f8 a8 60 c1

At the point where LightDM should start, the display briefly flashes from a screen with an underline cursor in the top left to a completely black, zero-brightness screen and then back to the underline cursor. After that, the system is hung hard until I do a REISUB reboot.

Removing 367.44 and reinstalling 364.19 always works fine. Anything newer than 364 causes these hangs.

I have the same (or very similar) issue with: Linux Mint 18 XFCE, Asus GTX 970, Nvidia 370.23, 4.7.0-2.1-liquorix-amd64

[152424.129185] nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000927c:0:0:0x00000020
[152426.129854] nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000927c:0:0:0x00000020
[152428.130240] nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000927c:0:0:0x00000020
[152430.140585] nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000927c:0:0:0x00000020
[152432.143924] nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000927c:0:0:0x00000020
[152434.184130] nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000927c:0:0:0x00000020
[152436.184472] nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000927c:0:0:0x00000020
[152440.573673] chrome[6845]: segfault at 968 ip 00007effb514a643 sp 00007ffc4921a3f0 error 4 in libX11.so.6.3.0[7effb5122000+135000]
[152440.938467] nvidia-modeset: Freed GPU:0 (GPU-ba7b0eba-e425-b284-0164-7cf80e5e6df9) @ PCI:0000:01:00.0
[152442.472835] nvidia-modeset: Allocated GPU:0 (GPU-ba7b0eba-e425-b284-0164-7cf80e5e6df9) @ PCI:0000:01:00.0

The screen went black, and after a few seconds the X server restarted.
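
In case it helps anyone comparing logs, the spacing of those repeated errors can be read off the dmesg timestamps. This is only a minimal Python sketch: it assumes dmesg-style `[seconds.microseconds]` prefixes, and the name `error_intervals` is illustrative, not part of any tool.

```python
import re

# Matches a dmesg-style timestamp such as "[152424.129185]" or "[   43.933741]".
TIMESTAMP_RE = re.compile(r"\[\s*(\d+\.\d+)\]")


def error_intervals(lines, needle="Idling display engine timed out"):
    """Return the gaps in seconds between consecutive lines containing needle."""
    stamps = []
    for line in lines:
        if needle not in line:
            continue
        match = TIMESTAMP_RE.search(line)
        if match:
            stamps.append(float(match.group(1)))
    # Pair each timestamp with the next one and take the difference.
    return [round(later - earlier, 3) for earlier, later in zip(stamps, stamps[1:])]


# Lines copied from the dmesg excerpt quoted above.
SAMPLE = [
    "[152424.129185] nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000927c:0:0:0x00000020",
    "[152426.129854] nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000927c:0:0:0x00000020",
    "[152428.130240] nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000927c:0:0:0x00000020",
    "[152430.140585] nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000927c:0:0:0x00000020",
    "[152432.143924] nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000927c:0:0:0x00000020",
    "[152434.184130] nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000927c:0:0:0x00000020",
    "[152436.184472] nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000927c:0:0:0x00000020",
    "[152440.938467] nvidia-modeset: Freed GPU:0 (GPU-ba7b0eba-e425-b284-0164-7cf80e5e6df9) @ PCI:0000:01:00.0",
]

print(error_intervals(SAMPLE))
```

On the excerpt above, every gap comes out at roughly two seconds.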

Tested with the new 370.28 drivers. They are still failing, but there appears to be a slight glimmer of progress:

Judging from the log, the system was not locked hard with these drivers, though the screen still showed black with the cursor in the top left. Like przybyl, I’m also still seeing the time-out messages in the log.

Sep  9 12:45:55 sager kernel: [   47.170789] nvidia-modeset: WARNING: GPU:0: Lost display notification; continuing.
Sep  9 12:45:58 sager kernel: [   49.925481] nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000957d:0:0:0x00000040
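
For picking these events out of a long kernel log, something like the following might be useful. It is a rough Python sketch: the regular expressions are inferred only from the log lines quoted in this thread (not from any NVIDIA documentation), and `scan_kernel_log` is a made-up name.

```python
import re

# Patterns inferred from the log lines quoted in this thread (an assumption,
# not an official description of the driver's message formats).
XID_RE = re.compile(r"NVRM: Xid \((PCI:[0-9a-fA-F:.]+)\): (\d+),")
MODESET_RE = re.compile(r"nvidia-modeset: (ERROR|WARNING): GPU:(\d+): (.+)$")
LOCKUP_RE = re.compile(r"soft lockup - CPU#(\d+) stuck for (\d+)s! \[([^:\]]+):(\d+)\]")


def scan_kernel_log(lines):
    """Collect Xid reports, nvidia-modeset warnings/errors, and soft lockups."""
    found = {"xid": [], "modeset": [], "soft_lockup": []}
    for raw in lines:
        line = raw.rstrip("\n")
        match = XID_RE.search(line)
        if match:
            # (PCI address, Xid number)
            found["xid"].append((match.group(1), int(match.group(2))))
            continue
        match = MODESET_RE.search(line)
        if match:
            # (severity, GPU index, message text)
            found["modeset"].append(
                (match.group(1), int(match.group(2)), match.group(3))
            )
            continue
        match = LOCKUP_RE.search(line)
        if match:
            # (CPU number, seconds stuck, process name, PID)
            found["soft_lockup"].append(
                (int(match.group(1)), int(match.group(2)),
                 match.group(3), int(match.group(4)))
            )
    return found


# Lines copied from the logs posted in this thread.
SAMPLE = [
    "Aug 24 13:22:54 sager kernel: [   34.158761] NVRM: Xid (PCI:0000:01:00): 61, 1362(3368) 00000000 00000000",
    "Aug 24 13:23:01 sager kernel: [   41.182295] nvidia-modeset: WARNING: GPU:0: Lost display notification; continuing.",
    "Aug 24 13:23:04 sager kernel: [   43.933741] nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000957d:0:0:0x00000040",
    "Aug 24 13:23:28 sager kernel: [   68.112255] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [Xorg:5117]",
    "Aug 24 13:22:54 sager kernel: [   34.110289] nvidia-modeset: Allocated GPU:0 (GPU-e2f980da-ea7e-4335-6ae3-41ae731aed6d) @ PCI:0000:01:00.0",
]

for kind, events in scan_kernel_log(SAMPLE).items():
    print(kind, events)
```

The function takes any iterable of lines, so an open log file or `sys.stdin` (for piping `dmesg` output into a script) can be passed in directly.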

Tomorrow when I’m home, I will install them again and see if I can remote into the machine to run the nvidia debug gathering.

Although the system does not appear to be hung, the 370.28 drivers are still doing something to it. Networking never starts, so I can’t remote in to run the nvidia debug gathering. I’m therefore back on 364 until the next release comes out to test.