Kernel oops when hibernating (GTX 960)

Hi,

I’m unable to get hibernation working: a kernel “NULL pointer dereference” occurs during the hibernation process. Before that, I also see several warnings and errors, which appear very soon after hibernation begins:

[  320.287909] nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.
[  320.290118] NVRM: GPU at PCI:0000:00:09: GPU-53f856e7-e322-8303-ed16-bdedaf06badf
[  320.290809] NVRM: Xid (PCI:0000:00:09): 56, CMDre 00000000 00000088 00010009 00000007 00000000
[  323.344645] nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.
[  325.346346] nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000957d:0:0
[  333.361349] nvidia-modeset: ERROR: GPU:0: Display engine push buffer channel allocation failed: 0x65 (Call timed out [NV_ERR_TIMEOUT])
[  333.362995] nvidia-modeset: ERROR: GPU:0: Failed to allocate display engine core DMA push buffer

For a few seconds after that the kernel continues to hibernate before hitting the following:

[  334.590697] BUG: unable to handle kernel NULL pointer dereference at 00000000000009f0
[  334.590707] IP: _nv000407kms+0x2bb/0x540 [nvidia_modeset]
[  334.590707] PGD 0 
[  334.590708] P4D 0 
[  334.590708] 
[  334.590709] Oops: 0002 [#1] SMP PTI
[  334.590710] Modules linked in: uas usb_storage hid_logitech_hidpp hid_logitech ff_memless hid_logitech_dj hid_multitouch rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace crct10dif_pclmul joydev nls_iso8859_1 crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd hid_generic snd_hda_codec_hdmi usbhid hid snd_hda_intel snd_hda_codec snd_usb_audio snd_hda_core snd_usbmidi_lib snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq intel_rapl_perf snd_seq_device snd_timer input_leds snd serio_raw nvidia_uvm(POE) virtio_rng soundcore shpchp i2c_piix4 qemu_fw_cfg mac_hid parport_pc ppdev lp sunrpc parport ip_tables x_tables autofs4 9pnet_virtio 9p 9pnet fscache nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) drm_kms_helper syscopyarea sysfillrect sysimgblt
[  334.590735]  fb_sys_fops drm psmouse virtio_blk virtio_net pata_acpi floppy
[  334.590739] CPU: 0 PID: 1726 Comm: kworker/u12:74 Tainted: P           OE   4.13.0-36-generic #40-Ubuntu
[  334.590740] calling  0000:00:08.0+ @ 1769, parent: pci0000:00
[  334.590741] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
[  334.590744] Workqueue: events_unbound async_run_entry_fn
[  334.590745] task: ffff9c2de6bdc740 task.stack: ffffb20483a5c000
[  334.590752] RIP: 0010:_nv000407kms+0x2bb/0x540 [nvidia_modeset]
[  334.590753] RSP: 0018:ffffb20483a5faf0 EFLAGS: 00010296
[  334.590754] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff9c2de37d9b20
[  334.590755] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff9c2de9577808
[  334.590755] RBP: ffffffffc1373080 R08: 0000000000000001 R09: 0000000000000058
[  334.590756] R10: 0000000000000058 R11: 0000000000000058 R12: ffff9c2de9577808
[  334.590756] R13: ffff9c2de9577008 R14: 00000000ffffffff R15: 0000000000000000
[  334.590758] FS:  0000000000000000(0000) GS:ffff9c2dffc00000(0000) knlGS:0000000000000000
[  334.590758] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  334.590759] CR2: 00000000000009f0 CR3: 000000012280a003 CR4: 00000000003606f0
[  334.590762] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  334.590763] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  334.590763] Call Trace:
[  334.590772]  ? _nv000026kms+0x5fc/0x700 [nvidia_modeset]
[  334.590773]  ? put_dec+0x18/0xa0
[  334.590774]  ? number+0x30a/0x330
[  334.590781]  ? _nv001978kms+0x94/0x100 [nvidia_modeset]
[  334.590788]  ? nvKmsSuspend+0x39/0x60 [nvidia_modeset]
[  334.590794]  ? nvkms_suspend+0x1f/0x30 [nvidia_modeset]
[  334.590795] calling  0000:00:07.0+ @ 1683, parent: pci0000:00
[  334.590902]  ? nvidia_modeset_suspend+0x1d/0x30 [nvidia]
[  334.590904] calling  0000:00:06.0+ @ 1736, parent: pci0000:00
[  334.590987]  ? nvidia_suspend+0x37/0x80 [nvidia]
[  334.590989]  ? pci_legacy_suspend+0x3a/0xc0
[  334.590991]  ? pci_pm_poweroff+0xce/0x100
[  334.590992]  ? dpm_run_callback+0x5a/0x150
[  334.590994]  ? pci_pm_restore+0xb0/0xb0
[  334.590995]  ? __device_suspend+0x11f/0x3a0
[  334.590996]  ? async_suspend+0x1f/0xa0
[  334.590997]  ? async_run_entry_fn+0x3c/0x150
[  334.590999]  ? process_one_work+0x1ec/0x410
[  334.591000]  ? worker_thread+0x32/0x410
[  334.591001]  ? kthread+0x128/0x140
[  334.591003]  ? process_one_work+0x410/0x410
[  334.591004]  ? kthread_create_on_node+0x70/0x70
[  334.591006]  ? ret_from_fork+0x35/0x40
[  334.591006] Code: 00 00 4c 89 fe 89 c3 e8 24 7a ff ff 48 8b 7c 24 18 09 c3 ba 01 00 00 00 4c 89 fe e8 10 7a ff ff 08 c3 0f 85 0c 02 00 00 41 ff ce <41> c7 87 f0 09 00 00 00 00 00 00 48 83 6c 24 20 08 48 83 6c 24 
[  334.591030] RIP: _nv000407kms+0x2bb/0x540 [nvidia_modeset] RSP: ffffb20483a5faf0
[  334.591030] CR2: 00000000000009f0
[  334.591032] ---[ end trace afc6e93af5cc9451 ]---

Cheers,
Owen
nvidia-bug-report.log.gz (234 KB)
kernel-out-ubuntu.txt (19 KB)

Your dmesg output in the logs is a bit odd: I can’t see any console framebuffer driver being used. It looks like an EFI boot, but efifb does not seem to be in use.
HussamT on this forum found that he could avoid the Xid 56 in a similar case by setting the console framebuffer resolution to the display’s native resolution. Since he boots via MBR, he used GRUB to set it.
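
As a quick way to confirm whether any console framebuffer driver is registered, one could read /proc/fb, which lists one framebuffer device per line as an index followed by the driver's name (efifb typically shows up as "EFI VGA"). The following is only a minimal sketch: the helper names parse_proc_fb and active_framebuffers are made up for illustration, and an empty result only means that no framebuffer device is currently registered.

```python
from pathlib import Path


def parse_proc_fb(text):
    """Parse /proc/fb content into a list of (index, driver_name) pairs."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line looks like "<index> <name>", e.g. "0 EFI VGA".
        index, _, name = line.partition(" ")
        if index.isdigit():
            entries.append((int(index), name.strip()))
    return entries


def active_framebuffers(path="/proc/fb"):
    """Return the registered framebuffers, or [] if the file is missing."""
    p = Path(path)
    if not p.exists():
        return []
    return parse_proc_fb(p.read_text())


if __name__ == "__main__":
    fbs = active_framebuffers()
    if fbs:
        for index, name in fbs:
            print(f"fb{index}: {name}")
    else:
        print("no framebuffer device registered")
```

Running this before hibernating would show whether efifb (or another console framebuffer driver) is present on the affected machine.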