34x/35x/36x: freeze at reboot/shutdown/TTY switch, rcu_sched self-detected stall

Hello,

I have sent this to linux-bugs@nvidia.com, but I am creating a thread here just in case someone else saw this too.

I am running Debian Jessie on a desktop system with an Nvidia 780.

With recent releases of the Nvidia driver (34x/35x series), there is a ~30 second freeze at shutdown/reboot, and journald records an rcu_sched self-detected stall with a backtrace. This is consistent and happens every time.

I am experiencing this problem with both kernel 3.16 and 4.0.

Has anyone else seen this happen?

Jul 05 13:03:41 luca-desktop nmbd[3576]: Stopping NetBIOS name server: nmbd.
Jul 05 13:04:05 luca-desktop gdm-Xorg-:0[962]: (II) NVIDIA: Freed GPU:0 (GPU-d39fba5b-c027-dfac-4b49-000f28fc8cf9) @
Jul 05 13:04:05 luca-desktop gdm-Xorg-:0[962]: (II) NVIDIA:     PCI:0000:01:00.0
Jul 05 13:04:05 luca-desktop kernel: INFO: rcu_sched self-detected stall on CPU { 0}  (t=5250 jiffies g=1123 c=1122 q=1099)
Jul 05 13:04:05 luca-desktop kernel: sending NMI to all CPUs:
Jul 05 13:04:05 luca-desktop kernel: NMI backtrace for cpu 0
Jul 05 13:04:05 luca-desktop kernel: CPU: 0 PID: 962 Comm: Xorg Tainted: P           O  3.16.0-4-amd64 #1 Debian 3.16.7-ckt11-1
Jul 05 13:04:05 luca-desktop kernel: Hardware name: Gigabyte Technology Co., Ltd. H87-HD3/H87-HD3, BIOS F9 07/16/2014
Jul 05 13:04:05 luca-desktop kernel: task: ffff8800c53ce960 ti: ffff8802223d4000 task.ti: ffff8802223d4000
Jul 05 13:04:05 luca-desktop kernel: RIP: 0010:[<ffffffff812bb910>]  [<ffffffff812bb910>] find_next_bit+0x40/0xd0
Jul 05 13:04:05 luca-desktop kernel: RSP: 0018:ffff88022e203df8  EFLAGS: 00000006
Jul 05 13:04:05 luca-desktop kernel: RAX: 0000000000000000 RBX: ffff88022e20ce40 RCX: 0000000000000001
Jul 05 13:04:05 luca-desktop kernel: RDX: ffff88022e20ce60 RSI: 0000000000000100 RDI: 0000000000000100
Jul 05 13:04:05 luca-desktop kernel: RBP: ffff88022e20ce80 R08: ffff88022e20ce48 R09: 0000000000000008
Jul 05 13:04:05 luca-desktop kernel: R10: 000000000000e988 R11: 0000000000020000 R12: 000000000000cec0
Jul 05 13:04:05 luca-desktop kernel: R13: 0000000000080000 R14: 0000000000000055 R15: 0000000000000000
Jul 05 13:04:05 luca-desktop kernel: FS:  00007f94d74c2980(0000) GS:ffff88022e200000(0000) knlGS:0000000000000000
Jul 05 13:04:05 luca-desktop kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 05 13:04:05 luca-desktop kernel: CR2: 00007f94cfd5c050 CR3: 00000002223f8000 CR4: 00000000001407f0
Jul 05 13:04:05 luca-desktop kernel: Stack:
Jul 05 13:04:05 luca-desktop kernel:  ffffffff8104a8af 0000000000000010 000000000000ce80 0000000200000000
Jul 05 13:04:05 luca-desktop kernel:  0000000000000086 ffff88022e20d660 ffffffff81853680 0000000000000000
Jul 05 13:04:05 luca-desktop kernel:  ffffffff818e2940 000000000000044b ffffffff81853680 ffffffff81046a03
Jul 05 13:04:05 luca-desktop kernel: Call Trace:
Jul 05 13:04:05 luca-desktop kernel:  <IRQ> 
Jul 05 13:04:05 luca-desktop kernel:  [<ffffffff8104a8af>] ? __x2apic_send_IPI_mask+0xbf/0x1a0
Jul 05 13:04:05 luca-desktop kernel:  [<ffffffff81046a03>] ? arch_trigger_all_cpu_backtrace+0xc3/0x140
Jul 05 13:04:05 luca-desktop kernel:  [<ffffffff810c53fa>] ? rcu_check_callbacks+0x3ea/0x630
Jul 05 13:04:05 luca-desktop kernel:  [<ffffffff810c6f45>] ? timekeeping_update.constprop.9+0x35/0x70
Jul 05 13:04:05 luca-desktop kernel:  [<ffffffff810cfe10>] ? tick_sched_handle.isra.16+0x60/0x60
Jul 05 13:04:05 luca-desktop kernel:  [<ffffffff81074ae0>] ? update_process_times+0x40/0x70
Jul 05 13:04:05 luca-desktop kernel:  [<ffffffff810cfdd0>] ? tick_sched_handle.isra.16+0x20/0x60
Jul 05 13:04:05 luca-desktop kernel:  [<ffffffff810cfe4c>] ? tick_sched_timer+0x3c/0x60
Jul 05 13:04:05 luca-desktop kernel:  [<ffffffff8108b097>] ? __run_hrtimer+0x67/0x1c0
Jul 05 13:04:05 luca-desktop kernel:  [<ffffffff8108b449>] ? hrtimer_interrupt+0xe9/0x220
Jul 05 13:04:05 luca-desktop kernel:  [<ffffffff8151444b>] ? smp_apic_timer_interrupt+0x3b/0x60
Jul 05 13:04:05 luca-desktop kernel:  [<ffffffff8151253d>] ? apic_timer_interrupt+0x6d/0x80
Jul 05 13:04:05 luca-desktop kernel:  <EOI> 
Jul 05 13:04:05 luca-desktop kernel:  [<ffffffffa0b9cd10>] ? rm_shutdown_gvi_device+0x1f0/0x2f0 [nvidia]
Jul 05 13:04:05 luca-desktop kernel:  [<ffffffffa0b82902>] ? _nv002260rm+0x52/0xf0 [nvidia]
Jul 05 13:04:05 luca-desktop kernel:  [<ffffffffa0b7b1fd>] ? _nv016140rm+0x4d4d/0xbd60 [nvidia]
Jul 05 13:04:05 luca-desktop kernel:  [<ffffffffa0b75755>] ? _nv000818rm+0x85/0xb0 [nvidia]
Jul 05 13:04:05 luca-desktop kernel:  [<ffffffffa0b9d204>] ? _nv012151rm+0x164/0x540 [nvidia]
Jul 05 13:04:05 luca-desktop kernel:  [<ffffffffa0b8adac>] ? _nv012520rm+0x7c/0x170 [nvidia]
Jul 05 13:04:05 luca-desktop kernel:  [<ffffffffa0b8f092>] ? _nv000717rm+0x2d2/0x360 [nvidia]
Jul 05 13:04:05 luca-desktop kernel:  [<ffffffffa0b8f39d>] ? _nv000641rm+0x27d/0x510 [nvidia]
Jul 05 13:04:05 luca-desktop kernel:  [<ffffffffa0b924b7>] ? _nv014181rm+0x87/0xc0 [nvidia]
Jul 05 13:04:05 luca-desktop kernel:  [<ffffffffa0b88276>] ? _nv000695rm+0x626/0x7d0 [nvidia]
Jul 05 13:04:05 luca-desktop kernel:  [<ffffffffa0b923a3>] ? rm_ioctl+0x73/0x100 [nvidia]
Jul 05 13:04:05 luca-desktop kernel:  [<ffffffffa06e4b3e>] ? nvidia_ioctl+0x13e/0x460 [nvidia]
Jul 05 13:04:05 luca-desktop kernel:  [<ffffffffa06e33ca>] ? nvidia_frontend_ioctl+0x2a/0x60 [nvidia]
Jul 05 13:04:05 luca-desktop kernel:  [<ffffffffa06e3419>] ? nvidia_frontend_unlocked_ioctl+0x19/0x20 [nvidia]
Jul 05 13:04:05 luca-desktop kernel:  [<ffffffff811ba50f>] ? do_vfs_ioctl+0x2cf/0x4b0
Jul 05 13:04:05 luca-desktop kernel:  [<ffffffff810851dc>] ? task_work_run+0x9c/0xd0
Jul 05 13:04:05 luca-desktop kernel:  [<ffffffff811ba771>] ? SyS_ioctl+0x81/0xa0
Jul 05 13:04:05 luca-desktop kernel:  [<ffffffff815135e8>] ? page_fault+0x28/0x30
Jul 05 13:04:05 luca-desktop kernel:  [<ffffffff815115cd>] ? system_call_fast_compare_end+0x10/0x15
Jul 05 13:04:05 luca-desktop kernel: Code: c7 48 89 d7 48 83 e7 c0 48 29 fe 83 e1 3f 75 6b 48 f7 c6 c0 ff ff ff 0f 84 93 00 00 00 49 8b 00 49 8d 50 08 48 85 c0 74 0f eb 3f <48> 83 c2 08 48 8b 42 f8 48 85 c0 75 32 48 83 ee 40 48 83 c7 40
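
For anyone who wants to check whether they are hitting the same stall, here is a minimal sketch that counts the stall reports in kernel log text. It assumes a persistent journal so that the previous boot is still available, and count_rcu_stalls is just a hypothetical helper name, not part of any tool.

```shell
# Count RCU stall reports in kernel log text read from stdin.
# count_rcu_stalls is a hypothetical helper name used for illustration.
count_rcu_stalls() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so "|| true" keeps the function usable under "set -e".
    grep -c 'rcu_sched self-detected stall' || true
}

# Usage (kernel messages of the previous boot; needs a persistent journal):
#   journalctl -k -b -1 --no-pager | count_rcu_stalls
```

A non-zero count for the previous boot would suggest the same symptom. To confirm it is the same code path, the backtrace should still be compared by hand.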

nvidia-bug-report.log.gz (217 KB)

An update, in case someone experiences the same problem and finds the logs through Google:

The crash no longer happens since I configured GRUB to use VBE. I made this change because TTYs were not working with the Nvidia driver (black screen when trying to switch to TTY 1/2/etc).
Now the freeze/crash when the X server shuts down/restarts no longer happens.

This is the configuration I added to my /etc/default/grub:

GRUB_GFXMODE=1920x1080x32
GRUB_GFXPAYLOAD_LINUX="keep"
GRUB_VIDEO_BACKEND="all_video"
GRUB_TERMINAL_OUTPUT="gfxterm"
GRUB_FONT_PATH="/boot/grub/fonts/unicode.pf2"

EDIT: forgot, also have this in the cmdline:

GRUB_CMDLINE_LINUX_DEFAULT="quiet vga=normal nomodeset"
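
In case it helps anyone trying this: on Debian, edits to /etc/default/grub only take effect after running update-grub as root and rebooting. The sketch below checks that the running kernel was actually booted with nomodeset. has_nomodeset is a hypothetical helper name, and the optional path argument is only there so it can be pointed at a file other than /proc/cmdline.

```shell
# Succeeds if the given kernel command line file contains "nomodeset".
# has_nomodeset is a hypothetical helper; the path defaults to /proc/cmdline.
has_nomodeset() {
    # -q: no output, -w: match "nomodeset" only as a whole word
    grep -qw 'nomodeset' "${1:-/proc/cmdline}"
}

# Usage, after running "update-grub" as root and rebooting:
#   has_nomodeset && echo "nomodeset is active"
```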

Another update: I noticed that disabling the Intel integrated card (Haswell, CPU i5-4670, GPU HD4600) from the BIOS also solves the problem.

I also noticed the exact same problem when switching TTYs: the ~minute delay and backtrace appear, but only the first time. In fact, this problem only happens once per boot, either when switching TTY or when rebooting.

Could it be that kernel modesetting causes the problem, given that either disabling it or disabling the Intel card (which uses it) solves the issue?

EDIT: forgot to mention, this still happens with the 361/364 drivers too, with kernel 4.3 and 4.4, libdrm2 2.4.67 and Intel DDX 2.99.917.