I experienced a system freeze on a GTX 970 with driver v370.28 this morning.
I had been running a Second Life viewer for some time when the whole system froze (display frozen, keyboard unresponsive, etc.) for 10 seconds or so. Things then returned almost to normal, but with the card down-clocked to 539MHz. The following messages were logged to /var/log/messages:
Oct 21 11:23:48 localhost klogd: NVRM: GPU at PCI:0000:01:00: GPU-9cf0476e-4dbe-8c0f-4352-800be7075c41
Oct 21 11:23:48 localhost klogd: NVRM: Xid (PCI:0000:01:00): 8, Channel 00000010
Oct 21 11:23:51 localhost klogd: NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Oct 21 11:23:53 localhost klogd: NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Oct 21 11:23:53 localhost klogd: NVRM: Xid (PCI:0000:01:00): 50, L2 -> L1
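For anyone wanting to pull just the Xid events out of a long log, here is a minimal Python sketch. It assumes the line layout seen in the excerpt above ("NVRM: Xid (PCI:...): <number>, <detail>"); the function name extract_xid_events is only illustrative and is not part of any NVIDIA tool.

```python
import re

# Assumption: Xid lines look like the ones quoted in this post, i.e.
# "NVRM: Xid (PCI:0000:01:00): 8, Channel 00000010".
XID_RE = re.compile(r"NVRM: Xid \((PCI:[0-9A-Fa-f:.]+)\): (\d+), (.*)")


def extract_xid_events(lines):
    """Return (device, xid_number, detail) tuples for every Xid line found."""
    events = []
    for line in lines:
        match = XID_RE.search(line)
        if match:
            device, number, detail = match.groups()
            events.append((device, int(number), detail.strip()))
    return events


# Lines copied from the log excerpt above.
sample = [
    "Oct 21 11:23:48 localhost klogd: NVRM: Xid (PCI:0000:01:00): 8, Channel 00000010",
    "Oct 21 11:23:51 localhost klogd: NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context",
    "Oct 21 11:23:53 localhost klogd: NVRM: Xid (PCI:0000:01:00): 50, L2 -> L1",
]

for device, number, detail in extract_xid_events(sample):
    print(f"{device}: Xid {number} ({detail})")
```

The same function accepts an open file object, so it could be pointed at /var/log/messages directly (reading that file usually requires root).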
And when I logged off from Second Life later on, I got:
Oct 21 11:30:25 localhost klogd: WARNING: CPU: 3 PID: 12049 at lib/vsprintf.c:1900 format_decode+0x3ac/0x3d0
Oct 21 11:30:25 localhost klogd: Please remove unsupported %{ in format string
Oct 21 11:30:25 localhost klogd: Modules linked in: nvidia_modeset(PO) vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) nvidia(PO) nvidia_drm(PO)
Oct 21 11:30:25 localhost klogd: CPU: 3 PID: 12049 Comm: cat Tainted: P O 4.8.3 #1
Oct 21 11:30:25 localhost klogd: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z68 Extreme4 Gen3, BIOS P2.30 06/29/2012
Oct 21 11:30:25 localhost klogd: 0000000000000286 0000000000000000 ffffffff804bd068 0000000000000007
Oct 21 11:30:25 localhost klogd: ffff8806e5c5fc58 0000000000000000 ffffffff8028df7a ffff8806d616e4e0
Oct 21 11:30:25 localhost klogd: ffff8806e5c5fd00 ffffffff80a0006a ffff8807f7b88000 ffff8806e5c5fd60
Oct 21 11:30:25 localhost klogd: Call Trace:
Oct 21 11:30:25 localhost klogd: [<ffffffff804bd068>] ? dump_stack+0x47/0x5f
Oct 21 11:30:25 localhost klogd: [<ffffffff8028df7a>] ? __warn+0xea/0x110
Oct 21 11:30:25 localhost klogd: [<ffffffff8028e058>] ? warn_slowpath_fmt+0x48/0x50
Oct 21 11:30:25 localhost klogd: [<ffffffff80305384>] ? get_page_from_freelist+0x234/0x7b0
Oct 21 11:30:25 localhost klogd: [<ffffffff804c604c>] ? format_decode+0x3ac/0x3d0
Oct 21 11:30:25 localhost klogd: [<ffffffff804c8075>] ? vsnprintf+0x65/0x560
Oct 21 11:30:25 localhost klogd: [<ffffffff803722cb>] ? seq_vprintf+0x2b/0x50
Oct 21 11:30:25 localhost klogd: [<ffffffff8037232e>] ? seq_printf+0x3e/0x50
Oct 21 11:30:25 localhost klogd: [<ffffffff803ad548>] ? version_proc_show+0x38/0x40
Oct 21 11:30:25 localhost klogd: [<ffffffff8037262f>] ? seq_read+0x12f/0x3b0
Oct 21 11:30:25 localhost klogd: [<ffffffff80332eb1>] ? anon_vma_prepare+0x31/0x180
Oct 21 11:30:25 localhost klogd: [<ffffffff803a511d>] ? proc_reg_read+0x3d/0x70
Oct 21 11:30:25 localhost klogd: [<ffffffff80350cfe>] ? __vfs_read+0x1e/0x110
Oct 21 11:30:25 localhost klogd: [<ffffffff8031958c>] ? vm_mmap_pgoff+0xbc/0xe0
Oct 21 11:30:25 localhost klogd: [<ffffffff803523c2>] ? vfs_read+0xa2/0x130
Oct 21 11:30:25 localhost klogd: [<ffffffff8035249b>] ? SyS_read+0x4b/0xc0
Oct 21 11:30:25 localhost klogd: [<ffffffff8084771b>] ? entry_SYSCALL_64_fastpath+0x13/0x8f
Oct 21 11:30:25 localhost klogd: ---[ end trace cb66c20254363bd2 ]---
I updated yesterday from Linux kernel (vanilla) v4.8.2 to v4.8.3, so that may be the cause of this weird bug, which I had never encountered during the past month of running driver v370.28.
Also, I noticed that after the freeze, the Mate “command” applet, which runs a personal “gpustat” script (using nvidia-smi) every 15 seconds to display the GPU’s temperature, fan speed, etc. in the Mate panel, was no longer reporting anything (i.e. nvidia-smi was no longer working properly).
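The “gpustat” script itself is not shown here. As a rough illustration of what such a panel script might look like, the hypothetical Python sketch below queries nvidia-smi with a timeout, so that a hung or failing nvidia-smi produces a visible message instead of a blank applet. The function names, the choice of fields, and the output format are all assumptions, not the actual script.

```python
import subprocess

# Fields requested from nvidia-smi; the selection is an assumption.
QUERY = "temperature.gpu,fan.speed,clocks.gr"


def parse_gpu_csv(text):
    """Parse 'csv,noheader,nounits' output into one dict per GPU."""
    rows = []
    for line in text.strip().splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 3:
            continue  # skip anything that is not a three-column data row
        rows.append(dict(zip(("temp_c", "fan_pct", "clock_mhz"), fields)))
    return rows


def gpustat(timeout=5):
    """Return a one-line status string suitable for a panel applet."""
    cmd = ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"]
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout, check=True
        )
    except (OSError, subprocess.SubprocessError) as exc:
        # Covers a missing binary, a non-zero exit status, and a timeout.
        return f"nvidia-smi failed: {exc}"
    rows = parse_gpu_csv(result.stdout)
    if not rows:
        return "nvidia-smi returned no GPU data"
    return " | ".join(
        f"{r['temp_c']}C fan {r['fan_pct']}% {r['clock_mhz']}MHz" for r in rows
    )
```

Calling print(gpustat()) from the applet command would then show either the readings or the reason nvidia-smi could not provide them.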
I’m also attaching the traditional nvidia-bug-report.log.gz.
nvidia-bug-report.log.gz (71.8 KB)