I’m getting hangs accompanied by “Xid” messages on my new PC, which has a Gigabyte GTX 780. A single application (usually a game, though I suspect this has also hit KWin or X11 at some point, as I’ve had a few system-wide hangs) will have its graphics hang, and one of the messages below appears in the kernel log. Sometimes a new frame (or part of one) will still be rendered, though it will sometimes have graphical “glitches”: it looks as though only part of the display, usually a few contiguous rectangular regions, updates, and typically the very edge of the screen does not.
While I’ve definitely had this issue in the recent port of Portal 2, it appears very quickly (within a minute or two of starting) and reproducibly in both “Crusader Kings 2” and Double Fine’s “Steed” prototype from the 2014 “Amnesia Fortnight”, running under Wine 1.7.13. Most other OpenGL programs (including many games, albeit usually less taxing ones) can run for a considerable time with no such problems.
The messages which appear are of the form:
NVRM: GPU at 0000:01:00: GPU-51e8de7f-e984-ff21-7f5e-e9ef3d2d36fc
NVRM: Xid (0000:01:00): 13, 0008 00000000 0000a197 000017d8 00000203 0000000c
NVRM: Xid (0000:01:00): 32, Channel ID 0000000c intr 00040000
NVRM: Xid (0000:01:00): 13, 000c 00000000 0000a197 000017d8 00010001 0000000c
NVRM: Xid (0000:01:00): 13, 000c 00000000 0000a197 000017d8 00000011 0000000c
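In case it helps anyone comparing logs, here is a rough Python sketch for tallying these messages by Xid number. The function name `tally_xids` and the regular expression are my own assumptions based on the message format shown above; they are not part of any NVIDIA tooling.

```python
import re
from collections import Counter

# Matches the start of lines such as
# "NVRM: Xid (0000:01:00): 13, 0008 00000000 ..."
# and captures the PCI address and the Xid number.
XID_RE = re.compile(r"NVRM: Xid \(([0-9a-fA-F:.]+)\): (\d+),")

def tally_xids(log_text):
    """Count Xid messages per (PCI address, Xid number)."""
    counts = Counter()
    for match in XID_RE.finditer(log_text):
        pci_addr, xid = match.groups()
        counts[(pci_addr, int(xid))] += 1
    return counts

# Sample input: the four Xid lines quoted in this post.
sample = """\
NVRM: Xid (0000:01:00): 13, 0008 00000000 0000a197 000017d8 00000203 0000000c
NVRM: Xid (0000:01:00): 32, Channel ID 0000000c intr 00040000
NVRM: Xid (0000:01:00): 13, 000c 00000000 0000a197 000017d8 00010001 0000000c
NVRM: Xid (0000:01:00): 13, 000c 00000000 0000a197 000017d8 00000011 0000000c
"""

for (addr, xid), n in sorted(tally_xids(sample).items()):
    print(f"{addr} Xid {xid}: {n}")
```

The same function should work on kernel log text saved from `dmesg` or `journalctl -k`, as long as the lines keep this format.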
The nvidia kernel module also complains about a lack of VGA console, and VT switching does not work. The system boots from EFI, and I’ve been unable to find a way of getting a VGA compatible console out of it.
At one point, during the shutdown procedure, I got a:
BUG: soft lockup - CPU#0 stuck for 22s! [X:1041]
Modules linked in: ip6t_rpfilter bnep ip6t_REJECT bluetooth cfg80211 xt_conntrack rfkill ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_c binfmt_misc nfsd auth_rpcgss nfs_acl lockd sunrpc i915 i2c_algo_bit drm_kms_helper firewire_ohci mxm_wmi drm firewire_core crc_itu_t i2c_core wmi video
Mar 08 21:01:01 sparky kernel: CPU: 0 PID: 1041 Comm: X Tainted: PF IO 3.13.5-202.fc20.x86_64 #1
Hardware name: Gigabyte Technology Co., Ltd. Z87X-UD5H/Z87X-UD5H-CF, BIOS F8 01/17/2014
task: ffff8808053488a0 ti: ffff8807fa462000 task.ti: ffff8807fa462000
RIP: 0010:[<ffffffffa0a7027e>] [<ffffffffa0a7027e>] os_io_write_dword+0xe/0x10 [nvidia]
RSP: 0018:ffff8807fa463c20 EFLAGS: 00000286
RAX: 0000000000009400 RBX: 0000000000000001 RCX: ffffffffa0e91bb0
RDX: 000000000000e008 RSI: 0000000000009400 RDI: 000000000000e008
RBP: ffff8807fa463c20 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000001
R13: ffff8807f83b2f28 R14: 0000000000000001 R15: 0000000000000000
FS: 00007f79642389c0(0000) GS:ffff88083f200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f79634e8850 CR3: 000000080f125000 CR4: 00000000001407f0
Stack:
ffff8807f83b2f28 ffffffffa0a5019e ffffffffa0e96c84 ffffffffa0a5c633
ffff8807f83b2f80 ffffffffa0a53216 ffff8807fa454008 ffffffffa0a50726
ffff8807fa454008 ffff8807f83b2f84 0000000000004f02 0000000000000001
Call Trace:
[<ffffffffa0a5019e>] rm_shutdown_gvi_device+0x106/0x290 [nvidia]
[<ffffffffa0a5c633>] ? _nv017101rm+0x8746/0xcee3 [nvidia]
[<ffffffffa0a53216>] ? _nv000956rm+0x83/0xa4 [nvidia]
[<ffffffffa0a50726>] ? _nv012910rm+0x19d/0x9f0 [nvidia]
[<ffffffffa0a3ff3a>] ? _nv013192rm+0x8c/0x16d [nvidia]
[<ffffffffa0a444d9>] ? _nv000840rm+0x359/0x3c9 [nvidia]
[<ffffffffa0a4446b>] ? _nv000840rm+0x2eb/0x3c9 [nvidia]
[<ffffffffa0a449fe>] ? _nv000763rm+0x4b5/0x552 [nvidia]
[<ffffffffa0a46e4a>] ? _nv014930rm+0x99/0xbb [nvidia]
[<ffffffffa0a3d4ff>] ? _nv000818rm+0x44f/0x9d7 [nvidia]
[<ffffffffa0a46d27>] ? rm_ioctl+0x76/0x100 [nvidia]
[<ffffffffa0a70b00>] ? os_pci_read_byte+0x10/0x40 [nvidia]
[<ffffffffa0a65577>] ? nvidia_ioctl+0x147/0x480 [nvidia]
[<ffffffffa0a7237f>] ? nvidia_frontend_ioctl+0x2f/0x70 [nvidia]
[<ffffffffa0a723e1>] ? nvidia_frontend_unlocked_ioctl+0x21/0x30 [nvidia]
[<ffffffff811cb6f8>] ? do_vfs_ioctl+0x2d8/0x4a0
[<ffffffff811ba79e>] ? ____fput+0xe/0x10
[<ffffffff811cb941>] ? SyS_ioctl+0x81/0xa0
[<ffffffff816919fe>] ? do_page_fault+0xe/0x10
[<ffffffff81695f69>] ? system_call_fastpath+0x16/0x1b
Code: 00 00 55 89 f0 89 fa 48 89 e5 66 ef 5d c3 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 89 f0 89 fa 48 89 e5 ef <5d> c3 0f 1f 44 00 00 55 89 fa 48 89
The system has also had other “BUG: soft lockup” reports whose stack traces do not include the nvidia module and which may not be NVIDIA-related at all: the system is new and could have other, as yet unidentified, issues.
I’ve seen this issue on Fedora 20. The machine has had a complete, exhaustive memory test with MemTest86+ which showed no issues. The graphics card does not appear to be overheating, and is otherwise working well. The system has two 1440x900 monitors, connected over DVI.
The system also has an Intel integrated GPU (it is a desktop with an Intel Core i7 4770K CPU). I’ve disabled this in the UEFI BIOS and am booting with intel_iommu=off, but neither seems to help. I’ve also reseated the GPU; this had no effect either.
The output from nvidia-bug-report.sh is located here:
Please let me know if there is any more information I can provide, or if this is a known issue or a hardware fault: I am anxious to get this issue resolved.