Hi All,
I have been experiencing semi-regular crashes when using the Nvidia driver on Linux (Mint, an Ubuntu derivative). At some point after the screen saver kicks in, the screen freezes and becomes unresponsive. Internally, a page allocation failure occurs in Xorg, apparently in response to an Nvidia driver action (see the dump below). Google turned up a number of similar reports, but no resolution.
Any suggestions for resolving or diagnosing the issue?
The problem started when switching to a 4K monitor and the latest version of Linux Mint. (Yes, that is two changes at once; needed the new Mint version to meaningfully use a 4K monitor.) The problem did not occur with a 1080p monitor, nor did it occur when using the integrated Intel graphics with that monitor. Occurs with both the 440 and the new 450 drivers.
Setup:
$> lspci -v
01:00.0 VGA compatible controller: NVIDIA Corporation GK208B [GeForce GT 710] (rev a1) (prog-if 00 [VGA controller])
Subsystem: Micro-Star International Co., Ltd. [MSI] GK208B [GeForce GT 710]
Flags: bus master, fast devsel, latency 0, IRQ 52
...
Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024
Kernel driver in use: nvidia
Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
The system has 32G of memory and typically has many GBs free.
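Total free memory does not by itself show whether large physically contiguous blocks are available, since free pages can be fragmented. On Linux, /proc/buddyinfo lists, per node and zone, the number of free blocks at each order (a block of order n is 2^n contiguous pages). The following is a minimal sketch for reading it, not a definitive tool; the function names are hypothetical and chosen only for illustration.

```python
def parse_buddyinfo(text):
    """Return {(node, zone): [free block count for order 0, 1, 2, ...]}."""
    zones = {}
    for line in text.splitlines():
        parts = line.split()
        # Expected line shape: "Node 0, zone Normal c0 c1 c2 ..."
        if len(parts) < 5 or parts[0] != "Node" or parts[2] != "zone":
            continue
        node = int(parts[1].rstrip(","))
        zones[(node, parts[3])] = [int(count) for count in parts[4:]]
    return zones


def free_blocks_at_or_above(counts, order):
    """Count free blocks that could satisfy a request of the given order."""
    return sum(counts[order:])


def read_buddyinfo(path="/proc/buddyinfo"):
    """Read and parse the live file (Linux only)."""
    with open(path) as f:
        return parse_buddyinfo(f.read())
```

Calling `read_buddyinfo()` shortly before or after a freeze and passing each zone's list to `free_blocks_at_or_above(counts, 5)` would indicate whether blocks of order 5 or larger were scarce at that moment.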
$> modinfo nvidia
filename: /lib/modules/5.4.0-45-generic/kernel/drivers/char/drm/nvidia.ko
alias: char-major-195-*
version: 450.66
supported: external
license: NVIDIA
srcversion: 68525565C8AD4BD8D2EA2A5
...
Crash info from /var/log/syslog, which appears to show an nvidia driver operation of some kind:
Sep 3 02:33:42 paul-linux kernel: [50385.458650] Xorg: page allocation failure: order:5, mode:0x40cc0(GFP_KERNEL|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0
Sep 3 02:33:42 paul-linux kernel: [50385.458656] CPU: 2 PID: 1324 Comm: Xorg Tainted: P OE 5.4.0-45-generic #49-Ubuntu
Sep 3 02:33:42 paul-linux kernel: [50385.458657] Hardware name: System manufacturer System Product Name/P8Z68-V PRO GEN3, BIOS 3802 01/15/2015
Sep 3 02:33:42 paul-linux kernel: [50385.458657] Call Trace:
Sep 3 02:33:42 paul-linux kernel: [50385.458664] dump_stack+0x6d/0x9a
Sep 3 02:33:42 paul-linux kernel: [50385.458668] warn_alloc.cold+0x7b/0xdf
Sep 3 02:33:42 paul-linux kernel: [50385.458670] __alloc_pages_slowpath+0xe07/0xe50
Sep 3 02:33:42 paul-linux kernel: [50385.458673] ? get_page_from_freelist+0x233/0x390
Sep 3 02:33:42 paul-linux kernel: [50385.458675] __alloc_pages_nodemask+0x2d0/0x320
Sep 3 02:33:42 paul-linux kernel: [50385.458677] alloc_pages_current+0x87/0xe0
Sep 3 02:33:42 paul-linux kernel: [50385.458680] kmalloc_order+0x1f/0x80
Sep 3 02:33:42 paul-linux kernel: [50385.458681] kmalloc_order_trace+0x24/0xa0
Sep 3 02:33:42 paul-linux kernel: [50385.458682] __kmalloc+0x220/0x280
Sep 3 02:33:42 paul-linux kernel: [50385.458698] nvkms_alloc+0x24/0x60 [nvidia_modeset]
Sep 3 02:33:42 paul-linux kernel: [50385.458712] _nv002653kms+0x16/0x30 [nvidia_modeset]
Sep 3 02:33:42 paul-linux kernel: [50385.458722] WARNING: kernel stack frame pointer at 000000009306a642 in Xorg:1324 has bad value 00000000ab21e5a4
...
Sep 3 02:33:42 paul-linux kernel: [50385.458760] 00000000bcec5f3c: ffffffffc21f9c46 (_nv002653kms+0x16/0x30 [nvidia_modeset])
...
Sep 3 02:33:42 paul-linux kernel: [50385.458784] 000000007c69afc8: ffffffff8e6814e7 (__alloc_pages_slowpath+0xe07/0xe50)
...
Sep 3 02:33:42 paul-linux kernel: [50385.459096] 000000003f8da3b1: ffffffffc0d7c07b (nvidia_frontend_unlocked_ioctl+0x3b/0x50 [nvidia])
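For reading the first line of the dump: "order:5" means the kernel was asked for 2^5 = 32 physically contiguous pages in a single allocation. A minimal sketch that extracts this from the log line is below. It assumes 4 KiB pages (the usual x86_64 default, not confirmed for this machine), and the function name is hypothetical.

```python
import re

PAGE_SIZE = 4096  # assumption: 4 KiB pages, the usual x86_64 default

FAILURE_RE = re.compile(
    r"page allocation failure: order:(\d+), mode:(0x[0-9a-fA-F]+)\(([^)]*)\)"
)


def allocation_request(line, page_size=PAGE_SIZE):
    """Decode the size of a failed allocation from a kernel log line."""
    match = FAILURE_RE.search(line)
    if match is None:
        return None
    order = int(match.group(1))
    pages = 1 << order  # an order-n request is 2**n contiguous pages
    return {
        "order": order,
        "pages": pages,
        "bytes": pages * page_size,
        "mode": match.group(2),
        "flags": match.group(3).split("|"),
    }
```

Applied to the failure line above, this gives 32 pages, or 131072 bytes (128 KiB) under the 4 KiB assumption. A request of that size can fail even with gigabytes free if no contiguous block that large is available.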
Happy to provide more details. Or, if Nvidia is the victim of some other issue, would appreciate hints to track it down.
Thanks,
- Paul
nvidia-bug-report.log.gz (236.5 KB)