Hi,
I’m still experiencing random crash dumps on RHEL 7.7 due to page allocation failures in Xorg in the nvidia_modeset driver.
Here’s more information:
[983520.867208] Hardware name: Dell Inc. PowerEdge T440/00X7CK, BIOS 2.4.7 10/28/2019
[983520.867209] Call Trace:
[983520.867217] [<ffffffff9797ac23>] dump_stack+0x19/0x1b
[983520.867223] [<ffffffff973c3d70>] warn_alloc_failed+0x110/0x180
[983520.867226] [<ffffffff973c897f>] __alloc_pages_nodemask+0x9df/0xbe0
[983520.867230] [<ffffffff97416b28>] alloc_pages_current+0x98/0x110
[983520.867295] [<ffffffffc1ddaf70>] ? _nv000489kms+0x50/0x50 [nvidia_modeset]
[983520.867299] [<ffffffff973e3b28>] kmalloc_order+0x18/0x40
[983520.867302] [<ffffffff97422056>] kmalloc_order_trace+0x26/0xa0
[983520.867304] [<ffffffff97426611>] ? __kmalloc+0x211/0x230
[983520.867320] [<ffffffffc1ddaf70>] ? _nv000489kms+0x50/0x50 [nvidia_modeset]
[983520.867322] [<ffffffff97426611>] __kmalloc+0x211/0x230
[983520.867338] [<ffffffffc1ddaf70>] ? _nv000489kms+0x50/0x50 [nvidia_modeset]
[983520.867353] [<ffffffffc1dd83f7>] nvkms_alloc+0x27/0x70 [nvidia_modeset]
[983520.867374] [<ffffffffc1e15866>] _nv002516kms+0x16/0x30 [nvidia_modeset]
[983520.867393] [<ffffffffc1e0bbc8>] ? _nv002623kms+0x68/0x1f70 [nvidia_modeset]
[983520.867396] [<ffffffff97416b28>] ? alloc_pages_current+0x98/0x110
[983520.867411] [<ffffffffc1ddaf70>] ? _nv000489kms+0x50/0x50 [nvidia_modeset]
[983520.867415] [<ffffffff973e3b28>] ? kmalloc_order+0x18/0x40
[983520.867417] [<ffffffff97422056>] ? kmalloc_order_trace+0x26/0xa0
[983520.867419] [<ffffffff97426611>] ? __kmalloc+0x211/0x230
[983520.867434] [<ffffffffc1ddaf70>] ? _nv000489kms+0x50/0x50 [nvidia_modeset]
[983520.867450] [<ffffffffc1ddb481>] ? _nv000618kms+0x31/0xe0 [nvidia_modeset]
[983520.867471] [<ffffffffc1ddaf70>] ? _nv000489kms+0x50/0x50 [nvidia_modeset]
[983520.867488] [<ffffffffc1ddc8c6>] ? nvKmsIoctl+0x96/0x1d0 [nvidia_modeset]
[983520.867504] [<ffffffffc1dd9012>] ? nvkms_ioctl_common+0x42/0x80 [nvidia_modeset]
[983520.867520] [<ffffffffc1dd9113>] ? nvkms_ioctl+0xc3/0x110 [nvidia_modeset]
[983520.867737] [<ffffffffc0760083>] ? nvidia_frontend_unlocked_ioctl+0x43/0x50 [nvidia]
[983520.867741] [<ffffffff9745fb40>] ? do_vfs_ioctl+0x3a0/0x5a0
[983520.867745] [<ffffffff97988678>] ? __do_page_fault+0x238/0x500
[983520.867748] [<ffffffff9745fde1>] ? SyS_ioctl+0xa1/0xc0
[983520.867750] [<ffffffff9798dede>] ? system_call_fastpath+0x25/0x2a
[983520.867752] Mem-Info:
[983520.867760] active_anon:3493200 inactive_anon:157792 isolated_anon:0
active_file:15156414 inactive_file:14609790 isolated_file:0
unevictable:176373 dirty:1216487 writeback:0 unstable:0
slab_reclaimable:1069476 slab_unreclaimable:813394
mapped:227242 shmem:161513 pagetables:51099 bounce:0
This has been happening for the past few months under various 430.x and 440.x drivers.
The system has 384 GB of memory (including 224 GB in hugepages, leaving about 160 GB in normal pages).
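For scale, the Mem-Info page counts from the log can be converted to GiB. This is only a rough sketch: it assumes 4 KiB base pages (the usual x86_64 default, not something the log states), and the dictionary and helper names are my own, not from any tool.

```python
# Assumption: 4 KiB base pages (typical for x86_64; not confirmed by the log).
PAGE_SIZE = 4096

# Page counts copied from the Mem-Info section of the trace.
mem_info_pages = {
    "active_anon": 3493200,
    "inactive_anon": 157792,
    "active_file": 15156414,
    "inactive_file": 14609790,
    "unevictable": 176373,
    "dirty": 1216487,
    "slab_reclaimable": 1069476,
    "slab_unreclaimable": 813394,
}


def pages_to_gib(pages, page_size=PAGE_SIZE):
    """Convert a page count to GiB for the given page size."""
    return pages * page_size / 2**30


for name, pages in mem_info_pages.items():
    print(f"{name:20s} {pages_to_gib(pages):8.1f} GiB")

# File-backed page cache = active_file + inactive_file.
page_cache_gib = pages_to_gib(
    mem_info_pages["active_file"] + mem_info_pages["inactive_file"]
)
print(f"{'page cache (file)':20s} {page_cache_gib:8.1f} GiB")
```

If the 4 KiB assumption holds, the file cache alone is roughly 113 GiB of the ~160 GB of normal pages, so most of that memory looks like reclaimable cache rather than being exhausted. That may point at fragmentation for the large kmalloc_order allocation in the trace rather than a true out-of-memory condition, but I can't confirm that from this output alone.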
I’m attaching the vmcore-dmesg.txt and nvidia-bug-report.log.gz here.
nvidia-bug-report.log.gz (5.48 MB)
vmcore-dmesg.txt (1020 KB)