Hi There,
I got yet another crash dump in nvkms on RHEL 7.8 (latest kernel) using the latest stable driver (450.57).
This time the crash happened on a Quadro P2200 GPU (previous reports had been on GTX 1660 Ti GPUs).
There seem to be two aggravating factors:
- Machine has 512 GB of RAM, of which 384 GB are set aside for hugepages.
- Chrome was running on Xorg.
This issue is similar to:
Here are some of my reports on the NVIDIA website:
Again, this is on a fully patched NUMA machine and the call stack trace is identical:
[235013.032911] X: page allocation failure: order:5, mode:0x40d0
[235013.032916] CPU: 50 PID: 9887 Comm: X Kdump: loaded Tainted: P OE ------------ T 3.10.0-1127.18.2.el7.x86_64 #1
[235013.032918] Hardware name: Dell Inc. PowerEdge T640/04WYPY, BIOS 2.8.1 06/29/2020
[235013.032919] Call Trace:
[235013.032930] [<ffffffffaa97ffa5>] dump_stack+0x19/0x1b
[235013.032934] [<ffffffffaa3c4b70>] warn_alloc_failed+0x110/0x180
[235013.032936] [<ffffffffaa97b4c0>] __alloc_pages_slowpath+0x6bb/0x729
[235013.032939] [<ffffffffaa3c91f6>] __alloc_pages_nodemask+0x436/0x450
[235013.032943] [<ffffffffaa418ea8>] alloc_pages_current+0x98/0x110
[235013.032946] [<ffffffffaa3e57c8>] kmalloc_order+0x18/0x40
[235013.032949] [<ffffffffaa424466>] kmalloc_order_trace+0x26/0xa0
[235013.032951] [<ffffffffaa4283f1>] ? __kmalloc+0x211/0x230
[235013.032952] [<ffffffffaa4283f1>] __kmalloc+0x211/0x230
[235013.033006] [<ffffffffc22ee3f7>] nvkms_alloc+0x27/0x70 [nvidia_modeset]
[235013.033021] [<ffffffffc232ce86>] _nv002654kms+0x16/0x30 [nvidia_modeset]
[235013.033034] [<ffffffffc2324066>] ? _nv002760kms+0x66/0x1470 [nvidia_modeset]
[235013.033045] [<ffffffffc22f1090>] ? _nv000531kms+0x50/0x50 [nvidia_modeset]
[235013.033046] [<ffffffffaa3e57c8>] ? kmalloc_order+0x18/0x40
[235013.033047] [<ffffffffaa424466>] ? kmalloc_order_trace+0x26/0xa0
[235013.033048] [<ffffffffaa4283f1>] ? __kmalloc+0x211/0x230
[235013.033058] [<ffffffffc22f1090>] ? _nv000531kms+0x50/0x50 [nvidia_modeset]
[235013.033068] [<ffffffffc22f15a1>] ? _nv000673kms+0x31/0xe0 [nvidia_modeset]
[235013.033082] [<ffffffffc22f1090>] ? _nv000531kms+0x50/0x50 [nvidia_modeset]
[235013.033092] [<ffffffffc22f29f6>] ? nvKmsIoctl+0x96/0x1d0 [nvidia_modeset]
[235013.033102] [<ffffffffc22ef022>] ? nvkms_ioctl_common+0x42/0x80 [nvidia_modeset]
[235013.033112] [<ffffffffc22ef123>] ? nvkms_ioctl+0xc3/0x110 [nvidia_modeset]
[235013.033206] [<ffffffffc068f083>] ? nvidia_frontend_unlocked_ioctl+0x43/0x50 [nvidia]
[235013.033210] [<ffffffffaa462890>] ? do_vfs_ioctl+0x3a0/0x5b0
[235013.033213] [<ffffffffaa98d678>] ? __do_page_fault+0x238/0x500
[235013.033214] [<ffffffffaa462b41>] ? SyS_ioctl+0xa1/0xc0
[235013.033217] [<ffffffffaa992ed2>] ? system_call_fastpath+0x25/0x2a
[235013.033218] Mem-Info:
[235013.033236] active_anon:3511544 inactive_anon:1120792 isolated_anon:0
active_file:12973796 inactive_file:10215559 isolated_file:32
unevictable:493405 dirty:156 writeback:0 unstable:0
slab_reclaimable:1308485 slab_unreclaimable:1154922
mapped:223515 shmem:318761 pagetables:65619 bounce:0
free:1136004 free_pcp:516 free_cma:0
[235013.033242] Node 0 DMA free:15864kB min:60kB low:72kB high:88kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15980kB managed:15896kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:32kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[235013.033246] lowmem_reserve[]: 0 1333 257060 257060
[235013.033252] Node 0 DMA32 free:1023476kB min:5428kB low:6784kB high:8140kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1693292kB managed:1365580kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[235013.033255] lowmem_reserve[]: 0 0 255727 255727
[235013.033260] Node 0 Normal free:1899692kB min:1041188kB low:1301484kB high:1561780kB active_anon:6640212kB inactive_anon:2113984kB active_file:27761332kB inactive_file:18884208kB unevictable:86492kB isolated(anon):0kB isolated(file):0kB present:266076160kB managed:261864516kB mlocked:86084kB dirty:336kB writeback:0kB mapped:613196kB shmem:1095788kB slab_reclaimable:3068880kB slab_unreclaimable:2756384kB kernel_stack:46944kB pagetables:149248kB unstable:0kB bounce:0kB free_pcp:1312kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[235013.033264] lowmem_reserve[]: 0 0 0 0
[235013.033269] Node 1 Normal free:1604984kB min:1050468kB low:1313084kB high:1575700kB active_anon:7405964kB inactive_anon:2369184kB active_file:24133852kB inactive_file:21978028kB unevictable:1887128kB isolated(anon):0kB isolated(file):128kB present:268435456kB managed:264198980kB mlocked:1887128kB dirty:288kB writeback:0kB mapped:280864kB shmem:179256kB slab_reclaimable:2165060kB slab_unreclaimable:1863272kB kernel_stack:41536kB pagetables:113228kB unstable:0kB bounce:0kB free_pcp:752kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[235013.033272] lowmem_reserve[]: 0 0 0 0
[235013.033274] Node 0 DMA: 0*4kB 1*8kB (U) 1*16kB (U) 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15864kB
[235013.033281] Node 0 DMA32: 5*4kB (UM) 6*8kB (UM) 5*16kB (UM) 5*32kB (UM) 5*64kB (UM) 7*128kB (UM) 2*256kB (UM) 3*512kB (M) 4*1024kB (UM) 2*2048kB (UM) 247*4096kB (M) = 1023476kB
[235013.033287] Node 0 Normal: 196675*4kB (UEM) 102907*8kB (UEM) 9099*16kB (UEM) 2016*32kB (UEM) 818*64kB (UEM) 137*128kB (UEM) 31*256kB (UEM) 6*512kB (UEM) 0*1024kB 0*2048kB 0*4096kB = 1900948kB
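
For anyone comparing their own logs: the per-zone free-list lines above can be summarised with a small script. This is only a rough sketch for reading the log, not a diagnosis. It assumes 4 kB base pages (the usual x86_64 default), and the helper names (parse_buddy_line, total_free_kb, blocks_at_or_above) are made up for this example. It parses one "Node N Zone: count*sizekB ..." line, totals the free memory, and counts the free blocks that are at least as large as the failed order:5 request (32 contiguous pages, 128 kB).

```python
import re

PAGE_KB = 4  # assumption: 4 kB base pages, the usual x86_64 default


def parse_buddy_line(line):
    """Map buddy order -> free block count for one 'Node N Zone: ...' line."""
    counts = {}
    for count, size_kb in re.findall(r"(\d+)\*(\d+)kB", line):
        # 4 kB is order 0, 8 kB is order 1, ..., 4096 kB is order 10.
        order = (int(size_kb) // PAGE_KB).bit_length() - 1
        counts[order] = int(count)
    return counts


def total_free_kb(counts):
    """Total free memory in kB represented by the per-order block counts."""
    return sum(n * (PAGE_KB << order) for order, n in counts.items())


def blocks_at_or_above(counts, order):
    """Number of free blocks large enough to satisfy a request of `order`."""
    return sum(n for o, n in counts.items() if o >= order)


# The "Node 0 Normal" line from the log above, timestamp removed.
line = (
    "Node 0 Normal: 196675*4kB (UEM) 102907*8kB (UEM) 9099*16kB (UEM) "
    "2016*32kB (UEM) 818*64kB (UEM) 137*128kB (UEM) 31*256kB (UEM) "
    "6*512kB (UEM) 0*1024kB 0*2048kB 0*4096kB = 1900948kB"
)

counts = parse_buddy_line(line)
failed_order = 5  # from "page allocation failure: order:5"

print("request size:", PAGE_KB << failed_order, "kB")
print("total free:", total_free_kb(counts), "kB")
print("free blocks at order >= 5:", blocks_at_or_above(counts, failed_order))
```

For the Node 0 Normal line this gives a total of 1900948 kB, matching the kernel's own sum, with 174 free blocks at order 5 or above. Nearly all of the free memory in that zone sits in smaller blocks. Whether the remaining larger blocks were actually usable for this request (watermarks, migrate types) is not something the script can tell.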