OOM killer log on NVIDIA Jetson Xavier

Hello,

I’m trying to understand the OOM killer log generated from the Xavier board.

Q.

  1. Why does the OOM killer wake up even though there are enough free pages in the DMA & Normal zones?

I found that HighMem/MovableOnly is reported as 0 and the OOM killer kills the process.

[274455.983709] 8324608 pages RAM
[274455.983712] 0 pages HighMem/MovableOnly

But I don’t understand how the OOM killer makes this decision.
As far as I know, for the OOM killer to wake up, the free pages in the DMA and Normal zones should have dropped below the watermark plus the lowmem reserve.
For example with DMA, the low watermark (4276kB) + lowmem_reserve (30156 pages, i.e. 120624kB) is still much smaller than the 768096kB of free pages left in DMA:

[274455.983570] DMA free:768096kB min:2480kB low:4276kB high:6072kB active_anon:3704kB inactive_anon:10652kB active_file:124kB inactive_file:0kB unevictable:0kB writepending:0kB present:1841152kB managed:1815112kB mlocked:0kB slab_reclaimable:276kB slab_unreclaimable:1032kB kernel_stack:48kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:639428kB
[274455.983573] lowmem_reserve[]: 0 30156 30156 30156
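
To make the arithmetic concrete, here is a small stand-alone C sketch of what I believe the 4.9 kernel’s __zone_watermark_ok() roughly does for the DMA zone here. It is heavily simplified and just plugs in the numbers from the zone line above (min watermark, lowmem_reserve, free_cma, and order=2 from the gfp_mask line), so treat it as my reading of mm/page_alloc.c rather than the real kernel code:

/*
 * Simplified, hypothetical sketch of the watermark check I think the
 * 4.9 kernel performs in __zone_watermark_ok() (mm/page_alloc.c).
 * Numbers are copied from the DMA zone line of the log above and
 * converted to 4 kB pages; this is NOT the real kernel code.
 */
#include <stdio.h>

int main(void)
{
    long free_pages  = 768096 / 4;  /* DMA free:768096kB                  */
    long free_cma    = 639428 / 4;  /* DMA free_cma:639428kB              */
    long min_wmark   = 2480   / 4;  /* DMA min:2480kB                     */
    long lowmem_resv = 30156;       /* lowmem_reserve[] entry, in pages   */
    int  order       = 2;           /* order=2 from the oom-killer header */

    /* As I read page_alloc.c, free CMA pages are subtracted for an
     * unmovable GFP_KERNEL allocation, and (1 << order) - 1 pages are
     * discounted before comparing against min + lowmem_reserve.         */
    long usable    = free_pages - free_cma - ((1 << order) - 1);
    long threshold = min_wmark + lowmem_resv;

    printf("usable: %ld pages, threshold: %ld pages -> watermark %s\n",
           usable, threshold, usable > threshold ? "ok" : "NOT ok");

    /* The kernel additionally requires a free block of at least the
     * requested order on a compatible free list before declaring the
     * zone usable for this allocation.                                  */
    return 0;
}

With the values from the log this gives roughly 32164 usable pages against a threshold of about 30776 pages, so even with free_cma discounted the DMA zone still looks (barely) fine to me for this order-2 request, which is why the OOM invocation confuses me.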

This is the full OOM killer log:

[274455.983133] kworker/u16:0 invoked oom-killer: gfp_mask=0x27080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK), nodemask=0, order=2, oom_score_adj=0
[274455.983390] kworker/u16:0 cpuset=/ mems_allowed=0
[274455.983414] CPU: 4 PID: 6852 Comm: kworker/u16:0 Not tainted 4.9.253-tegra #1
[274455.983417] Hardware name: Jetson-AGX (DT)
[274455.983434] Workqueue: events_unbound call_usermodehelper_exec_work
[274455.983440] Call trace:
[274455.983447] [<ffffff800808ba40>] dump_backtrace+0x0/0x198
[274455.983451] [<ffffff800808c004>] show_stack+0x24/0x30
[274455.983458] [<ffffff8008f6121c>] dump_stack+0xa0/0xc4
[274455.983462] [<ffffff8008f5f0b4>] dump_header+0x6c/0x1b8
[274455.983469] [<ffffff80081c7b64>] oom_kill_process+0x29c/0x4c8
[274455.983473] [<ffffff80081c823c>] out_of_memory+0x1e4/0x308
[274455.983477] [<ffffff80081ce0e8>] __alloc_pages_nodemask+0x810/0xcb8
[274455.983486] [<ffffff80080af7f0>] copy_process.isra.7.part.8+0xf0/0x1578
[274455.983490] [<ffffff80080b0e18>] _do_fork+0xd8/0x470
[274455.983493] [<ffffff80080b1258>] kernel_thread+0x48/0x58
[274455.983497] [<ffffff80080d0504>] call_usermodehelper_exec_work+0x34/0xd0
[274455.983502] [<ffffff80080d40ac>] process_one_work+0x1e4/0x4b0
[274455.983505] [<ffffff80080d43c8>] worker_thread+0x50/0x4c8
[274455.983510] [<ffffff80080db09c>] kthread+0xec/0xf0
[274455.983514] [<ffffff80080838a0>] ret_from_fork+0x10/0x30
[274455.983544] Mem-Info:
[274455.983555] active_anon:9750 inactive_anon:11166 isolated_anon:0
                 active_file:1977 inactive_file:3643 isolated_file:0
                 unevictable:3879 dirty:0 writeback:0 unstable:0
                 slab_reclaimable:19486 slab_unreclaimable:38438
                 mapped:1659 shmem:0 pagetables:13699 bounce:0
                 free:227374 free_pcp:126 free_cma:159857
[274455.983563] Node 0 active_anon:39000kB inactive_anon:44664kB active_file:7908kB inactive_file:14572kB unevictable:15516kB isolated(anon):0kB isolated(file):0kB mapped:6636kB dirty:0kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 26624kB writeback_tmp:0kB unstable:0kB pages_scanned:51 all_unreclaimable? no
[274455.983570] DMA free:768096kB min:2480kB low:4276kB high:6072kB active_anon:3704kB inactive_anon:10652kB active_file:124kB inactive_file:0kB unevictable:0kB writepending:0kB present:1841152kB managed:1815112kB mlocked:0kB slab_reclaimable:276kB slab_unreclaimable:1032kB kernel_stack:48kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:639428kB
[274455.983573] lowmem_reserve[]: 0 30156 30156 30156
[274455.983591] Normal free:141400kB min:42572kB low:73448kB high:104324kB active_anon:35296kB inactive_anon:34268kB active_file:7716kB inactive_file:14544kB unevictable:15516kB writepending:0kB present:31457280kB managed:30879756kB mlocked:0kB slab_reclaimable:77668kB slab_unreclaimable:152720kB kernel_stack:8960kB pagetables:54796kB bounce:0kB free_pcp:664kB local_pcp:0kB free_cma:0kB
[274455.983593] lowmem_reserve[]: 0 0 0 0
[274455.983605] DMA: 48*4kB (UMC) 48*8kB (UMC) 60*16kB (UMC) 70*32kB (UMC) 19*64kB (UMC) 14*128kB (MC) 15*256kB (MC) 16*512kB (UMC) 12*1024kB (MC) 26*2048kB (UM) 167*4096kB (UMEC) = 768384kB
[274455.983653] Normal: 7117*4kB (UMEH) 4516*8kB (UMEH) 807*16kB (UMH) 1870*32kB (UMH) 35*64kB (UMH) 13*128kB (UMH) 2*256kB (MH) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 141764kB
[274455.983692] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[274455.983696] 10399 total pagecache pages
[274455.983699] 905 pages in swap cache
[274455.983702] Swap cache stats: add 120739217, delete 120863355, find 6722236/34372969
[274455.983705] Free swap  = 13121920kB
[274455.983707] Total swap = 16347424kB
[274455.983709] 8324608 pages RAM
[274455.983712] 0 pages HighMem/MovableOnly
[274455.983714] 150891 pages reserved
[274455.983716] 188416 pages cma reserved

The link below may help to explain it.

Thanks for the reply

I think I had already read that article before…
I’m asking more about why the OOM killer was invoked in the first place than about how to keep the process from being killed…

Do you have any ideas?

Sorry, I don’t have a clean way to reproduce the real condition that triggers the OOM.

I also cannot answer, but I’ll add that some software runs in user space, where virtual (swapped) memory can be used. Other software, such as what the GPU uses for CUDA, must be in actual physical RAM (swap or other virtual memory will not work for this). Possibly you are seeing something which needs physical RAM even though virtual memory is still available.

Hmm…

To add a bit more context: the OOM killer was invoked under heavy device memory pressure, as I was running several DNN applications at once.

Since the Xavier platform has integrated (shared CPU/GPU) RAM, is it possible that the OOM killer didn’t print out the amount of device memory in use?
