Hello,
On a Jetson TX2 board, I am trying to run an object detection program written in Python.
The program is based on keras-retinanet (GitHub - fizyr/keras-retinanet: Keras implementation of RetinaNet object detection.).
I am using Python 2.7.
As you may know, object detection with keras-retinanet is performed via calls like the ones below:
image = preprocess_image(image)
image, scale = resize_image(image)
image1 = np.expand_dims(image, axis=0)
boxes, scores, labels = model.predict_on_batch(image1)
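For reference, after preprocess_image() the batched input is a float32 tensor of shape (1, H, W, 3), so its memory footprint is easy to estimate. The helper below is my own hypothetical sketch, not part of keras-retinanet:

```python
import numpy as np

# Hypothetical helper (not part of keras-retinanet): estimate the size of
# the batched input tensor handed to predict_on_batch(). Pixels are
# float32 after preprocessing, so a (1, H, W, 3) batch occupies
# batch * H * W * 3 * 4 bytes.
def input_tensor_bytes(height, width, batch=1, channels=3):
    return batch * height * width * channels * np.dtype(np.float32).itemsize

# e.g. an input resized to roughly 800x1333 is only about 12.8 MB:
print(input_tensor_bytes(800, 1333))  # 12796800
```

So the input tensor itself is tiny compared with the TX2's memory, which suggests the large allocation that trips the OOM killer happens inside model.predict_on_batch(), not in the input array.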
While executing model.predict_on_batch(image1), the kernel kills the Python process because it triggered the OOM killer.
Please see the dmesg log below:
[11975.204122] python invoked oom-killer: gfp_mask=0x24201ca, order=0, oom_score_adj=0
[11975.211813] python cpuset=/ mems_allowed=0
[11975.216023] CPU: 0 PID: 6207 Comm: python Not tainted 4.4.38-tegra #1
[11975.222451] Hardware name: quill (DT)
[11975.226106] Call trace:
[11975.228554] [] dump_backtrace+0x0/0x100
[11975.233944] [] show_stack+0x14/0x1c
[11975.238989] [] dump_stack+0x98/0xc0
[11975.244035] [] dump_header.isra.9+0x60/0x1a4
[11975.249860] [] oom_kill_process+0x26c/0x44c
[11975.255594] [] out_of_memory+0x2e0/0x328
[11975.261070] [] __alloc_pages_nodemask+0x93c/0xa60
[11975.267324] [] filemap_fault+0x1a4/0x490
[11975.272800] [] ext4_filemap_fault+0x34/0x50
[11975.278536] [] __do_fault+0x3c/0xb4
[11975.283578] [] handle_mm_fault+0xb18/0x15b0
[11975.289312] [] do_page_fault+0x1c8/0x444
[11975.294787] [] do_mem_abort+0x40/0xa0
[11975.300001] [] do_el0_ia_bp_hardening+0x58/0x60
[11975.306083] [] el0_ia+0x18/0x1c
[11975.310928] Mem-Info:
[11975.313224] active_anon:634550 inactive_anon:3164 isolated_anon:0
active_file:342 inactive_file:423 isolated_file:7
unevictable:4 dirty:0 writeback:0 unstable:0
slab_reclaimable:8763 slab_unreclaimable:11339
mapped:2139 shmem:3749 pagetables:3909 bounce:0
free:9643 free_pcp:61 free_cma:4954
[11975.346389] DMA free:26432kB min:2888kB low:3608kB high:4332kB active_anon:653744kB inactive_anon:3588kB active_file:192kB inactive_file:1768kB unevictable:0kB isolated(anon):0kB isolated(file):28kB present:2078720kB managed:2050448kB mlocked:0kB dirty:0kB writeback:0kB mapped:1784kB shmem:4112kB slab_reclaimable:7276kB slab_unreclaimable:8448kB kernel_stack:2128kB pagetables:4104kB unstable:0kB bounce:0kB free_pcp:164kB local_pcp:0kB free_cma:19816kB writeback_tmp:0kB pages_scanned:12172 all_unreclaimable? yes
[11975.391647] lowmem_reserve: 0 5843 5843
[11975.395716] Normal free:12296kB min:8444kB low:10552kB high:12664kB active_anon:1884456kB inactive_anon:9068kB active_file:320kB inactive_file:2284kB unevictable:16kB isolated(anon):0kB isolated(file):0kB present:6129664kB managed:5984124kB mlocked:16kB dirty:0kB writeback:0kB mapped:5452kB shmem:10884kB slab_reclaimable:27776kB slab_unreclaimable:36908kB kernel_stack:6896kB pagetables:11532kB unstable:0kB bounce:0kB free_pcp:776kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:19332 all_unreclaimable? yes
[11975.441543] lowmem_reserve: 0 0 0
[11975.445082] DMA: 49*4kB (UEC) 33*8kB (UEC) 47*16kB (UMEC) 33*32kB (UMEC) 87*64kB (UMC) 33*128kB (MC) 18*256kB (C) 15*512kB (C) 2*1024kB (C) 0*2048kB 0*4096kB = 26396kB
[11975.460455] Normal: 17*4kB (E) 4*8kB (UE) 41*16kB (UE) 4*32kB (ME) 10*64kB (UME) 54*128kB (M) 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB (H) = 12532kB
[11975.474336] 4647 total pagecache pages
[11975.478105] 0 pages in swap cache
[11975.481488] Swap cache stats: add 0, delete 0, find 0/0
[11975.486729] Free swap = 0kB
[11975.489629] Total swap = 0kB
[11975.492538] 2052096 pages RAM
[11975.495518] 0 pages HighMem/MovableOnly
[11975.499413] 43453 pages reserved
[11975.502668] 16384 pages cma reserved
[11975.506261] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
[11975.514859] [ 304] 0 304 5228 2139 13 3 0 0 systemd-journal
[11975.524336] [ 309] 0 309 19420 42 6 3 0 0 lvmetad
…
[11976.530228] [ 5410] 1001 5410 1966 449 7 3 0 0 bash
[11976.538778] [ 6192] 1001 6192 5210212 560544 1587 10 0 0 python
[11976.547532] Out of memory: Kill process 6192 (python) score 279 or sacrifice child
[11976.555183] Killed process 6192 (python) total-vm:20840848kB, anon-rss:1945836kB, file-rss:297816kB
The interesting point is that the SIGKILL is delivered only when resize_image() is called with min_side and max_side values different from the defaults, (800, 1333):
image, scale = resize_image(image) <-- This is OK, even though memory is nearly exhausted
image, scale = resize_image(image, 400, 600) <-- This makes the OOM killer deliver SIGKILL
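For context, keras-retinanet's resize_image() scales the image so its shorter side reaches min_side, capped so the longer side never exceeds max_side. Below is my own minimal reimplementation of that scale computation for illustration (assuming a hypothetical 1080x1920 input; not the library's actual code):

```python
# Sketch of keras-retinanet's resize scale logic (my reimplementation).
def compute_resize_scale(shape, min_side=800, max_side=1333):
    rows, cols = shape[:2]
    # Scale so the shorter side reaches min_side...
    smallest_side = min(rows, cols)
    scale = float(min_side) / smallest_side
    # ...but never let the longer side exceed max_side.
    largest_side = max(rows, cols)
    if largest_side * scale > max_side:
        scale = float(max_side) / largest_side
    return scale

print(compute_resize_scale((1080, 1920)))            # ~0.694 with the defaults
print(compute_resize_scale((1080, 1920), 400, 600))  # ~0.313
```

Note that (400, 600) produces a smaller tensor than the defaults, so by the arithmetic above the OOM at that call is counterintuitive: the extra memory use is not coming from the resized image itself.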
What am I supposed to do to overcome this problem?
Is this behavior normal for a Jetson TX2 board running keras-retinanet?
Am I just supposed to use the default min_side and max_side values for resize_image()?
Thank you very much!