Python triggered kernel's OOM killer

Hello,

On a Jetson TX2 board, I am trying to run an object detection program written in Python.
The program is based on keras-retinanet (GitHub - fizyr/keras-retinanet: Keras implementation of RetinaNet object detection.).
The Python version used is 2.7.

As you may know, object detection with keras-retinanet is performed via calls like the ones shown below:

image = preprocess_image(image)                                              
image, scale = resize_image(image)                                           
image1 = np.expand_dims(image, axis=0)                                       
boxes, scores, labels = model.predict_on_batch(image1) 
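
For reference, here is a minimal end-to-end inference sketch in the spirit of the keras-retinanet README. The model path, image path, and the load_model call are assumptions on my side and may differ depending on the keras-retinanet version you have installed:

import numpy as np

from keras_retinanet import models
from keras_retinanet.utils.image import read_image_bgr, preprocess_image, resize_image

# Load a trained inference model (path and backbone name are placeholders).
model = models.load_model('/path/to/resnet50_coco_best.h5', backbone_name='resnet50')

# Load and prepare a single image.
image = read_image_bgr('/path/to/image.jpg')
image = preprocess_image(image)          # mean subtraction expected by the backbone
image, scale = resize_image(image)       # defaults: min_side=800, max_side=1333

# Run detection on a batch of one image and map boxes back to the original image.
boxes, scores, labels = model.predict_on_batch(np.expand_dims(image, axis=0))
boxes /= scale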

While executing "model.predict_on_batch(image1)", the kernel kills Python because Python triggered the OOM killer.
Please see the dmesg log below:

[11975.204122] python invoked oom-killer: gfp_mask=0x24201ca, order=0, oom_score_adj=0
[11975.211813] python cpuset=/ mems_allowed=0
[11975.216023] CPU: 0 PID: 6207 Comm: python Not tainted 4.4.38-tegra #1
[11975.222451] Hardware name: quill (DT)
[11975.226106] Call trace:
[11975.228554] [] dump_backtrace+0x0/0x100
[11975.233944] [] show_stack+0x14/0x1c
[11975.238989] [] dump_stack+0x98/0xc0
[11975.244035] [] dump_header.isra.9+0x60/0x1a4
[11975.249860] [] oom_kill_process+0x26c/0x44c
[11975.255594] [] out_of_memory+0x2e0/0x328
[11975.261070] [] __alloc_pages_nodemask+0x93c/0xa60
[11975.267324] [] filemap_fault+0x1a4/0x490
[11975.272800] [] ext4_filemap_fault+0x34/0x50
[11975.278536] [] __do_fault+0x3c/0xb4
[11975.283578] [] handle_mm_fault+0xb18/0x15b0
[11975.289312] [] do_page_fault+0x1c8/0x444
[11975.294787] [] do_mem_abort+0x40/0xa0
[11975.300001] [] do_el0_ia_bp_hardening+0x58/0x60
[11975.306083] [] el0_ia+0x18/0x1c
[11975.310928] Mem-Info:
[11975.313224] active_anon:634550 inactive_anon:3164 isolated_anon:0
active_file:342 inactive_file:423 isolated_file:7
unevictable:4 dirty:0 writeback:0 unstable:0
slab_reclaimable:8763 slab_unreclaimable:11339
mapped:2139 shmem:3749 pagetables:3909 bounce:0
free:9643 free_pcp:61 free_cma:4954
[11975.346389] DMA free:26432kB min:2888kB low:3608kB high:4332kB active_anon:653744kB inactive_anon:3588kB active_file:192kB inactive_file:1768kB unevictable:0kB isolated(anon):0kB isolated(file):28kB present:2078720kB managed:2050448kB mlocked:0kB dirty:0kB writeback:0kB mapped:1784kB shmem:4112kB slab_reclaimable:7276kB slab_unreclaimable:8448kB kernel_stack:2128kB pagetables:4104kB unstable:0kB bounce:0kB free_pcp:164kB local_pcp:0kB free_cma:19816kB writeback_tmp:0kB pages_scanned:12172 all_unreclaimable? yes
[11975.391647] lowmem_reserve: 0 5843 5843
[11975.395716] Normal free:12296kB min:8444kB low:10552kB high:12664kB active_anon:1884456kB inactive_anon:9068kB active_file:320kB inactive_file:2284kB unevictable:16kB isolated(anon):0kB isolated(file):0kB present:6129664kB managed:5984124kB mlocked:16kB dirty:0kB writeback:0kB mapped:5452kB shmem:10884kB slab_reclaimable:27776kB slab_unreclaimable:36908kB kernel_stack:6896kB pagetables:11532kB unstable:0kB bounce:0kB free_pcp:776kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:19332 all_unreclaimable? yes
[11975.441543] lowmem_reserve: 0 0 0
[11975.445082] DMA: 49*4kB (UEC) 33*8kB (UEC) 47*16kB (UMEC) 33*32kB (UMEC) 87*64kB (UMC) 33*128kB (MC) 18*256kB (C) 15*512kB (C) 2*1024kB (C) 0*2048kB 0*4096kB = 26396kB
[11975.460455] Normal: 17*4kB (E) 4*8kB (UE) 41*16kB (UE) 4*32kB (ME) 10*64kB (UME) 54*128kB (M) 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB (H) = 12532kB
[11975.474336] 4647 total pagecache pages
[11975.478105] 0 pages in swap cache
[11975.481488] Swap cache stats: add 0, delete 0, find 0/0
[11975.486729] Free swap = 0kB
[11975.489629] Total swap = 0kB
[11975.492538] 2052096 pages RAM
[11975.495518] 0 pages HighMem/MovableOnly
[11975.499413] 43453 pages reserved
[11975.502668] 16384 pages cma reserved
[11975.506261] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
[11975.514859] [ 304] 0 304 5228 2139 13 3 0 0 systemd-journal
[11975.524336] [ 309] 0 309 19420 42 6 3 0 0 lvmetad

[11976.530228] [ 5410] 1001 5410 1966 449 7 3 0 0 bash
[11976.538778] [ 6192] 1001 6192 5210212 560544 1587 10 0 0 python
[11976.547532] Out of memory: Kill process 6192 (python) score 279 or sacrifice child
[11976.555183] Killed process 6192 (python) total-vm:20840848kB, anon-rss:1945836kB, file-rss:297816kB

The interesting point is that the SIGKILL is delivered only when resize_image() is called with min_side and max_side values different from the defaults (800, 1333):

image, scale = resize_image(image) <-- This is OK, even though nearly all memory is used
image, scale = resize_image(image, 400, 600) <-- This makes the OOM killer deliver SIGKILL
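
For context, resize_image() scales the image so that its smallest side becomes min_side, capping the scale so that the largest side does not exceed max_side. Below is a sketch of that scale computation based on my reading of the keras-retinanet source; the exact implementation may differ between versions:

def compute_resize_scale(image_shape, min_side=800, max_side=1333):
    # image_shape is (rows, cols, channels)
    (rows, cols, _) = image_shape
    smallest_side = min(rows, cols)
    scale = float(min_side) / smallest_side    # float division, also under Python 2.7
    largest_side = max(rows, cols)
    if largest_side * scale > max_side:
        scale = float(max_side) / largest_side
    return scale

# e.g. for a 720x1280 image: with (400, 600) the scale is 600/1280 = 0.469,
# while with the defaults (800, 1333) it is 1333/1280 = 1.041,
# so different arguments feed the network a differently sized input tensor.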

What am I supposed to do to overcome this problem?
Is this behavior normal for a Jetson TX2 board running keras-retinanet?
Am I simply supposed to use the default min_side and max_side values for resize_image()?

Thank you very much!

Let me answer this myself.

I asked the same question on the keras-retinanet Slack channel.
Someone, I guess the author or a major contributor of keras-retinanet, provided helpful advice:

TF_CUDNN_USE_AUTOTUNE=0

Running the Python script with this environment variable set resolves the issue.
With this variable set, system memory does not run out, so the OOM situation never occurs.
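
In case it helps someone: you can export the variable in the shell when launching the script (e.g. TF_CUDNN_USE_AUTOTUNE=0 python my_script.py, where my_script.py is just a placeholder name), or set it at the top of the script. I believe it has to be set before TensorFlow/Keras initializes cuDNN, so put it before those imports:

import os
os.environ['TF_CUDNN_USE_AUTOTUNE'] = '0'   # disable cuDNN autotuning
import keras                                 # or: import tensorflow as tf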

I think this is actually a keras-retinanet issue, though…

Thank you…