UART Irq/17-3100000.: page allocation failure

45317607465000 --- irq/17-3100000.: page allocation failure: order:0, mode:0x800(GFP_NOWAIT), nodemask=(null),cpuset=/,mems_allowed=0
45317607910000 --- CPU: 0 PID: 70936 Comm: irq/17-3100000. Tainted: G O 5.10.120-rt70-l4t-r35.4.ga+g76678311c10b #1
45317608042000 --- Hardware name: Unknown Zipline P2 Orin ZIP Compute Rev F/Zipline P2 Orin ZIP Compute Rev F, BIOS v35.4.1 08/04/2023
45317608128000 --- Call trace:
45317608179000 --- dump_backtrace+0x0/0x1d4
45317608254000 --- show_stack+0x30/0x3c
45317608328000 --- dump_stack+0xc4/0x120
45317608376000 --- warn_alloc+0xec/0x184
45317608441000 --- __alloc_pages_nodemask+0x5bc/0xb10
45317608488000 --- alloc_slab_page+0x34/0x74
45317608578000 --- allocate_slab+0xdc/0x2f0
45317608619000 --- ___slab_alloc.constprop.0+0xb4/0x314
45317608660000 --- __slab_alloc.constprop.0+0x70/0xb8
45317608699000 --- __kmalloc+0x140/0x26c
45317608860000 --- kzalloc.constprop.0+0x10/0x18
45317608920000 --- tegra_dma_prep_slave_sg+0x16c/0x2d0
45317609381000 --- dmaengine_prep_slave_single.constprop.0+0x74/0xa8
45317609515000 --- tegra_uart_start_rx_dma.isra.0+0x38/0xa4
45317609729000 --- tegra_uart_isr+0x12c/0x334
45317609867000 --- irq_forced_thread_fn+0x44/0xa4
45317609967000 --- irq_thread+0x10c/0x1b8
45317610080000 --- kthread+0x12c/0x13c
45317610130000 --- ret_from_fork+0x10/0x18
45317610232000 --- Mem-Info:
45317610336000 --- active_anon:138382 inactive_anon:787999 isolated_anon:0 active_file:124857 inactive_file:607770 isolated_file:0 unevictable:31368 dirty:2718 writeback:0 slab_reclaimable:48907 slab_unreclaimable:26405 mapped:118837 shmem:531890 pagetables:3470 bounce:0 free:10416 free_pcp:2036 free_cma:0
45317610425000 --- Node 0 active_anon:553528kB inactive_anon:3151996kB active_file:499428kB inactive_file:2431080kB unevictable:125472kB isolated(anon):0kB isolated(file):0kB mapped:475348kB dirty:10872kB writeback:0kB shmem:2127560kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:9968kB all_unreclaimable? no
45317610500000 --- DMA free:26408kB min:2684kB low:4516kB high:6348kB reserved_highatomic:2048KB active_anon:2380kB inactive_anon:119832kB active_file:37400kB inactive_file:1557956kB unevictable:0kB writepending:8032kB present:2097152kB managed:1834880kB mlocked:0kB pagetables:360kB bounce:0kB free_pcp:3544kB local_pcp:716kB free_cma:0kB
45317610554000 --- lowmem_reserve[]: 0 0 5504 5504
45317610633000 --- Normal free:15256kB min:14388kB low:20024kB high:25660kB reserved_highatomic:2048KB active_anon:551148kB inactive_anon:3032164kB active_file:462460kB inactive_file:872292kB unevictable:125472kB writepending:2584kB present:6039680kB managed:5670576kB mlocked:125472kB pagetables:13520kB bounce:0kB free_pcp:4600kB local_pcp:1016kB free_cma:0kB
45317610685000 --- lowmem_reserve[]: 0 0 0 0
45317610770000 --- DMA: 123*4kB (UMEH) 38*8kB (UMEH) 166*16kB (UEH) 259*32kB (UMEH) 149*64kB (EH) 14*128kB (UEH) 3*256kB (UEH) 3*512kB (MH) 1*1024kB (H) 0*2048kB 0*4096kB = 26396kB
45317610848000 --- Normal: 254*4kB (UMEH) 186*8kB (UH) 102*16kB (UEH) 155*32kB (UEH) 75*64kB (UH) 8*128kB (UMH) 1*256kB (H) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 15176kB
45317610896000 --- Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
45317610971000 --- Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=32768kB
45317611072000 --- Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
45317611120000 --- Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=64kB
45317611195000 --- 1265816 total pagecache pages
45317611242000 --- 0 pages in swap cache
45317611311000 --- Swap cache stats: add 0, delete 0, find 0/0
45317611354000 --- Free swap = 0kB
45317611408000 --- Total swap = 0kB
45317611449000 --- 2034208 pages RAM
45317611656000 --- 0 pages HighMem/MovableOnly
45317611720000 --- 157844 pages reserved
45317611903000 --- 65536 pages cma reserved
45317620065000 --- 0 pages hwpoisoned
45317620327000 --- serial-tegra 3100000.serial: Not able to get desc for Rx
45317627107000 --- tegra-gpcdma 2600000.gpcdma: slave id already in use
45317628204000 --- serial-tegra 3100000.serial: Not able to get desc for Rx

We are seeing this crash on the UART DMA

We have ~3 GB of “available” memory, but our “free” memory gets pretty low (close to ~50 MB), which seems to cause this issue.
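To put numbers on the free versus available gap, here is a minimal sketch that converts a few counters from the Mem-Info line of the log above into MiB. It assumes 4 kB pages (consistent with the `N*4kB` order-0 entries in the log) and treats active plus inactive file pages as a rough stand-in for reclaimable cache. The kernel's own MemAvailable estimate includes other terms, so this is only illustrative.

```python
import re

PAGE_KB = 4  # assumption: 4 kB pages, as suggested by the "N*4kB" buddy entries

# Counters copied from the "Mem-Info:" line of the log above (values in pages).
MEM_INFO = (
    "active_anon:138382 inactive_anon:787999 isolated_anon:0 "
    "active_file:124857 inactive_file:607770 isolated_file:0 "
    "unevictable:31368 dirty:2718 writeback:0 slab_reclaimable:48907 "
    "slab_unreclaimable:26405 mapped:118837 shmem:531890 pagetables:3470 "
    "bounce:0 free:10416 free_pcp:2036 free_cma:0"
)


def parse_counters(text):
    """Return {name: pages} for every 'name:value' pair in the line."""
    return {name: int(value) for name, value in re.findall(r"(\w+):(\d+)", text)}


def pages_to_mib(pages):
    """Convert a page count to MiB using the assumed page size."""
    return pages * PAGE_KB / 1024


counters = parse_counters(MEM_INFO)
free_mib = pages_to_mib(counters["free"])
file_cache_mib = pages_to_mib(counters["active_file"] + counters["inactive_file"])

print(f"free:       {free_mib:8.1f} MiB")
print(f"file cache: {file_cache_mib:8.1f} MiB")
```

With the values from this log, that comes to roughly 41 MiB free against roughly 2.8 GiB of file-backed page cache, which lines up with the ~50 MB and ~3 GB figures mentioned here. Most of the "available" memory is cache that has to be reclaimed before it can be handed out.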

This is almost the exact issue here: Serial-Tegra DMA Driver Bug

But we already have the recommended fix (we are on r35.4.ga).

How can we mitigate this issue?

Ideally we would avoid the kzalloc( … GFP_NOWAIT) call, as our system relies heavily on the page cache.

We are going to attempt to mitigate the issue by keeping free memory around with

sysctl -w vm.min_free_kbytes=102400
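For scale, a small sketch of how much memory this setting asks the kernel to keep free, relative to the RAM reported in the log. The page count comes from the "pages RAM" line above; the 4 kB page size is an assumption. The helper that reads the current value uses the standard `/proc/sys/vm/min_free_kbytes` path and returns `None` where that file does not exist.

```python
from pathlib import Path

PAGE_KB = 4                 # assumption: 4 kB pages
TOTAL_RAM_PAGES = 2034208   # "2034208 pages RAM" from the log above
MIN_FREE_KBYTES = 102400    # value passed to sysctl above


def current_min_free_kbytes(path="/proc/sys/vm/min_free_kbytes"):
    """Return the running kernel's setting in kB, or None if unavailable."""
    p = Path(path)
    return int(p.read_text()) if p.exists() else None


total_ram_kb = TOTAL_RAM_PAGES * PAGE_KB
reserve_mib = MIN_FREE_KBYTES / 1024
reserve_fraction = MIN_FREE_KBYTES / total_ram_kb

print(f"requested reserve: {reserve_mib:.0f} MiB "
      f"({reserve_fraction:.2%} of {total_ram_kb} kB RAM)")
print(f"current setting:   {current_min_free_kbytes()} kB")
```

That is 100 MiB, a little over 1% of RAM on this board. Note that `sysctl -w` only changes the running kernel; to survive a reboot the setting also needs to go into `/etc/sysctl.conf` or a file under `/etc/sysctl.d/`.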

Please let us know if there is a better fix here, thanks!

Hi akhil.veeraghanta,

Are you using the devkit or custom board for Orin NX?
Please share the result of cat /etc/nv_boot_control.conf on your board.

Have you also verified with the latest R35.6.0?
Or have you tried using UART with PIO mode instead of DMA?

We are using a custom board

$  cat /etc/nv_boot_control.conf
TNSPEC 3767-300-0001-N.2-1-0-p2-compute-orin-zip-revf-nvme0n1p1
COMPATIBLE_SPEC 3767--0001--1--p2-compute-orin-zip-revf-
TEGRA_CHIPID 0x23
TEGRA_OTA_BOOT_DEVICE /dev/mtdblock0
TEGRA_OTA_GPT_DEVICE /dev/mtdblock0

Or have you tried using UART with PIO mode instead of DMA?

We can’t afford to use the UART in PIO mode; we have a lot of traffic coming through.

Have you also verified with the latest R35.6.0?

Unfortunately we can’t move to the 35.6.0 release yet, but we do see that it contains some DMA improvements that we will pull in.

These seem to reduce the number of DMA RX interrupts, which is great, but they don’t touch the DMA driver’s kzalloc(… GFP_NOWAIT).

From the documentation: Memory Allocation Guide — The Linux Kernel documentation

  • If the allocation is performed from an atomic context, e.g interrupt handler, use GFP_NOWAIT. This flag prevents direct reclaim and IO or filesystem operations. Consequently, under memory pressure GFP_NOWAIT allocation is likely to fail. Users of this flag need to provide a suitable fallback to cope with such failures where appropriate.
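To relate that to the log above, here is a rough sketch that reads the per-zone watermark fields and reports how far each zone's free memory sits above its min watermark. Only the watermark fields are copied from the two zone lines. The kernel's real check also factors in lowmem_reserve and reserved pages, so treat the result as an approximate reading rather than the exact test the allocator applied.

```python
import re

# Watermark fields copied from the "DMA" and "Normal" zone lines in the log.
ZONES = {
    "DMA": "free:26408kB min:2684kB low:4516kB high:6348kB",
    "Normal": "free:15256kB min:14388kB low:20024kB high:25660kB",
}


def parse_kb(text):
    """Return {field: kB} for every 'field:NNNkB' pair in a zone line."""
    return {name: int(value) for name, value in re.findall(r"(\w+):(\d+)kB", text)}


def headroom_above_min(zone_line):
    """Free memory minus the min watermark, in kB (simplified view)."""
    fields = parse_kb(zone_line)
    return fields["free"] - fields["min"]


for name, line in ZONES.items():
    fields = parse_kb(line)
    below_low = fields["free"] < fields["low"]
    print(f"{name:6s} headroom above min: {headroom_above_min(line):6d} kB, "
          f"below low watermark: {below_low}")
```

In this snapshot the Normal zone is only 868 kB above min and already below its low watermark, so an allocation that cannot enter direct reclaim has very little to draw on even though plenty of page cache exists.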

Maybe we can fall back to GFP_ATOMIC if GFP_NOWAIT fails? Or add some other graceful recovery mechanism? Right now the behaviour is that the driver crashes.

The old driver for kernel 4.9 does not use GFP_NOWAIT and relies on GFP_ATOMIC and preallocation, which seems more enticing.

Would forward-porting it be a drop-in replacement, or do we need to stay on the new driver?

Is there any issue if you use PIO mode?

Could you get a devkit and try to verify with the latest R35.6.0?
If it works with R35.6.0, we can help to find out whether there are patches that help with this issue.

Otherwise, please provide detailed reproduction steps for us to verify.

We have a lot of traffic coming through, so DMA is definitely preferred. Because PIO mode bypasses DMA, it definitely won’t hit this issue.

We’ve had good success after increasing the reserve with sysctl -w vm.min_free_kbytes=102400, and we will call that a win. Thanks for your suggestions!
