Jetson crashes

JetPack 4.6.2 L4T 32.7.2

Jan 23 17:14:41  kernel: [558679.096812] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x22b800000, fsynr=0x80011, cb=8, sid=56(0x38 - Unassigned SID), pgd=0, pud=0, pmd=0, pte=0
Jan 23 17:14:41  kernel: [558679.111100] mc-err: unknown mcerr fault, int_status=0x00001040, ch_int_status=0x00000000, hubc_int_status=0x00000000
Jan 23 17:14:41  kernel: [558679.121723] mc-err: unknown mcerr fault, int_status=0x00001040, ch_int_status=0x00000000, hubc_int_status=0x00000000
Jan 23 17:14:41  kernel: [558679.132354] mc-err: unknown mcerr fault, int_status=0x00001040, ch_int_status=0x00000000, hubc_int_status=0x00000000
Jan 23 17:14:41  kernel: [558679.142972] mc-err: unknown mcerr fault, int_status=0x00001040, ch_int_status=0x00000000, hubc_int_status=0x00000000
Jan 23 17:14:41  kernel: [558679.153585] mc-err: Too many MC errors; throttling prints
Jan 23 17:14:41  kernel: [558679.287410] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x12cb20000, fsynr=0x80003, cb=8, sid=56(0x38 - Unassigned SID), pgd=15b895003, pud=15b895003, pmd=128222003, pte=0
Jan 23 17:14:41  kernel: [558679.391163] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x22c7f0c00, fsynr=0x80001, cb=8, sid=56(0x38 - Unassigned SID), pgd=0, pud=0, pmd=0, pte=0
Jan 23 17:14:41  kernel: [558679.408512] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x22abe0000, fsynr=0x80001, cb=8, sid=56(0x38 - Unassigned SID), pgd=0, pud=0, pmd=0, pte=0
Jan 23 17:14:41  kernel: [558679.598031] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x12b200000, fsynr=0x80003, cb=8, sid=56(0x38 - Unassigned SID), pgd=15b895003, pud=15b895003, pmd=128e5a003, pte=0
Jan 23 17:14:41  kernel: [558679.713468] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x247fee980, fsynr=0x80001, cb=8, sid=56(0x38 - Unassigned SID), pgd=0, pud=0, pmd=0, pte=0
Jan 23 17:14:41  kernel: [558679.729072] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x22cbe0000, fsynr=0x80001, cb=8, sid=56(0x38 - Unassigned SID), pgd=0, pud=0, pmd=0, pte=0
Jan 23 17:14:41  kernel: [558679.801980] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x12b200000, fsynr=0x80003, cb=8, sid=56(0x38 - Unassigned SID), pgd=15b895003, pud=15b895003, pmd=128e5a003, pte=0
Jan 23 17:14:42  kernel: [558680.109047] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x22b800000, fsynr=0x80011, cb=8, sid=56(0x38 - Unassigned SID), pgd=0, pud=0, pmd=0, pte=0
Jan 23 17:14:42  kernel: [558680.222425] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x12cb20000, fsynr=0x80003, cb=8, sid=56(0x38 - Unassigned SID), pgd=15b895003, pud=15b895003, pmd=128222003, pte=0
Jan 23 17:14:45  kernel: [558683.625856] irq 66: nobody cared (try booting with the "irqpoll" option)
Jan 23 17:14:45  kernel: [558683.632640] CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 4.9.253 #1
Jan 23 17:14:45  kernel: [558683.632643] Hardware name: LONTRA-0000 (DT)
Jan 23 17:14:45  kernel: [558683.632646] Call trace:
Jan 23 17:14:45  kernel: [558683.632656] [<ffffff800808b768>] dump_backtrace+0x0/0x1a0
Jan 23 17:14:45  kernel: [558683.632661] [<ffffff800808b98c>] show_stack+0x24/0x30
Jan 23 17:14:45  kernel: [558683.632669] [<ffffff8008cccb90>] dump_stack+0x90/0xb4
Jan 23 17:14:45  kernel: [558683.632675] [<ffffff80081213bc>] __report_bad_irq+0x3c/0xf8
Jan 23 17:14:45  kernel: [558683.632679] [<ffffff80081217fc>] note_interrupt+0x2ac/0x2f8
Jan 23 17:14:45  kernel: [558683.632683] [<ffffff800811e668>] handle_irq_event_percpu+0x50/0x60
Jan 23 17:14:45  kernel: [558683.632686] [<ffffff800811e6c8>] handle_irq_event+0x50/0x80
Jan 23 17:14:45  kernel: [558683.632690] [<ffffff80081227e4>] handle_fasteoi_irq+0xec/0x1d8
Jan 23 17:14:45  kernel: [558683.632694] [<ffffff800811d8e8>] generic_handle_irq+0x38/0x50
Jan 23 17:14:45  kernel: [558683.632697] [<ffffff800811d98c>] __handle_domain_irq+0x8c/0xf8
Jan 23 17:14:45  kernel: [558683.632701] [<ffffff8008081124>] gic_handle_irq+0x5c/0xb0
Jan 23 17:14:45  kernel: [558683.632704] [<ffffff8008082c28>] el1_irq+0xe8/0x194
Jan 23 17:14:45  kernel: [558683.632710] [<ffffff8008ba9208>] skb_try_coalesce+0x140/0x348
Jan 23 17:14:45  kernel: [558683.632715] [<ffffff8008c213a4>] tcp_try_coalesce.part.22+0x3c/0x108
Jan 23 17:14:45  kernel: [558683.632719] [<ffffff8008c215ac>] tcp_queue_rcv+0x13c/0x178
Jan 23 17:14:45  kernel: [558683.632724] [<ffffff8008c26ae0>] tcp_rcv_established+0x378/0x7c0
Jan 23 17:14:45  kernel: [558683.632728] [<ffffff8008c30d74>] tcp_v4_do_rcv+0x11c/0x298
Jan 23 17:14:45  kernel: [558683.632731] [<ffffff8008c33c78>] tcp_v4_rcv+0xa20/0xb48
Jan 23 17:14:45  kernel: [558683.632735] [<ffffff8008c0a75c>] ip_local_deliver_finish+0xe4/0x218
Jan 23 17:14:45  kernel: [558683.632738] [<ffffff8008c0ae04>] ip_local_deliver+0x54/0xf0
Jan 23 17:14:45  kernel: [558683.632741] [<ffffff8008c0a9b4>] ip_rcv_finish+0x124/0x3c0
Jan 23 17:14:45  kernel: [558683.632744] [<ffffff8008c0b128>] ip_rcv+0x288/0x408
Jan 23 17:14:45  kernel: [558683.632748] [<ffffff8008bb2840>] __netif_receive_skb_core+0x2a8/0x878
Jan 23 17:14:45  kernel: [558683.632752] [<ffffff8008bb3900>] __netif_receive_skb+0x28/0x78
Jan 23 17:14:45  kernel: [558683.632755] [<ffffff8008bb397c>] netif_receive_skb_internal+0x2c/0xb0
Jan 23 17:14:45  kernel: [558683.632759] [<ffffff8008bb3a3c>] napi_gro_complete+0x3c/0xe0
Jan 23 17:14:45  kernel: [558683.632762] [<ffffff8008bb3d7c>] dev_gro_receive+0x29c/0x400
Jan 23 17:14:45  kernel: [558683.632766] [<ffffff8008bb8ed0>] napi_gro_receive+0x40/0x188
Jan 23 17:14:45  kernel: [558683.632772] [<ffffff800893bc80>] eqos_napi_poll_rx+0x368/0x4f8
Jan 23 17:14:45  kernel: [558683.632776] [<ffffff8008bb9730>] net_rx_action+0x100/0x368
Jan 23 17:14:45  kernel: [558683.632779] [<ffffff8008081420>] __do_softirq+0x128/0x398
Jan 23 17:14:45  kernel: [558683.632784] [<ffffff80080b5cbc>] irq_exit+0xb4/0xf8
Jan 23 17:14:45  kernel: [558683.632787] [<ffffff800811d990>] __handle_domain_irq+0x90/0xf8
Jan 23 17:14:45  kernel: [558683.632790] [<ffffff8008081124>] gic_handle_irq+0x5c/0xb0
Jan 23 17:14:45  kernel: [558683.632793] [<ffffff8008082c28>] el1_irq+0xe8/0x194
Jan 23 17:14:45  kernel: [558683.632798] [<ffffff80080d6b4c>] kthread_should_stop+0x1c/0x28
Jan 23 17:14:45  kernel: [558683.632802] [<ffffff80080d661c>] kthread+0xec/0xf0
Jan 23 17:14:45  kernel: [558683.632805] [<ffffff80080838a0>] ret_from_fork+0x10/0x30
Jan 23 17:14:45  kernel: [558683.632808] handlers:
Jan 23 17:14:45  kernel: [558683.635166] [<ffffff8008ac1130>] tegra_mcerr_hard_irq threaded [<ffffff8008ac12e0>] tegra_mcerr_thread
Jan 23 17:14:45  kernel: [558683.644566] Disabling IRQ #66
Jan 23 17:14:46  kernel: [558684.061438] __arm_smmu_context_fault: 2629 callbacks suppressed
Jan 23 17:14:46  kernel: [558684.061452] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x12cb20000, fsynr=0x80003, cb=8, sid=56(0x38 - Unassigned SID), pgd=15b895003, pud=15b895003, pmd=128222003, pte=0
Jan 23 17:14:46  kernel: [558684.141812] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x22c7eca80, fsynr=0x80001, cb=8, sid=56(0x38 - Unassigned SID), pgd=0, pud=0, pmd=0, pte=0
Jan 23 17:14:46  kernel: [558684.233817] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x22abe0000, fsynr=0x80001, cb=8, sid=56(0x38 - Unassigned SID), pgd=0, pud=0, pmd=0, pte=0

I am seeing this error while running an application that receives video from the network, transcodes it, and sends it back out to the network.

The application is unstable, and the module (Jetson TX2) restarts from time to time.

Could this error be the cause of the restarts?

The error looks hardware-related.

I found questions with similar errors from other authors:

What is the cause of this error and how can I fix it?

Also

Jan 17 17:28:35  kernel: [41108.838164] irq 66: nobody cared (try booting with the "irqpoll" option)
Jan 17 17:28:35  kernel: [41108.844859] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.253 #1
Jan 17 17:28:35  kernel: [41108.844862] Hardware name: LONTRA-0000 (DT)
Jan 17 17:28:35  kernel: [41108.844865] Call trace:
Jan 17 17:28:35  kernel: [41108.844875] [<ffffff800808b768>] dump_backtrace+0x0/0x1a0
Jan 17 17:28:35  kernel: [41108.844880] [<ffffff800808b98c>] show_stack+0x24/0x30
Jan 17 17:28:35  kernel: [41108.844886] [<ffffff8008cccb90>] dump_stack+0x90/0xb4
Jan 17 17:28:35  kernel: [41108.844892] [<ffffff80081213bc>] __report_bad_irq+0x3c/0xf8
Jan 17 17:28:35  kernel: [41108.844896] [<ffffff80081217fc>] note_interrupt+0x2ac/0x2f8
Jan 17 17:28:35  kernel: [41108.844900] [<ffffff800811e668>] handle_irq_event_percpu+0x50/0x60
Jan 17 17:28:35  kernel: [41108.844903] [<ffffff800811e6c8>] handle_irq_event+0x50/0x80
Jan 17 17:28:35  kernel: [41108.844907] [<ffffff80081227e4>] handle_fasteoi_irq+0xec/0x1d8
Jan 17 17:28:35  kernel: [41108.844911] [<ffffff800811d8e8>] generic_handle_irq+0x38/0x50
Jan 17 17:28:35  kernel: [41108.844914] [<ffffff800811d98c>] __handle_domain_irq+0x8c/0xf8
Jan 17 17:28:35  kernel: [41108.844917] [<ffffff8008081124>] gic_handle_irq+0x5c/0xb0
Jan 17 17:28:35  kernel: [41108.844920] [<ffffff8008082c28>] el1_irq+0xe8/0x194
Jan 17 17:28:35  kernel: [41108.844925] [<ffffff80080b5cbc>] irq_exit+0xb4/0xf8
Jan 17 17:28:35  kernel: [41108.844928] [<ffffff800811d990>] __handle_domain_irq+0x90/0xf8
Jan 17 17:28:35  kernel: [41108.844931] [<ffffff8008081124>] gic_handle_irq+0x5c/0xb0
Jan 17 17:28:35  kernel: [41108.844933] [<ffffff8008082c28>] el1_irq+0xe8/0x194
Jan 17 17:28:35  kernel: [41108.844939] [<ffffff8008a38190>] cpuidle_enter_state+0xb8/0x380
Jan 17 17:28:35  kernel: [41108.844942] [<ffffff8008a384cc>] cpuidle_enter+0x34/0x48
Jan 17 17:28:35  kernel: [41108.844947] [<ffffff800810dd84>] call_cpuidle+0x44/0x70
Jan 17 17:28:35  kernel: [41108.844951] [<ffffff800810e104>] cpu_startup_entry+0x1a4/0x1f0
Jan 17 17:28:35  kernel: [41108.844954] [<ffffff8008ccf5b4>] rest_init+0x84/0x90
Jan 17 17:28:35  kernel: [41108.844961] [<ffffff8009100c00>] start_kernel+0x368/0x380
Jan 17 17:28:35  kernel: [41108.844965] [<ffffff8009100204>] __primary_switched+0x80/0x94
Jan 17 17:28:35  kernel: [41108.844967] handlers:
Jan 17 17:28:35  kernel: [41108.847237] [<ffffff8008ac1130>] tegra_mcerr_hard_irq threaded [<ffffff8008ac12e0>] tegra_mcerr_thread
Jan 17 17:28:35  kernel: [41108.856551] Disabling IRQ #66
Jan 17 17:28:35  kernel: [41108.859573] mc-err: unknown mcerr fault, int_status=0x00001040, ch_int_status=0x00000000, hubc_int_status=0x00000000
Jan 10 22:44:45  kernel: [61429.302365] irq 66: nobody cared (try booting with the "irqpoll" option)
Jan 10 22:44:45  kernel: [61429.309066] CPU: 0 PID: 4082 Comm: NVMDecBufProcT Not tainted 4.9.253 #1
Jan 10 22:44:45  kernel: [61429.309068] Hardware name: LONTRA-0000 (DT)
Jan 10 22:44:45  kernel: [61429.309071] Call trace:
Jan 10 22:44:45  kernel: [61429.309082] [<ffffff800808b768>] dump_backtrace+0x0/0x1a0
Jan 10 22:44:45  kernel: [61429.309087] [<ffffff800808b98c>] show_stack+0x24/0x30
Jan 10 22:44:45  kernel: [61429.309093] [<ffffff8008cccb90>] dump_stack+0x90/0xb4
Jan 10 22:44:45  kernel: [61429.309099] [<ffffff80081213bc>] __report_bad_irq+0x3c/0xf8
Jan 10 22:44:45  kernel: [61429.309103] [<ffffff80081217fc>] note_interrupt+0x2ac/0x2f8
Jan 10 22:44:45  kernel: [61429.309107] [<ffffff800811e668>] handle_irq_event_percpu+0x50/0x60
Jan 10 22:44:45  kernel: [61429.309110] [<ffffff800811e6c8>] handle_irq_event+0x50/0x80
Jan 10 22:44:45  kernel: [61429.309113] [<ffffff80081227e4>] handle_fasteoi_irq+0xec/0x1d8
Jan 10 22:44:45  kernel: [61429.309118] [<ffffff800811d8e8>] generic_handle_irq+0x38/0x50
Jan 10 22:44:45  kernel: [61429.309121] [<ffffff800811d98c>] __handle_domain_irq+0x8c/0xf8
Jan 10 22:44:45  kernel: [61429.309124] [<ffffff8008081124>] gic_handle_irq+0x5c/0xb0
Jan 10 22:44:45  kernel: [61429.309127] [<ffffff8008082c28>] el1_irq+0xe8/0x194
Jan 10 22:44:45  kernel: [61429.309132] [<ffffff80080b5cbc>] irq_exit+0xb4/0xf8
Jan 10 22:44:45  kernel: [61429.309135] [<ffffff800811d990>] __handle_domain_irq+0x90/0xf8
Jan 10 22:44:45  kernel: [61429.309138] [<ffffff8008081124>] gic_handle_irq+0x5c/0xb0
Jan 10 22:44:45  kernel: [61429.309140] [<ffffff8008082c28>] el1_irq+0xe8/0x194
Jan 10 22:44:45  kernel: [61429.309145] [<ffffff8008148038>] futex_wait_queue_me+0x48/0x138
Jan 10 22:44:45  kernel: [61429.309149] [<ffffff8008148de4>] futex_wait+0xd4/0x1c8
Jan 10 22:44:45  kernel: [61429.309152] [<ffffff800814b420>] do_futex+0x508/0xcd8
Jan 10 22:44:45  kernel: [61429.309156] [<ffffff800814bcb8>] SyS_futex+0xc8/0x168
Jan 10 22:44:45  kernel: [61429.309159] [<ffffff8008083900>] el0_svc_naked+0x34/0x38
Jan 10 22:44:45  kernel: [61429.309161] handlers:
Jan 10 22:44:45  kernel: [61429.311432] [<ffffff8008ac1130>] tegra_mcerr_hard_irq threaded [<ffffff8008ac12e0>] tegra_mcerr_thread
Jan 10 22:44:45  kernel: [61429.320744] Disabling IRQ #66
Jan 10 22:44:48  kernel: [61432.004074] __arm_smmu_context_fault: 12718 callbacks suppressed

The arm-smmu "Unhandled context fault" mostly happens when a driver uses physical addresses for DMA operations and expects IOMMU passthrough.
This typically occurs in code that was written for x86 platforms and then tried on an ARM platform.

The Linux kernel easily supports IOMMU passthrough on x86 platforms.
On ARM platforms, however, enabling IOMMU passthrough has been made difficult, to force programmers to do it the correct way, i.e., to use DMA addresses (IOMMU virtual addresses) instead of physical addresses.

Hence, on ARM platforms, wherever physical addresses are used, we need to replace them with DMA addresses created by dma_alloc_coherent(), which sets up the needed IOMMU address space.

dma_alloc_coherent()

As far as I know, this function is used in the development of Linux kernel modules (not for userspace). I don’t use it in my application.

Yes, dma_alloc_coherent() is used in the kernel.

What I am trying to say is that some driver in your system is causing this. Because of this driver issue, the application might not work properly. The driver issue needs to be fixed, and my explanation is about how to fix it in the driver.
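
To narrow down which client is behind the faults, it can help to summarize the arm-smmu lines from the log instead of reading them one by one. The following is only a minimal sketch, not an NVIDIA tool. It assumes the log line format shown earlier in this thread, with each fault on a single unwrapped line, and it assumes that fsynr follows the ARM SMMUv2 FSYNR0 layout (bit 4 = write-not-read, bits 1:0 = translation level of the fault). The names FAULT_RE, decode_fsynr and summarize are made up for this example.

```python
import re
from collections import Counter

# Matches the "Unhandled context fault" lines as they appear in this thread.
FAULT_RE = re.compile(
    r"Unhandled context fault: iova=0x(?P<iova>[0-9a-fA-F]+), "
    r"fsynr=0x(?P<fsynr>[0-9a-fA-F]+), cb=(?P<cb>\d+), sid=(?P<sid>\d+)"
)


def decode_fsynr(fsynr):
    """Decode two FSYNR0 fields (bit positions are an assumption, see above)."""
    return {
        "access": "write" if fsynr & (1 << 4) else "read",  # WNR bit
        "level": fsynr & 0x3,                                # PLVL field
    }


def summarize(lines):
    """Group faults by (sid, cb, access, level).

    Returns a Counter with the number of faults per group and a dict
    mapping each group to the set of distinct IOVAs seen for it.
    """
    counts = Counter()
    iovas = {}
    for line in lines:
        m = FAULT_RE.search(line)
        if m is None:
            continue  # not an arm-smmu fault line (mc-err, call trace, ...)
        fields = decode_fsynr(int(m.group("fsynr"), 16))
        key = (
            int(m.group("sid")),
            int(m.group("cb")),
            fields["access"],
            fields["level"],
        )
        counts[key] += 1
        iovas.setdefault(key, set()).add(int(m.group("iova"), 16))
    return counts, iovas
```

For example, calling summarize() on an open file handle of the saved kernel log and printing counts.most_common() shows whether all faults come from the same stream ID and context bank (sid=56, cb=8 in the excerpts above), and whether they are reads or writes.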

Is there any PCIe card or any other card connected to the Jetson?

Is there any PCIe card or any other card connected to the Jetson?

No.

OK,
May I know which application you are running?
Did you load any kernel module or driver using insmod or modprobe?

Did you load any kernel module or driver using insmod or modprobe?

No

Hi,
Do you use GStreamer or jetson_multimedia_api for the transcoding?

Hi. I’m using jetson_multimedia_api, not gstreamer.