Jetson Orin NX unhandled context fault when streaming from v4l2 device

Hello,

I’m using JetPack 5.1.4 (L4T 35.6.0) on a custom board based on the Orin NX, with two Intel RealSense D457 cameras connected over GMSL links. I’m encountering the following unhandled context fault whenever I start a stream using v4l2-ctl:

[  371.233362] d4xx 9-000a: ds5_dfu_device_release(): no communication with d4xx
[  371.604867] d4xx 9-003a: start streaming failed, exit on timeout
[  371.605448] d4xx 9-003a: IMU stream toggle failed! 0 status 0x0000
[  371.606655] ------------[ cut here ]------------
[  371.606662] WARNING: CPU: 4 PID: 1669 at /drivers/media/common/videobuf2/videobuf2-core.c:1568 vb2_start_streaming+0xd0/0x154
[  371.606729] Modules linked in: xt_nat xt_tcpudp veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c xt_addrtype iptable_filter ip_tables x_tables br_netfilter overlay cfg80211 spi_nor aes_ce_blk crypto_simd cryptd aes_ce_cipher ghash_ce sha2_ce sha256_arm64 sha1_ce fusb301 mttcan userspace_alert tegra_bpmp_thermal can_dev d4xx at24 spi_tegra114 r8168 nfsd sbd(O) pwm_fan nvgpu nvmap ina3221 fuse nfnetlink
[  371.606890] CPU: 4 PID: 1669 Comm: python3 Tainted: G        W  O      5.10.120-rt70-l4t-r35.4.ga+g76678311c10b #1
[  371.606899] Hardware name: NVIDIA Orin NX SoM, Mezzanine and Baseboard/Jetson, BIOS v35.6.0 09/17/2024
[  371.606904] pstate: 20400009 (nzCv daif +PAN -UAO -TCO BTYPE=--)
[  371.606909] pc : vb2_start_streaming+0xd0/0x154
[  371.606912] lr : vb2_start_streaming+0x6c/0x154
[  371.606914] sp : ffff800017e3bb40
[  371.606916] x29: ffff800017e3bb40 x28: ffffcbcafc3afb58 
[  371.606918] x27: 0000000000000000 x26: 0000000000000000 
[  371.606920] x25: ffff35448f7d2c00 x24: ffff35448299e1f8 
[  371.606923] x23: 0000000000000001 x22: ffff800017e3bcb8 
[  371.606925] x21: ffffcbcafc3afb58 x20: ffff35448299e938 
[  371.606926] x19: 00000000fffffff5 x18: ffffffffffffffff 
[  371.606929] x17: 0000000000000000 x16: ffffcbcafc207d0c 
[  371.606931] x15: 0000000000000004 x14: 0000000000000000 
[  371.606934] x13: 0000000000000000 x12: 071c71c71c71c71c 
[  371.606938] x11: 0000000000000040 x10: 0000000000000ab0 
[  371.606939] x9 : ffff800017e3b890 x8 : ffff800017e3b890 
[  371.606942] x7 : 00000000ffffffff x6 : 002ab7884435ffff 
[  371.606943] x5 : e64184fd6396ee6f x4 : ffff354480f88000 
[  371.606945] x3 : 0000000100000000 x2 : 0000000000000000 
[  371.606947] x1 : ffffcbcafd164000 x0 : ffff35448f7999f0 
[  371.606950] Call trace:
[  371.606956]  vb2_start_streaming+0xd0/0x154
[  371.606960]  vb2_core_streamon+0x94/0x19c
[  371.606964]  vb2_ioctl_streamon+0x58/0xa0
[  371.606971]  v4l_streamon+0x40/0x50
[  371.606982]  __video_do_ioctl+0x330/0x400
[  371.606985]  video_usercopy+0x174/0x5bc
[  371.606989]  video_ioctl2+0x40/0xc0
[  371.606991]  v4l2_ioctl+0x68/0x90
[  371.606994]  __arm64_sys_ioctl+0xb0/0xf0
[  371.607012]  el0_svc_common.constprop.0+0x80/0x1c0
[  371.607028]  do_el0_svc+0x38/0xa0
[  371.607031]  el0_svc+0xc/0x14
[  371.607048]  el0_sync_handler+0x100/0x10c
[  371.607050]  el0_sync+0x16c/0x180
[  371.607057] ---[ end trace 0000000000000003 ]---
[  372.628212] d4xx 9-001a: set pipe 1, data_type1: 0x1e, 			 data_type2: 0x12, vc_id: 1
[  373.149376] d4xx 9-000a: set pipe 0, data_type1: 0x1e, 			 data_type2: 0x12, vc_id: 0
[  373.298118] arm-smmu 10000000.iommu: Unhandled context fault: fsr=0x402, iova=0x60090000, fsynr=0x790011, cbfrsynra=0x402, cb=0
[  373.298228] mc-err: unknown mcerr fault, int_status=0x00101000, ch_int_status=0x00000000, hubc_int_status=0x00000000 sbs_int_status=0x00000000, hub_int_status=0x00000000
[  373.355059] arm-smmu 10000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x60090000, fsynr=0x790001, cbfrsynra=0x402, cb=0
[  373.355089] mc-err: unknown mcerr fault, int_status=0x00101040, ch_int_status=0x00000000, hubc_int_status=0x00000000 sbs_int_status=0x00000000, hub_int_status=0x00000000
[  373.355192] tegra-camrtc-capture-vi tegra-capture-vi: corr_err: discarding frame 0, flags: 0, err_data 131072, vc: 1
[  373.355482] (NULL device *): vi_capture_control_message: NULL VI channel received
[  373.355495] t194-nvcsi 13e40000.host1x:nvcsi@15a00000: csi5_stream_close: Error in closing stream_id=0, csi_port=0
[  373.355512] (NULL device *): vi_capture_control_message: NULL VI channel received
[  373.355514] t194-nvcsi 13e40000.host1x:nvcsi@15a00000: csi5_stream_open: VI channel not found for stream- 0 vc- 1
[  373.362730] arm-smmu 10000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x600d0000, fsynr=0x790001, cbfrsynra=0xc02, cb=0
[  373.362760] arm-smmu 10000000.iommu: Unhandled context fault: fsr=0x402, iova=0x60090000, fsynr=0x790011, cbfrsynra=0x402, cb=0
[  373.362765] mc-err: unknown mcerr fault, int_status=0x00101040, ch_int_status=0x00000000, hubc_int_status=0x00000000 sbs_int_status=0x00000000, hub_int_status=0x00000000
[  373.364801] arm-smmu 10000000.iommu: Unhandled context fault: fsr=0x402, iova=0x600f0000, fsynr=0x790011, cbfrsynra=0x2, cb=0
[  373.364822] mc-err: unknown mcerr fault, int_status=0x00101000, ch_int_status=0x00000000, hubc_int_status=0x00000000 sbs_int_status=0x00000000, hub_int_status=0x00000000
[  373.389770] arm-smmu 10000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x600f0000, fsynr=0x790001, cbfrsynra=0x2, cb=0
[  373.389796] mc-err: Too many MC errors; throttling prints
[  373.389856] tegra-camrtc-capture-vi tegra-capture-vi: corr_err: discarding frame 0, flags: 0, err_data 131072, vc: 0
[  373.390400] (NULL device *): vi_capture_control_message: NULL VI channel received
[  373.390404] t194-nvcsi 13e40000.host1x:nvcsi@15a00000: csi5_stream_close: Error in closing stream_id=0, csi_port=0
[  373.390409] (NULL device *): vi_capture_control_message: NULL VI channel received
[  373.390410] t194-nvcsi 13e40000.host1x:nvcsi@15a00000: csi5_stream_open: VI channel not found for stream- 0 vc- 0
[  373.396855] arm-smmu 10000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x60100000, fsynr=0x790001, cbfrsynra=0x802, cb=0
[  373.396892] arm-smmu 10000000.iommu: Unhandled context fault: fsr=0x402, iova=0x600f0000, fsynr=0x790011, cbfrsynra=0x2, cb=0
[  373.421783] arm-smmu 10000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x60090000, fsynr=0x790011, cbfrsynra=0x402, cb=0
[  373.423074] arm-smmu 10000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x600f0000, fsynr=0x790011, cbfrsynra=0x2, cb=0
[  378.320248] arm_smmu_context_fault: 680 callbacks suppressed

Based on the error message, something is accessing a memory-mapped I/O address that was not allocated to 10000000.iommu. Checking the device tree, the only nodes using this IOMMU unit are the video input nodes vi0 and vi1. However, the IOMMU groups they were assigned to appear to belong to a different IOMMU unit, 8000000.iommu.

	iommu@10000000 {
		compatible = "arm,mmu-500\0nvidia,smmu-500";
		reg = <0x00 0x10000000 0x00 0x1000000>;
		#global-interrupts = <0x01>;
		interrupts = <0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04 0x00 0xf0 0x04>;
		stream-match-mask = <0x7f80>;
		#iommu-cells = <0x01>;
		tlb-inv-throttle;
		status = "okay";
		phandle = <0x05>;
	};

	iommu@8000000 {
		compatible = "arm,mmu-500\0nvidia,tegra194-smmu";
		reg = <0x00 0x8000000 0x00 0x1000000 0x00 0x7000000 0x00 0x1000000>;
		#global-interrupts = <0x02>;
		interrupts = <0x00 0xee 0x04 0x00 0xf2 0x04 0x00 0xee 0x04 0x00 0xf2 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04 0x00 0xee 0x04>;
		stream-match-mask = <0x7f80>;
		#iommu-cells = <0x01>;
		status = "okay";
		phandle = <0x56>;
	};

		vi0@15c00000 {
			compatible = "nvidia,tegra234-vi";
			clocks = <0x02 0xa6>;
			clock-names = "vi";
			nvidia,vi-falcon-device = <0x43>;
			iommus = <0x05 0x02>;
			non-coherent;
			status = "okay";
			phandle = <0x73>;
		};

		vi1@14c00000 {
			compatible = "nvidia,tegra234-vi";
			clocks = <0x02 0xa6>;
			clock-names = "vi";
			nvidia,vi-falcon-device = <0x44>;
			iommus = <0x05 0x04>;
			non-coherent;
			status = "okay";
			phandle = <0x75>;
		};

dmesg shows the following:

[    1.939356] tegra194-vi5 13e40000.host1x:vi0@15c00000: Adding to iommu group 38
[    1.939992] tegra194-vi5 13e40000.host1x:vi1@14c00000: Adding to iommu group 39

and the reserved regions for those groups are:

cat /sys/kernel/iommu_groups/38/reserved_regions 
0x0000000008000000 0x00000000080fffff msi

cat /sys/kernel/iommu_groups/39/reserved_regions 
0x0000000008000000 0x00000000080fffff msi
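
One caveat when reading the reserved_regions output above: as far as I can tell from the mainline arm-smmu driver, the `msi` entry is a fixed software MSI window expressed as an IOVA range (base 0x8000000, length 0x100000), so it would look the same for any arm-smmu instance and may only coincidentally equal the base address of iommu@8000000. A more direct check is to ask sysfs which devices each IOMMU instance actually owns. The following is a minimal sketch, assuming the kernel exposes /sys/class/iommu/<instance>/devices; the function names are made up for illustration.

```python
import os


def smmu_device_map(class_root="/sys/class/iommu"):
    """Return {iommu_instance_name: sorted list of attached device names}."""
    mapping = {}
    if not os.path.isdir(class_root):
        return mapping
    for smmu in sorted(os.listdir(class_root)):
        dev_dir = os.path.join(class_root, smmu, "devices")
        if os.path.isdir(dev_dir):
            mapping[smmu] = sorted(os.listdir(dev_dir))
    return mapping


def group_devices(group, groups_root="/sys/kernel/iommu_groups"):
    """Return the sorted device names that belong to one IOMMU group."""
    dev_dir = os.path.join(groups_root, str(group), "devices")
    return sorted(os.listdir(dev_dir)) if os.path.isdir(dev_dir) else []


if __name__ == "__main__":
    # Which devices sit behind each IOMMU instance?
    for smmu, devices in smmu_device_map().items():
        print(smmu, devices)
    # Groups 38 and 39 are the ones reported for vi0 and vi1 in dmesg.
    for group in (38, 39):
        print(group, group_devices(group))
```

If vi0 and vi1 show up under the instance that corresponds to iommu@10000000, the group assignment matches the device tree and the reserved MSI window is not evidence of a wrong assignment.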

Here is the relevant content of /proc/iomem:

08000000-08ffffff : 8000000.iommu iommu@8000000
10000000-10ffffff : 10000000.iommu iommu@10000000

My current assessment is that, for whatever reason, the video input units are assigned the wrong memory regions, despite what the device tree indicates. Could I get help understanding 1) whether my assessment is correct, 2) why this happened, and 3) how I can fix it?
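
To help test this assessment, the fault lines themselves can be decoded. The sketch below parses one "Unhandled context fault" line and reduces the reported stream ID with the stream-match-mask from the iommu@10000000 node. The bit positions (FSR bit 1 = translation fault, FSR bit 31 = multiple faults, FSYNR0 bit 4 = write, CBFRSYNRA low 16 bits = stream ID) are taken from the mainline Linux arm-smmu driver header and are an assumption for the L4T kernel; `decode_fault` is a made-up helper name.

```python
import re

FAULT_RE = re.compile(
    r"arm-smmu (?P<smmu>\S+): Unhandled context fault: "
    r"fsr=(?P<fsr>0x[0-9a-f]+), iova=(?P<iova>0x[0-9a-f]+), "
    r"fsynr=(?P<fsynr>0x[0-9a-f]+), cbfrsynra=(?P<cbfrsynra>0x[0-9a-f]+), "
    r"cb=(?P<cb>\d+)"
)

# stream-match-mask from the iommu@10000000 node: bits ignored when matching.
STREAM_MATCH_MASK = 0x7F80


def decode_fault(line, mask=STREAM_MATCH_MASK):
    """Decode one arm-smmu context fault line; return None if it does not match."""
    m = FAULT_RE.search(line)
    if m is None:
        return None
    fsr = int(m["fsr"], 16)
    fsynr = int(m["fsynr"], 16)
    raw_sid = int(m["cbfrsynra"], 16) & 0xFFFF  # assumed: SID is the low 16 bits
    return {
        "smmu": m["smmu"],
        "iova": int(m["iova"], 16),
        "translation_fault": bool(fsr & (1 << 1)),  # assumed FSR.TF
        "multiple_faults": bool(fsr & (1 << 31)),   # assumed FSR.MULTI
        "write": bool(fsynr & (1 << 4)),            # assumed FSYNR0.WNR
        "raw_sid": raw_sid,
        "sid": raw_sid & ~mask & 0xFFFF,            # drop the ignored bits
    }


if __name__ == "__main__":
    sample = ("arm-smmu 10000000.iommu: Unhandled context fault: fsr=0x402, "
              "iova=0x60090000, fsynr=0x790011, cbfrsynra=0x402, cb=0")
    print(decode_fault(sample))
```

Under these assumptions, the cbfrsynra values seen in the log (0x402, 0xc02, 0x802 and 0x2) all reduce to stream ID 0x02, which is the ID in vi0's `iommus = <0x05 0x02>`. That would suggest the faulting master really is vi0 behind iommu@10000000, as the device tree describes, and that the open question is why those IOVAs are unmapped.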

Thank you

hello tvu4,

Please also see Topic 305007 and apply the kernel panic fix for the case where VI tries to recover the camera stream.
Besides that, you may refer to the developer guide to verify the port binding result.

Thank you Jerry for your reply.

I tried applying the two patches you mentioned but unfortunately it did not fix the unhandled context fault. I’m still seeing the same error message in dmesg.

[  285.327936] d4xx 9-000a: set pipe 0, data_type1: 0x1e, 			 data_type2: 0x12, vc_id: 0
[  285.544844] arm-smmu 10000000.iommu: Unhandled context fault: fsr=0x402, iova=0x60090000, fsynr=0x770011, cbfrsynra=0x402, cb=0
[  285.544976] mc-err: unknown mcerr fault, int_status=0x00101000, ch_int_status=0x00000000, hubc_int_status=0x00000000 sbs_int_status=0x00000000, hub_int_status=0x00000000
[  285.569569] arm-smmu 10000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x60090000, fsynr=0x770001, cbfrsynra=0x402, cb=0
[  285.569594] mc-err: unknown mcerr fault, int_status=0x00101040, ch_int_status=0x00000000, hubc_int_status=0x00000000 sbs_int_status=0x00000000, hub_int_status=0x00000000
[  285.569661] tegra-camrtc-capture-vi tegra-capture-vi: corr_err: discarding frame 0, flags: 0, err_data 131072, vc: 0
[  285.576619] arm-smmu 10000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x600d0000, fsynr=0x770001, cbfrsynra=0xc02, cb=0
[  285.576647] mc-err: unknown mcerr fault, int_status=0x00101040, ch_int_status=0x00000000, hubc_int_status=0x00000000 sbs_int_status=0x00000000, hub_int_status=0x00000000
[  285.576660] arm-smmu 10000000.iommu: Unhandled context fault: fsr=0x402, iova=0x60090000, fsynr=0x770011, cbfrsynra=0x402, cb=0
[  285.576679] mc-err: unknown mcerr fault, int_status=0x00101000, ch_int_status=0x00000000, hubc_int_status=0x00000000 sbs_int_status=0x00000000, hub_int_status=0x00000000
[  285.602878] arm-smmu 10000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x60090000, fsynr=0x770011, cbfrsynra=0x402, cb=0
[  285.602904] mc-err: Too many MC errors; throttling prints
[  285.610302] arm-smmu 10000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x600d0000, fsynr=0x770001, cbfrsynra=0xc02, cb=0
[  285.610341] arm-smmu 10000000.iommu: Unhandled context fault: fsr=0x402, iova=0x60090000, fsynr=0x770011, cbfrsynra=0x402, cb=0
[  285.636189] arm-smmu 10000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x60090000, fsynr=0x770011, cbfrsynra=0x402, cb=0

I checked the output of media-ctl like you mentioned and it looks fine to me; nothing obvious stands out.

Do you have any advice on how to troubleshoot this issue further?

Thank you

hello tvu4,

That arm-smmu fault might be due to the mapped IOVA going beyond 4GB.
Please add debug prints in the code below to confirm.

static void vi5_setup_surface(...)
{
...
        if (chan->fmtinfo->fourcc == V4L2_PIX_FMT_NV16) {
                desc_memoryinfo->surface[1].base_address = offset + chan->format.sizeimage / 2; 

Hi JerryChang,

The base_address does go above 4GB, but it doesn’t match the offending IOVA. Also, the CPU and kernel are both 64-bit, so this shouldn’t be an issue, right?

[   73.626990] tegra-camrtc-capture-vi tegra-capture-vi: desc_memoryinfo->surface[0].base_address = 7fffe00000, size = 1c2000
[   73.627026] tegra-camrtc-capture-vi tegra-capture-vi: desc_memoryinfo->surface[VI_ATOMP_SURFACE_EMBEDDED].base_address = 7fff7ff000, size = a00
[   73.627081] tegra-camrtc-capture-vi tegra-capture-vi: desc_memoryinfo->surface[0].base_address = 7fffc00000, size = 1c2000
[   73.627083] tegra-camrtc-capture-vi tegra-capture-vi: desc_memoryinfo->surface[VI_ATOMP_SURFACE_EMBEDDED].base_address = 7fff7ff000, size = a00
[   73.627089] tegra-camrtc-capture-vi tegra-capture-vi: desc_memoryinfo->surface[0].base_address = 7fffa00000, size = 1c2000
[   73.627090] tegra-camrtc-capture-vi tegra-capture-vi: desc_memoryinfo->surface[VI_ATOMP_SURFACE_EMBEDDED].base_address = 7fff7ff000, size = a00
[   73.627091] tegra-camrtc-capture-vi tegra-capture-vi: desc_memoryinfo->surface[0].base_address = 7fff800000, size = 1c2000
[   73.627092] tegra-camrtc-capture-vi tegra-capture-vi: desc_memoryinfo->surface[VI_ATOMP_SURFACE_EMBEDDED].base_address = 7fff7ff000, size = a00
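
For reference, the comparison between the logged surfaces and the faulting IOVAs can be scripted. This is a minimal sketch that assumes the debug print format shown above ("base_address = <hex>, size = <hex>"); the helper names are made up for illustration.

```python
import re

SURFACE_RE = re.compile(r"base_address = ([0-9a-f]+), size = ([0-9a-f]+)")
FOUR_GIB = 1 << 32


def parse_surfaces(log_text):
    """Return a list of (base, size) tuples found in the debug output."""
    return [(int(b, 16), int(s, 16)) for b, s in SURFACE_RE.findall(log_text)]


def covering_surface(iova, surfaces):
    """Return the first (base, size) surface that contains iova, or None."""
    for base, size in surfaces:
        if base <= iova < base + size:
            return (base, size)
    return None


if __name__ == "__main__":
    log = (
        "desc_memoryinfo->surface[0].base_address = 7fffe00000, size = 1c2000\n"
        "desc_memoryinfo->surface[VI_ATOMP_SURFACE_EMBEDDED].base_address = 7fff7ff000, size = a00\n"
    )
    surfaces = parse_surfaces(log)
    for base, size in surfaces:
        print(hex(base), hex(size), "above 4 GiB" if base >= FOUR_GIB else "below 4 GiB")
    # Faulting IOVAs reported by arm-smmu in the earlier dmesg output.
    for iova in (0x60090000, 0x600D0000, 0x600F0000, 0x60100000):
        print(hex(iova), covering_surface(iova, surfaces))
```

With the values logged here, every surface base is above 4 GiB and none of the faulting IOVAs falls inside a logged surface, which is consistent with the observation that the addresses do not match.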

hello tvu4,

BTW, is it possible to validate this on the Orin NX developer kit?

Unfortunately, it is not possible to validate this on the dev kit; there is no equivalent hardware available that is compatible with it.

hello tvu4,

All right, it’s hard to debug if you cannot verify this on the developer kit.
Is this due to the number of cameras? Please try reducing the number of cameras for testing.
Is this due to preview frame rendering? You may try fakesink to disable the preview and show the frame rate only.
