Orin R35.1 kernel crash when camera image capture fails

Hi everyone,
I am using an Orin module with BSP version 35.1. Sometimes, when capturing a camera image fails, the kernel crashes; the oops log is attached below. How can I fix this issue? Thanks.
kerncrash.log (32.8 KB)

hello kenny_234,

it’s because the channel context at 0 is busy.
may I know what capture pipeline you’re using? is there a stability issue with the camera stream?

I use VI Stream 0 + CSI Port 0 as video0, VI Stream 2 + CSI Port 2 as video1, VI Stream 4 + CSI Port 4 as video2, VI Stream 5 + CSI Port 6 as video3. I use 2 separate processes to capture images from video2 and video3.
Do you mean the channel context at 0 is busy when I call the VIDIOC_STREAMON ioctl?

hello kenny_234,

did you mean these video2 and video3 streams were captured by different processes?
can you please also confirm that a single process can capture frames successfully.

Hi Jerry,
Yes, I capture video2 and video3 with 2 different processes. The issue also occurs when a single process captures video2 and video3.

hello kenny_234,

you may narrow down the issue by using V4L2 IOCTLs to verify basic functionality,
could you please refer to the developer guide, Applications Using V4L2 IOCTL Directly, for sample command lines.
thanks

Hi Jerry,

My test case executes the following in a loop: open video device → VIDIOC_STREAMON → capture 5 images → VIDIOC_STREAMOFF → close video device.
I think the issue occurs when two video devices execute the VIDIOC_STREAMON or VIDIOC_STREAMOFF command.
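
The loop described above can be approximated without a custom program. This is only a sketch: it assumes v4l2-ctl from v4l-utils is installed, relies on v4l2-ctl doing its own open, stream start, stream stop and close around each run, and treats a non-zero exit status as a failure. The function name capture_loop and the V4L2_CTL variable are hypothetical, and a capture that hangs instead of exiting would need an additional timeout.

```shell
#!/bin/bash
# Sketch: repeat short capture cycles on several video nodes in parallel.
# Assumption: v4l2-ctl (v4l-utils) is installed; set V4L2_CTL to use another path.
V4L2_CTL=${V4L2_CTL:-v4l2-ctl}

# Run `iterations` capture cycles of `frames` frames each on one device.
# Prints the first failing iteration and returns 1, or a completion message.
capture_loop() {
    local dev=$1 iterations=$2 frames=${3:-5} i
    for ((i = 1; i <= iterations; i++)); do
        if ! "$V4L2_CTL" -d "$dev" --stream-mmap --stream-count="$frames" >/dev/null 2>&1; then
            echo "$dev failed at iteration $i"
            return 1
        fi
    done
    echo "$dev completed $iterations iterations"
}

# With device arguments, start one loop per device in the background so the
# stream start/stop phases of different devices can overlap, then wait.
if [ "$#" -gt 0 ]; then
    for dev in "$@"; do
        capture_loop "$dev" 1000 &
    done
    wait
fi
```

Saved under a name of your choice, it could be started with /dev/video2 /dev/video3 as arguments to exercise the two nodes mentioned in this thread.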

hello kenny_234,

could you please try including this kernel patch for the VI-5 driver for confirmation,
0001-vi5-fix-v4l2-VI-driver-channel-open.patch (4.8 KB)

Hi Jerry,
Thank you for your reply. This patch calls filp = filp_open(chanFilePath, O_RDONLY, 0); when is the chanFile closed? Thanks.

Hi Jerry,
I applied this patch, but the issue still occurs.

Hi Jerry,
When the crash happens, the oops log is as below:

[38460.892762] ------------[ cut here ]------------
[38460.892965] WARNING: CPU: 0 PID: 1605 at /home/ubuntu/JetPack5.0.2/Linux_for_Tegra/Orin_kernel_35.1/kernel/nvidia/drivers/platform/tegra/rtcpu/capture-ivc.c:178 tegra_capture_ivc_notify_chan_id+0xd4/0x1e0
[38460.893881] ---[ end trace c1e49d9f8543d9d0 ]---
[38460.894039] tegra194-vi5 13e40000.host1x:vi1@14c00000: failed to update control callback
[38460.894314] tegra-camrtc-capture-vi tegra-capture-vi: vi capture setup failed
[38461.917043] tegra194-vi5 13e40000.host1x:vi1@14c00000: capture control message timed out
[38462.876790] tegra194-vi5 13e40000.host1x:vi0@15c00000: capture control message timed out
[38462.877248] tegra-camrtc-capture-vi tegra-capture-vi: vi capture setup failed
[38462.942233] tegra194-vi5 13e40000.host1x:vi1@14c00000: capture control message timed out
[38462.942528] tegra194-vi5 13e40000.host1x:vi1@14c00000: csi_stream_release: failed to disable nvcsi tpg on stream 2 virtual channel 0
[38463.197182] tegra194-vi5 13e40000.host1x:vi0@15c00000: capture control message timed out
[38463.197534] tegra-camrtc-capture-vi tegra-capture-vi: vi capture setup failed
[38463.677062] tegra194-vi5 13e40000.host1x:vi1@14c00000: capture control message timed out
[38463.677466] tegra-camrtc-capture-vi tegra-capture-vi: vi capture setup failed
[38463.900782] tegra194-vi5 13e40000.host1x:vi0@15c00000: capture control message timed out
[38463.901093] tegra194-vi5 13e40000.host1x:vi0@15c00000: csi_stream_release: failed to disable nvcsi tpg on stream 0 virtual channel 0
[38463.965009] tegra194-vi5 13e40000.host1x:vi1@14c00000: capture control message timed out
[38463.965270] tegra194-vi5 13e40000.host1x:vi1@14c00000: vi_capture_release: release channel IVC failed
[38463.965621] WARNING: CPU: 1 PID: 1605 at /home/ubuntu/JetPack5.0.2/Linux_for_Tegra/Orin_kernel_35.1/kernel/nvidia/drivers/media/platform/tegra/camera/fusa-capture/capture-vi.c:961 vi_capture_release+0x2a8/0x2d0
[38463.966679] ---[ end trace c1e49d9f8543d9d1 ]---
[38464.028208] ------------[ cut here ]------------
[38464.028399] WARNING: CPU: 1 PID: 1605 at /home/ubuntu/JetPack5.0.2/Linux_for_Tegra/Orin_kernel_35.1/kernel/nvidia/drivers/platform/tegra/rtcpu/capture-ivc.c:311 tegra_capture_ivc_unregister_capture_cb+0xd0/0xf0
[38464.029354] ---[ end trace c1e49d9f8543d9d2 ]---
[38464.029514] ------------[ cut here ]------------
[38464.029692] WARNING: CPU: 1 PID: 1605 at /home/ubuntu/JetPack5.0.2/Linux_for_Tegra/Orin_kernel_35.1/kernel/nvidia/drivers/platform/tegra/rtcpu/capture-ivc.c:264 tegra_capture_ivc_unregister_control_cb+0x11c/0x140
[38464.030713] ---[ end trace c1e49d9f8543d9d3 ]---
[38464.030875] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[38464.031144] Mem abort info:
[38464.031231]   ESR = 0x96000006
[38464.031323]   EC = 0x25: DABT (current EL), IL = 32 bits
[38464.032155]   SET = 0, FnV = 0
[38464.032690]   EA = 0, S1PTW = 0
[38464.033274] Data abort info:
[38464.035531]   ISV = 0, ISS = 0x00000006
[38464.039483]   CM = 0, WnR = 0
[38464.042547] user pgtable: 4k pages, 48-bit VAs, pgdp=00000001329dc000
[38464.049020] [0000000000000000] pgd=00000001326bc003, p4d=00000001326bc003, pud=000000012ab71003, pmd=0000000000000000
[38464.059791] Internal error: Oops: 96000006 [#1] PREEMPT SMP
[38464.065548] Modules linked in: nvidia_modeset(OE) fuse(E) lzo_rle(E) lzo_compress(E) zram(E) ramoops(E) reed_solomon(E) snd_soc_tegra186_asrc(E) snd_soc_tegra186_arad(E) snd_soc_tegra210_iqc(E) snd_soc_tegra186_dspk(E) snd_soc_tegra210_ope(E) snd_soc_tegra210_dmic(E) snd_soc_tegra210_mvc(E) snd_soc_tegra210_afc(E) snd_soc_tegra210_adx(E) snd_soc_tegra210_amx(E) snd_soc_tegra210_admaif(E) snd_soc_tegra210_i2s(E) snd_soc_tegra_pcm(E) snd_soc_tegra210_sfc(E) snd_soc_tegra210_mixer(E) aes_ce_blk(E) crypto_simd(E) cryptd(E) aes_ce_cipher(E) ghash_ce(E) sha2_ce(E) loop(E) sha256_arm64(E) snd_soc_tegra210_adsp(E) sha1_ce(E) snd_soc_tegra_machine_driver(E) snd_soc_tegra_utils(E) nvadsp(E) ofpart(E) snd_hda_codec_hdmi(E) snd_soc_spdif_tx(E) snd_soc_simple_card_utils(E) leds_gpio(E) snd_soc_tegra210_ahub(E) cmdlinepart(E) userspace_alert(E) camera_control(OE) tegra210_adma(E) snd_hda_tegra(E) nct1008(E) tegra_bpmp_thermal(E) snd_hda_codec(E) qspi_mtd(E) snd_hda_core(E) mtd(E) i40e(E) nvidia(OE)
[38464.065632]  binfmt_misc(E) ina3221(E) pwm_fan(E) nvgpu(E) nvmap(E) ip_tables(E) x_tables(E)
[38464.160834] CPU: 1 PID: 1605 Comm: Test Tainted: G        W  OE     5.10.104-tegra #1007.14
[38464.169494] Hardware name:  /, BIOS 1.0-d7fb19b 08/10/2022
[38464.175007] pstate: 60400009 (nZCv daif +PAN -UAO -TCO BTYPE=--)
[38464.181046] pc : vi_capture_request_unpin+0x40/0xd0
[38464.186034] lr : vi_capture_request_unpin+0x30/0xd0
[38464.191018] sp : ffff80001c14ba00
[38464.194430] x29: ffff80001c14ba00 x28: ffff2fbe0158f700 
[38464.199943] x27: 0000000040045612 x26: 0000000000000000 
[38464.205457] x25: ffff80001c14bd08 x24: ffff2fc3ff8b2680 
[38464.210969] x23: ffff2fc3ff8b2570 x22: ffff2fbd53743a88 
[38464.216482] x21: ffff2fbd53743800 x20: 0000000000000000 
[38464.220759] tegra194-vi5 13e40000.host1x:vi0@15c00000: capture control message timed out
[38464.221907] x19: 0000000000000000 x18: 0000000000000000 
[38464.229886] tegra194-vi5 13e40000.host1x:vi0@15c00000: csi_stream_release: failed to disable nvcsi tpg on stream 4 virtual channel 0
[38464.235382] x17: 0000000000000000 x16: 0000000000000000 
[38464.235386] x15: ffff2fbd0fe35c70 x14: ffffffffffffffff 
[38464.235390] x13: ffffbfbff1deade8 x12: ffffbfbff1deaa18 
[38464.235394] x11: 64695f6e61686320 x10: 3a62635f65727574 
[38464.264783] video4linux video2: work in tegra_channel_close
[38464.269070] x9 : ffff80001c14b890 x8 : 6c64692073692030 
[38464.269074] x7 : 206c656e6e616863 x6 : c000000100018bdc 
[38464.269078] x5 : ffff2fc42c52f958 x4 : ffffbfbff1ae7968 
[38464.290682] x3 : 0000000000000000 x2 : ffff2fbd0fe35700 
[38464.296194] x1 : 0000000000000000 x0 : 0000000000000000 
[38464.301533] Call trace:
[38464.304157]  vi_capture_request_unpin+0x40/0xd0
[38464.308881]  vi_capture_shutdown+0x84/0xf0
[38464.312994]  vi_channel_close_ex+0x2c/0x80
[38464.317196]  vi5_channel_start_streaming+0x164/0x3b0
[38464.322008]  tegra_channel_start_streaming+0x74/0x8c
[38464.326821]  vb2_start_streaming+0x6c/0x150
[38464.330847]  vb2_core_streamon+0x98/0x1a0
[38464.334871]  vb2_streamon+0x30/0x80
[38464.338544]  vb2_ioctl_streamon+0x54/0x60
[38464.342569]  v4l_streamon+0x3c/0x50
[38464.346070]  __video_do_ioctl+0x180/0x3f0
[38464.350269]  video_usercopy+0x27c/0x790
[38464.354033]  video_ioctl2+0x3c/0x180
[38464.357533]  v4l2_ioctl+0x64/0x90
[38464.360945]  __arm64_sys_ioctl+0xa8/0xf0
[38464.364709]  el0_svc_common.constprop.0+0x7c/0x1c0
[38464.369346]  do_el0_svc+0x34/0xa0
[38464.372757]  el0_svc+0x1c/0x30
[38464.375734]  el0_sync_handler+0xa8/0xb0
[38464.379582]  el0_sync+0x16c/0x180
[38464.382910] Code: 52801901 f94156a0 9ba17e61 8b010014 (b8616800) 
[38464.389223] ---[ end trace c1e49d9f8543d9d4 ]---
[38464.405416] Kernel panic - not syncing: Oops: Fatal exception
[38464.405584] SMP: stopping secondary CPUs
[38464.405889] Kernel Offset: 0x3fbfdfe20000 from 0xffff800010000000
[38464.409418] PHYS_OFFSET: 0xffffd04400000000
[38464.413448] CPU features: 0x0040006,4a80aa38
[38464.417732] Memory Limit: none
[38464.432181] ---[ end Kernel panic - not syncing: Oops: Fatal exception ]---
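
For readers decoding the "Mem abort info" lines in the log above, here is a minimal sketch. It assumes the standard AArch64 ESR_ELx layout: EC in bits 31:26, IL in bit 25, ISS in bits 24:0, and for data aborts the fault status code (DFSC) in ISS bits 5:0. decode_esr is a hypothetical helper name.

```shell
#!/bin/bash
# Split an AArch64 ESR_ELx value into the fields the kernel prints in an oops.
decode_esr() {
    local esr=$(( $1 ))
    printf 'EC=0x%02x IL=%d ISS=0x%08x DFSC=0x%02x\n' \
        $(( (esr >> 26) & 0x3f )) \
        $(( (esr >> 25) & 0x1 )) \
        $(( esr & 0x1ffffff )) \
        $(( esr & 0x3f ))
}

# ESR value taken from the oops log in this thread.
decode_esr 0x96000006
# prints: EC=0x25 IL=1 ISS=0x00000006 DFSC=0x06
```

EC 0x25 is a data abort taken at the current exception level, and DFSC 0x06 is a level 2 translation fault, which is consistent with the pmd=0000000000000000 entry in the page-table walk and with a read through a NULL pointer.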

hello kenny_234,

please refer to Topic 187824 and check that you have this fix included for the NULL pointer dereference.

besides,
please check whether you have different processes accessing the same stream.
do you see the system stuck after the kernel panic?
if yes, it looks like a known bug (we’re tracking it internally, no fix yet). you cannot have different apps accessing the same camera stream.
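
One way to check whether more than one process has the same video node open is to scan /proc. This is a sketch that assumes a Linux system and enough privileges to read other processes' fd directories (typically root). holders is a hypothetical helper name.

```shell
#!/bin/bash
# List the PIDs that currently hold a given file or device node open,
# found by scanning /proc/<pid>/fd (Linux only).
holders() {
    local target fd pid
    target=$(readlink -f "$1") || return 1
    for fd in /proc/[0-9]*/fd/*; do
        # Each fd entry is a symlink to the file it refers to.
        if [ "$(readlink "$fd" 2>/dev/null)" = "$target" ]; then
            pid=${fd#/proc/}
            echo "${pid%%/*}"
        fi
    done | sort -un
}

# With a path argument, print the PIDs holding it open.
if [ "$#" -gt 0 ]; then
    holders "$1"
fi
```

Running it against /dev/video2 while the capture applications are active and seeing more than one PID would suggest that two processes share that node.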

Hi Jerry,
I have checked that the vi_capture_shutdown() function includes the NULL pointer fix, and our different processes access different streams. I saw the CPU reboot after the kernel panic.

hello kenny_234,

how about opening terminals and running V4L2 IOCTL commands to verify basic functionality?
for example,
$ v4l2-ctl -d /dev/video0 --set-fmt-video=width=1920,height=1080,pixelformat=RG10 --set-ctrl bypass_mode=0 --stream-mmap --stream-count=100

Hi Jerry,
This issue is the same as the one in this link.

hello kenny_234,

as you can see, we have fixed some known issues in the stream-off process for multiple cameras, and this change is also included in the r35.1 code-line.
could you please test with the standard v4l utility to see whether you can still reproduce the stability issue?
please also share your test procedure and test results for reference, thanks

Hi Jerry,
I tested with the standard v4l utility. I wrote 4 shell scripts and ran them in 4 terminals. The script is shown below; in each script file, /dev/videox is one of /dev/video0 to /dev/video3.
In my test the kernel did not crash within 2 hours. When one or more cameras fail to capture, the failure continues until all four cameras have stopped capturing.

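#!/bin/bash
# Capture 30 frames per iteration in an endless loop, printing the iteration count.
# Replace /dev/videox with one of /dev/video0 ... /dev/video3, one per script.
count=1
while true; do
    echo "$count"
    count=$((count + 1))
    ./v4l2-ctl -d /dev/videox --set-fmt-video=width=8192,height=6144,pixelformat=RG10 --set-ctrl bypass_mode=0 --stream-mmap --stream-count=30
done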

hello kenny_234,

is your camera use case launching four cameras with an image resolution of 8192x6144? also, what is their frame rate?
I’m afraid this is not supported, since we test the multi-cam use case with dual 4K at 60 fps.

Hi Jerry,
Yes, our use case is capturing from four cameras with an image resolution of 8192x6144; the frame rate is 20 fps.

hello kenny_234,

this is beyond the features the software claims to support. could you please try reducing the number of cameras?
for example, is this failure seen with a single camera, or with dual cameras?
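
To put the two workloads side by side, the aggregate pixel rate can be estimated as cameras × width × height × fps. This is a rough sketch only: it takes "4K" to mean 3840x2160, which the thread does not state, and it ignores bit depth, blanking and memory layout. pixel_rate is a hypothetical helper name.

```shell
#!/bin/bash
# Aggregate pixel rate in pixels per second: cameras * width * height * fps.
pixel_rate() {
    local cams=$1 width=$2 height=$3 fps=$4
    echo $(( cams * width * height * fps ))
}

requested=$(pixel_rate 4 8192 6144 20)  # use case described in this thread
tested=$(pixel_rate 2 3840 2160 60)     # "dual 4K at 60 fps"; 3840x2160 is an assumption

echo "requested: $requested px/s"
echo "tested:    $tested px/s"
echo "ratio x100: $(( requested * 100 / tested ))"
```

Under these assumptions the requested load is about four times the pixel rate of the tested configuration, so stepping down to one or two cameras is a reasonable way to see whether the failure is load related.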