Hi,
This is a follow-up to this topic.
The patch does help reduce the panic rate, but the kernel now seems to panic with other messages related to the RTCPU.
Here is one example:
[11189.814840] tegra-camrtc-capture-vi tegra-capture-vi: corr_err: discarding frame 0, flags: 0, vi-output, ar0234 10-0066: corr_er2
BUG: camera-ip/vi5/vi5.c:415 [vi5_check_falcon_failure] "VI FALCON FAILURE: 0x40000000"
[ 11211.494570] Camera-FW on t194-rce-safe started
TCU early console enabled.
[ 11211.561272] Camera-FW on t194-rce-safe ready SHA1=571b1d9f (crt 0.742 ms, total boot 67.473 ms)
[11192.331737] tegra194-vi5 15c10000.vi: capture status timed out
[11192.331934] tegra-camrtc-capture-vi tegra-capture-vi: vi-output, ar0234 10-0066: uncorr_err: request timed out after 2500 ms
[11192.587726] tegra194-vi5 15c10000.vi: capture status timed out
[11192.587972] tegra-camrtc-capture-vi tegra-capture-vi: vi-output, ar0234 9-0067: uncorr_err: request timed out after 2500 ms
[11192.591774] tegra194-vi5 15c10000.vi: failed to update control callback
[11192.592059] tegra-camrtc-capture-vi tegra-capture-vi: vi capture setup failed
[11193.615754] tegra194-vi5 15c10000.vi: capture control message timed out
[11194.639802] tegra194-vi5 15c10000.vi: capture control message timed out
[11194.640018] tegra194-vi5 15c10000.vi: csi_stream_release: failed to disable nvcsi tpg on stream 2 virtual channel 3
[11194.891751] tegra194-vi5 15c10000.vi: capture status timed out
[11194.892010] tegra-camrtc-capture-vi tegra-capture-vi: vi-output, ar0234 10-0066: uncorr_err: request timed out after 2500 ms
[11195.147748] tegra194-vi5 15c10000.vi: capture status timed out
[11195.147968] tegra-camrtc-capture-vi tegra-capture-vi: vi-output, ar0234 9-0067: uncorr_err: request timed out after 2500 ms
[11195.659815] tegra194-vi5 15c10000.vi: capture control message timed out
[11195.660089] tegra194-vi5 15c10000.vi: vi_capture_release: release channel IVC failed
[11195.660402] WARNING: CPU: 1 PID: 143673 at /home/aartavia/nvidia/jp_symb/Linux_for_Tegra/sources/kernel/nvidia/drivers/media/pla0
[11195.661780] ---[ end trace 34c9af4bd827e0b2 ]---
[ 11217.235011] Camera-FW on t194-rce-safe started
TCU early console enabled.
[ 11217.314872] Camera-FW on t194-rce-safe ready SHA1=571b1d9f (crt 0.741 ms, total boot 80.631 ms)
[11195.744776] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[11195.745005] Mem abort info:
[11195.745117] ESR = 0x96000004
[11195.745203] EC = 0x25: DABT (current EL), IL = 32 bits
[11195.745327] SET = 0, FnV = 0
[11195.745419] EA = 0, S1PTW = 0
[11195.745494] Data abort info:
[11195.745566] ISV = 0, ISS = 0x00000004
[11195.745658] CM = 0, WnR = 0
[11195.745767] user pgtable: 4k pages, 48-bit VAs, pgdp=000000025f82c000
[11195.745922] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
[11195.746098] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[11195.746247] Modules linked in: xt_nat(E) xt_tcpudp(E) veth(E) xt_conntrack(E) xt_MASQUERADE(E) nf_conntrack_netlink(E) nfnetlink)
[11195.746569] snd_soc_simple_card_utils(E) tegra210_adma(E) nvadsp(E) userspace_alert(E) snd_soc_tegra210_ahub(E) snd_hda_codec_h]
[11195.830969] CPU: 1 PID: 143673 Comm: capture_source: Tainted: G W E 5.10.192-tegra #20
[11195.840142] Hardware name: Symbot (DT)
[11195.844086] pstate: 60400009 (nZCv daif +PAN -UAO -TCO BTYPE=--)
[11195.850133] pc : vi_capture_request_unpin+0x44/0xd0
[11195.854855] lr : vi_capture_request_unpin+0x34/0xd0
[11195.860094] sp : ffff8000238c39f0
[11195.863514] x29: ffff8000238c39f0 x28: ffff17ac1f6d7200
[11195.868929] x27: 0000000040045612 x26: 0000000000000000
[11195.874444] x25: ffff8000238c3d08 x24: ffff17acea193698
[11195.879696] x23: ffff17acea193588 x22: ffff17abcac56288
[11195.885209] x21: ffff17abcac56000 x20: 0000000000000000
[11195.890978] x19: 0000000000000000 x18: 0000000000000000
[11195.896234] x17: 0000000000000000 x16: 0000000000000000
[11195.901658] x15: 000000000000004a x14: 000000000000004b
[11195.907341] x13: 0000000000000001 x12: 0000000000000500
[11195.912421] x11: ffffda9dd2ef2bc8 x10: 0000000000000ab0
[11195.917934] x9 : ffff8000238c3370 x8 : ffff17ac6d1a4610
[11195.923613] x7 : 0000000000000001 x6 : 000000519de002af
[11195.929041] x5 : ffffda9dd231f968 x4 : 0000000000000001
[11195.934466] x3 : 0000000000000000 x2 : ffff17ac6d1a3b00
[11195.939546] x1 : 0000000000000000 x0 : 0000000000000000
[11195.945141] Call trace:
[11195.947626] vi_capture_request_unpin+0x44/0xd0
[11195.952147] vi_capture_shutdown+0x8c/0x100
[11195.956346] vi_channel_close_ex+0x30/0x90
[11195.960145] vi5_channel_start_streaming+0x1a8/0x3d0
[11195.964929] tegra_channel_start_streaming+0x54/0x80
[11195.969764] vb2_start_streaming+0x74/0x160
[11195.974019] vb2_core_streamon+0x9c/0x1a0
[11195.978072] vb2_streamon+0x34/0x80
[11195.981719] vb2_ioctl_streamon+0x58/0x70
[11195.985490] v4l_streamon+0x40/0x50
[11195.988988] __video_do_ioctl+0x188/0x400
[11195.993188] video_usercopy+0x280/0x7e0
[11195.997233] video_ioctl2+0x40/0x100
[11196.000708] v4l2_ioctl+0x68/0x90
[11196.003865] __arm64_sys_ioctl+0xac/0xf0
[11196.007629] el0_svc_common.constprop.0+0x80/0x1d0
[11196.012520] do_el0_svc+0x38/0xc0
[11196.015680] el0_svc+0x1c/0x30
[11196.018908] el0_sync_handler+0xa8/0xb0
[11196.022875] el0_sync+0x16c/0x180
[11196.025922] Code: 52801901 f94156a0 9ba17e61 8b010014 (b8616800)
[11196.032476] ---[ end trace 34c9af4bd827e0b3 ]---
[11196.055043] Kernel panic - not syncing: Oops: Fatal exception
[11196.055241] SMP: stopping secondary CPUs
[11196.055434] Kernel Offset: 0x5a9dc0e60000 from 0xffff800010000000
[11196.055627] PHYS_OFFSET: 0xffffe85580000000
[11196.056711] CPU features: 0x48240002,03802a30
[11196.060733] Memory Limit: none
[11196.075954] ---[ end Kernel panic - not syncing: Oops: Fatal exception ]---
Notice the ‘Camera-FW’ prints before the crash. We had not seen those before.
We have also encountered, albeit much less frequently, the “RTCPU gone bad” error, like this one.
Overall we do see a reduced panic rate, but besides the ‘new’ errors mentioned above, we still occasionally see the errors discussed in the previous topic.
From the 0001-Camera-fix-kernel-warning-after-VI-timeout.patch we understand that the intent is to reduce the number of error cases and thus avoid restarting the vi-channel. Could you help us get some clarity on the following:
- Why are we getting NULLs/invalid data from the RTCPU in the first place? Could we resend the capture request multiple times until we get a valid response? Or is there some kind of flag that indicates the RTCPU is not ready yet and needs more time?
- If we rate-limit restarts further, could that help with the RTCPU errors?
- Is there an RTCPU firmware that behaves better with multi-cam restarts?
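To make the rate-limiting question concrete, here is a minimal user-space sketch of the kind of restart backoff we have in mind. It is only an illustration under assumptions: `RestartLimiter` and its parameters are hypothetical names, not part of any NVIDIA or V4L2 API, and the delay values are placeholders rather than measured numbers.

```python
import time


class RestartLimiter:
    """Exponential backoff between stream restarts (hypothetical helper)."""

    def __init__(self, base_delay=0.5, max_delay=8.0, clock=time.monotonic):
        self.base_delay = base_delay    # wait after the first failure, in seconds
        self.max_delay = max_delay      # upper bound on the wait, in seconds
        self.clock = clock              # injectable clock, to allow testing
        self.failures = 0               # consecutive failures seen so far
        self.next_allowed = 0.0         # clock value at which a restart is allowed

    def delay(self):
        """Seconds still to wait before the next restart may be attempted."""
        return max(0.0, self.next_allowed - self.clock())

    def record_failure(self):
        """Call after a failed capture/restart; doubles the wait up to max_delay."""
        wait = min(self.max_delay, self.base_delay * (2 ** self.failures))
        self.failures += 1
        self.next_allowed = self.clock() + wait
        return wait

    def record_success(self):
        """Call after frames flow again; resets the backoff."""
        self.failures = 0
        self.next_allowed = 0.0
```

The application would sleep for `delay()` before reopening the device and calling stream-on again, call `record_failure()` on every capture timeout, and call `record_success()` once frames arrive. Whether spacing restarts out like this actually gives the RTCPU enough time to recover is exactly what we would like to confirm.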
Currently we are on JetPack 5.1.3. We didn’t see improvements after moving to 5.1.4. We have yet to try 5.1.5 and 5.1.6. Are there multi-cam improvements in those releases? Knowing that would help us push for testing the newer JetPacks.
Regards,
Andres
Embedded SW Team Lead at RidgeRun
Contact us: support@ridgerun.com
Developers wiki: https://developer.ridgerun.com/
Website: www.ridgerun.com