JP 5.1 Frame drop issues

Hi,

I recently ported from L4T 32.5 to 35.1. I have a TI IWR6843 connected through an FPGA board. Everything works fine with the old BSP version, but with 35.1 I am seeing frames being discarded. I have not changed anything else; everything is the same as in the old version apart from minor device tree changes.
This issue is critical. Capture sometimes works, but most of the time it fails, with no consistency. Please help resolve this issue.

Dmesg log:
[ +0.388101] tegra-camrtc-capture-vi tegra-capture-vi: corr_err: discarding frame 1, flags: 0, err_data 512
[ +0.225822] tegra-camrtc-capture-vi tegra-capture-vi: corr_err: discarding frame 1, flags: 0, err_data 512
[ +0.225059] tegra-camrtc-capture-vi tegra-capture-vi: corr_err: discarding frame 1, flags: 0, err_data 512
[ +0.088986] tegra-camrtc-capture-vi tegra-capture-vi: uncorr_err: flags 2, err_data 0
[ +0.042756] [RCE] VM0 deactivating.VM0 activating.ERROR: camera-ip/vi5/vi5.c:745 [vi5_handle_eof] "General error queue is out of sync with frame queue. ts=1578797785024 sof_ts=1579068182560 gerror_code=2 gerror_data=800062 notify_bits=40000"
[ +2.639956] tegra-camrtc-capture-vi tegra-capture-vi: uncorr_err: request timed out after 2500 ms
[ +0.000283] tegra-camrtc-capture-vi tegra-capture-vi: err_rec: attempting to reset the capture channel
[ +0.001008] (NULL device *): vi_capture_control_message: NULL VI channel received
[ +0.000210] t194-nvcsi 13e10000.host1x:nvcsi@15a00000: csi5_stream_close: Error in closing stream_id=2, csi_port=2
[ +0.000254] (NULL device *): vi_capture_control_message: NULL VI channel received
[ +0.000281] t194-nvcsi 13e10000.host1x:nvcsi@15a00000: csi5_stream_open: VI channel not found for stream- 2 vc- 0
[ +0.000495] tegra-camrtc-capture-vi tegra-capture-vi: err_rec: successfully reset the capture channel

Trace log after boosting the clocks:
trace_log.txt (742.2 KB)

This is an unsuccessful capture: the VI driver reported warnings, requeued the buffer, and dropped the current frames.
Please also see the Camera Driver Porting section to review your driver, since the kernel version has moved to 5.10 for JetPack 5.

Hi @JerryChang ,

I have gone through the Camera Driver Porting document.
No changes are needed at the VI or NVCSI level. My radar device is connected over I2C and has a very minimal driver. Everything works well with the 4.9 kernel, and the driver needed few changes for the 5.10 kernel beyond some minimal device tree changes. Sometimes the capture is fine, but on some boots I see this frame corruption issue, and once it appears it persists for every capture I do.

Thanks,
Shreyas

One more thing to add: when I remove my kernel driver module for the first time, I see a crash in dmesg. It is observed only on the first rmmod and is not seen afterwards.
Log:
kernel_crash_rmmod.txt (3.9 KB)

May I know what your radar frames look like?
Please also see Topic 252910 for running a V4L2 pipeline to access radar frames of 16384x1.

It is just raw ADC data that we capture into a .bin file using a GStreamer command. We capture at 2048x1024 resolution.
The DTS and kernel driver are the same as on kernel 4.9. The captured images have some valid data, but some frames in between are all 0s, followed by valid data again and then more 0s.
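To see which frames in a capture came out as zeros, a small offline check on the .bin file can help. This is only a sketch: it assumes the file is a plain concatenation of 2048x1024 frames with 2 bytes per sample (consistent with a 16-bit format, but not confirmed for this setup), and the function name and the file name in the usage note are hypothetical.

```python
from typing import List

WIDTH, HEIGHT = 2048, 1024      # resolution mentioned in this thread
BYTES_PER_SAMPLE = 2            # assumption: 16-bit samples stored as 2 bytes
FRAME_BYTES = WIDTH * HEIGHT * BYTES_PER_SAMPLE


def zero_fraction_per_frame(data: bytes, frame_bytes: int = FRAME_BYTES) -> List[float]:
    """Return, for each complete frame in data, the fraction of zero bytes.

    A value of 1.0 means the frame is entirely zeros. Valid ADC data can
    also contain zero bytes, so intermediate values are only a hint.
    A trailing partial frame, if any, is ignored.
    """
    n_frames = len(data) // frame_bytes
    fractions = []
    for i in range(n_frames):
        frame = data[i * frame_bytes:(i + 1) * frame_bytes]
        fractions.append(frame.count(0) / frame_bytes)
    return fractions
```

For example, `zero_fraction_per_frame(open("capture.bin", "rb").read())` would return one value per frame, and the indices where the value is 1.0 are the frames that were captured as all zeros.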

hello shreyas.pa,

Is the frame rate output by the low-level driver as expected?

Please refer to Applications Using V4L2 IOCTL Directly and use V4L2 IOCTLs to verify basic camera functionality.
The sample pipeline dumps only 1 frame as a raw file; you may increase stream-count to keep streaming and examine the real frame rate.
For example, $ v4l2-ctl -d /dev/video0 --set-fmt-video=width=1920,height=1080,pixelformat=RG10 --set-ctrl bypass_mode=0 --stream-mmap --stream-count=100

Hi @JerryChang, my GStreamer command is:
gst-launch-1.0 v4l2src device=/dev/video1 num-buffers=6 ! 'video/x-bayer,format=bggr,width=2048,height=1024' ! filesink location=/home/plato/Ms_Plato_RBB_V8.0/bin/app/…/…/imgFiles/image_20230628_1687937828.bin &
If I modify the v4l2-ctl command accordingly, it reports “The pixelformat ‘BGGR16’ is invalid”. I also tried RG10 and RG16; both are reported as invalid formats.
I also tried v4l2-ctl -d /dev/video1 --set-fmt-video=width=2048,height=1024,pixelformat=BYR2 --set-ctrl bypass_mode=0 --stream-mmap --stream-count=100. It does not display anything; I waited for some time and then stopped it with Ctrl+C.

Can you provide a proper command to run with the v4l2 utility?

hello shreyas.pa,

May I know what the sensor format dump shows? $ v4l2-ctl -d /dev/video0 --list-formats-ext

Hi @JerryChang ,
ioctl: VIDIOC_ENUM_FMT
Type: Video Capture

[0]: 'BYR2' (16-bit Bayer BGBG/GRGR (Exp.))
	Size: Discrete 2048x1024
		Interval: Discrete 0.067s (15.000 fps)

v4l2-ctl -d /dev/video1 --list-formats-ext
ioctl: VIDIOC_ENUM_FMT
Type: Video Capture

[0]: 'BYR2' (16-bit Bayer BGBG/GRGR (Exp.))
	Size: Discrete 2048x1024
		Interval: Discrete 0.067s (15.000 fps)
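As a rough sanity check on the numbers above, the expected frame size, data rate, and time to stream a given number of frames can be worked out from the reported mode. This is a sketch only: it assumes 2 bytes per pixel in memory for the 16-bit format, and the function name is made up for illustration.

```python
def capture_budget(width, height, bytes_per_pixel, fps, frame_count):
    """Compute expected sizes and timing for a fixed capture mode.

    bytes_per_pixel=2 is an assumption for a 16-bit format; adjust it if
    the memory layout on your platform differs.
    """
    frame_bytes = width * height * bytes_per_pixel
    return {
        "frame_bytes": frame_bytes,                 # size of one frame in memory
        "bytes_per_second": frame_bytes * fps,      # sustained data rate
        "seconds_for_count": frame_count / fps,     # time to receive frame_count frames
    }


# Mode reported by --list-formats-ext above, with --stream-count=100
budget = capture_budget(2048, 1024, 2, 15, 100)
print(budget)
```

Under these assumptions one frame is 4 MiB, the stream is 60 MiB/s, and 100 frames should take a little under 7 seconds. A v4l2-ctl run that stays silent for much longer than that is probably not receiving frames at all.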

Are there any failures on the kernel side?
Please set up a terminal to gather the kernel logs, i.e. $ dmesg --follow

Hello @JerryChang ,
No failures from dmesg. I will attach my complete dmesg log here.
dmesg.log (63.7 KB)

Thanks,
Shreyas

Here are the failures.
VI reports a discarding-frame error; the capture engine is in an error state and must be reset.

 tegra-camrtc-capture-vi tegra-capture-vi: corr_err: discarding frame 1, flags: 0, err_data 512
 tegra-camrtc-capture-vi tegra-capture-vi: uncorr_err: flags 2, err_data 0
 [RCE] ERROR: camera-ip/vi5/vi5.c:745 [vi5_handle_eof] "General error queue is out of sync with frame queue. ts=96360432320 sof_ts=96360559808 gerror_code=2 gerror_data=800062 notify_bits=40000"
 tegra-camrtc-capture-vi tegra-capture-vi: uncorr_err: request timed out after 2500 ms
 tegra-camrtc-capture-vi tegra-capture-vi: err_rec: attempting to reset the capture channel
 (NULL device *): vi_capture_control_message: NULL VI channel received
 t194-nvcsi 13e10000.host1x:nvcsi@15a00000: csi5_stream_close: Error in closing stream_id=2, csi_port=2
 (NULL device *): vi_capture_control_message: NULL VI channel received
 t194-nvcsi 13e10000.host1x:nvcsi@15a00000: csi5_stream_open: VI channel not found for stream- 2 vc- 0
 tegra-camrtc-capture-vi tegra-capture-vi: err_rec: successfully reset the capture channel
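To compare good and bad boots, it can help to count how often each of these message types shows up in a saved dmesg log. The sketch below matches only the `corr_err`, `uncorr_err`, and `err_rec` prefixes seen in the lines above; the function name is hypothetical, and the pattern is an assumption based on this log, not a documented format.

```python
import re
from collections import Counter

# Matches the tegra-capture-vi message prefixes that appear in this thread.
VI_MSG = re.compile(r"tegra-capture-vi: (corr_err|uncorr_err|err_rec):")


def summarize_vi_errors(log_text):
    """Count corr_err / uncorr_err / err_rec messages in dmesg text."""
    return Counter(match.group(1) for match in VI_MSG.finditer(log_text))


sample = """\
 tegra-camrtc-capture-vi tegra-capture-vi: corr_err: discarding frame 1, flags: 0, err_data 512
 tegra-camrtc-capture-vi tegra-capture-vi: uncorr_err: flags 2, err_data 0
 tegra-camrtc-capture-vi tegra-capture-vi: uncorr_err: request timed out after 2500 ms
 tegra-camrtc-capture-vi tegra-capture-vi: err_rec: attempting to reset the capture channel
"""
counts = summarize_vi_errors(sample)
print(dict(counts))
```

Running the same function over the full text of a saved dmesg log from a good boot and from a bad boot gives a quick side-by-side view of how many frames were discarded and how many channel resets were attempted.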

BTW,
JetPack 5.0.2 / l4t-r35.1 isn’t quite stable, and we have since fixed several camera bugs.
Is it possible to move to the latest JetPack release, such as JP-5.1.1 / l4t-r35.3.1, for confirmation?

We recently moved from L4T 32.5 to 35.2.1, so it will be very difficult to move again to 35.3.1 right now. I will check. But from the dmesg log, can you point out what could have gone wrong? Is anything suspicious?

hello shreyas.pa,

There is signaling being sent to the CSI channel; however, something goes wrong and the capture channel ends up in an unrecoverable state.

BTW,
Are you using GPIOs? Please be aware that k-5.10 has different allocation ranges;
please also see the GPIO Changes section.

Yes, I have already ported the GPIOs. The GPIO mappings changed from 4.9 to 5.10.
Sometimes I am able to capture properly, but most of the time I see this issue. Once it appears, it always shows up.

hello shreyas.pa,

May I know what differs between the successful and failing captures?
Please also see Topic 253615, which reports a similar issue with an FPGA device that can capture frames only occasionally.
Is it possible to program a software reset, or to toggle the power supply, before capturing frames?

Hello @JerryChang, I tried a different setup and it seems to work fine there. I tried multiple times and only occasionally see the frame drop prints (mostly on the first boot, when I run my app for the first time after booting the Jetson).
But I would like to resolve the issue that occurs when removing my driver module (it usually appears the first time I run rmmod iwr6843isk):
Attached log:
[ +0.000435] ------------[ cut here ]------------
[ +0.000125] refcount_t: underflow; use-after-free.
[ +0.000077] WARNING: CPU: 3 PID: 7017 at lib/refcount.c:28 refcount_warn_saturate+0xec/0x140
[ +0.000201] Modules linked in: fuse(E) xt_conntrack(E) xt_MASQUERADE(E) nf_conntrack_netlink(E) nfnetlink(E) iptable_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) libcrc32c(E) xt_addrtype(E) iptable_filter(E) br_netfilter(E) lzo_rle(E) lzo_compress(E) zram(E) overlay(E) bnep(E) snd_soc_tegra186_asrc(E) snd_soc_tegra186_arad(E) snd_soc_tegra210_ope(E) snd_soc_tegra210_mvc(E) snd_soc_tegra186_dspk(E) snd_soc_tegra210_iqc(E) snd_soc_tegra210_afc(E) snd_soc_tegra210_dmic(E) snd_soc_tegra210_adx(E) snd_soc_tegra210_mixer(E) snd_soc_tegra210_amx(E) snd_soc_tegra210_admaif(E) snd_soc_tegra210_i2s(E) snd_soc_tegra210_sfc(E) snd_soc_tegra_pcm(E) aes_ce_blk(E) crypto_simd(E) cryptd(E) input_leds(E) aes_ce_cipher(E) ghash_ce(E) sha2_ce(E) sha256_arm64(E) sha1_ce(E) rtk_btusb(E) btusb(E) btrtl(E) realtek(E) btbcm(E) btintel(E) snd_soc_spdif_tx(E) snd_soc_tegra_machine_driver(E) leds_gpio(E) snd_soc_tegra210_adsp(E) snd_soc_tegra_utils(E) snd_soc_simple_card_utils(E)
[ +0.000217] snd_soc_tegra210_ahub(E) nvadsp(E) rtl8822ce(E) iwr6843isk(-) tegra_bpmp_thermal(E) nv_imx219(E) max77620_thermal(E) snd_hda_codec_hdmi(E) userspace_alert(E) snd_hda_tegra(E) tegra210_adma(E) snd_hda_codec(E) snd_hda_core(E) cfg80211(E) spi_tegra114(E) loop(E) binfmt_misc(E) ina3221(E) pwm_fan(E) nvgpu(E) nvmap(E) ip_tables(E) x_tables(E) [last unloaded: mtd]
[ +0.000100] CPU: 3 PID: 7017 Comm: rmmod Tainted: G OE 5.10.104-tegra #5
[ +0.000005] Hardware name: Unknown NVIDIA Jetson Xavier NX Developer Kit/NVIDIA Jetson Xavier NX Developer Kit, BIOS 2.1-32413640 01/24/2023
[ +0.000007] pstate: 60400009 (nZCv daif +PAN -UAO -TCO BTYPE=--)
[ +0.000009] pc : refcount_warn_saturate+0xec/0x140
[ +0.000006] lr : refcount_warn_saturate+0xec/0x140
[ +0.000005] sp : ffff80001b1f3bf0
[ +0.000013] x29: ffff80001b1f3bf0 x28: ffff10c300166580
[ +0.000020] x27: 0000000000000000 x26: 0000000000000000
[ +0.000037] x25: 0000000000000000 x24: 0000000000000000
[ +0.000017] x23: 0000000000000000 x22: ffffcbcf020edf70
[ +0.000018] x21: ffff10c304d29400 x20: 0000000000000000
[ +0.000016] x19: ffff10c307dcc618 x18: 0000000000000010
[ +0.000037] x17: 0000000000000000 x16: ffffcbcf00a563a0
[ +0.000018] x15: ffff10c300166af0 x14: ffffffffffffffff
[ +0.000037] x13: ffff80009b1f3827 x12: ffff80001b1f382f
[ +0.000012] x11: 0000000000000000 x10: 0000000000000000
[ +0.000026] x9 : ffff80001b1f3bf0 x8 : 75203b776f6c6672
[ +0.000012] x7 : 65646e75203a745f x6 : c0000000ffffefff
[ +0.000011] x5 : ffff10c47fe0b958 x4 : ffffcbcf01dc7968
[ +0.000010] x3 : 0000000000000001 x2 : ffff10c47fe0b960
[ +0.000011] x1 : 0000000000000000 x0 : 0000000000000000
[ +0.000010] Call trace:
[ +0.000008] refcount_warn_saturate+0xec/0x140
[ +0.000009] kobject_put+0xfc/0x110
[ +0.000006] kset_unregister+0x34/0x50
[ +0.000009] class_unregister+0x34/0x40
[ +0.000006] class_destroy+0x30/0x40
[ +0.000013] iwr6843isk_remove+0x94/0xd0 [iwr6843isk]
[ +0.000013] i2c_device_remove+0x5c/0xe0
[ +0.000006] device_release_driver_internal+0x11c/0x200
[ +0.000006] driver_detach+0x5c/0xf0
[ +0.000007] bus_remove_driver+0x60/0xb0
[ +0.000006] driver_unregister+0x38/0x60
[ +0.000006] i2c_del_driver+0x34/0x40
[ +0.000007] iwr6843isk_i2c_driver_exit+0x18/0xfc8 [iwr6843isk]
[ +0.000009] __arm64_sys_delete_module+0x18c/0x260
[ +0.000009] el0_svc_common.constprop.0+0x80/0x1d0
[ +0.000007] do_el0_svc+0x38/0xb0
[ +0.000009] el0_svc+0x1c/0x30
[ +0.000006] el0_sync_handler+0xa8/0xb0
[ +0.000006] el0_sync+0x16c/0x180
[ +0.000006] ---[ end trace 55843198b7e5f4b8 ]---
This issue is not seen with kernel 4.9 (L4T 32.5).

Thanks,
Shreyas

This topic has been solved.
Please submit a new thread to track your follow-up question.
