Vi-output, tc35 100% loading and No Video Stream with tc358840 4K HDMI to MIPI

Hi Sir,

The environment: Xavier NX module with our carrier board.
Software: Jetpack 4.6

We use the gstreamer to get the video stream from Toshiba TC358840 4K
The test command:

gst-launch-1.0 v4l2src device=/dev/video0 \
  ! 'video/x-raw, width=3840, height=2160, framerate=30/1, format=UYVY' \
  ! fpsdisplaysink text-overlay=false video-sink=xvimagesink -v

At the beginning, it runs very well,
But at the end of it, the video stream suddenly stops a few hours later. (Time is not fixed.)
We use the top command and it’s loading is 100%.

The CPU temperature is 4X.

Kernel log

[26592.593412] tegra194-vi5 15c10000.vi: no reply from camera processor
[26592.593571] tegra194-vi5 15c10000.vi: uncorr_err: request timed out after 2500 ms
[26592.593715] tegra194-vi5 15c10000.vi: err_rec: attempting to reset the capture channel
[26592.599329] ------------[ cut here ]------------
[26592.599465] WARNING: CPU: 3 PID: 8706 at /mnt/hd2/sk_nx/32.6.1/Linux_for_Tegra/source/public/kernel_src/kernel/nvidia/drivers/media/platform/tegra/camera/vi/capture_vi_channel.c:452 vi_channel_close_ex+0x80/0x88
[26592.599775] Modules linked in: bnep fuse xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xt_addrtype iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack br_netfilter zram overlay userspace_alert nvgpu ip_tables x_tables

[26592.599877] CPU: 3 PID: 8706 Comm: vi-output, tc35 Not tainted 4.9.253-tegra #4
[26592.599882] Hardware name: NVIDIA Jetson Xavier NX Developer Kit (DT)
[26592.599888] task: ffffffc1e2854600 task.stack: ffffffc1bc7dc000
[26592.599894] PC is at vi_channel_close_ex+0x80/0x88
[26592.599900] LR is at vi_channel_close_ex+0x40/0x88
[26592.599905] pc : [<ffffff8008b39a88>] lr : [<ffffff8008b39a48>] pstate: a0c00045
[26592.599914] sp : ffffffc1bc7dfd10
[26592.599918] x29: ffffffc1bc7dfd10 x28: 0000000000000000 
[26592.599930] x27: 0000000000000000 x26: 0000000000000000 
[26592.599940] x25: 0000000000000000 x24: 0000000000000000 
[26592.599952] x23: ffffffc1e0550048 x22: ffffffc1e04a9228 
[26592.599962] x21: ffffffc1e04a9218 x20: ffffffc1979e9b00 
[26592.599972] x19: 0000000000000000 x18: 000000000000834f 
[26592.599982] x17: 0000000000000002 x16: 0000000000000003 
[26592.599992] x15: 0000000000000255 x14: 0000000000000000 
[26592.600002] x13: 0000000000000000 x12: 0000000000000400 
[26592.600013] x11: 0000000000000400 x10: 0000000000000000 
[26592.600028] x9 : ffffffc1bc7dfa80 x8 : 0000000000000001 
[26592.600042] x7 : ffffffc171146880 x6 : ffffffc1979e9cc0 
[26592.600056] x5 : ffffff8008b393fc x4 : ffffffbf06679d10 
[26592.600070] x3 : 000000010010000c x2 : ffffffc199e77400 
[26592.600084] x1 : 0000000000000000 x0 : 0000000000000009 

[26592.600100] ---[ end trace 45f70b62c5844dd7 ]---
[26592.600185] Call trace:
[26592.600194] [<ffffff8008b39a88>] vi_channel_close_ex+0x80/0x88
[26592.600201] [<ffffff8008b3af48>] vi5_channel_error_recover+0x48/0x1c8
[26592.600210] [<ffffff8008b30018>] tegra_channel_error_recover+0x58/0x90
[26592.600216] [<ffffff8008b3b808>] tegra_channel_kthread_capture_dequeue+0xf8/0x1c0
[26592.600225] [<ffffff80080db09c>] kthread+0xec/0xf0
[26592.600232] [<ffffff80080838a0>] ret_from_fork+0x10/0x30
[26592.602121] tegra194-vi5 15c10000.vi: fatal: error recovery failed

Can you give us some suggestions?
kernel_log.txt (86.4 KB)

Yours Sincerely,
S.K.

hello sk1977.huang,

please refer to Topic 187824 to include the fixes for NULL pointer dereference,
thanks

Hi Jerry,

My kernel source code has the patch now.

My kernel log doesn’t have this line.
[ 473.935223] Unable to handle kernel NULL pointer dereference at virtual address 00000010

Yours Sincerely,
S.K.

It already exists.

Hi Jerry,

Could I ask you two questions for Debugging direction?
I saw other topics and it can reset the capture channel successfully.
1.Does it include Toshiba TC358840 Hardware block or just NVIDIA software side?
2.When it can’t reset successfully, will I get Vi-output, tc35 100% loading?

tegra194-vi5 15c10000.vi: no reply from camera processor
[26592.593571] tegra194-vi5 15c10000.vi: uncorr_err: request timed out after 2500 ms
[26592.593715] tegra194-vi5 15c10000.vi: err_rec: attempting to reset the capture channel

Thanks.
Yours Sincerely,
S.K.

hello sk1977.huang,

VI engine expect the camera stream coming continuously without failures.
this error, no reply from camera processor, it means the intermittently sensor signaling.

did you verify the stream with previous release image? had you confirm this is a regression with JP-4.6?
could you please also boost the system as performance mode to verify the status.

Hi Jerry,

  1. We focus on the VI engine now.
    I try to enable RCE traces and get the log.

    Another thing, If the Eye Diagram of MIPI CSI-2 D-PHY between Toshiba TC358840 and Nvidia Xavier NX is dirty, is it possible to stop the VI engine?

2 Because we build our kernel, I am sure it uses Jetpack 4.6 and boost the system as performance mode.

Thanks.
Yours Sincerely,
S.K.

hello sk1977.huang,

what do you mean “Eye Diagram is dirty”?
there’s error handling mechanism in JetPack-4.6 release, you should check kernel logs and check the behaviors.
thanks

Hi Jerry,

  1. what do you mean “ Eye Diagram is dirty ”?
    → I think it’s the bus signal quality. We just doubt it.

160dmesg.txt (91.6 KB)

From our 3rd parity log,
I saw the string " Unable to handle kernel NULL pointer dereference at virtual address 00000000" this time.

It had patch originally in l4t 32.6.1 kernel source code and I didn’t change it.
3rd parity used the same kernel image.

[   70.229674] tegra194-vi5 15c10000.vi: no reply from camera processor
[   70.229822] tegra194-vi5 15c10000.vi: uncorr_err: request timed out after 2500 ms
[   70.229960] tegra194-vi5 15c10000.vi: err_rec: attempting to reset the capture channel
[   70.236268] ------------[ cut here ]------------
[   70.236398] WARNING: CPU: 0 PID: 7950 at /mnt/hd2/sk_nx/32.6.1/Linux_for_Tegra/source/public/kernel_src/kernel/nvidia/drivers/media/platform/tegra/camera/vi/capture_vi_channel.c:452 vi_channel_close_ex+0x80/0x88
[   70.236694] Modules linked in: bnep fuse xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xt_addrtype iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack br_netfilter zram overlay cdc_acm uvcvideo userspace_alert nvgpu ip_tables x_tables

[   70.236765] CPU: 0 PID: 7950 Comm: vi-output, tc35 Tainted: G        W       4.9.253-tegra #4
[   70.236769] Hardware name: NVIDIA Jetson Xavier NX Developer Kit (DT)
[   70.236773] task: ffffffc1f2d3aa00 task.stack: ffffffc179a1c000
[   70.236778] PC is at vi_channel_close_ex+0x80/0x88
[   70.236782] LR is at vi_channel_close_ex+0x40/0x88
[   70.236796] pc : [<ffffff8008b39a88>] lr : [<ffffff8008b39a48>] pstate: a0c00045
[   70.236800] sp : ffffffc179a1fd10
[   70.236805] x29: ffffffc179a1fd10 x28: 0000000000000000 
[   70.236813] x27: 0000000000000000 x26: 0000000000000000 
[   70.236820] x25: 0000000000000000 x24: 0000000000000000 
[   70.236827] x23: ffffffc1f3c3e048 x22: ffffffc1f08b9628 
[   70.236834] x21: ffffffc1f08b9618 x20: ffffffc1743cab00 
[   70.236841] x19: 0000000000000000 x18: 0000000000005f2a 
[   70.236848] x17: 0000000000000002 x16: 0000000000000003 
[   70.236855] x15: 0000000000000109 x14: 000000000000239b 
[   70.236863] x13: 0000000000002cfa x12: 0000000000000400 
[   70.236870] x11: 0000000000000400 x10: 0000000000000000 
[   70.236878] x9 : ffffffc179a1fa80 x8 : 0000000000000000 
[   70.236885] x7 : 0000000000000000 x6 : ffffffc1acfb9a18 
[   70.236892] x5 : ffffff8008b393fc x4 : ffffffbf06b2b610 
[   70.236900] x3 : 000000018010000f x2 : ffffffc1acad9000 
[   70.236908] x1 : 0000000000000000 x0 : 0000000000000009 

[   70.236918] ---[ end trace e2194609ea65de03 ]---
[   70.236998] Call trace:
[   70.237003] [<ffffff8008b39a88>] vi_channel_close_ex+0x80/0x88
[   70.237008] [<ffffff8008b3af48>] vi5_channel_error_recover+0x48/0x1c8
[   70.237014] [<ffffff8008b30018>] tegra_channel_error_recover+0x58/0x90
[   70.237019] [<ffffff8008b3b808>] tegra_channel_kthread_capture_dequeue+0xf8/0x1c0
[   70.237026] [<ffffff80080db09c>] kthread+0xec/0xf0
[   70.237032] [<ffffff80080838a0>] ret_from_fork+0x10/0x30
[   70.238367] tegra194-vi5 15c10000.vi: fatal: error recovery failed
[   86.571265] tc358840 0-001f: tc358840_process_isr:
[   86.571512] tc358840 0-001f: tc358840_isr: IntStatus = 0x0200
[   86.575457] tc358840 0-001f: tc358840_hdmi_clk_int_handler: CLK_INT = 0x20
[   86.575465] tc358840 0-001f: tc358840_hdmi_clk_int_handler: DE size or position has changed
[   86.581786] tc358840 0-001f: tc358840_format_change: New format: 3840x2160p30.0 (4400x2250)
[   86.581794] tc358840 0-001f: tc358840_s_dv_timings():
[   86.581801] tc358840 0-001f: tc358840_s_dv_timings: 3840x2160p30.0 (4400x2250)
[   86.581806] tc358840 0-001f: tc358840_s_dv_timings: no change
[   95.598913] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[   95.599084] Mem abort info:
[   95.599615]   ESR = 0x96000005
[   95.599677]   Exception class = DABT (current EL), IL = 32 bits
[   95.599775]   SET = 0, FnV = 0
[   95.599851]   EA = 0, S1PTW = 0
[   95.599906] Data abort info:
[   95.599959]   ISV = 0, ISS = 0x00000005
[   95.600026]   CM = 0, WnR = 0
[   95.600086] user pgtable: 4k pages, 39-bit VAs, pgd = ffffffc1edd43000
[   95.600191] [0000000000000000] *pgd=0000000000000000, *pud=0000000000000000
[   95.600341] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[   95.600439] Modules linked in: bnep fuse xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xt_addrtype iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack br_netfilter zram overlay cdc_acm uvcvideo userspace_alert nvgpu ip_tables x_tables
[   95.601339] CPU: 2 PID: 7942 Comm: queue4:src Tainted: G        W       4.9.253-tegra #4
[   95.601934] Hardware name: NVIDIA Jetson Xavier NX Developer Kit (DT)
[   95.606492] task: ffffffc1e589f000 task.stack: ffffffc1cf400000
[   95.612532] PC is at exit_creds+0x2c/0x78
[   95.616726] LR is at __put_task_struct+0x4c/0x140
[   95.621188] pc : [<ffffff80080de15c>] lr : [<ffffff80080aee9c>] pstate: 60400045
[   95.628709] sp : ffffffc1cf403a00
[   95.631688] x29: ffffffc1cf403a00 x28: ffffffc1e589f000 
[   95.637118] x27: ffffff8008f72000 x26: ffffffc1eb314200 
[   95.642629] x25: ffffffc1f0177868 x24: ffffffc1f09fa2e0 
[   95.648231] x23: 0000000000000001 x22: ffffffc1f69de018 
[   95.653647] x21: ffffffc1f2d3aa30 x20: 0000000000000000 
[   95.658983] x19: ffffffc1f2d3aa00 x18: 00000000ffffffe4 
[   95.664758] x17: 0000000000000000 x16: 000000000000000e 
[   95.670532] x15: 0000000000000000 x14: 000000000000ac3a 
[   95.676220] x13: 0000000000000000 x12: 0000000000008bb0 
[   95.681480] x11: 000000000000ac3a x10: 0000000000000a10 
[   95.687513] x9 : ffffffc1cf403850 x8 : 0000000000000008 
[   95.693282] x7 : ffffff800954c000 x6 : 0000000000000001 
[   95.698544] x5 : 0000000000000800 x4 : 0000000000000000 
[   95.704133] x3 : 00000000000000d0 x2 : 0000000000000000 
[   95.709470] x1 : 0000000000000000 x0 : 00000000ffffffff 

[   95.716211] Process queue4:src (pid: 7942, stack limit = 0xffffffc1cf400000)
[   95.722865] Call trace:
[   95.725228] [<ffffff80080de15c>] exit_creds+0x2c/0x78
[   95.729779] [<ffffff80080aee9c>] __put_task_struct+0x4c/0x140
[   95.735119] [<ffffff80080dbbfc>] kthread_stop+0x1e4/0x1e8
[   95.740454] [<ffffff8008b3b160>] vi5_channel_stop_kthreads+0x40/0x58
[   95.746317] [<ffffff8008b3b244>] vi5_channel_stop_streaming+0xcc/0xd0
[   95.752441] [<ffffff8008b2ceb4>] tegra_channel_stop_streaming+0x34/0x48
[   95.758919] [<ffffff8008b251cc>] __vb2_queue_cancel+0x34/0x188
[   95.764254] [<ffffff8008b266f4>] vb2_core_queue_release+0x2c/0x58
[   95.770376] [<ffffff8008b28d74>] _vb2_fop_release+0x84/0xa0
[   95.775454] [<ffffff8008b2e84c>] tegra_channel_close+0x64/0x140
[   95.781146] [<ffffff8008b02140>] v4l2_release+0x48/0xa0
[   95.786652] [<ffffff800825dcb8>] __fput+0x90/0x1d0
[   95.791461] [<ffffff800825de70>] ____fput+0x20/0x30
[   95.796276] [<ffffff80080d8dac>] task_work_run+0xbc/0xd8
[   95.801529] [<ffffff80080b8408>] do_exit+0x2e0/0xa88
[   95.806687] [<ffffff80080b8c40>] do_group_exit+0x40/0xa8
[   95.811767] [<ffffff80080c64ec>] get_signal+0xbc/0x750
[   95.816933] [<ffffff800808add8>] do_signal+0x130/0x500
[   95.822437] [<ffffff800808b320>] do_notify_resume+0x90/0xb0
[   95.828039] [<ffffff800808379c>] work_pending+0x8/0x10
[   95.832858] ---[ end trace e2194609ea65de04 ]---
[   95.847142] Fixing recursive fault but reboot is needed!

Yours Sincerely,
S.K.

hello sk1977.huang,

there’s error handling mechanism as you can see via kernel logs,
for example,

BTW,
may I know how you apply the kernel image updates, did you load the kernel image via partition or file system?
please refer to CBoot session and check the [Kernel Boot Sequence Using extlinux.conf], it loads the kernel binary file from the file system if you had LINUX entry.

Hi Jerry,

I use the file system to update the kernel image & DTB via extlinux.conf (/boot/extlinux/).

I also ignore .sig error flow without any sig file from the debug console.

Yours Sincerely,
S.K.

hello sk1977.huang,

those files load from extlinux.conf won’t be sign/encrypted.
you should ensure the fixes for NULL pointer dereference has included,
please also add some debug logs to narrow down the root cause,
thanks

test1208.txt (36.8 MB)
Hi Jerry,

Could you give us some suggestions for the RCE traces log?
(after 10 hrs, we got the same bug issue.)

echo 10000 > /sys/kernel/debug/tracing/buffer_size_kb
echo 1 > /sys/kernel/debug/tracing/tracing_on
echo 1 > /sys/kernel/debug/tracing/events/trgra_rtcpu/enable
echo 3 > /sys/kernel/debug/camrtc/log-level

cat /sys/kernel/debug/tracing/trace

Please see the test1208.txt


When I ran the commands, I can’t get any log message.

echo > /sys/kernel/debug/tracing/trace
cat /sys/kernel/debug/tracing/trace

Please see the test1208-3.txt.
test1208-3.txt (432 Bytes)

  1. I wll addd the new string for the kernel.

Yours Sincerely,
S.K.

hello sk1977.huang,

are you having gang mode? I see couple of SOF/EOF sending to VI engine at a time.
BTW, there’s CHANSEL_NOMATCH failures.
for example,

rtcpu_vinotify_error: tstamp:3019212127550 tag:CHANSEL_NOMATCH channel:0x01 frame:0 vi_tstamp:3019212124561 data:0x000003c9
rtcpu_vinotify_error: tstamp:3019212128223 tag:CHANSEL_NOMATCH channel:0x04 frame:0 vi_tstamp:3019212124562 data:0x000003c9

may I also know what’s your pipeline to access the camera stream,
could you please narrow down this by testing with v4l2 standard controls.
thanks

Hi Jerry,

Yes, we are having gang mode.
Because Toshiba TC358840 4K resolution need to use the gang mode.
l4t 32.5.2 don’t support it, but l4t 32.6.1 support it.

may I also know what’s your pipeline to access the camera stream,
→ We use gstreamer pipeline.
For example,
gst-launch-1.0 v4l2src device=/dev/video0
! ‘video/x-raw, width=3840, height=2160, framerate=30/1, format=UYVY’
! fpsdisplaysink text-overlay=false video-sink=fakevideosink -v

gst-launch-1.0 v4l2src device=/dev/video0
! ‘video/x-raw, width=3840, height=2160, framerate=30/1, format=UYVY’
! fpsdisplaysink text-overlay=false video-sink=xvimagesink -v

Yours Sincerely,
S.K.

hello sk1977.huang,

could you please have a try configure negative values for CAPTURE_TIMEOUT_MS, which means it keeps waiting for coming frames forever.
for example,

static void vi5_capture_dequeue(struct tegra_channel *chan,
        struct tegra_channel_buffer *buf)
{
...
                /* Dequeue a frame and check its capture status */
                err = vi_capture_status(chan->tegra_vi_channel[vi_port], CAPTURE_TIMEOUT_MS);

Hi Jerry,

  1. After changing CAPTURE_TIMEOUT_MS to -1, these log files are the latest status.
    The video stream still stopped.
    The kernel error message is different.
[23684.618451] INFO: task vi-output, tc35:8523 blocked for more than 120 seconds.
[23684.618630]       Not tainted 4.9.253-tegra #5
[23684.618709] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[23684.618840] vi-output, tc35 D    0  8523      2 0x00000000
[23684.618855] Call trace:
[23684.618879] [<ffffff800808604c>] __switch_to+0x9c/0xc0
[23684.618894] [<ffffff8008f60bb0>] __schedule+0x238/0x7d0
[23684.618903] [<ffffff8008f61188>] schedule+0x40/0xa8
[23684.618914] [<ffffff8008f6432c>] schedule_timeout+0x1ec/0x420
[23684.618953] [<ffffff8008f61dd8>] wait_for_common+0xa0/0x140
[23684.618960] [<ffffff8008f61ea0>] wait_for_completion+0x28/0x38
[23684.618969] [<ffffff8008b396c8>] vi_capture_status+0xc0/0x170
[23684.618976] [<ffffff8008b3a9d0>] vi5_capture_dequeue+0x110/0x400
[23684.618982] [<ffffff8008b3b7a0>] tegra_channel_kthread_capture_dequeue+0x90/0x1c0
[23684.618990] [<ffffff80080db09c>] kthread+0xec/0xf0
[23684.618997] [<ffffff80080838a0>] ret_from_fork+0x10/0x30

The RCE traces log is still the main string"tag:CHANSEL_NOMATCH".

     kworker/0:0-12387 [000] .... 59276.404803: rtcpu_vinotify_error: tstamp:1853055816022 tag:CHANSEL_NOMATCH channel:0x04 frame:0 vi_tstamp:1853055813039 data:0x000003c9
     kworker/0:0-12387 [000] .... 59276.404804: rtcpu_vinotify_error: tstamp:1853055816689 tag:CHANSEL_NOMATCH channel:0x01 frame:0 vi_tstamp:1853055813039 data:0x000003c9
     kworker/0:0-12387 [000] .... 59276.404805: rtcpu_vinotify_event: tstamp:1853056352737 tag:CHANSEL_NOMATCH channel:0x04 frame:0 vi_tstamp:1853055813039 data:0x000003c9
     kworker/0:0-12387 [000] .... 59276.404805: rtcpu_vinotify_event: tstamp:1853056352895 tag:CHANSEL_NOMATCH channel:0x01 frame:0 vi_tstamp:1853055813039 data:0x000003c9
     kworker/0:0-12387 [000] .... 59276.404805: rtcpu_vinotify_error: tstamp:1853056858734 tag:CHANSEL_NOMATCH channel:0x01 frame:0 vi_tstamp:1853056855751 data:0x000003c9
     kworker/0:0-12387 [000] .... 59276.404806: rtcpu_vinotify_error: tstamp:1853056859428 tag:CHANSEL_NOMATCH channel:0x04 frame:0 vi_tstamp:1853056855752 data:0x000003c9

dmesg1209.txt (85.7 KB)
test1209.txt (36.4 MB)

  1. Sorry about that. I checked the power mode again. 10W Desktop is not the performance level. I already changed it to 20W 6CORE.
    Do the test again now.

Thanks.
Yours Sincerely,
S.K.

hello sk1977.huang,

it looks like issue happened after steaming for 5-hours, and VI waits till callstack.
may I know what’s your actual use-case, are you keep sending data to this HDMI2CSI bridge? or, you’re having plug/unplug to test the signaling?

Hi Jerry,

[Our actual use-case]
We are keep sending data to this HDMI2CSI bridge.
We don’t plug/unplug to test the signaling.(It always plug the HDMI)

Hardware Block
Sony camera(4K) → HDMI2CSI(TC3588840 4K) → NVIDIA Xavier NX

Thanks.
Yours Sincerely,
S.K.

Hi Jerry,

After 24 hrs, the video stream is still alive with the power mode 20W 6CORE.

Thanks.
Yours Sincerely,
S.K.