Handling link-lost/link-acquired events while streaming from a V4L2 device

I have a custom sensor and driver that support streaming to memory via V4L2:

v4l2-ctl --set-fmt-video=width=2880,height=1089,pixelformat=RG10 --stream-mmap --set-ctrl=sensor_mode=3,bypass_mode=0 -d /dev/video0

Infrequently the SerDes link is lost, but it is quickly reacquired. When this happens my streaming application fails hard, and the warning below appears in dmesg. What should my deserializer driver communicate to V4L2 when the link is lost, so that this failure is prevented and the stream can recover gracefully?

------------[ cut here ]------------
Sep 14 17:09:07 aft-stereo kernel: [ 440.598565] WARNING: CPU: 3 PID: 6387 at /home/charles/nvidia/nvidia_sdk/JetPack_4.3_Linux_JETSON_TX2/Linux_for_Tegra/sources/kernel/kernel-4.9/drivers/media/v4l2-core/v4l2-ioctl.c:1308 v4l_enum_fmt+0x1238/0x15a0
Sep 14 17:09:07 aft-stereo kernel: [ 440.617397] Modules linked in: v4l2loopback tmp102 zram overlay at24 ub960 chpfpga ub953 imx296 spidev nvmem_core nvgpu bluedroid_pm ip_tables
Sep 14 17:09:07 aft-stereo kernel: [ 440.617441]
Sep 14 17:09:07 aft-stereo kernel: [ 440.617449] CPU: 3 PID: 6387 Comm: gst-plugin-scan Not tainted 4.9.140-tegra #2
Sep 14 17:09:07 aft-stereo kernel: [ 440.617453] Hardware name: quill (DT)
Sep 14 17:09:07 aft-stereo kernel: [ 440.617457] task: ffffffc1e47c1c00 task.stack: ffffffc1e0098000
Sep 14 17:09:07 aft-stereo kernel: [ 440.617463] PC is at v4l_enum_fmt+0x1238/0x15a0
Sep 14 17:09:07 aft-stereo kernel: [ 440.617468] LR is at v4l_enum_fmt+0x1238/0x15a0
Sep 14 17:09:07 aft-stereo kernel: [ 440.617473] pc : [] lr : [] pstate: 40400145
Sep 14 17:09:07 aft-stereo kernel: [ 440.617476] sp : ffffffc1e009bbc0
Sep 14 17:09:07 aft-stereo kernel: [ 440.617479] x29: ffffffc1e009bbc0 x28: 0000000000000000
Sep 14 17:09:07 aft-stereo kernel: [ 440.617488] x27: ffffffc1c6f33800 x26: 0000000000000000
Sep 14 17:09:07 aft-stereo kernel: [ 440.617496] x25: ffffff8009fdd738 x24: 0000000000000002
Sep 14 17:09:07 aft-stereo kernel: [ 440.617503] x23: 0000007fe9078738 x22: ffffffc1e4da1240
Sep 14 17:09:07 aft-stereo kernel: [ 440.617510] x21: 0000000000000000 x20: 0000000000000000
Sep 14 17:09:07 aft-stereo kernel: [ 440.617517] x19: ffffffc1e009bd10 x18: 0000000000000030
Sep 14 17:09:07 aft-stereo kernel: [ 440.617524] x17: 0000000000000000 x16: 0000000000000000
Sep 14 17:09:07 aft-stereo kernel: [ 440.617531] x15: ffffffffffffffff x14: ffffff800a158260
Sep 14 17:09:07 aft-stereo kernel: [ 440.617538] x13: 0000000000000000 x12: 0000000000000006
Sep 14 17:09:07 aft-stereo kernel: [ 440.617545] x11: 0000000000000002 x10: 000000000000034e
Sep 14 17:09:07 aft-stereo kernel: [ 440.617552] x9 : 0000000000000001 x8 : ffffffc1f70154c0
Sep 14 17:09:07 aft-stereo kernel: [ 440.617559] x7 : 0000000000000000 x6 : ffffffc1f7093bf0
Sep 14 17:09:07 aft-stereo kernel: [ 440.617566] x5 : ffffffc1f7093bf0 x4 : 0000000000000000
Sep 14 17:09:07 aft-stereo kernel: [ 440.617573] x3 : ffffffc1f70997f8 x2 : ffffffc1f7093bf0
Sep 14 17:09:07 aft-stereo kernel: [ 440.617580] x1 : ffffffc1e47c1c00 x0 : 000000000000001e
Sep 14 17:09:07 aft-stereo kernel: [ 440.617586]
Sep 14 17:09:07 aft-stereo kernel: [ 440.617590] ---[ end trace 38bc3467d8c62eab ]---
Sep 14 17:09:07 aft-stereo kernel: [ 440.622210] Call trace:
Sep 14 17:09:07 aft-stereo kernel: [ 440.622217] [] v4l_enum_fmt+0x1238/0x15a0
Sep 14 17:09:07 aft-stereo kernel: [ 440.622224] [] __video_do_ioctl+0x204/0x2c8
Sep 14 17:09:07 aft-stereo kernel: [ 440.622230] [] video_usercopy+0x2a0/0x6a0
Sep 14 17:09:07 aft-stereo kernel: [ 440.622236] [] video_ioctl2+0x3c/0x50
Sep 14 17:09:07 aft-stereo kernel: [ 440.622241] [] v4l2_ioctl+0xd0/0x118
Sep 14 17:09:07 aft-stereo kernel: [ 440.622251] [] do_vfs_ioctl+0xb0/0x8d8
Sep 14 17:09:07 aft-stereo kernel: [ 440.622256] [] SyS_ioctl+0x8c/0xa8
Sep 14 17:09:07 aft-stereo kernel: [ 440.622263] [] el0_svc_naked+0x34/0x38

It looks like the driver is reporting a problem with the pixel format.
What does v4l2-ctl --list-formats-ext show?

NOTE: We only observe this issue when our SerDes link goes down and comes back up, which happens infrequently, but we need more robust handling of this event. I'm mostly unsure how to notify V4L2 that the device is gone. Here is the output:

ioctl: VIDIOC_ENUM_FMT
	Index       : 0
	Type        : Video Capture
	Pixel Format: 'RG10'
	Name        : 10-bit Bayer RGRG/GBGB
		Size: Discrete 1280x480
			Interval: Discrete 0.067s (15.000 fps)
		Size: Discrete 1280x480
			Interval: Discrete 0.067s (15.000 fps)
		Size: Discrete 2880x1089
			Interval: Discrete 0.033s (30.000 fps)
		Size: Discrete 2880x1089
			Interval: Discrete 0.033s (30.000 fps)
		Size: Discrete 2880x1080
			Interval: Discrete 0.033s (30.000 fps)
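On the application side, one way to avoid failing hard is to act on what VIDIOC_DQBUF reports instead of aborting on any error. The sketch below assumes the meanings documented for VIDIOC_DQBUF (EIO can indicate temporary problems such as signal loss, and drivers are encouraged to report recoverable errors by setting V4L2_BUF_FLAG_ERROR on a successfully dequeued buffer) and that, once a driver unregisters its video node, the V4L2 core fails further ioctls on already-open file handles with ENODEV. The enum and function names are hypothetical, and whether the Tegra VI driver actually reports a SerDes link loss in any of these ways is an assumption to verify on your platform.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical recovery actions for a capture loop; not part of any V4L2 API. */
enum capture_action {
    ACTION_USE_FRAME,      /* buffer dequeued cleanly: process it, then re-queue */
    ACTION_DROP_FRAME,     /* dequeued but flagged V4L2_BUF_FLAG_ERROR: re-queue it */
    ACTION_RETRY,          /* nothing ready yet or interrupted: call VIDIOC_DQBUF again */
    ACTION_RESTART_STREAM, /* EIO: STREAMOFF, re-queue all buffers, STREAMON */
    ACTION_REOPEN_DEVICE,  /* ENODEV: close the fd, wait for the node, reopen */
    ACTION_FATAL           /* anything else: give up and report the error */
};

/*
 * Decide what to do after one VIDIOC_DQBUF call.
 * ioctl_ret: return value of ioctl()
 * err:       errno captured immediately after a failed ioctl()
 * buf_error: whether the dequeued buffer had V4L2_BUF_FLAG_ERROR set
 */
enum capture_action classify_dqbuf(int ioctl_ret, int err, bool buf_error)
{
    if (ioctl_ret == 0)
        return buf_error ? ACTION_DROP_FRAME : ACTION_USE_FRAME;

    switch (err) {
    case EAGAIN: /* non-blocking fd with no filled buffer yet */
    case EINTR:  /* interrupted by a signal */
        return ACTION_RETRY;
    case EIO:    /* internal error; may indicate temporary signal loss */
        return ACTION_RESTART_STREAM;
    case ENODEV: /* video node has been unregistered */
        return ACTION_REOPEN_DEVICE;
    default:
        return ACTION_FATAL;
    }
}
```

Read errno immediately after the failed ioctl(), because later library calls may overwrite it. Keeping the policy in one pure function also makes it easy to tune, for example by capping the number of consecutive stream restarts before treating EIO as fatal.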

I'm still working on this, and I've made some progress by modifying my deserializer driver so that its link-down handler cleans up by removing the serializer and the devices behind it. The stack is now more robust to link-lost events, but still not perfect. I'm now seeing:

[ +0.005349] ------------[ cut here ]------------
[ +0.004632] WARNING: CPU: 0 PID: 7467 at /home/charles/nvidia/nvidia_sdk/JetPack_4.3_Linux_JETSON_TX2/Linux_for_Tegra/sources/kernel/kernel-4.9/kernel/module.c:1108 module_put+0x14c/0x158
[ +0.016698] Modules linked in: zram tmp102 overlay at24 ub960 chpfpga ub953 imx296 nvmem_core spidev nvgpu bluedroid_pm ip_tables

[ +0.000063] CPU: 0 PID: 7467 Comm: python3 Tainted: G W 4.9.140-tegra #2
[ +0.000004] Hardware name: quill (DT)
[ +0.000006] task: ffffffc0f2dce200 task.stack: ffffffc0e5f04000
[ +0.000008] PC is at module_put+0x14c/0x158
[ +0.000011] LR is at v4l2sd_stream+0xbc/0x210
[ +0.000007] pc : [] lr : [] pstate: 80400045
[ +0.000004] sp : ffffffc0e5f07990
[ +0.000005] x29: ffffffc0e5f07990 x28: 0000000000000000
[ +0.000012] x27: ffffffc1e9690020 x26: ffffffc1ddecf850
[ +0.000011] x25: ffffff80011e5ec8 x24: ffffffc1dec610a8
[ +0.000010] x23: ffffffc1e9385318 x22: 00000000fffffff2
[ +0.000010] x21: ffffffc1dec600a0 x20: ffffff8008b4a5ec
[ +0.000010] x19: ffffff80011e7440 x18: 0000000000000010
[ +0.000010] x17: 0000000000000000 x16: 0000000000000000
[ +0.000009] x15: ffffffffffffffff x14: ffffff808a157a82
[ +0.000010] x13: ffffff800a157a90 x12: 0000000000000006
[ +0.000010] x11: 0000000005f5e0ff x10: 0000000000000494
[ +0.000010] x9 : 00000000ffffffd0 x8 : ffffff80083d47b0
[ +0.000010] x7 : ffffff8009ec49d8 x6 : ffffffc1f7048bf0
[ +0.000009] x5 : ffffffc1f7048bf0 x4 : ffffff80011e7748
[ +0.000010] x3 : 0000000000000000 x2 : 00000000ffffffff
[ +0.000010] x1 : 0000000000000000 x0 : ffffff80011e7748

[ +0.000014] ---[ end trace 97b489f98ac70502 ]---
[ +0.004629] Call trace:
[ +0.000010] [] module_put+0x14c/0x158
[ +0.000007] [] v4l2sd_stream+0xbc/0x210
[ +0.000010] [] tegra_channel_set_stream+0x94/0x4c8
[ +0.000009] [] vi4_channel_stop_streaming+0x7c/0x490
[ +0.000010] [] tegra_channel_stop_streaming+0x34/0x48
[ +0.000009] [] __vb2_queue_cancel+0x34/0x188
[ +0.000007] [] vb2_core_streamoff+0x54/0xb8
[ +0.000008] [] vb2_streamoff+0x54/0x88
[ +0.000008] [] vb2_ioctl_streamoff+0x54/0x60
[ +0.000007] [] v4l_streamoff+0x3c/0x50
[ +0.000009] [] __video_do_ioctl+0x204/0x2c8
[ +0.000006] [] video_usercopy+0x2a0/0x6a0
[ +0.000007] [] video_ioctl2+0x3c/0x50
[ +0.000006] [] v4l2_ioctl+0x88/0x118
[ +0.000009] [] do_vfs_ioctl+0xb0/0x8d8
[ +0.000006] [] SyS_ioctl+0x8c/0xa8
[ +0.000010] [] el0_svc_naked+0x34/0x38
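The PC in this warning is inside module_put(), which in 4.9-era kernels warns when a module's reference count would drop below zero. Combined with the call trace (vb2_ioctl_streamoff -> tegra_channel_stop_streaming -> v4l2sd_stream -> module_put), this suggests, as an inference from the trace rather than a confirmed diagnosis, that the STREAMOFF path releases a reference that the link-down teardown already released, or that the re-created subdevice never took. The userspace C model below illustrates the usual pattern for keeping get/put balanced: each instance records whether it holds a reference, so stop-streaming only releases what start-streaming took. All types and functions here are hypothetical stand-ins, not tegracam or kernel APIs.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a module reference count. */
struct ref_model {
    int refcnt;
};

/* Hypothetical per-subdevice state: does this instance hold a reference? */
struct subdev_model {
    struct ref_model *owner;
    bool holds_ref;
};

static void ref_get(struct ref_model *r)
{
    r->refcnt++;
}

/* Like module_put(): never go below zero; returns false where the kernel would WARN. */
static bool ref_put(struct ref_model *r)
{
    if (r->refcnt <= 0)
        return false;
    r->refcnt--;
    return true;
}

/* Take the reference at most once per instance. */
void subdev_stream_on(struct subdev_model *sd)
{
    if (!sd->holds_ref) {
        ref_get(sd->owner);
        sd->holds_ref = true;
    }
}

/* Release only the reference this instance took; safe to call more than once. */
bool subdev_stream_off(struct subdev_model *sd)
{
    if (!sd->holds_ref)
        return true; /* nothing to release, e.g. after a link-down teardown */
    sd->holds_ref = false;
    return ref_put(sd->owner);
}
```

In this model, the link-down handler and a later STREAMOFF from userspace can both call subdev_stream_off() without underflowing the count, whereas an unconditional ref_put() on the second call would hit the equivalent of the warning above.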

The following is a trace captured around the time I injected a link-down event on the SerDes:

vi-output, tc_u-7493 [000] … 752.023402: tegra_channel_capture_frame: sof:751.913258224
kworker/3:2-3816 [003] … 752.070119: rtos_queue_peek_from_isr_failed: tstamp:23763726140 queue:0x0b4b4500
kworker/3:2-3816 [003] … 752.070123: rtcpu_vinotify_event: tstamp:23763955132 tag:ATOMP_FS channel:0x00 frame:0 vi_tstamp:23763954759 data:0x00000000
kworker/3:2-3816 [003] … 752.070124: rtcpu_vinotify_event: tstamp:23763963213 tag:CHANSEL_PXL_SOF channel:0x00 frame:0 vi_tstamp:23763962851 data:0x00000001
kworker/3:2-3816 [003] … 752.070124: rtcpu_vinotify_event: tstamp:23764698034 tag:CHANSEL_PXL_EOF channel:0x00 frame:0 vi_tstamp:23764697624 data:0x04400002
kworker/3:2-3816 [003] … 752.070125: rtcpu_vinotify_event: tstamp:23765177553 tag:ATOMP_FE channel:0x00 frame:0 vi_tstamp:23765177094 data:0x00000000
vi-output, tc_u-7493 [004] … 752.141514: tegra_channel_capture_frame: sof:752.31370448
kworker/3:2-3816 [003] … 752.182113: rtcpu_vinotify_event: tstamp:23767646154 tag:ATOMP_FS channel:0x00 frame:0 vi_tstamp:23767645766 data:0x00000000
kworker/3:2-3816 [003] … 752.182116: rtcpu_vinotify_event: tstamp:23767654231 tag:CHANSEL_PXL_SOF channel:0x00 frame:0 vi_tstamp:23767653858 data:0x00000001
kworker/3:2-3816 [003] … 752.182116: rtcpu_vinotify_event: tstamp:23768389040 tag:CHANSEL_PXL_EOF channel:0x00 frame:0 vi_tstamp:23768388630 data:0x04400002
kworker/3:2-3816 [003] … 752.182119: rtos_queue_peek_from_isr_failed: tstamp:23768726146 queue:0x0b4b4500
kworker/3:2-3816 [003] … 752.182119: rtcpu_vinotify_event: tstamp:23768868545 tag:ATOMP_FE channel:0x00 frame:0 vi_tstamp:23768868096 data:0x00000000
kworker/3:2-3816 [003] … 752.350110: rtcpu_vinotify_event: tstamp:23772566243 tag:CSIMUX_FRAME channel:0x00 frame:0 vi_tstamp:23772565854 data:0x000000a4
kworker/3:2-3816 [003] … 752.350114: rtcpu_vinotify_event: tstamp:23772566733 tag:CHANSEL_LOAD_FRAMED channel:0x10 frame:0 vi_tstamp:23772566401 data:0x08000000
kworker/3:2-3816 [003] … 752.350114: rtcpu_vinotify_event: tstamp:23772567671 tag:CHANSEL_FAULT_FE channel:0x10 frame:0 vi_tstamp:23772567100 data:0x00000001
kworker/3:2-3816 [003] … 752.350115: rtcpu_vinotify_event: tstamp:23772567816 tag:ATOMP_FE channel:0x00 frame:0 vi_tstamp:23772567103 data:0x00000001
kworker/3:2-3816 [003] … 752.350117: rtos_queue_peek_from_isr_failed: tstamp:23773726157 queue:0x0b4b4500
vi-output, tc_u-7493 [005] … 752.414611: tegra_channel_capture_setup: vnc_id 0 W 2880 H 1089 fmt 20
vi-output, tc_u-7493 [005] … 752.414651: tegra_channel_capture_frame: sof:0.-266627765760
vi-output, tc_u-7493 [004] … 752.417074: tegra_channel_capture_frame: sof:752.306960376
vi-output, tc_u-7493 [004] … 752.456472: tegra_channel_capture_frame: sof:752.346331096
kworker/3:2-3816 [003] … 752.462098: rtos_queue_send_from_isr_failed: tstamp:23776173585 queue:0x0b4a7258
kworker/3:2-3816 [003] … 752.462102: rtos_queue_send_from_isr_failed: tstamp:23776173703 queue:0x0b4aad68
kworker/3:2-3816 [003] … 752.462102: rtos_queue_send_from_isr_failed: tstamp:23776173809 queue:0x0b4ac998
kworker/3:2-3816 [003] … 752.462103: rtos_queue_send_from_isr_failed: tstamp:23776173915 queue:0x0b4ae518
kworker/3:2-3816 [003] … 752.462104: rtos_queue_send_from_isr_failed: tstamp:23776174020 queue:0x0b4af2d8
kworker/3:2-3816 [003] … 752.462104: rtos_queue_send_from_isr_failed: tstamp:23776174125 queue:0x0b4b0098
kworker/3:2-3816 [003] … 752.462105: rtos_queue_send_from_isr_failed: tstamp:23776174230 queue:0x0b4b0e58
kworker/3:2-3816 [003] … 752.462105: rtos_queue_send_from_isr_failed: tstamp:23776174334 queue:0x0b4b1c18
kworker/3:2-3816 [003] … 752.462107: rtos_queue_send_failed: tstamp:23776174934 queue:0x0b4a7258
kworker/3:2-3816 [003] … 752.462107: rtos_queue_send_from_isr_failed: tstamp:23776180264 queue:0x0b4a7258
kworker/3:2-3816 [003] … 752.462108: rtos_queue_send_from_isr_failed: tstamp:23776180382 queue:0x0b4aad68
kworker/3:2-3816 [003] … 752.462109: rtos_queue_send_from_isr_failed: tstamp:23776180488 queue:0x0b4ac998
kworker/3:2-3816 [003] … 752.462109: rtos_queue_send_from_isr_failed: tstamp:23776180596 queue:0x0b4ae518
kworker/3:2-3816 [003] … 752.462110: rtos_queue_send_from_isr_failed: tstamp:23776180701 queue:0x0b4af2d8
kworker/3:2-3816 [003] … 752.462110: rtos_queue_send_from_isr_failed: tstamp:23776180816 queue:0x0b4b0098
kworker/3:2-3816 [003] … 752.462111: rtos_queue_send_from_isr_failed: tstamp:23776180921 queue:0x0b4b0e58
kworker/3:2-3816 [003] … 752.462112: rtos_queue_send_from_isr_failed: tstamp:23776181030 queue:0x0b4b1c18
kworker/3:2-3816 [003] … 752.462112: rtos_queue_send_failed: tstamp:23776181485 queue:0x0b4a7258
kworker/3:2-3816 [003] … 752.462113: rtos_queue_send_from_isr_failed: tstamp:23776185707 queue:0x0b4a7258
kworker/3:2-3816 [003] … 752.462113: rtos_queue_send_from_isr_failed: tstamp:23776185829 queue:0x0b4aad68
kworker/3:2-3816 [003] … 752.462114: rtos_queue_send_from_isr_failed: tstamp:23776185935 queue:0x0b4ac998
kworker/3:2-3816 [003] … 752.462115: rtos_queue_send_from_isr_failed: tstamp:23776186042 queue:0x0b4ae518
kworker/3:2-3816 [003] … 752.462115: rtos_queue_send_from_isr_failed: tstamp:23776186147 queue:0x0b4af2d8
kworker/3:2-3816 [003] … 752.462116: rtos_queue_send_from_isr_failed: tstamp:23776186253 queue:0x0b4b0098
kworker/3:2-3816 [003] … 752.462116: rtos_queue_send_from_isr_failed: tstamp:23776186358 queue:0x0b4b0e58
kworker/3:2-3816 [003] … 752.462117: rtos_queue_send_from_isr_failed: tstamp:23776186463 queue:0x0b4b1c18
kworker/3:2-3816 [003] … 752.462118: rtos_queue_send_failed: tstamp:23776186911 queue:0x0b4a7258
kworker/3:2-3816 [003] … 752.462118: rtos_queue_send_from_isr_failed: tstamp:23776188394 queue:0x0b4a7258
kworker/3:2-3816 [003] … 752.462119: rtos_queue_send_from_isr_failed: tstamp:23776188512 queue:0x0b4aad68
kworker/3:2-3816 [003] … 752.462120: rtos_queue_send_from_isr_failed: tstamp:23776188618 queue:0x0b4ac998
kworker/3:2-3816 [003] … 752.462120: rtos_queue_send_from_isr_failed: tstamp:23776188725 queue:0x0b4ae518
kworker/3:2-3816 [003] … 752.462121: rtos_queue_send_from_isr_failed: tstamp:23776188828 queue:0x0b4af2d8
kworker/3:2-3816 [003] … 752.462121: rtos_queue_send_from_isr_failed: tstamp:23776188941 queue:0x0b4b0098
kworker/3:2-3816 [003] … 752.462122: rtos_queue_send_from_isr_failed: tstamp:23776189046 queue:0x0b4b0e58
kworker/3:2-3816 [003] … 752.462122: rtos_queue_send_from_isr_failed: tstamp:23776189150 queue:0x0b4b1c18
kworker/3:2-3816 [003] … 752.462123: rtos_queue_send_failed: tstamp:23776190085 queue:0x0b4a7258
kworker/3:2-3816 [003] … 752.462125: rtcpu_vinotify_event: tstamp:23776258508 tag:ATOMP_FS channel:0x00 frame:0 vi_tstamp:23776258115 data:0x00000000
kworker/3:2-3816 [003] … 752.462125: rtcpu_vinotify_event: tstamp:23776266577 tag:CHANSEL_PXL_SOF channel:0x00 frame:0 vi_tstamp:23776266208 data:0x00000001
kworker/3:2-3816 [003] … 752.462126: rtcpu_vinotify_event: tstamp:23776269218 tag:CHANSEL_LOAD_FRAMED channel:0x10 frame:0 vi_tstamp:23776268854 data:0x08000000
kworker/3:2-3816 [003] … 752.462126: rtcpu_vinotify_event: tstamp:23777001398 tag:CHANSEL_PXL_EOF channel:0x00 frame:0 vi_tstamp:23777000980 data:0x04400002
kworker/3:2-3816 [003] … 752.462126: rtcpu_vinotify_event: tstamp:23777480899 tag:ATOMP_FE channel:0x00 frame:0 vi_tstamp:23777480449 data:0x00000000
kworker/3:2-3816 [003] … 752.462127: rtcpu_vinotify_event: tstamp:23777488813 tag:ATOMP_FS channel:0x00 frame:0 vi_tstamp:23777488451 data:0x00000000
kworker/3:2-3816 [003] … 752.462127: rtcpu_vinotify_event: tstamp:23777496906 tag:CHANSEL_PXL_SOF channel:0x00 frame:0 vi_tstamp:23777496543 data:0x00000001
kworker/3:2-3816 [003] … 752.462128: rtcpu_vinotify_event: tstamp:23777500179 tag:CHANSEL_LOAD_FRAMED channel:0x10 frame:0 vi_tstamp:23777499811 data:0x08000000
vi-output, tc_u-7493 [004] … 752.495839: tegra_channel_capture_frame: sof:752.385701816

How do you inject a link-down event? Is it a V4L2 function call or a tegracam function call?

Neither; it's injected out of band, in hardware.

I have the same type of problem, so I'm also interested in a solution. There doesn't seem to be a graceful mechanism for informing the tegracam architecture or nvargus-daemon, either through Linux kernel calls or through a userspace monitoring process using IPC. The system appears to assume a static configuration that never has errors, in which cameras are never removed or added after initialization.

So, does the kernel crash if the SerDes disconnects or stops outputting data over MIPI?

I can only answer for my situation: the kernel does not crash. nvargus-daemon usually crashes or misbehaves when the SerDes has a problem. My SerDes driver can detect when an error occurs, but I don't know of anything in the tegracam or V4L2 APIs that I can call to report it. See the thread "Crashing camera stream brings down entire video pipeline" for more details.

Is there a way to gracefully notify nvargus-daemon that there is an issue?

Are there improvements to the Linux kernel or nvargus-daemon so that an issue with a single camera does not bring down all of the other camera streams?

I wouldn't say the kernel crashes, but the VI driver and/or v4l2-core does.

For the Argus crash, we have currently implemented error-handling sample code in the userAutoExposure app. Please refer to it and add handling for EVENT_TYPE_ERROR to terminate the app.