VI Engine crashing when camera source not delivered

Hello,

I am trying to develop an application based on the example 12_camera_V4L2_cuda. For this application, I am routing several streams to the Jetson Xavier over the MIPI CSI-2 interface. At this stage, I am trying to implement a security function that renders a test pattern to the display when no data is arriving at the Jetson.
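For reference, the fallback test pattern described above could be generated on the CPU roughly as follows. This is only a sketch: it assumes the capture format is packed UYVY (the log further down shows a stride of 3840 bytes for 1920 pixels, i.e. 2 bytes per pixel, but the exact pixel format is not stated in this thread), and the function name fill_uyvy_color_bars and the bar values (approximate BT.601 limited-range colour bars) are illustrative, not part of the sample.

```c
#include <stddef.h>
#include <stdint.h>

/* One colour bar as 8-bit Y, U (Cb), V (Cr). */
struct yuv { uint8_t y, u, v; };

/*
 * Fill a packed UYVY (4:2:2) frame with eight vertical colour bars.
 * Hypothetical helper: width must be even, stride is in bytes and must
 * be at least width * 2. Returns 0 on success, -1 on bad arguments.
 */
int fill_uyvy_color_bars(uint8_t *frame, unsigned width,
                         unsigned height, size_t stride)
{
    /* Approximate BT.601 limited-range values, white to black. */
    static const struct yuv bars[8] = {
        {235, 128, 128}, /* white   */
        {210,  16, 146}, /* yellow  */
        {170, 166,  16}, /* cyan    */
        {145,  54,  34}, /* green   */
        {106, 202, 222}, /* magenta */
        { 81,  90, 240}, /* red     */
        { 41, 240, 110}, /* blue    */
        { 16, 128, 128}, /* black   */
    };

    if (!frame || width == 0 || (width & 1) || stride < (size_t)width * 2)
        return -1;

    for (unsigned row = 0; row < height; row++) {
        uint8_t *line = frame + (size_t)row * stride;

        /* Each pair of pixels shares one U and one V sample: U Y0 V Y1. */
        for (unsigned x = 0; x < width; x += 2) {
            const struct yuv *c = &bars[(x * 8) / width];

            line[2 * x + 0] = c->u;
            line[2 * x + 1] = c->y;
            line[2 * x + 2] = c->v;
            line[2 * x + 3] = c->y;
        }
    }
    return 0;
}
```

For the 1920 x 1080 stream in this thread the call would be fill_uyvy_color_bars(buf, 1920, 1080, 3840), with buf pointing at a CPU-mapped buffer of at least 4147200 bytes that is then handed to the renderer.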

However, I cannot get this to work. My application continuously polls /dev/video0. To simulate the use case, I unplug the camera, expecting the application to leave the polling loop, clean up the buffers, and render a test pattern. Instead, the VI engine crashes and prints the usual messages it emits when no image data is reaching it (i.e. "no reply from camera processor"), and a reboot is needed. I am fairly sure the cause is that the stop_stream command is never called, so the VI engine is still expecting data.
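The polling step described above can be given an explicit timeout, so the application notices a missing frame on its own and can call VIDIOC_STREAMOFF before the driver gives up. A minimal sketch, assuming a plain poll() on the capture file descriptor; the name wait_for_frame is hypothetical, and nothing in this thread confirms that stopping the stream early avoids the crash on L4T 32.6.1.

```c
#include <errno.h>
#include <poll.h>

/*
 * Wait until fd has data to read (POLLIN) or timeout_ms elapses.
 * Returns 1 if a frame is ready, 0 on timeout, -1 on error.
 * Note: if poll() is interrupted by a signal it is restarted with the
 * full timeout, so the total wait can exceed timeout_ms.
 */
int wait_for_frame(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int ret;

    do {
        ret = poll(&pfd, 1, timeout_ms);
    } while (ret < 0 && errno == EINTR);

    if (ret < 0)
        return -1;          /* poll() itself failed */
    if (ret == 0)
        return 0;           /* no frame within the timeout */
    if (pfd.revents & (POLLERR | POLLHUP | POLLNVAL))
        return -1;          /* device reported an error condition */
    return (pfd.revents & POLLIN) ? 1 : 0;
}
```

In the capture loop, a return value of 0 would be the point to stop the stream and switch to the test pattern. The timeout should be chosen well below the driver's own timeout (the kernel log in this thread reports 10000 ms), for example a few frame periods.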

How am I supposed to tell the VI engine to stop streaming if I cannot know in advance when the stream is going to stop? The VI engine driver should be able to handle this use case, as it is a really basic security feature. I assume this could be solved by changing something in the VI engine driver. Could you help with this?

L4T 32.6.1, JetPack 4.6

Thanks

Hi,
The VI engine has error handling, and certain error cases can be detected, but some corner cases may not be. We suggest trying a later release such as JetPack 4.6.3 or 5.1.1 and checking whether the condition is detected and the pipeline exits gracefully.

Hi,

Thanks for the reply. However, migrating to the latest release is not really possible in this case due to time constraints. Also, is it guaranteed that this use case is supported there?

Is there no way to “trick” the VI engine when a timeout event occurs and stop the stream at that point? This would require editing the driver, but I guess the source code is available, right? Could you support this workaround? This use case is not that uncommon, as it is a really basic security feature, and at the moment the kernel is basically panicking.

Hello borjabasket14,

You can download the [Driver Package (BSP) Sources] for your L4T release version from Jetson Linux Archive | NVIDIA Developer.
The VI driver sources are available here:
$public_sources/kernel_src/kernel/nvidia/drivers/media/platform/tegra/camera/vi/channel.c
The operation driver is here (the Xavier series uses VI-5):
$public_sources/kernel_src/kernel/nvidia/drivers/media/platform/tegra/camera/vi/vi5_fops.c

As you can see, the VI layer has its own error recovery mechanism, vi5_channel_error_recover().
Other than "no reply from camera processor", is any kernel message reported that indicates the failure? Please also gather the kernel messages (i.e. $ dmesg --follow) while reproducing the failure, for reference.
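When gathering kernel messages, it can help to flag the relevant VI lines automatically, for example while reading the output of dmesg --follow line by line. The sketch below only does substring matching; the enum and function names are hypothetical, and the match strings are taken from the kernel messages quoted in this thread, so they may differ on other L4T releases.

```c
#include <string.h>

/* Hypothetical classification of kernel log lines related to VI errors. */
enum vi_log_event {
    VI_LOG_NONE = 0,   /* line is not one of the known VI messages */
    VI_LOG_NO_REPLY,   /* "no reply from camera processor" */
    VI_LOG_TIMEOUT,    /* "uncorr_err: request timed out ..." */
    VI_LOG_RECOVERY,   /* "err_rec: attempting to reset the capture channel" */
    VI_LOG_OOPS        /* kernel NULL pointer dereference or oops */
};

/* Classify a single kernel log line by substring match. */
enum vi_log_event classify_vi_log_line(const char *line)
{
    if (!line)
        return VI_LOG_NONE;
    if (strstr(line, "no reply from camera processor"))
        return VI_LOG_NO_REPLY;
    if (strstr(line, "uncorr_err: request timed out"))
        return VI_LOG_TIMEOUT;
    if (strstr(line, "err_rec: attempting to reset the capture channel"))
        return VI_LOG_RECOVERY;
    if (strstr(line, "Unable to handle kernel NULL pointer dereference") ||
        strstr(line, "Internal error: Oops"))
        return VI_LOG_OOPS;
    return VI_LOG_NONE;
}
```

A log collector could record the timestamp of the first VI_LOG_NO_REPLY line and whether a VI_LOG_OOPS line follows the recovery attempt, which is the sequence of interest here.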

BTW,
you may also refer to Topic 243051, comment #28 for our approaches to testing the error recovery mechanism.
It is suggested to use an application (e.g. argus_userAutoExposure) to report EVENT_TYPE_ERROR and terminate the pipeline gracefully.

Hi,

Thanks for the reply. I attach the trace log from the application here. Further down I also attach the dmesg output, but it should be the same.

Steps to reproduce:
My application uses the V4L2 interface and I can see the images correctly. At some point, I cut the image stream unexpectedly (i.e. I disconnect the camera to simulate a possible security problem or a broken camera). I then expect the POLLIN event to stop arriving, so that the application tries to stop the stream. As you can see, I print the message “Stopping stream…”. This is just a normal printf, right before I actually call the stop stream command. The log shows that the program only stops the streams after the "no reply from camera processor" message and all the other errors thrown by the VI engine (INFO: stop_stream_thread(): (line:882) Camera video streaming off …)


INFO: camera_initialize(): (line:174) Camera 1 ouput format: (1920 x 1080)  stride: 3840, imagesize: 4147200, frate: 0 / 0
INFO: camera_initialize(): (line:197) Camera 2 ouput format: (1920 x 1080)  stride: 3840, imagesize: 4147200, frate: 0 / 0
[INFO] (NvEglRenderer.cpp:110) <renderer0> Setting Screen width 1920 height 1080
INFO: init_components(): (line:255) Initialize v4l2 components successfully
WARN: request_camera_buff(): (line:304) Camera v4l2 buf length is not expected
WARN: request_camera_buff(): (line:304) Camera v4l2 buf length is not expected
WARN: request_camera_buff(): (line:304) Camera v4l2 buf length is not expected
WARN: request_camera_buff(): (line:304) Camera v4l2 buf length is not expected
WARN: request_camera_buff(): (line:343) Camera v4l2 buf length is not expected
WARN: request_camera_buff(): (line:343) Camera v4l2 buf length is not expected
WARN: request_camera_buff(): (line:343) Camera v4l2 buf length is not expected
WARN: request_camera_buff(): (line:343) Camera v4l2 buf length is not expected
INFO: prepare_buffers(): (line:617) Succeed in preparing stream buffers
INFO: start_stream(): (line:645) Camera 1 video streaming on ...
Camera 1 acquisition parameters configured
Camera 2 acquisition parameters configured
start_capture camera 1 acquisition thread launched
Either an input switch or camera disconnected has happned
Cleanup has started...
Stopping stream...
[  502.099197] tegra194-vi5 15c10000.vi: no reply from camera processor
[  502.100009] tegra194-vi5 15c10000.vi: uncorr_err: request timed out after 10000 ms
[  502.100601] tegra194-vi5 15c10000.vi: err_rec: attempting to reset the capture channel
[  502.109296] Unable to handle kernel NULL pointer dereference at virtual address 00000010
[  502.109821] Mem abort info:
[  502.109878]   ESR = 0x96000005
[  502.109935]   Exception class = DABT (current EL), IL = 32 bits
[  502.110052]   SET = 0, FnV = 0
[  502.110107]   EA = 0, S1PTW = 0
[  502.110170] Data abort info:
[  502.110225]   ISV = 0, ISS = 0x00000005
[  502.110298]   CM = 0, WnR = 0
[  502.110357] user pgtable: 4k pages, 39-bit VAs, pgd = ffffffc3cd35b000
[  502.110472] [0000000000000010] *pgd=0000000000000000, *pud=0000000000000000
[  502.110613] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[  502.110712] Modules linked in: xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xt_addrtype iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack br_netfilter zram overlay userspace_alert nvgpu ip_tables x_tables
[  502.111852] CPU: 0 PID: 6557 Comm: vi-output, ov56 Not tainted 4.9.253-tegra #15
[  502.119108] Hardware name: NVIDIA Jetson Xavier NX Developer Kit (DT)
[  502.125842] task: ffffffc3ea3b4600 task.stack: ffffffc3a53d8000
[  502.131801] PC is at _raw_write_lock+0x30/0x58
[  502.135826] LR is at destroy_buffer_table+0x40/0xd8
[  502.141065] pc : [<ffffff8008f70b88>] lr : [<ffffff8008b51680>] pstate: 20c00045
[  502.148326] sp : ffffffc3a53dbc70
[  502.151564] x29: ffffffc3a53dbc70 x28: 0000000000000000
[  502.157161] x27: 0000000000000000 x26: 0000000000000000
[  502.162433] x25: 0000000000000010 x24: 0000000000000098
[  502.168201] x23: 0000000000000018 x22: ffffff8009087f38
[  502.173274] x21: 0000000000000000 x20: ffffffc3a6bad800
[  502.178860] x19: 0000000000000010 x18: 0000000000000b1c
[  502.184207] x17: 0000000000000002 x16: 0000000000000003
[  502.190235] x15: 0000000000000361 x14: 00050000000bfede
[  502.195927] x13: 0005000000000000 x12: ffffff800c000064
[  502.201355] x11: 0000000000000400 x10: 0000000000000000
[  502.207296] x9 : ffffffc3a53dba80 x8 : fffffffffffffffe
[  502.213071] x7 : ffffffc3a6bb2280 x6 : ffffffc3ea3b4600
[  502.218583] x5 : ffffffc3a6bb21c0 x4 : ffffffc3ffd15160
[  502.223668] x3 : 0000000000000000 x2 : ffffffc3a6bb2240
[  502.229260] x1 : 0000000000000000 x0 : 0000000080000000
[  502.234597]
[  502.236004] Process vi-output, ov56 (pid: 6557, stack limit = 0xffffffc3a53d8000)
[  502.242999] Call trace:
[  502.245539] [<ffffff8008f70b88>] _raw_write_lock+0x30/0x58
[  502.250612] [<ffffff8008b51680>] destroy_buffer_table+0x40/0xd8
[  502.256214] [<ffffff8008b438a4>] vi_capture_shutdown+0xd4/0x130
[  502.261814] [<ffffff8008b43efc>] vi_channel_close_ex+0x34/0x88
[  502.267416] [<ffffff8008b45408>] vi5_channel_error_recover+0x48/0x1c8
[  502.273366] [<ffffff8008b3a4d8>] tegra_channel_error_recover+0x58/0x90
[  502.279230] [<ffffff8008b45cc8>] tegra_channel_kthread_capture_dequeue+0xf8/0x1c0
[  502.286402] [<ffffff80080db09c>] kthread+0xec/0xf0
[  502.291040] [<ffffff80080838a0>] ret_from_fork+0x10/0x30
[  502.296291] ---[ end trace 79ce96888cd4f578 ]---
[  502.308124] note: vi-output, ov56[6557] exited with preempt_count 1
[  502.315169] tegra194-vi5 15c10000.vi: vi_capture_release: setup channel first
[  502.315325] video4linux video0: vi capture release failed
INFO: stop_stream_thread(): (line:882) Camera video streaming off ...
INFO: stop_stream_context(): (line:897) Camera 1 (context_t ctx) video streaming off ...


Here is also a small snippet of the code of the application for better understanding:

/* If a timeout is not registered there are two options: Camera not delivering image or changing of the input is happening
        Either way, queues are then cleaned and created again to make sure a clean closing and opening of the device is happening.*/

        printf("Either an input switch or camera disconnected has happned\n");
        printf("Cleanup has started...\n");

        /* Stop streams */

        printf("Stopping stream... \n");

        if (!stop_stream_thread(cameraAcquireParams))
            printf("Failed to stop stream...");

        if (!stop_stream_context(ctx))
            printf("Failed to stop stream...");

where stop_stream functions are:

bool
stop_stream_thread(acquireThreadParams* cameraAcquireParams)
{
    enum v4l2_buf_type type;

    /* Stop v4l2 streaming */
    type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    if (ioctl(cameraAcquireParams->cam_fd, VIDIOC_STREAMOFF, &type))
        ERROR_RETURN("Failed to stop streaming: %s (%d)",
            strerror(errno), errno);

    INFO("Camera video streaming off ...");
    return true;
}

The output of dmesg --follow is below (it is the same as my application output; I attach it just in case it helps your understanding):

[  461.354751] [RCE] vi5_hwinit: firmware CL2018101701 protocol version 2.2
[  502.099197] tegra194-vi5 15c10000.vi: no reply from camera processor
[  502.100009] tegra194-vi5 15c10000.vi: uncorr_err: request timed out after 10000 ms
[  502.100601] tegra194-vi5 15c10000.vi: err_rec: attempting to reset the capture channel
[  502.109296] Unable to handle kernel NULL pointer dereference at virtual address 00000010
[  502.109821] Mem abort info:
[  502.109878]   ESR = 0x96000005
[  502.109935]   Exception class = DABT (current EL), IL = 32 bits
[  502.110052]   SET = 0, FnV = 0
[  502.110107]   EA = 0, S1PTW = 0
[  502.110170] Data abort info:
[  502.110225]   ISV = 0, ISS = 0x00000005
[  502.110298]   CM = 0, WnR = 0
[  502.110357] user pgtable: 4k pages, 39-bit VAs, pgd = ffffffc3cd35b000
[  502.110472] [0000000000000010] *pgd=0000000000000000, *pud=0000000000000000
[  502.110613] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[  502.110712] Modules linked in: xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xt_addrtype iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack br_netfilter zram overlay userspace_alert nvgpu ip_tables x_tables
[  502.111852] CPU: 0 PID: 6557 Comm: vi-output, ov56 Not tainted 4.9.253-tegra #15
[  502.119108] Hardware name: NVIDIA Jetson Xavier NX Developer Kit (DT)
[  502.125842] task: ffffffc3ea3b4600 task.stack: ffffffc3a53d8000
[  502.131801] PC is at _raw_write_lock+0x30/0x58
[  502.135826] LR is at destroy_buffer_table+0x40/0xd8
[  502.141065] pc : [<ffffff8008f70b88>] lr : [<ffffff8008b51680>] pstate: 20c00045
[  502.148326] sp : ffffffc3a53dbc70
[  502.151564] x29: ffffffc3a53dbc70 x28: 0000000000000000
[  502.157161] x27: 0000000000000000 x26: 0000000000000000
[  502.162433] x25: 0000000000000010 x24: 0000000000000098
[  502.168201] x23: 0000000000000018 x22: ffffff8009087f38
[  502.173274] x21: 0000000000000000 x20: ffffffc3a6bad800
[  502.178860] x19: 0000000000000010 x18: 0000000000000b1c
[  502.184207] x17: 0000000000000002 x16: 0000000000000003
[  502.190235] x15: 0000000000000361 x14: 00050000000bfede
[  502.195927] x13: 0005000000000000 x12: ffffff800c000064
[  502.201355] x11: 0000000000000400 x10: 0000000000000000
[  502.207296] x9 : ffffffc3a53dba80 x8 : fffffffffffffffe
[  502.213071] x7 : ffffffc3a6bb2280 x6 : ffffffc3ea3b4600
[  502.218583] x5 : ffffffc3a6bb21c0 x4 : ffffffc3ffd15160
[  502.223668] x3 : 0000000000000000 x2 : ffffffc3a6bb2240
[  502.229260] x1 : 0000000000000000 x0 : 0000000080000000

[  502.236004] Process vi-output, ov56 (pid: 6557, stack limit = 0xffffffc3a53d8000)
[  502.242999] Call trace:
[  502.245539] [<ffffff8008f70b88>] _raw_write_lock+0x30/0x58
[  502.250612] [<ffffff8008b51680>] destroy_buffer_table+0x40/0xd8
[  502.256214] [<ffffff8008b438a4>] vi_capture_shutdown+0xd4/0x130
[  502.261814] [<ffffff8008b43efc>] vi_channel_close_ex+0x34/0x88
[  502.267416] [<ffffff8008b45408>] vi5_channel_error_recover+0x48/0x1c8
[  502.273366] [<ffffff8008b3a4d8>] tegra_channel_error_recover+0x58/0x90
[  502.279230] [<ffffff8008b45cc8>] tegra_channel_kthread_capture_dequeue+0xf8/0x1c0
[  502.286402] [<ffffff80080db09c>] kthread+0xec/0xf0
[  502.291040] [<ffffff80080838a0>] ret_from_fork+0x10/0x30
[  502.296291] ---[ end trace 79ce96888cd4f578 ]---
[  502.308124] note: vi-output, ov56[6557] exited with preempt_count 1
[  502.315169] tegra194-vi5 15c10000.vi: vi_capture_release: setup channel first
[  502.315325] video4linux video0: vi capture release failed

One more remark from my side:

When this happens and you try to capture from a camera source again, the same messages appear, and a reboot is needed to make it work again.

Thanks for the support

Hi, any update/suggestion on the topic?

[Edit] I just realized you’re using the V4L2 interface; Argus is not supported with that.

This example, 12_camera_V4L2_cuda, doesn’t contain an error handling mechanism. Please refer to argus_userAutoExposure to report EVENT_TYPE_ERROR and terminate the pipeline gracefully.

Hence, certain corner-case errors may not be detected.
Please make sure the camera connector is mounted correctly.

Hi, we also hit the same issue; we use the V4L2 interface. What should we do regarding “please make sure the camera connector is mounted correctly”? I think a connection problem with a peripheral should not cause a system crash.

Hi,
Please check whether there is a way to replicate the phenomenon on the Xavier NX developer kit. We can then set up to replicate the issue and try to improve error resilience in future releases.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.