System crashes when doing a repeated camera capture

Hello!

I have a hardware setup with ov10635 -> serializer -> deserializer -> AGX Xavier running the L4T 32.7.2 kernel. I have been validating the image path described above. However, running successive camera captures leads to an RTCPU crash. I have attached the error log to this post. Can you help us understand how to go about debugging this issue?

Please find the error log attached:
crash_dump.log (34.1 KB)

Let me know if you need any other information.

hello atharvanandanwar,

May I know what the test procedure is? For example, are you doing stream-on/off continuously? Is there a test script for reference?
Furthermore, what is the failure rate? Can you reproduce the issue easily?

Hey Jerry,

I’m using yavta -c /dev/video0 to start a stream, interrupting it with Ctrl-C, starting it again with yavta -c /dev/video0, and repeating until the failure occurs. The failure shows up within 2-20 tries; I’m not sure what it depends on, but it does occur.

It can be reproduced very easily, and the failure rate is quite high if you keep retrying.

I have also tried running while true; do timeout 0.5 yavta -c /dev/video0; done, which does run for a while. However, if you stop that loop and rerun it, you hit the issue again.

Let me know how else I can help! Thanks!

Here is an additional log, captured with tp_printk and with tracing enabled for rtcpu.

yavta -c=5 /dev/video0
[   78.443571] tegra_channel_open: vi-output, ov10635 2-0009
[   78.445960] tegra_channel_set_power: ov10635 2-0009 : 0x1
[   78.446189] camera_common_s_power: status : 0x1
[   78.446484] ov10635 2-0009: Power on (          (null))
[   78.451985] tegra_channel_set_power: 15a00000.nvcsi--6 : 0x1
[   78.452098] csi_s_power: enable : 0x1
Device /dev/video0 opened.
Device `vi-output, ov10635 2-0009' on `platform:15c10000.vi:0' is a video output (without mplanes) device.
Video format: UYVY (59565955) 1280x800 (stride 2560) field none buffer size 2048000
8 buffers requested.
length: 2048000 offset: 0 timestamp type/source: mono/EoF
Buffer 0/0 mapped at address 0x7f8d091000.
length: 2048000 offset: 2048000 timestamp type/source: mono/EoF
Buffer 1/0 mapped at address 0x7f8ce9d000.
length: 2048000 offset: 4096000 timestamp type/source: mono/EoF
Buffer 2/0 mapped at address 0x7f8cca9000.
length: 2048000 offset: 6144000 timestamp type/source: mono/EoF
Buffer 3/0 mapped at address 0x7f8cab5000.
length: 2048000 offset: 8192000 timestamp type/source: mono/EoF
Buffer 4/0 mapped at address 0x7f8c8c1000.
length: 2048000 offset: 10240000 timestamp type/source: mono/EoF
Buffer 5/0 mapped at address 0x7f8c6cd000.
length: 2048000 offset: 12288000 timestamp type/source: mono/EoF
Buffer 6/0 mapped at address 0x7f8c4d9000.
length: 2048000 offset: 14336000 timestamp type/source: mono/EoF
Buffer 7/0 mapped at address 0x7f8c2e5000.
[   78.480925] tegra_channel_capture_setup: vnc_id 0 W 1280 H 800 fmt 13
[   78.521128] rtcpu_string: tstamp:2856436914 id:0x04010000 str:"Configuring VI GoS.
[   78.521128] "
[   78.521334] [RCE] Configuring VI GoS.
[   78.521449] rtcpu_string: tstamp:2856437132 id:0x04010000 str:"VM GOS[#0] addr=0xc2100000
[   78.521449] "
[   78.521634] [RCE] VM GOS[#0] addr=0xc2100000
[   78.521719] rtcpu_string: tstamp:2856437395 id:0x04010000 str:"VM GOS[#1] addr=0xc2101000
[   78.521719] "
[   78.525006] [RCE] VM GOS[#1] addr=0xc2101000
[   78.529375] rtcpu_string: tstamp:2856437648 id:0x04010000 str:"VM GOS[#2] addr=0xc2102000
[   78.529375] "
[   78.539469] [RCE] VM GOS[#2] addr=0xc2102000
[   78.543629] rtcpu_string: tstamp:2856437887 id:0x04010000 str:"VM GOS[#3] addr=0xc2103000
[   78.543629] "
[   78.553451] [RCE] VM GOS[#3] addr=0xc2103000
[   78.558006] rtcpu_string: tstamp:2856438147 id:0x04010000 str:"VM GOS[#4] addr=0xc2104000
[   78.558006] "
[   78.567598] [RCE] VM GOS[#4] addr=0xc2104000
[   78.572257] rtcpu_string: tstamp:2856438387 id:0x04010000 str:"VM GOS[#5] addr=0xc2105000
[   78.572257] "
[   78.581834] CPU:0, Error:RCE-NOC@0xbe00000,irq=492
[   78.586855] **************************************
[   78.591754] * For more Internal Decode Help
[   78.595519] *     http://nv/cbberr
[   78.599190] * NVIDIA userID is required to access
[   78.603914] **************************************
[   78.608557] CPU:0, Error:RCE-NOC
[   78.611797]  Error Logger            : 1
[   78.614954]  ErrLog0                 : 0x80030600
[   78.618880]    Transaction Type      : RD  - Read, Incrementing

This is right before the crash.

After reading through the code, here’s what I found:

  1. Something went wrong with the RTCPU right after the following stack dump:
[   43.912141] [<ffffff8008b45d28>] vi_capture_ivc_send_control.isra.4+0xd0/0x218
[   43.919313] [<ffffff8008b46490>] vi_capture_setup+0x3e0/0x6c8
[   43.924913] [<ffffff8008b48d5c>] tegra_channel_capture_setup+0x164/0x2b8
[   43.931563] [<ffffff8008b49468>] vi5_channel_start_streaming+0x270/0x560
[   43.938042] [<ffffff8008b3b3c4>] tegra_channel_start_streaming+0x3c/0x58
[   43.944865] [<ffffff8008b33654>] vb2_start_streaming+0x6c/0x148
[   43.950727] [<ffffff8008b35380>] vb2_core_streamon+0x98/0x190
[   43.956244] [<ffffff8008b38140>] vb2_streamon+0x30/0x78
[   43.961405] [<ffffff8008b381dc>] vb2_ioctl_streamon+0x54/0x60
[   43.967008] [<ffffff8008b13c44>] v4l_streamon+0x3c/0x50
[   43.972426] [<ffffff8008b17850>] __video_do_ioctl+0x1e8/0x298
[   43.977593] [<ffffff8008b17244>] video_usercopy+0x2c4/0x698
[   43.983368] [<ffffff8008b17654>] video_ioctl2+0x3c/0x50
[   43.988791] [<ffffff8008b12948>] v4l2_ioctl+0xf0/0x108
[   43.994042] [<ffffff8008276c70>] do_vfs_ioctl+0xb0/0x8e8
[   43.999464] [<ffffff8008277538>] SyS_ioctl+0x90/0xa8
[   44.004107] [<ffffff8008083900>] el0_svc_naked+0x34/0x38

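The crash happens right after we call the tegra_capture_ivc_control_submit function. IVC here most likely stands for Inter-VM Communication, the Tegra shared-memory messaging mechanism, rather than an interrupt vector controller.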

The description of the tegra_capture_ivc_control_submit function states:
Submit the control message binary blob to capture-IVC driver, which is to be transferred over control IVC channel to RTCPU.

It looks like the function vi_capture_setup fails with “no reply from camera processor”, but the crash occurs before that.

Can you help me understand what could be leading to the crash?

Hi,
Please try this sample and see if it is more stable:
Nvv4l2decoder plugin is not shutting down cleanly and leaking memory - #8 by DaneLLL

You can replace the pipeline with v4l2src ! fakesink. It sends EoS (End of Stream) before terminating the pipeline. Please try this method and see if it can exit gracefully and have better stability.

Thanks a lot DaneLLL,

I’ll give this a try! :)

This doesn’t seem to have worked. The same crash occurs. Sometimes it occurs if I launch the next capture within 5 seconds of the first.

Hi,
It seems to be an issue in the sensor driver. Could you clarify whether it fails when initializing (power on) the camera or when terminating (power off)? Probably the camera is not successfully powered on or off under certain conditions.

@DaneLLL and @JerryChang

I found that this was happening because linux-tegra-4.9/device-group.c at 28d9fddded2c498439b985d34ab57de4bd4c13f0 · OE4T/linux-tegra-4.9 · GitHub isn’t returning -EPROBE_DEFER. In our code, we are trying to delay the VI initialization. However, the tegra-camera-rtcpu driver thinks that VI is already initialized. This causes the issue I mentioned.

I was able to fix it by returning -EPROBE_DEFER instead of returning 0. That delays the tegra-camera-rtcpu probe until the vi5 driver is ready. Now everything works nicely.
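
For anyone following along, here is a minimal userspace sketch of why the return value matters. It is not the code from device-group.c or tegra-camera-rtcpu. Every name in it (boot_state, model_driver, vi_probe, rtcpu_probe, simulate_boot) is hypothetical, and the retry loop is only a rough stand-in for the kernel's deferred-probe handling. The one detail taken from the kernel is the numeric value of EPROBE_DEFER.

```c
#include <stdbool.h>
#include <stddef.h>

/* Value the Linux kernel uses for EPROBE_DEFER; redefined here because
 * userspace <errno.h> does not provide it. */
#define EPROBE_DEFER 517

/* Hypothetical model state, not real driver data. */
struct boot_state {
    int  vi_delay_passes;   /* passes during which the VI probe is still delayed */
    bool vi_ready;          /* set once the VI model has probed */
    bool defer_on_missing;  /* true = fixed behaviour, false = "return 0" */
    bool rtcpu_bound_early; /* RTCPU model bound while VI was not ready */
};

struct model_driver {
    const char *name;
    int (*probe)(struct boot_state *st);
    bool bound;
};

/* VI model: deliberately late, like the delayed VI initialization above. */
static int vi_probe(struct boot_state *st)
{
    if (st->vi_delay_passes > 0) {
        st->vi_delay_passes--;
        return -EPROBE_DEFER;
    }
    st->vi_ready = true;
    return 0;
}

/* RTCPU model: depends on the VI model being ready. */
static int rtcpu_probe(struct boot_state *st)
{
    if (!st->vi_ready) {
        if (st->defer_on_missing)
            return -EPROBE_DEFER;     /* ask to be probed again later */
        st->rtcpu_bound_early = true; /* reports success without its dependency */
    }
    return 0;
}

/* Retry loop standing in for deferred probing.
 * Returns true when the RTCPU model was bound only after VI was ready. */
bool simulate_boot(bool defer_on_missing, int vi_delay_passes)
{
    struct boot_state st;
    /* RTCPU is listed first so that it is probed before VI. */
    struct model_driver drivers[] = {
        { "rtcpu", rtcpu_probe, false },
        { "vi",    vi_probe,    false },
    };
    const size_t n = sizeof(drivers) / sizeof(drivers[0]);

    st.vi_delay_passes = vi_delay_passes;
    st.vi_ready = false;
    st.defer_on_missing = defer_on_missing;
    st.rtcpu_bound_early = false;

    for (int pass = 0; pass < 16; pass++) {
        size_t bound = 0;
        for (size_t i = 0; i < n; i++) {
            if (!drivers[i].bound && drivers[i].probe(&st) == 0)
                drivers[i].bound = true;
            if (drivers[i].bound)
                bound++;
        }
        if (bound == n)
            return !st.rtcpu_bound_early;
    }
    return false; /* did not converge within the pass limit */
}
```

In this model, simulate_boot(true, 2) returns true because the RTCPU stand-in keeps deferring until VI is ready, while simulate_boot(false, 2) returns false because it reports success on the first pass even though VI is not there yet.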

thanks for sharing, glad to know it works.

I’m not sure what tegra-camera-rtcpu is trying to do; however, it must wait if the VI driver has not initialized yet.

NOTE: If you follow this solution, please be aware that this error was introduced by delaying VI driver initialization in the base kernel.

Thanks for your help here!
