Jetson Corrupted Frame Camera Driver

I am using the Jetson Orin NX (JetPack 5.1.1 / L4T r35.3.1) with two IMX219 cameras. As soon as I get a corrupted frame, the whole application stops working. In my robotics application I am practically certain to get a corrupted frame within an hour.

To make a corrupted frame easy to reproduce in a desktop setup, you can either move the cables around a lot or briefly short-circuit two cables at the camera connector. As the application I use a custom GStreamer pipeline, but the problem can also be observed with nvgstcapture-1.0 or the 13_multi_camera binary from the jetson_multimedia_api (libargus).

For my application, ignoring a corrupted frame once in a while would not be a problem; a complete crash of the application, however, is not acceptable. After restarting nvargus-daemon I am able to restart the application.

What I have tried so far:

  • I tested (l4t-r35.5.0 / jetpack-5.1.3) and got the same problem.
  • I tried the solution suggested in these two posts (post 1 & post 2) and applied the binaries from topic243051.
    • As a result, nvargus-daemon no longer crashes (although I occasionally get a segfault). However, the nvarguscamerasrc GStreamer element and the jetson_multimedia_api binaries still crash. I tried to modify them so that they would not abort on a frame timeout, but was unsuccessful; I could not find any online documentation to help with the implementation.
  • I tried to use V4L2 directly. This seemed not to be supported for the IMX219 CSI camera.

It would be great if you could suggest a way to handle corrupted frames so that a GStreamer application does not abort when a frame becomes corrupted.

Thanks for your help

Someone else will have to provide the answer, but nobody will be able to answer without some added information: a full serial console boot log, up to the point where the error occurs (one usually needs to see everything prior to the failure). The logs usually contain a stack frame and driver information. I don’t work with camera drivers, but this would help someone narrow down the issue.

I agree that a corrupted frame should not cause a crash, but if moving cables around can do this, then there might be some other electrical issue as well. It sounds like noise or interference, perhaps connectors not being “solid” in their connection. I don’t know, but even with this issue out of the way, I’d recommend you still figure out what is going on with the cables.

hello david.mueller1,

Are you on the same L4T version that those updates were built for?
Note that an L4T version mismatch might introduce unexpected failures.


BTW,

Is it due to intermittent MIPI signaling?
Let’s narrow down the issue by using v4l2 IOCTLs to check basic sensor functionality.
Please refer to Approaches for Validating and Testing the V4L2 Driver.
for instance,
$ v4l2-ctl -d /dev/video0 --set-fmt-video=width=1920,height=1080,pixelformat=RG10 --set-ctrl bypass_mode=0 --stream-mmap --stream-count=100

Thanks for your reply.

Are you on the same L4T version that those updates were built for?
Note that an L4T version mismatch might introduce unexpected failures.

Yes, I am running L4T r35.3.1 as well, so there is no difference there.

The instructions I provided for the desktop setup are there to make the problem easy to reproduce, so yes, they are for provoking corrupted MIPI signaling.

I ran the command you sent. The command works, but I do not get any output, so I adjusted it to save the stream to a file using:

v4l2-ctl -d /dev/video0 --set-fmt-video=width=1920,height=1080,pixelformat=RG10 --set-ctrl bypass_mode=0 --stream-mmap --stream-count=1 --stream-to=image.test

However, when trying to open the image I get the following error:

Fatal error reading PNG image file: NOT a PNG file

For your reference, I uploaded the output of trying to open the image:

hello david.mueller1,

That v4l2-ctl command fetches the raw sensor stream; it saves the bayer raw data directly.
Hence, you may try third-party applications to view the raw files, such as 7yuv.
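If installing a raw viewer is inconvenient, a quick conversion can also be done by hand. The sketch below assumes (this is an assumption, not confirmed in the thread) that the saved RG10 file stores each 10-bit bayer sample in a little-endian 16-bit word with no line padding; no debayering is done, so the result is an 8-bit grayscale mosaic, just enough for a visual sanity check:

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Convert a raw capture saved by v4l2-ctl (assumed RG10 layout: one 10-bit
// bayer sample per little-endian 16-bit word, no line padding) into an
// 8-bit grayscale PGM. The bayer mosaic remains visible, but exposure and
// scene content can be judged without a dedicated raw viewer.
bool rawToPgm(const char* inPath, const char* outPath, int width, int height)
{
    std::FILE* in = std::fopen(inPath, "rb");
    if (!in)
        return false;
    std::vector<uint16_t> buf(static_cast<size_t>(width) * height);
    size_t n = std::fread(buf.data(), sizeof(uint16_t), buf.size(), in);
    std::fclose(in);
    if (n != buf.size())
        return false;  // file shorter than width*height samples

    std::FILE* out = std::fopen(outPath, "wb");
    if (!out)
        return false;
    std::fprintf(out, "P5\n%d %d\n255\n", width, height);
    for (uint16_t v : buf)
        std::fputc((v >> 2) & 0xFF, out);  // scale 10-bit down to 8-bit
    std::fclose(out);
    return true;
}
```

The resulting .pgm opens in most image viewers; if the image looks like noise, the assumed word layout or resolution is probably wrong.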

OK, great, thanks. Unfortunately I was not able to make much progress on my side. Should I run the following command:

v4l2-ctl -d /dev/video0 --set-fmt-video=width=1920,height=1080,pixelformat=RG10 --set-ctrl bypass_mode=0 --stream-mmap --stream-count=1

with or without --stream-to?

I tried saving to a file and opening it in a YUV viewer such as YUVview, but I never got any result out of it.

Furthermore, do you have any update on running my GStreamer pipeline in a way that is not affected by corrupted frames? Or could I embed the v4l2-ctl command in a C++ executable?
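On the last point: the capture that v4l2-ctl performs can in principle be reproduced with the standard V4L2 mmap-streaming ioctls. A minimal sketch follows; note that the Tegra-specific bypass_mode control from the command above is omitted (it would need an extra VIDIOC_S_CTRL call with a Tegra control ID), and every error path simply returns false instead of aborting:

```cpp
#include <cstdio>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/videodev2.h>

// Minimal single-frame V4L2 capture, roughly equivalent to
//   v4l2-ctl -d /dev/video0 --set-fmt-video=width=1920,height=1080,pixelformat=RG10 \
//            --stream-mmap --stream-count=1
// Returns false on any failure instead of aborting the process.
bool captureOneFrame(const char* device, int width, int height)
{
    int fd = open(device, O_RDWR);
    if (fd < 0)
        return false;

    v4l2_format fmt{};
    fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    fmt.fmt.pix.width = width;
    fmt.fmt.pix.height = height;
    fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_SRGGB10;  // fourcc "RG10"
    if (ioctl(fd, VIDIOC_S_FMT, &fmt) < 0) { close(fd); return false; }

    v4l2_requestbuffers req{};
    req.count = 1;
    req.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    req.memory = V4L2_MEMORY_MMAP;
    if (ioctl(fd, VIDIOC_REQBUFS, &req) < 0) { close(fd); return false; }

    v4l2_buffer buf{};
    buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    buf.memory = V4L2_MEMORY_MMAP;
    buf.index = 0;
    if (ioctl(fd, VIDIOC_QUERYBUF, &buf) < 0) { close(fd); return false; }

    // Map the driver's capture buffer; it will hold the raw bayer data.
    void* mem = mmap(nullptr, buf.length, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, buf.m.offset);
    if (mem == MAP_FAILED) { close(fd); return false; }

    bool ok = false;
    int type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    if (ioctl(fd, VIDIOC_QBUF, &buf) == 0 &&
        ioctl(fd, VIDIOC_STREAMON, &type) == 0 &&
        ioctl(fd, VIDIOC_DQBUF, &buf) == 0)  // blocks until a frame arrives
    {
        std::printf("captured %u bytes\n", buf.bytesused);
        ioctl(fd, VIDIOC_STREAMOFF, &type);
        ok = true;
    }

    munmap(mem, buf.length);
    close(fd);
    return ok;
}
```

Because every failure is reported via the return value, a loop around captureOneFrame() would survive an intermittent MIPI error the same way v4l2-ctl does, at the cost of doing your own debayering downstream.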

hello david.mueller1,

Is it really fetching the sensor stream correctly?
Please try more frame counts, i.e. --stream-count=100; you should see a < printed by the v4l2 pipeline for each successfully captured frame.
And please also gather the kernel logs ($ dmesg --follow) to check whether any errors are reported.

Ah, OK. So the capture is indeed successful. I got the following output just running the command:

<<<<<<<<<<<<<<<<<<<<<21fps
<<<<<<<<<<<<<<<<<<<<<21fps
<<<<<<<<<<<<<<<<<<<<<21fps
<<<<<<<<<

However, I did not get any output file. When I do save to a file, I still cannot view the results.

Thanks

hello david.mueller1,

It does look like the capture succeeds; I assume there is no failure reported on the kernel side (i.e. $ dmesg), right?
Where did you write the frames to? Could it be a permission issue? Please try --stream-to=/tmp/, or another path without root restrictions.

I get the following output from dmesg:

[  140.500487] bwmgr API not supported
[  145.149776] bwmgr API not supported
[  154.897204] bwmgr API not supported
[  156.057182] bwmgr API not supported
[  168.193935] bwmgr API not supported
[  173.035599] bwmgr API not supported

I am writing the frames to the user’s home directory, so I have the required permissions.

Below, the output using 7yuv is displayed. I am obviously using the wrong decoding for the raw data, but the image does get captured.

Now, my preferred approach would still be to use GStreamer with nvarguscamerasrc. However, there the stream stops as soon as I get a corrupted frame. Is there a solution to this problem? Or is there a way to get the images as JPEG using GStreamer with v4l2src?

Thanks.

It looks like there is valid frame content, according to your 7yuv results.

May I know what the error logs are?
Is there intermittent MIPI signaling? Your v4l2-ctl run seems to show frames being output consistently.

According to the Camera Architecture Stack, v4l2src and nvarguscamerasrc use different pipelines.
You are using a bayer sensor, right?
You cannot use v4l2src to fetch a bayer sensor stream and convert it to a JPEG file directly.

Yes, I am using a bayer sensor.

OK, so I got 100 frames without any interruption, which is why v4l2-ctl did not fail. It usually takes some time before an intermittent MIPI signal occurs. However, I could try to reproduce one using v4l2-ctl and nvarguscamerasrc and send you the output of dmesg. Are these the error logs you are looking for?

hello david.mueller1,

It does look like intermittent MIPI signaling.
So, it is suggested to test with libargus and its error handling mechanism.
You may download the MMAPI, i.e. $ sudo apt install nvidia-l4t-jetson-multimedia-api
There is an Argus sample with error handling and error resiliency, i.e. /usr/src/jetson_multimedia_api/argus/samples/userAutoExposure/

Please see also similar discussion threads for reference:


BTW,
please gather the error logs as well; they help to dig into the issue explicitly.

So v4l2-ctl does recover from the intermittent MIPI signaling. Here is the output of dmesg:

[76743.365339] (NULL device *): vi_capture_control_message: NULL VI channel received
[76743.373040] t194-nvcsi 13e40000.host1x:nvcsi@15a00000: csi5_stream_close: Error in closing stream_id=2, csi_port=2
[76743.383699] (NULL device *): vi_capture_control_message: NULL VI channel received
[76743.391415] t194-nvcsi 13e40000.host1x:nvcsi@15a00000: csi5_stream_open: VI channel not found for stream- 2 vc- 0
[76743.402135] tegra-camrtc-capture-vi tegra-capture-vi: err_rec: successfully reset the capture channel
[76745.969890] tegra-camrtc-capture-vi tegra-capture-vi: uncorr_err: request timed out after 2500 ms
[76745.979025] tegra-camrtc-capture-vi tegra-capture-vi: err_rec: attempting to reset the capture channel
[76745.989333] (NULL device *): vi_capture_control_message: NULL VI channel received
[76745.997035] t194-nvcsi 13e40000.host1x:nvcsi@15a00000: csi5_stream_close: Error in closing stream_id=2, csi_port=2
[76746.007679] (NULL device *): vi_capture_control_message: NULL VI channel received
[76746.015385] t194-nvcsi 13e40000.host1x:nvcsi@15a00000: csi5_stream_open: VI channel not found for stream- 2 vc- 0
[76746.026090] tegra-camrtc-capture-vi tegra-capture-vi: err_rec: successfully reset the capture channel

The thread you mentioned is exactly the one I listed in my first message as something I have already tried. Using the updated camera drivers provided in that thread, I have now tried your suggested /usr/src/jetson_multimedia_api/argus/samples/userAutoExposure/.
However, I get the following error, which you can see below:

As you can see, as soon as I get an intermittent MIPI signal, the application throws an error and stops, whereas v4l2-ctl was able to recover.

hello david.mueller1,

The expectation is that the MIPI signal should come into the CSI brick continuously, without errors.
By default, the error handling mechanism aborts when a frame becomes corrupted.

As you can see, it is a sample app that checks EventTypes,
so you may modify the code to drop error frames and continue waiting for the next good frames.
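The suggested drop-and-continue behavior amounts to the following control flow. This is only a sketch: the names mirror the Argus Event API, but the event queue and provider objects are stubbed out here (the real types come from Argus/Event.h and require camera hardware), so only the logic of skipping error events instead of aborting is shown:

```cpp
#include <cstdio>
#include <queue>

// Stand-ins for the Argus event types declared in Argus/Event.h; in the real
// sample these are obtained from an IEventProvider / IEventQueue pair.
enum EventType
{
    EVENT_TYPE_ERROR,
    EVENT_TYPE_CAPTURE_STARTED,
    EVENT_TYPE_CAPTURE_COMPLETE
};

// Drain a (simulated) event queue until 'wanted' frames completed or the
// queue runs dry. Error events are logged and dropped rather than aborting
// the whole pipeline, which is what the stock samples do.
int runCaptureLoop(std::queue<EventType> events, int wanted)
{
    int completed = 0;
    while (completed < wanted && !events.empty())
    {
        EventType ev = events.front();
        events.pop();
        switch (ev)
        {
        case EVENT_TYPE_ERROR:
            // Stock samples exit here; instead, drop the corrupt frame
            // and keep waiting for the next good one.
            std::printf("error event: dropping frame, waiting for next\n");
            break;
        case EVENT_TYPE_CAPTURE_STARTED:
            break;  // informational only
        case EVENT_TYPE_CAPTURE_COMPLETE:
            ++completed;
            break;
        }
    }
    return completed;
}
```

In the real sample the loop body would also need to resubmit the capture request after an error event, since a corrupted frame may leave the request unfulfilled.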

Hello JerryChang

Yes, thanks. This is exactly what I thought. Do you have an example of how to adjust the code to drop error frames? I already tried that for an application using your EGLStream, but I was not able to find any documentation and therefore could not implement it myself.

hello david.mueller1,

There are three EventTypes: EVENT_TYPE_ERROR, EVENT_TYPE_CAPTURE_STARTED, and EVENT_TYPE_CAPTURE_COMPLETE.
Please see also the core Event types defined in Event.h.

Currently, the way corrupt frames are handled is that Argus reports them via EVENT_TYPE_ERROR; it locks the queue, stops the capture request, and aborts the pipeline, and the application has to shut down.
I am also curious why intermittent MIPI signaling occurs in your bayer sensor use case. Is it due to loose hardware connections?

Hello JerryChang

Yes, I already saw that. However, this is not where the application is failing. If you look at the terminal output, you can see the following: Failed to CaptureMetadata:

In the autoexposure sample you can see that the event has the completion type:

            if (iEvent->getEventType() == EVENT_TYPE_CAPTURE_COMPLETE) {
                frameCaptureLoop++;
                const IEventCaptureComplete* iEventCaptureComplete
                    = interface_cast<const IEventCaptureComplete>(event);
                EXIT_IF_NULL(iEventCaptureComplete, "Failed to get EventCaptureComplete Interface");

                const CaptureMetadata* metaData = iEventCaptureComplete->getMetadata();
                const ICaptureMetadata* iMetadata = interface_cast<const ICaptureMetadata>(metaData);
                EXIT_IF_NULL(iMetadata, "Failed to get CaptureMetadata Interface");

Please, is there a way to continue the application without needing to restart it, either for my error above or for EVENT_TYPE_ERROR? Otherwise, the use of MIPI cameras is not an option for us. I have checked the hardware connections several times. We have a robotics application that runs for hours, and aborting the application is not an option. We could, however, skip a frame.

Could you please provide a solution and/or sample application that is robust to intermittent MIPI signals?

As mentioned, so far the way corrupt frames are handled is that Argus reports them via EVENT_TYPE_ERROR; it locks the queue, stops the capture request, and aborts the pipeline, and the application has to shut down.

hello david.mueller1,

since userAutoExposure is an application that parses exposure settings via metadata,
it is by design that it force-exits with a failure reported (i.e. return 1;).

If it is known that the next frame will be a good buffer,
then for your use case of skipping a frame, please try revising the code to skip the current capture buffer, for example if (iMetadata == NULL) continue; to restart the while loop for the next frame.
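That revision can be sketched as follows. The CaptureMetadata here is a stub (in the real sample the interface is obtained via interface_cast&lt;const ICaptureMetadata&gt;, as in the snippet quoted above), so only the skip-instead-of-exit behavior is modeled:

```cpp
#include <cstdio>
#include <vector>

// Stub stand-in for the Argus CaptureMetadata; in the real code a corrupt
// frame shows up as interface_cast<const ICaptureMetadata> returning NULL.
struct CaptureMetadata
{
    bool valid;
};

// Process a simulated sequence of EVENT_TYPE_CAPTURE_COMPLETE events:
// 'true' means good metadata, 'false' means the metadata interface came
// back NULL (corrupt frame). Returns the number of good frames processed.
int processCompletedFrames(const std::vector<bool>& frames)
{
    int goodFrames = 0;
    for (bool ok : frames)
    {
        CaptureMetadata meta{ok};
        const CaptureMetadata* iMetadata = meta.valid ? &meta : nullptr;

        // Original sample: EXIT_IF_NULL(iMetadata, ...) aborts the app.
        // Revised behavior: skip the corrupt buffer and continue the loop.
        if (iMetadata == nullptr)
        {
            std::printf("corrupt frame skipped\n");
            continue;
        }
        ++goodFrames;
    }
    return goodFrames;
}
```

Whether this survives the "Failed to CaptureMetadata" case reported earlier in the thread depends on whether Argus has already locked the queue at that point; the sketch only shows the application-side change.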