Libargus handling of bad camera packets

We are currently facing an issue with capture error handling in libargus. Our cameras are connected to a Jetson Xavier AGX through an FPD-Link III SerDes. We use the libargus FrameConsumer method acquireFrame to grab a frame from the existing buffer. Unfortunately, if this call returns anything other than STATUS_OK, we are unable to acquire another frame.

So far I have tried reinitializing all of the libargus CaptureSession classes to reacquire a frame without restarting nvargus-daemon.service, but this has been unsuccessful. The only solution we have found is to close our video capture application, restart nvargus-daemon.service, and then start the capture application again.
I did find this forum post about adding libargus error handling, which describes an issue similar to ours:

The overall question: Is there any error handling in libargus for bad camera packets that does not involve restarting libargus and our camera application?

Hi janderson,

Unfortunately, there is no other solution. It appears that the Argus frame capture method holds locks (mutexes) when the frame receive error is detected and does not release them, which prevents any further processing: almost all other API methods hang indefinitely. The problem is in libargus, so both the Argus daemon and applications using libargus are affected. We ended up not using Argus any longer.

Apparently, NVidia is not willing or not able to find and fix the error. The closed-source strategy prevents anyone else from doing NVidia’s work as well, so the issue is stuck.

Kind regards.

This should have been fixed in JetPack 4.5 and later. Which JetPack version are you using?

We are currently using Jetpack 4.6.

Hi,
The error handling is in the sample of Jetpack 4.6.3:

/usr/src/jetson_multimedia_api/samples/09_camera_jpeg_capture

Please get the sample of Jetpack 4.6.3 and try to run it on your Jetpack 4.6 system, and see if the error is captured and reported. The error status is defined in

void ConsumerThread::printErrorStatus(Argus::Status status);

I was running various error tests today with the 09_camera_jpeg_capture example, and I seem to be able to recover from Argus::STATUS_TIMEOUT errors most of the time by going into software standby, waiting a few seconds, and then going back into streaming mode. I was wondering whether there are any other error types that libargus should be able to recover from.
If so, is there a way to simulate these types of errors easily and reliably?
Also, is libargus able to reset the VI when it receives any of these errors?

Thanks!
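
The standby, wait, resume sequence described in the post above could be structured roughly as follows. This is only a sketch: CaptureStatus is a local stand-in for the two Argus status values named in this thread, and StreamControl, enterStandby, resumeStreaming, and handleCaptureStatus are hypothetical names for whatever hooks the application already has. None of them are libargus APIs.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Stand-in for the Argus status values named in this thread (STATUS_OK and
// STATUS_TIMEOUT). The real enum comes from the Argus headers; this local
// copy only keeps the sketch self-contained.
enum class CaptureStatus { Ok, Timeout, Other };

// Hypothetical hooks supplied by the application.
struct StreamControl {
    std::function<void()> enterStandby;     // put the sensor into software standby
    std::function<bool()> resumeStreaming;  // returns true if streaming restarted
};

// Returns true if capture can continue, false if the caller should shut down.
bool handleCaptureStatus(CaptureStatus status, const StreamControl &ctl,
                         std::chrono::milliseconds settleTime)
{
    if (status == CaptureStatus::Ok)
        return true;
    if (status != CaptureStatus::Timeout)
        return false;  // only timeouts were reported as recoverable here

    assert(ctl.enterStandby && ctl.resumeStreaming);
    ctl.enterStandby();
    std::this_thread::sleep_for(settleTime);  // "waiting a few seconds"
    return ctl.resumeStreaming();
}
```

A consumer thread would call handleCaptureStatus with the status it got back from its frame acquisition and stop capturing when the function returns false. The settle time is a tuning parameter; the thread only says that a few seconds worked most of the time.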

FYI,
Here is another way to test this: you can simulate the failure in software.
For example, you can use a software command to force-stop the camera stream,
i.e. # echo 0 > /sys/kernel/debug/camera-video0/streaming
This force-stops the video stream from the software side, which causes timeout failures in the camera pipeline. Argus reports them via EVENT_TYPE_ERROR, and the application has to shut down.
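
For automated testing, the same write can be issued from a small helper instead of a shell. The sketch below assumes the debugfs path quoted above exists on the target and that the test runs as root; writeNode and forceStopStream are hypothetical helper names, not part of any NVIDIA API.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Path taken from the post above; it requires root and a mounted debugfs.
const std::string kStreamingNode = "/sys/kernel/debug/camera-video0/streaming";

// Writes `value` followed by a newline to `path`, like `echo value > path`.
// Returns true only if the node could be opened and the write was flushed.
bool writeNode(const std::string &path, const std::string &value)
{
    assert(!path.empty());
    std::ofstream node(path);
    if (!node.is_open())
        return false;
    node << value << '\n';
    node.flush();
    return node.good();
}

// Hypothetical test-harness helper: force-stop the stream by writing 0.
bool forceStopStream(const std::string &path = kStreamingNode)
{
    return writeNode(path, "0");
}
```

Taking the path as a parameter keeps the helper testable against an ordinary file on a development machine, where the debugfs node is not present.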


Thanks for all of the suggestions. I am now able to get error recovery working on the example in /usr/src/jetson_multimedia_api/samples/09_camera_jpeg_capture. I slightly modified the program so that main now looks like this:

int main(int argc, char * argv[])
{
    if (!parseCmdline(argc, argv))
    {
        printHelp();
        return EXIT_FAILURE;
    }

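    NvEglRenderer *renderer = NULL;

    // Create the renderer, run one capture session, and tear everything down,
    // NUM_RESTARTS times in a row.
    for (int i = 0; i < NUM_RESTARTS; i++)
    {
        std::cout << "Starting capture session " << i << " ..." << std::endl;

        renderer = NvEglRenderer::createEglRenderer("renderer0", PREVIEW_SIZE.width(),
                                            PREVIEW_SIZE.height(), 0, 0);

        if (!renderer)
            ORIGINATE_ERROR("Failed to create EGLRenderer.");

        if (!ArgusSamples::execute(renderer))
        {
            delete renderer; // do not leak the renderer when a session fails
            return EXIT_FAILURE;
        }

        std::cout << "main: Deleting renderer..." << std::endl;
        delete renderer;

        // Clock, start, end, diff, and deinit_list are not declared in this
        // excerpt; they are presumably globals of the modified sample.
        end = Clock::now();
        diff = end - start;
        std::cout << "execute:  Time to deinitialize the camera (s): " << diff.count() << std::endl;
        deinit_list.push_back(diff.count());
    }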
    //Print init and deinit times
    print_init_times();

    std::cout << "main: Exiting successfully..." << std::endl;
    return EXIT_SUCCESS;
}

The gist of this is that it now initializes the camera, runs the stream for a certain amount of time, and then deletes all of the libargus objects. This is repeated for a set number of runs.
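
The listing uses Clock, start, end, diff, and deinit_list without showing their declarations. One way the timing and restart bookkeeping could be arranged is sketched below, assuming std::chrono is used for the measurement. recordDeinit and runSessions are hypothetical helpers, and the session and teardown callbacks stand in for ArgusSamples::execute and the renderer deletion.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <vector>

using Clock = std::chrono::steady_clock;

// Seconds spent tearing down each session, one entry per run.
std::vector<double> deinit_list;

// Times `teardown` and appends the elapsed seconds to deinit_list.
double recordDeinit(const std::function<void()> &teardown)
{
    const Clock::time_point start = Clock::now();
    teardown();
    const std::chrono::duration<double> diff = Clock::now() - start;
    deinit_list.push_back(diff.count());
    return diff.count();
}

// Runs up to numRestarts sessions and stops at the first failure.
// Returns the number of sessions that completed successfully.
int runSessions(int numRestarts, const std::function<bool()> &session,
                const std::function<void()> &teardown)
{
    assert(numRestarts >= 0);
    int completed = 0;
    for (int i = 0; i < numRestarts; ++i) {
        const bool ok = session();
        recordDeinit(teardown);  // tear down even when the session failed
        if (!ok)
            break;
        ++completed;
    }
    return completed;
}
```

steady_clock is used here because it is monotonic, so the measured durations are not affected by wall-clock adjustments during a long test run.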
I am able to restart the stream for around 5 runs or so using this main routine. After that, I need to restart nvargus using systemctl. I have a couple of questions regarding libargus:

  • When libargus restarts, does it reset the Video Interface (VI) on the Xavier?
  • Once a frame error is detected, is there a way to restart the VI manually, without going through libargus and without reinitializing all of the libargus objects, so that the latency between dropped frames is reduced?

Thanks!

hello janderson,

Once there are timeout failures from the camera pipeline, you should report the error flag EVENT_TYPE_ERROR and shut down the application.
Using systemctl to restart Argus is a safe way to restore broken sockets; it recovers from the failure most of the time.

No, there is no way to restart the VI manually. It is actually not necessary, since the kernel driver will forcibly terminate the process.
Low-level failures are usually signal related, for example a frame with errors that the VI driver does not accept. You need to review such failure cases carefully.
