Reaching minimal camera latency with libargus on Jetson Xavier NX

Hello,
My team and I are working on a computer vision system that runs on a Jetson Xavier NX. It is very important for us to have low latency from the moment an image is taken to the moment it is available in our code.
The system uses a Raspberry Pi Camera v2 connected to the board over CSI. At first our code used OpenCV with GStreamer to launch the video pipeline, in a manner similar to the following:

pipeline = "nvarguscamerasrc sensor_mode=2 ! nvvidconv flip-method=0 ! video/x-raw, width=1920, height=1080 ! nvvidconv ! appsink";
VideoCapture cap;
cap.open(pipeline, CAP_GSTREAMER);
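
For completeness, a fuller sketch of how such a pipeline can be opened and read is below. The explicit caps, the BGRx/BGR conversion and the appsink options are illustrative additions for this post, not exactly what we measured with:

// Illustrative sketch only: explicit caps and appsink options added for clarity;
// our measured pipeline is the shorter string shown above.
#include <opencv2/opencv.hpp>
#include <iostream>

int main() {
    std::string pipeline =
        "nvarguscamerasrc sensor_mode=2 ! "
        "video/x-raw(memory:NVMM), width=1920, height=1080, framerate=30/1 ! "
        "nvvidconv flip-method=0 ! video/x-raw, format=BGRx ! "
        "videoconvert ! video/x-raw, format=BGR ! "
        "appsink drop=true max-buffers=1";  // drop stale frames to keep latency low

    cv::VideoCapture cap(pipeline, cv::CAP_GSTREAMER);
    if (!cap.isOpened()) {
        std::cerr << "Failed to open GStreamer pipeline" << std::endl;
        return 1;
    }

    cv::Mat frame;
    for (int i = 0; i < 100 && cap.read(frame); ++i) {
        // frame is a BGR cv::Mat in CPU memory at this point; process it here
    }
    return 0;
}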

We conducted glass-to-glass latency tests to establish how long it takes for data to travel from the camera to our application, and the results were unsatisfying: 90 ms per frame.
Therefore we decided to check what we could gain by using pure libargus from the Jetson Multimedia API. As far as we understand, using libargus to communicate with the camera lets us benefit from NVIDIA hardware acceleration. We started from the argus_oneshot sample that ships with the Jetson Multimedia API samples. Below is a piece of that source code, modified to measure (as we believe) the time it takes this solution to gather a frame from the camera (data transfer):

    for (size_t i = 0; i < 30; i++) {
        typedef std::chrono::high_resolution_clock Clock;
        auto t1 = Clock::now();

        uint32_t requestId = iSession->capture(request.get());
        EXIT_IF_NULL(requestId, "Failed to submit capture request");

        /*
        * Acquire a frame generated by the capture request, get the image from the frame
        * and create a .JPG file of the captured image
        */
        Argus::UniqueObj<EGLStream::Frame> frame(
            iFrameConsumer->acquireFrame(FIVE_SECONDS_IN_NANOSECONDS, &status));

        auto t2 = Clock::now();
        float currentDuration = float(std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count()) / 1000.0f;
        std::cout << "Time for the frame to arrive: " << currentDuration << " ms" << std::endl;

        EGLStream::IFrame *iFrame = Argus::interface_cast<EGLStream::IFrame>(frame);
        EXIT_IF_NULL(iFrame, "Failed to get IFrame interface");

        EGLStream::Image *image = iFrame->getImage();
        EXIT_IF_NULL(image, "Failed to get Image from iFrame->getImage()");
        
        t2 = Clock::now();
        currentDuration = float(std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count()) / 1000.0f;
        std::cout << "Time taken for one iteration: " << currentDuration << " ms" << std::endl;

        EGLStream::IImageJPEG *iImageJPEG = Argus::interface_cast<EGLStream::IImageJPEG>(image);
        EXIT_IF_NULL(iImageJPEG, "Failed to get ImageJPEG Interface");

        status = iImageJPEG->writeJPEG(FILE_PREFIX "argus_oneShot.jpg");
        EXIT_IF_NOT_OK(status, "Failed to write JPEG");

        printf("Wrote file: " FILE_PREFIX "argus_oneShot.jpg\n");
    }
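
We also wondered whether one-shot captures are a fair way to measure this, since each iteration submits a fresh request. A sketch of the streaming-style measurement we have in mind is below; it reuses the same iSession, iFrameConsumer, request and status objects as the sample above, and we have not verified it end to end:

        // Sketch (untested): measure steady-state frame arrival with a repeating
        // request instead of one-shot captures. Objects are set up as in argus_oneshot.
        EXIT_IF_NOT_OK(iSession->repeat(request.get()), "Failed to start repeating capture");

        for (size_t i = 0; i < 30; ++i) {
            auto t1 = std::chrono::high_resolution_clock::now();

            // Block until the next frame produced by the repeating request is available
            Argus::UniqueObj<EGLStream::Frame> frame(
                iFrameConsumer->acquireFrame(FIVE_SECONDS_IN_NANOSECONDS, &status));
            EXIT_IF_NOT_OK(status, "Failed to acquire frame");

            auto t2 = std::chrono::high_resolution_clock::now();
            float ms = std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count() / 1000.0f;
            std::cout << "Frame " << i << " acquired after " << ms << " ms" << std::endl;
        }

        iSession->stopRepeat();
        iSession->waitForIdle();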

We ran it in a loop to see whether the results improve after a while. We measured the time for two situations:

  1. from capture to acquiring a frame
  2. from capture to getImage()

This allowed us to see not only the overall time but also how long the getImage() call itself takes.

Below are the results:

Time for the frame to arrive: 237.22 ms
Time taken for one iteration: 237.303 ms

These are even slower than what we achieved with the OpenCV solution. Here are our questions:

  1. Does using libargus with a CSI camera like the Raspberry Pi Camera v2 allow us to benefit from NVIDIA hardware acceleration for video processing? What is the lowest achievable latency for data transfer over a 2-lane CSI connection?
  2. If the answer to the first question is yes, then what are we doing wrong to get such slow times?
  3. Did we measure the data-transfer time with libargus correctly, or is our measurement approach flawed?
  4. Can we benefit from hardware acceleration in OpenCV by launching a proper GStreamer pipeline?
  5. What does the getImage() method do? We couldn't find its source code.

By the way, we are aware that moving to 4-lane communication could bring some improvement, but right now that is not possible for us.


Assuming that your framerate is 30 fps, i.e. a 33 ms frame period: for a glass-to-glass measurement, one frame is being captured while the previous one is being debayered by Argus, then converted by VIC and passed to your app, while the frame before that is being displayed on the monitor. That is roughly three frame periods in flight, about 3 × 33 ms ≈ 100 ms, so I think 90 ms is a pretty good glass-to-glass latency.

You may use a higher framerate to lower it further. The RPi v2 camera can do 60 fps in 720p mode, or you can achieve 120 fps by rebuilding the kernel and dtb (see 120 fps mode support removed for imx219 sensor - #9 by Honey_Patouceul for the original post and Gst-launch-1.0 nvarguscamerasrc can't get right GST_ARGUS: Available Sensor modes - #12 by Honey_Patouceul for recent JP5 releases).
Also be sure of your monitor(s) framerate(s) for the experiments.
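
For example, a pipeline requesting the 720p60 mode might look like the sketch below; the caps and appsink options are illustrative only, and you should check which modes your camera actually reports at startup:

// Illustrative only: request 720p at 60 fps; verify the available sensor modes
// printed by nvarguscamerasrc before relying on these caps.
std::string pipeline =
    "nvarguscamerasrc ! "
    "video/x-raw(memory:NVMM), width=1280, height=720, framerate=60/1 ! "
    "nvvidconv ! video/x-raw, format=BGRx ! "
    "videoconvert ! video/x-raw, format=BGR ! "
    "appsink drop=true max-buffers=1";

cv::VideoCapture cap(pipeline, cv::CAP_GSTREAMER);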

Does it make a difference whether one uses the Argus library directly or GStreamer with nvarguscamerasrc?
Or should they have roughly similar performance?

I think that nvarguscamerasrc and the other NVIDIA GStreamer elements have a pretty good implementation on top of the Argus library.
In some particular cases, though, a skilled Argus programmer may achieve better performance using Argus directly.
