Speeding up Streaming & Stitching

Hello everyone,

I’m running into latency in our GStreamer pipeline and I haven’t been able to find resources with helpful information on how to reduce or eliminate it. Does anyone know of additional GStreamer documentation on pipelines, or other methods to achieve this goal? Here’s the pipeline I’m using; we’re streaming three cameras simultaneously, and there’s roughly a 0.5 second lag.

DISPLAY=:0 gst-launch-1.0 -e \
  nvcamerasrc sensor-id=0 fpsRange="30 30" ! 'video/x-raw(memory:NVMM), width=(int)640, height=(int)480, format=(string)I420, framerate=(fraction)30/1' ! nvegltransform ! nveglglessink \
  nvcamerasrc sensor-id=2 fpsRange="30 30" ! 'video/x-raw(memory:NVMM), width=(int)640, height=(int)480, format=(string)I420, framerate=(fraction)30/1' ! nvegltransform ! nveglglessink \
  nvcamerasrc sensor-id=1 fpsRange="30 30" ! 'video/x-raw(memory:NVMM), width=(int)640, height=(int)480, format=(string)I420, framerate=(fraction)30/1' ! nvegltransform ! nveglglessink

The reason I need to speed this up is that I want to make sure the pipeline is optimized before the rest of my code stitches the feeds together. When I put the above pipeline into C++ and use the stitching function, there’s a major lag in producing the streamed, stitched output. My code is below; I was hoping to get some help, and also resources on how to improve it. I’ve worked on this for a while, and this isn’t the first version of the software.

I’m also having an issue getting three camera inputs into the stitched feed. The output only shows two cameras; if I put my hand in front of the third camera, its image appears for a few seconds and then the output goes back to the two-camera feed.
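A quick check I can add after the stitch call (a minimal sketch against the code below, using its `test` and `imgs` names) is to ask the stitcher which inputs actually made it into the panorama:

	// After test->stitch(imgs, pano) returns OK, component() lists the
	// indices of the inputs that were actually composited. If the third
	// camera has no feature overlap with the other two, its index is
	// simply missing here.
	vector<int> used = test->component();
	cout << "inputs composited: " << used.size() << " of " << imgs.size() << endl;

If the count only reaches three when something feature-rich (like my hand) is in front of the third camera, that would point at insufficient overlap between the cameras rather than a capture problem.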

I’m also not convinced the problem isn’t my JetPack setup and/or contention from other software applications. I run “./jetson_clocks.sh” and “nvpmodel -m 0” every time I boot the Jetson TX2.

#include <opencv2/opencv.hpp>
#include <opencv2/stitching.hpp>
#include <iostream>
#include <vector>

using namespace std;
using namespace cv;

Stitcher::Mode mode = Stitcher::PANORAMA;

// Grabs one frame from each camera, stitches the pair, and displays the
// result. Returns 0 on success, -1 on a stitching error or a key press.
int streaming(VideoCapture &cap1, VideoCapture &cap2)
{
	Mat fr1, fr2, pano;
	bool try_use_gpu = true;
	vector<Mat> imgs;

	cap1 >> fr1;
	cap2 >> fr2;
	imgs.push_back(fr1);
	imgs.push_back(fr2);

	// Note: this creates a new Stitcher and re-runs the full registration
	// (feature finding and matching) on every single frame.
	Ptr<Stitcher> test = Stitcher::create(mode, try_use_gpu);
	Stitcher::Status status = test->stitch(imgs, pano);

	if (status != Stitcher::OK)
	{
		cout << "Error stitching - Code: " << int(status) << endl;
		return -1;
	}

	imshow("Stitched Image", pano);

	if (waitKey(1) >= 0)
		return -1;

	return 0;
}

int main(int argc, char *argv[])
{
	VideoCapture cap1("nvcamerasrc sensor-id=0 ! video/x-raw(memory:NVMM), width=(int)640, height=(int)480, format=(string)I420, framerate=(fraction)30/1 ! nvvidconv flip-method=0 ! video/x-raw, format=(string)BGRx ! videoconvert ! video/x-raw, format=(string)BGR ! appsink", CAP_GSTREAMER); //middle output

	VideoCapture cap2("nvcamerasrc sensor-id=2 ! video/x-raw(memory:NVMM), width=(int)640, height=(int)480, format=(string)I420, framerate=(fraction)30/1 ! nvvidconv flip-method=0 ! video/x-raw, format=(string)BGRx ! videoconvert ! video/x-raw, format=(string)BGR ! appsink", CAP_GSTREAMER); //output closest to center of board

	if(!cap1.isOpened()){
		cout << "connection cap1 failed" << endl;
		return -1;
	}

	if(!cap2.isOpened()){
		cout << "connection cap2 failed" << endl;
		return -1;
	}
	
	while(true)
	{
		int test = streaming(cap1, cap2);
		cout << "test: " << test << endl;

		// streaming() returns -1 on a stitch failure or key press,
		// so stop the loop instead of spinning forever.
		if (test < 0)
			break;
	}

    	return 0;
}
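One variant I want to try next (an untested sketch; it assumes the cameras are rigidly mounted so the registration can be computed once, and the `drop=true max-buffers=1` appsink settings are an assumption meant to stop appsink from queueing stale frames): run estimateTransform() once on an initial set of frames, then call composePanorama() per frame, with all three sensors opened:

#include <opencv2/opencv.hpp>
#include <opencv2/stitching.hpp>
#include <iostream>
#include <string>
#include <vector>

using namespace std;
using namespace cv;

// Builds the same appsink pipeline as above for a given sensor id.
// drop=true / max-buffers=1 ask appsink to keep only the newest frame.
static string cameraPipeline(int sensor)
{
	return "nvcamerasrc sensor-id=" + to_string(sensor) +
	       " ! video/x-raw(memory:NVMM), width=(int)640, height=(int)480,"
	       " format=(string)I420, framerate=(fraction)30/1"
	       " ! nvvidconv flip-method=0 ! video/x-raw, format=(string)BGRx"
	       " ! videoconvert ! video/x-raw, format=(string)BGR"
	       " ! appsink drop=true max-buffers=1";
}

int main()
{
	VideoCapture cap0(cameraPipeline(0), CAP_GSTREAMER);
	VideoCapture cap1(cameraPipeline(1), CAP_GSTREAMER);
	VideoCapture cap2(cameraPipeline(2), CAP_GSTREAMER);
	if (!cap0.isOpened() || !cap1.isOpened() || !cap2.isOpened()) {
		cout << "failed to open a camera" << endl;
		return -1;
	}

	Ptr<Stitcher> stitcher = Stitcher::create(Stitcher::PANORAMA, true);
	vector<Mat> imgs(3);
	Mat pano;

	// Register once: feature finding and matching happen only here.
	cap0 >> imgs[0]; cap1 >> imgs[1]; cap2 >> imgs[2];
	if (stitcher->estimateTransform(imgs) != Stitcher::OK) {
		cout << "initial registration failed" << endl;
		return -1;
	}

	// Per frame: reuse the stored transforms, only warp and blend.
	while (true) {
		cap0 >> imgs[0]; cap1 >> imgs[1]; cap2 >> imgs[2];
		if (imgs[0].empty() || imgs[1].empty() || imgs[2].empty())
			break;
		if (stitcher->composePanorama(imgs, pano) != Stitcher::OK)
			break;
		imshow("Stitched Image", pano);
		if (waitKey(1) >= 0)
			break;
	}
	return 0;
}

composePanorama() still pays for warping and blending every frame, but it skips the per-frame feature finding and matching that a fresh stitch() redoes.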

Thanks for any help and advice.

hello HoosierBeav,

  1. how did you evaluate the “0.5 second lag”, and what’s your performance expectation?
  2. you may break down the code for profiling to see which statement takes the most system resources.

Hi JerryChang,

  1. The way this “0.5 second lag” was measured: we put a running timer on the computer screen that is connected to the TX2, aimed all three cameras at that screen, and took a picture of the screen with my phone camera. The difference between the live timer and the timer shown in the three feeds gives the lag.

  2. I’m not sure that I completely understand the question, but if I do, the stitcher function is probably using the most system resources.

Thanks

hello HoosierBeav,

  1. may i know which JetPack release package you’re working on?
    FYI, we have improvements for the initial capture latency in R28.2.

  2. since you’re measuring the capture-to-display latency,
    my suggestion for breaking down where the time goes is to add some debug prints between function calls and evaluate which call takes the most time.
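    for example, something along these lines (just a rough sketch mirroring the structure of the streaming() function above; cap1/cap2 are assumed to be opened as in your main()):

#include <opencv2/opencv.hpp>
#include <opencv2/stitching.hpp>
#include <chrono>
#include <iostream>
#include <vector>

using namespace std;
using namespace std::chrono;
using namespace cv;

// One instrumented iteration: timestamps around each stage show
// whether capture, stitching, or display dominates the latency.
void profiledIteration(VideoCapture &cap1, VideoCapture &cap2)
{
	vector<Mat> imgs(2);
	Mat pano;

	auto t0 = steady_clock::now();
	cap1 >> imgs[0];
	cap2 >> imgs[1];
	auto t1 = steady_clock::now();

	Ptr<Stitcher> stitcher = Stitcher::create(Stitcher::PANORAMA, true);
	Stitcher::Status status = stitcher->stitch(imgs, pano);
	auto t2 = steady_clock::now();

	if (status == Stitcher::OK)
		imshow("Stitched Image", pano);
	waitKey(1);
	auto t3 = steady_clock::now();

	cout << "capture: " << duration_cast<milliseconds>(t1 - t0).count() << " ms"
	     << ", stitch: " << duration_cast<milliseconds>(t2 - t1).count() << " ms"
	     << ", display: " << duration_cast<milliseconds>(t3 - t2).count() << " ms" << endl;
}

    comparing the three numbers over many iterations should show where most of the time goes.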

Hi JerryChang,

I am HoosierBeav’s teammate; we have JetPack 3.1 with L4T Production Release 28.1. We have already tried JetPack 3.2 and still have the latency issue.

Thanks for the suggestion about debugging the latency; we are working on it and looking forward to the results.