TX1 multi-channel decoding and video stitching

I am working on a project that needs to decode four 1920x1080 30 fps video streams and stitch the decoded video together for display.

I currently do this with OpenCV, but when I decode and stitch four streams using filesrc, CPU usage is maxed out and I get only about 7 frames per second.

What I need is smooth display of the stitched video at 30 fps or more. Can the TX1 platform meet this requirement in terms of performance? I know the VIC module can do this kind of work, but I have not found any API for calling the VIC module. The Multimedia API has some VIC control code, but no API related to video stitching.

Please help me, or give me some advice. Thanks a lot!

I am using OpenCV 3.2.0, built with the following configuration:

cmake -DWITH_GSTREAMER=ON -DWITH_OPENGL=ON -DWITH_GTK_2_X=ON -DWITH_QT=ON -DWITH_CUDA=ON -DWITH_CUBLAS=ON -DWITH_NVCUVID=ON -DCUDA_ARCH_BIN="5.3" -DCUDA_ARCH_PTX="" -DBUILD_TESTS=OFF -DBUILD_PERF_TESTS=OFF -DCUDA_FAST_MATH=ON -DCMAKE_INSTALL_PREFIX=/home/ubuntu//workplace/test/opencv-3.2.0 ..

Here is my OpenCV test program:

#include <stdio.h>
#include <unistd.h>
#include <stdint.h>
#include <iostream>
#include <opencv2/opencv.hpp>

using namespace cv;
using namespace std;

int main(int argc, char** argv )
{
    // Open the four input files; each capture is checked with its own isOpened().
    cv::VideoCapture cap("filesrc location=camera_1080P30_8M.mp4 ! decodebin ! videoconvert ! video/x-raw, format=(string)BGR, framerate=30/1 ! appsink -e");
    if (!cap.isOpened()) {
        std::cout << "read cam failed (stream 0)" << std::endl;
        return -1;
    }
    cv::VideoCapture cap1("filesrc location=camera_1080P30.mp4 ! decodebin ! videoconvert ! video/x-raw, format=(string)BGR, framerate=30/1 ! appsink -e");
    if (!cap1.isOpened()) {
        std::cout << "read cam failed (stream 1)" << std::endl;
        return -1;
    }
    cv::VideoCapture cap2("filesrc location=test.mp4 ! decodebin ! videoconvert ! video/x-raw, format=(string)BGR, framerate=30/1 ! appsink -e");
    if (!cap2.isOpened()) {
        std::cout << "read cam failed (stream 2)" << std::endl;
        return -1;
    }
    cv::VideoCapture cap3("filesrc location=test1.mp4 ! decodebin ! videoconvert ! video/x-raw, format=(string)BGR, framerate=30/1 ! appsink -e");
    if (!cap3.isOpened()) {
        std::cout << "read cam failed (stream 3)" << std::endl;
        return -1;
    }

    Mat frame, frame1, frame2, frame3;
    Mat combine1, combine2, combine;
    namedWindow("edges", WINDOW_OPENGL);

    for (;;)
    {
        const int64 start = getTickCount();

        cap >> frame;   // get a new frame from each file
        cap1 >> frame1;
        cap2 >> frame2;
        cap3 >> frame3;

        // Stop at end of stream: an empty frame cannot be stitched or displayed.
        if (frame.empty() || frame1.empty() || frame2.empty() || frame3.empty())
            break;

        hconcat(frame, frame1, combine1);     // top row: horizontal stitching
        hconcat(frame2, frame3, combine2);    // bottom row: horizontal stitching
        vconcat(combine1, combine2, combine); // stack the two rows vertically

        imshow("edges", combine);
        waitKey(1);

        const double timeSec = (getTickCount() - start) / getTickFrequency();
        cout << "Time : " << timeSec * 1000 << " ms" << endl;
    }
    return 0;
}

At run time, all four CPU cores are at 100%:

RAM 2408/3995MB (lfb 61x4MB) cpu [100%,100%,100%,100%]@1734 GR3D 27%@76 EDP limit 0
RAM 2416/3995MB (lfb 61x4MB) cpu [100%,100%,100%,100%]@1734 GR3D 0%@153 EDP limit 0
RAM 2407/3995MB (lfb 61x4MB) cpu [100%,99%,100%,100%]@1734 GR3D 18%@76 EDP limit 0
RAM 2407/3995MB (lfb 61x4MB) cpu [100%,100%,100%,100%]@1734 GR3D 6%@76 EDP limit 0
RAM 2408/3995MB (lfb 61x4MB) cpu [100%,100%,100%,100%]@1734 GR3D 0%@460 EDP limit 0
RAM 2408/3995MB (lfb 61x4MB) cpu [100%,100%,100%,100%]@1734 GR3D 14%@76 EDP limit 0
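
For scale, here is a rough back-of-the-envelope estimate of how much raw BGR data the four pipelines hand to the application at the target rate. It is an estimate, not a measurement. It assumes 3 bytes per pixel, as requested by the format=(string)BGR caps, and rawBytesPerSecond is a hypothetical helper name used only for this illustration.

```cpp
#include <cstdint>
#include <iostream>

// Raw bytes per second for `streams` videos of width x height pixels at `fps`,
// with `bytesPerPixel` bytes per pixel (3 for BGR).
constexpr std::uint64_t rawBytesPerSecond(std::uint64_t width, std::uint64_t height,
                                          std::uint64_t bytesPerPixel,
                                          std::uint64_t fps, std::uint64_t streams)
{
    return width * height * bytesPerPixel * fps * streams;
}

// Print the estimate for the setup described above: 4 streams of 1080p30 in BGR.
void printEstimate(std::ostream& os)
{
    constexpr std::uint64_t perSec = rawBytesPerSecond(1920, 1080, 3, 30, 4);
    os << "4 x 1080p30 BGR: " << perSec / 1000000.0 << " MB/s" << std::endl;
}
```

That works out to 746,496,000 bytes per second (about 746 MB/s) at 30 fps, before counting the copies made in the program: hconcat copies every input frame once, and vconcat copies both rows again.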

A screenshot of the current display was attached here (image not available).

Hello li_lin,

Could you help us analyze the performance bottleneck?
Please time each function call to narrow down where the bottleneck is.
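
One possible way to do that, as a minimal sketch: wrap each stage of the loop (the four reads, the hconcat/vconcat calls, imshow plus waitKey) in a small timer and compare the averages. StageTimer below is a hypothetical helper written for illustration; it is not an OpenCV or Multimedia API class, and the stage names are arbitrary labels.

```cpp
#include <chrono>
#include <iostream>
#include <map>
#include <string>

// Hypothetical helper: accumulates wall-clock time per named stage.
class StageTimer {
public:
    // Run fn() and add its elapsed time (in milliseconds) to `stage`.
    template <typename Fn>
    void measure(const std::string& stage, Fn&& fn)
    {
        const auto t0 = std::chrono::steady_clock::now();
        fn();
        const auto t1 = std::chrono::steady_clock::now();
        Entry& e = stages_[stage];
        e.totalMs += std::chrono::duration<double, std::milli>(t1 - t0).count();
        ++e.calls;
    }

    // Total milliseconds recorded for `stage` (0 if never measured).
    double totalMs(const std::string& stage) const
    {
        const auto it = stages_.find(stage);
        return it == stages_.end() ? 0.0 : it->second.totalMs;
    }

    // Number of times `stage` was measured (0 if never measured).
    int calls(const std::string& stage) const
    {
        const auto it = stages_.find(stage);
        return it == stages_.end() ? 0 : it->second.calls;
    }

    // Print the average time per call for every stage.
    void report(std::ostream& os) const
    {
        for (const auto& kv : stages_)
            os << kv.first << ": " << kv.second.totalMs / kv.second.calls
               << " ms avg over " << kv.second.calls << " calls" << std::endl;
    }

private:
    struct Entry {
        double totalMs = 0.0;
        int calls = 0;
    };
    std::map<std::string, Entry> stages_;
};

// Intended use inside the display loop of the test program:
//   timer.measure("decode", [&] { cap >> frame; cap1 >> frame1; cap2 >> frame2; cap3 >> frame3; });
//   timer.measure("stitch", [&] { hconcat(frame, frame1, combine1);
//                                 hconcat(frame2, frame3, combine2);
//                                 vconcat(combine1, combine2, combine); });
//   timer.measure("display", [&] { imshow("edges", combine); waitKey(1); });
// and, every few hundred frames, timer.report(std::cout);
```

If the "decode" stage dominates, the time is going into the decodebin and videoconvert part of the pipeline rather than into the stitching or display.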
Thanks

Hi li_lin,

Have you made good progress on this issue?
Are there any experiment results you can share to move this issue forward?

Thanks