GPU Acceleration Support for OpenCV Gstreamer Pipeline

myagmur · July 24, 2020, 10:00am

Hi,

We run the following pipeline in OpenCV using the Raspberry Pi HQ camera.

import cv2

gst_pipeline = "nvarguscamerasrc ! video/x-raw(memory:NVMM), width=4032, height=3040, format=(string)NV12, framerate=30/1 ! nvvidconv flip-method=0 ! video/x-raw, width=4032, height=3040, format=(string)BGRx ! videoconvert ! video/x-raw, format=(string)BGR ! appsink"

cap = cv2.VideoCapture(gst_pipeline, cv2.CAP_GSTREAMER)
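For experimenting with this pipeline, it can help to generate the string from parameters instead of editing it by hand. The sketch below reproduces the pipeline above by default. The `out_width`/`out_height` and `drop_stale` knobs are hypothetical additions: the first lets nvvidconv (hardware-accelerated on Jetson) scale the frame before the CPU-side `videoconvert`, which shrinks the BGRx to BGR work; the second uses the standard appsink properties `drop` and `max-buffers` so a slow consumer gets the newest frame rather than a growing queue. That lowers latency but does not raise the frame rate.

```python
def argus_capture_pipeline(width=4032, height=3040, fps=30, flip_method=0,
                           out_width=None, out_height=None, drop_stale=False):
    """Build the nvarguscamerasrc -> OpenCV appsink pipeline from the question.

    Hypothetical knobs (not from the original post):
      out_width/out_height: size requested after nvvidconv, so scaling happens
        before the CPU-side videoconvert step.
      drop_stale: appsink drop=1 max-buffers=1, keep only the newest frame.
    """
    out_width = out_width or width
    out_height = out_height or height
    sink = "appsink drop=1 max-buffers=1" if drop_stale else "appsink"
    return " ! ".join([
        "nvarguscamerasrc",
        f"video/x-raw(memory:NVMM), width={width}, height={height}, "
        f"format=(string)NV12, framerate={fps}/1",
        f"nvvidconv flip-method={flip_method}",
        f"video/x-raw, width={out_width}, height={out_height}, format=(string)BGRx",
        "videoconvert",
        "video/x-raw, format=(string)BGR",
        sink,
    ])
```

Usage: `cap = cv2.VideoCapture(argus_capture_pipeline(out_width=2016, out_height=1520, drop_stale=True), cv2.CAP_GSTREAMER)`.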

The camera is supposed to deliver 12MP@30fps, but we only get 15fps at full camera resolution. We were told that 30fps is only achievable when capturing with the GStreamer pipeline by itself.

I wonder whether there is a way to use GPU acceleration with OpenCV to achieve better performance.

Would you also recommend any changes to this pipeline to improve overall performance, such as reducing CPU load, RAM usage and latency while keeping high visual quality?

Thank you

Honey_Patouceul · July 24, 2020, 11:17am

An application linked to OpenCV would only receive frames into CPU-allocated cv::Mat using OpenCV videoio. This is not very fast and would not keep up with high resolutions/framerates. The same applies to processing Mats on the CPU.

In such a case, you would rather do GPU processing on NVMM memory within GStreamer.

You can access NVMM buffers with the GStreamer plugin nvivafilter. It is intended to perform CUDA operations on NVMM-hosted frames, so you can use it with OpenCV CUDA. You would have to output RGBA frames from this plugin. You may have a look at this example.
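As a rough sketch, such a pipeline could be written as a launch string (for `gst-launch-1.0` or `Gst.parse_launch()`). The `cuda-process` and `customer-lib-name` properties follow the nvivafilter example in NVIDIA's Accelerated GStreamer guide, which loads `libnvsample_cudaprocess.so`; the library path you pass here would be your own CUDA/OpenCV processing library (hypothetical in this sketch), and the default display sink is an assumption. Check `gst-inspect-1.0 nvivafilter` on your L4T release.

```python
def nvivafilter_pipeline(custom_lib, width=4032, height=3040, fps=30,
                         sink="nvegltransform ! nveglglessink"):
    """Camera -> nvivafilter (CUDA on NVMM frames) -> sink, as a launch string.

    custom_lib: path to a hypothetical user library that nvivafilter loads
    through customer-lib-name. The caps after nvivafilter ask for RGBA frames
    that stay in NVMM memory, as suggested above.
    """
    return " ! ".join([
        "nvarguscamerasrc",
        f"video/x-raw(memory:NVMM), width={width}, height={height}, "
        f"format=(string)NV12, framerate={fps}/1",
        f'nvivafilter cuda-process=true customer-lib-name="{custom_lib}"',
        "video/x-raw(memory:NVMM), format=(string)RGBA",
        sink,
    ])
```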

Also note that you can access the GStreamer buffers directly if your application builds the GStreamer pipeline itself.

Honey_Patouceul · July 24, 2020, 1:43pm

Additional note: the main bottleneck is OpenCV videoio. Another alternative is to use @dusty_nv's jetson-utils library, which has a much more efficient implementation.
If you've built and installed jetson-inference, jetson-utils should already be installed on your Jetson. Note that this assumes a recent version with support for various video sources, so make sure you pulled a version after the end of June 2020.

The following example reads frames from a CSI camera, wraps each received image in an OpenCV GpuMat, converts RGB into HSV on the GPU, applies a binary threshold to the H channel, then converts back to RGB and finally displays the transformed frame:

#include <iostream>
#include <vector>

#include <jetson-utils/videoSource.h>
#include <jetson-utils/videoOutput.h>

#include <opencv2/opencv.hpp>
#include "opencv2/cudaarithm.hpp"
#include "opencv2/cudaimgproc.hpp" 


int main(int argc, char **argv) {

	// create input stream
	videoOptions opt;
	opt.width  = 3264;
	opt.height = 2464;
	opt.frameRate = 21;
	opt.zeroCopy = false; // GPU access only for better speed
	videoSource * input = videoSource::Create("csi://0", opt);
	if (!input) {
		std::cerr << "Error: Failed to create input stream" << std::endl;
		exit(-1);
	}


	// create output stream
	videoOutput* output = videoOutput::Create("display://0");
	if( !output ) {
		std::cerr << "Error: Failed to create output stream" << std::endl;
		delete input;
		exit(-2);
	}


	// Read one frame to get resolution
	uchar3* image = NULL;
	if( !input->Capture(&image, 1000) )
	{
		std::cerr << "Error: failed to capture first video frame" << std::endl;
		delete output;
		delete input;
		exit(3);
	}


	/*
	 * processing loop
	 */
	cv::cuda::GpuMat d_Mat_HSV(input->GetHeight(), input->GetWidth(), CV_8UC3);
	std::vector<cv::cuda::GpuMat> d_hsv(3);
	double prev = (double) cv::getTickCount();
	while( 1 )
	{
		// capture next image
		if( !input->Capture(&image, 1000) )
		{
			std::cerr << "Error: failed to capture video frame" << std::endl;
			continue;
		}
		// log timing
		double cur = (double) cv::getTickCount();
		double delta = (cur - prev) / cv::getTickFrequency();
		std::cout<<"delta=" << delta << std::endl;
		prev=cur;

		// Some OpenCV CUDA processing, done on the GPU
		cv::cuda::GpuMat frame_in(input->GetHeight(), input->GetWidth(), CV_8UC3, image);
		cv::cuda::cvtColor(frame_in, d_Mat_HSV, cv::COLOR_RGB2HSV);
		cv::cuda::split(d_Mat_HSV, d_hsv);
		cv::cuda::threshold(d_hsv[0], d_hsv[0], 100, 255, cv::THRESH_BINARY);
		cv::cuda::merge(d_hsv, d_Mat_HSV);
		cv::cuda::cvtColor(d_Mat_HSV, frame_in, cv::COLOR_HSV2RGB);

		// Display result
		output->Render((uchar3*)frame_in.data, input->GetWidth(), input->GetHeight());
		if( !output->IsStreaming() )
			break;
		if( !input->IsStreaming() )
			break;
	}

	delete input;
	delete output;
	return 0;
}

I built against opencv-4.4.0-pre installed in /usr/local/opencv-4.4.0-pre, so:

g++ -std=c++11 -Wall -I/usr/local/opencv-4.4.0-pre/include/opencv4 -I/usr/local/cuda/targets/aarch64-linux/include test-jetson-utils-opencv.cpp -L/usr/local/opencv-4.4.0-pre/lib -lopencv_core -lopencv_cudaarithm -lopencv_cudaimgproc -ljetson-utils -o test-jetson-utils-opencv

My camera can only run at 21fps with this resolution, but it seems to work fine.
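To compare figures such as 15fps, 21fps or 30fps on the Python/OpenCV side, a small moving-average meter does the same job as the `delta` logging in the C++ loop, but smooths the value over several frames. `FpsMeter` is a hypothetical helper, not part of OpenCV or jetson-utils.

```python
import time
from collections import deque


class FpsMeter:
    """Hypothetical helper: moving-average frame rate from frame-to-frame deltas.

    Same idea as delta = (cur - prev) / getTickFrequency() in the C++ example,
    averaged over the last `window` intervals so the printed value is steadier.
    """

    def __init__(self, window=30):
        self.deltas = deque(maxlen=window)  # keeps only the newest intervals
        self.prev = None

    def tick(self, now=None):
        """Record one frame; return the averaged fps (0.0 until two frames)."""
        now = time.perf_counter() if now is None else now
        if self.prev is not None:
            self.deltas.append(now - self.prev)
        self.prev = now
        total = sum(self.deltas)
        if total <= 0:
            return 0.0
        return len(self.deltas) / total
```

Usage in a capture loop: `ok, frame = cap.read()` followed by `print(meter.tick())`.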

Andrey1984 · July 25, 2020, 4:43pm

Trying with the default preinstalled OpenCV from JP_4.4_GA:
git clone https://github.com/dusty-nv/jetson-inference
cd jetson-inference
git submodule update --init
mkdir build
cd build
cmake ..
make -j8
sudo make install

g++ -std=c++11 -Wall -I/usr/local/opencv-4.3.0-dev/include/opencv4 -I/usr/local/cuda/targets/aarch64-linux/include test-jetson-utils-opencv.cpp -L/usr/local/opencv-4.3.0-dev/lib -lopencv_core -lopencv_cudaarithm -lopencv_cudaimgproc -ljetson-utils -o test-jetson-utils-opencv
 ./test-jetson-utils-opencv 
./test-jetson-utils-opencv: error while loading shared libraries: libopencv_cudaarithm.so.4.3: cannot open shared object file: No such file or directory
locate libopencv_cudaarithm.so.4.3
/usr/local/opencv-4.3.0-dev/lib/libopencv_cudaarithm.so.4.3
/usr/local/opencv-4.3.0-dev/lib/libopencv_cudaarithm.so.4.3.0

Previously, before installing jetson-inference, I used to build examples with the command below, which still works:

g++ -o simple_opencv -Wall -std=c++11 simple_opencv.cpp $(pkg-config --cflags --libs opencv4)
./simple_opencv 
GST_ARGUS: Creating output stream
CONSUMER: Waiting until producer is connected...

It seems some path is missing.

Honey_Patouceul · July 25, 2020, 4:46pm

It seems you’ve installed opencv into /usr/local/opencv-4.3.0-dev, which is not a default path for libs.
Try:

export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/opencv-4.3.0-dev/lib

and retry.

Andrey1984 · July 25, 2020, 4:47pm

It worked.
Thanks!

Andrey1984 · July 25, 2020, 5:00pm

Does the image quality seem reasonable? I'm using the default AGX CSI sensor with the code above as-is.

Honey_Patouceul · July 25, 2020, 5:05pm

The binary threshold on the Hue channel serves no purpose other than making this example. The result may depend on the lighting and the colors of the objects.
You can comment it out, or comment out the whole OpenCV processing, to check.

Andrey1984 · July 25, 2020, 5:15pm

I suspect that ~~OpenCV 4.3.0-dev got installed by the jetson-inference package, as before jetson-inference it was a default stock 4.4 GA OS with the preinstalled OpenCV~~. Probably I installed that OpenCV version somehow a while ago. I shall try again some other day when there is more sun, and also on another device [NX] that has the stock OS from 4.4.GP.
Thank you very much!

dusty_nv · July 26, 2020, 3:22am

Hi @Andrey1984, I no longer install OpenCV libraries in jetson-inference install script. And even when it did, it would have pulled OpenCV 3.2 (which is the version in Ubuntu Bionic apt repo).

Andrey1984 · July 26, 2020, 3:25am

Thank you for letting me know!
The hypothesis seems to have turned out to be a 'false positive'.

Honey_Patouceul · July 26, 2020, 10:44pm

@Andrey1984

If you are using the OV5693 sensor, you would also adjust the resolution and framerate for this sensor through nvarguscamerasrc, if not already done.
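A sketch of picking a sensor mode and building matching nvarguscamerasrc caps. The mode table is an assumption (what a stock dev kit's OV5693 is expected to report); copy the actual `GST_ARGUS: W x H FR = ...` lines your camera prints at startup. `pick_mode` and `argus_source_caps` are hypothetical helpers.

```python
# Assumed OV5693 modes (width, height, fps); replace with the
# "GST_ARGUS: W x H FR = ..." lines your own camera prints at startup.
OV5693_MODES = [(2592, 1944, 30), (2592, 1458, 30), (1280, 720, 120)]


def pick_mode(modes, min_width, min_height):
    """Hypothetical helper: smallest sensor mode covering the requested size."""
    fitting = [m for m in modes if m[0] >= min_width and m[1] >= min_height]
    if not fitting:
        raise ValueError(f"no sensor mode covers {min_width}x{min_height}")
    return min(fitting, key=lambda m: m[0] * m[1])


def argus_source_caps(mode):
    """nvarguscamerasrc plus NVMM caps matching one sensor mode."""
    width, height, fps = mode
    return ("nvarguscamerasrc ! video/x-raw(memory:NVMM), "
            f"width={width}, height={height}, "
            f"format=(string)NV12, framerate={fps}/1")
```

With jetson-utils, the same triple would go into `opt.width`, `opt.height` and `opt.frameRate` in the C++ example instead.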

Andrey1984 · July 26, 2020, 10:47pm

Yes, adjusted.
Different lighting conditions produce different images: some of them looked like segmentation, some had a clear picture. It depends on the lighting.

Honey_Patouceul · July 27, 2020, 7:11pm

Yes, Argus may try to optimize digital gain, exposure and white balance. This may lead to colors outside the expected range in low-light conditions.
I don't know of any software options to set these short of writing a dirty patch, so I'd advise keeping the lighting high, unless someone can tell how to do it.
Segmentation-like images are expected for a binary threshold on H. It should set color to be either green or red.
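Depending on the L4T release, nvarguscamerasrc may expose properties that constrain auto-exposure and white balance; on JetPack 4.x, `gst-inspect-1.0 nvarguscamerasrc` is expected to list `aelock`, `awblock`, `exposuretimerange`, `gainrange` and `ispdigitalgainrange`. Treat these names and the quoted "low high" value format as assumptions to verify on your system. `argus_source` is a hypothetical helper that builds the source element.

```python
def argus_source(sensor_id=0, exposure_ns=None, gain=None, isp_gain=None,
                 aelock=False, awblock=False):
    """Hypothetical helper: nvarguscamerasrc element with optional AE/AWB limits.

    exposure_ns, gain and isp_gain are (low, high) tuples; low == high pins the
    value. Property names are assumed from gst-inspect-1.0 on JetPack 4.x.
    """
    props = [f"sensor-id={sensor_id}"]
    if exposure_ns is not None:
        props.append('exposuretimerange="%d %d"' % exposure_ns)  # nanoseconds
    if gain is not None:
        props.append('gainrange="%g %g"' % gain)  # analog sensor gain
    if isp_gain is not None:
        props.append('ispdigitalgainrange="%g %g"' % isp_gain)  # ISP digital gain
    if aelock:
        props.append("aelock=true")
    if awblock:
        props.append("awblock=true")
    return "nvarguscamerasrc " + " ".join(props)
```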

LoveNvidia · August 4, 2020, 11:47am

Hi @dusty_nv, @Honey_Patouceul
I used GStreamer + OpenCV to decode RTSP streams in Python, like this:

import cv2

gstream_elements = (
    'rtspsrc location=rtsp latency=300 ! '
    'rtph264depay ! h264parse ! '
    'omxh264dec ! '
    'video/x-raw(memory:NVMM), format=(string)NV12 ! '
    'nvvidconv ! video/x-raw, format=(string)BGRx ! '
    'videoconvert ! '
    'appsink sync=0')
cap = cv2.VideoCapture(gstream_elements, cv2.CAP_GSTREAMER)
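Adjacent Python string literals are concatenated without any separator, so a missing space around `!` is easy to introduce. Joining a list of elements with `" ! "` avoids that and makes one pipeline per camera easy to generate. This sketch keeps the element chain above and adds explicit BGR caps after `videoconvert`, as in the first pipeline of this thread. The URL is a hypothetical placeholder, and `nvv4l2decoder` (the V4L2-based decoder on JetPack 4.x) is offered only as an alternative `decoder` value. Note that every frame still ends up in CPU memory, so this does not avoid the copy asked about in Q1.

```python
def rtsp_decode_pipeline(url, latency_ms=300, decoder="omxh264dec"):
    """RTSP H.264 -> hardware decode -> BGR appsink for cv2.VideoCapture.

    url is a placeholder (hypothetical); decoder may also be "nvv4l2decoder".
    """
    parts = [
        f"rtspsrc location={url} latency={latency_ms}",
        "rtph264depay",
        "h264parse",
        decoder,
        "video/x-raw(memory:NVMM), format=(string)NV12",
        "nvvidconv",
        "video/x-raw, format=(string)BGRx",
        "videoconvert",                      # CPU-side BGRx -> BGR conversion
        "video/x-raw, format=(string)BGR",
        "appsink sync=0",
    ]
    return " ! ".join(parts)
```

For several streams: `caps = [cv2.VideoCapture(rtsp_decode_pipeline(u), cv2.CAP_GSTREAMER) for u in urls]`.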

As you know, that's not a very efficient way to decode, because the decoded frames are copied from the NVMM buffer to a CPU buffer, which makes the Jetson use more memory for decoding.
Q1 - I previously tested DeepStream for multi-stream, and it did not use extra memory for decoding, because this SDK uses NVMM buffers directly for GPU processing. Is there a way to use OpenCV + GStreamer in Python without a copy to a CPU buffer? I compiled OpenCV 4.1.1 with CUDA support.
If there is no way with OpenCV + GStreamer, is there another solution for decoding the streams without copying to a CPU buffer, especially in Python?

Q2 - If I want to connect a USB Coral TPU for other processing, do I have to bring the decoded frames from the NVMM buffer into a CPU buffer?

Q3 - In this diagram, batching of frames is done on the CPU. Even in this case (the DeepStream solution), are the decoded frames copied from the NVMM buffer to a CPU buffer to gather a batch? What's the difference between this solution and the OpenCV + GStreamer solution? In both cases the decoded frames are brought from the NVMM buffer into a CPU buffer, right?

Q4 - For the scaling and cropping part of the diagram, I can also use the nvvidconv plugin in GStreamer + OpenCV to do these operations. Does this plugin use the VIC hardware when used with GStreamer + OpenCV?

Q5 - Is it possible to access the NVMM buffer from the CPU?

Finally, I'm looking for the best Python solution for multi-stream decoding without copying twice from the NVMM buffer to a CPU buffer. I want to use it with a USB Coral TPU and the Jetson GPU.

Q6 - The nvivafilter plugin has pre/post-processing. How does it work? Do I write custom pre/post-processing functions for it?

You would have to output RGBA frames from this plugin.

Is this plugin like the nvvidconv plugin?
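Regarding Q4, a sketch of a crop-and-scale fragment with nvvidconv, assuming its `left`, `right`, `top` and `bottom` properties give the crop box edges in input pixel coordinates (my understanding; confirm with `gst-inspect-1.0 nvvidconv`), while the output size comes from the caps that follow. `nvvidconv_crop_scale` is a hypothetical helper.

```python
def nvvidconv_crop_scale(left, top, right, bottom, out_w, out_h,
                         out_format="BGRx"):
    """Hypothetical helper: crop box (left, top)-(right, bottom), then scale.

    Assumes nvvidconv's right/bottom are edge coordinates, not margins.
    The scaled size is set by the caps after nvvidconv.
    """
    if right <= left or bottom <= top:
        raise ValueError("empty crop rectangle")
    return (f"nvvidconv left={left} right={right} top={top} bottom={bottom} ! "
            f"video/x-raw, width={out_w}, height={out_h}, "
            f"format=(string){out_format}")
```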

DaneLLL · August 13, 2020, 4:10am

The post is duplicate of

chealsie.bains · October 4, 2021, 8:33pm

How would I do this in python? Additionally, is it possible to stream from a usb camera instead of a csi camera with jetson-utils?
