Problem using gstreamer with opencv and cuda on TX2 with Jetpack 4.3

Hi! I’m currently developing an app on the Jetson TX2 using Gstreamer, OpenCV and CUDA, but I’m having trouble with the gstreamer side of the program.

This app is supposed to receive an RTSP H264-encoded stream in OpenCV, process it, and then stream it over UDP as VP8 to a client. I made this work on Windows, but I wanted to move to the Jetson TX2 to be able to use GPU-accelerated streaming.

The pipelines I have working on windows are the following:

camera.open("rtspsrc location="+sourceIP+" latency=0 ! queue max-size-buffers=0 max-size-bytes=0 max-size-time=10 ! queue max-size-time=1 min-threshold-time=5 ! rtph264depay ! h264parse ! avdec_h264 ! videoconvert ! appsink"); 

 writer.open("appsrc ! queue ! videoconvert ! video/x-raw,width=" + to_string(img.cols) + ",height=" + to_string(img.rows) + ",framerate=" + to_string(num_fps) + "/1 ! vp8enc threads=4 deadline=1 ! rtpvp8pay ! udpsink host=224.1.1.1 port=5000 auto-multicast=true", cv::VideoWriter::fourcc('V','P','8','0'),   num_fps,cv::Size(img.cols, img.rows),   true);

I’m trying to accomplish the same functionality but using omx / nvv4l2 to accelerate the stream with GPU, but I haven’t been able to get much done that works. Using the gstreamer documentation I managed to get this:

"rtspsrc location="+sourceIP+" latency=0 ! queue max-size-buffers=0 max-size-bytes=0 max-size-time=10 ! queue max-size-time=1 min-threshold-time=5 ! rtph264depay ! h264parse ! nvv4l2decoder enable-max-performance=1 ! appsink"

pipeline = "appsrc ! queue ! videoconvert ! video/x-raw,width=" + to_string(img.cols) + ",height=" + to_string(img.rows) + ",framerate=" + to_string(num_fps) + "/1 ! omxvp8enc ! rtpvp8pay ! udpsink host=224.1.1.1 port=5000 auto-multicast=true"

But it performs even worse than the non-GPU pipelines: my Windows app runs a 720p stream at a stable 25-30 fps, while the Jetson can’t get above 20 fps with the pipelines above, and it doesn’t even appear to be using the GPU (tegrastats shows at most 10% usage).

Do you have any tips on why it is not using the GPU, or on how to improve the pipeline performance?

I’m using:
NVIDIA Jetson TX2
L4T 32.3.1 [ JetPack 4.3 ]
Ubuntu 18.04.4 LTS
Kernel Version: 4.9.140-tegra
CUDA 10.0.326
OpenCV 4.1.1
GStreamer 1.14.5-0ubuntu1~18.04.1

Hi @Kaladin,

First suggestion is to change the videoconvert element to nvvidconv, as it is the hardware accelerated version of the conversion element.

Regards,
Fabian
www.ridgerun.com

Hi @fabian.solano, thank you so much for your suggestion!

I’ve tried changing it to nvvidconv, but it doesn’t seem to be a drop-in equivalent of videoconvert: it just spits out an error when used in that pipeline. I’ve tried adapting the pipeline, but nothing has worked so far.

Can you post the error?

Ok, a little update, I made the following pipelines work:

//Reader
camera.open("rtspsrc location=rtsp://192.168.0.16:8554/video !  application/x-rtp, media=(string)video, encoding-name=(string)H264, payload=(int)96 ! rtph264depay ! h264parse ! omxh264dec ! videoconvert ! appsink");

//Writer
 writer.open("appsrc ! queue ! nvvidconv ! video/x-raw(memory:NVMM),width=" + to_string(img.cols) + ",height=" + to_string(img.rows) + ",framerate=" + to_string(num_fps) + "/1 ! omxvp8enc ! rtpvp8pay ! udpsink host=224.1.1.1 port=5000 auto-multicast=true", cv::VideoWriter::fourcc('V','P','8','0'),   num_fps,cv::Size(img.cols, img.rows),   true);

But it’s still pretty slow and doesn’t appear to be using the GPU at all. I tried replacing the last videoconvert in the reader pipeline with nvvidconv, as @fabian.solano suggested, but it appears to do something weird with the image OpenCV receives. I get this output:

(main:29201): GStreamer-CRITICAL **: 18:39:01.465: gst_caps_is_empty: assertion 'GST_IS_CAPS (caps)' failed

(main:29201): GStreamer-CRITICAL **: 18:39:01.466: gst_caps_truncate: assertion 'GST_IS_CAPS (caps)' failed

(main:29201): GStreamer-CRITICAL **: 18:39:01.466: gst_caps_fixate: assertion 'GST_IS_CAPS (caps)' failed

(main:29201): GStreamer-CRITICAL **: 18:39:01.466: gst_caps_get_structure: assertion 'GST_IS_CAPS (caps)' failed

(main:29201): GStreamer-CRITICAL **: 18:39:01.466: gst_structure_get_string: assertion 'structure != NULL' failed

(main:29201): GStreamer-CRITICAL **: 18:39:01.466: gst_mini_object_unref: assertion 'mini_object != NULL' failed
NvMMLiteOpen : Block : BlockType = 261 
NVMEDIA: Reading vendor.tegra.display-size : status: 6 
NvMMLiteBlockCreate : Block : BlockType = 261

edit: I didn’t mention that the error above doesn’t stop the program. It keeps running, but the writer appears to have stopped working. At least, the following pipeline that I was using to check the output of the app no longer receives anything:

gst-launch-1.0 udpsrc multicast-group=224.1.1.1 auto-multicast=true port=5000 ! "application/x-rtp, payload=100, clock-rate=90000, media=video, encoding-name=VP8" ! rtpvp8depay ! vp8dec ! xvimagesink

The last videoconvert cannot be changed to nvvidconv, because appsink and OpenCV only work with buffers that are in CPU memory, while nvvidconv works with buffers in hardware (NVMM) memory.
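To make that concrete, here is a minimal sketch of a capture pipeline that keeps decoding in hardware and only moves frames to CPU memory at the end. It assumes omxh264dec is available on your L4T release; build_capture_pipeline and its parameters are hypothetical names, not part of OpenCV or GStreamer.

```cpp
#include <string>

// Hypothetical helper: builds a capture pipeline string meant for
// cv::VideoCapture(pipeline, cv::CAP_GSTREAMER).
// The decoder produces hardware buffers, nvvidconv copies them to CPU memory
// as BGRx, and videoconvert then converts BGRx to the BGR frames that
// OpenCV expects from appsink.
std::string build_capture_pipeline(const std::string& rtsp_url, int latency_ms)
{
    return "rtspsrc location=" + rtsp_url +
           " latency=" + std::to_string(latency_ms) +
           " ! rtph264depay ! h264parse ! omxh264dec"
           " ! nvvidconv ! video/x-raw, format=BGRx"
           " ! videoconvert ! video/x-raw, format=BGR ! appsink";
}
```

The idea is that nvvidconv does the heavy format conversion and videoconvert is left with only the cheap BGRx to BGR step, so the last CPU-side conversion stays in place but does much less work.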

I have seen that OpenCV is not efficient at loading buffers. You might want to try using GStreamer C/C++ code to pull the buffers and then convert them to OpenCV matrices.

Regards,
Fabian
www.ridgerun.com

Do you know where I can find documentation on how to accomplish that? I haven’t worked much with gstreamer code

Also, do you have any idea why I can no longer read the UDP stream to test the program? It seems to be working fine, but I cannot access the stream from anywhere.

Thank you so much for your help!

You can refer to the GStreamer appsink tutorial: https://gstreamer.freedesktop.org/documentation/tutorials/basic/short-cutting-the-pipeline.html?gi-language=c

It might be related to the assertion errors you are getting in the log.

Regards,
Fabian
www.ridgerun.com

Great, I’ll look into it, thank you for everything!

It depends on your L4T release. On R32.4.2 with a Xavier, I’m not sure how to use HW VP8 encoding.
However, you may try the following:
I simulated a H264 encoded RTSP stream from onboard camera with:

./test-launch "nvarguscamerasrc do-timestamp=true ! video/x-raw(memory:NVMM), width=640, height=480, framerate=30/1, format=NV12 ! nvvidconv ! omxh264enc ! video/x-h264, profile=baseline, stream-format=byte-stream ! h264parse ! rtph264pay name=pay0 pt=96 config-interval=1"

Then used this gstreamer pipeline for RTSP -> H264 RTP depay -> H264 decode -> VP8 encode-> RTP pay -> UDP:

gst-launch-1.0 -e rtspsrc location=rtsp://127.0.0.1:8554/test ! queue ! rtph264depay ! video/x-h264, stream-format=byte-stream ! h264parse ! omxh264dec ! nvvidconv ! videoconvert ! vp8enc ! video/x-vp8 ! rtpvp8pay ! udpsink host=127.0.0.1 port=5000

and displaying result in X window with:

gst-launch-1.0 -e udpsrc port=5000 ! application/x-rtp, media=video, encoding-name=VP8 ! queue ! rtpvp8depay ! video/x-vp8 ! nvv4l2decoder ! nvvidconv ! videoconvert ! xvimagesink

If this works for you, you could try this OpenCV code (tested with opencv-4.3.0) in place of the second pipeline:

#include <iostream>
#include <opencv2/core.hpp>
#include <opencv2/highgui.hpp>   // needed for cv::waitKey
#include <opencv2/videoio.hpp>

int main(void)
{
	const char *gst_cap = "rtspsrc location=rtsp://127.0.0.1:8554/test ! queue ! rtph264depay ! video/x-h264, stream-format=byte-stream"
                              " ! h264parse ! omxh264dec ! nvvidconv ! video/x-raw, format=BGRx"
                              " ! videoconvert ! video/x-raw, format=BGR ! appsink";

	//putenv("GST_DEBUG=*:3");
        cv::VideoCapture cap(gst_cap, cv::CAP_GSTREAMER);
        if( !cap.isOpened() )
        {
            std::cout << "Error: Cv::VideoCapture.open() failed" << std::endl;
            return 1;
        }
	else
	    std::cout << "Cam opened" << std::endl;

    	unsigned int width = cap.get(cv::CAP_PROP_FRAME_WIDTH); 
    	unsigned int height = cap.get(cv::CAP_PROP_FRAME_HEIGHT); 
    	unsigned int pixels = width*height;
        float fps    = cap.get(cv::CAP_PROP_FPS);
        std::cout <<" Frame size : "<<width<<" x "<<height<<", "<<pixels<<" Pixels "<<fps<<" FPS"<<std::endl;

 	const char *gst_out = "appsrc ! video/x-raw, format=BGR"
			      " ! videoconvert ! vp8enc ! video/x-vp8 ! rtpvp8pay ! udpsink host=127.0.0.1 port=5000";

        /* Note that fps read from cap properties may be 0, so here we force to 30 fps according to knowledge of our source */
        cv::VideoWriter udp_out(gst_out, cv::CAP_GSTREAMER, 0, 30, cv::Size(width, height));
        if( !udp_out.isOpened() )
        {
            std::cout << "Error: Cv::VideoWriter.open() failed" << std::endl;
            return 2;
        }
	else
	    std::cout << "Writer opened" << std::endl;

        cv::Mat frame_in(height, width, CV_8UC3); // cv::Mat takes (rows, cols, type)
        for(;;)
        {
    		if (!cap.read(frame_in)) {
			std::cout<<"Capture read error"<<std::endl;
			break;
		}
		else {
			udp_out.write(frame_in);
			cv::waitKey(1); 
		}	
        }

	cap.release();
        return 0;
}

[EDIT: Also note that video encoding/decoding can be done by dedicated HW or CPU, but standard gstreamer plugins don’t use GPU for this. omx* or nvv4l2* would use dedicated NVENC or NVDEC. vp8enc is purely CPU.]

[EDIT2: Note that it might take about 10s to display, and get only 5-6 fps without boosting clocks on consumer end. Probably there is a better solution using HW VP8 encoding]

First of all sorry for the deleted post above, got a bit of progress done since that post and wanted to edit it completely so thought of reposting, didn’t know the post stayed here, my bad

Thanks to @Honey_Patouceul (thanks for your help!) I’ve managed to make gpu acceleration work for both pipelines, they now look like this:

camera.open("rtspsrc location=rtsp://192.168.0.16:8554/video !  application/x-rtp, media=(string)video, encoding-name=(string)H264, payload=(int)96 ! rtph264depay ! h264parse ! omxh264dec ! videoconvert ! appsink");

pipeline = "appsrc ! queue ! videoconvert ! video/x-raw,width=" + to_string(img.cols) + ",height=" + to_string(img.rows) + ",framerate=" + to_string(num_fps) + "/1 ! omxvp8enc ! rtpvp8pay ! udpsink host=224.1.1.1 port=5000 auto-multicast=true";
int fourcc = cv::VideoWriter::fourcc('V','P','8','0');
writer.open(pipeline, fourcc,   num_fps,cv::Size(img.cols, img.rows),   true);

The pipelines work, but the program is still really slow:

NvMMLiteOpen : Block : BlockType = 261 
NVMEDIA: Reading vendor.tegra.display-size : status: 6 
NvMMLiteBlockCreate : Block : BlockType = 261 
Allocating new output: 1280x720 (x 11), ThumbnailMode = 0
OPENMAX: HandleNewStreamFormat: 3605: Send OMX_EventPortSettingsChanged: nFrameWidth = 1280, nFrameHeight = 720 
Framerate set to : 25 at NvxVideoEncoderSetParameterNvMMLiteOpen : Block : BlockType = 7 
===== NVMEDIA: NVENC =====
NvMMLiteBlockCreate : Block : BlockType = 7 
Stabilizer started

 [time :18:2:12]	[fps: 1]	[res: 1280x720]	
 [time :18:2:14]	[fps: 44]	[res: 1280x720]	
 [time :18:2:16]	[fps: 50]	[res: 1280x720]	
 [time :18:2:19]	[fps: 48]	[res: 1280x720]	
 [time :18:2:21]	[fps: 27]	[res: 1280x720]	
 [time :18:2:23]	[fps: 71]	[res: 1280x720]	
 [time :18:2:25]	[fps: 59]	[res: 1280x720]	
 [time :18:2:27]	[fps: 24]	[res: 1280x720]	
 [time :18:2:29]	[fps: 70]	[res: 1280x720]	
 [time :18:2:31]	[fps: 74]	[res: 1280x720]	
 [time :18:2:33]	[fps: 11]	[res: 1280x720]	
 [time :18:2:55]	[fps: 6]	[res: 1280x720]	
 [time :18:2:57]	[fps: 48]	[res: 1280x720]	
 [time :18:2:59]	[fps: 24]	[res: 1280x720]	
 [time :18:3:0]	[fps: 29]	[res: 1280x720]	
 [time :18:3:2]	[fps: 53]	[res: 1280x720]	

I don’t know what may be causing this; even the CPU VP8 encoding seemed to be faster. Any idea? I tried replacing the videoconvert elements, but I think they are necessary for appsrc and appsink.

Nice to see you’ve moved forward. However, I think that for a cv writer, either you specify a file path and a 4CC in order to choose the codec, or you use a gstreamer pipeline, in which case the 4CC has no meaning.
In your case it seems that omxvp8enc works. Note it is not GPU but uses dedicated NVENC HW engine.
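As a rough sketch of that point: with a gstreamer pipeline, the encoder is selected by the element name in the pipeline string, and the writer can be opened with a 4CC of 0 as in the code posted earlier in this thread. build_writer_pipeline and its parameters are hypothetical names used only for illustration.

```cpp
#include <string>

// Hypothetical helper: returns the appsrc pipeline for cv::VideoWriter.
// The codec is chosen by `encoder` (for example "vp8enc threads=6 deadline=1"
// or "omxvp8enc"), not by a FOURCC value.
std::string build_writer_pipeline(const std::string& encoder,
                                  const std::string& host, int port)
{
    return "appsrc ! video/x-raw, format=BGR ! queue ! videoconvert ! " + encoder +
           " ! rtpvp8pay ! udpsink host=" + host +
           " port=" + std::to_string(port) + " auto-multicast=true";
}

// Intended use with OpenCV (shown as a comment so this block builds without OpenCV):
//   cv::VideoWriter writer(
//       build_writer_pipeline("vp8enc threads=6 deadline=1", "224.1.1.1", 5000),
//       cv::CAP_GSTREAMER, 0 /* 4CC unused */, fps, cv::Size(width, height));
```

Swapping the encoder then only means changing one argument, which makes it easier to compare vp8enc against the HW encoders.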

Be aware that omx plugins in general are being deprecated on Jetson and no HW VP8 encoding is available from R32.4, at least on Xavier, so VP8 may not be your best choice for the future.

On Xavier and R32.4, I can get 30 fps with the gstreamer pipelines shared above, just by setting threads=6 for vp8enc, but this puts a heavy load on the CPUs.
My advice would be to switch to VP9, for which HW encoding will be available on Jetson.
Transcoding pipeline:

gst-launch-1.0 -e rtspsrc location=rtsp://127.0.0.1:8554/test ! queue ! rtph264depay ! video/x-h264, stream-format=byte-stream ! h264parse ! omxh264dec ! nvvidconv ! nvv4l2vp9enc ! video/x-vp9 ! rtpvp9pay ! udpsink host=127.0.0.1 port=5000

Test client pipeline:

gst-launch-1.0 -ev udpsrc port=5000 ! application/x-rtp, media=video, encoding-name=VP9 ! queue ! rtpvp9depay ! video/x-vp9 ! nvv4l2decoder ! nvvidconv ! videoconvert ! fpsdisplaysink video-sink=fakesink text-overlay=false

Sadly I have to stay on VP8 because that stream is going to a client that can only take udp-vp8 streams.

I’ve also made nvv4l2vp8enc work, but same as omx it has poor fps performance in the output stream

pipeline = "appsrc ! queue ! videoconvert ! video/x-raw,width=" + to_string(img.cols) + ",height=" + to_string(img.rows) + ",framerate=" + to_string(num_fps) + "/1 ! nvvidconv ! video/x-raw(memory:NVMM) ! nvv4l2vp8enc ! rtpvp8pay ! udpsink host=224.1.1.1 port=5000 auto-multicast=true";

Maybe my best chance to get good performance is to stick to vp8enc with 6 threads and stay away from hw encoding

Does appending sync=false to your pipeline improve the frame rate?

I’d suggest first prototyping a transcoding gstreamer pipeline, and later splitting it into an opencv capture and writer, in order to find the bottleneck.
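Once the pipeline is split, one way to see which side is slow is to time each stage of the loop separately. The following is only a sketch using the C++ standard library; StageTimer is a hypothetical helper, not something from OpenCV or GStreamer.

```cpp
#include <chrono>
#include <map>
#include <string>

// Accumulates wall-clock time per named stage so the slowest one stands out.
class StageTimer {
public:
    using Clock = std::chrono::steady_clock;

    // Adds an already measured duration to a stage.
    void add(const std::string& stage, std::chrono::microseconds elapsed) {
        totals_[stage] += elapsed;
        ++counts_[stage];
    }

    // Runs fn(), measures it, and records the time under `stage`.
    template <typename Fn>
    void measure(const std::string& stage, Fn&& fn) {
        const auto start = Clock::now();
        fn();
        add(stage, std::chrono::duration_cast<std::chrono::microseconds>(
                       Clock::now() - start));
    }

    // Mean time per call in milliseconds, or 0 if the stage was never recorded.
    double mean_ms(const std::string& stage) const {
        const auto it = totals_.find(stage);
        if (it == totals_.end()) return 0.0;
        return it->second.count() / 1000.0 / counts_.at(stage);
    }

    // Name of the stage with the largest accumulated time ("" if empty).
    std::string slowest() const {
        std::string worst;
        std::chrono::microseconds max{0};
        for (const auto& kv : totals_) {
            if (kv.second > max) { max = kv.second; worst = kv.first; }
        }
        return worst;
    }

private:
    std::map<std::string, std::chrono::microseconds> totals_;
    std::map<std::string, long> counts_;
};
```

In the capture/write loop this could be used as timer.measure("read", [&] { ok = cap.read(frame); }) and timer.measure("write", [&] { writer.write(frame); }), printing mean_ms for each stage every few hundred frames.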

I have already tried the transcoding pipeline and it stays stable at around 30 fps, but with the opencv code I get this (with or without sync/async=false):

    /GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 22, dropped: 0, current: 39,42, average: 39,42
    /GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 35, dropped: 0, current: 25,71, average: 32,90
    /GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 59, dropped: 0, current: 45,92, average: 37,19
    /GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 74, dropped: 0, current: 28,70, average: 35,09
    /GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 83, dropped: 0, current: 17,34, average: 31,58
    /GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 96, dropped: 0, current: 16,49, average: 28,10
    /GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 102, dropped: 0, current: 10,15, average: 25,45
    /GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 109, dropped: 0, current: 10,16, average: 23,21
    /GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 115, dropped: 0, current: 11,53, average: 22,04
    /GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 128, dropped: 0, current: 24,68, average: 22,28
    /GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 138, dropped: 0, current: 19,74, average: 22,08
    /GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 144, dropped: 0, current: 11,67, average: 21,29
    /GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 154, dropped: 0, current: 19,13, average: 21,13
    /GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 166, dropped: 0, current: 21,67, average: 21,17
    /GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 178, dropped: 0, current: 19,88, average: 21,08
    /GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 193, dropped: 0, current: 25,07, average: 21,34
    /GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 208, dropped: 0, current: 29,26, average: 21,77
    /GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 220, dropped: 0, current: 23,20, average: 21,84
    /GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 229, dropped: 0, current: 11,55, average: 21,10
    /GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 236, dropped: 0, current: 13,42, average: 20,75

I think the bottleneck might be around appsrc and appsink using CPU buffers, as mentioned above.
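To compare runs with numbers rather than by eye, the fpsdisplaysink output can be summarized. This is a minimal sketch that assumes the line format shown in the log above, including the comma used as decimal separator; parse_current_fps, summarize and FpsSummary are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Extracts the "current:" value from one fpsdisplaysink last-message line.
// Accepts both "39.42" and the comma-decimal form "39,42".
// Returns false if the line has no parsable "current:" field.
bool parse_current_fps(const std::string& line, double& fps)
{
    const std::string key = "current: ";
    const std::size_t pos = line.find(key);
    if (pos == std::string::npos) return false;

    long long digits = 0;      // all digits read so far, as one integer
    double divisor = 1.0;      // 10^(number of fractional digits)
    bool in_fraction = false, any_digit = false;
    for (std::size_t i = pos + key.size(); i < line.size(); ++i) {
        const char c = line[i];
        const bool next_is_digit =
            i + 1 < line.size() && line[i + 1] >= '0' && line[i + 1] <= '9';
        if (c >= '0' && c <= '9') {
            digits = digits * 10 + (c - '0');
            if (in_fraction) divisor *= 10.0;
            any_digit = true;
        } else if ((c == ',' || c == '.') && !in_fraction && next_is_digit) {
            in_fraction = true;   // first separator followed by a digit
        } else {
            break;                // field separator or end of number
        }
    }
    if (!any_digit) return false;
    fps = digits / divisor;
    return true;
}

struct FpsSummary {
    std::size_t samples = 0;   // lines that contained a "current:" value
    double min = 0.0;          // lowest instantaneous fps
    double mean = 0.0;         // mean of the instantaneous fps values
    std::size_t below = 0;     // samples under the threshold
};

// Summarizes the "current:" values of a set of log lines.
FpsSummary summarize(const std::vector<std::string>& lines, double threshold)
{
    FpsSummary s;
    double sum = 0.0;
    for (const auto& line : lines) {
        double fps = 0.0;
        if (!parse_current_fps(line, fps)) continue;
        if (s.samples == 0 || fps < s.min) s.min = fps;
        if (fps < threshold) ++s.below;
        sum += fps;
        ++s.samples;
    }
    if (s.samples > 0) s.mean = sum / s.samples;
    return s;
}
```

Feeding it the lines captured from the test client and a threshold such as 20 gives the minimum, the mean and the number of drops, which is easier to compare between the transcoding pipeline and the opencv version.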

I can run the opencv code at 60 fps with only 6 CPU cores enabled (in order to mimic the TX2) and without boosting clocks.

RTSP source (1280x720@120fps) with test-launch:

./test-launch "nvarguscamerasrc do-timestamp=true ! video/x-raw(memory:NVMM), width=1280, height=720, framerate=120/1, format=NV12 ! nvvidconv ! omxh264enc ! video/x-h264, profile=baseline, stream-format=byte-stream ! h264parse ! rtph264pay name=pay0 pt=96 config-interval=1"

Opencv code (opencv-4.3.0):

#include <iostream>
#include <opencv2/core.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/videoio.hpp>

int main(void)
{
 	const char *gst_cap = "rtspsrc location=rtsp://127.0.0.1:8554/test latency=200 ! application/x-rtp, media=video, encoding-name=H264, clock-rate=90000, payload=96"
                              " ! queue max-size-buffers=0 max-size-bytes=0 max-size-time=10 ! queue max-size-time=1 min-threshold-time=5" 
                              " ! rtph264depay ! video/x-h264, stream-format=byte-stream, framerate=120/1"
                              " ! h264parse ! video/x-h264, stream-format=byte-stream, framerate=120/1"
                              " ! omxh264dec ! video/x-raw(memory:NVMM), format=NV12, framerate=120/1"
                              " ! nvvidconv ! video/x-raw, format=BGRx"
                              " ! videoconvert ! video/x-raw, format=BGR ! queue ! appsink ";

        cv::VideoCapture cap(gst_cap, cv::CAP_GSTREAMER);
        if( !cap.isOpened() )
        {
            std::cout << "Error: Cv::VideoCapture.open() failed" << std::endl;
            return 1;
        }
	else
	    std::cout << "Cam opened" << std::endl;

    	unsigned int width = cap.get(cv::CAP_PROP_FRAME_WIDTH); 
    	unsigned int height = cap.get(cv::CAP_PROP_FRAME_HEIGHT); 
    	unsigned int pixels = width*height;
        float fps    = cap.get(cv::CAP_PROP_FPS);
        std::cout <<" Frame size : "<<width<<" x "<<height<<", "<<pixels<<" Pixels "<<fps<<" FPS"<<std::endl;



 	const char *gst_out = "appsrc ! video/x-raw, format=BGR, framerate=120/1"
			      " ! videoconvert ! vp8enc threads=6 deadline=1 ! video/x-vp8"
                              " ! rtpvp8pay pt=100 ! application/x-rtp, media=video, encoding-name=VP8, clock-rate=90000"
                              " ! queue ! udpsink host=224.1.1.1 port=5000 auto-multicast=true ";

        cv::VideoWriter udp_out(gst_out, cv::CAP_GSTREAMER, 0, fps, cv::Size(width, height));
        if( !udp_out.isOpened() )
        {
            std::cout << "Error: Cv::VideoWriter.open() failed" << std::endl;
            return 2;
        }
	else
	    std::cout << "Writer opened" << std::endl;


        cv::Mat frame_in(height, width, CV_8UC3); // cv::Mat takes (rows, cols, type)
        for(;;)
        {
    		if (!cap.read(frame_in)) {
			std::cout<<"Capture read error"<<std::endl;
			break;
		}
		else {
 			udp_out.write(frame_in);
			cv::waitKey(1); 
		}	
        }

	cap.release();
        return 0;
}

and test client:

gst-launch-1.0 -ev udpsrc port=5000 multicast-group=224.1.1.1 auto-multicast=true ! application/x-rtp, media=video, encoding-name=VP8, payload=100 ! queue ! rtpvp8depay ! video/x-vp8 ! nvv4l2decoder ! fpsdisplaysink video-sink=fakesink text-overlay=false

Does this work for you ?

Note it is just an example showing that opencv can run at 60 fps (although with a 120 fps input). Changing to 30 fps as in your original code gives a solid 30 fps in my case (but with 6 CPUs half-loaded).

I don’t understand why it works for you but is so inconsistent for me. Your code works and I boosted the clocks, but I’m back to constant fps drops every few seconds:

/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 1290, dropped: 0, current: 25,07, average: 25,12
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 1305, dropped: 0, current: 26,41, average: 25,13
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 1313, dropped: 0, current: 13,63, average: 25,01
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 1321, dropped: 0, current: 11,33, average: 24,82
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 1332, dropped: 0, current: 21,03, average: 24,79
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 1348, dropped: 0, current: 30,98, average: 24,85
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 1372, dropped: 0, current: 47,05, average: 25,05
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 1389, dropped: 0, current: 30,22, average: 25,11
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 1403, dropped: 0, current: 27,41, average: 25,13
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 1412, dropped: 0, current: 17,57, average: 25,06
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 1428, dropped: 0, current: 31,83, average: 25,12
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 1440, dropped: 0, current: 22,91, average: 25,10
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 1454, dropped: 0, current: 27,68, average: 25,12
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 1467, dropped: 0, current: 25,39, average: 25,12
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 1476, dropped: 0, current: 12,49, average: 24,97
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 1479, dropped: 0, current: 5,60, average: 24,79
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 1490, dropped: 0, current: 21,55, average: 24,77
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 1514, dropped: 0, current: 43,81, average: 24,94
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 1528, dropped: 0, current: 25,51, average: 24,94
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 1544, dropped: 0, current: 29,73, average: 24,99
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 1557, dropped: 0, current: 14,09, average: 24,83
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 1578, dropped: 0, current: 41,03, average: 24,96
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 1598, dropped: 0, current: 38,54, average: 25,07
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 1613, dropped: 0, current: 25,61, average: 25,07
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 1626, dropped: 0, current: 25,71, average: 25,08
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 1636, dropped: 0, current: 15,77, average: 24,99
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 1640, dropped: 0, current: 6,02, average: 24,80
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 1653, dropped: 0, current: 24,31, average: 24,79
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 1670, dropped: 0, current: 33,41, average: 24,86
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 1691, dropped: 0, current: 41,22, average: 24,98
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 1712, dropped: 0, current: 33,32, average: 25,06
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 1725, dropped: 0, current: 25,08, average: 25,06
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 1739, dropped: 0, current: 27,10, average: 25,07
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 1753, dropped: 0, current: 27,37, average: 25,09
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 1765, dropped: 0, current: 23,92, average: 25,08
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 1778, dropped: 0, current: 25,64, average: 25,09
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 1791, dropped: 0, current: 25,20, average: 25,09
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 1794, dropped: 0, current: 3,41, average: 24,82
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 1798, dropped: 0, current: 7,40, average: 24,69
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 1814, dropped: 0, current: 31,02, average: 24,74
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 1842, dropped: 0, current: 54,30, average: 24,94

I’ve also tried a transcoding pipeline, with a pretty similar result:

gst-launch-1.0 -v rtspsrc location=rtsp://192.168.0.16:8554/video latency=200 ! "application/x-rtp, media=video, encoding-name=H264, clock-rate=90000, payload=96" ! queue max-size-buffers=0 max-size-bytes=0 max-size-time=10 ! queue max-size-time=1 min-threshold-time=5 ! rtph264depay ! "video/x-h264, stream-format=byte-stream, framerate=30/1" ! h264parse ! "video/x-h264, stream-format=byte-stream, framerate=30/1" ! omxh264dec ! "video/x-raw(memory:NVMM), format=NV12, framerate=30/1" ! nvvidconv ! "video/x-raw, format=BGRx" ! videoconvert ! "video/x-raw, format=BGR, framerate=30/1" ! videoconvert ! vp8enc threads=6 deadline=1 ! video/x-vp8 ! rtpvp8pay ! udpsink host=224.1.1.1 port=5000 auto-multicast=true

The one above behaves similarly inside and outside opencv, but the one you shared the other day still runs at a stable 25 fps (transcoding only; in opencv it always drops):

gst-launch-1.0 -e rtspsrc location=rtsp://192.168.0.16:8554/video ! queue ! rtph264depay ! video/x-h264, stream-format=byte-stream ! h264parse ! omxh264dec ! nvvidconv ! nvv4l2vp8enc ! video/x-vp8 ! rtpvp8pay ! udpsink host=127.0.0.1 port=5000

Main differences between your setup and mine are:

  • Xavier vs TX2
  • R32.4.2 vs R32.3.1
  • opencv-4.3.0 vs opencv-4.1.1
  • Other (connected devices, network activity, any customization that may have services or applications running).
  • Did you try the exact code above before customizing (apart from IP addresses) ? Note I receive the RTSP stream from localhost using lo interface, not eth, it might make a difference.

I’ve ruled out the opencv version: I get the same results with the NVIDIA-shipped opencv-4.1.1.
I do see heavy EMC (memory controller) usage on Xavier; you could check yours with tegrastats.

I’d suggest trying to disconnect any device (I have onboard camera, mouse and keyboard only), disconnect from network and try after a fresh reboot.

Okay, got it!

I was simulating the camera RTSP stream (in the final app it will come from a camera encoder that I don’t have at home) by streaming with VLC from another PC on the local network. I changed it to an nvarguscamerasrc stream with test-launch and I’m finally getting 30-40 fps!

The problem is that the actual camera stream will probably be closer to a VLC network stream than to nvarguscamerasrc. Is there any way to improve streaming performance over the local network?

I get the same framerate whether my host acts as the RTSP server with test-launch or with VLC. Only a few computers are connected to the LAN in my case, though.
I cannot help further, but let us know how it goes when you can connect to your IP camera.