Building a GStreamer Pipeline

Hey guys, I’ve been really struggling with these questions and finally I’m going to ask them. As a newbie with basic Python, I’m really having a hard time understanding certain concepts.

Is there anywhere else, other than the NVIDIA Jetson Linux Developer Guide or the Jetson Linux Multimedia API Reference, where I can learn more about building a GStreamer pipeline? I understand the pipeline concept from reading the official GStreamer Foundations documentation, as well as reading about Elements, Bins, the Bus, Pads, capabilities, etc., and all of this makes sense.

My questions are:

Where can I find more information to help me understand the parameters that you pass to nvarguscamerasrc, as well as to the other plugins in the pipeline, especially sinks (video or display sinks)?

For example:

  1. If I select sensor-mode=3, which means a width of 1280 and a height of 720 pixels, do I still need to specify a width and height in the video/x-raw(memory:NVMM) part of the configuration? If so, why? What would be the point of using sensor-mode then? I’ve seen it used in some examples online, but they then go on to specify a height, width, and frame rate anyway. The same question applies to frame rate: I thought that by picking a sensor mode, you automatically define a frame rate for that mode as well.

  2. Are there any other options for NVMM, other than NVMM=NV12? Can I please get a reference on what this means, or a link where I can read up more? Or is this more of a “It just is” type of thing? I’m asking because the Input and Output Formats section of the NVIDIA Developer Guide talks about different formats for nvvidconv. What’s not clear is whether, when we’re building the pipeline, we always expect NV12 as an input, or whether another input format can be selected. This question is an attempt to minimize the transformations and conversions that I see in almost every OpenCV example online. If I can immediately get a stream that is JPEG, and then stream that to an ipywidget, wouldn’t that be more efficient than capturing an image in BGR, converting it to JPEG, and then displaying it?

  3. Where do I find information on video/x-raw? It looks like I have to specify a height and width, but am I stuck using BGR, or can I create a stream that outputs straight to JPEG? (I get that this question may already be answered above, but in case I have #2 completely wrong, I’m asking it separately.)

  4. Where do I find more information on video sinks? My goal is to display an image/video stream using an ipywidget on a remote machine running Jupyter Lab (not unlike the DLI courses, except using Ethernet instead of USB). I’m already able to SSH into the Nano and run Jupyter Lab, and I have even successfully displayed an image in an ipywidget using Jet-Cam. However, the more I learn about GStreamer pipelines, the more questions I have about the examples I see online. They don’t seem to make sense in the context of building the right pipeline.

I’ve read through both the Developer Guide and the API manual but can’t seem to find the answers I’m looking for.

I have also run the following commands:

gst-inspect-1.0 nvarguscamerasrc

As well as:

nvgstcapture-1.0 --help

But these still don’t give me a clear picture of what I’m trying to do.

Any help or advice you guys can give would be greatly appreciated. I’ve exhausted the literature, so I need to ask.

Thanks guys.

Cheers!

Hi,
We have nvarguscamerasrc open source; it is in the L4T Driver Package (BSP) Sources at
https://developer.nvidia.com/embedded/linux-tegra

Please follow the guide to compile and run it first, and then you can read the code to gain a better understanding.

Probably the main thing to know is that most of the documentation for a plugin is embedded in the plugin itself.
The GStreamer command gst-inspect lists all available plugins:

gst-inspect-1.0 --version   # Would display gstreamer version
gst-inspect-1.0             # Would list all available plugins and more

You can use it to inspect a plugin:

gst-inspect-1.0 nvarguscamerasrc

It displays the library providing the plugin, followed by its SRC and SINK capabilities, which are the different types of multimedia data, formats, etc. that it can produce and accept. You can only feed a plugin data matching one of the types/formats listed in its SINK caps, and these caps should also match the previous plugin’s SRC (output) caps.
So basically a pipeline goes like this:

SrcPlugin -> CAPS1 -> ProcessPlugin1 -> CAPS2 -> ProcessPlugin2 -> CAPS3 -> SinkPlugin

A source plugin has no SINK capabilities (for example, a camera). A sink plugin has no SRC capabilities, so it has no output (for example, a display).
CAPS1 would then belong to the intersection of SrcPlugin’s SRC caps and ProcessPlugin1’s SINK caps, and so on.

For conciseness, instead of writing the caps between each pair of plugins, you can let GStreamer find caps that match both ends for you. Running gst-launch-1.0 with the -v (verbose) option displays the caps actually used for each plugin’s input and output.
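
If you’d rather check the negotiated caps from Python instead of gst-launch -v, here is a rough sketch using the GStreamer Python bindings (this assumes python3-gi with the GStreamer introspection data is installed and a camera is connected; the element name "conv" and the use of fakesink are just choices made for this example, not something required by nvarguscamerasrc):

import time

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

# Build a small pipeline and let GStreamer negotiate the caps on its own.
pipeline = Gst.parse_launch("nvarguscamerasrc ! nvvidconv name=conv ! fakesink")
pipeline.set_state(Gst.State.PLAYING)
pipeline.get_state(Gst.CLOCK_TIME_NONE)  # wait for the state change to complete
time.sleep(1)                            # give the live source time to push buffers

# Print the caps that were actually agreed on for nvvidconv's output pad.
conv = pipeline.get_by_name("conv")
caps = conv.get_static_pad("src").get_current_caps()
print(caps.to_string() if caps else "caps not negotiated yet")

pipeline.set_state(Gst.State.NULL)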

For your case, I assume you’re using a RPi V2 cam with IMX219 sensor.

If you want to use a native mode of your sensor, you don’t need to specify any caps as long as the next plugin’s SINK caps are compatible.
You might, however, specify caps for resizing the source into a different resolution or for dividing the framerate.

Usually, for video in raw format, the caps specify the type video/x-raw. This refers to standard memory and can be used by a CPU plugin. For HW accelerators or the GPU, contiguous memory is preferred, so on Jetson you have the video/x-raw(memory:NVMM) caps for using dedicated HW such as nvarguscamerasrc.
Note that NVMM=NV12 is not correct. You would have to use the caps: video/x-raw(memory:NVMM),format=NV12
The nvvidconv plugin copies between the two memory types. It can also convert formats and resize, but it expects at least one of its input or output to be in NVMM memory.
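
As a sketch of that hand-off, this is what it might look like wrapped in an OpenCV capture (the resolution and framerate values below are placeholders, not tied to any particular sensor mode, and OpenCV must have been built with GStreamer support):

import cv2

# nvarguscamerasrc produces NV12 frames in NVMM (contiguous) memory;
# nvvidconv copies them into standard CPU memory and converts to BGRx;
# videoconvert then gives the BGR layout OpenCV expects at appsink.
pipeline = ("nvarguscamerasrc ! "
            "video/x-raw(memory:NVMM), width=1280, height=720, format=NV12, framerate=30/1 ! "
            "nvvidconv ! video/x-raw, format=BGRx ! "
            "videoconvert ! video/x-raw, format=BGR ! "
            "appsink")

cap = cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)
ok, frame = cap.read()  # frame should be an H x W x 3 BGR numpy array when ok is True
cap.release()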

I have no knowledge of ipywidgets. Is there a way it can be connected to GStreamer? If it can, your application would probably use appsink, and you could try:

gst-launch-1.0 nvarguscamerasrc ! nvvidconv ! jpegenc ! appsink

# Or specifying some caps
gst-launch-1.0 nvarguscamerasrc ! 'video/x-raw(memory:NVMM)' ! nvvidconv ! video/x-raw ! jpegenc ! image/jpeg ! appsink

Note that in a shell command, the caps containing parentheses have to be quoted to prevent the shell from interpreting them.

If it works, you would use this pipeline string from your application:

pipeline_string="nvarguscamerasrc ! video/x-raw(memory:NVMM) ! nvvidconv ! video/x-raw ! jpegenc ! image/jpeg ! appsink"

About your question on video sinks:

gst-inspect-1.0 | grep sink | grep video

would list most of them. They use different backends.

nvhdmioverlaysink or other EGL sinks would expect a local monitor, and for EGL, an X session running for the user.
xvimagesink may be used remotely with ssh -X or ssh -Y, but in that case you wouldn’t set DISPLAY yourself for the remote display.
cacasink would work in any terminal.


I appreciate the time you spent on replying to my questions. Thank you for pointing me in some better directions!

I’ll dig a little deeper according to the suggestions and recommendations that you make above.

For all you newbies out there, if you really want a good, basic explanation of GStreamer and of building GStreamer pipelines, check out this YouTube video from Paul McWhorter. He does a really good job of breaking down what a GStreamer pipeline is and walks through some tests.

In terms of ipywidgets, if you look at the Jet-Cam github you’ll see a notebooks directory that uses ipywidgets and OpenCV.

My only knock on the JET-CAM library is the GStreamer pipeline it constructs. Here is the string:

nvarguscamerasrc sensor-id=%d ! video/x-raw(memory:NVMM), width=%d, height=%d, format=(string)NV12, framerate=(fraction)%d/1 ! nvvidconv ! video/x-raw, width=(int)%d, height=(int)%d, format=(string)BGRx ! videoconvert ! appsink

To me it seems like it goes through a whole lot of conversions to get something into BGR, which then needs to be converted to JPEG.

I understand that with OpenCV, a lot of its functions need specific formats like BGR or grayscale. My thought is to do the conversion right in front of the library you plan on using, minimizing the number of conversions needed throughout the pipeline.

Because the Jet-Cam library was able to work with OpenCV in displaying the camera stream in an ipywidget, I’m going to start from there first, but reduce the pipeline as per your recommendations.

Thanks for pointing me the right way!

Cheers.

You should be able to read JPEG frames from OpenCV with the pipeline in my previous post.
You wouldn’t be able to display these in OpenCV without conversion, but it should be OK to pass them directly to your ipywidget without calling bgr8_to_jpeg.
My pipeline uses CPU JPEG encoding; there may be better HW-accelerated paths, but first try to get this one working.


So far so good, thank you.

I got the following pipeline working in OpenCV

gst_pipeline = "nvarguscamerasrc sensor-id=0 ! video/x-raw(memory:NVMM), width=(int)1920, height=(int)1080, format=(string)NV12, framerate=(fraction)30/1 ! nvvidconv ! video/x-raw ! jpegenc ! image/jpeg ! appsink"

I’m using it as follows:

import cv2
import ipywidgets
from IPython.display import display
from IPython.display import Image

gst_pipeline = "nvarguscamerasrc sensor-id=0 ! video/x-raw(memory:NVMM), width=(int)1920, height=(int)1080, format=(string)NV12, framerate=(fraction)30/1 ! nvvidconv ! video/x-raw ! jpegenc ! image/jpeg ! appsink"

camera = cv2.VideoCapture(gst_pipeline, cv2.CAP_GSTREAMER)
re, image = camera.read()

print(re)
print(image)

The outputs of the print statements are:

re = True
image = [[255 216 255 … 127 255 217]]

So here is the issue:

When I try and run this line:

image_widget = ipywidgets.Image(value=image, format='jpeg', width=300, height=400)

I get the following error:

TraitError: The 'value' trait of an Image instance must be a bytes object, but a value of array([[255, 216, 255, ..., 127, 255, 217]], dtype=uint8) <class 'numpy.ndarray'> was specified.

If I look at the Jet-Cam example here:

Image(value=b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00\x00\x01\x00\x01\x00\x00\xff\xdb\x00C\x00\x02\x01\x0…

I think what it’s trying to tell me is that it’s looking for a bytes object instead of a NumPy array?

My suspicion is based on this clue, found in utils.py in the Jet-Cam GitHub repo:

import cv2

def bgr8_to_jpeg(value, quality=75):
    return bytes(cv2.imencode('.jpg', value)[1])

I’m going to go study what this piece of code does, but I’d be curious whether you’ve seen this before? And if so, could the “imencode” step be done via the pipeline instead of asking OpenCV to do this conversion?
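
In case it helps anyone following along, here is the minimal sketch I’m planning to try next. It assumes the array returned by camera.read() already holds a complete JPEG buffer, which the printed values above suggest (255 216 … 255 217 are the JPEG start and end markers 0xFFD8/0xFFD9), so converting it to bytes might be enough without re-encoding through cv2.imencode:

import cv2
import ipywidgets
from IPython.display import display

gst_pipeline = "nvarguscamerasrc sensor-id=0 ! video/x-raw(memory:NVMM), width=(int)1920, height=(int)1080, format=(string)NV12, framerate=(fraction)30/1 ! nvvidconv ! video/x-raw ! jpegenc ! image/jpeg ! appsink"

camera = cv2.VideoCapture(gst_pipeline, cv2.CAP_GSTREAMER)
re, image = camera.read()

if re:
    # The widget wants raw bytes, not a numpy array, so flatten the
    # JPEG buffer into a bytes object before handing it over.
    image_widget = ipywidgets.Image(value=image.tobytes(), format='jpeg',
                                    width=300, height=400)
    display(image_widget)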

Thanks!