I cannot teach gstreamer in one post, but the following may help you:
With gstreamer, everything is a pipeline. A pipeline runs from a source element to a sink element, possibly passing through other elements for processing. For example, a simple pipeline using a test video source and an X window for display would be:
gst-launch-1.0 videotestsrc ! xvimagesink
gst-launch-1.0 is a binary that can be used to easily prototype a gstreamer pipeline.
You can get details using gst-inspect-1.0:
# List available plugins, with the elements and types they provide:
gst-inspect-1.0
# Get all elements and typefinders provided by a plugin, its library path, ...
gst-inspect-1.0 <any_plugin_listed_from_above>
# Get details about an element, such as supported input/output formats, properties, ...
gst-inspect-1.0 <any_element_listed_from_above>
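For example, assuming the stock coreelements plugin and the videotestsrc element are installed (they ship with the base gstreamer packages):
# List what the coreelements plugin provides (identity, fakesink, queue, ...):
gst-inspect-1.0 coreelements
# Show the pads, supported caps and properties of videotestsrc:
gst-inspect-1.0 videotestsrc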
In a pipeline, there are caps between elements that define the format of the data (called buffers) exchanged between them. At least one format must be available both as SRC (output) of the previous element and as SINK (input) of the next element, otherwise the pipeline would fail to link the elements at init time. So the canonical form of a pipeline would be:
src_element ! caps ! element ! caps ! ... ! element ! caps ! sink_element
The caps are the type information such as video/x-raw… If you don’t specify caps between two elements, gstreamer will try to negotiate caps between the two elements, looking for a format available on both the SRC and SINK pads, if any. Using gst-launch-1.0 with the -v flag, you’ll be able to see the caps that were negotiated.
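For example, this sketch forces videotestsrc into a 640x480@30 I420 mode; with -v you can check the caps actually set on each pad (xvimagesink assumes a running X server):
gst-launch-1.0 -v videotestsrc ! video/x-raw,format=I420,width=640,height=480,framerate=30/1 ! xvimagesink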
The argument given to test-launch is not a full pipeline. test-launch will add the sink itself (usually it uses a udpsink element).
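As a sketch, following the gst-rtsp-server examples, the partial pipeline given to test-launch is enclosed in parentheses and the payloader is named pay0 so the server can find it (nvvidconv here copies the test frames into NVMM memory for the encoder; values are illustrative):
./test-launch "( videotestsrc ! nvvidconv ! video/x-raw(memory:NVMM),format=NV12 ! nvv4l2h264enc ! h264parse ! rtph264pay name=pay0 pt=96 )"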
So for details about my previous post:
I get the camera feed with nvarguscamerasrc, which controls the camera and debayers (and auto-tunes) with the ISP. That element provides raw video in NV12 format and outputs into NVMM memory, which is contiguous memory convenient for DMA access from the GPU, encoders/decoders and more. Then I specified caps for choosing a video mode (here 1080p30).
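A minimal sketch of just this stage, assuming a camera is connected (fakesink simply discards the buffers, so this only checks that capture works):
gst-launch-1.0 nvarguscamerasrc ! 'video/x-raw(memory:NVMM),format=NV12,width=1920,height=1080,framerate=30/1' ! fakesink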
The following element, nvvidconv, is useful for copying between NVMM memory and system memory. Not sure if this is still true, but in previous L4T releases, at least one of its input or output had to be in NVMM memory (i.e. it could not be used as video/x-raw ! nvvidconv ! video/x-raw).
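For instance, this sketch copies the camera buffers from NVMM memory into system memory so a CPU-based sink can consume them (xvimagesink assumes an X server with XVideo support):
gst-launch-1.0 nvarguscamerasrc ! 'video/x-raw(memory:NVMM),format=NV12' ! nvvidconv ! 'video/x-raw,format=I420' ! xvimagesink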
Beyond copying between memory spaces, it can also convert video formats, rotate/flip or crop using the VIC HW. It is in fact not required here, as the next element nvv4l2h264enc expects NV12 format in NVMM memory, which nvarguscamerasrc already provides; though, with nvvidconv having nothing to do, it should add only a very light overhead. Feel free to remove it and try.
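As a sketch of the conversion features, the flip-method property rotates/flips frames (here 2 should be a 180° rotation; check gst-inspect-1.0 nvvidconv on your release for the exact mapping):
gst-launch-1.0 nvarguscamerasrc ! 'video/x-raw(memory:NVMM),format=NV12' ! nvvidconv flip-method=2 ! 'video/x-raw,format=I420' ! xvimagesink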
Then nvv4l2h264enc drives the HW encoder that produces the h264 video.
nvv4l2h264enc may output an already-parsed H264 stream in byte-stream format, so h264parse may not be mandatory in this case.
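To check the encoded stream independently of RTP, a sketch that records to a file (the bitrate value in bits/s is illustrative; -e makes gst-launch-1.0 send EOS on Ctrl-C so the muxer can finalize the file):
gst-launch-1.0 -e nvarguscamerasrc ! 'video/x-raw(memory:NVMM),format=NV12,width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc bitrate=8000000 ! h264parse ! matroskamux ! filesink location=test.mkv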
Finally, rtph264pay will manage the RTP protocol for the H264 format and packetize buffers to be sent to a sink such as udpsink.
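Putting it together outside of test-launch, a sketch of a sender and a matching receiver (host and port are illustrative; insert-sps-pps=1 asks the encoder to repeat SPS/PPS so a receiver can sync mid-stream, and avdec_h264 is a software decoder from gst-libav):
# Sender (Jetson):
gst-launch-1.0 nvarguscamerasrc ! 'video/x-raw(memory:NVMM),format=NV12,width=1920,height=1080,framerate=30/1' ! nvv4l2h264enc insert-sps-pps=1 ! h264parse ! rtph264pay ! udpsink host=127.0.0.1 port=5000
# Receiver:
gst-launch-1.0 udpsrc port=5000 ! 'application/x-rtp,media=video,clock-rate=90000,encoding-name=H264,payload=96' ! rtph264depay ! avdec_h264 ! xvimagesink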