Multiple Camera Input stitching / concatenating for inference

Hi, I was wondering whether DeepStream offers stitching / concatenating / merging of multiple images before inference.

I am planning to run inference on a Jetson Nano with two video sources, but I am afraid the FPS would be significantly low if I use PeopleNet, since its input resolution is 960×544.

And the nvstreammux plugin (the muxer) seems to offer batched inference only, which is quite different from what I am expecting.

So, I wonder:

  1. Is it possible to stitch multiple videos into a single frame (960×544, for example) and run inference on it using DeepStream?
  2. Or is this only possible with a modified TensorRT Python script? Example link (

  3. (Added) According to this guideline, on page 50, it says: "Optical flow vectors are useful in various use cases such as object detection and tracking, video frame rate upconversion, depth estimation, stitching, and so on." But I'm not sure how to employ Gst-nvof to stitch images before running inference.

I appreciate your support in advance.

A possible solution is to use the nvcompositor plugin. Please refer to the pipeline in this topic:
Textoverlay plugin in a video after nvcompositor

For linking with DeepStream SDK, the pipeline should be like:

... ! nvcompositor ! nvvidconv ! video/x-raw ! nvvideoconvert ! nvstreammux ! nvinfer ! ...
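As an illustrative sketch of that pipeline (the file names, tile coordinates, sink element, and the nvinfer config path are all placeholders you would adapt to your setup, and the exact caps/memory negotiation may need adjustment on your Jetson), a full gst-launch command composing two sources side by side into one 960×544 frame might look like:

```shell
# Hypothetical sketch: nvcompositor places two decoded streams side by side
# (each 480x544) into one 960x544 frame, which is then batched by
# nvstreammux (batch-size=1, since there is now a single stitched stream)
# and fed to nvinfer. Paths and the PGIE config file are placeholders.
gst-launch-1.0 \
  nvcompositor name=comp \
    sink_0::xpos=0   sink_0::ypos=0 sink_0::width=480 sink_0::height=544 \
    sink_1::xpos=480 sink_1::ypos=0 sink_1::width=480 sink_1::height=544 \
  ! nvvidconv ! 'video/x-raw' ! nvvideoconvert \
  ! mux.sink_0 \
  nvstreammux name=mux batch-size=1 width=960 height=544 \
  ! nvinfer config-file-path=config_infer_peoplenet.txt \
  ! nvvideoconvert ! nvdsosd ! nveglglessink \
  filesrc location=video0.mp4 ! qtdemux ! h264parse ! nvv4l2decoder ! comp.sink_0 \
  filesrc location=video1.mp4 ! qtdemux ! h264parse ! nvv4l2decoder ! comp.sink_1
```

Note that with this approach the detector sees both camera views inside one frame, so detections near the seam between the two tiles may be unreliable, and the bounding-box coordinates reported by nvinfer are in the stitched frame's coordinate space rather than each camera's.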