Accelerating video decoding in Python using NVDEC



I have recently been working on a new Python CV project that requires connecting to about 30-50 cameras on a dual-A4000 (Edge) EGX Server hardware configuration. We chose the DeepStream SDK for this, even though we don’t run inference with DeepStream.

The original problem

With OpenCV, connecting to these cameras and querying frames causes high CPU usage. (The cameras are served through nimble RTSP, currently with the H264 codec; we hope to move to H265 in the future. Each camera is FHD at 12-25 FPS.)
To solve this problem, we decided to use NVDEC to accelerate video decoding.
First we tried GStreamer with the nvdec plugin through OpenCV, which didn’t work at all, so we moved to DeepStream on Python.
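For readers who want to see what the OpenCV plus GStreamer attempt might look like, here is a minimal sketch. It is not the pipeline we actually ran: the element chain, the `nvh264dec` decoder name (newer GStreamer builds of the nvcodec plugin; older builds expose `nvdec`), the latency value and the helper names are all assumptions, and OpenCV must be built with GStreamer support for `cv2.CAP_GSTREAMER` to work.

```python
def build_rtsp_capture_pipeline(rtsp_url: str,
                                decoder: str = "nvh264dec",
                                latency_ms: int = 100) -> str:
    """Return a GStreamer description for cv2.VideoCapture (hypothetical helper).

    Assumption: `decoder` names an NVDEC-backed H264 decoder element that is
    installed on the machine. Check with `gst-inspect-1.0` before relying on it.
    """
    elements = [
        f'rtspsrc location="{rtsp_url}" latency={latency_ms}',
        "rtph264depay",            # unpack H264 from RTP
        "h264parse",
        decoder,                   # hardware decode (assumed element name)
        "videoconvert",
        "video/x-raw,format=BGR",  # the pixel layout OpenCV expects
        # Keep only the newest frame so a slow reader does not add latency.
        "appsink drop=true max-buffers=1 sync=false",
    ]
    return " ! ".join(elements)


def open_capture(rtsp_url: str):
    """Open the stream with OpenCV; needs an OpenCV build with GStreamer."""
    import cv2  # imported lazily so the builder above works without OpenCV
    return cv2.VideoCapture(build_rtsp_capture_pipeline(rtsp_url),
                            cv2.CAP_GSTREAMER)


if __name__ == "__main__":
    # Placeholder URL, for illustration only.
    print(build_rtsp_capture_pipeline("rtsp://example.invalid/cam1"))
```

If `open_capture(...).isOpened()` returns False, it is worth running the same description under `gst-launch-1.0` (with `fakesink` in place of `appsink`) to see which element fails to link.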

DeepStream solution

With DeepStream on Python we got great results (on a SINGLE GPU, 40 streams at 12 FPS, with 20% CPU on a single core), but we did have to strip some layers out of the deepstream-imagedata-multistream example (those related to inference).
After stripping the inference-related elements (nvosd, nvinfer, and so on), the resulting pipeline is:


And again, it works very well.
We want to get the frames from each camera ASAP, with the lowest possible latency.

I’ve been running DeepStream using the official image, on a supported Ubuntu 20.04 with updated drivers, and faced almost no issues!


  1. Is this pipeline built well (from a performance and resilience POV)? Should we change anything?
  2. Is it possible to remove nvstreammux, since we don’t need batching at all and just need to pull frames? If so, will it work the same? Will decoding be less efficient somehow?
  3. Since we don’t need batching, should we just build a pipeline for each camera? Will it use more GPU memory, despite being in the same process? Does it pay off, in order to avoid handling camera failures?
  4. Is there a better option than DeepStream for our use case, to accelerate video decoding from RTSP?


It depends on your purpose. There are some tips for making the DeepStream pipeline faster. Troubleshooting — DeepStream 6.1.1 Release documentation

If there is no “nvstreammux”, most DeepStream plugins are of no use to you, because they all work on batched data. Only “nvvideoconvert”, “nvv4l2videodec”, “nvv4l2h264enc” and “nvv4l2h265enc” are useful.
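As a rough illustration of the "no nvstreammux, one pipeline per camera" option, the sketch below builds a decode-only description per camera. Everything in it is an assumption rather than a tested configuration: the decoder element is written as `nvv4l2decoder` (the name used in the Gst-nvvideo4linux2 documentation), the RGBA system-memory caps, the appsink settings and the helper names are hypothetical, and the URLs are placeholders.

```python
def build_decode_only_pipeline(rtsp_url: str,
                               sink_name: str = "frames",
                               latency_ms: int = 100) -> str:
    """Describe a single-camera pipeline with no nvstreammux (hypothetical helper)."""
    return " ! ".join([
        f'rtspsrc location="{rtsp_url}" latency={latency_ms}',
        "rtph264depay",
        "h264parse",
        "nvv4l2decoder",             # NVDEC-backed decoder (assumed element name)
        "nvvideoconvert",            # copy/convert out of NVMM memory
        "video/x-raw,format=RGBA",   # plain system-memory frames
        f"appsink name={sink_name} emit-signals=true "
        "drop=true max-buffers=1 sync=false",
    ])


def build_per_camera_pipelines(rtsp_urls):
    """Map a camera id to its own description, so one failure stays isolated."""
    return {
        f"cam{i}": build_decode_only_pipeline(url, sink_name=f"cam{i}_frames")
        for i, url in enumerate(rtsp_urls)
    }


def launch(description: str):
    """Start one pipeline; requires PyGObject and GStreamer to be installed."""
    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import Gst
    Gst.init(None)
    pipeline = Gst.parse_launch(description)
    pipeline.set_state(Gst.State.PLAYING)
    return pipeline


if __name__ == "__main__":
    urls = ["rtsp://example.invalid/cam0", "rtsp://example.invalid/cam1"]
    for cam_id, desc in build_per_camera_pipelines(urls).items():
        print(cam_id, "->", desc)
```

With separate pipelines, a camera that drops can be restarted by setting only its own pipeline to NULL and relaunching it. Whether this costs more GPU memory than a single muxed pipeline is not something this sketch answers.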

It depends on what you have done in the pipelines.

For video decoding acceleration alone, you can also use the Video Codec SDK. NVIDIA VIDEO CODEC SDK | NVIDIA Developer

Thanks, Fiona.

In that case, I will keep nvstreammux to make our application future-proof, because we’re in the process of evaluating optical flow algorithms (and we might use nvof), and we might use DeepStream for YOLOv5 inference in the future (using TensorRT).

We’re currently looking into the best options for querying camera frames using hardware acceleration with less complexity, so developing against the SDK is not relevant for us. We have tried FFmpeg, but we had some issues with it, and OpenCV (of course) doesn’t help much here.

As for the nvstreammux element - does it work with multiple frame sizes in the same batch? For example, if I have some cameras in FHD resolution, and some in 720p, can I connect them all to the mux without concerns? If not, can I scale the frame sizes before that?
Is frame scaling possible down the pipeline anyway (for faster nvof or TRT nvinfer)?

Thanks again!


nvstreammux will scale all input videos into the resolution you defined by “width” and “height” properties. You don’t need to do anything except to assign the “width” and “height” properties of nvstreammux. Gst-nvstreammux — DeepStream 6.1.1 Release documentation
Please read the document carefully.
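To make the mixed-resolution case concrete, here is a small sketch that builds a gst-launch style description in which FHD and 720p cameras feed one nvstreammux. The `width`, `height`, `batch-size`, `batched-push-timeout` and `live-source` properties are nvstreammux properties; the chosen values (1280x720 output, 40000 µs timeout), the `uridecodebin` sources, the `fakesink` tail and the helper name are assumptions for illustration only.

```python
def build_mux_pipeline(rtsp_urls, width=1280, height=720,
                       push_timeout_us=40000):
    """Describe N sources feeding one nvstreammux (hypothetical helper).

    Per the reply above, the muxer scales every input to `width` x `height`,
    so sources of different resolutions can share a batch.
    """
    mux = (
        f"nvstreammux name=mux batch-size={len(rtsp_urls)} "
        f"width={width} height={height} "
        f"batched-push-timeout={push_timeout_us} live-source=1 "
        "! fakesink sync=false"  # placeholder tail; replace with real elements
    )
    # Each source links to its own request pad on the muxer: mux.sink_0, ...
    sources = [
        f'uridecodebin uri="{url}" ! mux.sink_{i}'
        for i, url in enumerate(rtsp_urls)
    ]
    return " ".join([mux] + sources)


if __name__ == "__main__":
    # Placeholder URLs standing in for one FHD and one 720p camera.
    print(build_mux_pipeline([
        "rtsp://example.invalid/fhd_cam",
        "rtsp://example.invalid/hd720_cam",
    ]))
```

Choosing the muxer resolution close to what downstream elements (nvof, nvinfer) consume avoids scaling twice. That is a design suggestion on my part, not something stated in the linked documentation excerpt.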
