16-channel video decoding too slow

I am using DeepStream 5.0 on a Jetson Xavier NX developer board. When I decode 16 channels of 1080p video while running a deep learning algorithm on all 16 channels at the same time, decoding is too slow and the video latency is severe. When I turn the algorithm off, decoding is fast. I use the GStreamer tee plugin to split the decoded video and lower the frame rate for the algorithm branch, so decoding and the algorithm run in parallel. How can I speed up video decoding? Decoding uses the nvv4l2decoder plugin, which uses /dev/nvhost-nvdec by default. How can nvhost-nvdec1 be used?

Hi,
The hardware decoding engines are enabled automatically. You can run sudo tegrastats while the GStreamer command is running to check whether NVDEC and NVDEC1 are active. For maximum performance, please set this nvv4l2decoder property:

  enable-max-performance: Set to enable max performance
                        flags: readable, writable
                        Boolean. Default: false

enable-max-performance was already set to true. My decode pipeline is appsrc -> queue -> h264parse/jpegparse -> nvv4l2decoder. I tried increasing the appsrc blocksize, enabling enable-full-frame, and increasing num-extra-surfaces on nvv4l2decoder, but none of these had any effect. I found that the number of buffers in the queue grows large when all applications are running, and then the video delay becomes severe.

Hi,
We ran the following commands and saw all decoding threads reach 30 fps:

$ export FPATH=/opt/nvidia/deepstream/deepstream-5.1/samples/streams/sample_1080p_h264.mp4
$ gst-launch-1.0 -v \
filesrc location=$FPATH ! qtdemux ! h264parse ! nvv4l2decoder enable-max-performance=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
filesrc location=$FPATH ! qtdemux ! h264parse ! nvv4l2decoder enable-max-performance=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
filesrc location=$FPATH ! qtdemux ! h264parse ! nvv4l2decoder enable-max-performance=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
filesrc location=$FPATH ! qtdemux ! h264parse ! nvv4l2decoder enable-max-performance=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
filesrc location=$FPATH ! qtdemux ! h264parse ! nvv4l2decoder enable-max-performance=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
filesrc location=$FPATH ! qtdemux ! h264parse ! nvv4l2decoder enable-max-performance=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
filesrc location=$FPATH ! qtdemux ! h264parse ! nvv4l2decoder enable-max-performance=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
filesrc location=$FPATH ! qtdemux ! h264parse ! nvv4l2decoder enable-max-performance=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
filesrc location=$FPATH ! qtdemux ! h264parse ! nvv4l2decoder enable-max-performance=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
filesrc location=$FPATH ! qtdemux ! h264parse ! nvv4l2decoder enable-max-performance=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
filesrc location=$FPATH ! qtdemux ! h264parse ! nvv4l2decoder enable-max-performance=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
filesrc location=$FPATH ! qtdemux ! h264parse ! nvv4l2decoder enable-max-performance=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0

The bottleneck is probably not the hardware decoder. Please run sudo tegrastats to show the system load and check which hardware component may be capping performance.
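A minimal way to capture tegrastats while the combined workload runs, then check whether both decoders show up in the log. The log name, duration, and summarize_nvdec helper are hypothetical, and the count assumes tegrastats prints the NVDEC and NVDEC1 tokens while those engines are in use, as the first reply suggests.

```shell
#!/usr/bin/env bash
# Sketch: record tegrastats while the 16-channel workload runs (start it in
# another terminal), then count how many samples mention each decoder engine.
# LOG, DURATION and summarize_nvdec are hypothetical; the count assumes
# tegrastats prints the tokens NVDEC / NVDEC1 while those engines are active.
set -euo pipefail

LOG="${LOG:-tegrastats.log}"
DURATION="${DURATION:-30}"   # seconds to record

# Count the samples (lines) in which each decoder engine appears as a whole word.
summarize_nvdec() {
  local log="$1" engine count total
  total=$(( $(wc -l < "$log") ))
  for engine in NVDEC NVDEC1; do
    # grep -w keeps "NVDEC" from matching inside "NVDEC1".
    count="$(grep -cw "$engine" "$log" || true)"
    echo "$engine reported in $count of $total samples"
  done
}

if command -v tegrastats >/dev/null 2>&1; then
  # One sample per second; timeout stops tegrastats after DURATION seconds.
  sudo timeout "$DURATION" tegrastats --interval 1000 > "$LOG" || true
  summarize_nvdec "$LOG"
else
  echo "tegrastats not found (not a Jetson board?)" >&2
fi
```

The raw log is kept, so the GPU, CPU, and EMC figures from the same run can be compared afterwards to see which component saturates first.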

Please run the 16-channel deep learning algorithm at the same time as the 16-channel decoding. With 16 channels of inference plus 16 channels of decoding, decoding is too slow; the two affect each other.

Hi,
It looks like the model is too heavy and slows down the other threads. Decoding itself should be fine. If you use the nvinfer plugin, we suggest setting the interval property. We have seen a performance issue when running 8 input sources on Jetson Nano with a ResNet10 model, and adjusted this property in the config file. You may refer to

/opt/nvidia/deepstream/deepstream-5.1/samples/configs/deepstream-app/source8_1080p_dec_infer-resnet_tracker_tiled_display_fp16_nano.txt
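A sketch of adjusting that property in a copy of a deepstream-app config such as the sample above. In deepstream-app configs the key is interval= in the [primary-gie] group; nvinfer skips that many consecutive batches between inferences, so interval=4 runs the model on every fifth batch. set_primary_gie_interval is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: set interval= in the [primary-gie] group of a deepstream-app config.
# set_primary_gie_interval is a hypothetical helper: it rewrites an existing
# interval= line in that group, or appends one if the group has none, and
# leaves every other group untouched.
set -euo pipefail

set_primary_gie_interval() {
  local cfg="$1" value="$2" tmp
  tmp="$(mktemp)"
  awk -v val="$value" '
    function flush() { if (in_gie && !done) { print "interval=" val; done = 1 } }
    /^\[/ { flush(); in_gie = ($0 ~ /^\[primary-gie\]/); print; next }  # group header
    in_gie && /^interval=/ { if (!done) { print "interval=" val; done = 1 }; next }
    { print }
    END { flush() }   # [primary-gie] may be the last group in the file
  ' "$cfg" > "$tmp"
  mv "$tmp" "$cfg"
}

# Usage: ./set_interval.sh my_config.txt 4
if [ "$#" -eq 2 ]; then
  set_primary_gie_interval "$1" "$2"
fi
```

Edit a copy rather than the sample itself, and keep the copy in the same directory as the original, since the sample configs refer to model files with relative paths; then run it with `deepstream-app -c` on the copy.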