A100 NVDEC utilization is low

Hello, I am trying to use GStreamer to decode an H.265 video file via NVDEC, but NVDEC utilization stays low.
We run this:
gst-launch-1.0 filesrc location=/opt/nvidia/deepstream/deepstream-7.1/test.mp4 ! qtdemux ! queue ! h265parse ! queue ! nvv4l2decoder gpu-id=3 ! fakesink
The NVDEC utilization of the A100, as reported by nvidia-smi dmon, is only about 20%. It seems that only one of the NVDEC engines is used. How can we use all of them?


In addition, we tried running two of the above gst-launch commands at the same time, which raised the decoder utilization to about 34%. Running more commands at the same time (such as 5) gave only about 30%, so adding pipelines did not improve NVDEC utilization any further.

Here is the detailed configuration:

GStreamer 1.20.3
NVIDIA A100 80GB
Driver Version: 565.57.01 CUDA Version: 12.7
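
The decoder utilization figures in this thread come from nvidia-smi dmon. As a rough sketch of how such readings could be averaged over a few samples, something like the following may help. The helper name avg_dec and the GPU_ID and SAMPLES variables are hypothetical, and the parser assumes that dmon prints a "#"-prefixed header row containing a column named dec.

```shell
#!/bin/bash
# Average the "dec" column of `nvidia-smi dmon -s u` output read from stdin.
# Assumption: the header row starts with "#" and names a "dec" column.
avg_dec() {
  awk '
    /^#/ {
      sub(/^#[ \t]*/, "")                 # drop the marker; awk re-splits the fields
      for (i = 1; i <= NF; i++) if ($i == "dec") col = i
      next
    }
    col && $col ~ /^[0-9]+$/ { sum += $col; n++ }   # skip "-" or malformed rows
    END { if (n) printf "%.1f\n", sum / n; else print "n/a" }
  '
}

GPU_ID="${GPU_ID:-0}"      # hypothetical: index of the GPU to watch
SAMPLES="${SAMPLES:-10}"   # hypothetical: number of one-second samples

# Only query the driver when nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi dmon -i "$GPU_ID" -s u -c "$SAMPLES" | avg_dec
fi
```

Run this in a second terminal while the decoding pipelines are active, so the average covers the steady state rather than pipeline startup.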

The pipeline can be configured to run at full speed by setting sync=0 and async=0 on fakesink, so that the sink does not pace buffers against the clock:

gst-launch-1.0 filesrc location=/opt/nvidia/deepstream/deepstream-7.1/test.mp4 ! qtdemux ! queue ! h265parse ! queue ! nvv4l2decoder gpu-id=3 ! fakesink sync=0 async=0

How many commands have you tried to run at the same time?

We tried running up to five decoding pipelines at the same time, but the maximum utilization did not exceed 40%.

We use multiple terminals to run the gst-launch pipelines at the same time, but the decoder utilization never exceeds 40%. What is the best way to bring the utilization close to 100%?

Have you tried the command that runs the pipeline at full speed?

gst-launch-1.0 filesrc location=/opt/nvidia/deepstream/deepstream-7.1/test.mp4 ! qtdemux ! queue ! h265parse ! queue ! nvv4l2decoder gpu-id=3 ! fakesink sync=0 async=0

Yes, we run the gst-launch pipelines with the following shell script, but the decoder utilization still does not exceed 40%.

#!/bin/bash

PIPELINE1="gst-launch-1.0 filesrc location=/opt/nvidia/deepstream/deepstream-7.1/testkv_4444.mp4 ! qtdemux ! queue ! h265parse ! queue ! nvv4l2decoder gpu-id=0 ! fakesink sync=0 async=0"
PIPELINE2="gst-launch-1.0 filesrc location=/opt/nvidia/deepstream/deepstream-7.1/testkv_4444.mp4 ! qtdemux ! queue ! h265parse ! queue ! nvv4l2decoder gpu-id=0 ! fakesink sync=0 async=0"
PIPELINE3="gst-launch-1.0 filesrc location=/opt/nvidia/deepstream/deepstream-7.1/testkv_4444.mp4 ! qtdemux ! queue ! h265parse ! queue ! nvv4l2decoder gpu-id=0 ! fakesink sync=0 async=0"
PIPELINE4="gst-launch-1.0 filesrc location=/opt/nvidia/deepstream/deepstream-7.1/testkv_4444.mp4 ! qtdemux ! queue ! h265parse ! queue ! nvv4l2decoder gpu-id=0 ! fakesink sync=0 async=0"
PIPELINE5="gst-launch-1.0 filesrc location=/opt/nvidia/deepstream/deepstream-7.1/testkv_4444.mp4 ! qtdemux ! queue ! h265parse ! queue ! nvv4l2decoder gpu-id=0 ! fakesink sync=0 async=0"

echo "Starting Pipeline 1..."
$PIPELINE1 &
PID1=$!

echo "Starting Pipeline 2..."
$PIPELINE2 &
PID2=$!

echo "Starting Pipeline 3..."
$PIPELINE3 &
PID3=$!

echo "Starting Pipeline 4..."
$PIPELINE4 &
PID4=$!

echo "Starting Pipeline 5..."
$PIPELINE5 &
PID5=$!

wait $PID1
echo "Pipeline 1 finished."

wait $PID2
echo "Pipeline 2 finished."

wait $PID3
echo "Pipeline 3 finished."

wait $PID4
echo "Pipeline 4 finished."

wait $PID5
echo "Pipeline 5 finished."

echo "All pipelines have completed."

What are the video format, resolution, and FPS of testkv_4444.mp4?

The video format is HEVC, the resolution is 176x256, and the frame rate is 32 fps.

That resolution is too small to load the A100. The I/O speed and the MP4 demuxing on the CPU may be slower than the decoding itself.
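
One way to check this is to time the pipeline with and without the decoder. The following is only a sketch: the FILE variable and the elapsed_ms helper are hypothetical, it reuses the elements from the pipelines above, and it assumes GNU date (for %N) and a gst-launch-1.0 that accepts -q. If the two times are close, the CPU-side stages (file read, qtdemux, h265parse) are likely the limit rather than NVDEC.

```shell
#!/bin/bash
# Hypothetical input path; override with FILE=/path/to/video.mp4
FILE="${FILE:-/opt/nvidia/deepstream/deepstream-7.1/testkv_4444.mp4}"

# Print the wall-clock time of a command in milliseconds (needs GNU date for %N).
elapsed_ms() {
  local start end
  start=$(date +%s%N)
  "$@" >/dev/null 2>&1
  end=$(date +%s%N)
  echo $(( (end - start) / 1000000 ))
}

# CPU-side stages only: read, demux, parse, then discard.
demux_only=(gst-launch-1.0 -q filesrc "location=$FILE" ! qtdemux ! queue
            ! h265parse ! fakesink sync=0 async=0)

# Same stages plus the hardware decoder.
with_decode=(gst-launch-1.0 -q filesrc "location=$FILE" ! qtdemux ! queue
             ! h265parse ! queue ! nvv4l2decoder gpu-id=0 ! fakesink sync=0 async=0)

# Only run when GStreamer and the input file are present.
if command -v gst-launch-1.0 >/dev/null 2>&1 && [ -f "$FILE" ]; then
  echo "demux only:     $(elapsed_ms "${demux_only[@]}") ms"
  echo "demux + decode: $(elapsed_ms "${with_decode[@]}") ms"
fi
```

Process startup is included in both timings, so a short clip will mostly measure startup cost; a longer file gives a more meaningful comparison.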

Can you try some videos with larger resolution?

According to Video Codec SDK | NVIDIA Developer, the A100 can decode 168 HEVC videos at 1080p@30fps.

Thanks for your reply. We switched to a video with 1408x1024 resolution, and decoder utilization increased to 70%. Beyond that, can we control which NVDEC engine to use, or how many of them to use (the official docs say the A100 has 5 NVDECs), so that the small-resolution video also runs at full utilization?

The A100 has 5 decoder cores, but they are not exposed separately; the NVDEC cores must be used together.

Is there any way to make a low-resolution video fully utilize NVDEC (for example, by decoding in parallel)?

I have posted the A100 decoding capability above.

If your CPU is good enough, you can run more decoding pipelines (many more than 168) for your 176x256 streams to consume the NVDEC resources, although this will also consume more CPU resources.
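
To get a feel for what "much more than 168" could mean, here is a back-of-the-envelope estimate based purely on pixel rate. It assumes decoder throughput scales linearly with pixels per second, which is a simplification: per-stream overhead means the real number is probably lower. The pixel_rate helper is hypothetical.

```shell
#!/bin/bash
# Pixels per second for a stream: WIDTH * HEIGHT * FPS.
pixel_rate() { echo $(( $1 * $2 * $3 )); }

# Quoted capability: 168 HEVC streams at 1080p@30fps.
budget=$(( 168 * $(pixel_rate 1920 1080 30) ))

# The small test stream from this thread: 176x256 at 32 fps.
small=$(pixel_rate 176 256 32)

# Rough equivalent number of small streams (integer division).
streams=$(( budget / small ))
echo "approx. equivalent 176x256@32 streams: $streams"   # prints 7248
```

A handful of pipelines is therefore far from enough to saturate the decoder with clips this small; the order of magnitude is thousands, not tens.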

Thanks for the reply. Do you mean I can decode multiple small-resolution videos concurrently on the same GPU to reach full decoder utilization?

Yes, if your CPU is good enough and the memory is large enough.
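
A minimal sketch of launching many copies of the pipeline from one script follows. The variables N, FILE and GPU_ID and the helpers launch_one and run_parallel are hypothetical; the pipeline itself is the one used earlier in this thread. Each instance is a separate process that uses CPU and memory, so N should be raised gradually while watching CPU load, memory, and the dec column of nvidia-smi dmon.

```shell
#!/bin/bash
N="${N:-32}"               # hypothetical: number of concurrent pipelines
GPU_ID="${GPU_ID:-0}"      # hypothetical: GPU used by nvv4l2decoder
FILE="${FILE:-/opt/nvidia/deepstream/deepstream-7.1/testkv_4444.mp4}"

# One full-speed decode pipeline, as used earlier in this thread.
launch_one() {
  gst-launch-1.0 -q filesrc "location=$FILE" ! qtdemux ! queue ! h265parse ! queue \
    ! nvv4l2decoder "gpu-id=$GPU_ID" ! fakesink sync=0 async=0
}

# run_parallel COUNT CMD [ARGS...]
# Start COUNT background copies of CMD, wait for all, print how many failed.
run_parallel() {
  local count=$1 failed=0 i pid
  local pids=()
  shift
  for ((i = 0; i < count; i++)); do
    "$@" >/dev/null &
    pids+=("$!")
  done
  for pid in "${pids[@]}"; do
    wait "$pid" || failed=$((failed + 1))
  done
  echo "$failed"
}

# Only launch when GStreamer is installed.
if command -v gst-launch-1.0 >/dev/null 2>&1; then
  echo "pipelines that exited with an error: $(run_parallel "$N" launch_one)"
fi
```

For example, `N=64 GPU_ID=0 ./run_many.sh` (script name hypothetical) would start 64 decoders at once.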

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.