What is the workflow for using GStreamer with NVDEC for decoding?

Hello, I am trying to use GStreamer to decode an H.265 video file via NVDEC and then transfer the decoded video frames to CUDA memory.
We are using an RTX 4090 GPU, and the system is running Ubuntu 22.04.
My Python code for the GStreamer pipeline is as follows:

pipeline = Gst.parse_launch(f"""
    filesrc location={video_file} !
    qtdemux !
    queue !
    h265parse !
    queue !
    nvh265dec !
    queue !
    nvvideoconvert !
    queue !
    video/x-raw(memory:NVMM) !
    fakesink name=fakesink
""")

I have two issues that need to be resolved.

  1. What is the specific workflow of the pipeline setup mentioned above, and on which buffers or memory are these pipelines executed?
  2. Does video/x-raw(memory:NVMM) exist on the dGPU (4090)? Is it possible to pass the decoded video frames directly to CUDA memory via pointers to avoid the additional transfer overhead of swapping to CPU memory?

For your pipeline, we suggest using the DeepStream video decoder plugin gst-nvv4l2decoder instead of nvh265dec. gst-nvvideoconvert is a DeepStream plugin, but nvh265dec is not, so we do not guarantee compatibility between them.

If you use the DeepStream video decoder plugin nvv4l2decoder, the decoder's output buffers are GPU buffers, and nvvideoconvert works on those GPU buffers directly.
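
You can confirm this on your side with a buffer probe that reports whether the caps downstream of the decoder really carry the memory:NVMM feature. A minimal sketch, assuming the decoder element in your launch string is given the name dec (e.g. nvv4l2decoder name=dec):

from gi.repository import Gst

def check_nvmm(pad, info):
    # Inspect the negotiated caps on the decoder's src pad.
    caps = pad.get_current_caps()
    if caps is not None:
        features = caps.get_features(0)
        print("caps:", caps.to_string(),
              "memory:NVMM:", features.contains("memory:NVMM"))
    # Check once, then remove the probe.
    return Gst.PadProbeReturn.REMOVE

dec = pipeline.get_by_name("dec")  # assumes "nvv4l2decoder name=dec" in the launch string
dec.get_static_pad("src").add_probe(Gst.PadProbeType.BUFFER, check_nvmm)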

video/x-raw(memory:NVMM) is the DeepStream-specific hardware buffer type. When you use DeepStream on any supported platform (dGPU, Jetson, IGX, …), this special hardware buffer is used between DeepStream plugins.
Within DeepStream, the plugins handle the hardware buffers directly, so no GPU-to-CPU copy occurs. A GPU-to-CPU memory copy only happens when you use non-DeepStream-compatible plugins to handle DeepStream output data.
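
Regarding your second question, here is a minimal sketch of how the decoded surface can be accessed where it lives on the GPU using the DeepStream Python bindings (pyds), without a round trip through system memory. It assumes the pipeline has been extended with nvstreammux (hypothetically named mux here) so that buffers carry NvDsBatchMeta, and that on dGPU the upstream nvstreammux/nvvideoconvert use nvbuf-memory-type=3 (CUDA unified memory) so the surface can be mapped from Python:

import pyds
from gi.repository import Gst

def on_new_frame(pad, info):
    gst_buffer = info.get_buffer()
    if not gst_buffer:
        return Gst.PadProbeReturn.OK

    # Batch metadata is attached by nvstreammux.
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        try:
            frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        except StopIteration:
            break
        # Map the decoded surface in place (no GPU-to-CPU copy); the result
        # is a NumPy view over the CUDA unified-memory allocation.
        frame = pyds.get_nvds_buf_surface(hash(gst_buffer), frame_meta.batch_id)
        # ... hand `frame` to your CUDA processing here ...
        try:
            l_frame = l_frame.next
        except StopIteration:
            break
    return Gst.PadProbeReturn.OK

# Attach the probe downstream of nvstreammux ("mux" is a hypothetical element name).
pipeline.get_by_name("mux").get_static_pad("src").add_probe(
    Gst.PadProbeType.BUFFER, on_new_frame)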

Thank you very much, we will try using gst-nvv4l2decoder.

We have tried replacing the plugin with nvv4l2decoder, but decoding did not get faster; instead, the measured time increased from 0.037 s to 0.16 s. Could you please explain the reason for this? Here is the modified code:
pipeline = Gst.parse_launch(f"""
    filesrc location={video_file} !
    qtdemux !
    queue !
    h265parse !
    queue !
    nvv4l2decoder !
    queue !
    nvvideoconvert !
    queue !
    video/x-raw(memory:NVMM) !
    queue !
    fakesink name=fakesink
""")
Here is the detailed system configuration.

What is your input? A local video file, network stream, camera,…?

Please use the pipeline

filesrc location={video_file} !
qtdemux !
queue !
h265parse !
queue !
nvv4l2decoder !
queue !
nvvideoconvert !
queue !
video/x-raw(memory:NVMM) !
queue !
fakesink name=fakesink sync=0 async=0

You can check the GPU performance with the command "nvidia-smi dmon".

A local MP4 file.

How did you measure the time?

by this code:

import time

bus = pipeline.get_bus()
bus.add_signal_watch()
# on_eos quits main_loop when the end-of-stream message arrives
bus.connect("message::eos", on_eos, main_loop)

pipeline.set_state(Gst.State.PLAYING)

# start
start_time = time.time()

try:
    main_loop.run()
except KeyboardInterrupt:
    pass

# end
end_time = time.time()

We used nvidia-smi dmon to monitor the NVDEC usage, as shown in the figure:

[screenshot: nvidia-smi dmon output showing NVDEC utilization]

If you are concerned about end-to-end latency, it is recommended to use Tracing.
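
For example, the built-in GStreamer latency tracer can be enabled from Python by exporting the tracer variables before GStreamer is initialised; a minimal sketch (the trace.log file name is just an example):

import os

# Tracer settings must be in the environment before Gst.init() is called.
os.environ["GST_TRACERS"] = "latency"       # enable the latency tracer
os.environ["GST_DEBUG"] = "GST_TRACER:7"    # emit tracer records into the debug log
os.environ["GST_DEBUG_FILE"] = "trace.log"  # write the log to a file instead of stderr

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)
# Build and run the pipeline as before; latency records for the running
# pipeline are appended to trace.log while it plays.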

nvh265dec is not compatible with DeepStream, so it is not meaningful to compare nvh265dec with nvv4l2decoder.

The hardware decoder performance data can be found in Video Codec SDK | NVIDIA Developer.

Another method is to calculate the FPS when running the pipeline.

gst-launch-1.0 --gst-debug=fpsdisplaysink:7 filesrc location=/opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h265.mp4 ! qtdemux name=qd qd.video_0 ! h265parse ! nvv4l2decoder ! nvvideoconvert ! fpsdisplaysink sync=false video-sink="fakesink" message-forward=TRUE text-overlay=FALSE signal-fps-measurements=TRUE

With this log, you can confirm that your hardware decoder is working at full speed.
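
The same measurement can be made from Python by connecting to the fps-measurements signal of fpsdisplaysink; a minimal sketch reusing the DeepStream sample clip from the command above (the element name fpssink is only used to look the sink up):

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst, GLib

Gst.init(None)

pipeline = Gst.parse_launch(
    "filesrc location=/opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h265.mp4 ! "
    "qtdemux name=qd qd.video_0 ! h265parse ! nvv4l2decoder ! nvvideoconvert ! "
    "fpsdisplaysink name=fpssink sync=false video-sink=fakesink "
    "text-overlay=false signal-fps-measurements=true"
)

def on_fps(sink, fps, droprate, avgfps):
    # Emitted periodically by fpsdisplaysink while the pipeline runs.
    print(f"fps={fps:.1f} dropped={droprate:.1f} avg={avgfps:.1f}")

pipeline.get_by_name("fpssink").connect("fps-measurements", on_fps)

loop = GLib.MainLoop()
bus = pipeline.get_bus()
bus.add_signal_watch()
bus.connect("message::eos", lambda bus, msg: loop.quit())
pipeline.set_state(Gst.State.PLAYING)
loop.run()
pipeline.set_state(Gst.State.NULL)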

We used tracing to test the performance of the nvh265dec and nvv4l2decoder elements. Their performance is shown in the figure:
[screenshots: tracing results for nvh265dec and nvv4l2decoder]

So, can it be understood that to support DeepStream, nvv4l2decoder sacrifices some performance?

No. No relationship can be derived in this way.

Okay, I misunderstood. Thank you for your help. We’ll go test it again.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.