DeepStream 6.0 output-tensor-meta=1 extremely slow


With version 6.0, I ran into an issue where DeepStream runs extremely slowly when output-tensor-meta=1 is set.

In the example provided below, throughput drops from 90 FPS to 20 FPS when output-tensor-meta=1 is enabled on the PGIE. Steps to reproduce the issue:

  1. Unzip the source code attached below to /opt/nvidia/deepstream/deepstream-6.0/sources/apps/sample_apps/
  2. cd /opt/nvidia/deepstream/deepstream-6.0/sources/apps/sample_apps/deepstream-infer-tensor-meta-test
  3. make
  4. ./deepstream-infer-tensor-meta-app file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.h264

deepstream-infer-tensor-meta-test.tgz (8.9 MB)
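
For anyone comparing the two settings, here is a minimal sketch of how the FPS figures could be measured consistently. It is not taken from the attached sample. The `FpsCounter` class and its method names are hypothetical, and it assumes `tick()` is called once per frame from wherever frames are observed (for example a pad probe).

```python
import time


class FpsCounter:
    """Counts frames and reports the average frames per second."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock      # injectable clock, so the counter can be tested
        self._start = clock()
        self._frames = 0

    def tick(self, n=1):
        """Record that n frames passed the measurement point."""
        self._frames += n

    def fps(self):
        """Average FPS since creation or the last reset."""
        elapsed = self._clock() - self._start
        return self._frames / elapsed if elapsed > 0 else 0.0

    def reset(self):
        """Start a new measurement window."""
        self._start = self._clock()
        self._frames = 0
```

Running the same stream once with output-tensor-meta=0 and once with output-tensor-meta=1, and reading `fps()` after the same number of frames, keeps the comparison like for like.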

The same issue can be reproduced with deepstream-app-3, available on NVIDIA’s AI GitHub repo. It is more pronounced with a model heavier than the default one (e.g. YOLO or SSD).

IMPORTANT: I believe the problem lies within the gst-nvinfer plugin shipped with version 6.0.
After replacing the gst-nvinfer provided in 6.0 with the one from DeepStream 5.1 and recompiling, output-tensor-meta=1 on the PGIE no longer slows down the application.

Could we get a temporary workaround, and a patch addressing the issue in DeepStream 6.1?

Below are the specifications of my environment:

• Hardware Platform (Jetson / GPU): dGPU
• DeepStream Version: 6.0
• TensorRT Version: 8.0.1
• NVIDIA GPU Driver Version (valid for GPU only): 470.57.02

Thank you and best regards,

I observed the same effect (using the Python bindings) in DeepStream 6.0. Hoping for clarification.

Any updates on this?

Hi @ljay189 ,
I think you added your own post-processor for the data exposed with “output-tensor-meta=1”. Did you check how much time the post-processor takes? It’s possible the pipeline is blocked by the post-processor.
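
One way to check this is sketched below, assuming the post-processing is a Python callable. The `timed` decorator, `mean_ms` helper, and `postprocess` function are hypothetical names for illustration, and `postprocess` is only a stand-in for the real post-processor.

```python
import time
from functools import wraps


def timed(stats):
    """Decorator factory: accumulate call count and total seconds into `stats`."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Recorded even if the wrapped function raises.
                stats["calls"] = stats.get("calls", 0) + 1
                stats["total_s"] = stats.get("total_s", 0.0) + (
                    time.perf_counter() - start
                )
        return wrapper
    return decorator


def mean_ms(stats):
    """Mean time per call in milliseconds, or 0.0 if nothing was recorded."""
    calls = stats.get("calls", 0)
    return 1000.0 * stats.get("total_s", 0.0) / calls if calls else 0.0


postprocess_stats = {}


@timed(postprocess_stats)
def postprocess(tensor_values):
    # Hypothetical stand-in for the real post-processor.
    return max(tensor_values)
```

At 90 FPS the per-frame budget is about 11 ms, so if `mean_ms(postprocess_stats)` comes out near or above that and the post-processor runs in the streaming thread, it alone could account for a drop in throughput.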

Thank you for the response.

The same issue can be reproduced using the Python application provided in deepstream-test3, where I believe there is no custom post-processing logic (using the Python bindings).

The attached sample may add extra layers of complexity, so to save time I recommend reproducing the issue using deepstream-test3 with output-tensor-meta=1 set.
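
For an A/B run, a rough sketch of toggling the setting in the PGIE config text is shown here. It assumes the key lives in the `[property]` group of the nvinfer config file. The function name `set_output_tensor_meta` is hypothetical, and `configparser` discards comments when it rewrites the file, so it is best applied to a copy of the config.

```python
import configparser
import io


def set_output_tensor_meta(config_text, enabled):
    """Return config_text with output-tensor-meta set to 1 or 0 under [property]."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep keys exactly as written (no lowercasing)
    parser.read_string(config_text)
    if not parser.has_section("property"):
        parser.add_section("property")
    parser.set("property", "output-tensor-meta", "1" if enabled else "0")
    out = io.StringIO()
    # Write key=value without spaces around the delimiter.
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()
```

Generating one copy with the flag on and one with it off, then running the same stream against each, isolates the effect of the flag from everything else in the pipeline.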