DeepStream 6.0 output-tensor-meta=1 extremely slow


With version 6.0, I ran into an issue where DeepStream runs extremely slowly when output-tensor-meta=1.

In the example provided below, the FPS drops from 90 to 20 when output-tensor-meta=1 is enabled on the PGIE. Steps to reproduce the issue:

  1. Unzip the source code attached below to /opt/nvidia/deepstream/deepstream-6.0/sources/apps/sample_apps/
  2. cd /opt/nvidia/deepstream/deepstream-6.0/sources/apps/sample_apps/deepstream-infer-tensor-meta-test
  3. make
  4. ./deepstream-infer-tensor-meta-app file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.h264
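For reference, the flag in question lives in the PGIE's nvinfer configuration file. A minimal sketch of the relevant section (the file name, model paths, and values other than output-tensor-meta are illustrative placeholders, not taken from the attached sample):

```ini
# PGIE nvinfer config (sketch; all keys except output-tensor-meta
# are illustrative placeholders)
[property]
gpu-id=0
batch-size=1
network-mode=0
# 1 = attach the raw inference output tensors as NvDsInferTensorMeta
# to the frame metadata; this is the setting that triggers the
# slowdown described above
output-tensor-meta=1
```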

deepstream-infer-tensor-meta-test.tgz (8.9 MB)

The same issue can be reproduced with deepstream-test3, available in NVIDIA's GitHub repo. The issue is more pronounced with a heavier model than the default one (e.g. YOLO or SSD).

IMPORTANT: I believe the problem lies within the gst-nvinfer plugin in version 6.0.
After replacing the gst-nvinfer source shipped with 6.0 with the one from DeepStream 5.1 and recompiling the plugin, output-tensor-meta=1 on the PGIE no longer slows down the application.

Could we get a temporary workaround, and a patch addressing this issue in DeepStream 6.1?

Below are the specifications of my environment:

• Hardware Platform (Jetson / GPU): dGPU
• DeepStream Version: 6.0
• TensorRT Version: 8.0.1
• NVIDIA GPU Driver Version (valid for GPU only): 470.57.02

Thank you and best regards,

I observed the same effect (using the Python bindings) in DeepStream 6.0. Hoping for clarification.

Any updates on this?

Hi @ljay189 ,
I think you added your own post-processor to the data exposed with "output-tensor-meta=1". Have you checked the time consumption of the post-processor? It's possible the pipeline is blocked by it.
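One generic way to check this (a sketch, not taken from the attached sample; the `postprocess` callback below is hypothetical) is to wrap the post-processing callback with a timing decorator and log slow invocations:

```python
import functools
import time

def timed(threshold_ms=5.0):
    """Decorator that measures a callback's wall-clock time and
    reports any invocation slower than threshold_ms."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            if elapsed_ms > threshold_ms:
                print(f"{fn.__name__} took {elapsed_ms:.2f} ms")
            return result
        return inner
    return wrap

# Hypothetical post-processor; in a real pipeline this would be the
# pad-probe callback that parses the attached tensor meta.
@timed(threshold_ms=5.0)
def postprocess(frame):
    time.sleep(0.01)  # simulate 10 ms of work
    return frame
```

Wrapping the probe that reads the tensor meta this way shows quickly whether the FPS drop comes from user post-processing or from the plugin itself.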

Thank you for the response.

The same issue can be reproduced using the Python application provided in deepstream-test3, where I believe there is no custom post-processing logic (using the Python bindings).

The attached sample might have added additional layers of complexity, so I recommend reproducing the issue using deepstream-test3 and setting output-tensor-meta=1 to save time.

Has the issue been solved?

This issue still persists.

Hi @ljay189
This should be the same as Custom YOLOv4 Model Performance - #3 by mchi

Thanks for providing the repro! I tried it on my side; it generated a model_b1_gpu0_fp32.engine model file, which means it runs at FP32 precision. You need to change to FP16 or INT8 so that you can use Tensor Cores for higher compute throughput and better inference performance.
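For reference, precision is selected with the network-mode key in the nvinfer config file (0=FP32, 1=INT8, 2=FP16). A sketch of the relevant lines (the engine file name is illustrative):

```ini
[property]
# 0=FP32, 1=INT8, 2=FP16
network-mode=2
# Remove or rename any previously generated FP32 engine so nvinfer
# rebuilds it at the new precision (e.g. model_b1_gpu0_fp16.engine)
```

Note that INT8 additionally requires a calibration file, so FP16 is usually the quicker change to try first.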

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.