Which understanding of the networks supported by nvinfer is correct?

I noticed the following in the official documentation.
Gst-nvinfer currently works on the following type of networks:
Multi-class object detection
Multi-label classification
Segmentation (semantic)
Instance Segmentation
I have two possible understandings:
1. The first understanding: nvinfer uses TensorRT for inference, and the whole inference process, including the handling of output data after inference, is adapted to all of the cases above.
2. The second understanding: the list above names a narrow set of categories. nvinfer directly supports only those network models by default (for example, resnet10.caffemodel) and does not support models that require a custom interface implementation (for example, yolov5s).
Which understanding is correct?

The description is not complete. gst-nvinfer only handles video/image inference models. Some typical model types are:

  1. Detection models, which output the bbox of each object in the video/image.
  2. Classification models, which output the class of the object. For example, some models run inference on car images and recognize the color of the car.
  3. Segmentation models, which output the mask of the object.
  4. Instance segmentation models, which output the mask and the object together.
  5. Other types of models that are none of the above.

gst-nvinfer contains some default postprocessing algorithms, such as softmax parsing and NMS clustering, and it applies different postprocessing to different types of models. You can also write your own postprocessing if the default algorithms are not suitable for your model. See "Using a Custom Model with DeepStream" in the DeepStream 6.1.1 Release documentation.
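To make "NMS clustering" concrete, here is a minimal sketch of greedy non-maximum suppression. It is not the gst-nvinfer implementation: the `Box` struct and the `iou` and `nms` functions are hypothetical names chosen for illustration, and the real code in the nvdsinfer sources may differ in detail.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical box type for illustration only; not a DeepStream struct.
struct Box {
    float left, top, width, height;
    float score;
};

// Intersection-over-union of two axis-aligned boxes.
float iou(const Box &a, const Box &b) {
    const float x1 = std::max(a.left, b.left);
    const float y1 = std::max(a.top, b.top);
    const float x2 = std::min(a.left + a.width, b.left + b.width);
    const float y2 = std::min(a.top + a.height, b.top + b.height);
    const float inter = std::max(0.0f, x2 - x1) * std::max(0.0f, y2 - y1);
    const float uni = a.width * a.height + b.width * b.height - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Greedy NMS: visit boxes from highest to lowest score and keep a box only
// if it does not overlap an already kept box by more than iouThreshold.
std::vector<Box> nms(std::vector<Box> boxes, float iouThreshold) {
    std::sort(boxes.begin(), boxes.end(),
              [](const Box &a, const Box &b) { return a.score > b.score; });
    std::vector<Box> kept;
    for (const Box &candidate : boxes) {
        bool suppressed = false;
        for (const Box &k : kept) {
            if (iou(candidate, k) > iouThreshold) {
                suppressed = true;
                break;
            }
        }
        if (!suppressed) {
            kept.push_back(candidate);
        }
    }
    return kept;
}
```

In a multi-class detector this would normally be run once per class, so that overlapping boxes of different classes do not suppress each other.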

gst-nvinfer is fully open source, and the code itself is a better reference than the document.

I have implemented a custom inference interface.
What I don’t understand is how yolov5s is classified: why do I need to implement a custom interface for it?
Shouldn’t it belong to Multi-class object detection?

The default postprocessing algorithm cannot handle the yolov5s output, so you need to customize the postprocessing.
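As a rough illustration of what such custom postprocessing has to do, the sketch below decodes a YOLOv5-style output buffer into boxes. It rests on assumptions that are not confirmed in this thread: the output is taken to be a flat float array of rows laid out as (cx, cy, w, h, objectness, per-class scores) in network-input pixels, which depends on how the model was exported. `DetectedObject` and `decodeYoloV5` are hypothetical names, not DeepStream API. In a real deployment the logic would sit inside a custom bbox parsing function built into a shared library, which gst-nvinfer is pointed at through the `custom-lib-path` and `parse-bbox-func-name` config keys.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one parsed detection; not a DeepStream struct.
struct DetectedObject {
    unsigned int classId;
    float left, top, width, height;
    float confidence;
};

// Decode a flat YOLOv5-style output buffer into detections.
// ASSUMED layout: numRows rows of
//   (cx, cy, w, h, objectness, score[0] ... score[numClasses - 1]).
std::vector<DetectedObject> decodeYoloV5(const float *data, std::size_t numRows,
                                         std::size_t numClasses,
                                         float confThreshold) {
    std::vector<DetectedObject> objects;
    if (data == nullptr || numClasses == 0) {
        return objects;
    }
    const std::size_t stride = 5 + numClasses;
    for (std::size_t r = 0; r < numRows; ++r) {
        const float *row = data + r * stride;

        // Pick the class with the highest score for this row.
        std::size_t best = 0;
        for (std::size_t c = 1; c < numClasses; ++c) {
            if (row[5 + c] > row[5 + best]) {
                best = c;
            }
        }

        // Final confidence = objectness * best class score.
        const float confidence = row[4] * row[5 + best];
        if (confidence < confThreshold) {
            continue;
        }

        // Convert center/size to top-left/size.
        DetectedObject obj;
        obj.classId = static_cast<unsigned int>(best);
        obj.width = row[2];
        obj.height = row[3];
        obj.left = row[0] - row[2] * 0.5f;
        obj.top = row[1] - row[3] * 0.5f;
        obj.confidence = confidence;
        objects.push_back(obj);
    }
    return objects;
}
```

The decoded boxes usually still contain many overlapping candidates per object, so some form of clustering such as NMS is applied to them afterwards.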

You mean that I only need to write custom processing for the output of yolov5s? As for the default postprocessing after TensorRT inference, DeepStream’s gst-nvinfer already does that, right?
The default inference algorithms of DeepStream cover the following:
Multi-class object detection
Multi-label classification
Segmentation (semantic)
Instance Segmentation

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one.

No. The default postprocessing algorithms are in DetectPostprocessor::fillDetectionOutput(), ClassifyPostprocessor::fillClassificationOutput() and SegmentPostprocessor::fillSegmentationOutput(). The algorithms are hard to describe briefly, so please refer to the source code: /opt/nvidia/deepstream/deepstream/sources/libs/nvdsinfer/nvdsinfer_context_impl_output_parsing.cpp

There is already a yolov5 sample in NVIDIA-AI-IOT/deepstream_tao_apps (sample apps that demonstrate how to deploy models trained with TAO on DeepStream). It seems to be similar to yolov5s.

Please read the documentation and samples carefully.
