ONNX model for secondary inference with DeepStream

I am trying to integrate an ONNX model for secondary inference with DeepStream. The goal is to get the output tensors for post-processing.
This is what I have in the custom application.
pgie → nvtracker → sgie (onnx model)

About the model:
Input layer: 3x512x512
Output layer: 128x128x1

Objects are being detected by the pgie, and information about them can be retrieved from the buffer in the src pad probe. The sgie, however, doesn’t seem to be processing the detections. process-mode is set to 2.

[property]
gpu-id=0
net-scale-factor=1
model-engine-file=onnx_b1_gpu0_fp32.engine
infer-dims=3;512;512
batch-size=1
force-implicit-batch-dim=0
model-color-format=0
process-mode=2
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=0
is-classifier=0
output-blob-names=model_1
input-object-min-width=100
input-object-min-height=100
operate-on-gie-id=1
operate-on-class-ids=0;1
#scaling-filter=0
#scaling-compute-hw=0
output-tensor-meta=1
gie-unique-id=4
parse-bbox-func-name=NvDsInferParseCustomOnnx
custom-lib-path=custom_bbox_onnx/libnvdsinfer_custom_bbox_onnx.so

Just as a test, I set process-mode to 1. The bounding box parser did get invoked, but the floating point values in the output layer buffer (NvDsInferLayerInfo layer.buffer) turn out to be completely incorrect: 100.0 and above instead of between 0 and 1.
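
One thing that may be worth checking for the out-of-range values is input normalization. Gst-nvinfer documents its preprocessing as y = net-scale-factor * (x - mean), so with net-scale-factor=1 the network receives raw 0..255 pixel values. The sketch below, in plain Python, only illustrates that arithmetic. Whether this model expects inputs in 0..1 is an assumption, not something established in this thread, and `nvinfer_preprocess` is a hypothetical helper name.

```python
def nvinfer_preprocess(pixel, net_scale_factor=1.0, offset=0.0):
    """Per-pixel normalization as documented for Gst-nvinfer:
    y = net-scale-factor * (x - mean)."""
    return net_scale_factor * (pixel - offset)


# With net-scale-factor=1 (as in the config above) the network sees raw pixels.
raw_range = (nvinfer_preprocess(0), nvinfer_preprocess(255))

# ASSUMPTION: the model was trained on inputs scaled to 0..1,
# in which case net-scale-factor would need to be 1/255.
scaled_range = (nvinfer_preprocess(0, 1 / 255), nvinfer_preprocess(255, 1 / 255))

print("net-scale-factor=1     ->", raw_range)
print("net-scale-factor=1/255 ->", scaled_range)
```

If the model was in fact trained on 0..1 inputs, net-scale-factor=0.00392156862745098 (1/255) would be the matching setting. If it was trained on raw 0..255 inputs, the current value is already right and the cause lies elsewhere.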

Is there a way to verify if this model can work with DeepStream?

• Hardware Platform (Jetson / GPU)
GPU
• DeepStream Version
5.0
• JetPack Version (valid for Jetson only)
• TensorRT Version
7.0 TensorRT OSS
• NVIDIA GPU Driver Version (valid for GPU only)
460.39
• Issue Type( questions, new requirements, bugs)
Questions
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)
• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)

For an ONNX model, “infer-dims” is not needed.

Since the parser is invoked but the values are wrong, your NvDsInferParseCustomOnnx implementation is wrong and you need to correct it. Because you do get output, even though it is wrong, DeepStream can support this model. The only thing left is to implement your custom code correctly.

Thank you for the pointer. I’ll look into correcting it.

I should post more details about the model I am using to determine the right value for network-type and other nvinfer config parameters. I was looking into the nvinfer documentation and it says

Gst-nvinfer currently works on the following type of networks:
• Multi-class object detection
• Multi-label classification
• Segmentation

If the network-type is 2, then as per the thread “Getting Segmentation Meta Data (NvDsInferSegmentationMeta) of custom segmentation model” I need to implement NvDsInferSegmentationMeta.

However, this is a pose-estimation model. It outputs a heatmap (128x128) which I need to analyze. In the NVIDIA-AI-IOT/deepstream_pose_estimation example on GitHub (a sample DeepStream application that demonstrates a human pose estimation pipeline), I see that network-type is set to 100 and there is no bbox parsing function. The output tensor meta is analyzed in the src pad buffer probe.

With the following SGIE configuration, the output tensor meta in obj_user_meta_list gives me constant float values above 1.0 but under 100.0. Now, this is without a bbox parsing function.

[property]
gpu-id=0
net-scale-factor=1
model-engine-file=onnx_b1_gpu0_fp32.engine
batch-size=1
force-implicit-batch-dim=0
model-color-format=0
process-mode=2
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=0
is-classifier=0
output-blob-names=model_1
segmentation-threshold=0.1
input-object-min-width=100
input-object-min-height=100
#network-type 0: Detector 1: Classifier 2: Segmentation 3: Instance Segmentation
network-type=100
operate-on-gie-id=1
operate-on-class-ids=0;1
#scaling-filter=0
#scaling-compute-hw=0
output-tensor-meta=1
gie-unique-id=3

How do you determine the right parameters for a pose-estimation model?
Is a bbox parser even needed for such a model?

Here is what nvinfer reports when it loads the model

INFO: ../nvdsinfer/nvdsinfer_model_builder.cpp:685 [FullDims Engine Info]: layers num: 2
0   INPUT  kFLOAT model_1_input   3x512x512       min: 1x3x512x512     opt: 1x3x512x512     Max: 1x3x512x512     
1   OUTPUT kFLOAT model_1         128x128x1       min: 0               opt: 0               Max: 0

If “output-tensor-meta=1” and “network-type=100” are set, gst-nvinfer will not handle the model output; it only exports all output layers as NvDsInferTensorMeta (see the Gst-nvinfer section of the DeepStream 5.0 documentation). You need to parse and process the output layer data yourself. No bbox will be generated by gst-nvinfer. We don’t know the post-processing algorithm of your model, so we don’t know whether values in the 1.0~100.0 range are correct. You need to consult the person who provided the model to you.
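
As a rough illustration of what "parse the output layer yourself" could look like for a single-channel heatmap, here is a minimal Python sketch that works on a plain list of floats. It does not use the DeepStream API. It assumes the 128x128x1 output is laid out row-major, that the strongest cell marks the keypoint, and that the object crop was stretched to the network input without aspect-ratio padding. The function names and the object rectangle values are hypothetical, and the real post-processing depends on how the model was trained.

```python
def heatmap_peak(values, height=128, width=128):
    """Return (row, col, score) of the strongest cell in a flat,
    row-major heatmap of height * width floats."""
    if len(values) != height * width:
        raise ValueError("buffer size does not match height * width")
    best = max(range(len(values)), key=values.__getitem__)
    row, col = divmod(best, width)
    return row, col, values[best]


def peak_to_frame(row, col, obj_left, obj_top, obj_width, obj_height,
                  height=128, width=128):
    """Map the center of a heatmap cell to frame coordinates.

    ASSUMPTION: the object crop was scaled to fill the network input
    (no letterboxing), so heatmap cells map linearly onto the object rect.
    """
    x = obj_left + (col + 0.5) / width * obj_width
    y = obj_top + (row + 0.5) / height * obj_height
    return x, y


# Tiny 4x4 stand-in for the 128x128 heatmap, with the peak at row 1, col 2.
demo = [0.0] * 16
demo[1 * 4 + 2] = 0.9

row, col, score = heatmap_peak(demo, height=4, width=4)
x, y = peak_to_frame(row, col, obj_left=100, obj_top=50,
                     obj_width=200, obj_height=400, height=4, width=4)
print(row, col, score, x, y)
```

In a real probe, the flat list would come from the output layer buffer exposed through the tensor meta, and the object rectangle from the parent object's metadata.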

You can refer to the post-processing implementations in /opt/nvidia/deepstream/deepstream-5.0/sources/libs/nvdsinfer/nvdsinfer_context_impl_output_parsing.cpp to judge whether your model is a standard segmentation model supported by DeepStream.