Create Deepstream plugin that encapsulates process of my ONNX model - where to start?

• Hardware Platform (Jetson / GPU) GPU - GeForce RTX 4090
• DeepStream Version 6.2
• TensorRT Version
• NVIDIA GPU Driver Version (valid for GPU only) 535.146.02

Hello! I have successfully installed DeepStream 6.2 and all the other required libraries on my machine and have successfully run numerous sample applications. Now I want to integrate my ONNX model into the pipeline.

This model receives an image as an input that I would like to prepare with some scaling and affine transformations, and it outputs two small tensors with float values.

Basically, what I want is a DeepStream plugin. What I have learned so far:

  1. I need to create a .so file that encapsulates my ONNX model along with the input and post-processing logic
  2. This .so file should be passed to the GStreamer nvinfer plugin, which can be integrated into a DeepStream pipeline
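For what it's worth, these two pieces usually meet in the nvinfer config file: the ONNX model is loaded by nvinfer itself (it builds a TensorRT engine from the .onnx on first run), while the custom .so is only referenced from the config. A minimal sketch, with hypothetical file names and a hypothetical parse function name:

```ini
# Hypothetical nvinfer PGIE config sketch; file names and the parse
# function name are placeholders, not from this thread.
[property]
gpu-id=0
# nvinfer builds a TensorRT engine from the ONNX file on first run
onnx-file=my_model.onnx
model-engine-file=my_model.onnx_b1_gpu0_fp16.engine
network-mode=2
batch-size=1
# custom output parsing lives in the .so referenced here
custom-lib-path=libnvds_my_custom_parser.so
parse-bbox-func-name=NvDsInferParseMyModel
```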

Where I am a bit confused is how to create this .so file. I have seen this beautiful tutorial about ONNX model integration, but it doesn't quite hit the spot I need. As I understand it, that post heavily relies on machinery kindly prepared by the DeepStream/GStreamer developers for seamless integration of YOLO-type models.

My case is a bit different, because I have a very customized output and very customized input processing algorithm.

Could you please point me to some materials that would be a good starting point for creating a DeepStream plugin from scratch? Right now I am focused on this tutorial: Using a Custom Model with DeepStream — DeepStream documentation 6.4 documentation. Something tells me this is exactly what I need, but some code bits would really help here.

Also, I am going through nvdsinfer_custom_impl.h. I believe it is what I need to use to create the .so plugin with my model, but I can't clearly separate where I process the input, where I process the output, and how the actual loading of the ONNX model happens. Please let me know if I am heading in the right direction and whether there are samples built with custom models inside.

For ONNX model integration, please refer to the sample deepstream_tao_apps/configs/nvinfer/yolov4_tao/pgie_yolov4_tao_config.txt at master · NVIDIA-AI-IOT/deepstream_tao_apps · GitHub.
If you need custom preprocessing, you can use the nvdspreprocess plugin. Preprocessing will be done in nvdspreprocess, and nvinfer will get the preprocessed meta. Please refer to the samples deepstream-preprocess-test and deepstream-3d-action-recognition in the DeepStream SDK.
For custom postprocessing of an ONNX model, please refer to post_processor, which only includes postprocessing, not preprocessing.


Hello @fanzh! Thank you for your response! I was away because I was figuring out how the nvdspreprocess plugin works.

For the time being I am having difficulties with it.

My preprocess plugin needs to do two steps:

  1. Crop a rect out of the input frame. This rect comes from the PGiE detector, and I have verified this information is available in the plugin through metadata
  2. Do affine transformation on the cropped image

I see that this plugin already has some cropping inside; there are many comments mentioning "cropped ROI". Does that mean the plugin can already satisfy the first requirement, or do these words mean something different in this context?

If this plugin does cropping, how can I force it to crop according to the metadata from PGiE? Right now I have "roi_meta" and "frame_meta", and "roi_meta" seems to contain the "processing-width" and "processing-height" from my config.

"Crop a rect out of the input frame" means ROI; please refer to the doc. nvdspreprocess already supports the ROI feature. You only need to add the ROI configuration.
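For reference, the static per-stream ROI configuration mentioned here lives in the group sections of the nvdspreprocess config file (as in the deepstream-preprocess-test sample); the coordinates below are illustrative, not from this thread:

```ini
# Sketch of per-stream ROI settings for nvdspreprocess
# (values are made-up examples)
[group-0]
src-ids=0
process-on-roi=1
# two ROIs for source 0, given as left;top;width;height per ROI
roi-params-src-0=0;0;640;480;640;0;640;480
```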

  1. Is "the cropped image" the same as "crop a rect out of the input frame"?
  2. Could you elaborate on the affine transformation? What kind of affine transformation?
  3. What do you mean by "how can I force it to crop according to metadata from PGiE"? Could you share the whole media pipeline? nvdspreprocess supports custom preprocess algorithms: you need to encapsulate the algorithm in a .so lib and set the interface in the configuration file.
  1. Yes. The PGiE detector is basically a FaceDetector. I used the FaceDetect model kindly provided by the NVIDIA team. I placed the preprocess module after face_detector like so:

if (!gst_element_link_many (streammux, queue1, primary_detector, queue2, preprocess,
    queue3, nvtile, queue4, nvvidconv, queue5,
    nvosd, NULL)) {
  g_printerr ("Inferring and tracking elements link failure.\n");
  return -1;
}

where primary_detector is Face Detector

  2. I need to scale the cropped face rect to a certain size, which in usual CPU code is done via cv::warpAffine. I need this preprocessing before I can pass the cropped face further down the line to the SGiE detector, which is my neural network (not yet integrated into the pipeline)
  3. How the .so lib is created I have already figured out, thanks to your amazing guidance.

As for “how can I force it to crop according to metadata from PGiE?”

In the preprocess plugin code I see that ROI cropping/definition is done through roi_meta. What is unclear is how I can force this roi_meta to contain the face rects that were tracked from PGiE. What I tried: I simply swapped roi_meta with my face detector meta, but that seems to result in a CUDA memory issue, because each time the face box is different, the plugin creates a new chunk of memory and doesn't seem to free the chunks from previous operations. I tried to clean up the data myself, but that seems to negatively impact the whole processing sequence. So I suspect I am not using this cropping function correctly, since the plugin fails when I swap roi_meta.

In the doc that you have sent there is a text:

Streams with predefined ROIs (Region of Interests) are scaled and format converted as per the network requirements for inference. Per stream ROIs are specified in the config file.

Does this mean that cropped ROIs can only be static, since they are provided via the config file?

Also, here is my plugin’s config file, just to complete the picture:

# The values in the config file are overridden by values set through GObject

# 0=process on objects 1=process on frames

# network-input-shape: batch, channel x sequence, height, width
# 2D sequence of 64 images
#network-input-shape= 4;192;720;1080
# 2D sequence of 32 images
network-input-shape= 4;96;720;1080

# 0=RGB, 1=BGR, 2=GRAY
# 0=FP32, 1=UINT8, 2=INT8, 3=UINT32, 4=INT32, 5=FP16

# 0=NvBufSurfTransformCompute_Default 1=NvBufSurfTransformCompute_GPU
# 2=NvBufSurfTransformCompute_VIC(Jetson)
# Scaling Interpolation method
# 0=NvBufSurfTransformInter_Nearest 1=NvBufSurfTransformInter_Bilinear 2=NvBufSurfTransformInter_Algo1
# 3=NvBufSurfTransformInter_Algo2 4=NvBufSurfTransformInter_Algo3 5=NvBufSurfTransformInter_Algo4
# 6=NvBufSurfTransformInter_Default

# model input tensor pool size

# 2D conv custom params

  1. An nvdspreprocess plugin can also be added before the SGiE. Please refer to the sample /opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/source4_1080p_dec_preprocess_infer-resnet_preprocess_sgie_tiled_display_int8.txt
  2. After the PGiE, the faces will be detected and the rects will be saved in the object meta. You can add a new preprocess plugin before the SGiE and set process-on-frame to 0; then you can access all object rects in that nvdspreprocess plugin.
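A rough sketch of what the [property] section of such an object-mode (SGIE-side) nvdspreprocess config might contain, following the sample referenced above; the unique IDs here are illustrative:

```ini
# nvdspreprocess placed before the SGIE, operating on PGIE objects
[property]
enable=1
# 0 = process on objects (PGIE rects), 1 = process on full frames
process-on-frame=0
# gie-id of the SGIE that should consume the prepared tensor (assumed 2)
target-unique-ids=2
```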

Hello @fanzh !

Thanks to your amazing support, I have managed to build the following pipeline:

Face Detector -> Preprocess 1 -> My model 1 (face analysis) -> ...

and now I need to build:

... -> Preprocess 2 (uses the cropped face rect from Preprocess 1 and the inference result of My model 1; it modifies the cropped image using the inference result) -> My model 2 -> ...

So I see two ways to go about it:

  1. Create another preprocess plugin, as drawn in the scheme. It will fetch the cropped face, fetch the inference result of Model 1, do the image modifications, and pass the result further. Is this possible? If so, could you give me some guidance on how to set up the config for such a preprocess plugin? Maybe there is a sample application that integrates multiple models sequentially. For instance, I don't know whether I need to set process-on-frame=0, because this preprocess plugin is not supposed to use the metadata of FaceDetector but the output of the first preprocess plugin, and at the same time it should have access to the inference result of Model 1.

  2. The second way I like much better: implement a post-processing function that modifies the input frame according to the inference result. But the same question applies: is there an example that shows how to access both the input and the inference result in a post-processing function? And how do I indicate that the next element in the pipeline should operate on the frame created by this post-processing function?

And if 2. is possible, the most important question:

You kindly referred me to this postprocessor, and I see how it utilizes NvDsInferParseCustomFunc, but this function outputs a vector of NvDsInferObjectDetectionInfo, which is a predefined structure for detected objects. My model does not do face detection, so I don't need to return that; I just want to do image processing on the input frame and pass it further somehow. Is there an example of how to build a post-processing function for models other than detection or classification?
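To make the shape of the problem concrete, here is a self-contained sketch of what "parsing" a two-tensor output could look like. The structs are simplified stand-ins, not the real NvDsInferLayerInfo / NvDsInferObjectDetectionInfo from nvdsinfer.h; the point is only that a custom parser is free to copy the raw floats out instead of producing detection boxes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-in for the DeepStream layer descriptor; the real type
// (NvDsInferLayerInfo in nvdsinfer.h) carries more fields.
struct LayerInfo {
    std::string layerName;
    const float *buffer;   // host pointer to this output tensor
    size_t numElements;
};

// Sketch of a "parser" for a model that emits two small float tensors.
// Rather than filling a vector of detection boxes, it just copies the raw
// values out so downstream code can interpret them however it likes.
static bool ParseTwoTensorOutput(const std::vector<LayerInfo> &outputLayers,
                                 std::vector<float> &tensorA,
                                 std::vector<float> &tensorB) {
    if (outputLayers.size() != 2)
        return false;
    const LayerInfo &a = outputLayers[0];
    const LayerInfo &b = outputLayers[1];
    tensorA.assign(a.buffer, a.buffer + a.numElements);
    tensorB.assign(b.buffer, b.buffer + b.numElements);
    return true;
}
```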

I just found this bit in postprocess tech spec

Currently the plugin supports detection, classification and segmentation models for parsing.

Does it mean that what I require is impossible to do via the postprocess plugin? I guess this question is rhetorical, because the quote speaks for itself.

Could you elaborate on your use scenario? As I understand it, there are three models in your pipeline. The first is a Face Detector used to detect faces. About the second model, there are different descriptions, as quoted: is it a detector model or a face analysis model? What is the third model used for?

As I understand it, your pipeline should be "source -> streammux -> preprocess0 -> pgie -> preprocess1 -> sgie -> …". In preprocess0 you only need to add some ROI configurations. In preprocess1 you need to encapsulate the custom algorithm in a .so lib and set the interface in the configuration file. Please see my first comment.

What is the model's output? After inference, we need to do postprocessing. Why do you want to do image processing on the input frame? You can set network-type to 100 in nvinfer's configuration; then you can process the inference results yourself. Please refer to the sample deepstream-infer-tensor-meta-test.
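For reference, the network-type=100 route amounts to two lines in the nvinfer config. With output tensor meta enabled, the raw output tensors are attached to the frame/object meta for the application to consume, as demonstrated in deepstream-infer-tensor-meta-test:

```ini
[property]
# 100 = "other": nvinfer skips its built-in detector/classifier parsing
network-type=100
# attach raw output tensors as NvDsInferTensorMeta for downstream code
output-tensor-meta=1
```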


Affine transformations include flips, rotations, translations, scaling, etc. Does your affine transformation only need scaling?


Good morning, @fanzh! Thank you for your response

I have a set of models that are supposed to do comprehensive face analysis. The first model is a Face Detector; the second model calculates an affine transformation to align the face. After the face is aligned, I want to crop certain squares of the face in order to run other inferences on them as well. So my use case is a set of models that work sequentially, one after another, and each model requires some input preprocessing prior to inference.
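As an aside, the alignment step described above (what cv::warpAffine does on CPU) can be sketched without OpenCV. This toy single-channel version takes a 2x3 matrix interpreted as the inverse map (destination coordinates back to source coordinates) and uses nearest-neighbour sampling; it only illustrates the arithmetic a custom preprocess routine would implement, not DeepStream's actual API.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

// Row-major 2x3 affine matrix: [m00 m01 m02, m10 m11 m12].
using Mat23 = std::array<float, 6>;

// Minimal nearest-neighbour affine warp over a single-channel image.
// The matrix maps *destination* pixels back to *source* coordinates,
// which sidesteps matrix inversion in this sketch. Out-of-bounds
// samples produce 0.
static std::vector<float> warpAffineNearest(const std::vector<float> &src,
                                            int srcW, int srcH,
                                            const Mat23 &m,
                                            int dstW, int dstH) {
    std::vector<float> dst(static_cast<size_t>(dstW) * dstH, 0.0f);
    for (int y = 0; y < dstH; ++y) {
        for (int x = 0; x < dstW; ++x) {
            // Inverse-map the destination pixel into source coordinates.
            float sx = m[0] * x + m[1] * y + m[2];
            float sy = m[3] * x + m[4] * y + m[5];
            int ix = static_cast<int>(std::lround(sx));
            int iy = static_cast<int>(std::lround(sy));
            if (ix >= 0 && ix < srcW && iy >= 0 && iy < srcH)
                dst[static_cast<size_t>(y) * dstW + x] =
                    src[static_cast<size_t>(iy) * srcW + ix];
        }
    }
    return dst;
}
```

A real pipeline would do this on the GPU (e.g. via NvBufSurfTransform or a CUDA kernel), but the index mapping is the same.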

My pipeline won't be over after the first SGiE; the first SGiE just tells me how to modify the input image so that my other models will get correctly formatted inputs.

  1. Thanks for sharing! Is the second model used to calculate the affine transformation, not face analysis?
  2. Affine transformations include flips, rotations, translations, scaling, etc. Does your affine transformation only need scaling?
  3. There is a face analysis sample, deepstream-gaze-app. The 1st model detects faces, the 2nd model detects facial landmarks, and the 3rd model estimates a person's eye gaze point based on the facial landmarks. The preprocess requirements are different; just for reference.

Hello @fanzh! Sorry for such a long response, I was focusing on my Deepstream implementation.

Thank you for your kind support; with your help I have managed to create a DeepStream plugin with my models inside :) Instead of implementing the image processing as preprocess plugins, I implemented it using torch operations that are ONNX-compatible. This way I merged all the inference work into one single ONNX file that was beautifully exported to TensorRT and used in DeepStream.

Again, thank you for your assistance, you helped me a ton in understanding Deepstream and I am grateful for it.

While I have this thread - I have another question.

Is it possible to disable the face detection module under certain circumstances? For instance, I detected a face bbox on the previous frame and I know exactly where it will be on the next one. I'd rather force the pipeline to use the knowledge from the previous frame than detect the face rect again.

There has been no update from you for a while, so we are assuming this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks


If the person does not move, the bbox position will not change; you can set nvinfer's interval to G_MAXINT to disable inference. Please refer to this topic. If the person moves, the bbox position will change, and you have to detect the face in each frame.
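In config-file terms, this is the nvinfer `interval` property: the number of consecutive batches skipped between inference runs, so a very large value effectively disables inference after the first run. Note that carrying the previous bbox across skipped frames is normally the tracker's job:

```ini
[property]
# consecutive batches to skip between inference runs;
# G_MAXINT (2147483647, set here literally) effectively disables inference
interval=2147483647
```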

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.