How to pass a GstBuffer to an inference plugin in a custom GStreamer plugin

Hi all, @mdegans, @miguel.taylor
I want to create a custom GStreamer plugin and add it to DeepStream.
My goal is to create a GStreamer inference plugin, such as a face recognition plugin, and add it to the DeepStream pipeline.
This work raises two questions.
I want to use Python in the do_transform_ip() method:

1- How can I pass the GstBuffer to the inference model? I don’t want to convert the buffer to a numpy array.
2- How can I crop the buffer using the coordinates (metadata) that come from the upstream element and then feed it to the model?

Are you writing a Python GStreamer plugin? Why don’t you just use gst-nvinfer for your face recognition model?

@Fiona.Chen,
gst-nvinfer is only used for classification/detection/segmentation tasks, so is it possible to use it for face recognition? A classification task only gives us the index of the highest-scoring class, but for face recognition I need the model to give me something like a 128-d array. How can I do this with a classification task?
If I understand correctly, you are suggesting to use classification for face recognition, right? If so, how can I do this?

gst-nvinfer is implemented in C/C++. It supports customized models.
There are several ways to support different customized models; please refer to https://docs.nvidia.com/metropolis/deepstream/dev-guide/index.html#page/DeepStream_Development_Guide/deepstream_custom_model.html

What are your model’s input and output? What will you do with the “128-d array”? Is this array for drawing something, for example, the outline of the face? If so, you can use customized user meta to output the array: https://docs.nvidia.com/metropolis/deepstream/dev-guide/index.html#page/DeepStream%20Plugins%20Development%20Guide/deepstream_plugin_metadata.html#wwpID0E3HA. And the gst-nvosd plugin can also draw polygon lines: https://docs.nvidia.com/metropolis/deepstream/dev-guide/index.html#page/DeepStream%20Plugins%20Development%20Guide/deepstream_plugin_details.html#wwpID0E0PW0HA

I want to push the 128-d array of each face to the next element, to check for similar faces in a database.

You can handle the metadata in your own plugin just as dsexample does. Customized user meta is useful for your case: https://docs.nvidia.com/metropolis/deepstream/dev-guide/index.html#page/DeepStream%20Plugins%20Development%20Guide/deepstream_plugin_metadata.html#wwpID0E3HA
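Once the 128-d embedding has been attached as user meta, the downstream similarity check itself is simple. A minimal pure-Python sketch (the function names, the dict-based "database", and the 0.5 threshold are illustrative assumptions, not DeepStream API):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors of equal length.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def best_match(embedding, database, threshold=0.5):
    # database: dict mapping identity name -> stored embedding.
    # Returns (name, score) of the closest identity, or (None, score)
    # if nothing clears the threshold.
    best_name, best_score = None, -1.0
    for name, stored in database.items():
        score = cosine_similarity(embedding, stored)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score
```

In a real pipeline this comparison would run in the downstream element after reading the embedding out of the user meta.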

@Fiona.Chen,
Thanks. For passing the GstBuffer to the model in a custom inference plugin, what do you suggest? I don’t want to convert the buffer to a numpy array. I looked at the nvinfer plugin source code, but I’m not familiar with C/C++; if possible, please give me a hint about this problem.

Q2- Is it possible to use the nvinfer plugin for a face recognition model? Face recognition is like a classification task, but instead of the index of the highest-scoring class, I want the output tensor itself, e.g. a 128-d embedding. Is it possible to use nvinfer for such a purpose?

Q3- With NvDsInferTensorMeta, is it possible to get an output-tensor-meta for each object detected in the frame? That is, when nvinfer is used for object detection, can it attach tensor meta for all of the detected objects in a frame?

Q4- Is it possible to access the input tensors of the streams as metadata after the nvinfer plugin? If so, in the next custom element I could pass all of the tensors to my own model.
Does nvinfer have such a capability?

My goal is to get the tensors of detected objects as metadata, e.g. the tensors of detected faces, and then pass these face tensors as metadata to a face recognition model in the next element. Is that possible? How?

It is not recommended to develop GStreamer plugins in Python. The bindings are so limited that you cannot handle NVIDIA’s customized elements and data structures freely, and there are few sources and documents for Python GStreamer plugin development. C/C++ is the correct way to develop your own plugin.

Q2: It is possible. I have mentioned this before; you need to understand the documents and code. There are several ways to support different customized models, please refer to https://docs.nvidia.com/metropolis/deepstream/dev-guide/index.html#page/DeepStream_Development_Guide/deepstream_custom_model.html .
Q3- Yes. Please refer to https://docs.nvidia.com/metropolis/deepstream/dev-guide/index.html#page/DeepStream%20Plugins%20Development%20Guide/deepstream_plugin_details.html#wwpID0E0PDB0HA
Q4- I don’t understand what the “tensors” means. If by “tensors” you mean the “128-d array” that is the TensorRT output, the answer is yes. As I mentioned before, you can store your TensorRT output in customized user metadata: https://docs.nvidia.com/metropolis/deepstream/dev-guide/index.html#page/DeepStream%20Plugins%20Development%20Guide/deepstream_plugin_metadata.html#wwpID0E3HA. nvinfer has such a capability; you need to study the answer to Q2.
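For Q3/Q4, the raw TensorRT output can be attached per frame/object as NvDsInferTensorMeta by enabling tensor-meta output in the gst-nvinfer configuration. A minimal, illustrative fragment (`output-tensor-meta` and `process-mode` are documented gst-nvinfer keys; the rest of a working config is omitted here):

```ini
[property]
# Attach the raw inference output tensors (NvDsInferTensorMeta) to the
# metadata, so a downstream element can read the embedding per object.
output-tensor-meta=1
# process-mode=2 runs nvinfer as a secondary GIE on objects found by the
# primary detector, so each detected face gets its own tensor meta.
process-mode=2
```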

Your goal can be implemented. You need to choose the suitable way in https://docs.nvidia.com/metropolis/deepstream/dev-guide/index.html#page/DeepStream_Development_Guide/deepstream_custom_model.html according to your model.

Hi, @Fiona.Chen,
Thanks so much for answering the question so quickly, I appreciate it.
I understand that the Python way is not suitable for production, but for a prototype I really need to get results quickly; for production I will use C/C++.

Note: If you use any C/C++ based libraries (numpy, opencv, tensorflow, …) inside a Python plugin, then in most cases there is no performance drop.

Q1- Suppose I use car detection and car color recognition, as in the sample apps. When is the buffer fed to the inference model? Inside the nvinfer plugin, is the buffer converted to a tensor and then passed to the model? If so, how is it converted? Is there a function for doing this?

Q2- The decoder outputs the stream buffer in NV12 format, but inference models require RGB format and a scaling transformation. Does nvinfer internally contain a converter and rescaler like nvvidconv? Is it a bin element?

Q1- Suppose I use car detection and car color recognition, as in the sample apps. When is the buffer fed to the inference model? Inside the nvinfer plugin, is the buffer converted to a tensor and then passed to the model? If so, how is it converted? Is there a function for doing this?
Answer Inside gst-nvinfer, the buffers are handled by pre-processing => inference => post-processing. The memory type of the buffer is already adapted to TensorRT after the nvvideoconvert plugin; it is not converted inside the nvinfer plugin. There is no conversion function; the buffer is allocated with a memory type that TensorRT can accept.

Q2- The decoder outputs the stream buffer in NV12 format, but inference models require RGB format and a scaling transformation. Does nvinfer internally contain a converter and rescaler like nvvidconv? Is it a bin element?
Answer The format is NVMM NV12 format, not plain NV12. nvinfer itself converts NV12 to RGB and does the scaling. These are parts of nvinfer pre-processing.

@Fiona.Chen , Thanks

nvinfer itself converts NV12 to RGB and does the scaling.

Q1- Is the NV12-to-RGB conversion inside nvinfer performed by a GStreamer plugin like nvvideoconvert, or by functions?
I want to know whether nvinfer uses GStreamer plugins or functions for color conversion and scaling.

Q2- When we use the nvvideoconvert plugin to convert NV12 to RGB, the buffer is allocated in CPU memory.
Since the pre-processing/inference/post-processing/drawing steps all need RGB format, why do we use NV12 format in the first place? Is it because the GPU buffer only supports NV12 format?

Q1- Is the NV12-to-RGB conversion inside nvinfer performed by a GStreamer plugin like nvvideoconvert, or by functions?
I want to know whether nvinfer uses GStreamer plugins or functions for color conversion and scaling.
Answer Pre-processing is done inside the nvinfer plugin; it is part of the plugin and is implemented with CUDA. Pre-processing is implemented in libnvds_infer.so, and the source code is in /opt/nvidia/deepstream/deepstream-5.0/sources/libs/nvdsinfer.
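Conceptually, the per-pixel normalization step of that pre-processing is y = net-scale-factor * (x - mean), applied after format conversion and scaling (this formula is from the gst-nvinfer documentation; the sketch below is a plain-Python illustration, not the CUDA implementation):

```python
def normalize_pixel(x, net_scale_factor, mean):
    # gst-nvinfer pre-processing: scale the mean-subtracted pixel value.
    return net_scale_factor * (x - mean)

def normalize_frame(pixels, net_scale_factor=1.0 / 255, mean=0.0):
    # Apply the normalization to every value of a flattened frame,
    # e.g. mapping 8-bit pixel values [0, 255] into [0.0, 1.0].
    return [normalize_pixel(x, net_scale_factor, mean) for x in pixels]
```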

Q2- When we use the nvvideoconvert plugin to convert NV12 to RGB, the buffer is allocated in CPU memory.
Since the pre-processing/inference/post-processing/drawing steps all need RGB format, why do we use NV12 format in the first place? Is it because the GPU buffer only supports NV12 format?
Answer You don’t need to convert NV12 to RGB; nvinfer itself can do it. What you need to do is write the nvinfer configuration file correctly. “model-color-format” sets the format your model expects. https://docs.nvidia.com/metropolis/deepstream/dev-guide/index.html#page/DeepStream%20Plugins%20Development%20Guide/deepstream_plugin_details.html#wwpID0E04DB0HA
Please read the gst-nvinfer documentation carefully: https://docs.nvidia.com/metropolis/deepstream/dev-guide/index.html#page/DeepStream%20Plugins%20Development%20Guide/deepstream_plugin_details.html#wwpID0E0YFB0HA
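For reference, the color conversion is controlled entirely from the nvinfer config file. A minimal fragment might look like this (key names and value meanings are from the gst-nvinfer documentation; the engine file path is a placeholder):

```ini
[property]
# 0=RGB, 1=BGR, 2=GRAY -- the format your model expects at its input.
model-color-format=0
# Per-pixel scaling applied in pre-processing: y = net-scale-factor * (x - mean)
net-scale-factor=0.0039215697906911373
# Placeholder path to the serialized TensorRT engine.
model-engine-file=model_b1_fp32.engine
```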

The reason NV12 appears first in the pipeline is that the HW video decoder outputs NV12 video, and nvstreammux supports multiple streams of NV12 videos.
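As an illustration of that flow (element names as in DeepStream 5.0; the file name and config path are placeholders), a typical single-stream pipeline looks like:

```
gst-launch-1.0 filesrc location=sample.h264 ! h264parse ! nvv4l2decoder ! \
    m.sink_0 nvstreammux name=m batch-size=1 width=1280 height=720 ! \
    nvinfer config-file-path=config_infer_primary.txt ! \
    nvvideoconvert ! nvdsosd ! nveglglessink
```

The decoder hands NVMM NV12 buffers to nvstreammux; nvinfer then converts, scales, and normalizes them internally before inference.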

Our sample code is a good reference for understanding the data flow.