How to make the Secondary Classifier operate on custom metadata instead of a cropped frame from the previous element

I’m trying to figure out how to stack custom ML models using the nvinfer element.

The pipeline is like this:
object-detection → pose estimation → action recognition.
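In code, what I’m after would look roughly like the sketch below (the config-file names are placeholders I made up); the third nvinfer is exactly the part I can’t wire up, since nvinfer only takes frames or object crops as input:

```cpp
#include <gst/gst.h>

int main (int argc, char *argv[])
{
  gst_init (&argc, &argv);

  GError *err = NULL;
  GstElement *pipeline = gst_parse_launch (
      "filesrc location=video.mp4 ! qtdemux ! h264parse ! nvv4l2decoder ! "
      "mux.sink_0 nvstreammux name=mux batch-size=1 width=1280 height=720 ! "
      "nvinfer config-file-path=detector.txt ! "   /* PGIE: object detection */
      "nvinfer config-file-path=pose.txt ! "       /* SGIE: pose estimation  */
      "nvinfer config-file-path=action.txt ! "     /* action recognition (?) */
      "fakesink", &err);
  if (pipeline == NULL) {
    g_printerr ("Failed to build pipeline: %s\n", err->message);
    return -1;
  }

  gst_element_set_state (pipeline, GST_STATE_PLAYING);

  /* Run until error or end-of-stream. */
  GstBus *bus = gst_element_get_bus (pipeline);
  GstMessage *msg = gst_bus_timed_pop_filtered (bus, GST_CLOCK_TIME_NONE,
      (GstMessageType) (GST_MESSAGE_ERROR | GST_MESSAGE_EOS));
  if (msg != NULL)
    gst_message_unref (msg);

  gst_element_set_state (pipeline, GST_STATE_NULL);
  gst_object_unref (bus);
  gst_object_unref (pipeline);
  return 0;
}
```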

The problem is that my pose estimation model’s output (a human skeleton) is the input to the action recognition model in the pipeline.

So I don’t know how to make an nvinfer element operate/run inference on the previous nvinfer element’s TensorMeta or ObjectMeta in the buffer.

Is there any way to do it?
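To be concrete: with output-tensor-meta=1 in the pose model’s nvinfer config, I can already reach the raw pose tensor from a pad probe via NvDsInferTensorMeta (sketch below, assuming the pose model runs on the full frame; error handling omitted). What I’m missing is a way to make the next nvinfer consume that tensor instead of the image:

```cpp
#include <gst/gst.h>
#include "gstnvdsmeta.h"
#include "gstnvdsinfer.h"

static GstPadProbeReturn
pose_tensor_probe (GstPad *pad, GstPadProbeInfo *info, gpointer u_data)
{
  GstBuffer *buf = GST_PAD_PROBE_INFO_BUFFER (info);
  NvDsBatchMeta *batch_meta = gst_buffer_get_nvds_batch_meta (buf);

  for (NvDsMetaList *l_frame = batch_meta->frame_meta_list;
       l_frame; l_frame = l_frame->next) {
    NvDsFrameMeta *frame_meta = (NvDsFrameMeta *) l_frame->data;

    for (NvDsMetaList *l_user = frame_meta->frame_user_meta_list;
         l_user; l_user = l_user->next) {
      NvDsUserMeta *user_meta = (NvDsUserMeta *) l_user->data;
      if (user_meta->base_meta.meta_type != NVDSINFER_TENSOR_OUTPUT_META)
        continue;

      NvDsInferTensorMeta *tmeta =
          (NvDsInferTensorMeta *) user_meta->user_meta_data;
      /* out_buf_ptrs_host[i] holds output layer i, e.g. the 17x2
       * keypoint tensor -- but there is no built-in way to make the
       * next nvinfer consume it. */
      float *keypoints = (float *) tmeta->out_buf_ptrs_host[0];
      (void) keypoints;
    }
  }
  return GST_PAD_PROBE_OK;
}
```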

Hi,

Our default deepstream-app usually consists of one detector and one classifier.
The detector is triggered regularly, while the classifier’s inference depends on the pipeline workload.
As a result, it’s not easy to use the default deepstream-app, which is controlled via a config file, to achieve your use case.

It’s recommended to implement your own use case based on this sample:
/opt/nvidia/deepstream/deepstream-4.0/sources/gst-plugins/gst-dsexample/

For metadata implementation, you can check this sample for more information:
/opt/nvidia/deepstream/deepstream-4.0/sources/apps/sample_apps/deepstream-user-metadata-test
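The core pattern of that sample is attaching your own struct as user metadata from a probe; a minimal sketch (the Skeleton struct and the meta-type string here are illustrative assumptions, not part of the sample):

```cpp
#include <gst/gst.h>
#include "gstnvdsmeta.h"

#define SKELETON_META_TYPE \
  (nvds_get_user_meta_type ((gchar *) "CUSTOM.POSE.SKELETON"))

typedef struct { float kp[17][2]; } Skeleton;  /* 17 keypoints, (x, y) each */

/* Called by DeepStream whenever the metadata needs to be duplicated. */
static gpointer
copy_skeleton (gpointer data, gpointer user_data)
{
  NvDsUserMeta *user_meta = (NvDsUserMeta *) data;
  return g_memdup (user_meta->user_meta_data, sizeof (Skeleton));
}

/* Called by DeepStream when the metadata is freed. */
static void
release_skeleton (gpointer data, gpointer user_data)
{
  NvDsUserMeta *user_meta = (NvDsUserMeta *) data;
  g_free (user_meta->user_meta_data);
  user_meta->user_meta_data = NULL;
}

/* Attach one skeleton to a frame's user-metadata list. */
static void
attach_skeleton (NvDsBatchMeta *batch_meta, NvDsFrameMeta *frame_meta,
    const Skeleton *skel)
{
  NvDsUserMeta *user_meta = nvds_acquire_user_meta_from_pool (batch_meta);
  user_meta->user_meta_data = g_memdup (skel, sizeof (Skeleton));
  user_meta->base_meta.meta_type = SKELETON_META_TYPE;
  user_meta->base_meta.copy_func = copy_skeleton;
  user_meta->base_meta.release_func = release_skeleton;
  nvds_add_user_meta_to_frame (frame_meta, user_meta);
}
```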

Thanks.

Hi @AastaLLL,

I checked the samples for writing a custom plugin and for the user-metadata test.

But what I need in my use case is to make the custom metadata (human skeleton) the input for the next ML model (action recognition) in the DeepStream pipeline, instead of using the frame as input.

More specifically, the action recognition model’s input is not an image cropped from the frame, but a tensor with shape (1, 17, 2): batch-size=1, number-of-keypoints=17, location-dim-x-y=2.

Is there any way I can accomplish this, even if I have to modify the nvinfer plugin or something?

Thanks,

Hi,

A possible solution is to treat pose estimation and action recognition as a single model, and insert a customized plugin layer in between.

You can then deploy the combined model just like a secondary classifier,
and decide inside the TensorRT plugin layer how to pass the tensor from pose estimation into action recognition.
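A rough sketch of such a glue layer with the IPluginV2 interface is below; it is not a complete implementation (serialization, the plugin creator, and error handling are omitted, and the pass-through copy stands in for whatever decoding or normalization your models actually need):

```cpp
#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include <string>

class PoseToActionGlue : public nvinfer1::IPluginV2 {
public:
    int getNbOutputs() const override { return 1; }

    nvinfer1::Dims getOutputDimensions(int, const nvinfer1::Dims*, int) override {
        // Emit the (17, 2) keypoint tensor the action model expects
        // (the batch dimension is implicit in implicit-batch mode).
        return nvinfer1::Dims2(17, 2);
    }

    bool supportsFormat(nvinfer1::DataType t,
                        nvinfer1::PluginFormat f) const override {
        return t == nvinfer1::DataType::kFLOAT
            && f == nvinfer1::PluginFormat::kNCHW;
    }

    void configureWithFormat(const nvinfer1::Dims*, int, const nvinfer1::Dims*,
                             int, nvinfer1::DataType, nvinfer1::PluginFormat,
                             int) override {}

    int initialize() override { return 0; }
    void terminate() override {}
    size_t getWorkspaceSize(int) const override { return 0; }

    int enqueue(int batchSize, const void* const* inputs, void** outputs,
                void*, cudaStream_t stream) override {
        // This is where you decide how the pose output becomes the action
        // input. Here it is just a pass-through copy of 17*2 floats per
        // sample; real code could decode heatmaps, normalize, reorder, etc.
        size_t bytes = (size_t) batchSize * 17 * 2 * sizeof(float);
        cudaMemcpyAsync(outputs[0], inputs[0], bytes,
                        cudaMemcpyDeviceToDevice, stream);
        return 0;
    }

    size_t getSerializationSize() const override { return 0; }
    void serialize(void*) const override {}
    const char* getPluginType() const override { return "PoseToActionGlue"; }
    const char* getPluginVersion() const override { return "1"; }
    void destroy() override { delete this; }
    nvinfer1::IPluginV2* clone() const override {
        return new PoseToActionGlue(*this);
    }
    void setPluginNamespace(const char* ns) override { mNamespace = ns; }
    const char* getPluginNamespace() const override { return mNamespace.c_str(); }

private:
    std::string mNamespace;
};
```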

Thanks.

Hi @AastaLLL,

Thank you very much for your solution.
The solution is a great workaround.

Unfortunately, my action recognition model doesn’t work on a single frame’s human skeleton, but on a time series of them.

I hope my requested feature, “make custom metadata (possibly many pieces of metadata) the input(s) for the next nvinfer element in the pipeline”, will be available in the next version of NVIDIA DeepStream.

It would make the NVIDIA DeepStream SDK much more flexible, just like the “ensemble models” feature of the NVIDIA TensorRT Inference Server: https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/models_and_schedulers.html#ensemble-models.

Thanks,

Hi,

Thanks for your update.

We have passed your request to our internal DeepStream team.
We will post more information here once we get feedback.

On the other hand, please note that a TensorRT plugin allows users to allocate memory.
So it is possible to preserve information across different frames.
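For example, the glue plugin sketched earlier could keep a sliding window of keypoint tensors in device memory across enqueue() calls; a minimal sketch, where the window length and buffer layout are assumptions:

```cpp
#include <cuda_runtime_api.h>

constexpr int kKeypointValues = 17 * 2;  // 17 keypoints, (x, y) each
constexpr int kWindow = 30;              // frames of history the action model needs

struct SkeletonHistory {
    float* devBuf = nullptr;  // device ring buffer: kWindow * kKeypointValues floats
    int next = 0;             // ring-buffer write index

    void init() {
        size_t bytes = kWindow * kKeypointValues * sizeof(float);
        cudaMalloc(reinterpret_cast<void**>(&devBuf), bytes);
        cudaMemset(devBuf, 0, bytes);  // start from an all-zero history
    }
    void destroy() { cudaFree(devBuf); }

    // Called from the plugin's enqueue(): append this frame's keypoints
    // and emit the whole window as the action-recognition input.
    void step(const void* poseOut, void* actionIn, cudaStream_t stream) {
        // 1. Copy the current frame's keypoints into the next ring slot.
        cudaMemcpyAsync(devBuf + next * kKeypointValues, poseOut,
                        kKeypointValues * sizeof(float),
                        cudaMemcpyDeviceToDevice, stream);
        next = (next + 1) % kWindow;
        // 2. Hand the accumulated window to the next layer. A real plugin
        //    would unroll the ring so frames are in temporal order.
        cudaMemcpyAsync(actionIn, devBuf,
                        kWindow * kKeypointValues * sizeof(float),
                        cudaMemcpyDeviceToDevice, stream);
    }
};
```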

Thanks.