Workflow using multiple models for inference

I trained an ActionRecognitionNet model using the provided Colab sample code and exported it to get a .etlt file.

I provided it to my colleague, who has been working on DeepStream. The model worked correctly, but when he tried it on a video clip containing multiple people, the action recognition result kept changing as the people in the clip performed different actions.

I wonder if it’s possible to do the following in DeepStream:

  1. First, detect all the persons appearing in the image using models like YOLO-v4 or PeopleNet.
  2. Suppose that N persons are detected as described above and we have their positions. For each person, run action recognition separately, using that person’s crop of the image as the input to the action recognition net. Therefore, if N persons are detected, the action recognition inference has to run N times.
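The two steps above map naturally onto a DeepStream pipeline where the detector runs as the PGIE and the action model as an SGIE that consumes per-object tensors. A rough sketch is below; it is not directly runnable without a DeepStream install, a GPU, and real model configs, and the config file names (`pgie_peoplenet_config.txt`, `sgie_preprocess_config.txt`, `sgie_action_config.txt`) are placeholders:

```
gst-launch-1.0 filesrc location=clip.mp4 ! qtdemux ! h264parse ! nvv4l2decoder \
  ! m.sink_0 nvstreammux name=m batch-size=1 width=1280 height=720 \
  ! nvinfer config-file-path=pgie_peoplenet_config.txt \
  ! nvtracker ll-lib-file=/opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so \
  ! nvdspreprocess config-file=sgie_preprocess_config.txt \
  ! nvinfer config-file-path=sgie_action_config.txt input-tensor-meta=1 \
  ! nvdsosd ! nveglglessink
```

Here `input-tensor-meta=1` tells the second nvinfer instance to consume the tensors prepared by nvdspreprocess instead of preprocessing the frames itself.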

Using nvdspreprocess + nvinfer as an SGIE has been supported since DeepStream 6.2 (see Gst-nvdspreprocess (Alpha) — DeepStream 6.2 Release documentation). Use the “process-on-frame” property of nvdspreprocess to control whether it works in PGIE or SGIE mode. Since an action recognition model needs a continuous sequence of images of the same person, it is recommended to add nvtracker to track each person, so that you can identify the bboxes belonging to the same person in the nvdspreprocess library via the track-id.
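For the SGIE case, the relevant settings live in the nvdspreprocess config file. A minimal sketch is below; the key names follow the Gst-nvdspreprocess documentation, but the values (tensor shape, ROI sizes, library path) are illustrative assumptions and must match your exported action model:

```
[property]
enable=1
# 0 = operate on detected objects (SGIE mode); 1 = operate on full frames (PGIE mode)
process-on-frame=0
# unique-id of the downstream nvinfer instance that will consume these tensors
target-unique-ids=2
# N;C;T;H;W for a 3D action recognition model (illustrative values)
network-input-shape=1;3;32;224;224
processing-width=224
processing-height=224
# custom library that assembles the per-person tensor; path is an example
custom-lib-path=/opt/nvidia/deepstream/deepstream/lib/gst-plugins/libcustom2d_preprocess.so
custom-tensor-preparation-function=CustomTensorPreparation
```

With process-on-frame=0, nvdspreprocess scales each detected person’s bbox rather than the whole frame, which is exactly the N-inferences-for-N-persons behavior asked about above.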

We will publish a similar sample to show how to use nvdspreprocess as an SGIE and how to collect successive bboxes for the same person (by track-id) in the nvdspreprocess library. Please wait for the new sample.


The nvdspreprocess + nvinfer (nvinferserver) as SGIE sample has been published: deepstream_tao_apps/apps/tao_others/deepstream-pose-classification at master · NVIDIA-AI-IOT/deepstream_tao_apps

