How to deploy a skeleton-based action recognition model to DeepStream?

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU): Jetson Orin NX
• DeepStream Version: nvcr.io/nvidia/deepstream:7.0-triton-multiarch
• JetPack Version (valid for Jetson only): Jetpack 6.0
• TensorRT Version: 8.6.2.3
• NVIDIA GPU Driver Version (valid for GPU only)
• Issue Type( questions, new requirements, bugs): questions

Hi everyone,

I followed DeepStream-Yolo-Pose to run Yolov8-pose on the Jetson Orin and it worked. I also added the NvDCF tracker, so now I can get the person ID and pose after nvtracker. I’m using Python.

After that, I want to implement an ST-GCN action classifier as an SGIE, which takes an input sequence of these poses with the same tracker ID and outputs the action class. The model has two inputs with shapes input1: batch x 2 x 15 x 17 and input2: batch x 2 x 14 x 17, where 17 is the number of skeleton joints, 15 is the sequence length, and input2 is the frame-to-frame motion (subtraction) of the skeletons in input1, so its sequence length is 14.
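To make the second input concrete: input2 is just the difference of input1 along the time axis, which is why its sequence length is 15 - 1 = 14. As an illustration (my own notation, not DeepStream code):

#include <array>

// Shapes as in my model: input1 = 2x15x17 poses, input2 = 2x14x17 motion.
using Pose   = std::array<std::array<std::array<float, 17>, 15>, 2>;
using Motion = std::array<std::array<std::array<float, 17>, 14>, 2>;

// input2 is the frame-to-frame difference of input1 along the time axis.
Motion derive_motion(const Pose &input1) {
  Motion input2{};
  for (int c = 0; c < 2; ++c)        // channel: x or y coordinate
    for (int t = 0; t < 14; ++t)     // time step
      for (int j = 0; j < 17; ++j)   // joint index
        input2[c][t][j] = input1[c][t + 1][j] - input1[c][t][j];
  return input2;
}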

So, I have some questions:

  1. How can I obtain the 2x15x17 input of the ST-GCN model I mentioned above?
  2. How do I implement the ST-GCN model as an SGIE, especially since my model has two inputs? Is there any example similar/related to this task?
  3. Do you have any suggestions or recommendations about possible approaches to deploying skeleton-based action recognition on DeepStream?

I’m quite new to DeepStream, so hopefully you can help me. Thanks in advance.

Please consult the author of the Yolov8-pose model for how to make the model output the skeleton joint data as you want.

For how to implement a pose classifier model as an SGIE, we have the TAO pose classification model (Pose Classification | NVIDIA NGC) and the TAO body-pose model (BodyPose3DNet | NVIDIA NGC). The DeepStream sample is deepstream_tao_apps/apps/tao_others/deepstream-pose-classification at master · NVIDIA-AI-IOT/deepstream_tao_apps (github.com)

Thank you, I will check it later.

Actually, the output of the Yolov8-pose model is similar to a normal Yolov8 model’s, so I can access the skeletons by using:

# Cast the object meta from the GList node attached to the frame meta
obj_meta = pyds.NvDsObjectMeta.cast(l_obj.data)
# DeepStream-Yolo-Pose stores the keypoints in the instance-mask params
data = obj_meta.mask_params.get_mask_array()

So, with the ID assigned by the tracker, how can I create the sequence of skeletons I mentioned (2x15x17) so I can use it for the SGIE? Do you have any suggestion or idea for how to do it? Thank you very much.

Are you asking for an algorithm to get the 17 skeleton joint coordinates from the yolov8-pose output mask data? That depends on the model you are using. Please consult the people who provide the model.

No, I can get the 17 skeleton joints from Yolov8-pose. But when I cast the data from NvDsObjectMeta, I only have the person IDs and their skeleton joints for the current frame, while the ST-GCN requires 15 consecutive skeletons as input.

Assume I have a person with ID 1 in the video stream: how do I stack this person’s skeleton joints into a sequence of 15 so I can put it through the ST-GCN model?
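To make the question concrete, the bookkeeping I have in mind looks roughly like this standalone sketch (my own names, nothing here is DeepStream API):

#include <array>
#include <cstdint>
#include <deque>
#include <unordered_map>

// One frame of one person: 17 joints, channel-major (17 x's, then 17 y's).
using Skeleton = std::array<float, 2 * 17>;
constexpr size_t kSeqLen = 15;

// Sliding window of the most recent skeletons, keyed by tracker ID.
std::unordered_map<uint64_t, std::deque<Skeleton>> history;

// Called once per tracked object per frame with joints parsed from obj_meta.
void push_skeleton(uint64_t track_id, const Skeleton &sk) {
  auto &q = history[track_id];
  q.push_back(sk);
  if (q.size() > kSeqLen) q.pop_front();  // keep only the last 15 frames
}

// A track is ready for the SGIE once its window holds 15 frames; the
// 2x14x17 motion input then follows by frame differencing, as above.
bool sequence_ready(uint64_t track_id) {
  auto it = history.find(track_id);
  return it != history.end() && it->second.size() == kSeqLen;
}

What I don’t know is where this buffering should live in a DeepStream pipeline.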

Please refer to the deepstream_tao_apps/apps/tao_others/deepstream-pose-classification at master · NVIDIA-AI-IOT/deepstream_tao_apps (github.com) sample. The TAO pose classification model (Pose Classification | NVIDIA NGC) needs 300 consecutive frames of 34 key-point coordinates; please refer to our sample.

Thank you, I will check it.

Hi @Fiona.Chen, I ran the deepstream-pose-classification sample. In this sample, the pipeline is pgie (PeopleNet detects persons) → tracker → sgie0 (extract skeletons) → nvdspreprocess1 (preprocess skeletons) → sgie1 (predict action), right?

Now I want to change the PGIE to Yolov8-pose, so the pipeline becomes pgie (Yolov8-pose) → tracker → sgie (ST-GCN predicts action). But your pretrained ST-GCN uses the “nvidia” graph_layout and requires 34 joints, while Yolov8-pose only provides 17 joints. So it seems your pretrained ST-GCN can’t be used together with Yolov8-pose, right? Please correct me if I’m wrong. Therefore, I want to use my own ST-GCN, so the pipeline will be pgie (Yolov8-pose) → tracker → sgie (custom ST-GCN predicts action).

I have some questions:

  1. My ST-GCN model has two inputs, as I mentioned before. How do I change the pipeline as described? Can you guide me through the steps and the changes needed to modify the pipeline based on the deepstream-pose-classification sample?
  2. I saw the labels of the NVIDIA dataset here. I haven’t accessed it yet, but can it be transformed into COCO format?
  3. Your pretrained ST-GCN can only infer a single person, but I want to infer multiple people, so do I have to re-train it?

Thank you very much.

No. The two models can’t be used together without changes.

Both gst-nvinfer and gst-nvdspreprocess are open source. You can modify and customize them to adapt them to your model.

Please raise a topic in the TAO forum for dataset- and model-related questions: Latest Intelligent Video Analytics/TAO Toolkit topics - NVIDIA Developer Forums

Thank you very much. I will ask on the forum if I have any issues.

Hi @Fiona.Chen,

I tried to modify nvinfer for my custom SGIE, but I’m facing this error:

0:00:08.748820341 783977 0xaaab07815b00 INFO                 nvinfer gstnvinfer.cpp:682:gst_nvinfer_logger:<sgie> NvDsInferContext[UID 2]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:2095> [UID = 2]: deserialized trt engine from :/data/st_gcn_model/st_gcn.onnx_b32_gpu0_fp32.engine
INFO: [FullDims Engine Info]: layers num: 3
0   INPUT  kFLOAT batch_vid       2x15x13         min: 1x2x15x13       opt: 32x2x15x13      Max: 32x2x15x13      
1   INPUT  kFLOAT mot             2x14x13         min: 1x2x14x13       opt: 32x2x14x13      Max: 32x2x14x13      
2   OUTPUT kFLOAT output_action   8               min: 0               opt: 0               Max: 0               

0:00:09.108132840 783977 0xaaab07815b00 INFO                 nvinfer gstnvinfer.cpp:682:gst_nvinfer_logger:<sgie> NvDsInferContext[UID 2]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2198> [UID = 2]: Use deserialized engine model: /data/st_gcn_model/st_gcn.onnx_b32_gpu0_fp32.engine
0:00:09.108391563 783977 0xaaab07815b00 ERROR                nvinfer gstnvinfer.cpp:676:gst_nvinfer_logger:<sgie> NvDsInferContext[UID 2]: Error in NvDsInferContextImpl::preparePreprocess() <nvdsinfer_context_impl.cpp:1035> [UID = 2]: RGB/BGR input format specified but network input channels is not 3
ERROR: Infer Context prepare preprocessing resource failed., nvinfer error:NVDSINFER_CONFIG_FAILED
0:00:09.136593031 783977 0xaaab07815b00 WARN                 nvinfer gstnvinfer.cpp:912:gst_nvinfer_start:<sgie> error: Failed to create NvDsInferContext instance
0:00:09.137384722 783977 0xaaab07815b00 WARN                 nvinfer gstnvinfer.cpp:912:gst_nvinfer_start:<sgie> error: Config file path: config_infer_second_st_gcn.txt, NvDsInfer Error: NVDSINFER_CONFIG_FAILED

ERROR: gst-resource-error-quark: Failed to create NvDsInferContext instance (1): /dvs/git/dirty/git-master_linux/deepstream/sdk/src/gst-plugins/gst-nvinfer/gstnvinfer.cpp(912): gst_nvinfer_start (): /GstPipeline:pipeline0/GstNvInfer:sgie:
Config file path: config_infer_second_st_gcn.txt, NvDsInfer Error: NVDSINFER_CONFIG_FAILED

Here is the config_infer_second_st_gcn.txt file:

[property]
gpu-id=0
net-scale-factor=1
onnx-file=/data/st_gcn_model/st_gcn.onnx
model-engine-file=/data/st_gcn_model/st_gcn.onnx_b32_gpu0_fp32.engine
#custom-lib-path=/data/nvdsinfer_custom_st_gcn/nvdspreprocess_lib/libcustom2d_preprocess.so
network-type=1
network-mode=0
batch-size=32
process-mode=2
gie-unique-id=2
operate-on-class-ids=0
#input-tensor-meta=1
output-blob-names=output_action
# Adjust network-input-dims to match your model's input dimensions
#network-input-dims=2;15;13;2;14;13
parse-classifier-func-name=NvDsParseCustomPoseClassification
custom-lib-path=/data/nvdsinfer_custom_st_gcn/infer_pose_classification_parser/libnvdsinfer_pose_classfication_parser.so
classifier-threshold=0.51

[user-configs]
#actual sequence length of frames
frames-sequence-length=15

Can you help me with this problem? Thank you very much.

Your model is not a standard classifier. Please refer to deepstream_tao_apps/apps/tao_others/deepstream-pose-classification at master · NVIDIA-AI-IOT/deepstream_tao_apps · GitHub for proper settings.

Thank you very much. I just modified it and now I face this error:

0:00:09.996076057 814161 0xaaab281b4500 INFO                 nvinfer gstnvinfer.cpp:682:gst_nvinfer_logger:<sgie> NvDsInferContext[UID 2]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:2095> [UID = 2]: deserialized trt engine from :/data/st_gcn_model/st_gcn.onnx_b32_gpu0_fp32.engine
INFO: [FullDims Engine Info]: layers num: 3
0   INPUT  kFLOAT batch_vid       2x15x13         min: 1x2x15x13       opt: 32x2x15x13      Max: 32x2x15x13      
1   INPUT  kFLOAT mot             2x14x13         min: 1x2x14x13       opt: 32x2x14x13      Max: 32x2x14x13      
2   OUTPUT kFLOAT output_action   8               min: 0               opt: 0               Max: 0               

0:00:10.405892658 814161 0xaaab281b4500 INFO                 nvinfer gstnvinfer.cpp:682:gst_nvinfer_logger:<sgie> NvDsInferContext[UID 2]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2198> [UID = 2]: Use deserialized engine model: /data/st_gcn_model/st_gcn.onnx_b32_gpu0_fp32.engine
0:00:10.413936882 814161 0xaaab281b4500 WARN                 nvinfer gstnvinfer.cpp:679:gst_nvinfer_logger:<sgie> NvDsInferContext[UID 2]: Warning from NvDsInferContextImpl::initNonImageInputLayers() <nvdsinfer_context_impl.cpp:1622> [UID = 2]: More than one input layers but custom initialization function not implemented
0:00:10.413996437 814161 0xaaab281b4500 ERROR                nvinfer gstnvinfer.cpp:676:gst_nvinfer_logger:<sgie> NvDsInferContext[UID 2]: Error in NvDsInferContextImpl::initialize() <nvdsinfer_context_impl.cpp:1386> [UID = 2]: Failed to initialize non-image input layers
0:00:10.442405376 814161 0xaaab281b4500 WARN                 nvinfer gstnvinfer.cpp:912:gst_nvinfer_start:<sgie> error: Failed to create NvDsInferContext instance
0:00:10.444120849 814161 0xaaab281b4500 WARN                 nvinfer gstnvinfer.cpp:912:gst_nvinfer_start:<sgie> error: Config file path: config_infer_second_st_gcn.txt, NvDsInfer Error: NVDSINFER_CUSTOM_LIB_FAILED

ERROR: gst-resource-error-quark: Failed to create NvDsInferContext instance (1): /dvs/git/dirty/git-master_linux/deepstream/sdk/src/gst-plugins/gst-nvinfer/gstnvinfer.cpp(912): gst_nvinfer_start (): /GstPipeline:pipeline0/GstNvInfer:sgie:
Config file path: config_infer_second_st_gcn.txt, NvDsInfer Error: NVDSINFER_CUSTOM_LIB_FAILED

How do I create a custom function to handle the two inputs? Can you give me some suggestions? Do you have any example or sample similar/related to this?

gst-nvinfer is open source. You need to modify the plugin to accept two input layers.

Do you have any example or document similar to this?

There is no such example, but there is a flow diagram of the gst-nvinfer source code in the DeepStream SDK FAQ: DeepStream SDK FAQ - Intelligent Video Analytics / DeepStream SDK - NVIDIA Developer Forums

Hi @Fiona.Chen, I saw this in the website documentation:

[screenshot of the documentation referencing the objectDetector_FasterRCNN sample]

Where can I find the objectDetector_FasterRCNN sample? I don’t see it in deepstream-7.0. Thank you very much.

The FasterRCNN sample was removed since TensorRT 8.x does not support the Caffe model.

You can refer to the sample deepstream_tao_apps/apps/tao_others/deepstream-pose-classification at master · NVIDIA-AI-IOT/deepstream_tao_apps (github.com) for how to implement the NvDsInferInitializeInputLayers() interface declared in /opt/nvidia/deepstream/deepstream/sources/includes/nvdsinfer_custom_impl.h.

Thank you, I will check it.

Hi @Fiona.Chen, I followed your latest reply and I can now initialize the two-input model as an SGIE.
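For reference, the initialization function has roughly this shape (a simplified sketch of what I did, not the sample’s exact code; it only zero-fills the extra input so that context creation succeeds, since the real per-frame tensors come later via tensor meta):

#include <cstring>
#include <vector>
#include "nvdsinfer_custom_impl.h"

// One-time initialization of the non-image input layers, called by
// gst-nvinfer during context creation.
extern "C" bool NvDsInferInitializeInputLayers(
    std::vector<NvDsInferLayerInfo> const &inputLayersInfo,
    NvDsInferNetworkInfo const &networkInfo,
    unsigned int maxBatchSize)
{
  for (auto const &layer : inputLayersInfo) {
    if (layer.dataType == FLOAT && layer.buffer) {
      // inferDims.numElements is the per-sample element count;
      // the buffer covers the whole max batch.
      std::memset(layer.buffer, 0,
          layer.inferDims.numElements * maxBatchSize * sizeof(float));
    }
  }
  return true;
}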

As far as I understand, the SGIE now requires input tensor meta, so I have to add an nvdspreprocess element before the SGIE to form tensors that fit my SGIE. So my question is: how do I copy the tensors into the buffer in nvdspreprocess, given that my model has two inputs? I saw a sample about modifying the nvdspreprocess of deepstream-pose-classification here, but I’m still confused about it.
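My current understanding, as a sketch: the custom library’s CustomTensorPreparation() has to copy the data for every unit in the batch into the buffer acquired from the plugin’s pool. One possible layout is packing both inputs back-to-back (sizes below follow my engine’s dims, 2x15x13 and 2x14x13), but I’m not sure this is how nvinfer expects to receive two layers. The lookup_* helpers are hypothetical stand-ins for my per-track buffering, not SDK functions:

#include <cuda_runtime_api.h>
#include "nvdspreprocess_lib.h"

// Hypothetical stand-ins for the per-track sequence store kept in ctx.
static const float *lookup_pose_sequence(CustomCtx *, const NvDsPreProcessUnit &) {
  static float zeros[2 * 15 * 13] = {};
  return zeros;
}
static const float *lookup_motion_sequence(CustomCtx *, const NvDsPreProcessUnit &) {
  static float zeros[2 * 14 * 13] = {};
  return zeros;
}

// Sketch: pack both model inputs back-to-back for every unit in the batch
// into one device buffer acquired from the plugin's tensor pool.
extern "C" NvDsPreProcessStatus
CustomTensorPreparation(CustomCtx *ctx, NvDsPreProcessBatch *batch,
    NvDsPreProcessCustomBuf *&buf, CustomTensorParams &tensorParam,
    NvDsPreProcessAcquirer *acquirer)
{
  buf = acquirer->acquire();
  const size_t poseBytes = 2 * 15 * 13 * sizeof(float);
  const size_t motBytes  = 2 * 14 * 13 * sizeof(float);
  char *dst = (char *) buf->memory_ptr;

  for (const auto &unit : batch->units) {
    cudaMemcpy(dst, lookup_pose_sequence(ctx, unit), poseBytes,
        cudaMemcpyHostToDevice);
    cudaMemcpy(dst + poseBytes, lookup_motion_sequence(ctx, unit), motBytes,
        cudaMemcpyHostToDevice);
    dst += poseBytes + motBytes;
  }

  // Batch dimension of the attached tensor = number of units processed.
  tensorParam.params.network_input_shape[0] = (int) batch->units.size();
  return NVDSPREPROCESS_SUCCESS;
}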

Can you walk me through this? Thank you very much.