Use YOLO Keypoints for Secondary GIE (LSTM Classifier)

alankardutta · May 26, 2024, 3:44am

• Hardware Platform - RTX 3090
• DeepStream Version - 6.3
• TensorRT Version - 10
• Driver Version: 535.171.04
• CUDA Version: 12.1

Hi, I wanted to use a custom LSTM Pose Classifier model on top of the yolo generated keypoints. I’m currently using this implementation as reference. I want yoloV8 keypoints and apply post processing on them to fit into my classifier model. I tried implementing a custom parser by modifying the nvdsparsepose_Yolo.cpp file, however I could not understand how can i access the Primary GIE’s output (yolo keypoints) and pass them to another Secondary GIE.

The expected output format for the yolo keypoints should be of the shape (1 x 17 x 2) which, after some post processing should be converted to a tensor of shape (1 x42). I need the converted tensor as input to my LSTM Classifier model. If I cannot pass in tensor data as input for secondary GIE, please provide me with some resources to implement this feature into my project.
Thanks.

Fiona.Chen · May 27, 2024, 2:37am

Please refer to deepstream_tao_apps/apps/tao_others/deepstream-pose-classification at master · NVIDIA-AI-IOT/deepstream_tao_apps (github.com).

Please ask the guy who provide the models to you for how to convert the data.

alankardutta · May 27, 2024, 4:24am

Hi, thanks for the reference, I have been looking at this for a while but I was facing an issue of model inputs, with the issue Infer Context default input_layer is not a image[CHW] as the model input format is 1 x 42 [lstm model]. I have the post processing script in python, I need to convert it to deepstream however i need the keypoints and bounding box information as input for the post processing and the LSTM model takes in a vector of the post processed data [1x42 dims] as input, and the result is 1 out of 3 possible classes.

I want to test out the obtained frame metadata as well, without launching a complete pipeline, is there a way to obtain frame metadata and object metadata of the sgie and pgie without setting up a pipeline or just viewing the raw output of the inferred frames?

Also, If i can store the generated keypoints in a variable, can i pass them directly to the sgie without using bounding box information? or performing the post processing to the generated keypoints and then passing it to the sgie without forwarding the pgie metadata? or is there a simpler way to go about it?

Thanks!

Fiona.Chen · May 27, 2024, 5:25am

In https://github.com/NVIDIA-AI-IOT/deepstream_tao_apps/tree/master/apps/tao_others/deepstream-pose-classification, the SGIE Pose Classification 's input is NOT image either. Please refer to the sample for how to transfer the non-image data from PGIE to SGIE.

alankardutta · May 27, 2024, 6:15am

Hi, the codebase looks quite intimidatingly large to me and I’m having troubles understanding the workings of each component as I’m new to c/c++ development, could you please point me to the exact function definitions that I need to look as a reference, thanks a lot!

Also there are multiple parsing functions, did they modify any plugins as well to pass in the values?

Fiona.Chen · May 27, 2024, 6:42am

For you need customization for your own model, the basic knowledge for the mechanism of GStreamer and DeepStream SDK APIs are necessary.

For the deepstream_tao_apps/apps/tao_others/deepstream-pose-classification at master · NVIDIA-AI-IOT/deepstream_tao_apps (github.com) sample, there are three models, the PGIE is person detecttion model to identify the persons, the first SGIE is bodypose3d model which can output bodypose 2.5D or 3D key points. The second SGIE model get the bodypose 3D key points sequence as input and output the pose classification result. The bodypose key points coordinates are got and stored in metadata in sgie_src_pad_buffer_probe() function in deepstream_tao_apps/apps/tao_others/deepstream-pose-classification/deepstream_pose_classification_app.cpp at master · NVIDIA-AI-IOT/deepstream_tao_apps (github.com). And the second SGIE is composed by a customized nvdspreprocess plugin and a nvinfer plugin. The key points of the bodypose are got in the customized preprocessing function NvDsParseCustomPoseClassification() in deepstream_tao_apps/apps/tao_others/deepstream-pose-classification/infer_pose_classification_parser/infer_pose_classification_parser.cpp at master · NVIDIA-AI-IOT/deepstream_tao_apps (github.com).

It is hard to explain and discuss the details without the knowledge of DeepStream components.

alankardutta · May 27, 2024, 6:56am

Hi this is very helpful! I’ll ask on the forum if I have any more issues, thanks a lot with all the help!

alankardutta · May 29, 2024, 6:40am

Hi, I checked out the repo and tried some stuff out but i cannot initialize the pipeline correctly and i’m getting wrong input dims error still, also the pgie instance is not being set at all when i try running my code

CONFIG_INFER: config_infer_primary_yoloV8_pose.txt
CONFIG_INFER_SGIE: config_infer_secondary_lstm.txt
STREAMMUX_BATCH_SIZE: 1
STREAMMUX_WIDTH: 1920
STREAMMUX_HEIGHT: 1080
GPU_ID: 0
PERF_MEASUREMENT_INTERVAL_SEC: 5
JETSON: FALSE

WARNING: ../nvdsinfer/nvdsinfer_model_builder.cpp:1487 Deserialize engine failed because file path: /opt/nvidia/deepstream/deepstream-6.3/sources/apps/sample_apps/DeepStream-Yolo-Pose/lstm_model_b1_gpu0_fp16.engine open error
0:00:02.009103178 3583489 0x560a1e28cac0 WARN                 nvinfer gstnvinfer.cpp:679:gst_nvinfer_logger:<secondary-nvinfer> NvDsInferContext[UID 2]: Warning from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1976> [UID = 2]: deserialize engine from file :/opt/nvidia/deepstream/deepstream-6.3/sources/apps/sample_apps/DeepStream-Yolo-Pose/lstm_model_b1_gpu0_fp16.engine failed
0:00:02.098353434 3583489 0x560a1e28cac0 WARN                 nvinfer gstnvinfer.cpp:679:gst_nvinfer_logger:<secondary-nvinfer> NvDsInferContext[UID 2]: Warning from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2081> [UID = 2]: deserialize backend context from engine from file :/opt/nvidia/deepstream/deepstream-6.3/sources/apps/sample_apps/DeepStream-Yolo-Pose/lstm_model_b1_gpu0_fp16.engine failed, try rebuild
0:00:02.098375576 3583489 0x560a1e28cac0 INFO                 nvinfer gstnvinfer.cpp:682:gst_nvinfer_logger:<secondary-nvinfer> NvDsInferContext[UID 2]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:2002> [UID = 2]: Trying to create engine from model files
WARNING: [TRT]: onnx2trt_utils.cpp:377: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
WARNING: [TRT]: TensorRT encountered issues when converting weights between types and that could affect accuracy.
WARNING: [TRT]: If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
WARNING: [TRT]: Check verbose logs for the list of affected weights.
WARNING: [TRT]: - 8 weights are affected by this issue: Detected subnormal FP16 values.
WARNING: [TRT]: - 9 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.
0:00:04.629407449 3583489 0x560a1e28cac0 INFO                 nvinfer gstnvinfer.cpp:682:gst_nvinfer_logger:<secondary-nvinfer> NvDsInferContext[UID 2]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:2034> [UID = 2]: serialize cuda engine to file: /opt/nvidia/deepstream/deepstream-6.3/sources/apps/sample_apps/DeepStream-Yolo-Pose/lstm_model.onnx_b1_gpu0_fp16.engine successfully
WARNING: [TRT]: The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
INFO: ../nvdsinfer/nvdsinfer_model_builder.cpp:610 [Implicit Engine Info]: layers num: 2
0   INPUT  kFLOAT input           1x42            
1   OUTPUT kFLOAT output          3               

0:00:04.729503108 3583489 0x560a1e28cac0 ERROR                nvinfer gstnvinfer.cpp:676:gst_nvinfer_logger:<secondary-nvinfer> NvDsInferContext[UID 2]: Error in NvDsInferContextImpl::initInferenceInfo() <nvdsinfer_context_impl.cpp:1124> [UID = 2]: Infer Context default input_layer is not a image[CHW]
ERROR: nvdsinfer_context_impl.cpp:1286 Infer context initialize inference info failed, nvinfer error:NVDSINFER_TENSORRT_ERROR
0:00:04.730927414 3583489 0x560a1e28cac0 WARN                 nvinfer gstnvinfer.cpp:898:gst_nvinfer_start:<secondary-nvinfer> error: Failed to create NvDsInferContext instance
0:00:04.731585374 3583489 0x560a1e28cac0 WARN                 nvinfer gstnvinfer.cpp:898:gst_nvinfer_start:<secondary-nvinfer> error: Config file path: config_infer_secondary_lstm.txt, NvDsInfer Error: NVDSINFER_TENSORRT_ERROR
WARNING: ../nvdsinfer/nvdsinfer_model_builder.cpp:1487 Deserialize engine failed because file path: /opt/nvidia/deepstream/deepstream-6.3/sources/apps/sample_apps/DeepStream-Yolo-Pose/lstm_model_b1_gpu0_fp16.engine open error
0:00:06.319810908 3583489 0x560a1e28cac0 WARN                 nvinfer gstnvinfer.cpp:679:gst_nvinfer_logger:<secondary-nvinfer> NvDsInferContext[UID 2]: Warning from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1976> [UID = 2]: deserialize engine from file :/opt/nvidia/deepstream/deepstream-6.3/sources/apps/sample_apps/DeepStream-Yolo-Pose/lstm_model_b1_gpu0_fp16.engine failed
0:00:06.411203001 3583489 0x560a1e28cac0 WARN                 nvinfer gstnvinfer.cpp:679:gst_nvinfer_logger:<secondary-nvinfer> NvDsInferContext[UID 2]: Warning from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2081> [UID = 2]: deserialize backend context from engine from file :/opt/nvidia/deepstream/deepstream-6.3/sources/apps/sample_apps/DeepStream-Yolo-Pose/lstm_model_b1_gpu0_fp16.engine failed, try rebuild
0:00:06.411227197 3583489 0x560a1e28cac0 INFO                 nvinfer gstnvinfer.cpp:682:gst_nvinfer_logger:<secondary-nvinfer> NvDsInferContext[UID 2]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:2002> [UID = 2]: Trying to create engine from model files
WARNING: [TRT]: TensorRT encountered issues when converting weights between types and that could affect accuracy.
WARNING: [TRT]: If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
WARNING: [TRT]: Check verbose logs for the list of affected weights.
WARNING: [TRT]: - 8 weights are affected by this issue: Detected subnormal FP16 values.
WARNING: [TRT]: - 9 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.
0:00:08.862598014 3583489 0x560a1e28cac0 INFO                 nvinfer gstnvinfer.cpp:682:gst_nvinfer_logger:<secondary-nvinfer> NvDsInferContext[UID 2]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:2034> [UID = 2]: serialize cuda engine to file: /opt/nvidia/deepstream/deepstream-6.3/sources/apps/sample_apps/DeepStream-Yolo-Pose/lstm_model.onnx_b1_gpu0_fp16.engine successfully
WARNING: [TRT]: The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
INFO: ../nvdsinfer/nvdsinfer_model_builder.cpp:610 [Implicit Engine Info]: layers num: 2
0   INPUT  kFLOAT input           1x42            
1   OUTPUT kFLOAT output          3               

0:00:09.059504321 3583489 0x560a1e28cac0 ERROR                nvinfer gstnvinfer.cpp:676:gst_nvinfer_logger:<secondary-nvinfer> NvDsInferContext[UID 2]: Error in NvDsInferContextImpl::initInferenceInfo() <nvdsinfer_context_impl.cpp:1124> [UID = 2]: Infer Context default input_layer is not a image[CHW]
ERROR: nvdsinfer_context_impl.cpp:1286 Infer context initialize inference info failed, nvinfer error:NVDSINFER_TENSORRT_ERROR
0:00:09.060735130 3583489 0x560a1e28cac0 WARN                 nvinfer gstnvinfer.cpp:898:gst_nvinfer_start:<secondary-nvinfer> error: Failed to create NvDsInferContext instance
0:00:09.060743606 3583489 0x560a1e28cac0 WARN                 nvinfer gstnvinfer.cpp:898:gst_nvinfer_start:<secondary-nvinfer> error: Config file path: config_infer_secondary_lstm.txt, NvDsInfer Error: NVDSINFER_TENSORRT_ERROR
ERROR: Failed to set pipeline to playing

as you can see, only sgie is being set up and pgie is not initialized for some reason, i can parse the pose data from pgie but how can i attach the pose data to tensor meta data or forward it via obj_meta to the sgie, please let me know if you have any ideas!

Fiona.Chen · May 29, 2024, 7:35am

Your SGIE configuration is wrong. The deepstream_tao_apps/apps/tao_others/deepstream-pose-classification at master · NVIDIA-AI-IOT/deepstream_tao_apps (github.com) sample has told you to use “nvdspreprocess+nvinfer” as the SGIE for such case. Please make sure you have understood the sample first.

You must know what the input of your SGIE model should be and customize the nvdspreprocess library to generate such input tensor data.

alankardutta · May 29, 2024, 7:11pm

Hi, my bad, I need to customize the nvdspreprocess, any ideas on how i can attach my keypoints data and forward it to the nvdspreprocessor, also i can apply all transformations with the preprocess steps here right? and finally sgie takes in the processed tensors and does inferencing is that correct? I am still clueless as to how they got to attach the metadata of keypoints to the preprocessor, if you could clear up the confusion by explaining how the preprocessor functions work? I couldn’t really understand it clearly from the codebase and documentation, sorry for any inconveniences and thanks!

This is how I’m currently getting my keypoints as joints:

I want to attach this joint_info data to nvdspreprocess and then apply my transformations to tensor over there and then infer the tensor data with sgie, please let me know if you need any more details to get a clearer idea.

Fiona.Chen · May 30, 2024, 2:09am

Yes. the nvinfer takes the tensors from nvdspreprocess and do the inferencing.

The nvdspreprocess plugin is open source and the nvinfer plugin is open source.
The source code is in /opt/nvidia/deepstream/deepstream/sources/gst-plugins/gst-nvdspreprocess and /opt/nvidia/deepstream/deepstream/sources/gst-plugins/gst-nvinfer. And before you start with DeepStream, you must be familiar with GStreamer knowledge and basic coding skills GStreamer: open source multimedia framework.

Suppose you have read and understood all the above documents and codes, there are three models in our pose classification sample app. The PGIE model is a detector for person, the first SGIE model is the bodypose3D model which get the body key points from the person, the second SGIE model is the pose classification model which inference the pose type from the sequenced body key points.

The workflow for key points is that the “parse_25dpose_from_tensor_meta()” is evoked in the bodypose3D SGIE src pad probe function deepstream_tao_apps/apps/tao_others/deepstream-pose-classification/deepstream_pose_classification_app.cpp at master · NVIDIA-AI-IOT/deepstream_tao_apps (github.com) to do the postprocess of the first SGIE, in parse_25dpose_from_tensor_meta(), the customized NvDsJoints user meta is attached to the object meta(the object is person). In the nvdspreprocess customized library for the second SGIE deepstream_tao_apps/apps/tao_others/deepstream-pose-classification/nvdspreprocess_lib/nvdspreprocess_lib.cpp at master · NVIDIA-AI-IOT/deepstream_tao_apps (github.com), the customized NvDsJoints user meta is got from the object meta and the key points coordinates are calculated and recombined to generate the “3 X 300 X 34 X 1 (C T V M)” input tensor needed by Pose Classification | NVIDIA NGC.

It is impossible to explain the code line by line to you, please spend some time to study DeepStream document and the sample code. The codes are the best explanation. We can’t explain more.

alankardutta · June 3, 2024, 5:53pm

Hi, Thank you so much for your time and efforts with this issue, I’ve gone through the sample apps and created my preprocessing script, I just need to set up the final pipeline and display elements. Can you just let me know how to set up the pipeline if I’m obtaining keypoints as mask params from object metadata and my preprocessing script is also using those mask params and them applying custom functions and attaching it to a memory buffer on the user metadata as per deepstream_pose_classification_app. If my current pipeline is streammux → pgie → tracker → preprocess1 → sgie , how can I access the metadata from preprocess1’s output, would it be available in the tensor metadata? Please let me know if my current pipeline setup will be working for my use case of displaying the detected class of pose as output on the frame.

Fiona.Chen · June 4, 2024, 1:41am

Yes. The preprocess prepare the input tensor meta for nvinfer.

You can use the same pipeline as deepstream_pose_classification_app

alankardutta · June 5, 2024, 10:05am

Hi, Thank you for the heads up, I implemented the pipeline however I cannot set up the pgie instance with the deepstream-pose-classification, I managed to run the sgie and preprocessing scripts i think.

This is the error I am getting currently, as you can clearly see, my pgie instance is not created as only [UID 2] is being started.
This is my pgie config file :

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
model-color-format=0
onnx-file=../models/yolov8s-pose.onnx
model-engine-file=../models/yolov8s-pose.onnx_b1_gpu0_fp32.engine
#int8-calib-file=calib.table
labelfile-path=labels/labels.txt
batch-size=1
network-mode=0
num-detected-classes=1
interval=0
gie-unique-id=1
process-mode=1
network-type=3
cluster-mode=4
maintain-aspect-ratio=1
symmetric-padding=1
#workspace-size=2000
parse-bbox-instance-mask-func-name=NvDsInferParseYoloPose
custom-lib-path=../parsers/libnvdsinfer_custom_impl_Yolo_pose.so
output-instance-mask=1

[class-attrs-all]
pre-cluster-threshold=0.25
topk=300

this is the app_config.yml :

source-list:
   list: file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_walk.mov

streammux:
  width: 1280
  height: 720
  batched-push-timeout: 40000

tracker:
  enable: 1
  ll-lib-file: /opt/nvidia/deepstream/deepstream-6.3/lib/libnvds_nvmultiobjecttracker.so
  ll-config-file: /opt/nvidia/deepstream/deepstream-6.3/samples/configs/deepstream-app/config_tracker_NvDCF_accuracy.yml

primary-gie:
  plugin-type: 0
  config-file-path: yolo_pose_config.txt
  #config-file-path: ../triton/peoplenet_tao/config_infer_primary_peoplenet.yml
  #config-file-path: ../triton-grpc/peoplenet_tao/config_infer_primary_peoplenet.yml

secondary-preprocess1:
  config-file-path: preprocess.txt

secondary-gie1:
  plugin-type: 0
  config-file-path: lstm_config.txt
  #config-file-path: ../triton/bodypose_classification_tao/config_infer_third_bodypose_classification.yml
  #config-file-path: ../triton-grpc/bodypose_classification_tao/config_infer_third_bodypose_classification.yml

sink:
  #0 fakesink 
  #1 filesink generate the out.mp4 file in the current directory
  #2 rtspsink publish at rtsp://localhost:8554/ds-test
  #3 displaysink
  sink-type: 1
  #encoder type 0=Hardware 1=Software
  enc-type: 0

Please let me know if you need any more details, as I cannot seem to initiate the pgie instance at all, even though I am using the same initialization steps as in deepstream-pose-classification. Thanks!

Fiona.Chen · June 5, 2024, 10:23am

The pipeline met tracker error when start up. Can you follow the instruction of using tracker? Gst-nvtracker — DeepStream documentation 6.4 documentation

alankardutta · June 5, 2024, 4:26pm

Hi, I managed to fix the tracker issue by changing its input parameters, I’m getting a running pipeline however I cannot seem to understand how can I get the output class of the sgie predictions to be displayed on screen.

Does this successful model generation imply both my models are working? if so how can i attach the output class of the sgie to the final output video or view the output class, I have already set up the pose classifier parser as per deepstream-pose-classification app’s classifier function. Please suggest a method to display my generated output class on the video output, Thanks!

Fiona.Chen · June 6, 2024, 1:47am

In our sample deepstream-pose-classification app’s classifier function, the pose classification result is attached to the batch meta with this customized postprocessing function deepstream_tao_apps/apps/tao_others/deepstream-pose-classification/infer_pose_classification_parser/infer_pose_classification_parser.cpp at master · NVIDIA-AI-IOT/deepstream_tao_apps (github.com)

The classification result will be put in NVIDIA DeepStream SDK API Reference: _NvDsClassifierMeta Struct Reference | NVIDIA Docs structure inside gst-nvinfer.

Your LSTM model output layer dimension is “3”, seems you need to customize your own postprocessing function.

alankardutta · June 6, 2024, 7:26am

I’m using the updated softmax parsing function from /opt/nvidia/deepstream/deepstream-6.3/sources/libs/nvdsinfer_customparser/libnvds_infercustomparser.so and I am supposed to obtain a probabilities softmax output of 3 dimensions, but I cannot seem to display it or check if my sgie is even working correctly or not, the main issue with deepstream I found was related to individual component testing, I cannot seem to debug anything until its all connected in the pipeline and individual components cannot be tested or maybe I cannot seem to find a way to test it. Please suggest a way to atleast check the output of my sgie model and determine a proper parsing function for the same. I am using classifier meta data as you mentioned earlier but I don’t get a response at all even if I’m printing something in the pipeline, also tried attaching sgie pad buffer probe, but it seemed to do nothing either. Please let me know if I’m missing something or atleast let me be sure that the sgie is configured properly. Thanks!

Fiona.Chen · June 6, 2024, 7:51am

The source code is there. You can debug with the source code. It is just a software, you can debug with any debugging methods with the background knowledge of your models.

Fiona.Chen · June 6, 2024, 7:55am

Is this a company’s project? If it is, can you tell us about your company and the project?

Topic		Replies	Views
Could not find output coverage layer for parsing objects DeepStream SDK	31	1021	May 16, 2024
How to deploy skeleton-based action recognition model to deepstream? DeepStream SDK	49	259	August 27, 2024
Challenges in Implementing PoseClassificationNet in DeepStream-6.2 DeepStream SDK	27	769	August 21, 2023
Using custom model in deepstream DeepStream SDK jetson-inference , python , deepstream	40	52	September 10, 2024
Depth estimation with deepstream DeepStream SDK	7	338	June 14, 2024
[Secondary GIE] Custom Classifier in sgie outputs only random entry in label.txt DeepStream SDK	30	2588	September 4, 2021
Issue with Custom Sequence Preprocess library in combination with SGIE DeepStream SDK deepstream	13	91	October 15, 2024
NVIDIA-AI-IOT / deepstream_lpr_app is not working when only using LPD and LPR model DeepStream SDK	23	669	June 8, 2023
How to run BodyPose3D and POSECLASSIFICATION on one pipeline DeepStream SDK korean	10	1069	March 21, 2023
Unable to draw bounding boxes in deepstream using a multi-task classifier model DeepStream SDK	12	1036	December 19, 2022

Use YOLO Keypoints for Secondary GIE (LSTM Classifier)

Related topics