Human Pose detection model - Issues with converted model output in Deepstream

Platform: Jetson Xavier NX - JetPack 4.6

I’m developing an application that depends on human body pose estimation deep learning models. I’m looking for an accurate and lightweight model that I can deploy on an edge computing device such as the Jetson Xavier.

After trying to run the model in Python/TensorRT, I was advised to deploy the model using Deepstream to improve performance. For more information please refer to: Post.

Model

The model in question is Movenet, which can be downloaded from the TensorFlow Hub site.
To build the TensorRT engine required by the NvInfer plugin in Deepstream, I converted the TF model to TensorRT with the following steps:

1. tf → onnx (x86)

Using the onnx/tensorflow-onnx conversion tool:

python3 -m tf2onnx.convert --saved-model ./movenet_singlepose_lightning_4/ --output mnli_nchw.onnx --inputs-as-nchw input

The layer order must be NCHW for Deepstream, so I passed the --inputs-as-nchw argument for the input of the network.
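To sanity-check that the exported graph really takes NCHW input, the model can be inspected with the onnx Python package. A quick sketch (it assumes the input tensor is named input, as in the command above):

import onnx

# Print the name and shape of every graph input; for an NCHW export the
# channel dimension (3) should come directly after the batch dimension.
model = onnx.load("mnli_nchw.onnx")
for inp in model.graph.input:
    dims = [d.dim_value or d.dim_param for d in inp.type.tensor_type.shape.dim]
    print(inp.name, dims)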

2. onnx → TensorRT (Xavier)

/usr/src/tensorrt/bin/trtexec --onnx=mnli_nchw.onnx --saveEngine=mnli_nchw.engine --verbose
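To double-check the engine’s input/output bindings before wiring it into Deepstream, something like this can be run on the Xavier (a sketch using the TensorRT 8.x binding API that ships with JetPack; not part of my pipeline):

import tensorrt as trt

# Deserialize the engine and list each binding with its shape and direction.
logger = trt.Logger(trt.Logger.WARNING)
with open("mnli_nchw.engine", "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
for i in range(engine.num_bindings):
    kind = "input" if engine.binding_is_input(i) else "output"
    print(engine.get_binding_name(i), engine.get_binding_shape(i), kind)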

Deepstream - Python Bindings

I used apps/deepstream-test1 from the deepstream_python_apps repo as a reference. In the modified script, the input is a video in H.264 format and inference is performed on each frame. A sink pad probe is used to access the metadata and tensor information produced by Gst-nvinfer, and the output is drawn on the screen using the Gst-nvdsosd plugin. Please refer to the file:
ds_movenet_pipeline.py (10.9 KB)

The configuration for the inference engine is given by:
ds_pgie_config.txt (2.6 KB)

The output tensor has dimensions [1x17x3], with the first two channels of the last dimension being the (y, x) coordinates of the body landmarks (Nose, Left Eye, Right Eye… ) and the third channel representing the prediction confidence scores. Ref

Note that in the script I split the tensor output into two arrays: a [17,2]-shaped array with the (y, x) coordinates and a [17,1] array with the score information.
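Roughly, the split looks like this (a minimal sketch; output is a placeholder here for the flattened tensor read from the NvDsInferTensorMeta layer buffer in the probe):

import numpy as np

# output: placeholder for the flattened layer buffer read in the probe.
output = np.zeros(51, dtype=np.float32)

# Reshape the [1x17x3] Movenet output and split it into landmark
# coordinates and per-landmark confidence scores.
keypoints = output.reshape(17, 3)
coords = keypoints[:, :2]   # normalized (y, x) per landmark, shape (17, 2)
scores = keypoints[:, 2:]   # confidence score per landmark, shape (17, 1)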

However, the output of the network, as shown below, is wrong.

Screenshot 2022-07-22 at 11.51.15 AM

I was wondering whether someone could help me debug the application so that I can run this model on the Jetson Xavier.

Thanks a lot!

Hi @joaquinsd10 , could you confirm that the model outputs the keypoint information correctly? Is the video resolution you set for the streammux (640x480) right?
Also, you may refer to the link below. We have an example for pose estimation written in C++.
https://github.com/NVIDIA-AI-IOT/deepstream_pose_estimation

The output of the TensorFlow model when I run the inference in a Jupyter Notebook on my desktop is the following.

running_inf

I set the input resolution to (640x480) and scale the output coordinates accordingly.
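The scaling itself is just mapping the normalized (y, x) outputs onto the frame. A sketch (coords stands for the (17, 2) coordinate array from the split described earlier):

import numpy as np

# coords: placeholder for the (17, 2) array of normalized (y, x) landmarks.
coords = np.zeros((17, 2), dtype=np.float32)

# Map normalized coordinates to pixel positions in the 640x480 frame.
FRAME_W, FRAME_H = 640, 480
pixel_x = coords[:, 1] * FRAME_W
pixel_y = coords[:, 0] * FRAME_H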

I’d like to keep using the Movenet model, as I’ve gotten good results with it for my application. I also plan to run the model on multiple platforms in the future, so I’d like to keep the model the same if possible.

Hi @joaquinsd10 , I did not attach our pose estimation demo link to suggest changing your model; you can simply compare your config file with it, for example network-mode (0: FP32, 1: INT8, 2: FP16) and network-type (0: Detector, 1: Classifier, 2: Segmentation). You should set the right parameters.
Also, what is your Deepstream version? We suggest you test with version 6.1.0.
You can refer to the link below about how to set the config file and how to convert the coordinates.
https://github.com/NVIDIA-AI-IOT/deepstream_tao_apps/tree/master/apps/tao_others/deepstream-bodypose2d-app
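For reference, the relevant keys in a Gst-nvinfer config file look like this (the values here are only an illustration, not your actual config):

[property]
# network-mode: 0=FP32, 1=INT8, 2=FP16
network-mode=0
# network-type: 0=Detector, 1=Classifier, 2=Segmentation, 100=Other
network-type=100
# attach the raw output tensor so a pad probe can read it
output-tensor-meta=1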

Hello @joaquinsd10 , still waiting for your update, thank you.

Currently I’m using DS 6.0. I’ll upgrade to 6.1 to see if there is any difference in the results.

I’ve tried different settings for the NVINFER config file.

network-type: I’m able to run the program when this field is set to either 100 (Others) or 1 (Classifier). For any other network-type the program returns an error message or crashes.

network-mode: I get varying results as shown below:

network-mode:0 → [image]

network-mode:1 → [image]

network-mode:2 → [image]

Comparing against the deepstream-bodypose2d-app that you linked, the configuration files are already quite similar.

Could there be any issues with layer support? Are any layers not supported by DS?

We prefer to wait for the result on DS 6.1, so let’s keep the topic open until you can test on DS 6.1. Thanks.

I tried running the pipeline using the deepstream:6.1-devel container and installing the DS Python bindings. After running the Python script I get the following output:


As you can see, the pose landmarks for the Tai Chi demo video are also wrong, just like in the previous example (with the person running).


Since it runs well in a Jupyter Notebook on your desktop, could you help compare the coordinates generated by your own inference and by Deepstream with the same source stream? Also, you should set the same scaling for the test.
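For example, a hypothetical comparison (tf_out and ds_out stand for the two (17, 3) outputs on the same frame):

import numpy as np

# tf_out / ds_out: placeholders for the (y, x, score) arrays produced by the
# TensorFlow reference run and the Deepstream run on the same frame.
tf_out = np.zeros((17, 3), dtype=np.float32)
ds_out = np.zeros((17, 3), dtype=np.float32)

diff = np.abs(tf_out - ds_out)
print("max (y, x) difference:", diff[:, :2].max())
print("max score difference:", diff[:, 2].max())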

Hi YuWeiw,

I’m really sorry for the late reply.

Jupyter Notebook (Colab) output

array([[[[0.20632008, 0.5264436 , 0.5500399 ],
         [0.19438183, 0.5423147 , 0.5101172 ],
         [0.19940856, 0.525432  , 0.5194653 ],
         [0.20174555, 0.5762809 , 0.6259115 ],
         [0.2074162 , 0.5403847 , 0.36001524],
         [0.24626063, 0.61766636, 0.5737878 ],
         [0.28041622, 0.56218743, 0.43795484],
         [0.28720313, 0.70531404, 0.41041154],
         [0.3735156 , 0.5596073 , 0.25380737],
         [0.38415706, 0.6840706 , 0.5852949 ],
         [0.38440403, 0.4962165 , 0.5360341 ],
         [0.4712838 , 0.6524867 , 0.53902715],
         [0.47330508, 0.59636825, 0.5243149 ],
         [0.6045654 , 0.74344385, 0.52052164],
         [0.5830768 , 0.49873745, 0.4820709 ],
         [0.7081698 , 0.8761561 , 0.538491  ],
         [0.7466623 , 0.5525544 , 0.5850646 ]]]], dtype=float32)

Deepstream 6.1 Output

array([[[0.995168  , 0.6820843 , 0.13510899],
        [0.991431  , 0.692891  , 0.10633575],
        [0.99171036, 0.6779563 , 0.16247985],
        [0.9736485 , 0.6928706 , 0.10316519],
        [0.97919816, 0.6698861 , 0.25518477],
        [0.99069613, 0.6899451 , 0.1052241 ],
        [0.9894345 , 0.6395569 , 0.08138786],
        [0.99406284, 0.7011148 , 0.03583372],
        [0.48855978, 0.2775157 , 0.00691245],
        [0.6344738 , 0.6794286 , 0.00502333],
        [0.9739734 , 0.68779314, 0.05142531],
        [0.44666773, 0.63642704, 0.00864163],
        [0.4438305 , 0.41869292, 0.01471352],
        [0.54173696, 0.6425294 , 0.01174647],
        [0.5202968 , 0.39811012, 0.03536683],
        [0.8326208 , 0.5646678 , 0.01073348],
        [0.75177586, 0.42859802, 0.00596068]]])

The corresponding outputs are generated using the same source image of the runner.

Can you confirm that these two arrays are the output of the same video frame?

Yes, the arrays are the output of the same video frame.

IMG_6051

Additionally, I do not set any scaling parameter in either of the inference engines.
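For reference, the only scaling-related keys I am aware of in the nvinfer config are the ones below, which I leave at their defaults (shown here only as an illustration):

[property]
# pre-processing applied per pixel: y = net-scale-factor * (x - mean)
net-scale-factor=1.0
# offsets=0;0;0 (optional per-channel mean values, unset in my config)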

There is no update from you for a period, assuming this is not an issue anymore.
Hence we are closing this topic. If need further support, please open a new one.
Thanks


This is the result from our demo code deepstream-bodypose2d-app and the model that I attached, so the problem should still be in your model or your code. You can refer to our demo to debug the reason.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.