DeepStream Pose Estimation is not able to predict joints for individual persons

aniket.manoj · June 3, 2021, 7:14am

Description

I was trying DeepStream Pose Estimation, which uses the tfpose model (GitHub: NVIDIA-AI-IOT/deepstream_pose_estimation, a sample DeepStream application that demonstrates a human pose estimation pipeline).

I am getting very good results with one or two people in a frame, but as soon as a third person enters the frame, it messes up the second person’s skeleton. The issue seems to be in assigning the joints to individuals: it is not able to determine which joint belongs to which person. In the images attached below, you can see that it works well for two people, but as soon as the third person comes into the picture, the joints of the second person scatter across the other people in the frame.

Q1. Is there a way to get the joints assigned properly to each individual when there are more than 2 people?

Environment

TensorRT Version : 7.0.0.11
GPU Type : GTX 1650
Nvidia Driver Version : 460.73.01
CUDA Version : 10.2
CUDNN Version : -
Operating System + Version : Ubuntu 18.04
Python Version (if applicable) : 3.6.9
TensorFlow Version (if applicable) : -
PyTorch Version (if applicable) : -
Baremetal or Container (if container which image + tag) : Deepstream 5.0/5.1 container

Relevant Files

Reference Results:

Steps To Reproduce

Follow the steps from here to reproduce the result.

mchi · June 4, 2021, 1:58am

Will check and get back to you.

kn1ght · June 4, 2021, 8:34am

Hi @aniket.manoj. You can try using PeopleNet or any other object detector with a “person” class as the PGIE and use the pose estimation model as the SGIE, i.e., perform inference on the crops produced by the person detection.

aniket.manoj · June 4, 2021, 8:55am

Hi @kn1ght. I can try any person detector to crop each person and pass each crop as input to the pose estimation model. But it will again be single-person pose estimation, correct?

kn1ght · June 4, 2021, 9:03am

That’s correct. But it will circumvent your issue of not being able to get results for all people in a frame. Your problem will be more significant with multiple people in the frame.

aniket.manoj · June 4, 2021, 9:14am

That’s correct. But adding a second model will take a toll on the FPS. Also, in a scenario where 2 or more people overlap, the person detector might produce a crop containing multiple faces or body parts. In that case there is a high chance that the model will mix up the joints of multiple people.

kn1ght · June 4, 2021, 9:26am

When people overlap, you will have a problem irrespective of using a person detector or not. :)
It’s pretty much a trade-off between how accurate you want to be and how fast you want to run. In my experiments, I have seen significantly better accuracy when the pose model is combined with a detector.

aniket.manoj · June 4, 2021, 9:42am

Cool! I will try adding a detector and will let you know how it goes. Thanks.

aniket.manoj · June 7, 2021, 6:24am

@kn1ght, I can see that the model works on multiple people; the only issue I see is in mapping the joints. I would like to know how the model decides which joint belongs to which person. Maybe that will help me figure out a way to organise the skeletons for multiple people.

mchi · June 7, 2021, 7:56am

Hi @aniket.manoj ,
Is it possible to share the test video ?

aniket.manoj · June 7, 2021, 10:16am

Hi @mchi ,

Please get the test video from here.

mchi · June 7, 2021, 1:49pm

Sorry! I mean the source file so that I can test with the same video on my side.

Thanks!

aniket.manoj · June 7, 2021, 2:21pm

Sorry! Here is the source video.

aniket.manoj · June 9, 2021, 4:20am

Hello @mchi,

Were you able to find any solution? How did it go at your end?

aniket.manoj · June 14, 2021, 5:32am

hello @mchi,

Did you find any solution?

mchi · June 15, 2021, 8:44am

Yes, I can reproduce the issue and am still checking it.

aniket.manoj · June 25, 2021, 9:36am

I saw this post on pose estimation on tlt: https://ngc.nvidia.com/catalog/models/nvidia:tlt_bodyposenet.

Can you help me out with the inference script?

mchi · June 25, 2021, 9:46am

We are still checking this issue; it may be caused by the model.

For the TLT bodypose2d model, you can use the configs in deepstream_tao_apps (GitHub: NVIDIA-AI-IOT/deepstream_tao_apps, release/tlt3.0 branch, configs directory).

jaybdub · June 25, 2021, 8:36pm

Hi All,

Regarding the incorrect detection for more than two people, I believe it may have to do with the max_num_parts=2. This appears in this line of code

While I haven’t yet tested with the DeepStream implementation, I suspect that this limits the number of detected parts of each type (e.g., noses) to 2 per image. This seems to agree with the results posted above.
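
To illustrate why a cap of 2 could scramble skeletons rather than simply drop the third person, here is a minimal C++ sketch. It is not the code from deepstream_pose_estimation or trt_pose: `find_peaks_capped` and `make_three_person_map` are hypothetical names, and the raster-scan order with an early exit is an assumption about how such a cap might behave. The parameter names mirror `threshold`, `window_size` and `max_num_parts` from the sample.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper (not the sample's actual function): scan one part's
// confidence map in raster order and keep local maxima at or above
// `threshold`, stopping as soon as `max_count` peaks have been collected.
std::vector<std::pair<int, int>> find_peaks_capped(
    const std::vector<std::vector<float>> &cmap, float threshold,
    int window_size, int max_count) {
  assert(window_size > 0 && max_count >= 0);
  std::vector<std::pair<int, int>> peaks;  // (row, col) of each peak
  const int h = static_cast<int>(cmap.size());
  const int w = h > 0 ? static_cast<int>(cmap[0].size()) : 0;
  const int r = window_size / 2;
  for (int i = 0; i < h; ++i) {
    for (int j = 0; j < w; ++j) {
      // The cap: once max_count peaks are stored, later peaks are ignored.
      if (static_cast<int>(peaks.size()) >= max_count) return peaks;
      const float v = cmap[i][j];
      if (v < threshold) continue;
      bool is_peak = true;
      for (int ii = std::max(0, i - r); ii <= std::min(h - 1, i + r) && is_peak; ++ii) {
        for (int jj = std::max(0, j - r); jj <= std::min(w - 1, j + r); ++jj) {
          if (cmap[ii][jj] > v) { is_peak = false; break; }
        }
      }
      if (is_peak) peaks.push_back({i, j});
    }
  }
  return peaks;
}

// Synthetic "nose" confidence map with three people (made-up values).
std::vector<std::vector<float>> make_three_person_map() {
  std::vector<std::vector<float>> cmap(16, std::vector<float>(32, 0.0f));
  cmap[4][5] = 0.9f;   // person A
  cmap[6][15] = 0.8f;  // person B
  cmap[5][25] = 0.7f;  // person C
  return cmap;
}
```

Calling `find_peaks_capped(make_three_person_map(), 0.1f, 5, 2)` returns the peaks at (4, 5) and (5, 25) and never reaches the one at (6, 15); with a cap of 100 it returns all three. If each part type keeps a different pair of people in this way, the later linking stage would have mismatched candidates to connect, which would be consistent with the scattered joints reported above.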

In the original project, trt_pose, max_num_parts is set to 100.

With this value, I do not see this limitation.

Are you able to test with a larger max_num_parts in the DeepStream implementation to see if this issue persists?

Please let me know if this helps or you run into any issues.

Best,
John

mchi · June 26, 2021, 1:27am

Thanks @jaybdub !

Confirmed: with the change below, tested on @aniket.manoj’s source video, the issue is gone and deepstream_pose_estimation works very well.

diff --git a/deepstream_pose_estimation_app.cpp b/deepstream_pose_estimation_app.cpp
index c368ba8..d417cc4 100644
--- a/deepstream_pose_estimation_app.cpp
+++ b/deepstream_pose_estimation_app.cpp
@@ -51,7 +51,7 @@ parse_objects_from_tensor_meta(NvDsInferTensorMeta *tensor_meta)

   float threshold = 0.1;
   int window_size = 5;
-  int max_num_parts = 2;
+  int max_num_parts = 100;
   int num_integral_samples = 7;
   float link_threshold = 0.1;
   int max_num_objects = 100;
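
On the earlier question of how joints get assigned to people: the thread does not spell this out, but the `num_integral_samples` and `link_threshold` constants in the hunk above suggest a part-affinity-field style of scoring, where a candidate link between two joints is scored by how well a predicted vector field lines up with the segment joining them. The C++ sketch below illustrates that idea only. `paf_link_score`, `constant_row_field` and `Field` are hypothetical names, and the sampling scheme is an assumption, not the sample's actual implementation.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

using Field = std::vector<std::vector<float>>;  // indexed as field[y][x]

// Hypothetical helper: average dot product between the field (paf_x, paf_y)
// and the unit vector from joint a to joint b, sampled at
// `num_integral_samples` points along the segment a -> b.
float paf_link_score(const Field &paf_x, const Field &paf_y,
                     float ax, float ay, float bx, float by,
                     int num_integral_samples) {
  assert(paf_x.size() == paf_y.size());
  const float dx = bx - ax, dy = by - ay;
  const float norm = std::sqrt(dx * dx + dy * dy);
  if (norm < 1e-6f || num_integral_samples <= 0 || paf_x.empty()) return 0.0f;
  const float ux = dx / norm, uy = dy / norm;
  const int h = static_cast<int>(paf_x.size());
  const int w = static_cast<int>(paf_x[0].size());
  float sum = 0.0f;
  for (int k = 0; k < num_integral_samples; ++k) {
    const float t = (k + 0.5f) / num_integral_samples;  // midpoint sampling
    const int x = static_cast<int>(ax + t * dx);
    const int y = static_cast<int>(ay + t * dy);
    if (x < 0 || x >= w || y < 0 || y >= h) continue;  // outside the map
    sum += paf_x[y][x] * ux + paf_y[y][x] * uy;
  }
  return sum / num_integral_samples;
}

// Synthetic field that is `value` along one row and zero elsewhere.
Field constant_row_field(int h, int w, int row, float value) {
  Field f(h, std::vector<float>(w, 0.0f));
  for (float &v : f[row]) v = value;
  return f;
}
```

In this toy setup, a field pointing along +x on row 2 gives a high score for a horizontal link from (1, 2) to (6, 2) and a score near zero for a vertical link from (1, 2) to (1, 6). A grouping stage could then keep only links whose score exceeds something like `link_threshold`. Scoring of this kind only helps if every person's joints are among the candidates, which is why the `max_num_parts` cap matters.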