DeepStream Pose Estimation fails to assign joints to individual people

Description

I was trying DeepStream Pose Estimation, which uses the trt_pose model (GitHub - NVIDIA-AI-IOT/deepstream_pose_estimation: This is a sample DeepStream application to demonstrate a human pose estimation pipeline.).

I am getting very good results with one or two people in a frame, but as soon as a third person enters the frame, the second person's skeleton gets messed up. The issue seems to be in assigning joints to individuals: the app cannot determine which joint belongs to which person. In the images attached below, you can see that it works well with two people, but once the third person enters the frame, the second person's joints are scattered across the other people in the frame.

Q1. Is there a way to assign the joints correctly to each individual when there are more than two people?

Environment

TensorRT Version : 7.0.0.11
GPU Type : GTX 1650
Nvidia Driver Version : 460.73.01
CUDA Version : 10.2
CUDNN Version : -
Operating System + Version : Ubuntu 18.04
Python Version (if applicable) : 3.6.9
TensorFlow Version (if applicable) : -
PyTorch Version (if applicable) : -
Baremetal or Container (if container which image + tag) : Deepstream 5.0/5.1 container

Relevant Files

Reference Results:

Steps To Reproduce

Follow the steps from here to reproduce the result.

Will check and get back to you.

Hi @aniket.manoj. You can try using PeopleNet or any other object detector with a “person” class as the PGIE and the pose estimation model as an SGIE, i.e., run pose inference on the person crops produced by the detector.
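In that setup the pose model sees each person crop rather than the full frame, so its keypoints are relative to the crop and have to be mapped back before drawing on the frame. The sketch below shows only that mapping step. BBox, Keypoint and crop_to_frame are hypothetical names, not DeepStream structs or APIs, and the code assumes the pose outputs are normalized to the crop with no letterboxing.

```cpp
#include <cassert>
#include <vector>

// Hypothetical types for illustration only; these are NOT DeepStream structs.
struct BBox { float left, top, width, height; };  // person box in frame pixels
struct Keypoint { float x, y; bool valid; };      // x, y normalized to [0, 1] within the crop

// Map keypoints predicted on a person crop back into frame pixel coordinates.
// Assumption: the crop was stretched to the model input (no letterboxing), so a
// normalized crop coordinate scales linearly with the box width/height.
std::vector<Keypoint> crop_to_frame(const std::vector<Keypoint>& kps, const BBox& box) {
    assert(box.width >= 0.0f && box.height >= 0.0f);
    std::vector<Keypoint> out;
    out.reserve(kps.size());
    for (const Keypoint& kp : kps) {
        if (!kp.valid) {  // keep undetected joints as-is so indices still line up
            out.push_back(kp);
            continue;
        }
        out.push_back({box.left + kp.x * box.width, box.top + kp.y * box.height, true});
    }
    return out;
}
```

For example, with a person box at (100, 50) of size 200 x 400, the normalized crop point (0.5, 0.1) maps to frame pixel (200, 90). Because each crop is processed independently, joints from one crop can never be linked to another crop's person, which is what sidesteps the cross-person assignment problem (at the cost of the extra detector).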

Hi @kn1ght. I can try a person detector to crop each person and pass each crop as input to the pose estimation model. But then it will again be single-person pose estimation, correct?

That’s correct. But it will circumvent your issue of not getting results for every person in the frame, and that issue only becomes more significant as more people appear in the frame.

That’s correct. But adding a second model will take a toll on the FPS. Also, if two or more people overlap, the person detector may produce a crop containing multiple faces or body parts. In that case there is a high chance that the model will mix up the joints of multiple people.

When people overlap, you will have a problem whether or not you use a person detector. :)
It’s pretty much a trade-off between how accurate you want to be and how fast you want to run. In my experiments, I have seen significantly better accuracy when combining the pose model with a detector.

Cool! I will try adding a detector and let you know how it goes. Thanks!

@kn1ght, I can see that the model works with multiple people; the only issue I see is in mapping the joints. I would like to know how the model decides which joint belongs to which person. Maybe that will help me figure out how to organise the skeletons for multiple people.

Hi @aniket.manoj ,
Is it possible to share the test video?

Hi @mchi ,

Please get the test video from here.

Sorry! I meant the source file, so that I can test with the same video on my side.

Thanks!

Sorry! Here is the source video.

Hello @mchi,

Were you able to find any solution? How did it go at your end?

Hello @mchi,

Did you find any solution?

Yes, I can reproduce the issue and am still checking it.

I saw this post on pose estimation with TLT: NVIDIA NGC.

Can you help me out with the inference script?

We are still checking this issue; it may be caused by the model.

You can use the configs in deepstream_tlt_apps/configs at release/tlt3.0 · NVIDIA-AI-IOT/deepstream_tlt_apps · GitHub for the TLT bodypose2d model.

Hi All,

Regarding the incorrect detections with more than two people, I believe the cause may be max_num_parts=2. This appears in this line of code

While I haven’t yet tested with the DeepStream implementation, I suspect that this limits the number of detected instances of each part (e.g., noses) to 2 per image. This seems to agree with the results posted above.
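To illustrate how a per-part cap like max_num_parts can silently drop people, here is a minimal peak-finder sketch. It is modelled loosely on trt_pose-style post-processing, but the function name, its signature, and the "collect peaks in row-major scan order and stop after max_num_parts" behaviour are assumptions for illustration, not code copied from deepstream_pose_estimation.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical sketch: find local maxima in one part's confidence map (H x W, row-major).
// Assumption: peaks are collected in raster order and the scan stops once
// max_num_parts peaks have been found, so any later peaks are never considered.
std::vector<std::pair<int, int>> find_peaks(const std::vector<float>& cmap, int H, int W,
                                            float threshold, int window_size, int max_num_parts) {
    assert(static_cast<int>(cmap.size()) == H * W);
    std::vector<std::pair<int, int>> peaks;  // (row, col) of each accepted peak
    const int win = window_size / 2;
    for (int i = 0; i < H; ++i) {
        for (int j = 0; j < W; ++j) {
            if (static_cast<int>(peaks.size()) >= max_num_parts) return peaks;  // cap reached
            const float val = cmap[i * W + j];
            if (val < threshold) continue;
            // A pixel is a peak if no neighbour inside the window is strictly larger.
            bool is_peak = true;
            for (int di = -win; di <= win && is_peak; ++di) {
                for (int dj = -win; dj <= win; ++dj) {
                    const int ii = i + di, jj = j + dj;
                    if (ii < 0 || ii >= H || jj < 0 || jj >= W) continue;
                    if (cmap[ii * W + jj] > val) { is_peak = false; break; }
                }
            }
            if (is_peak) peaks.emplace_back(i, j);
        }
    }
    return peaks;
}
```

On a confidence map for a single part type (say, noses) with three well-separated peaks, calling this with max_num_parts = 2 returns only two peaks, so the third person's nose never reaches the joint-assignment stage; with max_num_parts = 100 all three are returned.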

In the original trt_pose project, which I tested, max_num_parts is set to 100.

With this value, I do not see this limitation.

Are you able to test with a larger max_num_parts in the DeepStream implementation to see if this issue persists?

Please let me know if this helps or you run into any issues.

Best,
John


Thanks @jaybdub !

Confirmed the change below with @aniket.manoj’s source video: the issue is gone, and deepstream_pose_estimation works very well.

diff --git a/deepstream_pose_estimation_app.cpp b/deepstream_pose_estimation_app.cpp
index c368ba8..d417cc4 100644
--- a/deepstream_pose_estimation_app.cpp
+++ b/deepstream_pose_estimation_app.cpp
@@ -51,7 +51,7 @@ parse_objects_from_tensor_meta(NvDsInferTensorMeta *tensor_meta)

   float threshold = 0.1;
   int window_size = 5;
-  int max_num_parts = 2;
+  int max_num_parts = 100;
   int num_integral_samples = 7;
   float link_threshold = 0.1;
   int max_num_objects = 100;
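If you want to try several values of max_num_parts without rebuilding each time, one option is to read it from an environment variable. This is a hypothetical helper, not part of deepstream_pose_estimation: the variable name POSE_MAX_NUM_PARTS and the upper bound of 10000 are illustrative assumptions.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>

// Hypothetical helper: read a positive integer from the environment variable
// `var_name`, falling back to `fallback` when the variable is unset, empty,
// not a plain base-10 integer, or outside the (illustrative) range 1..10000.
int max_num_parts_from_env(const char* var_name, int fallback) {
    assert(fallback > 0);
    const char* raw = std::getenv(var_name);
    if (raw == nullptr || *raw == '\0') return fallback;
    char* end = nullptr;
    errno = 0;
    const long value = std::strtol(raw, &end, 10);
    // Reject overflow, trailing garbage, and non-positive or implausibly large values.
    if (errno != 0 || *end != '\0' || value <= 0 || value > 10000) return fallback;
    return static_cast<int>(value);
}
```

Inside parse_objects_from_tensor_meta, the hard-coded line could then become int max_num_parts = max_num_parts_from_env("POSE_MAX_NUM_PARTS", 100); so the default matches the fix above while still allowing quick experiments.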