I am getting very good results for one or two people in a frame, but as soon as a third person enters the frame, it messes up the second person's skeleton. The issue seems to be in assigning the joints to individuals: it is not able to determine which joint belongs to which person. In the images attached below, you can see that it works well for two people, but as soon as the third person comes into the picture, the joints of the second person scatter across the other people in the frame.
Q1. Is there a way to get the joints correctly assigned to each individual when there are more than two people?
Environment
TensorRT Version: 7.0.0.11
GPU Type: GTX 1650
Nvidia Driver Version: 460.73.01
CUDA Version: 10.2
CUDNN Version: -
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6.9
TensorFlow Version (if applicable): -
PyTorch Version (if applicable): -
Baremetal or Container (if container, which image + tag): Deepstream 5.0/5.1 container
Hi @aniket.manoj. You can try using PeopleNet or any other object detector with a “person” class as the PGIE and use the pose estimation model as the SGIE, i.e., perform inference on the crops of the person detections.
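As a rough illustration of that cascade, here is what the two nvinfer config files could look like. The file names, model paths, and class counts below are assumptions for the sketch, not taken from this thread; only the `process-mode`/`operate-on-gie-id` mechanism is the point.

```ini
; pgie_peoplenet.txt -- hypothetical primary detector config
[property]
gie-unique-id=1
process-mode=1            ; 1 = full-frame inference
network-type=0            ; detector
num-detected-classes=3    ; PeopleNet detects person, bag, face
; tlt-encoded-model=...   ; path to the PeopleNet model (fill in)

; sgie_pose.txt -- hypothetical pose model config, runs on person crops
[property]
gie-unique-id=2
process-mode=2            ; 2 = inference on objects found by another GIE
operate-on-gie-id=1       ; consume detections from the PGIE above
operate-on-class-ids=0    ; only the "person" class
```

With this setup, the pose network sees one cropped person per inference, so the joint-to-person assignment problem largely disappears.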
Hi @kn1ght. I can try a person detector to crop each person and pass each crop as input to the pose estimation model. But then it will again be single-person pose estimation, correct?
That’s correct, but it will circumvent your issue of not being able to get results for all people in a frame. Without it, your problem will only become more significant as more people enter the frame.
That’s correct, but adding a second model will take a toll on the FPS. Also, if two or more people overlap, the person detector might produce a crop containing multiple faces or body parts. In that case there is a high chance that the model will mix up the joints of multiple people.
When people overlap, you will have a problem irrespective of using a person detector or not. :)
It’s pretty much a trade-off between how accurate you want to be and how fast you want to run. In my experiments, I have seen significantly better accuracy when the pose model is combined with a detector.
@kn1ght, I can see that the model works on multiple people; the only issue I see is in mapping the joints. I would like to know how the model decides which joint belongs to which person. Maybe that will help me figure out a way to organise the skeletons for multiple people.
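For background on how this family of models links joints: trt_pose outputs a confidence map (peak locations of candidate joints) plus a part affinity field (an affinity score for each candidate connection), and the parser greedily links the highest-scoring pairs into skeletons according to the topology. The snippet below is a simplified Python illustration of that greedy pairing step, not the actual trt_pose parser (which is implemented in C++/CUDA); the function name and threshold are my own.

```python
import numpy as np

def greedy_match(scores, threshold=0.1):
    """Greedily pair candidates of two connected part types.

    scores[i, j] is the affinity between the i-th candidate of one part
    (e.g. a neck) and the j-th candidate of the connected part (e.g. a
    left shoulder). Each candidate joins at most one skeleton.
    """
    pairs = []
    s = scores.astype(float).copy()
    while True:
        i, j = np.unravel_index(np.argmax(s), s.shape)
        if not np.isfinite(s[i, j]) or s[i, j] < threshold:
            break
        pairs.append((int(i), int(j)))
        s[i, :] = -np.inf  # each source candidate is used once
        s[:, j] = -np.inf  # each sink candidate is used once
    return pairs
```

For example, with two people whose necks/shoulders have strong mutual affinities, `greedy_match(np.array([[0.9, 0.2], [0.1, 0.8]]))` returns `[(0, 0), (1, 1)]`: each neck is linked to its own person's shoulder. When people overlap, the off-diagonal affinities rise and this greedy step is exactly where joints get swapped between skeletons.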
Regarding the incorrect detections for more than two people, I believe it may have to do with max_num_parts=2, which appears in this line of code.
While I haven’t yet tested the DeepStream implementation, I suspect this limits the number of candidates per part (e.g. noses) to 2 per image. That would agree with the results posted above.
In the original trt_pose project, max_num_parts is set to 100.
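To make the suspected effect concrete, here is a simplified sketch of top-k peak selection from one part's confidence map; the function name and shapes are mine, not the actual trt_pose/DeepStream parsing code. With max_num_parts=2, a third person's nose simply never becomes a candidate, which matches the broken skeletons above.

```python
import numpy as np

def find_peaks(cmap, max_num_parts=2, threshold=0.1):
    """Return up to max_num_parts peak locations from one part's
    confidence map, strongest first (simplified illustration)."""
    flat = cmap.ravel()
    order = np.argsort(flat)[::-1][:max_num_parts]  # top-k scores
    return [np.unravel_index(i, cmap.shape) for i in order
            if flat[i] >= threshold]
```

With three strong peaks in the map (three people's noses), `max_num_parts=2` keeps only the two strongest, while `max_num_parts=100` keeps all three, which is consistent with raising the limit fixing the multi-person case.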