Suggestions on improving Nvidia 2D Body Pose Estimation Model

Hello to all.

Our team from Darwin Edge has been evaluating the Nvidia 2D Body Pose Estimation model as part of our product development.

It was recently released as part of TLT 3.0 (now TAO).

Our initial benchmarking results (published in a Towards Data Science article) suggest that the model we were using before, OpenPifPaf, performs better than the Nvidia Body Pose Net model.

We would really appreciate any recommendations and help on how to improve the model. We are quite keen to use it on Nvidia NX devices.

• Hardware (Nvidia NX)
• Network Type (Body Pose Net)
• TLT Version (v3.0-py3)
bpnet_train_m1_coco_training.yaml (2.8 KB)

This is not an apples-to-apples comparison. Also, the result reported in Hands-on: Optimizing and benchmarking Body Pose Estimation models | by Debmalya Biswas | Sep, 2021 | Towards Data Science does not match the result posted in Training and Optimizing a 2D Pose Estimation Model with NVIDIA TAO Toolkit, Part 2 | NVIDIA Developer Blog.
The NVIDIA blog states: “We use a default size 288×384 in this post.”
Its result for the pruned model is:

After retraining the pruned model with pth 0.2, you can observe an accuracy of 57.5% AP with multiscale inference. Here are the metrics on COCO validation set:

Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.575
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.789
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.621
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.563
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.603

It also says:

You can expect to see a 7-10% AP increase in the area=medium category when going from 224×320 to 288×384 and an additional 7-10% AP when you choose 320×448.

So you would expect an AP increase when going from 288×384 to 368×368.

The result in Hands-on: Optimizing and benchmarking Body Pose Estimation models | by Debmalya Biswas | Sep, 2021 | Towards Data Science is based on 368×368.
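To make the input-size mismatch concrete, here is a minimal Python sketch that compares the resolutions mentioned in this thread by total pixel count. Pixel count is only a rough proxy (an assumption on my side, not something either blog states), and the names `RESOLUTIONS`, `pixel_count`, and `relative_pixels` are made up for illustration.

```python
# Input resolutions mentioned in this thread, stored as pairs of side lengths.
# Which side is height and which is width is deliberately not assumed here;
# only the product (total pixels) is used.
RESOLUTIONS = {
    "224x320": (224, 320),
    "288x384": (288, 384),
    "320x448": (320, 448),
    "368x368": (368, 368),
}


def pixel_count(shape):
    """Return the total number of pixels for a pair of side lengths."""
    side_a, side_b = shape
    return side_a * side_b


def relative_pixels(baseline="288x384"):
    """Return each resolution's pixel count relative to the baseline size."""
    base = pixel_count(RESOLUTIONS[baseline])
    return {name: pixel_count(shape) / base for name, shape in RESOLUTIONS.items()}


if __name__ == "__main__":
    # Print the sizes from smallest to largest pixel count.
    for name, ratio in sorted(relative_pixels().items(), key=lambda kv: kv[1]):
        pixels = pixel_count(RESOLUTIONS[name])
        print(f"{name}: {pixels} px, {ratio:.2f}x the 288x384 baseline")
```

By this rough measure, 368×368 falls between 288×384 and 320×448, so accuracy numbers obtained at the two input sizes should not be compared directly.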

In addition, the NVIDIA blog uses VGG as the backbone, while Hands-on: Optimizing and benchmarking Body Pose Estimation models | by Debmalya Biswas | Sep, 2021 | Towards Data Science uses ResNet50.

You can try training according to Training and Optimizing a 2D Pose Estimation Model with NVIDIA TAO Toolkit, Part 1 | NVIDIA Developer Blog, or run evaluation with NVIDIA NGC.
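When comparing evaluation runs side by side, it can help to pull the AP numbers out of the COCO-style summary lines like the ones quoted above. The following is a minimal sketch that assumes the summary is available as plain text in exactly that line format; `parse_coco_ap` is a hypothetical helper, not part of TAO or pycocotools.

```python
import re

# Matches COCO-style AP summary lines such as:
# "Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.575"
AP_LINE = re.compile(
    r"\(AP\) @\[ IoU=(\S+)\s*\| area=\s*(\w+)\s*\| maxDets=\s*(\d+)\s*\] = ([\d.]+)"
)


def parse_coco_ap(text):
    """Return a dict mapping (iou, area) to the AP value found in the text."""
    results = {}
    for line in text.splitlines():
        match = AP_LINE.search(line)
        if match:
            iou, area, _max_dets, value = match.groups()
            results[(iou, area)] = float(value)
    return results


# The summary lines quoted earlier in this thread.
SUMMARY = """\
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.575
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.789
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.621
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.563
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.603
"""

if __name__ == "__main__":
    for (iou, area), ap in parse_coco_ap(SUMMARY).items():
        print(f"IoU={iou:<10} area={area:<7} AP={ap}")
```

Running the same parser on the output of your own evaluation would give two dictionaries that can be compared key by key, provided both runs use the same input size and backbone.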


Thank you for a really fast reply. We really appreciate it.

Our team will analyze this, and we will post our results here once we have done the retraining and benchmarking.
