Our team from Darwin Edge has been evaluating the Nvidia 2D Body Pose Estimation model as part of our product development.
It was recently released as part of TLT 3.0 (now TAO).
Our initial benchmarking results (published in a Towards Data Science article) suggest that the model we were using before, OpenPifPaf, performs better than the Nvidia Body Pose Net model.
We would really appreciate any recommendations and help on how to improve the model's results. We are quite keen to use it on Nvidia NX devices.
After retraining the pruned model with pth (pruning threshold) 0.2, you can observe an accuracy of 57.5% AP with multiscale inference. Here are the metrics on the COCO validation set:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.575
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.789
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.621
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.563
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.603
It also says:
You can expect to see a 7-10% AP increase in the area=medium category when going from 224×320 to 288×384 and an additional 7-10% AP when you choose 320×448.
So you should also expect an AP increase when going from 288×384 to 368×368.
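
To make the resolution comparison concrete, here is a minimal sketch that ranks the input sizes mentioned above by total pixel count. It assumes pixel count is a rough proxy for how much detail the network sees, which is my assumption and not something the quoted text states. Aspect ratio and how the model was trained also matter, so this only illustrates the reasoning and does not predict AP. The names `resolutions` and `pixel_count` are made up for this example.

```python
# Input resolutions (height, width) mentioned in this thread.
resolutions = {
    "224x320": (224, 320),
    "288x384": (288, 384),
    "320x448": (320, 448),
    "368x368": (368, 368),
}


def pixel_count(hw):
    """Total number of input pixels for a (height, width) pair."""
    height, width = hw
    return height * width


# Use 288x384 as the reference point, since that is the size being compared against.
baseline = pixel_count(resolutions["288x384"])

# Order the resolutions from fewest to most pixels.
ordered = sorted(resolutions, key=lambda name: pixel_count(resolutions[name]))

for name in ordered:
    px = pixel_count(resolutions[name])
    print(f"{name}: {px} px ({px / baseline:.2f}x relative to 288x384)")
```

By this measure 368×368 sits between 288×384 and 320×448, which is why some AP gain over 288×384 seems plausible, although only an actual evaluation run can confirm it.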